# TextWeb brings markdown rendering to LLM web agents

> Developer woheller69 has released TextWeb, an open-source markdown browser designed for LLM-based agents. Rather than capturing screenshots for vision models, TextWeb renders web pages as markdown with full JavaScript execution, annotates interactive elements, and exposes navigation controls through a CLI and an MCP server. The developer reports that the tool works with the llama.cpp web UI.

- Canonical URL: https://agentry.press/news/textweb-brings-markdown-rendering-to-llm-web-agents/
- Type: News
- Published: 2026-05-29
- By: agentry
- Tags: open-source, llm-agents, mcp-server, web-browsing, llama-cpp, agent-tooling

---

## What TextWeb Is

TextWeb is an open-source web rendering layer built specifically for LLM-based agents. Released by developer woheller69 and published on GitHub, the tool converts web pages into markdown that language models can process directly, rather than relying on visual representations of page content [1].

The project is derived from an earlier tool by chrisrobison, also named textweb, which rendered pages as a text grid rather than markdown. Woheller69's version replaces that text-grid approach with markdown output while retaining the core concept of a text-native browser interface for automated agents [1].

## The Problem It Addresses

A common pattern for giving LLM agents access to web content is to capture screenshots and route them through vision-capable models. That pipeline carries costs: vision-model inference is typically heavier, computationally and financially, than text-only inference, and screenshots encode a great deal of visual detail that is irrelevant to most agent tasks.

TextWeb targets that cost directly. By rendering pages as markdown, the tool produces output that text-based language models can reason about without a vision component. Markdown preserves the semantic structure of a page, including headings, links, and lists, in a format that is compact relative to pixel-level image data [1].

## How the Rendering Pipeline Works

TextWeb executes JavaScript before converting page output to markdown. That execution step matters because a large share of modern web content is dynamically generated: without JavaScript execution, a renderer would capture only the initial HTML skeleton, missing content that loads or transforms after page initialization [1].

After JavaScript execution, the tool converts the resulting page state to markdown and annotates interactive elements. Input fields and buttons are flagged within the markdown output so that an agent can identify actionable targets without needing to interpret visual layout cues [1].
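The announcement does not document TextWeb's output syntax or internals, so the following is only a minimal Python sketch of the conversion-and-annotation step, assuming the JavaScript-executed page state is already available as serialized HTML (for example, from a headless browser). The class name `AnnotatingMarkdownConverter` and the `[input #1: q]` marker style are hypothetical, chosen to show how numbering interactive elements gives an agent stable targets to act on.

```python
from html.parser import HTMLParser


class AnnotatingMarkdownConverter(HTMLParser):
    """Hypothetical sketch: turn a post-JavaScript HTML snapshot into markdown,
    numbering inputs and buttons so an agent can refer to them. The annotation
    syntax is illustrative only, not TextWeb's actual output format."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.lines = []        # finished markdown lines
        self.current = ""      # text of the block being built
        self.href = None       # href of the currently open link, if any
        self.link_text = ""
        self.in_button = False
        self.button_text = ""
        self.next_id = 1       # counter for interactive-element annotations

    def _flush(self):
        # Emit the current block as one markdown line, skipping empty blocks.
        if self.current.strip():
            self.lines.append(self.current.strip())
        self.current = ""

    def _annotate(self, kind, label):
        # Append a numbered marker such as "[button #2: Go]" to the current block.
        self.current += f" [{kind} #{self.next_id}: {label}]"
        self.next_id += 1

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.HEADINGS:
            self._flush()
            self.current = self.HEADINGS[tag]
        elif tag in ("p", "li"):
            self._flush()
            self.current = "- " if tag == "li" else ""
        elif tag == "a":
            self.href = attrs.get("href", "")
            self.link_text = ""
        elif tag == "input":
            # Label inputs by name, falling back to their type attribute.
            self._annotate("input", attrs.get("name") or attrs.get("type", "text"))
        elif tag == "button":
            self.in_button = True
            self.button_text = ""

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag in ("p", "li"):
            self._flush()
        elif tag == "a" and self.href is not None:
            # Render the finished link as markdown: [text](href).
            self.current += f"[{self.link_text.strip()}]({self.href})"
            self.href = None
        elif tag == "button" and self.in_button:
            self.in_button = False
            self._annotate("button", self.button_text.strip())

    def handle_data(self, data):
        # Route text to the open button or link, else to the current block.
        if self.in_button:
            self.button_text += data
        elif self.href is not None:
            self.link_text += data
        else:
            self.current += data

    def convert(self, html):
        self.feed(html)
        self.close()
        self._flush()
        return "\n".join(self.lines)
```

Fed `<h1>Search</h1><form><input name="q"><button>Go</button></form><p>See <a href="/help">help</a>.</p>`, the sketch returns three lines: `# Search`, `[input #1: q] [button #2: Go]`, and `See [help](/help).` A production converter would also need to handle tables, nested lists, and hidden elements, which this sketch ignores.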

## Interaction Capabilities and Interfaces

TextWeb exposes a set of agent actions covering the core operations required for web navigation tasks. Supported actions include navigating to a URL, scrolling up and down within a page, entering text into input fields, and clicking buttons [1].
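On the agent side, the four supported actions map naturally onto a small typed command set. The Python sketch below validates a model's JSON output before anything is sent to the browser. The JSON field names (`action`, `element_id`, and so on), the class names, and the idea of referring to elements by annotation number are assumptions for illustration, not TextWeb's documented interface.

```python
import json
from dataclasses import dataclass
from typing import Literal, Union


@dataclass
class Navigate:
    url: str


@dataclass
class Scroll:
    direction: Literal["up", "down"]


@dataclass
class TypeText:
    element_id: int  # hypothetical: number of an annotated input field
    text: str


@dataclass
class Click:
    element_id: int  # hypothetical: number of an annotated button


Action = Union[Navigate, Scroll, TypeText, Click]


def parse_action(raw: str) -> Action:
    """Turn a model's JSON output into a typed action, rejecting anything
    outside the four operations (navigate, scroll, type, click)."""
    data = json.loads(raw)
    kind = data.get("action")
    if kind == "navigate":
        return Navigate(url=str(data["url"]))
    if kind == "scroll":
        if data.get("direction") not in ("up", "down"):
            raise ValueError("scroll direction must be 'up' or 'down'")
        return Scroll(direction=data["direction"])
    if kind == "type":
        return TypeText(element_id=int(data["element_id"]), text=str(data["text"]))
    if kind == "click":
        return Click(element_id=int(data["element_id"]))
    raise ValueError(f"unsupported action: {kind!r}")
```

Validating at this boundary means a malformed or out-of-scope model response fails fast with a clear error instead of reaching the page.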

Two interfaces deliver these capabilities. The first is a command-line interface (CLI) suited to direct invocation and scripted workflows. The second is a Model Context Protocol (MCP) server, which allows MCP-compatible agent frameworks to call TextWeb's rendering and interaction functions as structured tools [1]. The MCP server interface positions TextWeb as a drop-in browsing component for agent pipelines that already use MCP for tool orchestration.
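MCP is built on JSON-RPC 2.0, and clients invoke a server's tools with a `tools/call` request carrying a tool `name` and an `arguments` object. The Python sketch below builds such a request. The tool name `navigate` and its `url` argument are hypothetical: the announcement does not list TextWeb's MCP tool names, which a client would normally discover with a `tools/list` request after the MCP `initialize` handshake.

```python
import itertools
import json

# JSON-RPC request ids must be unique within a session.
_ids = itertools.count(1)


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single line of JSON-RPC 2.0.

    A real client sends this only after the MCP initialize handshake."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool name and argument, for illustration only.
line = tools_call("navigate", {"url": "https://example.com"})
```

The server's response would carry the tool result, which for a browsing tool like this one would plausibly be the rendered markdown of the page.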

## Compatibility and Deployment Context

Woheller69 has confirmed that TextWeb works with the llama.cpp web UI, making it directly usable in locally hosted inference setups built around llama.cpp [1]. That compatibility is relevant for operators running self-hosted agent stacks who want to avoid cloud-based vision APIs entirely.

The source code is available at github.com/woheller69/textweb under an open-source license. The project is at an early public stage, with the initial announcement made through the LocalLLaMA community on Reddit [1].

## FAQ

**Q. Does TextWeb require a vision-capable model to function?**
No. The tool is explicitly designed to replace screenshot-based vision pipelines. It outputs markdown that text-only language models can process without any image understanding capability [1].

**Q. What JavaScript engine does TextWeb use for page execution?**
The source announcement confirms that full JavaScript execution occurs before markdown conversion, but does not specify the underlying engine or runtime [1].

**Q. Which agent frameworks can use the MCP server interface?**
The announcement describes an MCP server as one of the two delivery mechanisms, but does not enumerate specific compatible frameworks beyond the llama.cpp web UI [1].

**Q. How does TextWeb differ from the chrisrobison/textweb project it is based on?**
The original chrisrobison project renders pages as a text grid. Woheller69's TextWeb replaces that with markdown output and adds interactive element annotation along with the CLI and MCP server interfaces [1].

## Key takeaways

- TextWeb renders web pages as markdown with full JavaScript execution, targeting LLM agents that do not use vision models.
- Interactive elements such as input fields and buttons are annotated within the markdown output so agents can identify actionable targets.
- Supported agent actions include URL navigation, scrolling, text entry, and button clicks.
- The tool ships with both a CLI and an MCP server, supporting direct invocation and structured tool-calling workflows.
- Compatibility with the llama.cpp web UI is confirmed, and the project is available as open-source on GitHub [1].

## Sources

1. https://www.reddit.com/r/LocalLLaMA/comments/1t9tsro/markdown_browser_for_llms/
