Server data from the Official MCP Registry
MCP server fetches URL through a real Chromium browser, returns main content as clean Markdown
This MCP server is well-architected with solid security practices. Authentication is not required by design (the server fetches public URLs), and permissions are appropriate for a web-fetching tool. Code quality is high with proper error handling, input validation, and no malicious patterns. Minor findings around error message specificity and logging are low-severity quality issues that do not impact security. Supply chain analysis found 5 known vulnerabilities in dependencies (1 critical, 3 high severity). Package verification found 1 issue.
7 files analyzed · 10 issues found
Add this to your MCP configuration file:
{
"mcpServers": {
"io-github-sidney-web-to-markdown-mcp": {
"args": [
"web-to-markdown-mcp"
],
"command": "uvx"
}
}
}

From the project's GitHub README.
An MCP server that fetches a URL through a real Chromium browser and returns the main content as clean Markdown.
Most MCP web-fetch tools either issue a plain HTTP request (fast and light, but blocked by bot detection and blind to JavaScript-rendered content) or drive a full browser and return verbose, token-heavy output.
This server uses a two-tier approach:
Native markdown fast-path. Every request first tries a plain HTTP GET with an Accept: text/markdown header. Servers that support content negotiation — such as Cloudflare-hosted sites with Markdown for Agents enabled — respond with Content-Type: text/markdown, and the body is returned immediately with no browser overhead. Servers that don't recognise the header respond normally and fall through to tier 2.
Browser fallback. When the fast-path doesn't yield markdown, patchright (a Playwright fork with anti-detection patches) drives real Chromium, and trafilatura strips navigation, sidebars, ads, and footers down to the article body as clean Markdown.
After navigation, the server polls the DOM and runs trafilatura, returning as soon as two consecutive polls produce the same extraction. This means it returns within a few hundred milliseconds for typical pages — rather than waiting for analytics, ads, and other late-loading resources to finish — and gives slow SPAs and bot-challenge clearance time to settle without timing out prematurely.
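The stabilization loop can be sketched as follows (helper names are hypothetical; in the real server, `extract` would run trafilatura against the live DOM):

```python
# Content-stabilization sketch: re-extract on an interval and return as soon as
# two consecutive polls agree, capped by a total poll budget.
import time

def poll_until_stable(extract, poll_budget_ms: int = 5000,
                      poll_interval_ms: int = 250) -> str:
    """`extract` is a callable returning the current Markdown extraction."""
    deadline = time.monotonic() + poll_budget_ms / 1000
    previous = extract()
    while time.monotonic() < deadline:
        time.sleep(poll_interval_ms / 1000)
        current = extract()
        if current and current == previous:
            return current  # two identical consecutive extractions: page settled
        previous = current
    return previous  # budget exhausted: return the most recent extraction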
Headless mode is the default and works on standard pages. For sites with active bot detection (Cloudflare challenges and similar), pass headless=False to use a visible Chromium window — slower and visually intrusive, but clears most challenges that block headless mode.
For a typical article, expect roughly 80% fewer tokens than the raw HTML and roughly 90% fewer than a full accessibility-tree snapshot.
Requires Python 3.10+ and a one-time Chromium download (~300 MB).
# Run directly with uv (no install step)
uvx web-to-markdown-mcp
# Or install with pip
pip install web-to-markdown-mcp
# One-time browser download
patchright install chromium
Edit your config file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"web-to-markdown": {
"command": "uvx",
"args": ["web-to-markdown-mcp"]
}
}
}
Restart Claude Desktop.
Edit ~/.lmstudio/mcp.json (Developer tab → Edit mcp.json) — same JSON block as above. Then enable Allow calling servers from mcp.json in the Developer tab's Server Settings. The server appears in the Integrations tab of any new chat.
Same JSON block, in each client's MCP config location.
claude mcp add web-to-markdown -- uvx web-to-markdown-mcp
The server exposes a single tool:
fetch_url_as_markdown

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | required | The URL to fetch |
| wait_until | string | "domcontentloaded" | When navigation completes: "load", "domcontentloaded", "networkidle", "commit" |
| timeout_ms | int | 60000 | Navigation-step timeout in milliseconds |
| headless | bool | true | false uses a visible browser window — slower but clears more bot detection |
| poll_budget_ms | int | 5000 | Max time after navigation to wait for content stabilization |
| poll_interval_ms | int | 250 | How often to re-attempt extraction during polling |
Returns Markdown as a string, or a string beginning with "ERROR:" on expected failures (timeout, no extractable content, navigation error).
Example call (from any MCP client's tool-use UI):
fetch_url_as_markdown(url="https://example.com/long-article")
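Because expected failures come back as a string rather than an exception, a caller should branch on the "ERROR:" prefix. A minimal client-side sketch (the helper name is made up):

```python
# Client-side handling sketch: the tool signals expected failures as a string
# prefixed with "ERROR:", so callers check the prefix instead of catching
# exceptions from the tool call itself.
def handle_result(result: str) -> str:
    if result.startswith("ERROR:"):
        # e.g. a timeout, no extractable content, or a navigation error --
        # retry, log, or surface to the user as appropriate
        raise RuntimeError(result)
    return result  # clean Markdown, ready to feed to the model
```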
wait_until choice:
"domcontentloaded" (default) — returns when the DOM is built; content-stabilization polling handles the rest"load" — waits for all subresources (images, scripts, stylesheets); rarely needed since polling runs after this"networkidle" — waits for network to quiet; sometimes hangs on pages with persistent background connections"commit" — returns as soon as the response starts; rarely usefulWhen to bump poll_budget_ms: the 5-second default is fine for typical pages but may return a partial extraction on slow SPAs that render content over many seconds, and may time out before a bot-detection challenge clears in headed mode. For headed-mode fetches of bot-protected sites, 10000-15000 is a reasonable budget.
Bot detection:

- Pass headless=False to use a visible browser window, which clears most challenges. The cost is a Chromium window flashing on screen for a couple of seconds per fetch.
- With headless=False on bot-protected sites, the challenge can take 5-15 seconds to clear. Set poll_budget_ms to 10000-15000 for these cases — the 5000 default may return prematurely while the challenge is still resolving.
- headless=False fails on servers without a graphical environment (cloud VMs, containers, CI). Use a virtual display like Xvfb if you need headed mode in those environments.
- Slow SPAs may need a larger poll_budget_ms. The default returns the most-recent extraction at budget expiry, which on a still-rendering SPA may be partial.

Related tools:

- @playwright/mcp (Microsoft) — general-purpose interactive browser automation: navigate, click, fill forms, run JS, take accessibility snapshots. Use it when you need to interact with a page; use this server when you need to read a page.
- mcp-server-fetch and similar HTTP-based servers — faster and lighter, but get blocked by Cloudflare and don't render JS. Try those first for compliant sites; reach for this one when they fail.
- vadimfedenko/visit-website-reworked and npacker/web-tools — same idea inside LM Studio's plugin system. This server runs in any MCP client.

Roadmap:

- Accept: text/markdown fast path for Cloudflare's Markdown for Agents
- selector parameter to scope extraction
- channel="chrome" option to use installed Google Chrome instead of bundled Chromium (further stealth for the hardest sites)

Contributing: Issues and PRs welcome. For substantive changes, please open an issue first to discuss the approach.
MIT — see LICENSE.