Server data from the Official MCP Registry
Safe, self-hosted web grounding for AI agents and crawlers over a stealth browser
Valid MCP server (1 strong, 1 medium validity signals). No known CVEs in dependencies. Package registry verified. Imported from the Official MCP Registry.
20 files analyzed · 1 issue found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
Set these up before or after installing:
Environment variable: CDP_URL
Environment variable: GROUNDHOG_BLOCK_PRIVATE_IPS
Environment variable: GROUNDHOG_MIN_DELAY_MS
Environment variable: GROUNDHOG_MAX_TOKENS
Environment variable: GROUNDHOG_USER_AGENT
Environment variable: PROXY
Environment variable: GROUNDHOG_AUTO_START_BROWSER
Environment variable: GROUNDHOG_COMPOSE_FILE
Environment variable: WINDOW_SIZE
Environment variable: XVFB_WHD
Add this to your MCP configuration file:
{
  "mcpServers": {
    "io-github-dmytrome-groundhog-mcp": {
      "args": [
        "groundhog-mcp"
      ],
      "command": "uvx"
    }
  }
}
From the project's GitHub README.
Safe, self-hosted web grounding for AI agents and crawlers. Groundhog is an MCP server that fetches live web pages through a real, stealth-patched Chrome (over CDP) and returns clean Markdown with provenance — without the SSRF holes of plain fetchers and without getting blocked like plain HTTP clients.
agent / crawler ──MCP──▶ Groundhog (read_url) ──CDP──▶ stealth Chrome ──▶ the web
Prerequisite: the stealth browser must be running. The MCP server is a thin client that drives Chrome over CDP, so start the browser first. If it isn't reachable, read_url returns a plain-language message on how to start it, and the status tool reports reachability. Set GROUNDHOG_AUTO_START_BROWSER=true to have Groundhog run docker compose up -d for you (requires Docker).
docker compose up --build -d
curl -s http://localhost:9222/json/version # CDP is live
Claude Desktop / Cursor / Windsurf (claude_desktop_config.json or equivalent):
{
  "mcpServers": {
    "groundhog": {
      "command": "uvx",
      "args": ["groundhog-mcp"],
      "env": { "CDP_URL": "http://127.0.0.1:9222" }
    }
  }
}
uvx fetches groundhog-mcp from PyPI on first run. To run from source instead:
cd mcp && uv sync && uv run groundhog-mcp
read_url(url, format="markdown", max_tokens=None)
Fetches a page and returns clean content plus provenance.
| Key | Meaning |
|---|---|
| markdown | Extracted content (article-first, falls back to full text); format may be markdown or text |
| title | Page title |
| url | The URL you asked for |
| final_url | The URL after redirects (re-checked against the SSRF guard) |
| fetched_at | UTC ISO-8601 timestamp |
| truncated | Whether the content was cut to fit the token budget |
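To make the shape of the result concrete, here is a minimal sketch of how a fetched page could be cut to a token budget and wrapped with the provenance keys from the table. `ReadResult`, `build_result`, and the `CHARS_PER_TOKEN` heuristic are hypothetical names invented for illustration; the README does not say how Groundhog counts tokens, so the four-characters-per-token estimate is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumption: a rough characters-per-token estimate, not Groundhog's real tokenizer.
CHARS_PER_TOKEN = 4


@dataclass
class ReadResult:
    """Hypothetical container mirroring the keys in the table above."""
    markdown: str
    title: str
    url: str
    final_url: str
    fetched_at: str
    truncated: bool


def build_result(content, title, url, final_url, max_tokens=20000):
    """Cut content to an approximate token budget and attach provenance."""
    limit = max_tokens * CHARS_PER_TOKEN
    truncated = len(content) > limit
    return ReadResult(
        markdown=content[:limit] if truncated else content,
        title=title,
        url=url,
        final_url=final_url,
        fetched_at=datetime.now(timezone.utc).isoformat(),  # UTC ISO-8601
        truncated=truncated,
    )
```

A caller can check `truncated` before relying on the tail of a long page, and compare `url` with `final_url` to see whether a redirect happened.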
status()
Reports whether Groundhog can reach the stealth browser. Returns browser_reachable, cdp_url, and a hint with remediation steps when it isn't reachable.
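A reachability check of this kind can be sketched with the standard library alone, probing the same `/json/version` endpoint that the `curl` command earlier uses. This is an illustrative approximation, not Groundhog's implementation: the function body, the timeout, the hint wording, and returning `None` for `hint` when the browser is reachable are all assumptions.

```python
import http.client
import json
import urllib.request


def status(cdp_url="http://127.0.0.1:9222", timeout=2.0):
    """Probe the CDP HTTP endpoint and report whether the browser answers."""
    try:
        with urllib.request.urlopen(f"{cdp_url}/json/version", timeout=timeout) as resp:
            json.load(resp)  # a parseable JSON body means Chrome answered
        return {"browser_reachable": True, "cdp_url": cdp_url, "hint": None}
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and timeouts are OSError subclasses; bad JSON is a ValueError.
        return {
            "browser_reachable": False,
            "cdp_url": cdp_url,
            "hint": "Start the browser first, e.g. `docker compose up --build -d`.",
        }
```

Calling `status()` before the first `read_url` gives an agent a cheap way to surface the remediation hint instead of a failed fetch.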
MCP server (mcp/):
| Env var | Default | Purpose |
|---|---|---|
| CDP_URL | http://127.0.0.1:9222 | CDP endpoint of the stealth browser |
| GROUNDHOG_BLOCK_PRIVATE_IPS | true | Enforce the SSRF guard (resolve + block private ranges) |
| GROUNDHOG_MIN_DELAY_MS | 5000 | Minimum delay between requests to the same domain |
| GROUNDHOG_MAX_TOKENS | 20000 | Token budget before truncation |
| GROUNDHOG_USER_AGENT | Chrome UA | User-Agent for the browser context |
| PROXY | (none) | Optional upstream proxy for the browser |
| GROUNDHOG_AUTO_START_BROWSER | false | If true, run docker compose up -d when the browser isn't reachable (requires Docker) |
| GROUNDHOG_COMPOSE_FILE | (none) | Compose file for auto-start (defaults to docker compose in the current directory) |
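The effect of GROUNDHOG_MIN_DELAY_MS can be pictured as a per-host throttle: each request to a host waits until the configured delay has passed since the previous request to that host. The sketch below is one plausible way to do that. `DomainThrottle` and its injectable `clock` and `sleep` parameters are hypothetical, added so the logic can be exercised without real waiting; Groundhog's own limiter may work differently.

```python
import os
import time
from urllib.parse import urlsplit


class DomainThrottle:
    """Enforce a minimum delay between requests to the same host (illustrative)."""

    def __init__(self, min_delay_ms=None, clock=time.monotonic, sleep=time.sleep):
        if min_delay_ms is None:
            # Default taken from the table above.
            min_delay_ms = int(os.environ.get("GROUNDHOG_MIN_DELAY_MS", "5000"))
        self.min_delay = min_delay_ms / 1000.0
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # host -> time at which the most recent request was released

    def wait(self, url):
        """Sleep if the host was hit too recently; return the seconds waited."""
        host = urlsplit(url).hostname or ""
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_delay - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last[host] = now + delay
        return delay
```

Keying on the hostname means two different sites never delay each other, while repeated fetches of one site are spaced out.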
Browser container:
| Env var | Default | Purpose |
|---|---|---|
| WINDOW_SIZE | 1920,1080 | Initial Chrome window size |
| XVFB_WHD | 1920x1080x24 | Virtual display geometry |
169.254.169.254), reserved, multicast, unspecified, CGNAT 100.64.0.0/10, and IPv4-mapped IPv6, and re-checks the URL after redirects. Only http/https, no credentials in URLs. Read-only, per-domain rate limiting.

A minimal Docker container running headless Chrome with a remote CDP endpoint. Any CDP-speaking client (Puppeteer, Playwright, Selenium, chromedp, raw DevTools) can drive it; Groundhog is one such client.
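For readers who want to see what an SSRF guard of the kind listed earlier (blocked address ranges, http/https only, no credentials in URLs) looks like in practice, here is a minimal sketch built on Python's `ipaddress` module. `ip_is_blocked` and `check_url` are hypothetical names and the `resolve` parameter exists only so the check can be tested without DNS; this is not Groundhog's actual code.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

CGNAT = ipaddress.ip_network("100.64.0.0/10")


def ip_is_blocked(ip_text):
    """Return True if the address falls in a range the guard should refuse."""
    ip = ipaddress.ip_address(ip_text)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so the IPv4 rules apply.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local  # covers the metadata address 169.254.169.254
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
        or (ip.version == 4 and ip in CGNAT)
    )


def check_url(url, resolve=socket.getaddrinfo):
    """Raise ValueError unless the URL is http(s), credential-free, and public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("only http/https URLs are allowed")
    if parts.username is not None or parts.password is not None:
        raise ValueError("credentials in URLs are not allowed")
    if not parts.hostname:
        raise ValueError("URL has no host")
    # Check every address the host resolves to, not just the first one.
    for info in resolve(parts.hostname, parts.port or 0):
        addr = info[4][0]
        if ip_is_blocked(addr):
            raise ValueError(f"{parts.hostname} resolves to blocked address {addr}")
```

To mirror the redirect re-check, a caller would run `check_url` on the requested URL before navigating and again on the final URL once redirects have settled.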
--headless=new: modern headless mode (required to load extensions).
--disable-blink-features=AutomationControlled: navigator.webdriver reads false.
stealth_ext/stealth.js: injected at document_start, restores navigator.deviceMemory and aligns Notification permission with the Permissions API. Deliberately small: modern Chrome already clears most signals.

You must set a User-Agent. The container does not rewrite the UA: out of the box Chrome reports HeadlessChrome/<version>. Groundhog sets a realistic UA and viewport for you; other clients must do the same (see examples/).
Measured against a freshly built container (Chrome 149) driven by a client that sets a realistic UA + viewport:
| Detector | Result |
|---|---|
| bot.sannysoft.com | 31 / 31 checks pass, 0 fail |
| areyouheadless | "You are not Chrome headless" |
The live suite in tests/ asserts these automatically.
| Client | Path |
|---|---|
| Puppeteer (Node) | examples/puppeteer |
| Playwright (Node) | examples/playwright-node |
| Playwright (Python) | examples/playwright-python |
| Selenium (Python) | examples/selenium-python |
| chromedp (Go) | examples/go-chromedp |
| Raw CDP (Python) | examples/python-raw-cdp |
See examples/OTHER_TOOLS.md for crawl4ai, Scrapy +
Playwright, go-rod, Crawlee, and nodriver pointers.
The CDP endpoint is unauthenticated — anyone who can reach the port has full control
of the browser. Bind it to localhost or a trusted private network; never expose it to the
public internet. --no-sandbox is used because Chrome's sandbox does not work in an
unprivileged container; keep the container isolated. To report a vulnerability, see
SECURITY.md.
Best-effort, not a guarantee. It defeats common open-source detectors and lets cheap proxies work on many mid-tier targets, but it does not beat sophisticated commercial anti-bot systems that gate on IP reputation, TLS/HTTP2 fingerprints, and behavioral analysis. Use it for legitimate, authorized automation and testing.