Server data from the Official MCP Registry
Source-first URL clone, capture, rebuild, and fidelity verification tools.
Remote endpoints: streamable-http: https://webembedding-mcp.vercel.app/mcp
Valid MCP server (1 strong, 1 medium validity signals). No known CVEs in dependencies. Package registry verified. Imported from the Official MCP Registry. 1 finding(s) downgraded by scanner intelligence.
6 tools verified · Open access · 2 issues found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
This plugin requests these system permissions. Most are normal for its category.
Available as Local & Remote
This plugin can run on your machine or connect to a hosted endpoint; the choice is made during install.
From the project's GitHub README.
webEmbedding is a source-first website cloning engine for AI coding agents: it captures live pages with Playwright, replays network evidence from HAR artifacts, rebuilds only when direct reuse is blocked, and self-verifies the result.
It ships as a Skill + MCP server. Instead of asking a model to "clone this site" from a screenshot, it inspects the URL, chooses a reuse or rebuild route, captures DOM/runtime HTML/styles/assets/network traces, generates bounded frontend reconstruction artifacts, and checks the output with visual, DOM, computed-style, interaction, and responsive-breakpoint verification.
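The reuse-or-rebuild decision described above can be illustrated with a header check. This is a minimal sketch, assuming the route depends only on whether the page forbids cross-origin framing; the function names and the route labels `exact-reuse` and `bounded-rebuild` are hypothetical and are not taken from the webEmbedding source.

```python
from typing import Mapping


def iframe_embedding_blocked(headers: Mapping[str, str]) -> bool:
    """Return True when response headers forbid cross-origin framing.

    Illustrative heuristic only; not webEmbedding's actual routing code.
    """
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # X-Frame-Options: DENY and SAMEORIGIN both block third-party iframes.
    xfo = normalized.get("x-frame-options", "").strip().lower()
    if xfo in {"deny", "sameorigin"}:
        return True

    # CSP frame-ancestors restricts framing unless it allows every origin.
    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.strip().split()
        if parts and parts[0].lower() == "frame-ancestors":
            return "*" not in parts[1:]
    return False


def choose_route(headers: Mapping[str, str]) -> str:
    """Pick a hypothetical route label from the framing check."""
    return "bounded-rebuild" if iframe_embedding_blocked(headers) else "exact-reuse"
```

A real inspection step would weigh more evidence than two headers (auth walls, canvas-heavy rendering, and so on), so treat this as one signal among several.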

GitHub listing, social preview, and launch-copy recommendations are in docs/github-listing.md.
The current pipeline is strongest for static and semi-static web pages.
It is not a full backend or app-logic clone engine. Login-only screens, app-first or native-app-required services, captcha-heavy sites, maps, games, canvas/WebGL-heavy pages, real-time feeds, payments, booking flows, and private server behavior still need separate handling.
Operationally, the repo is now a production-candidate clone engine for URL-based capture and bounded reconstruction: jobs can be queued, network evidence can be replay-audited from HAR artifacts, authenticated dashboard runs can be driven from user-owned browser state, and local gates verify the route corpus, score checks, package contents, and CI wiring. The remaining hard boundary is server-side product behavior, not front-end evidence capture and reconstruction.
Recent local benchmark runs from this repo:
| URL | Path | Score |
|---|---|---|
| https://developer.mozilla.org/en-US/ | iframe-blocked bounded rebuild | root 94, visual 95, mobile 94, tablet 94, breakpoint average 94 |
| https://www.mozilla.org/ | bounded rebuild | root 94, visual 100 |
| https://www.python.org | harder bounded rebuild sample | root 90, visual 100 |
| https://www.example.com | exact reuse | ready yes |
These are generated by the local self-verify pipeline, not manually assigned ratings.
The reproducible commands and score thresholds are tracked in docs/benchmark-evidence.json.
Production readiness gates are tracked in docs/production-pipeline-gates.json.
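To make the benchmark rows easier to reason about, here is a small sketch that parses a score cell into numbers and checks it against a minimum. The `minimum` value is a placeholder: the real thresholds live in docs/benchmark-evidence.json, whose schema is not shown here, and both function names are hypothetical.

```python
def parse_scores(cell: str) -> dict[str, int]:
    """Parse a score cell such as 'root 94, visual 95' into a dict."""
    scores = {}
    for part in cell.split(","):
        # rsplit keeps multi-word labels like 'breakpoint average' intact.
        label, value = part.strip().rsplit(" ", 1)
        scores[label] = int(value)
    return scores


def passes(scores: dict[str, int], minimum: int) -> bool:
    """True when every parsed score meets the (hypothetical) minimum."""
    return all(value >= minimum for value in scores.values())


mdn = parse_scores("root 94, visual 95, mobile 94, tablet 94, breakpoint average 94")
breakpoint_mean = (mdn["mobile"] + mdn["tablet"]) / 2
print(breakpoint_mean)          # 94.0
print(passes(mdn, minimum=90))  # True
```

Non-numeric cells such as `ready yes` are out of scope for this parser and would need separate handling.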
X-Frame-Options and CSP-blocked pages by rebuilding from captured evidence
network-replay-limited, auth-session-missing, public-app-gate, and canvas-visual-fallback
replay_readiness before treating captured network evidence as replay-grade
network/manifest.json artifacts
storage_state_path or user_data_dir outside the repo
1440x1200
768x1024
390x844
The package uses playwright-core; it does not download a browser by itself.
Installing this project adds the source-first-clone plugin bundle, the exact-clone-intake skill, and the MCP server that exposes the URL inspection, capture, rebuild, and verification tools.
npm install -g web-embedding
web-embedding install
web-embedding doctor
Clone a public URL after installing:
web-embedding clone \
--url https://developer.mozilla.org/en-US/ \
--output-dir ./.tmp/mdn-clone \
--wait-seconds 2 \
--timeout-seconds 35 \
--breakpoints mobile tablet
If you already have an older local plugin installed, overwrite it with:
web-embedding install --force
web-embedding doctor
You can also run the installer without a global install:
npx web-embedding install
For MCP clients that can launch npm stdio servers:
{
"mcpServers": {
"source-first-clone": {
"command": "npx",
"args": ["-y", "web-embedding@latest", "mcp"]
}
}
}
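If the client stores its MCP servers in a JSON file, the entry above can be merged in programmatically instead of by hand. The sketch below assumes a config file with a top-level `mcpServers` object; the file location varies by client, so `config_path` is supplied by the caller, and `add_server` is a hypothetical helper, not part of the web-embedding CLI.

```python
import json
from pathlib import Path

# The stdio launch entry shown in the JSON snippet above.
SERVER_ENTRY = {
    "command": "npx",
    "args": ["-y", "web-embedding@latest", "mcp"],
}


def add_server(config_path: Path, name: str = "source-first-clone") -> dict:
    """Merge the stdio server entry into an MCP client config file."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # setdefault preserves any servers the client already has configured.
    config.setdefault("mcpServers", {})[name] = SERVER_ENTRY
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Merging rather than overwriting keeps other configured servers intact, which matters when several MCP servers share one client config.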
For local smoke testing:
npx web-embedding@latest mcp
The MCP Registry identity is io.github.jongko54/web-embedding; server.json and package.json#mcpName are kept in sync for registry ownership verification.
The public remote MCP intake endpoint for Apps SDK Developer Mode is:
https://webembedding-mcp.vercel.app/mcp
It exposes low-risk source-first routing tools such as URL inspection, embed candidate discovery, clone-mode classification, and embed snippet generation. Full browser capture, HAR replay, queues, bounded rebuilds, and one-pass clone execution remain local-first through the stdio MCP package.
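For readers who want to probe the remote endpoint without an MCP client, the following sketch builds (but does not send) a JSON-RPC `initialize` request in the shape the MCP streamable HTTP transport expects. The protocol version string and the `clientInfo` values are assumptions chosen for illustration; the server may negotiate a different version, and nothing here is taken from the webEmbedding source.

```python
import json
import urllib.request

ENDPOINT = "https://webembedding-mcp.vercel.app/mcp"


def build_initialize_request(endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a POST request carrying an MCP initialize message."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed; the server may answer with another
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.1"},  # hypothetical
        },
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients accept both plain JSON and SSE responses.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_initialize_request()
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the actual call; that step is left out so the sketch has no network side effects.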
Apps SDK review pages are hosted alongside the endpoint:
https://webembedding-mcp.vercel.app/privacy.html,
https://webembedding-mcp.vercel.app/terms.html, and
https://webembedding-mcp.vercel.app/submission.html.
This repository includes marketplace metadata for the two local agent surfaces:
.agents/plugins/marketplace.json points to ./bundle/source-first-clone.
.claude-plugin/marketplace.json points to the same bundle and the bundle includes .claude-plugin/plugin.json.
Claude Code users can add the marketplace from GitHub with:
/plugin marketplace add jongko54/webEmbedding
/plugin install source-first-clone@webembedding
AI auto-selection expectations and golden prompts live in docs/ai-distribution.md and evals/ai-selection/webembedding-golden-prompts.json.
curl -fsSL https://github.com/jongko54/webEmbedding/releases/latest/download/install.sh | bash
git clone https://github.com/jongko54/webEmbedding.git
cd webEmbedding
npm install
node ./bin/web-embedding.mjs install
node ./bin/web-embedding.mjs doctor
Useful for testing without touching your real agent home:
python3 python/web_embedding/installer.py install --target-home ./.tmp/home
python3 python/web_embedding/installer.py doctor --target-home ./.tmp/home
python3 python/web_embedding/installer.py uninstall --target-home ./.tmp/home
Telemetry is disabled by default. On an interactive first install, web-embedding install asks once and defaults to No. Non-interactive installs such as CI and curl | bash do not prompt. If you opt in, web-embedding sends a small anonymous command-completion event to a JSON POST endpoint you control. It does not send target URLs, local paths, captured HTML, screenshots, storage state, environment variables, API keys, or command output.
Enable it during install:
web-embedding install --telemetry --telemetry-endpoint https://your-collector.example/events
Or manage it later:
web-embedding telemetry enable --endpoint https://your-collector.example/events
web-embedding telemetry status
web-embedding telemetry disable
web-embedding telemetry reset-id
Each event contains an anonymous install id, package version, command name, success/failure status, OS/runtime basics, and coarse option flags such as breakpoint_count or install_source.
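As a rough picture of what such an event might look like on the wire, here is a sketch that serializes one to a JSON line. Every field name below is hypothetical, as are the `package_version` placeholder and the `install_source` value; docs/telemetry.md is the reference for the actual schema.

```python
import json
import platform
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class TelemetryEvent:
    """Hypothetical event shape; field names are illustrative only."""

    install_id: str
    package_version: str
    command: str
    success: bool
    os_name: str = field(default_factory=platform.system)
    runtime_version: str = field(default_factory=platform.python_version)
    options: dict = field(default_factory=dict)


event = TelemetryEvent(
    install_id=str(uuid.uuid4()),  # random id, not derived from the machine
    package_version="0.0.0",       # placeholder version
    command="clone",
    success=True,
    options={"breakpoint_count": 2, "install_source": "npm"},
)
line = json.dumps(asdict(event))
print(line)
```

Note what is absent: no target URL, path, or captured content appears in the payload, which mirrors the exclusions listed earlier.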
Environment controls:
WEB_EMBEDDING_TELEMETRY=1
WEB_EMBEDDING_NO_TELEMETRY=1
WEB_EMBEDDING_TELEMETRY_PROMPT=0
WEB_EMBEDDING_TELEMETRY_ENDPOINT=https://your-collector.example/events
WEB_EMBEDDING_TELEMETRY_LOG=./telemetry.jsonl
Run a local/self-hosted JSONL collector:
npm run telemetry:collector -- --host 127.0.0.1 --port 8765 --out ./telemetry.jsonl
WEB_EMBEDDING_TELEMETRY=1 \
WEB_EMBEDDING_TELEMETRY_ENDPOINT=http://127.0.0.1:8765/events \
web-embedding doctor
Summarize collected usage:
npm run telemetry:summarize -- ./telemetry.jsonl
The summary includes install and clone executions, total command executions, unique anonymous install IDs, command counts, and version counts. See docs/telemetry.md for collector and analyzer details.
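The counts named above are simple aggregations over the JSONL log. The sketch below shows one way to compute them, assuming each line carries `install_id`, `package_version`, and `command` keys; those key names are assumptions, and this is not the code behind `npm run telemetry:summarize`.

```python
import json
from collections import Counter


def summarize(lines):
    """Aggregate JSONL telemetry lines into command, version, and install counts."""
    commands, versions, install_ids = Counter(), Counter(), set()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines in the log
        event = json.loads(raw)
        commands[event["command"]] += 1
        versions[event["package_version"]] += 1
        install_ids.add(event["install_id"])
    return {
        "total": sum(commands.values()),
        "unique_installs": len(install_ids),
        "commands": dict(commands),
        "versions": dict(versions),
    }


sample = [
    '{"install_id": "a", "package_version": "1.0.0", "command": "install"}',
    '{"install_id": "a", "package_version": "1.0.0", "command": "clone"}',
    '{"install_id": "b", "package_version": "1.1.0", "command": "clone"}',
]
report = summarize(sample)
print(report["total"], report["unique_installs"])  # 3 2
```

Because `summarize` takes any iterable of lines, an open file handle on `./telemetry.jsonl` can be passed in directly.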
Inspect a URL and get route hints:
node ./bin/web-embedding.mjs inspect \
--url https://developer.mozilla.org/en-US/
Run the full clone workflow:
node ./bin/web-embedding.mjs clone \
--url https://developer.mozilla.org/en-US/ \
--output-dir ./.tmp/mdn-clone \
--wait-seconds 2 \
--timeout-seconds 35 \
--breakpoints mobile tablet
Run a lightweight quality benchmark:
python3 scripts/check_clone_quality_bench.py \
https://developer.mozilla.org/en-US/ \
--output-root ./.tmp/clone-quality-bench \
--wait-seconds 1 \
--timeout-seconds 35 \
--breakpoints mobile tablet
The benchmark prints compact rows for root, visual, and breakpoint scores. The full artifacts are written under the output directory.
node ./bin/web-embedding.mjs capabilities
node ./bin/web-embedding.mjs install
node ./bin/web-embedding.mjs doctor
node ./bin/web-embedding.mjs uninstall
node ./bin/web-embedding.mjs paths
node ./bin/web-embedding.mjs telemetry status
node ./bin/web-embedding.mjs inspect --url https://www.mozilla.org/
node ./bin/web-embedding.mjs capture \
--url https://www.mozilla.org/ \
--output-dir ./.tmp/capture-mozilla \
--breakpoints mobile tablet
node ./bin/web-embedding.mjs reproduce \
--url https://www.mozilla.org/ \
--output-dir ./.tmp/reproduce-mozilla \
--breakpoints mobile tablet
node ./bin/web-embedding.mjs clone \
--url https://www.mozilla.org/ \
--output-dir ./.tmp/clone-mozilla \
--breakpoints mobile tablet
node ./bin/web-embedding.mjs verify \
--reference-bundle ./.tmp/reference/capture.json \
--candidate-bundle ./.tmp/candidate/capture.json
A clone run can produce:
capture.json
pipeline-run-manifest.json
dom/snapshot.json
dom/runtime.html
styles/computed-summary.json
styles/css-analysis.json
network/manifest.json
network/har.json
network/har-like.json
network/replay-report.json
assets/inventory.json
interactions/states.json
interactions/trace.json
screenshots/runtime.png
session/storage-state.json
reproduction/plan.json
reproduction/evidence-limitations.json
reproduction/rebuild-prompt.txt
reproduction/rebuild/starter.html
reproduction/rebuild/starter.css
reproduction/rebuild/starter.tsx
reproduction/rebuild/next-app/
reproduction/self-verify/summary.json
reproduction/self-verify/renderers/*/verification.json
reproduction/self-verify/renderers/*/visual-qa.json
reproduction/self-verify/renderers/*/breakpoints/*-verification.json
Run the default small benchmark:
npm run check:clone-bench:local
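Before digging into benchmark scores, it can help to confirm that a clone run actually wrote the artifacts listed earlier. This sketch assumes those paths are relative to the `--output-dir` passed to `clone`, which the listing implies but does not state; `missing_artifacts` is a hypothetical helper and checks only a subset of the list.

```python
from pathlib import Path

# A subset of the artifact paths a clone run can produce; extend as needed.
EXPECTED = [
    "capture.json",
    "pipeline-run-manifest.json",
    "dom/snapshot.json",
    "network/manifest.json",
    "screenshots/runtime.png",
    "reproduction/self-verify/summary.json",
]


def missing_artifacts(output_dir: Path, expected=EXPECTED) -> list[str]:
    """Return the expected relative paths that do not exist under output_dir."""
    return [rel for rel in expected if not (output_dir / rel).exists()]


if __name__ == "__main__":
    # Example directory taken from the clone commands in this README.
    for rel in missing_artifacts(Path("./.tmp/mdn-clone")):
        print("missing:", rel)
```

Since the README says a run "can produce" these files, a missing entry is a prompt to inspect the run, not necessarily a failure.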
Run the universal route regression corpus and expectations gate:
npm run check:benchmark-routes:local
Run a lightweight clone score gate:
npm run check:clone-score-gate:local
Validate the committed benchmark evidence manifest:
npm run check:benchmark-evidence:local
Validate production pipeline gates:
npm run check:production-readiness:local
Run the operational smokes individually:
npm run check:job-queue:local
npm run check:har-replay:local
npm run check:authenticated-corpus:local
Classify failure/action codes from a route report:
npm run classify:pipeline-failures -- --report ./.tmp/universal-route-benchmark/universal-route-report.json
Find low-scoring persisted benchmark artifacts:
npm run summarize:benchmark-scores -- --root ./.tmp --min-score 60 --max-score 70
Run specific URLs:
python3 scripts/check_clone_quality_bench.py \
https://www.example.com \
https://www.mozilla.org/ \
--no-breakpoints
Run a responsive benchmark:
python3 scripts/check_clone_quality_bench.py \
https://developer.mozilla.org/en-US/ \
--breakpoints mobile tablet
python3 -m py_compile \
bundle/source-first-clone/mcp/source_first_clone/*.py \
scripts/check_integration_smoke.py \
scripts/check_clone_quality_bench.py
npm run check:integration:local
git diff --check
bundle/source-first-clone
Installed plugin bundle, MCP server, and exact-clone intake skill.
bundle/source-first-clone/mcp/source_first_clone
Capture, planning, rebuild, repair, and verification engine.
bin/web-embedding.mjs
Node CLI wrapper.
python/web_embedding/installer.py
Shared installer and command dispatcher.
scripts/check_clone_quality_bench.py
URL clone quality benchmark helper.
scripts/benchmark_routes.py
Universal route/capture-depth regression benchmark helper.
scripts/check_benchmark_report.py
Benchmark expectation validator for exact, minimum, and contains-style checks.
scripts/check_benchmark_evidence.py
Benchmark evidence manifest validator.
scripts/check_job_queue_smoke.py
Filesystem async clone job queue smoke test.
scripts/check_har_replay_smoke.py
Deterministic HAR replay engine smoke test.
scripts/benchmark_authenticated_corpus.py
User-provided authenticated dashboard corpus runner.
scripts/summarize_benchmark_scores.py
Utility for finding low or high scoring persisted benchmark artifacts under an output root.
scripts/classify_pipeline_failures.py
Operational failure/action taxonomy summarizer for reports and capture artifacts.
scripts/check_production_readiness.py
Production readiness gate validator for corpus, failure taxonomy, CI wiring, and policy docs.
scripts/check_integration_smoke.py
Release, install, and URL-only clone smoke test.
scripts/release_bundle.py
Release artifact builder.
docs/
Architecture notes and universal benchmark documentation.
The strongest claim for this project is:
A source-first website cloning engine that combines Playwright capture, HAR replay, MCP tools, and self-verification to rebuild iframe-blocked public pages with reproducible visual, DOM, style, interaction, and responsive scores.
Avoid treating the output as a legal or ownership bypass. The engine can reconstruct public page structure, but permission, licensing, and acceptable use still matter.
MIT