Is WebEmbedding free?

Yes, WebEmbedding is free to use.

How do I install WebEmbedding?

WebEmbedding supports both local and remote installation. For local use, install it via npm (npx) and add the configuration to your AI app. For remote access, add the server URL to your MCP configuration.

What AI apps work with WebEmbedding?

WebEmbedding uses the Model Context Protocol (MCP) and works with any MCP-compatible AI app, including Claude, ChatGPT / Codex, Gemini, Copilot, Cursor, and more.

WebEmbedding MCP Server

by Jongko54

AI & ML · Low Risk · 10.0 · MCP Registry · Local · Remote

Free

Server data from the Official MCP Registry

Source-first URL clone, capture, rebuild, and fidelity verification tools.

About

Remote endpoints: streamable-http: https://webembedding-mcp.vercel.app/mcp

Security Report

10.0 · Low Risk

Valid MCP server (1 strong, 1 medium validity signals). No known CVEs in dependencies. Package registry verified. Imported from the Official MCP Registry. 1 finding(s) downgraded by scanner intelligence.

6 tools verified · Open access · 2 issues found

Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.

Permissions Required

This plugin requests these system permissions. Most are normal for its category.

file_system

Check that this permission is expected for this type of plugin.

HTTP Network Access

Connects to external APIs or services over the internet.

How to Install & Connect

Available as Local & Remote

This plugin can run on your machine or connect to a hosted endpoint.

Documentation

From the project's GitHub README.

webEmbedding

webEmbedding is a source-first website cloning engine for AI coding agents: it captures live pages with Playwright, replays network evidence from HAR artifacts, rebuilds only when direct reuse is blocked, and self-verifies the result.

It ships as a Skill + MCP server. Instead of asking a model to "clone this site" from a screenshot, it inspects the URL, chooses a reuse or rebuild route, captures DOM/runtime HTML/styles/assets/network traces, generates bounded frontend reconstruction artifacts, and checks the output with visual, DOM, computed-style, interaction, and responsive-breakpoint verification.
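
The reuse-or-rebuild decision described above can be illustrated with a small sketch. It assumes frameability is judged only from the `X-Frame-Options` and CSP `frame-ancestors` response headers. The function name `choose_route` and the route labels are hypothetical and are not taken from the project's code.

```python
from typing import Mapping


def choose_route(headers: Mapping[str, str]) -> str:
    """Return 'exact-reuse' if headers allow framing, else 'bounded-rebuild'.

    Hypothetical helper: the labels and the decision rule are assumptions
    for illustration, not webEmbedding's actual routing logic.
    """
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value.lower() for name, value in headers.items()}

    # X-Frame-Options: DENY or SAMEORIGIN blocks cross-origin framing.
    xfo = normalized.get("x-frame-options", "").strip()
    if xfo in ("deny", "sameorigin"):
        return "bounded-rebuild"

    # CSP frame-ancestors restricts which origins may embed the page.
    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.split()
        if parts and parts[0] == "frame-ancestors":
            # Anything other than a bare wildcard is treated as blocking here.
            if parts[1:] != ["*"]:
                return "bounded-rebuild"

    return "exact-reuse"


print(choose_route({"X-Frame-Options": "DENY"}))
print(choose_route({"Content-Type": "text/html"}))
```

A real router would also need to consider the preview, export, or source routes that the README mentions; this sketch covers only the header check.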

Figure: webEmbedding Skill and MCP workflow

GitHub listing, social preview, and launch-copy recommendations are in docs/github-listing.md.

Current Status

The current pipeline is strongest for static and semi-static web pages:

- company, brand, marketing, and documentation pages
- public landing pages
- iframe-blocked pages that need capture-based reconstruction
- responsive page snapshots across desktop, tablet, and mobile

It is not a full backend or app-logic clone engine. Login-only screens, app-first or native-app-required services, captcha-heavy sites, maps, games, canvas/WebGL-heavy pages, real-time feeds, payments, booking flows, and private server behavior still need separate handling.

Operationally, the repo is now a production-candidate clone engine for URL-based capture and bounded reconstruction: jobs can be queued, network evidence can be replay-audited from HAR artifacts, authenticated dashboard runs can be driven from user-owned browser state, and local gates verify the route corpus, score checks, package contents, and CI wiring. The remaining hard boundary is server-side product behavior, not front-end evidence capture and reconstruction.
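
To make the idea of replaying network evidence from a HAR artifact concrete, here is a minimal sketch that indexes a standard HAR document by method and URL. It relies only on the standard HAR layout (`log.entries[].request` and `log.entries[].response`). The function names and the first-match-wins rule are assumptions for illustration, not the project's replay engine.

```python
from typing import Dict, Optional, Tuple

ReplayKey = Tuple[str, str]


def build_replay_index(har: dict) -> Dict[ReplayKey, dict]:
    """Index HAR entries by (method, url). Hypothetical helper."""
    index: Dict[ReplayKey, dict] = {}
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        response = entry.get("response", {})
        key = (request.get("method", "GET").upper(), request.get("url", ""))
        # Keep the first recorded response so repeated replays are deterministic.
        index.setdefault(key, {
            "status": response.get("status", 0),
            "body": response.get("content", {}).get("text", ""),
        })
    return index


def replay(index: Dict[ReplayKey, dict], method: str, url: str) -> Optional[dict]:
    """Return the recorded response, or None when no evidence was captured."""
    return index.get((method.upper(), url))


sample_har = {
    "log": {
        "entries": [
            {
                "request": {"method": "GET", "url": "https://www.example.com/"},
                "response": {"status": 200, "content": {"text": "<html></html>"}},
            }
        ]
    }
}

index = build_replay_index(sample_har)
print(replay(index, "get", "https://www.example.com/"))
print(replay(index, "GET", "https://www.example.com/missing"))
```

A lookup that returns `None` corresponds to a request with no captured evidence, which is the kind of gap a replay audit would need to report.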

Measured Checkpoints

Recent local benchmark runs from this repo:

| URL | Path | Score |
| --- | --- | --- |
| `https://developer.mozilla.org/en-US/` | iframe-blocked bounded rebuild | root `94`, visual `95`, mobile `94`, tablet `94`, breakpoint average `94` |
| `https://www.mozilla.org/` | bounded rebuild | root `94`, visual `100` |
| `https://www.python.org` | harder bounded rebuild sample | root `90`, visual `100` |
| `https://www.example.com` | exact reuse | ready `yes` |

These are generated by the local self-verify pipeline, not manually assigned ratings. The reproducible commands and score thresholds are tracked in docs/benchmark-evidence.json. Production readiness gates are tracked in docs/production-pipeline-gates.json.

Core Features

Source-first routing:
- direct iframe or embed reuse when it is safe and frameable
- original preview, export, remix, or source routes when available
- bounded rebuild only when exact reuse is unavailable
Live browser capture:
- DOM snapshot
- runtime HTML
- full-page screenshot
- computed style summaries
- CSS analysis
- asset inventory
- HAR-like network metadata
- interaction states and replay traces
- storage state export for session-aware flows
Blocked-site rebuild:
- handles X-Frame-Options and CSP-blocked pages by rebuilding from captured evidence
- generates reusable frontend reconstruction artifacts from captured page structure
- preserves custom tags, shadow-root host structure, and semantic document structure where captured
Evidence limitation reporting:
- separates directly captured artifacts from inferred or missing evidence in reproduction results and prompts
- marks app-gated, auth-gated, and native-app-led surfaces as bounded evidence, with recommendations for user screenshots or authenticated session capture
Operational failure classification:
- reports typed pipeline action codes such as network-replay-limited, auth-session-missing, public-app-gate, and canvas-visual-fallback
- exposes HAR/network replay_readiness before treating captured network evidence as replay-grade
Production pipeline helpers:
- filesystem-backed async clone job queue with durable JSON records, worker locks, retry scheduling, cancellation, and manifest annotation
- deterministic HAR replay engine for standard HAR, near-HAR, and captured network/manifest.json artifacts
- authenticated dashboard live corpus runner that accepts user-provided storage_state_path or user_data_dir outside the repo
Self-verification:
- screenshot similarity
- DOM snapshot similarity
- computed-style similarity
- hover/focus/click interaction state parity
- interaction trace parity
- desktop/mobile/tablet breakpoint reports
Responsive benchmark support:
- primary desktop viewport: 1440x1200
- tablet profile: 768x1024
- mobile profile: 390x844
Repair loop:
- bounded self-repair can run when the first scaffold misses the readiness threshold
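
As an illustration of the DOM snapshot similarity check listed under self-verification, the sketch below compares the order of opening tags in two HTML documents. The metric (a `difflib` sequence ratio scaled to 0-100) and the names `tag_sequence` and `dom_similarity` are assumptions chosen for brevity; the project's actual scoring is not documented here and may differ.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from typing import List


class TagCollector(HTMLParser):
    """Collect opening tag names in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html: str) -> List[str]:
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def dom_similarity(reference_html: str, candidate_html: str) -> float:
    """Return a 0-100 score based on how well the tag sequences align."""
    matcher = SequenceMatcher(None, tag_sequence(reference_html), tag_sequence(candidate_html))
    return round(matcher.ratio() * 100, 1)


reference = "<main><h1>Title</h1><p>one</p><p>two</p></main>"
candidate = "<main><h1>Title</h1><p>one</p></main>"
print(dom_similarity(reference, reference))
print(dom_similarity(reference, candidate))
```

A structural score like this ignores text, attributes, and styling, which is one reason to combine it with separate screenshot and computed-style comparisons.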

Install

Requirements

- Node.js 18 or newer
- Python 3.9 or newer
- Chrome or Chromium available locally for Playwright runtime capture

The package uses playwright-core; it does not download a browser by itself.
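
Because no browser is downloaded automatically, it can help to confirm that one is on `PATH` before running a capture. The following is a rough sketch using only the standard library. The executable names in `CANDIDATE_NAMES` are an assumed list of common names, and `find_browser` is a hypothetical helper; `web-embedding doctor` remains the supported check.

```python
import shutil
from typing import Callable, Iterable, Optional

# Common executable names; an assumed list, not the one the project uses.
CANDIDATE_NAMES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
    "chrome",
)


def find_browser(
    names: Iterable[str] = CANDIDATE_NAMES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first matching executable path on PATH, or None."""
    for name in names:
        path = which(name)
        if path:
            return path
    return None


print(find_browser() or "no Chrome/Chromium executable found on PATH")
```

On macOS and Windows the browser is often installed outside `PATH`, so a `None` result here does not prove that no browser is available.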

Installing this project adds the source-first-clone plugin bundle, the exact-clone-intake skill, and the MCP server that exposes the URL inspection, capture, rebuild, and verification tools.

Install From npm

npm install -g web-embedding
web-embedding install
web-embedding doctor

Clone a public URL after installing:

web-embedding clone \
  --url https://developer.mozilla.org/en-US/ \
  --output-dir ./.tmp/mdn-clone \
  --wait-seconds 2 \
  --timeout-seconds 35 \
  --breakpoints mobile tablet

If you already have an older local plugin installed, overwrite it with:

web-embedding install --force
web-embedding doctor

You can also run the installer without a global install:

npx web-embedding install

Use As An MCP Server

For MCP clients that can launch npm stdio servers:

{
  "mcpServers": {
    "source-first-clone": {
      "command": "npx",
      "args": ["-y", "web-embedding@latest", "mcp"]
    }
  }
}
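
If the client reads its MCP servers from a JSON file, the entry above can be merged in without overwriting servers that are already configured. This is a minimal sketch: the config file path differs per client and is a placeholder here, and `add_mcp_server` is a hypothetical helper, not part of the package.

```python
import json
import tempfile
from pathlib import Path

SERVER_NAME = "source-first-clone"
SERVER_ENTRY = {"command": "npx", "args": ["-y", "web-embedding@latest", "mcp"]}


def add_mcp_server(config_path: Path) -> dict:
    """Merge the server entry into an MCP client config file and return the result."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    # Preserve any servers the user has already configured.
    config.setdefault("mcpServers", {})[SERVER_NAME] = SERVER_ENTRY
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file; a real client config lives elsewhere.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mcp.json"
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "node"}}}))
    merged = add_mcp_server(path)

print(sorted(merged["mcpServers"]))
```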

For local smoke testing:

npx web-embedding@latest mcp

The MCP Registry identity is io.github.jongko54/web-embedding; server.json and package.json#mcpName are kept in sync for registry ownership verification.

Hosted Apps SDK Intake Endpoint

The public remote MCP intake endpoint for Apps SDK Developer Mode is:

https://webembedding-mcp.vercel.app/mcp

It exposes low-risk source-first routing tools such as URL inspection, embed candidate discovery, clone-mode classification, and embed snippet generation. Full browser capture, HAR replay, queues, bounded rebuilds, and one-pass clone execution remain local-first through the stdio MCP package.
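
For readers curious what a client sends to a streamable-http endpoint, the sketch below builds, but does not send, a JSON-RPC `initialize` request of the kind defined by the MCP specification. The protocol version string and the `clientInfo` values are assumptions, and whether this endpoint accepts them has not been verified; an MCP client library would normally handle this step.

```python
import json
import urllib.request

ENDPOINT = "https://webembedding-mcp.vercel.app/mcp"


def build_initialize_request(endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build an MCP initialize request without sending it."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumed protocol revision; the server may negotiate another.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Placeholder client identity for illustration only.
            "clientInfo": {"name": "example-client", "version": "0.0.1"},
        },
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_initialize_request()
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the network call; that step is left out so the sketch runs offline.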

Apps SDK review pages are hosted alongside the endpoint: https://webembedding-mcp.vercel.app/privacy.html, https://webembedding-mcp.vercel.app/terms.html, and https://webembedding-mcp.vercel.app/submission.html.

Agent Marketplaces

This repository includes marketplace metadata for the two local agent surfaces:

Codex: .agents/plugins/marketplace.json points to ./bundle/source-first-clone.
Claude Code: .claude-plugin/marketplace.json points to the same bundle and the bundle includes .claude-plugin/plugin.json.

Claude Code users can add the marketplace from GitHub with:

/plugin marketplace add jongko54/webEmbedding
/plugin install source-first-clone@webembedding

AI auto-selection expectations and golden prompts live in docs/ai-distribution.md and evals/ai-selection/webembedding-golden-prompts.json.

Install From Release

curl -fsSL https://github.com/jongko54/webEmbedding/releases/latest/download/install.sh | bash

Install From This Checkout

git clone https://github.com/jongko54/webEmbedding.git
cd webEmbedding
npm install
node ./bin/web-embedding.mjs install
node ./bin/web-embedding.mjs doctor

Install Into A Temporary Home

Useful for testing without touching your real agent home:

python3 python/web_embedding/installer.py install --target-home ./.tmp/home
python3 python/web_embedding/installer.py doctor --target-home ./.tmp/home
python3 python/web_embedding/installer.py uninstall --target-home ./.tmp/home

Opt-in Telemetry

Telemetry is disabled by default. On an interactive first install, web-embedding install asks once and defaults to No. Non-interactive installs such as CI and curl | bash do not prompt. If you opt in, web-embedding sends a small anonymous command-completion event to a JSON POST endpoint you control. It does not send target URLs, local paths, captured HTML, screenshots, storage state, environment variables, API keys, or command output.

Enable it during install:

web-embedding install --telemetry --telemetry-endpoint https://your-collector.example/events

Or manage it later:

web-embedding telemetry enable --endpoint https://your-collector.example/events
web-embedding telemetry status
web-embedding telemetry disable
web-embedding telemetry reset-id

Each event contains an anonymous install id, package version, command name, success/failure status, OS/runtime basics, and coarse option flags such as breakpoint_count or install_source.

Environment controls:

WEB_EMBEDDING_TELEMETRY=1
WEB_EMBEDDING_NO_TELEMETRY=1
WEB_EMBEDDING_TELEMETRY_PROMPT=0
WEB_EMBEDDING_TELEMETRY_ENDPOINT=https://your-collector.example/events
WEB_EMBEDDING_TELEMETRY_LOG=./telemetry.jsonl

Run a local/self-hosted JSONL collector:

npm run telemetry:collector -- --host 127.0.0.1 --port 8765 --out ./telemetry.jsonl
WEB_EMBEDDING_TELEMETRY=1 \
WEB_EMBEDDING_TELEMETRY_ENDPOINT=http://127.0.0.1:8765/events \
web-embedding doctor

Summarize collected usage:

npm run telemetry:summarize -- ./telemetry.jsonl

The summary includes install and clone executions, total command executions, unique anonymous install IDs, command counts, and version counts. See docs/telemetry.md for collector and analyzer details.
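
The kind of summary described above can be approximated in a few lines when a quick look at a JSONL file is enough. The event field names used here (`install_id`, `command`, `version`) are assumptions for illustration; check docs/telemetry.md for the real schema, and prefer `npm run telemetry:summarize` for actual reporting.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_events(lines: Iterable[str]) -> dict:
    """Count commands, versions, and unique install ids in JSONL lines."""
    commands: Counter = Counter()
    versions: Counter = Counter()
    install_ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Tolerate blank lines in the log.
        event = json.loads(line)
        commands[event.get("command", "unknown")] += 1
        versions[event.get("version", "unknown")] += 1
        if "install_id" in event:
            install_ids.add(event["install_id"])
    return {
        "total": sum(commands.values()),
        "unique_installs": len(install_ids),
        "commands": dict(commands),
        "versions": dict(versions),
    }


# Made-up sample events, not real telemetry.
sample_lines = [
    '{"install_id": "a1", "command": "install", "version": "1.0.0"}',
    '{"install_id": "a1", "command": "clone", "version": "1.0.0"}',
    '{"install_id": "b2", "command": "clone", "version": "1.0.1"}',
    "",
]
summary = summarize_events(sample_lines)
print(summary)
```

To run it against a collected log, pass an open file object, for example `summarize_events(open("telemetry.jsonl"))`.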

Quick Start

Inspect a URL and get route hints:

node ./bin/web-embedding.mjs inspect \
  --url https://developer.mozilla.org/en-US/

Run the full clone workflow:

node ./bin/web-embedding.mjs clone \
  --url https://developer.mozilla.org/en-US/ \
  --output-dir ./.tmp/mdn-clone \
  --wait-seconds 2 \
  --timeout-seconds 35 \
  --breakpoints mobile tablet

Run a lightweight quality benchmark:

python3 scripts/check_clone_quality_bench.py \
  https://developer.mozilla.org/en-US/ \
  --output-root ./.tmp/clone-quality-bench \
  --wait-seconds 1 \
  --timeout-seconds 35 \
  --breakpoints mobile tablet

The benchmark prints compact rows for root, visual, and breakpoint scores. The full artifacts are written under the output directory.

CLI Commands

node ./bin/web-embedding.mjs capabilities
node ./bin/web-embedding.mjs install
node ./bin/web-embedding.mjs doctor
node ./bin/web-embedding.mjs uninstall
node ./bin/web-embedding.mjs paths
node ./bin/web-embedding.mjs telemetry status

node ./bin/web-embedding.mjs inspect --url https://www.mozilla.org/

node ./bin/web-embedding.mjs capture \
  --url https://www.mozilla.org/ \
  --output-dir ./.tmp/capture-mozilla \
  --breakpoints mobile tablet

node ./bin/web-embedding.mjs reproduce \
  --url https://www.mozilla.org/ \
  --output-dir ./.tmp/reproduce-mozilla \
  --breakpoints mobile tablet

node ./bin/web-embedding.mjs clone \
  --url https://www.mozilla.org/ \
  --output-dir ./.tmp/clone-mozilla \
  --breakpoints mobile tablet

node ./bin/web-embedding.mjs verify \
  --reference-bundle ./.tmp/reference/capture.json \
  --candidate-bundle ./.tmp/candidate/capture.json

Output Artifacts

A clone run can produce:

capture.json
pipeline-run-manifest.json
dom/snapshot.json
dom/runtime.html
styles/computed-summary.json
styles/css-analysis.json
network/manifest.json
network/har.json
network/har-like.json
network/replay-report.json
assets/inventory.json
interactions/states.json
interactions/trace.json
screenshots/runtime.png
session/storage-state.json
reproduction/plan.json
reproduction/evidence-limitations.json
reproduction/rebuild-prompt.txt
reproduction/rebuild/starter.html
reproduction/rebuild/starter.css
reproduction/rebuild/starter.tsx
reproduction/rebuild/next-app/
reproduction/self-verify/summary.json
reproduction/self-verify/renderers/*/verification.json
reproduction/self-verify/renderers/*/visual-qa.json
reproduction/self-verify/renderers/*/breakpoints/*-verification.json
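
A quick way to see which of these files a given run actually produced is to check the output directory against a short list. The paths below are copied from the list above; the choice of which ones to treat as expected is an assumption, since a run "can" produce them but is not guaranteed to, and `missing_artifacts` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

# A subset of the artifact paths listed above; adjust to the route you ran.
EXPECTED = (
    "capture.json",
    "pipeline-run-manifest.json",
    "dom/snapshot.json",
    "network/manifest.json",
    "reproduction/self-verify/summary.json",
)


def missing_artifacts(output_dir: Path, expected: Iterable[str] = EXPECTED) -> List[str]:
    """Return the expected relative paths that do not exist under output_dir."""
    return [rel for rel in expected if not (output_dir / rel).exists()]


# Demo against a throwaway directory containing a single artifact.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "capture.json").write_text("{}")
    demo_missing = missing_artifacts(root)

print(demo_missing)
```

For a real run, call `missing_artifacts(Path("./.tmp/mdn-clone"))` or whichever directory was passed as `--output-dir`.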

Quality Benchmark

Run the default small benchmark:

npm run check:clone-bench:local

Run the universal route regression corpus and expectations gate:

npm run check:benchmark-routes:local

Run a lightweight clone score gate:

npm run check:clone-score-gate:local

Validate the committed benchmark evidence manifest:

npm run check:benchmark-evidence:local

Validate production pipeline gates:

npm run check:production-readiness:local

Run the operational smokes individually:

npm run check:job-queue:local
npm run check:har-replay:local
npm run check:authenticated-corpus:local

Classify failure/action codes from a route report:

npm run classify:pipeline-failures -- --report ./.tmp/universal-route-benchmark/universal-route-report.json

Find low-scoring persisted benchmark artifacts:

npm run summarize:benchmark-scores -- --root ./.tmp --min-score 60 --max-score 70

Run specific URLs:

python3 scripts/check_clone_quality_bench.py \
  https://www.example.com \
  https://www.mozilla.org/ \
  --no-breakpoints

Run a responsive benchmark:

python3 scripts/check_clone_quality_bench.py \
  https://developer.mozilla.org/en-US/ \
  --breakpoints mobile tablet

Development Checks

python3 -m py_compile \
  bundle/source-first-clone/mcp/source_first_clone/*.py \
  scripts/check_integration_smoke.py \
  scripts/check_clone_quality_bench.py

npm run check:integration:local

git diff --check

Repo Layout

| Path | Purpose |
| --- | --- |
| `bundle/source-first-clone` | Installed plugin bundle, MCP server, and exact-clone intake skill. |
| `bundle/source-first-clone/mcp/source_first_clone` | Capture, planning, rebuild, repair, and verification engine. |
| `bin/web-embedding.mjs` | Node CLI wrapper. |
| `python/web_embedding/installer.py` | Shared installer and command dispatcher. |
| `scripts/check_clone_quality_bench.py` | URL clone quality benchmark helper. |
| `scripts/benchmark_routes.py` | Universal route/capture-depth regression benchmark helper. |
| `scripts/check_benchmark_report.py` | Benchmark expectation validator for exact, minimum, and contains-style checks. |
| `scripts/check_benchmark_evidence.py` | Benchmark evidence manifest validator. |
| `scripts/check_job_queue_smoke.py` | Filesystem async clone job queue smoke test. |
| `scripts/check_har_replay_smoke.py` | Deterministic HAR replay engine smoke test. |
| `scripts/benchmark_authenticated_corpus.py` | User-provided authenticated dashboard corpus runner. |
| `scripts/summarize_benchmark_scores.py` | Utility for finding low or high scoring persisted benchmark artifacts under an output root. |
| `scripts/classify_pipeline_failures.py` | Operational failure/action taxonomy summarizer for reports and capture artifacts. |
| `scripts/check_production_readiness.py` | Production readiness gate validator for corpus, failure taxonomy, CI wiring, and policy docs. |
| `scripts/check_integration_smoke.py` | Release, install, and URL-only clone smoke test. |
| `scripts/release_bundle.py` | Release artifact builder. |
| `docs/` | Architecture notes and universal benchmark documentation. |

Positioning

The strongest claim for this project is:

A source-first website cloning engine that combines Playwright capture, HAR replay, MCP tools, and self-verification to rebuild iframe-blocked public pages with reproducible visual, DOM, style, interaction, and responsive scores.

Avoid treating the output as a legal or ownership bypass. The engine can reconstruct public page structure, but permission, licensing, and acceptable use still matter.

License

MIT
