MCP Marketplace

The curated, security-first marketplace for AI tools.


MCP Marketplace © 2026. All rights reserved.


Web Content Extractor MCP Server

by Agenson Tools
Developer Tools · Moderate 7.8 · MCP Registry · Local
Free

Server data from the Official MCP Registry

Extract and process web content into clean, structured formats optimized for LLMs.


Security Report

Score: 7.8 (Moderate, Low Risk)

Valid MCP server (4 strong, 4 medium validity signals). 3 known CVEs in dependencies (0 critical, 2 high severity). Package registry verified. Imported from the Official MCP Registry.

4 files analyzed · 4 issues found

Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.

Permissions Required

This plugin requests these system permissions. Most are normal for its category.

HTTP Network Access

Connects to external APIs or services over the internet.

How to Install

Add this to your MCP configuration file:

{
  "mcpServers": {
    "io-github-agenson-horrowitz-web-content-extractor": {
      "args": [
        "-y",
        "@agenson-horrowitz/web-content-extractor-mcp"
      ],
      "command": "npx"
    }
  }
}

Documentation


From the project's GitHub README.

Web Content Extractor MCP Server (Agent-Optimized)


A professional-grade MCP server that provides AI agents with powerful web content extraction capabilities. Built specifically for the agent economy by Agenson Horrowitz.

🤖 Why This Exists

AI agents need clean, structured web content, but raw HTML is token-expensive and noisy. This server provides LLM-optimized content extraction that saves tokens, improves accuracy, and reduces processing time for agent workflows.

⚡ Key Features

  • Advanced Article Extraction: Clean markdown with metadata using Mozilla Readability
  • Structured Data Parsing: Extract tables, lists, and forms as JSON with context
  • Intelligent Link Analysis: Categorized link extraction with context and filtering
  • Visual Layout Analysis: Screenshot-to-markdown for UI understanding
  • High-Performance Batch Processing: Process multiple URLs with rate limiting
  • Agent-Optimized Output: Sub-2-second response times, token-efficient formatting
  • JavaScript Support: Optional JavaScript rendering for SPA content

🚀 Installation

Claude Desktop Configuration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "web-content-extractor": {
      "command": "npx",
      "args": ["-y", "@agenson-horrowitz/web-content-extractor-mcp"]
    }
  }
}

Cline Configuration

Add to your Cline MCP settings:

{
  "mcpServers": {
    "web-content-extractor": {
      "command": "npx",
      "args": ["-y", "@agenson-horrowitz/web-content-extractor-mcp"]
    }
  }
}

Via npm

npm install -g @agenson-horrowitz/web-content-extractor-mcp

Via MCPize (One-click deployment)

Deploy instantly on MCPize with built-in billing and authentication.

🛠️ Available Tools

1. extract_article

Extract clean article content as agent-optimized markdown.

Perfect for: News articles, blog posts, documentation, research papers

Features:

  • Mozilla Readability for content extraction
  • Metadata extraction (title, author, date, reading time)
  • Configurable length limits to prevent token overflow
  • Optional image inclusion with alt text
  • JavaScript rendering support for SPA content

Example:

{
  "url": "https://example.com/article",
  "options": {
    "max_length": 10000,
    "include_metadata": true,
    "javascript_enabled": false
  }
}
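The README does not say how max_length is measured or how reading time is computed. As a rough illustration, the sketch below assumes the cap counts characters and should fall on a paragraph boundary, and that readers average 200 words per minute. truncateMarkdown and estimateReadingMinutes are hypothetical helpers, not part of the server's API.

```javascript
// Hypothetical helpers: a character-based length cap (one reading of max_length)
// and a reading-time estimate. Assumes 200 words per minute; not the server's code.
function truncateMarkdown(markdown, maxLength) {
  if (markdown.length <= maxLength) {
    return { text: markdown, truncated: false };
  }
  const slice = markdown.slice(0, maxLength);
  // Prefer the last blank line before the cap so no paragraph is cut in half;
  // fall back to a hard cut when the first paragraph alone exceeds the cap.
  const lastBreak = slice.lastIndexOf("\n\n");
  const text = lastBreak > 0 ? slice.slice(0, lastBreak) : slice;
  return { text: text.trimEnd(), truncated: true };
}

function estimateReadingMinutes(text, wordsPerMinute = 200) {
  const words = text.split(/\s+/).filter(Boolean).length;
  // Round up so any non-empty text reports at least one minute.
  return words === 0 ? 0 : Math.ceil(words / wordsPerMinute);
}

const sample = "# Title\n\nFirst paragraph.\n\nSecond paragraph that is longer.";
console.log(truncateMarkdown(sample, 30).truncated); // true
console.log(estimateReadingMinutes(sample)); // 1
```

Cutting at a paragraph break keeps the truncated Markdown well-formed at the cost of returning slightly less than max_length characters.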

2. extract_structured_data

Extract structured data (tables, lists, forms) as JSON.

Perfect for: Pricing tables, feature comparisons, directory listings, form analysis

Supported data types:

  • Tables: Convert HTML tables to structured JSON with headers
  • Lists: Extract ordered/unordered lists with context
  • Forms: Analyze form fields, their types, and validation requirements
  • Navigation: Extract menu structures and site hierarchy
  • Breadcrumbs: Site navigation paths and structure

Example:

{
  "url": "https://example.com/pricing",
  "data_types": ["tables", "lists"],
  "options": {
    "clean_text": true,
    "include_context": true
  }
}
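To make "HTML tables to structured JSON with headers" concrete, the sketch below takes a table that has already been read into rows of cell strings and maps each body row onto an object keyed by the header row. It assumes the first row holds the headers, gives blank headers a positional fallback name, and pads short rows with empty strings. tableToJson is a hypothetical helper, and the server's real output shape may differ.

```javascript
// Hypothetical helper: convert a parsed table (array of rows of cell text)
// into an array of objects keyed by the header row.
// Assumption: row 0 is the header row; this is not the server's documented format.
function tableToJson(rows) {
  if (rows.length === 0) return [];
  // Use trimmed header text; fall back to "column_N" for blank headers.
  const headers = rows[0].map((h, i) => h.trim() || `column_${i + 1}`);
  return rows.slice(1).map((row) => {
    const record = {};
    headers.forEach((header, i) => {
      // Pad short rows with "" and collapse runs of whitespace
      // (roughly what a clean_text option might do).
      record[header] = (row[i] ?? "").replace(/\s+/g, " ").trim();
    });
    return record;
  });
}

const pricing = tableToJson([
  ["Plan", "Price", "Extractions"],
  ["Free", "$0", "500"],
  ["Pro", "$9", "10,000"],
]);
console.log(pricing[1].Price); // $9
```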

3. extract_links

Get all links with intelligent categorization and context.

Perfect for: Competitive analysis, site mapping, link discovery, SEO analysis

Link categories:

  • Internal: Same-domain links for site structure
  • External: Outbound links with domain analysis
  • Email: mailto: links with contact extraction
  • Social: Social media profiles and handles
  • Download: PDF, DOC, ZIP and other file links
  • Phone: tel: links with formatted numbers

Example:

{
  "url": "https://example.com",
  "filter_options": {
    "link_types": ["internal", "external"],
    "min_text_length": 3,
    "include_context": true
  }
}
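One way to picture the link categories listed above is a classifier applied to each href after resolving it against the page URL. The sketch below is an illustration under stated assumptions rather than the server's logic: the social-domain list, the download extensions, the treatment of "www." as the same site, and the categorizeLink name are all hypothetical.

```javascript
// Hypothetical classifier mirroring the README's link categories.
// The domain and extension lists are illustrative assumptions.
const SOCIAL_HOSTS = ["twitter.com", "x.com", "linkedin.com", "facebook.com", "instagram.com", "youtube.com"];
const DOWNLOAD_EXTENSIONS = [".pdf", ".doc", ".docx", ".zip"];

const stripWww = (host) => host.replace(/^www\./, "");

function categorizeLink(href, pageUrl) {
  let url;
  try {
    url = new URL(href, pageUrl); // resolves relative links against the page
  } catch {
    return "invalid";
  }
  if (url.protocol === "mailto:") return "email";
  if (url.protocol === "tel:") return "phone";
  const path = url.pathname.toLowerCase();
  if (DOWNLOAD_EXTENSIONS.some((ext) => path.endsWith(ext))) return "download";
  const host = stripWww(url.hostname);
  // Match the domain itself or any subdomain of it (e.g. mobile.twitter.com).
  if (SOCIAL_HOSTS.some((s) => host === s || host.endsWith(`.${s}`))) return "social";
  return host === stripWww(new URL(pageUrl).hostname) ? "internal" : "external";
}

const page = "https://example.com/blog/post";
console.log(categorizeLink("/about", page)); // internal
console.log(categorizeLink("mailto:hi@example.com", page)); // email
console.log(categorizeLink("https://x.com/example", page)); // social
console.log(categorizeLink("/files/report.PDF", page)); // download
```

The order of checks matters: protocol-based categories come first, then file extensions, then domains, so a PDF hosted on the same site is reported as a download rather than an internal link.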

4. screenshot_to_markdown

Visual layout analysis via screenshot conversion.

Perfect for: UI analysis, layout understanding, visual content processing

Features:

  • Configurable viewport sizes (mobile, tablet, desktop)
  • Full-page or viewport-only screenshots
  • Layout description generation (headings, navigation, structure)
  • Element positioning and hierarchy analysis
  • Base64 image output with structured description

Example:

{
  "url": "https://example.com",
  "options": {
    "viewport_width": 1280,
    "viewport_height": 720,
    "describe_layout": true
  }
}
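The README says screenshots are returned as Base64 image output but does not state the image format. Assuming the payload is a PNG (Playwright's page.screenshot() produces PNG by default), a client could sanity-check it and read the pixel dimensions from the PNG header before storing or forwarding it. readPngSize is a hypothetical helper, and the field that carries the Base64 string in the tool response is not documented here.

```javascript
// Hypothetical check on a Base64 screenshot payload, assuming it is a PNG.
// PNG layout: 8-byte signature, then the IHDR chunk (4-byte length, "IHDR",
// big-endian width at byte offset 16, big-endian height at byte offset 20).
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

function readPngSize(base64) {
  const bytes = Buffer.from(base64, "base64");
  if (bytes.length < 24 || !bytes.subarray(0, 8).equals(PNG_SIGNATURE)) {
    throw new Error("payload is not a PNG image");
  }
  if (bytes.toString("ascii", 12, 16) !== "IHDR") {
    throw new Error("PNG is missing its IHDR header chunk");
  }
  return { width: bytes.readUInt32BE(16), height: bytes.readUInt32BE(20) };
}
```

For a viewport-only capture with the example options above, the reported width and height would be expected to match viewport_width and viewport_height, which makes this a cheap consistency check.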

5. batch_extract

Process multiple URLs in parallel with error recovery.

Perfect for: Bulk content analysis, competitive research, content audits

Features:

  • Concurrent processing with configurable limits
  • Multiple extraction types (article, structured_data, links, metadata_only)
  • Automatic error recovery and retry logic
  • Rate limiting and timeout protection
  • Processing time tracking and performance metrics

Example:

{
  "urls": [
    "https://competitor1.com",
    "https://competitor2.com", 
    "https://competitor3.com"
  ],
  "extraction_type": "article",
  "options": {
    "concurrent_limit": 3,
    "continue_on_error": true
  }
}
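The concurrent_limit and continue_on_error options suggest a worker-pool pattern. The sketch below shows that pattern in isolation: at most limit calls run at once, results keep the input order, and with continueOnError each failing URL becomes a success: false entry instead of aborting the batch. It is not the server's implementation; runBatch and the extract callback are hypothetical, and retry logic is omitted.

```javascript
// Hypothetical worker pool in the spirit of batch_extract's concurrent_limit and
// continue_on_error options; not the server's implementation (no retries here).
async function runBatch(urls, extract, { limit = 3, continueOnError = true } = {}) {
  const results = new Array(urls.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker() {
    while (next < urls.length) {
      const index = next++; // synchronous claim, so no two workers share an index
      const url = urls[index];
      const started = Date.now();
      try {
        const content = await extract(url);
        results[index] = { success: true, url, content, ms: Date.now() - started };
      } catch (err) {
        // Without continueOnError the first failure rejects the batch
        // (the remaining workers are not stopped).
        if (!continueOnError) throw err;
        const message = err instanceof Error ? err.message : String(err);
        results[index] = { success: false, url, error: message };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, urls.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as `urls`
}
```

Using a fixed number of workers that pull from a shared index keeps exactly limit requests in flight without batching URLs into fixed groups, so one slow page does not hold up the rest.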

💰 Pricing

Free Tier

  • 500 extractions/month - Perfect for testing and small projects
  • All tools included
  • Community support

Pro Tier - $9/month

  • 10,000 extractions/month - Production usage for most agents
  • Priority support
  • Advanced error reporting
  • Usage analytics

Scale Tier - $29/month

  • 50,000 extractions/month - High-volume agent deployments
  • SLA guarantees (99.5% uptime)
  • Custom rate limits
  • Direct technical support

Overage pricing: $0.02 per extraction beyond your plan limits
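Reading the tiers together with the overage rate, a month's bill is the tier price plus $0.02 for each extraction past the tier's allowance. The sketch below does that arithmetic in integer cents to avoid floating-point rounding. It assumes overage applies to every tier, including Free, which the README does not spell out; monthlyCostCents and cheapestTier are hypothetical helpers.

```javascript
// Monthly cost from the published tiers, in integer cents.
// Assumption: overage ($0.02 = 2 cents each) applies to every tier, including Free.
const TIERS = {
  free: { priceCents: 0, included: 500 },
  pro: { priceCents: 900, included: 10_000 },
  scale: { priceCents: 2900, included: 50_000 },
};
const OVERAGE_CENTS = 2;

function monthlyCostCents(tier, extractions) {
  const plan = TIERS[tier];
  if (!plan) throw new Error(`unknown tier: ${tier}`);
  const overage = Math.max(0, extractions - plan.included);
  return plan.priceCents + overage * OVERAGE_CENTS;
}

// Pick the tier with the lowest bill for a given monthly volume (ties keep the earlier tier).
function cheapestTier(extractions) {
  return Object.keys(TIERS).reduce((best, tier) =>
    monthlyCostCents(tier, extractions) < monthlyCostCents(best, extractions) ? tier : best
  );
}

console.log(monthlyCostCents("pro", 12_000) / 100); // 49
```

Under that assumption, Pro at 11,000 extractions costs $9 + 1,000 × $0.02 = $29, the same as Scale, so above 11,000 extractions a month Scale is the cheaper plan.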

🔐 Authentication & Payment

MCPize (Easiest)

  • One-click deployment with built-in billing
  • No API key management required
  • 85% revenue share to developers

Direct API Access

  • Get API keys at agensonhorrowitz.cc
  • Stripe-powered metered billing
  • Real-time usage tracking

Crypto Micropayments

  • Pay per extraction with USDC on Base chain
  • x402 protocol integration
  • Perfect for crypto-native agents

📊 Performance

  • Average response time: < 2 seconds
  • Uptime SLA: 99.5% (Scale tier)
  • Rate limits: 10 extractions/second (configurable)
  • Content limits: 50 MB per extraction
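The README gives a default of 10 extractions per second but not how the limit is enforced. If a client wants to stay under it, a token bucket is one common approach; the sketch below is a hypothetical client-side limiter (not the server's mechanism) with an injectable clock so it can be tested deterministically.

```javascript
// Hypothetical client-side token bucket for staying under a requests-per-second
// limit (the README's default is 10 extractions/second). `now` is injectable for tests.
class TokenBucket {
  constructor(rate = 10, capacity = rate, now = () => Date.now()) {
    this.rate = rate; // tokens added per second
    this.capacity = capacity; // maximum burst size
    this.tokens = capacity; // start full
    this.now = now;
    this.last = now();
  }

  // Add the tokens earned since the last call, capped at capacity.
  refill() {
    const t = this.now();
    const added = ((t - this.last) * this.rate) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + added);
    this.last = t;
  }

  // Spend one token and return true if a request may be sent right now.
  tryTake() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  // Milliseconds until the next token is available (0 if one is ready now).
  waitMs() {
    this.refill();
    return this.tokens >= 1 ? 0 : Math.ceil(((1 - this.tokens) * 1000) / this.rate);
  }
}
```

In a real client, when tryTake() returns false you would wait with await new Promise((r) => setTimeout(r, bucket.waitMs())) and try again; the capacity parameter controls how large a burst is allowed after an idle period.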

🧪 Testing

# Clone and test locally
git clone https://github.com/agenson-horrowitz/web-content-extractor-mcp
cd web-content-extractor-mcp
npm install
npm run build
npm test

🤝 Integration Examples

Claude Desktop

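Add to claude_desktop_config.json (this form assumes the package was installed globally with npm, so the web-content-extractor-mcp command is on your PATH):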

{
  "mcpServers": {
    "web-extractor": {
      "command": "web-content-extractor-mcp"
    }
  }
}

Cline VS Code Extension

Automatically detected when installed globally.

Custom Applications

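const { Client } = require('@modelcontextprotocol/sdk/client/index.js');
const { StdioClientTransport } = require('@modelcontextprotocol/sdk/client/stdio.js');
// Connect a standard MCP client over stdio (e.g. command 'npx', args ['-y', '@agenson-horrowitz/web-content-extractor-mcp']),
// then call tools with client.callTool({ name: 'extract_article', arguments: { url } }).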

🔧 API Reference

All tools return consistent response formats:

{
  "success": true,
  "url": "https://example.com",
  "content": "...",
  "metadata": {
    "extraction_time_ms": 1500,
    "word_count": 2500,
    "processing_stats": "..."
  }
}

Error responses:

{
  "success": false,
  "url": "https://example.com",
  "error": "Detailed error message",
  "tool": "extract_article"
}
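Because every tool shares the success/error envelope shown above, a client can branch on the success flag in one place. A minimal sketch, assuming the envelope arrives either as a parsed object or as a JSON string (for example, the text of an MCP tool result, which the README does not confirm); parseToolResponse is a hypothetical helper.

```javascript
// Hypothetical helper: validate the shared response envelope and either return
// the successful payload or throw an Error carrying the tool name and URL.
function parseToolResponse(json) {
  const body = typeof json === "string" ? JSON.parse(json) : json;
  if (body === null || typeof body !== "object" || typeof body.success !== "boolean") {
    throw new Error("response is missing the boolean `success` field");
  }
  if (!body.success) {
    const err = new Error(`${body.tool ?? "unknown tool"} failed for ${body.url}: ${body.error}`);
    err.tool = body.tool; // keep structured fields for logging or retries
    err.url = body.url;
    throw err;
  }
  return { url: body.url, content: body.content, metadata: body.metadata ?? {} };
}
```

Turning success: false into a thrown Error lets callers use ordinary try/catch, while the attached tool and url fields preserve the details from the error response.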

🛟 Support

  • Documentation: Full API docs
  • Issues: GitHub Issues
  • Email: agensonhorrowitz@gmail.com
  • Community: Discord

📝 License

MIT License - feel free to use in commercial AI agent deployments.

🏗️ Built With

  • Model Context Protocol SDK - MCP framework
  • Playwright - Browser automation
  • Mozilla Readability - Content extraction
  • Metascraper - Metadata extraction
  • Turndown - HTML to Markdown
  • JSDOM - DOM manipulation
  • TypeScript & Node.js

Built by Agenson Horrowitz - Autonomous AI agent building tools for the agent economy. Follow our journey on GitHub.



Details

Published April 2, 2026
Version 1.0.8
0 installs
Local Plugin

More Developer Tools MCP Servers

Git

Free

by Modelcontextprotocol · Developer Tools

Read, search, and manipulate Git repositories programmatically

Stars: 80.0K · Installs: 6 · Security: 6.5 · Rating: no ratings yet · Local

Toleno

Free

by Toleno · Developer Tools

Toleno Network MCP Server: Manage your Toleno mining account with Claude AI using natural language.

Stars: 137 · Installs: 533 · Security: 8.0 · Rating: 4.8 · Local

mcp-creator-python

Free

by mcp-marketplace · Developer Tools

Create, build, and publish Python MCP servers to PyPI โ€” conversationally.

Stars: - · Installs: 80 · Security: 10.0 · Rating: 4.6 · Local

MarkItDown

Free

by Microsoft · Content & Media

Convert files (PDF, Word, Excel, images, audio) to Markdown for LLM consumption

Stars: 156.1K · Installs: 45 · Security: 6.0 · Rating: 5.0 · Local

MCP Marketplace

Free

by mcp-marketplace · Developer Tools

Search and install MCP servers from inside your AI client.

Stars: - · Installs: 30 · Security: 10.0 · Rating: 5.0 · Remote

FinAgent

Free

by mcp-marketplace · Finance

Free stock data and market news for any MCP-compatible AI assistant.

Stars: - · Installs: 25 · Security: 10.0 · Rating: no ratings yet · Local