Server data from the Official MCP Registry
Local OCR & image analysis via Apple Vision — no cloud, no API keys, ~97% fewer tokens on PDFs.
Valid MCP server (2 strong, 3 medium validity signals). 3 known CVEs in dependencies (0 critical, 3 high severity). Package registry verified. Imported from the Official MCP Registry.
4 files analyzed · 4 issues found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
Add this to your MCP configuration file:
{
  "mcpServers": {
    "io-github-woladi-macos-vision-mcp": {
      "args": [
        "-y",
        "macos-vision-mcp"
      ],
      "command": "npx"
    }
  }
}
From the project's GitHub README.
Local OCR & image analysis for any MCP client — private, offline, no API keys.
Pre-extracts text and image data locally before your AI ever sees it — cutting token usage by ~97% on real documents and returning structured paragraphs, lines, and bounding boxes so the model can reconstruct the document into Markdown, HTML, DOCX, or any other format. Files never leave your Mac: no cloud API, no API keys, no network requests.
Install with npm. Powered by Apple Vision Framework, the same engine as Live Text in Photos.app.
❌ Without macos-vision-mcp:
✅ With macos-vision-mcp:
macos-vision-mcp acts as a local pre-processing layer between your documents and the cloud. Useful for:
Instead of sending the raw document to your AI, you extract the text and structure locally first. The model then works only with the extracted text — never the original file.
Step 1 — Install the package:
npm install -g macos-vision-mcp
Step 2 — Add to your MCP client (example for Claude Code):
claude mcp add macos-vision-mcp -- macos-vision-mcp
Restart your client. The tools appear automatically.
Note: The native module macos-vision compiles against your local Node.js at install time. If you switch Node versions, run npm rebuild inside the package directory.
| Tool | What it does | Example prompt |
|---|---|---|
| ocr_image | Extract text from an image or PDF (JPG, PNG, HEIC, TIFF, PDF). Returns plain text, or per-page paragraphs + text blocks with lineId / paragraphId and bounding boxes. | "Read the text from ~/Desktop/screenshot.png" |
| detect_faces | Detect human faces and return their count and positions. | "How many people are in this photo?" |
| detect_barcodes | Read QR codes, EAN, UPC, Code128, PDF417, Aztec, and other 1D/2D codes. | "What does the QR code in /tmp/qr.jpg say?" |
| classify_image | Classify image content into 1000+ categories with confidence scores. | "What is in this image?" |
| analyze_document | Returns structured JSON with reading-order paragraphs, raw text blocks (bbox / confidence), faces, barcodes, and rectangles, ready for the model to reconstruct into Markdown, HTML, or anything else. | "Reconstruct ~/Desktop/scan.pdf as clean Markdown" |
Use the tool name explicitly in your prompt to guarantee local processing:
Extract text from an image or PDF:
Use ocr_image to extract text from ~/Desktop/invoice.pdf
Detect faces in a photo:
Use detect_faces on ~/Photos/team.jpg and tell me how many people are in it
Classify image content:
Use classify_image on ~/Downloads/unknown.jpg
Full document analysis + reconstruction:
Use analyze_document on ~/Desktop/report.pdf and reconstruct it as clean Markdown
The tool returns structured JSON; the model picks the output format you ask for (Markdown, HTML, DOCX outline, etc.) without any extra dependencies — no Ollama, no cloud LLM, no extra tooling.
{
  "source": { "path": "...", "pageCount": 1, "isPdf": false },
  "pages": [
    {
      "page": 0,
      // primary surface for reconstruction: reading-order paragraphs joined with "\n"
      "paragraphs": [
        { "paragraphId": 0, "lineIds": [0], "text": "ACME COFFEE" },
        { "paragraphId": 1, "lineIds": [1, 2], "text": "12 Main St\nPortland, OR" },
      ],
      // spatial fallback: raw blocks with page-local 0–1 bbox, confidence, line/paragraph membership
      "textBlocks": [
        {
          "text": "ACME COFFEE",
          "lineId": 0,
          "paragraphId": 0,
          "confidence": 0.99,
          "bbox": { "x": 0.21, "y": 0.04, "width": 0.58, "height": 0.06 },
        },
      ],
      "faces": [],
      "barcodes": [],
      "rectangles": [],
    },
  ],
  "summary": {
    "totalTextBlocks": 8,
    "totalParagraphs": 2,
    "totalFaces": 0,
    "totalBarcodes": 0,
    "totalRectangles": 0,
  },
}
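Because each bbox is expressed as page-local fractions between 0 and 1, placing a block on a rendered page means scaling it by that page's pixel size. The Python below is a minimal sketch of that scaling, not part of macos-vision-mcp: the helper name bbox_to_pixels and the page dimensions are hypothetical, and the README does not state where the coordinate origin sits, so the top-left default is an assumption with a switch for a bottom-left origin.

```python
def bbox_to_pixels(bbox, page_width, page_height, origin_top_left=True):
    """Scale a normalized bbox to integer pixels as (left, top, right, bottom).

    Assumption: x/y mark the box corner nearest the origin. Which corner of
    the page is the origin is NOT documented above, hence the flag.
    """
    left = bbox["x"] * page_width
    right = (bbox["x"] + bbox["width"]) * page_width
    if origin_top_left:
        top = bbox["y"] * page_height
    else:
        # Bottom-left origin: y is the distance from the page bottom to the
        # box's lower edge, so flip it to measure from the top instead.
        top = (1 - bbox["y"] - bbox["height"]) * page_height
    bottom = top + bbox["height"] * page_height
    return tuple(round(v) for v in (left, top, right, bottom))


# bbox copied from the "ACME COFFEE" block in the sample output.
sample_bbox = {"x": 0.21, "y": 0.04, "width": 0.58, "height": 0.06}

# 1000 x 2000 is a made-up page size used only for illustration.
print(bbox_to_pixels(sample_bbox, 1000, 2000))
```

If the boxes come out vertically mirrored against your rendered page, try origin_top_left=False.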
Use paragraphs[].text for the 95% case (rebuild Markdown/HTML/plain text directly). Reach for textBlocks[] when you need spatial context — multi-column layouts, tables, forms, IDs.
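The two access patterns can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than code shipped with the package: the result dict is a hand-trimmed copy of the sample output (comments and trailing commas removed), the helper names pages_to_text and rows_from_blocks are hypothetical, and the second helper assumes that several text blocks may share one lineId.

```python
from collections import defaultdict

# Trimmed copy of the sample output, written as a Python literal.
result = {
    "source": {"path": "...", "pageCount": 1, "isPdf": False},
    "pages": [
        {
            "page": 0,
            "paragraphs": [
                {"paragraphId": 0, "lineIds": [0], "text": "ACME COFFEE"},
                {"paragraphId": 1, "lineIds": [1, 2], "text": "12 Main St\nPortland, OR"},
            ],
            "textBlocks": [
                {
                    "text": "ACME COFFEE",
                    "lineId": 0,
                    "paragraphId": 0,
                    "confidence": 0.99,
                    "bbox": {"x": 0.21, "y": 0.04, "width": 0.58, "height": 0.06},
                },
            ],
        }
    ],
}


def pages_to_text(doc):
    """Common case: join reading-order paragraphs with blank lines, per page."""
    return ["\n\n".join(p["text"] for p in page["paragraphs"]) for page in doc["pages"]]


def rows_from_blocks(page):
    """Spatial case: group textBlocks by lineId, each row ordered left to right."""
    rows = defaultdict(list)
    for block in page["textBlocks"]:
        rows[block["lineId"]].append(block)
    return [
        [b["text"] for b in sorted(rows[line_id], key=lambda b: b["bbox"]["x"])]
        for line_id in sorted(rows)
    ]


print(pages_to_text(result)[0])
print(rows_from_blocks(result["pages"][0]))
```

pages_to_text returns one string per page, and rows_from_blocks is called once per page, so neither helper compares IDs across pages.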
Notes:
- ocr_image in blocks mode returns the same per-page shape minus the detection sections: { pages: [{ page, paragraphs, textBlocks }] }.
- paragraphId / lineId reset on every page.
- For layouts that need spatial context, use textBlocks[] and reconstruct from the bounding boxes.
For Claude Code, run:
claude mcp add macos-vision-mcp -- macos-vision-mcp
Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "macos-vision-mcp": {
      "command": "macos-vision-mcp"
    }
  }
}
Add to ~/.cursor/mcp.json:
{
  "mcpServers": {
    "macos-vision-mcp": {
      "command": "macos-vision-mcp"
    }
  }
}
If you installed with npx rather than globally, replace "command": "macos-vision-mcp" with "command": "npx", "args": ["macos-vision-mcp"].
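If you would rather script the change than edit the file by hand, the Python below is a minimal sketch that merges the entry into an existing config without dropping other servers. The helper names add_server and update_config_file are hypothetical; the two entry shapes are copied from the snippets above.

```python
import json
from pathlib import Path

SERVER_NAME = "macos-vision-mcp"


def add_server(config, use_npx=False):
    """Return a copy of config with the macos-vision-mcp entry added or replaced."""
    if use_npx:
        entry = {"command": "npx", "args": [SERVER_NAME]}
    else:
        entry = {"command": SERVER_NAME}
    # Copy the server map so the caller's dict is left untouched.
    servers = dict(config.get("mcpServers", {}))
    servers[SERVER_NAME] = entry
    return {**config, "mcpServers": servers}


def update_config_file(path, use_npx=False):
    """Read a JSON config (or start empty), merge the entry, write it back."""
    path = Path(path).expanduser()
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_server(config, use_npx), indent=2) + "\n")


# Preview the merged result without touching any file.
print(json.dumps(add_server({"mcpServers": {}}, use_npx=True), indent=2))

# To write it for real, for example for Cursor:
# update_config_file("~/.cursor/mcp.json")
```

Keeping the merge in a pure function makes it easy to preview the result before any file is written.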
Contributions are welcome. Please follow Conventional Commits for commit messages — this project uses release-it with @release-it/conventional-changelog to automate releases.
git clone <repo>
cd macos-vision-mcp
npm install
npm run dev # watch mode
MIT — Adrian Wolczuk