Server data from the Official MCP Registry
Track what LLMs cite: AI citation data from Perplexity, Claude, ChatGPT, Gemini, and Bing.
Valid MCP server (2 strong, 3 medium validity signals). No known CVEs in dependencies. Package registry verified. Imported from the Official MCP Registry.
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
Add this to your MCP configuration file:
{
"mcpServers": {
"io-github-automatelab-tech-citation-intelligence": {
"args": [
"-y",
"@automatelab/citation-intelligence"
],
"command": "npx"
}
}
}
From the project's GitHub README.
A free, self-hosted MCP server that tells your agent what LLMs cite - across Perplexity, Google AI Overviews, ChatGPT, Claude, Gemini, and Bing.
An MCP server for agents and developers who need to know which URLs get cited by AI search engines for any query. Install once, query from any MCP-compatible client (Claude Desktop, Cursor, Claude Code, Continue, Cline, n8n, LangGraph). Self-hosted, no account, no centralized backend. Bring your own API keys; nothing is stored on a remote server.
Install this if you're:
Do NOT install this if you want:
The AI citation tracking market is dominated by VC-funded dashboards starting at $295/mo. None ships MCP-first. If you're an agent or developer who wants citation data piped directly into your workflow - not into a SaaS login - there isn't a tool for you. This is that tool.
Start with citation_provenance or am_i_cited. Single-engine results (check_citations with a pinned engine) are directional; multi-engine consensus is the honest signal. A URL cited by 4 of 5 engines is a very different finding than one cited by 1.
| Tool | Purpose |
|---|---|
| citation_provenance | Recommended first tool. Fan a query across engines; per-URL cross-engine consensus matrix. Returns interpretation_note per engine. |
| am_i_cited | Domain citation check. With engine=auto (default): fans across all available LLM engines, returns per-engine breakdown + cross-engine consensus. Pin engine= to reduce cost. |
| check_citations | URLs cited by Perplexity / Claude / ChatGPT / Gemini / Google AI Mode for a query; or web rank via bing_serp / brave_serp |
| ai_overview | Google AI Overview presence + cited sources |
| cited_for | Queries the domain has been cited for, from local cache |
| predict_citation | Citation likelihood from public signals - no LLM fired |
| track_queries | Save / load / list named query panels (editorial watchlists) |
| run_panel | Run a panel through am_i_cited and snapshot to disk |
| citation_trend | Time-series report of citation rate + per-query gained/lost deltas |
| compare_domains | Side-by-side predict_citation across 2-10 URLs |
| wikipedia_mentions | List Wikipedia articles referencing a domain (zero keys) |
| audit_sitemap | Bulk predict_citation across every URL in a sitemap, worst-first |
| gsc_citation_gap | Join Google Search Console performance with AI citation status |
| compete_for_query | End-to-end competitive snapshot: your URL vs top cited competitors |
| citation_freshness_score | Recency score (halflife=365d) for the pages an engine cites |
| cited_for_diff | Diff of cited_for between two time windows for a domain |
| schema_audit | Deep schema.org validation - required fields per @type, malformed JSON-LD |
| llms_txt_generator | Generate an llms.txt (https://llmstxt.org) from a sitemap |
| answer_box_position | Bin each citation's first mention in raw_answer into early/middle/late thirds |
| citation_evidence | Extract the cited snippet from raw_answer for each citation (why, not just that) |
| crawler_access_audit | Verify GPTBot / ClaudeBot / PerplexityBot / CCBot / Google-Extended etc. can fetch a URL |
| sitemap_citation_map | Cross-reference sitemap URLs with cached citations (inverse of audit_sitemap) |
| canonical_competitor_set | Top cited domains per query, aggregated across engines |
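The cross-engine consensus idea behind citation_provenance can be sketched in a few lines. This is an illustration only, not the server's implementation: the input shape (a mapping from engine name to cited URLs), the function name consensus_matrix, and the sample URLs are all assumptions made for the example.

```python
from collections import defaultdict


def consensus_matrix(citations_by_engine: dict[str, list[str]]) -> list[dict]:
    """Count how many engines cite each URL, most-agreed-on first."""
    engines_by_url: dict[str, set[str]] = defaultdict(set)
    for engine, urls in citations_by_engine.items():
        for url in urls:
            engines_by_url[url].add(engine)

    total = len(citations_by_engine)
    rows = [
        {"url": url, "engines": sorted(engines), "consensus": f"{len(engines)}/{total}"}
        for url, engines in engines_by_url.items()
    ]
    # Sort by engine count descending, then by URL for stable output.
    rows.sort(key=lambda r: (-len(r["engines"]), r["url"]))
    return rows


# Hypothetical per-engine results for a single query.
sample = {
    "perplexity": ["https://a.example/guide", "https://b.example/post"],
    "claude": ["https://a.example/guide"],
    "openai": ["https://a.example/guide", "https://c.example/docs"],
    "gemini": ["https://a.example/guide"],
    "google_ai_mode": ["https://b.example/post"],
}
matrix = consensus_matrix(sample)
for row in matrix:
    print(row["consensus"], row["url"])
```

In this sample, one URL is cited by 4 of 5 engines and another by only 1, which is the distinction the guidance above asks you to keep in view.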
Server-side prompt templates the client can offer end users (call via the MCP prompt list):

- audit_citation_readiness(url) - chains predict_citation + schema_audit
- competitor_snapshot(query, your_url?) - chains canonical_competitor_set + compete_for_query
- ai_crawler_checkup(url) - runs crawler_access_audit and writes a remediation list
- citation_gap_analysis(domain, days?) - drives gsc_citation_gap and suggests next moves
- sitemap_coverage_review(sitemap_url) - runs sitemap_citation_map and recommends priorities

Cache views the client can read or subscribe to (no tool call required):

- citation://cache/summary - entry counts by type/engine, unique queries/URLs, oldest/newest
- citation://panels - saved panels + per-panel snapshot counts
- citation://docs/llms-txt - llms.txt primer (markdown)
- citation://docs/ai-crawlers - AI crawlers cheatsheet (markdown)
- citation://domain/{domain}/cited-for - dynamic template: citations for {domain}

Every response includes a surface field that tells you exactly how the data was collected. Understanding this is important before drawing conclusions.
| Surface | Engines | What it means |
|---|---|---|
| consumer_scrape | perplexity, google_ai_mode | Proxied through a real consumer-facing AI search product. Closest to what your users see. |
| api_proxy | claude, openai, gemini | API call to a search-enabled LLM. May differ from consumer product behavior: different model versions, no UI-level ranking logic, no personalization. Use as a directional proxy, not as ground truth. |
| web_rank | bing_serp, brave_serp | Traditional web search rank (not LLM citation). Measures whether a URL appears in SERP results, not whether an LLM cites it. |
| static_signal | predict_citation, wikipedia_mentions | Offline signal computed from public data. No live LLM query. |
perplexity (consumer_scrape) — Sonar Pro via the Perplexity API with a consumer-equivalent system prompt. Reasonably close to Perplexity.ai. Citations come from search_results in the response; the citations fallback contains URL-only entries without title.
claude (api_proxy) — Claude Sonnet via the Anthropic Messages API with web_search tool enabled. The consumer Claude.ai product uses different routing and ranking logic. Citation behavior can differ, especially for recent/time-sensitive queries.
openai (api_proxy) — gpt-4o-search-preview via the OpenAI Responses API. This is the model OpenAI ships to mirror SearchGPT behavior — closer to consumer than gpt-4o-mini, but still API-tier.
gemini (api_proxy) — Gemini 2.5 Pro via the Generative Language API with google_search grounding. Consumer Gemini uses the same grounding index but different re-ranking. Results are directional.
google_ai_mode (consumer_scrape) — Google AI Mode results via SerpAPI. Closest to what users see in Google Search. Requires SERPAPI_KEY.
bing_serp / brave_serp (web_rank) — Traditional SERP rank. Does NOT measure LLM citations. Use check_citations with these engines to compare organic web rank against LLM citation rank. am_i_cited refuses these engines — it only measures LLM behavior.
The proxy nature of api_proxy engines is a feature, not a bug: it lets you run citation checks without consuming expensive consumer-product quota. Just don't report API-proxy numbers as "ChatGPT cites you" without the caveat.
Every tool response includes an interpretation_note field that summarizes the fidelity in one sentence. Full per-engine fidelity ratings: docs/surface-fidelity.md.
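One way to keep the caveat attached when reporting results is to look up the surface for each engine before writing a sentence about it. The sketch below copies the engine-to-surface mapping from the table above; the function name describe_result and the wording of the returned strings are assumptions for illustration and are not the server's interpretation_note text.

```python
# Engine-to-surface mapping, copied from the surface table above.
SURFACE_BY_ENGINE = {
    "perplexity": "consumer_scrape",
    "google_ai_mode": "consumer_scrape",
    "claude": "api_proxy",
    "openai": "api_proxy",
    "gemini": "api_proxy",
    "bing_serp": "web_rank",
    "brave_serp": "web_rank",
}

# Engines that measure LLM behavior (everything except web_rank).
LLM_ENGINES = [e for e, s in SURFACE_BY_ENGINE.items() if s != "web_rank"]


def describe_result(engine: str, cited: bool) -> str:
    """Return a one-line summary that carries the surface caveat (hypothetical wording)."""
    surface = SURFACE_BY_ENGINE[engine]
    verb = "cites" if cited else "does not cite"
    if surface == "consumer_scrape":
        return f"{engine} {verb} the URL (consumer surface)"
    if surface == "api_proxy":
        return (
            f"{engine} {verb} the URL via API proxy "
            "(directional, may differ from the consumer product)"
        )
    return f"{engine} is a web-rank engine; it reports SERP presence, not LLM citation"


print(describe_result("openai", True))
print(describe_result("bing_serp", True))
```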
npx -y @automatelab/citation-intelligence
Requires Node 20 or later.
Add to %APPDATA%\Claude\claude_desktop_config.json (Windows) or ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
"mcpServers": {
"citation-intelligence": {
"command": "npx",
"args": ["-y", "@automatelab/citation-intelligence"],
"env": {
"PERPLEXITY_API_KEY": "pplx-...",
"SERPAPI_KEY": "...",
"ANTHROPIC_API_KEY": "sk-ant-...",
"OPENAI_API_KEY": "sk-...",
"GEMINI_API_KEY": "..."
}
}
}
}
Set only the keys you have. Any MCP client that supports stdio transport works - same command / args pattern.
Results are cached in ~/.config/citation-intelligence/cache.json. Repeated queries hit cache, not API. Default TTL: 7 days.
All stored data lives in ~/.config/citation-intelligence/cache.json. Delete it any time.

| Var | Purpose | Free tier? |
|---|---|---|
| PERPLEXITY_API_KEY | check_citations (perplexity, consumer_scrape) | Yes |
| SERPAPI_KEY | ai_overview + check_citations (google_ai_mode, consumer_scrape) | 100/month free |
| ANTHROPIC_API_KEY | check_citations (claude, api_proxy) | Paid only |
| OPENAI_API_KEY | check_citations (openai, api_proxy) | Paid only |
| GEMINI_API_KEY | check_citations (gemini, api_proxy) | Yes |
| BING_API_KEY | check_citations (bing_serp, web_rank) | Yes |
| BRAVE_API_KEY | check_citations (brave_serp, web_rank) | Yes (2000/month) |
| CITATION_CACHE_TTL_DAYS | Cache TTL for citation_check entries (default 7) | n/a |
| CITATION_AI_OVERVIEW_TTL_DAYS | Cache TTL for ai_overview entries (default 1) | n/a |
| CITATION_CONFIG_DIR | Override config dir (default ~/.config/citation-intelligence) | n/a |
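Because you set only the keys you have, it can help to check which engines a given environment would enable before launching a client. The following is a minimal sketch built from the variable table above; the helper name available_engines is hypothetical, and the server's own detection logic may differ.

```python
import os

# Engine-to-key mapping, taken from the environment variable table above.
KEY_FOR_ENGINE = {
    "perplexity": "PERPLEXITY_API_KEY",
    "google_ai_mode": "SERPAPI_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "bing_serp": "BING_API_KEY",
    "brave_serp": "BRAVE_API_KEY",
}


def available_engines(env=None) -> list[str]:
    """List engines whose API key is present and non-empty in the environment."""
    env = os.environ if env is None else env
    return [engine for engine, key in KEY_FOR_ENGINE.items() if env.get(key)]


# Check the current process environment.
print(available_engines())

# Check a hypothetical environment with two free-tier keys set.
print(available_engines({"PERPLEXITY_API_KEY": "pplx-...", "GEMINI_API_KEY": "..."}))
```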
You: For the queries "best AI citation tracker", "MCP for AI search", "self-hosted GEO tool",
is automatelab.tech cited?
(agent invokes am_i_cited)
Result:
{
"domain": "automatelab.tech",
"engine": "perplexity",
"results": [
{ "query": "best AI citation tracker", "cited": true, "rank": 4 },
{ "query": "MCP for AI search", "cited": true, "rank": 1 },
{ "query": "self-hosted GEO tool", "cited": false, "matching_urls": [] }
],
"summary": {
"queries_total": 3,
"queries_cited": 2,
"citation_rate": 0.67,
"average_rank": 2.5
}
}
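The summary block is simple arithmetic over the results list. As a check on the numbers above, here is a sketch that recomputes it; the function name summarize is an assumption, and averaging rank over cited queries only is inferred from the sample output rather than confirmed by the source.

```python
def summarize(results: list[dict]) -> dict:
    """Recompute the summary fields from per-query results."""
    cited = [r for r in results if r["cited"]]
    # Assumption: average_rank covers only cited queries that report a rank.
    ranks = [r["rank"] for r in cited if "rank" in r]
    return {
        "queries_total": len(results),
        "queries_cited": len(cited),
        "citation_rate": round(len(cited) / len(results), 2) if results else 0.0,
        "average_rank": round(sum(ranks) / len(ranks), 2) if ranks else None,
    }


results = [
    {"query": "best AI citation tracker", "cited": True, "rank": 4},
    {"query": "MCP for AI search", "cited": True, "rank": 1},
    {"query": "self-hosted GEO tool", "cited": False, "matching_urls": []},
]
summary = summarize(results)
print(summary)
```

Two of three queries cited gives 2/3, rounded to 0.67, and the mean of ranks 4 and 1 is 2.5, matching the sample response.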
You: How likely is https://example.com/blog/post to be cited by AI?
(agent invokes predict_citation)
Result:
{
"url": "https://example.com/blog/post",
"score": 62,
"grade": "C",
"signals": {
"wikipedia_linked": false,
"github_referenced": false,
"reddit_referenced": true,
"llms_txt_present": true,
"https": true,
"has_article_schema": true,
"has_faq_schema": false,
"has_breadcrumb_schema": true,
"canonical_clean": true,
"word_count": 1850,
"reading_time_minutes": 8,
"h2_count": 7,
"h2_question_count": 1,
"authority_link_count": 2,
"external_link_count": 6,
"internal_link_count": 11,
"last_modified_days_ago": 42,
"has_open_graph": true
},
"fixes": [
{ "signal": "has_faq_schema", "suggestion": "Page already has question-style H2s. Wrap them in FAQPage JSON-LD - high-leverage win.", "estimated_lift": "high" },
{ "signal": "h2_question_count", "suggestion": "Reframe at least 2 H2s as questions users actually ask...", "estimated_lift": "medium" }
]
}
The Wikipedia signal is measured (it correlates with citation) but no "go get a Wikipedia article" suggestion is emitted - the advice would be non-actionable. Scoring is split across six buckets - domain authority, structured data, content depth, link graph, freshness, metadata - so a thin page and a deep page on the same domain get meaningfully different scores.
Concrete patterns that compose the tools into something useful. Costs assume ChatGPT or Perplexity at ~$0.01-0.03/query.
The single highest-ROI pattern. Pick 20-30 queries from your editorial backlog, snapshot weekly, watch the rate trend.
# One-time setup
track_queries name="editorial-watchlist" domain="example.com" action="save"
queries=["best widget tutorial", "how to set up X", ...]
# Weekly cron (5 min, ~$0.20-0.60 per run)
run_panel name="editorial-watchlist"
# Anytime
citation_trend panel="editorial-watchlist"
citation_trend returns per-query deltas: which queries flipped from cited: false to cited: true since the first snapshot. That's your real editorial-impact metric.
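The gained/lost comparison can be pictured as a diff between two snapshots. This sketch assumes each snapshot is a mapping from query to a cited flag; that shape and the function name query_deltas are illustrative assumptions, not the on-disk snapshot format.

```python
def query_deltas(first: dict[str, bool], latest: dict[str, bool]) -> dict:
    """Compare two snapshots and report which queries flipped."""
    gained = sorted(q for q, cited in latest.items() if cited and not first.get(q, False))
    lost = sorted(q for q, cited in first.items() if cited and not latest.get(q, False))
    rate = sum(latest.values()) / len(latest) if latest else 0.0
    return {"gained": gained, "lost": lost, "citation_rate": round(rate, 2)}


# Hypothetical snapshots of the same panel, taken weeks apart.
first_snapshot = {
    "best widget tutorial": False,
    "how to set up X": True,
    "widget pricing": False,
}
latest_snapshot = {
    "best widget tutorial": True,
    "how to set up X": False,
    "widget pricing": False,
}
deltas = query_deltas(first_snapshot, latest_snapshot)
print(deltas)
```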
Before publishing a post, find out who owns the citation slot and whether the slot is worth competing for.
# 1. Is there an AI Overview to compete for?
ai_overview query="<target query>"
# 2. Who is cited today?
check_citations query="<target query>"
# 3. After publish + 14 days: did the post break in?
am_i_cited domain="example.com" queries=["<target query>"]
If check_citations returns 5+ strong incumbents on a low-volume query, pick a different angle. If ai_overview_present: false, the query has no AI surface - reconsider.
Catch site-wide structural issues across every page in one pass. Zero API spend.
audit_sitemap sitemap_url="https://example.com/sitemap.xml" limit=200
Returns worst_first sorted by citation-likelihood score. Surfaces missing schema, conflicting canonicals, missing /llms.txt, broken HTTPS.
You're not cited; they are. Why?
# 1. Find the top-cited URLs for your target query
check_citations query="<query>"
# 2. Compare your URL to theirs signal-by-signal
compare_domains urls=[
"https://example.com/your-post",
"https://competitor-1.com/their-post",
"https://competitor-2.com/their-post"
]
diverging_signals lists the signals where you're losing. The gap is usually obvious once you see it: they have FAQ schema, GitHub references, or Wikipedia links, and you don't.
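To make the idea of a diverging signal concrete, the sketch below flags boolean signals that every competitor has and you lack. The rule, the function name find_diverging_signals, and the sample values are assumptions for illustration; the tool's actual comparison logic is not documented here.

```python
def find_diverging_signals(mine: dict, competitors: list[dict]) -> list[str]:
    """Return boolean signals that are false for you and true for every competitor."""
    diverging = []
    for signal, my_value in mine.items():
        if isinstance(my_value, bool) and not my_value:
            if competitors and all(c.get(signal) is True for c in competitors):
                diverging.append(signal)
    return diverging


# Hypothetical signal sets, using field names from the predict_citation example.
your_post = {
    "has_faq_schema": False,
    "github_referenced": False,
    "wikipedia_linked": False,
    "has_article_schema": True,
}
competitor_posts = [
    {"has_faq_schema": True, "github_referenced": True, "wikipedia_linked": False, "has_article_schema": True},
    {"has_faq_schema": True, "github_referenced": True, "wikipedia_linked": True, "has_article_schema": True},
]
gaps = find_diverging_signals(your_post, competitor_posts)
print(gaps)
```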
The closest editorial wins are queries where you already rank in Google's top 10 but are invisible to AI. Requires a GCP service account with webmasters.readonly scope.
gsc_citation_gap
domain="example.com"
queries=["...editorial watchlist..."]
start_date="2026-04-01"
end_date="2026-05-01"
closest_wins returns queries with position <= 10 and ai_cited: false, sorted by impressions desc. Push citation signals on those specific URLs first.
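The closest_wins rule described above is a filter followed by a sort. A minimal sketch, assuming each row carries position, ai_cited, and impressions fields (the row shape, the function name closest_wins, and the sample rows are assumptions):

```python
def closest_wins(rows: list[dict]) -> list[dict]:
    """Keep queries ranking in the top 10 that AI does not cite, highest impressions first."""
    wins = [r for r in rows if r["position"] <= 10 and not r["ai_cited"]]
    return sorted(wins, key=lambda r: r["impressions"], reverse=True)


# Hypothetical joined rows: Search Console performance plus AI citation status.
rows = [
    {"query": "widget setup", "position": 3.2, "impressions": 1200, "ai_cited": False},
    {"query": "widget pricing", "position": 7.9, "impressions": 4100, "ai_cited": False},
    {"query": "widget api", "position": 2.1, "impressions": 9000, "ai_cited": True},
    {"query": "widget history", "position": 18.4, "impressions": 5000, "ai_cited": False},
]
wins = closest_wins(rows)
for row in wins:
    print(row["impressions"], row["query"])
```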
Wikipedia is the top-correlation signal but the advice "get on Wikipedia" is useless. So instead: watch when it happens organically.
wikipedia_mentions domain="example.com" limit=50
Returns Wikipedia article URLs that already link to the domain. Re-run quarterly; the diff is your "we got a Wikipedia citation" alert.
{
"@context": "https://schema.org",
"@type": "SoftwareApplication",
"name": "Citation Intelligence MCP",
"applicationCategory": "DeveloperApplication",
"operatingSystem": "Cross-platform",
"description": "Self-hosted MCP server for querying AI citation data from Perplexity, Claude, ChatGPT, Gemini, Bing, and Google AI Overviews.",
"offers": { "@type": "Offer", "price": "0" },
"url": "https://github.com/AutomateLab-tech/citation-intelligence"
}
Bug reports, feature ideas, and PRs welcome. See CONTRIBUTING.md.
Report a vulnerability via SECURITY.md.
MIT - see LICENSE.
Built by automatelab.tech