Server data from the Official MCP Registry
MCP server for the Semantic Scholar API: search 200M+ papers, citations, authors, recommendations.
Valid MCP server (1 strong, 3 medium validity signals). 1 known CVE in dependencies. Package registry verified. Imported from the Official MCP Registry.
3 files analyzed · 2 issues found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
Set these up before or after installing:
Environment variable: SEMANTIC_SCHOLAR_API_KEY
Add this to your MCP configuration file:
{
"mcpServers": {
"io-github-smaniches-semantic-scholar-mcp": {
"env": {
"SEMANTIC_SCHOLAR_API_KEY": "your-semantic-scholar-api-key-here"
},
"args": [
"s2-mcp-server"
],
"command": "uvx"
}
}
}
From the project's GitHub README.
A 14-tool Semantic Scholar MCP server for academic research workflows. It gives direct access to 200M+ papers from Semantic Scholar (paper search, citation graph traversal, author profiles, and recommendations) from any Model Context Protocol client, e.g. Claude Desktop, Claude Code, Cursor, Cline, or Continue.
Every release ships verifiable supply-chain provenance: Sigstore-signed SLSA build-provenance attestations on the wheel, sdist, and container image; PEP 740 attestations on the PyPI upload; and a CycloneDX SBOM — so you can prove the artifact you installed was built from this repo. See Provenance & supply chain.
Author: Santiago Maniches · ORCID 0009-0005-6480-1987 · TOPOLOGICA LLC
A research tool is only as trustworthy as the chain from its source to the binary you run. Every release of this server ships cryptographically verifiable supply-chain evidence, all generated in CI from the tagged commit:
| Guarantee | What it proves | Where it is produced |
|---|---|---|
| SLSA build provenance (wheel + sdist) | the published distributions were built by this repo's publish.yml from the released tag, not hand-uploaded | publish.yml — actions/attest-build-provenance (lines 56–59) |
| SLSA build provenance (container image) | the ghcr.io image digest was built by this repo's docker.yml | docker.yml — actions/attest-build-provenance, push-to-registry (lines 110–116) |
| PEP 740 attestations | the PyPI upload itself carries Sigstore-backed attestations under Trusted Publishing | publish.yml — attestations: true (line 97) |
| CycloneDX SBOM | a machine-readable bill of materials, generated then attested against the distributions | publish.yml — cyclonedx-py + actions/attest-sbom (lines 46–64) |
| SHA-pinned Actions | every CI action is pinned to a commit SHA, so the release pipeline itself cannot silently change | all jobs in .github/workflows/ (e.g. publish.yml, docker.yml) |
Verify the wheel and the container image against their attestations with the GitHub CLI:
# Wheel / sdist (download from the PyPI project or the release assets first)
gh attestation verify s2_mcp_server-*.whl --repo smaniches/semantic-scholar-mcp
# Container image
gh attestation verify oci://ghcr.io/smaniches/semantic-scholar-mcp:latest \
--repo smaniches/semantic-scholar-mcp
The full supply-chain posture, including the known-limitations list, is in SECURITY.md. This is release-time provenance (proving how the artifact was built); the server does not currently attach a per-response receipt to individual API results.
There is no public Semantic Scholar MCP standard, so the most useful comparison is against the obvious alternative: calling the Semantic Scholar REST API yourself from an agent. Everything in the right-hand column is plumbing this server already owns and the caller would otherwise reimplement.
| | This server | Raw S2 REST API from an agent |
|---|---|---|
| Tool surface | 14 typed MCP tools (search, retrieval, recommendations, status) | caller composes raw HTTP requests |
| Citation graph | both directions (citations and references) in get_paper | manual paging over two endpoints |
| Bulk operations | papers (≤500) and authors (≤1000) in one call | caller batches and paginates |
| Full-text snippet search | snippet_search with surrounding context | separate endpoint, caller-assembled |
| Paper-ID resolution | seven formats — Semantic Scholar ID, DOI, ArXiv, PubMed, Corpus ID, ACL, URL — validated pre-flight (validators.py) | caller normalizes and validates IDs |
| Rate limiting | client-side per-tier limiter, never exceeds the interval (client.py) | caller throttles by hand |
| Retry / backoff | bounded, jittered retry on 429/503/timeout, honors Retry-After (client.py) | caller implements retry |
| Errors | typed exception hierarchy, branchable by caller (errors.py) | parse HTTP status strings |
| Output | chat-tuned Markdown or JSON per call (formatters.py) | raw JSON |
| Supply-chain provenance | SLSA + PEP 740 + CycloneDX SBOM per release (see above) | n/a |
| Citability | minted Zenodo DOI, MIT licensed | n/a |
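To make the "Errors" row concrete, here is a minimal sketch of a typed hierarchy that uses the class names listed for errors.py. The docstrings, the status-code mapping, and the `error_for_status` helper are illustrative assumptions, not the package's actual code:

```python
from typing import Optional


class SemanticScholarError(Exception):
    """Base class: catch this to handle any Semantic Scholar failure."""

    def __init__(self, message: str, status: Optional[int] = None) -> None:
        super().__init__(message)
        self.status = status  # HTTP status code, when one was received


class AuthenticationError(SemanticScholarError):
    """Assumed meaning: missing, invalid, or unauthorized API key."""


class RateLimitError(SemanticScholarError):
    """Assumed meaning: HTTP 429, the caller should back off."""


class NotFoundError(SemanticScholarError):
    """Assumed meaning: the requested paper or author does not exist."""


class ValidationError(SemanticScholarError):
    """Assumed meaning: the request was rejected as malformed."""


class ServerError(SemanticScholarError):
    """Assumed meaning: a 5xx response from the upstream API."""


def error_for_status(status: int, message: str = "") -> SemanticScholarError:
    """Hypothetical status-to-exception mapping; client.py may differ."""
    if status in (401, 403):
        cls = AuthenticationError
    elif status == 429:
        cls = RateLimitError
    elif status == 404:
        cls = NotFoundError
    elif status == 400:
        cls = ValidationError
    elif status >= 500:
        cls = ServerError
    else:
        cls = SemanticScholarError
    return cls(message or f"HTTP {status}", status=status)


# Callers branch on the exception class instead of parsing message text.
try:
    raise error_for_status(429)
except RateLimitError as exc:
    print(f"rate limited (status {exc.status}); retry later")
```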
# No cloning needed — runs directly from PyPI
uvx s2-mcp-server
claude mcp add semantic-scholar -- uvx s2-mcp-server
Add to %APPDATA%\Claude\claude_desktop_config.json:
{
"mcpServers": {
"semantic-scholar": {
"command": "uvx",
"args": ["s2-mcp-server"],
"env": {
"SEMANTIC_SCHOLAR_API_KEY": "your-key-here"
}
}
}
}
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"semantic-scholar": {
"command": "uvx",
"args": ["s2-mcp-server"],
"env": {
"SEMANTIC_SCHOLAR_API_KEY": "your-key-here"
}
}
}
}
pip install s2-mcp-server
# or
git clone https://github.com/smaniches/semantic-scholar-mcp.git
cd semantic-scholar-mcp && pip install -e .
docker pull ghcr.io/smaniches/semantic-scholar-mcp:latest
docker run -i -e SEMANTIC_SCHOLAR_API_KEY=your-key ghcr.io/smaniches/semantic-scholar-mcp
Note: Get a free API key at semanticscholar.org/product/api. Without a key, you get rate-limited public access (1 req/sec).
flowchart LR
Client["MCP client<br/>(Claude Desktop, Claude Code,<br/>Cursor, Cline, Continue, …)"]
subgraph Server ["s2-mcp-server (this package)"]
direction TB
FastMCP["FastMCP runtime<br/>(stdio transport, lifespan)"]
Tools["14 @mcp.tool functions<br/>(server.py)"]
Models["Pydantic input models<br/>+ field sets (models.py)"]
Validators["Paper-ID validator<br/>(validators.py)"]
Cache["TTL cache<br/>(cache.py)"]
Fmt["Markdown formatters<br/>(formatters.py)"]
HTTP["httpx client<br/>+ rate limit + retry/backoff<br/>(client.py)"]
Errors["Typed exceptions<br/>(errors.py)"]
Log["Structured JSON logger<br/>(logging_config.py)"]
end
S2Graph["Semantic Scholar<br/>Graph API"]
S2Recs["Semantic Scholar<br/>Recommendations API"]
Client <-- "stdio (JSON-RPC)" --> FastMCP
FastMCP --> Tools
Tools --> Models
Tools --> Validators
Tools --> Cache
Tools --> HTTP
Tools --> Fmt
HTTP --> Errors
HTTP --> Log
HTTP -- "GET / POST<br/>x-api-key" --> S2Graph
HTTP -- "GET / POST<br/>x-api-key" --> S2Recs
Module responsibilities (src/semantic_scholar_mcp/):
| Module | Responsibility |
|---|---|
server.py | FastMCP instance, 14 @mcp.tool registrations, lifespan, main() entry. Re-exports the helper surface for back-compat. |
client.py | Shared httpx.AsyncClient singleton, per-tier rate limiter (1 req/s public, 10 req/s keyed), retry loop with exponential backoff + jitter on 429/503/timeout, HTTP→typed-exception mapping. |
models.py | Pydantic input models per tool, ResponseFormat enum, the four tiered paper field-set constants (PAPER_SEARCH_FIELDS, …_LITE, PAPER_BULK_SEARCH_FIELDS, PAPER_DETAIL_FIELDS) and AUTHOR_FIELDS. |
validators.py | Pre-flight paper-ID validation. Rejects NUL bytes, ?, #, path traversal; accepts the seven canonical ID formats. |
cache.py | In-memory TTL cache (5 min, 200 entries, oldest-first eviction) for paper/author lookups within a session. |
formatters.py | Markdown renderers for paper and author dicts, tuned for chat-surface readability. |
errors.py | SemanticScholarError hierarchy: AuthenticationError, RateLimitError, NotFoundError, ValidationError, ServerError. |
logging_config.py | One-JSON-per-line StructuredFormatter on stderr; safe to ship through any log aggregator. |
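A minimal sketch of the cache policy described for cache.py (5-minute TTL, 200 entries, oldest-first eviction). The class name, method names, and injectable clock are assumptions for illustration, not the module's real interface:

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TTLCache:
    """Entries expire after `ttl` seconds; when full, the oldest insertion is evicted."""

    def __init__(self, ttl: float = 300.0, max_entries: int = 200,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._max = max_entries
        self._clock = clock  # injectable so tests can control time
        self._data: "OrderedDict[str, tuple]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None on a miss or an expired entry."""
        item = self._data.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at > self._ttl:
            del self._data[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        if key in self._data:
            del self._data[key]  # re-inserting makes it the newest entry
        elif len(self._data) >= self._max:
            self._data.popitem(last=False)  # evict the oldest insertion
        self._data[key] = (self._clock(), value)

    def __len__(self) -> int:
        return len(self._data)


# Hypothetical key scheme: one entry per looked-up paper or author.
cache = TTLCache()
cache.set("paper:ARXIV:1706.03762", {"paperId": "example"})
print(cache.get("paper:ARXIV:1706.03762"))
```

Insertion-order eviction keeps the bookkeeping to a single OrderedDict; an LRU policy would also move entries on read, which the description above does not call for.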
Design choices worth knowing
- A single shared httpx.AsyncClient per process. Created lazily, closed in the FastMCP lifespan teardown. Amortizes connection setup; respects keep-alive limits.
- Bounded retry with exponential backoff and jitter: MAX_RETRIES = 3, base 1 s, capped at 30 s. Honors Retry-After when present.
- Typed exceptions, so callers branch on AuthenticationError vs RateLimitError vs NotFoundError instead of parsing strings.
- One version source: __version__ is derived from importlib.metadata.version("s2-mcp-server"), so bumping pyproject.toml is sufficient; release-please bumps the manifest, server.json (×2 paths), CITATION.cff, and .zenodo.json in lockstep on every release.

You can provide your API key in two ways:
Environment Variable (recommended for persistent use):
export SEMANTIC_SCHOLAR_API_KEY="your-api-key-here"
Per-Request Parameter (overrides env var):
{
"api_key": "your-api-key-here"
}
Deprecated: the per-request api_key parameter will be removed in v2.0.0. Tool-call arguments may be visible in MCP transcripts, client logs, and the LLM's tool-call history. Use the SEMANTIC_SCHOLAR_API_KEY environment variable instead. See SECURITY.md for details.
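The precedence described above (a per-request key overrides the environment variable, with a deprecation warning) can be sketched as follows. `resolve_api_key` is a hypothetical helper for illustration; the server's own lookup may differ in detail:

```python
import os
import warnings
from typing import Mapping, Optional


def resolve_api_key(per_request: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Per-request key wins (deprecated path); otherwise fall back to the
    SEMANTIC_SCHOLAR_API_KEY environment variable. None means the public tier."""
    if per_request:
        warnings.warn(
            "per-request api_key is deprecated and will be removed in v2.0.0; "
            "set SEMANTIC_SCHOLAR_API_KEY instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return per_request
    # Treat an empty variable the same as an unset one.
    return env.get("SEMANTIC_SCHOLAR_API_KEY") or None
```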
Get a free API key at: https://www.semanticscholar.org/product/api
Add to your Claude Desktop config file:
Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
{
"mcpServers": {
"semantic-scholar": {
"command": "python",
"args": ["-m", "semantic_scholar_mcp"],
"env": {
"SEMANTIC_SCHOLAR_API_KEY": "your-api-key-here"
}
}
}
}
Then restart Claude Desktop.
The server accepts the following paper identifier formats:
| Format | Pattern | Example |
|---|---|---|
| Semantic Scholar ID | 40-character hex | 649def34f8be52c8b66281af98ae884c09aef38b |
| DOI | DOI:xxx | DOI:10.1038/s41586-021-03819-2 |
| ArXiv | ARXIV:xxx | ARXIV:2106.15928 or ARXIV:2106.15928v2 |
| PubMed | PMID:xxx | PMID:32908142 |
| Corpus ID | CorpusId:xxx | CorpusId:215416146 |
| ACL | ACL:xxx | ACL:P19-1285 |
| URL | URL:xxx | URL:https://arxiv.org/abs/2106.15928 |
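A rough sketch of a pre-flight check over these formats, following the rules stated for validators.py (reject NUL bytes, `?`, `#`, and path traversal; accept the seven formats). The function name, treating `..` as a path segment, and case-insensitive prefixes are assumptions; the real validator may be stricter about the text after each prefix:

```python
import re

# Prefixes from the identifier table; case-insensitive matching is an assumption.
ID_PREFIXES = ("DOI:", "ARXIV:", "PMID:", "CORPUSID:", "ACL:", "URL:")
S2_SHA = re.compile(r"[0-9a-fA-F]{40}")
FORBIDDEN = ("\x00", "?", "#")  # NUL byte, query string, fragment


def is_valid_paper_id(paper_id: str) -> bool:
    """Hypothetical pre-flight check mirroring the rules described for validators.py."""
    pid = paper_id.strip()
    if not pid or any(ch in pid for ch in FORBIDDEN):
        return False
    if ".." in pid.split("/"):  # reject path-traversal segments
        return False
    if S2_SHA.fullmatch(pid):  # bare 40-character Semantic Scholar ID
        return True
    upper = pid.upper()
    # Otherwise require a known prefix followed by at least one character.
    return any(upper.startswith(p) and len(pid) > len(p) for p in ID_PREFIXES)
```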
semantic_scholar_search_papers
Search for academic papers with advanced filters.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
query | string | Yes | Search query (supports AND, OR, NOT operators and "phrase search") |
year | string | No | Year filter: "2024", "2020-2024", or "2020-" |
fields_of_study | string[] | No | Filter by fields: ["Computer Science", "Biology"] |
publication_types | string[] | No | Filter by type: ["Review", "JournalArticle"] |
open_access_only | boolean | No | Only return open access papers (default: false) |
min_citation_count | integer | No | Minimum citation count |
limit | integer | No | Max results 1-100 (default: 10) |
offset | integer | No | Pagination offset (default: 0) |
response_format | string | No | "markdown" or "json" (default: markdown) |
api_key | string | No | Override environment API key |
Example:
Search for "transformer attention mechanism" papers from 2023 with at least 100 citations
JSON Example:
{
"query": "transformer attention mechanism",
"year": "2023",
"min_citation_count": 100,
"fields_of_study": ["Computer Science"],
"limit": 20
}
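The `year` parameter accepts three shapes. A small sketch of how a client might validate one before sending a request; `parse_year_filter` is a hypothetical helper, not part of the server's documented API, and the server may simply forward the string to Semantic Scholar:

```python
import re
from typing import Optional, Tuple

# "2024", "2020-2024", or "2020-" (open-ended); four-digit years only in this sketch.
_YEAR_FILTER = re.compile(r"(\d{4})(?:(-)(\d{4})?)?")


def parse_year_filter(year: str) -> Tuple[int, Optional[int]]:
    """Return (start, end); end is None for an open-ended range like "2020-"."""
    m = _YEAR_FILTER.fullmatch(year.strip())
    if m is None:
        raise ValueError(f"unsupported year filter: {year!r}")
    start = int(m.group(1))
    if m.group(2) is None:  # single year
        return start, start
    if m.group(3) is None:  # open-ended range
        return start, None
    end = int(m.group(3))
    if end < start:
        raise ValueError(f"year range ends before it starts: {year!r}")
    return start, end
```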
semantic_scholar_get_paper
Get detailed information about a specific paper.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
paper_id | string | Yes | Paper ID in any supported format |
include_citations | boolean | No | Include citing papers (default: false) |
include_references | boolean | No | Include referenced papers (default: false) |
citations_limit | integer | No | Max citations to return 1-100 (default: 10) |
references_limit | integer | No | Max references to return 1-100 (default: 10) |
response_format | string | No | "markdown" or "json" (default: markdown) |
api_key | string | No | Override environment API key |
Example:
Get details for DOI:10.1038/s41586-021-03819-2 including its top 20 citations
JSON Example:
{
"paper_id": "DOI:10.1038/s41586-021-03819-2",
"include_citations": true,
"citations_limit": 20
}
semantic_scholar_search_authors
Search for academic authors by name.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
query | string | Yes | Author name to search |
limit | integer | No | Max results 1-100 (default: 10) |
offset | integer | No | Pagination offset (default: 0) |
response_format | string | No | "markdown" or "json" (default: markdown) |
api_key | string | No | Override environment API key |
Example:
Find author "Yoshua Bengio"
JSON Example:
{
"query": "Yoshua Bengio",
"limit": 5
}
semantic_scholar_get_author
Get author profile with publications.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
author_id | string | Yes | Semantic Scholar author ID |
include_papers | boolean | No | Include publications (default: true) |
papers_limit | integer | No | Max papers to return 1-100 (default: 20) |
response_format | string | No | "markdown" or "json" (default: markdown) |
api_key | string | No | Override environment API key |
Example:
Get author profile for author ID 1741101 with their top 50 publications
JSON Example:
{
"author_id": "1741101",
"include_papers": true,
"papers_limit": 50
}
semantic_scholar_recommendations
Get AI-powered paper recommendations based on a seed paper.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
paper_id | string | Yes | Seed paper ID in any supported format |
from_pool | string | No | Recommendation pool: "recent" (default) or "all-cs" |
limit | integer | No | Max recommendations 1-100 (default: 10) |
response_format | string | No | "markdown" or "json" (default: markdown) |
api_key | string | No | Override environment API key |
Example:
Get recommendations based on paper 649def34f8be52c8b66281af98ae884c09aef38b
JSON Example:
{
"paper_id": "ARXIV:1706.03762",
"limit": 15
}
semantic_scholar_bulk_papers
Retrieve multiple papers in a single request (max 500).
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
paper_ids | string[] | Yes | List of paper IDs (max 500) |
response_format | string | No | "markdown" or "json" (default: json) |
api_key | string | No | Override environment API key |
Example:
Retrieve these papers: DOI:10.1038/nature12373, ARXIV:2106.15928, PMID:32908142
JSON Example:
{
"paper_ids": [
"DOI:10.1038/nature12373",
"ARXIV:2106.15928",
"PMID:32908142"
]
}
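Lists longer than 500 IDs have to be split across several calls. A minimal client-side sketch, assuming the caller issues one semantic_scholar_bulk_papers call per chunk; `batched_ids` is a hypothetical helper, and de-duplication is an added convenience rather than documented server behavior:

```python
from typing import Iterator, List, Sequence

MAX_BATCH = 500  # per-call cap for semantic_scholar_bulk_papers


def batched_ids(ids: Sequence[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield IDs in first-seen order, without repeats, in chunks of at most `size`."""
    if size < 1:
        raise ValueError("size must be positive")
    unique = list(dict.fromkeys(ids))  # drop duplicates, keep order
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


# Each chunk becomes the paper_ids argument of one tool call.
for chunk in batched_ids(["DOI:10.1038/nature12373", "ARXIV:2106.15928", "PMID:32908142"]):
    print(len(chunk), "IDs in this call")
```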
semantic_scholar_bulk_search
Search papers with sorting and cursor-based pagination for large result sets.
Unlike search_papers, supports a sort order and returns a token for
paging through all results.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
query | string | Yes | Search query |
sort | string | No | Sort order, e.g. "citationCount:desc", "publicationDate:asc" |
token | string | No | Continuation token from a previous bulk_search response |
year | string | No | Year filter: "2024", "2020-2024", "2020-" |
fields_of_study | string[] | No | Filter by fields: ["Computer Science"] |
publication_types | string[] | No | Filter by type: ["Review", "JournalArticle"] |
min_citation_count | integer | No | Minimum citation count |
limit | integer | No | Max results per page 1-1000 (default: 100) |
response_format | string | No | "markdown" or "json" (default: markdown) |
api_key | string | No | Override environment API key |
JSON Example:
{
"query": "graph neural networks",
"sort": "citationCount:desc",
"year": "2020-2024",
"limit": 100
}
Returns: total result count, the page of papers, and a token for the
next page (when more results exist).
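Paging through a large result set means feeding each returned token back into the next call until none comes back. A sketch of that loop, where `fetch_page(token)` stands in for one semantic_scholar_bulk_search call with `response_format="json"`; the `data` and `token` key names are an assumption about the JSON shape:

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iterate_bulk_search(fetch_page: Callable[[Optional[str]], Page],
                        max_pages: int = 50) -> Iterator[Any]:
    """Yield papers page by page, following continuation tokens.

    Stops when a page carries no token, or after max_pages as a safety cap.
    """
    token: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(token)
        yield from page.get("data", [])
        token = page.get("token")
        if not token:  # no token means this was the last page
            return
```

The `max_pages` cap is a design choice of the sketch: it bounds how many calls an agent can spend on one query, which matters when each call counts against the rate limit.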
semantic_scholar_export_citation
Export a citation for a paper in BibTeX format.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
paper_id | string | Yes | Paper ID in any supported format |
format | string | No | Citation format (currently only "bibtex") |
api_key | string | No | Override environment API key |
JSON Example:
{
"paper_id": "DOI:10.1038/s41586-021-03819-2",
"format": "bibtex"
}
Returns: the BibTeX string for the requested paper.
semantic_scholar_match_paper
Find the single best paper matching a title string. Returns a numeric
matchScore alongside the matched paper.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
query | string | Yes | Paper title to match (1-500 chars) |
response_format | string | No | "markdown" or "json" (default: markdown) |
api_key | string | No | Override environment API key |
JSON Example:
{
"query": "Attention Is All You Need"
}
Returns: the best-matching paper plus its matchScore, or "No matching
paper found." if no match.
semantic_scholar_paper_authors
Get full author profiles for a paper's authors (richer than the abbreviated
author list returned by get_paper).
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
paper_id | string | Yes | Paper ID in any supported format |
limit | integer | No | Max authors to return 1-1000 (default: 100) |
response_format | string | No | "markdown" or "json" (default: markdown) |
api_key | string | No | Override environment API key |
JSON Example:
{
"paper_id": "ARXIV:1706.03762",
"limit": 25
}
Returns: the list of full author records for the paper.
semantic_scholar_author_batch
Retrieve multiple authors in a single request (max 1000).
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
author_ids | string[] | Yes | List of author IDs (1-1000) |
response_format | string | No | "markdown" or "json" (default: json) |
api_key | string | No | Override environment API key |
JSON Example:
{
"author_ids": ["1741101", "40348417", "144749327"]
}
Returns: counts of requested / retrieved, the retrieved author
records, and a not_found list of IDs the API did not return.
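One way to derive that summary, assuming the underlying batch endpoint returns one entry per requested ID, in request order, with null for IDs it cannot resolve. Apart from `not_found`, the key names and the helper itself are hypothetical:

```python
from typing import Any, Dict, List, Optional, Sequence


def summarize_author_batch(requested: Sequence[str],
                           results: Sequence[Optional[Dict[str, Any]]]) -> Dict[str, Any]:
    """Build requested/retrieved counts plus a not_found list from an aligned batch result."""
    if len(results) != len(requested):
        raise ValueError("expected one result per requested ID")
    found: List[Dict[str, Any]] = [r for r in results if r is not None]
    not_found = [aid for aid, r in zip(requested, results) if r is None]
    return {
        "requested": len(requested),
        "retrieved": len(found),
        "authors": found,
        "not_found": not_found,
    }
```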
semantic_scholar_multi_recommend
Get recommendations using multiple positive (and optional negative) example papers.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
positive_paper_ids | string[] | Yes | Papers to find similar results for (1-100) |
negative_paper_ids | string[] | No | Papers to steer recommendations away from (0-100) |
limit | integer | No | Max recommendations 1-500 (default: 10) |
response_format | string | No | "markdown" or "json" (default: markdown) |
api_key | string | No | Override environment API key |
JSON Example:
{
"positive_paper_ids": ["ARXIV:1706.03762", "ARXIV:1810.04805"],
"negative_paper_ids": ["DOI:10.1038/nature14539"],
"limit": 20
}
Returns: the recommended papers plus an echo of the positive/negative seeds used.
semantic_scholar_snippet_search
Search within paper full text and return text snippets with surrounding context. Heavily rate-limited without an API key.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
query | string | Yes | Search query for paper text (1-500 chars) |
paper_ids | string[] | No | Limit search to specific papers (max 100) |
year | string | No | Year filter: "2024", "2020-2024", "2020-" |
fields_of_study | string[] | No | Filter by fields: ["Computer Science"] |
min_citation_count | integer | No | Minimum citation count |
limit | integer | No | Max results 1-100 (default: 10) |
response_format | string | No | "markdown" or "json" (default: markdown) |
api_key | string | No | Override environment API key |
JSON Example:
{
"query": "scaling laws for language models",
"year": "2022-2024",
"limit": 20
}
Returns: matching snippets, each with the source paper title, section, and a short text excerpt.
semantic_scholar_status
Check server health and API connectivity status.
Parameters: None
Example:
Check Semantic Scholar API status
Response:
{
"server": "semantic-scholar-mcp",
"version": "<current package version>",
"api_key_configured": true,
"rate_tier": "authenticated (10 req/sec)",
"timestamp": "2026-04-06T12:00:00.000000+00:00",
"api_reachable": true,
"rate_limited": false,
"retry_after": null
}
| Tier | Requests/Second | How to Get |
|---|---|---|
| No API Key | 1 req/sec | Default |
| API Key | 10 req/sec | Sign up (free) |
| Academic Partner | 10-100 req/sec | Apply via S2 |
Note: The client-side rate limiter enforces the intervals above. The upstream Semantic Scholar API may impose stricter limits during high-traffic periods.
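The per-tier intervals in the table can be enforced by spacing request start times at least 1/rate seconds apart. A minimal asyncio sketch of that idea; `IntervalLimiter` and `limiter_for` are hypothetical names, not the classes in client.py:

```python
import asyncio
import time
from typing import Optional


class IntervalLimiter:
    """Spaces request start times at least 1/rate seconds apart."""

    def __init__(self, requests_per_second: float) -> None:
        self.interval = 1.0 / requests_per_second
        self._next_start = 0.0
        self._lock: Optional[asyncio.Lock] = None  # created lazily inside the event loop

    async def acquire(self) -> None:
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:  # serialize callers so slots are handed out in order
            now = time.monotonic()
            wait = self._next_start - now
            if wait > 0:
                await asyncio.sleep(wait)
            self._next_start = max(now, self._next_start) + self.interval


def limiter_for(api_key: Optional[str]) -> IntervalLimiter:
    """Pick the tier rate from the table: 10 req/s with a key, 1 req/s without."""
    return IntervalLimiter(10.0 if api_key else 1.0)
```

Each request would `await limiter.acquire()` before sending. Because the limiter only sees this process's traffic, it cannot prevent the stricter upstream limits mentioned in the note, which is why retries are still needed.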
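The server automatically handles rate limiting with:
- a client-side per-tier limiter (1 req/s without a key, 10 req/s with one);
- bounded retry with exponential backoff and jitter on 429, 503, and timeouts (MAX_RETRIES = 3, base 1 s, capped at 30 s);
- honoring the Retry-After header when the API sends one.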
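The retry policy can be sketched as two small functions. The constants come from the design notes (MAX_RETRIES = 3, base 1 s, cap 30 s); the function names, the "full jitter" variant, reading Retry-After as a number of seconds, and applying the 30 s cap to Retry-After too are assumptions of this sketch rather than a copy of client.py:

```python
import random
from typing import Optional

MAX_RETRIES = 3
BASE_DELAY = 1.0   # seconds
MAX_DELAY = 30.0   # seconds
RETRYABLE_STATUS = {429, 503}


def should_retry(status: Optional[int], attempt: int, timed_out: bool = False) -> bool:
    """Retry timeouts and 429/503 responses until MAX_RETRIES attempts are used."""
    return attempt < MAX_RETRIES and (timed_out or status in RETRYABLE_STATUS)


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # A server-specified wait wins (capped here as an assumption).
            return min(MAX_DELAY, max(0.0, float(retry_after)))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled in this sketch
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
    ceiling = min(MAX_DELAY, BASE_DELAY * 2 ** attempt)
    return (rng or random).uniform(0.0, ceiling)
```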
# Clone
git clone https://github.com/smaniches/semantic-scholar-mcp.git
cd semantic-scholar-mcp
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run tests with coverage
pytest --cov=src/semantic_scholar_mcp --cov-report=term-missing
# Type checking
mypy src/
API keys are never persisted to disk by the server. The MCP server runs
locally on your machine; when it makes authenticated requests, the key is
sent only to api.semanticscholar.org over HTTPS as the x-api-key
header. No telemetry is sent to any third party.
Prefer the SEMANTIC_SCHOLAR_API_KEY environment variable over the
per-request api_key tool parameter. The per-request parameter is
deprecated (removal planned for v2.0.0) because tool-call arguments may
be visible in MCP transcripts and client logs. See SECURITY.md
for vulnerability reporting and the known-limitations list.
- alphafold-sovereign-mcp: Model Context Protocol server for AlphaFold DB and 13 other biomedical data sources, with a local SQLite knowledge graph (pip install --pre alphafold-sovereign-mcp).
- uniprot-mcp: Model Context Protocol server for UniProt Swiss-Prot and TrEMBL (pip install uniprot-mcp-server).

MIT License - see LICENSE file.
Santiago Maniches
Contributions welcome! Please read our Contributing Guidelines.
by Toleno · Developer Tools
Toleno Network MCP Server — Manage your Toleno mining account with Claude AI using natural language.
by mcp-marketplace · Developer Tools
Create, build, and publish Python MCP servers to PyPI — conversationally.
by Microsoft · Content & Media
Convert files (PDF, Word, Excel, images, audio) to Markdown for LLM consumption