Server data from the Official MCP Registry
Find & fetch research datasets across Zenodo, DataCite, NCBI omics, and literature.
This MCP server provides multi-source research data aggregation with generally sound architecture and appropriate permission scoping. Authentication is optional (NCBI API key improves rate limits but is not required), and the server properly handles per-source errors without silent failures. However, there are code quality concerns around broad exception handling, insufficient input validation on user-supplied query strings passed to external APIs, and the lack of rate limiting on the client side despite documented upstream limits. These issues pose moderate risks in production but do not indicate malicious intent. Supply chain analysis found 3 known vulnerabilities in dependencies (0 critical, 3 high severity). Package verification found 1 issue.
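The assessment notes that the server does no client-side rate limiting, even though NCBI documents its E-utilities limits (3 requests/second without an API key, 10 with one). Below is a minimal sketch of how a caller or a patch could pace requests. MinIntervalLimiter and ncbi_limiter are hypothetical names and are not part of data-aggregator-mcp. The clock and sleep hooks exist only so the pacing can be tested without real waiting.

```python
import os
import threading
import time
from typing import Callable


class MinIntervalLimiter:
    """Start at most `rate` requests per second by spacing calls evenly."""

    def __init__(
        self,
        rate: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = 0.0  # earliest time the next request may start

    def acquire(self) -> None:
        # Sleeping while holding the lock makes concurrent callers queue up in order.
        with self._lock:
            now = self._clock()
            wait = self._next_slot - now
            if wait > 0:
                self._sleep(wait)
                now = self._next_slot
            self._next_slot = now + self.interval


def ncbi_limiter() -> MinIntervalLimiter:
    # Assumption: pace to NCBI's documented limits, 10 req/s with an API key, 3 without.
    rate = 10 if os.environ.get("NCBI_API_KEY") else 3
    return MinIntervalLimiter(rate)
```

Call acquire() before each E-utilities request. The omics, literature, and taxonomy lookups all draw on the same NCBI quota, so they should share one limiter instance. An asyncio-based server would use the same logic with asyncio.Lock and asyncio.sleep.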
5 files analyzed · 11 issues found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
This plugin requests these system permissions. Most are normal for its category.
Set these up before or after installing:
Environment variable: NCBI_API_KEY
Add this to your MCP configuration file:
```json
{
  "mcpServers": {
    "io-github-musharna-data-aggregator-mcp": {
      "env": {
        "NCBI_API_KEY": "your-ncbi-api-key-here"
      },
      "args": [
        "data-aggregator-mcp"
      ],
      "command": "uvx"
    }
  }
}
```

From the project's GitHub README.
One MCP server to find and fetch research data across archives, omics registries, and literature, all behind a single normalized model.
- search one query across Zenodo, DataCite (Dryad / Figshare / Dataverse / OSF / Mendeley), NCBI omics (GEO / SRA / BioProject), and literature (PubMed / OpenAIRE), with results deduplicated, normalized, and cross-linked.
- resolve any hit to its file manifest, citation, and the data it points at.
- fetch it to disk with checksum verification.
mcp-name: io.github.musharna/data-aggregator-mcp
Most data MCPs wrap a single source. This one unifies them behind four tools
and one DataResource model, so an agent searches once and gets back comparable
records:
organism="Orobanche aegyptiaca" also matches
Phelipanche aegyptiaca (NCBI Taxonomy), so a species rename doesn't cost you
results.resolve.Run with no install:
uvx data-aggregator-mcp
Register with Claude Code:
```bash
claude mcp add data-aggregator -- uvx data-aggregator-mcp
```
A typical agent flow:
search("drought stress RNA-seq", organism="Sorghum bicolor")
β [ geo:GSE..., sra:SRX..., zenodo:..., pubmed:... ] # deduped, taxa-normalized
resolve("sra:SRX079566")
β DataResource{ files: [ENA FASTQ urlsβ¦], access: "open", taxa: [...] }
fetch("sra:SRX079566", dest="./data")
β ["./data/SRX079566_1.fastq.gz", β¦] # md5-verified
```bash
pip install data-aggregator-mcp
data-aggregator-mcp   # or: python -m data_aggregator_mcp
```
Add it to a client's MCP config (e.g., Claude Desktop's claude_desktop_config.json):
```json
{
  "mcpServers": {
    "data-aggregator": {
      "command": "uvx",
      "args": ["data-aggregator-mcp"],
      "env": { "NCBI_API_KEY": "your-optional-key" }
    }
  }
}
```
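You can edit the config by hand. If you script the change instead, merge the new entry rather than overwriting the file so that other registered servers survive. Below is a minimal sketch. add_server is a hypothetical helper, and the config file's location depends on the client and OS, so the path is passed in explicitly. If you have no NCBI key, drop the env block.

```python
import json
from pathlib import Path


def add_server(config_path: Path, name: str, entry: dict) -> dict:
    """Merge one server entry into an MCP client config, keeping any other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = entry
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Same entry as the JSON example above.
ENTRY = {
    "command": "uvx",
    "args": ["data-aggregator-mcp"],
    "env": {"NCBI_API_KEY": "your-optional-key"},
}
```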
| Source | Discover | Fetch | Checksum |
|---|---|---|---|
| Zenodo | ✓ | ✓ | md5 |
| DataCite → Figshare | ✓ | ✓ | md5 |
| DataCite → Dataverse | ✓ | ✓ | md5 |
| DataCite → OSF | ✓ | ✓ | md5 |
| DataCite → Dryad | ✓ | manifest only¹ | sha-256 (listed) |
| DataCite → Mendeley & others | ✓ | ✗ | n/a |
| NCBI SRA | ✓ | ✓ (ENA FASTQ) | md5 |
| NCBI GEO | ✓ | ✓ (suppl/) | none² |
| NCBI BioProject | ✓ | → SRA links | n/a |
| PubMed / OpenAIRE | ✓ | ✓ (OA full text) | none² |
¹ Dryad downloads are gated by tokens or bot challenges, so fetch fails loudly; resolve still lists the files.

² No upstream checksum; fetch verifies the content type instead, rejecting an HTML page served in place of a binary.
search(query, size?, sources?, organism?)

Fan out across all wired sources in parallel and return compact DataResource records, deduplicated by DOI. Per-source failures are reported in errors{} and are never silently dropped.
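The parallel fan-out with per-source error capture maps naturally onto asyncio.gather(..., return_exceptions=True). With that option, one failing source becomes an entry in an errors mapping instead of aborting the whole search. This is a minimal sketch. The searcher callables and fan_out are hypothetical stand-ins for the package's source adapters.

```python
import asyncio
from typing import Awaitable, Callable

# A searcher takes the query and returns a list of record dicts.
Searcher = Callable[[str], Awaitable[list[dict]]]


async def fan_out(
    query: str, searchers: dict[str, Searcher]
) -> tuple[list[dict], dict[str, str]]:
    """Query every source concurrently; record failures per source instead of dropping them."""
    names = list(searchers)
    outcomes = await asyncio.gather(
        *(searchers[name](query) for name in names), return_exceptions=True
    )
    results: list[dict] = []
    errors: dict[str, str] = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            errors[name] = f"{type(outcome).__name__}: {outcome}"
        else:
            results.extend(outcome)
    return results, errors
```

Because gather waits for every source, the slowest upstream sets the latency of the whole search. Wrapping each call in asyncio.wait_for(..., timeout) turns a hang into one more entry in errors.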
- organism: expand the query with NCBI Taxonomy synonyms. The expansion is echoed in taxon_expansion, and results carry normalized taxa[] ({taxid, name}). Plant taxa also get a described_in link to plant-genomics-mcp.
- sources: restrict the fan-out, e.g. ["omics"].
- size: maximum number of results (1 to 50).

resolve(id)

Returns the full record and its files manifest. Routing depends on the shape of the id: zenodo:7654321, a bare DOI, datacite:10.5061/dryad.x, an omics id (sra:SRX079566, geo:GSE332789, bioproject:PRJNA1468572), or a literature id (pubmed:34320281, openaire:<id>). Attaches, where available:
- files[]: the ENA FASTQ manifest (SRA), GEO suppl/, or the host repository's native manifest (Figshare / Dataverse / OSF / Dryad).
- links[]: paper → data links, pubmed: → sra: / geo: / bioproject: (NCBI elink) and openaire: → datacite: (ScholeXplorer Scholix).
- access / license: normalized access status (open / embargoed / restricted / closed / unknown) and the license, where the source exposes it.
- identifiers: normalized {pmid, pmcid, doi}. For papers, this also includes an open-access full-text FileEntry (EuropePMC XML, or an Unpaywall PDF fallback).
- citation: pass cite=<format>, where <format> is bibtex, ris, csl-json, or any CSL style name (apa, mla, vancouver, …). DOI records use content negotiation; others render CSL-JSON from metadata. Citations are off by default, and failures degrade quietly.

fetch(id, dest?, files?, max_bytes?, force?, extract?)

Download files to disk and return their paths. Downloads stream under a max_bytes guard (force overrides it), with md5 verification wherever a checksum exists.
- files: restrict the download to a subset of the resolved manifest.
- extract: unpack downloaded zip / tar archives in place, with guards against path traversal and runaway extracted size. Off by default.

Files with no upstream checksum (GEO suppl/, literature full text) get a content-type sniff that fails loudly if a declared binary is actually an HTML page.

… FetchNotSupportedError.

list_sources()

Lists the wired sources with their capabilities: layer, kinds, supported filters, fetchability, id examples, auth, and rate limits.
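The guards that fetch's extract option describes (no path traversal, bounded extracted size) can be sketched for zip archives as below. Every member is validated before anything is written. safe_extract_zip, UnsafeArchiveError, and the max_total_bytes budget are hypothetical. For tar archives, Python 3.12+ provides extraction filters (tarfile.data_filter, selected with extractall(filter="data")) that enforce similar path rules.

```python
import zipfile
from pathlib import Path


class UnsafeArchiveError(Exception):
    pass


def safe_extract_zip(archive: Path, dest: Path, max_total_bytes: int) -> list[Path]:
    """Extract a zip only if every member stays inside `dest` and the total size is bounded."""
    dest = dest.resolve()
    extracted: list[Path] = []
    with zipfile.ZipFile(archive) as zf:
        members = zf.infolist()
        # Check declared uncompressed sizes up front, before anything is written.
        if sum(m.file_size for m in members) > max_total_bytes:
            raise UnsafeArchiveError("declared uncompressed size exceeds limit")
        # Reject members whose resolved target would land outside dest ("../x", "/etc/x").
        for m in members:
            target = (dest / m.filename).resolve()
            if not target.is_relative_to(dest):
                raise UnsafeArchiveError(f"path traversal: {m.filename!r}")
        for m in members:
            extracted.append(Path(zf.extract(m, dest)))
    return extracted
```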
Both are optional and set via environment variables:

- NCBI_API_KEY: raises the NCBI E-utilities rate limit (3 → 10 req/s) used by the omics, literature, and taxonomy lookups.
- UNPAYWALL_EMAIL: enables the Unpaywall fallback leg of literature full-text retrieval (the EuropePMC leg works without it).

```bash
uv venv && uv pip install -e ".[dev]"
uv run pytest -q
uv run ruff check src tests
DATA_AGGREGATOR_MCP_LIVE=1 uv run pytest -k live -q   # real-API probes
```
The README demo (examples/assets/demo.svg) is recorded network-free from examples/_demo_stdio.py; see the header of that file to re-record it.
MIT. See LICENSE.
by Modelcontextprotocol · Developer Tools
Read, search, and manipulate Git repositories programmatically
by Toleno · Developer Tools
Toleno Network MCP Server: manage your Toleno mining account with Claude AI using natural language.
by mcp-marketplace · Developer Tools
Create, build, and publish Python MCP servers to PyPI, conversationally.