Server data from the Official MCP Registry
Verify multi-hop relational claims before an agent asserts them — zero tokens, with proof paths.
Verify multi-hop relational claims before an agent asserts them — zero tokens, with proof paths.
Valid MCP server (1 strong, 3 medium validity signals). No known CVEs in dependencies. Package registry verified. Imported from the Official MCP Registry.
6 files analyzed · 1 issue found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
This plugin requests these system permissions. Most are normal for its category.
Set these up before or after installing:
Add this to your MCP configuration file:
{
"mcpServers": {
"io-github-alexaquarius-grounded-reasoning": {
"args": [
"grounded-reasoning"
],
"command": "uvx"
}
}
}From the project's GitHub README.
TL;DR. LLMs hallucinate on multi-hop relational reasoning. This is a relation-algebra verifier an agent calls to check a claim before asserting it: zero model tokens, precision-guaranteed (accepts a claim iff a grounded proof path exists), language-agnostic, and provider-agnostic. Plugs in as a library, a function-calling tool, or an MCP server. Validated on real LLMs (DeepSeek et al.) and the public CLUTRR benchmark. See docs/integration.md.
📄 Full paper: PAPER.md · Integration guide: docs/integration.md · Try it in 30 seconds: quickstart notebook
Đọc bằng tiếng Việt: README.vi.md
LLMs are solid on one-hop facts but collapse on composition — chaining several correct facts into a multi-step conclusion. On CLUTRR (kinship reasoning), DeepSeek's accuracy falls off with depth, while a grounded operator-composition solver holds ~100% flat — at zero tokens:
acc
100% ●─────●─────●─────●─────●─────●─────● ● Grounded solver (algebra, 0 tokens)
90% |
80% ○
70% | ╲
60% | ╲
50% | ╲
40% | ○ ○ ○ DeepSeek (LLM)
30% | ╲ ╱ ╲
20% | ○─────○ ╲
10% | ○─────○
0% +──┴─────┴─────┴─────┴─────┴─────┴─────┴─
hop 2 3 4 5 6 7 8 (composition steps)
hop: 2 3 4 5 6 7 8
DeepSeek: 83% 42% 25% 25% 42% 17% 8%
Solver: 100% 100% 100% 100% 100% 100% 100%
(CLUTRR/v1 gen_train234_test2to10, clean-chain, n=12/hop; full test set n=635: solver
covers 99.5%, accuracy 99.2%. src/experiments/clutrr_eval.py.)
Is: a guaranteed reasoning-verification layer built on relation operator algebra.
Is not: an "unprecedented breakthrough." The Katz index, the Neumann series, graph reachability, and neuro-symbolic grounding are all classical math and technique. The contribution here is unification, a measured guarantee, and benchmark numbers — not a new primitive. The guard needs a relation graph (supplied, or extracted from LLM facts); flexibility is bounded (see PAPER §5).
| Approach | Extra tokens | Guarantee | Needs an external KB |
|---|---|---|---|
| LLM self-verification (2nd call) | +110% | none (measured 34% precision) | no |
| Self-consistency / majority vote | multiplies with sample count | none, statistical only | no |
| RAG / external KG grounding | varies | only as good as retrieval | yes |
| This guard | +0 | precision = 1.0 (Theorem G) | no |
| This guard, self-grounded (SGDC) | +0 | precision = 1.0 given sound atomic facts (Theorem I) | no |
| This guard, conformal | +0 | coverage ≥ 1−α, distribution-free (Theorem K) | no |
The reasoning core rests on a single unification (numerically verified, zero error):
| View | Theorem | Content |
|---|---|---|
| Fuzzy diffusion inference | F | conf(a→b) = Σ αᵏ(Pᵏ)[a,b], calibrated + grounded |
| Relation operator algebra | G | composition = operator product, transitive closure = Σ powers |
| Spectral analysis (Katz) | H | engine.infer = resolvent (I−αP)⁻¹−I (matches 0.0 error) |
⟹ fuzzy inference is spectral analysis of the relation operator. src/reasoning/.
Four further theorems extend this core: I (two-sided precision/recall guarantee for a self-grounded, no-external-KB variant), J (closure-learning completeness, validated on CLUTRR), K (conformal reasoning — distribution-free coverage under a noisy relation graph, including one extracted by an LLM from raw text), and L (Horn forward-chaining, generalizing transitive closure to conjunctive rules). All seven are stated, proved, and numerically verified in PAPER.md.
| Experiment | Result |
|---|---|
| Hallucination guard (kinship) | precision 33% → 100%, catches 94/94, 0 false rejects |
| Guard token cost | +0 tokens (vs. LLM self-verify: +110% tokens, 34% precision) |
| SGDC (self-grounded, no external KB) | precision 78% → 100% from internal consistency alone |
| CLUTRR (public benchmark) | solver ~100% at every hop vs. DeepSeek 83%→8% |
| Hard passage (9-step chain) | DeepSeek fabricates 2/10 (wrong direction); grounded system 10/10, with proofs — examples/hallucination_demo.py |
The guard/solver needs a clean graph. But if you let an LLM extract relations from natural-language text, the graph is noisy (missing/spurious edges). Conformal Reasoning (Theorem K) fixes exactly that: use operator confidence as a score, calibrate a threshold ⟹ distribution-free coverage ≥ 1−α, even on a noisy graph.
End-to-end demo: DeepSeek extracts an "is a" graph from text → conformal runs on that extracted graph (ground truth is used only for scoring):
| Text | LLM extraction (P / R) | Coverage (target ≥90%) | Efficiency (FPR) |
|---|---|---|---|
| Easy | 100% / 99.7% | 91.3% | 0.0 |
| Hard (nested clauses + near-miss distractors) | 99.5% / 68.5% | 93.0% | 0.77 |
The LLM's extraction drops 31% of the edges (a genuinely noisy graph) → the coverage guarantee still holds (93% ≥ 90%), only efficiency degrades. Validity always holds; efficiency scales with graph quality.
⟹ A path to guaranteed reasoning over natural-language relations — where the hard
guard can't reach. src/experiments/conformal_llm_eval.py.
pip install grounded-reasoning
# or, for development (tests + lint):
git clone https://github.com/ALEXaquarius/grounded-reasoning
cd grounded-reasoning && pip install -e ".[dev]"
pytest tests/ # every theorem + offline-locked logic, no network needed
# Use it right now (no LLM/network needed):
python -c "from grounded_reasoning import GroundedReasoner as G; r=G(); r.add_facts([('a','p','b'),('b','p','c')]); print(r.verify('a','c',via='p'))"
# Real-LLM experiments (need a key — read from an env var, NEVER hardcoded):
export DEEPSEEK_API_KEY=sk-... # bring your own; .env is gitignored
python -m src.experiments.guard_llm_eval # hallucination guard
python -m src.experiments.self_grounded_eval # SGDC
python -m src.experiments.clutrr_eval # public CLUTRR benchmark
python -m src.experiments.conformal_llm_eval # end-to-end conformal (LLM-extracted graph)
src/agent/)A relation-reasoning verifier for agents: check a multi-hop claim before asserting it — zero model tokens, precision guaranteed (accepts iff a grounded proof path exists).
from grounded_reasoning import GroundedReasoner
gr = GroundedReasoner()
gr.add_facts([("alice","parent","bob"),("bob","parent","carol")])
gr.verify("alice","carol", via="parent") # Verdict(grounded=True, proof=['alice','bob','carol'])
gr.verify("alice","zed", via="parent") # Verdict(grounded=False, proof=None) ← hallucination blocked
Three integration paths (details: docs/integration.md):
GroundedReasoner.verify / filter_claims / contradictions.TOOL_SPEC (Anthropic) / openai_tool_spec() (OpenAI) + run_tool — a stateless verify_relation tool.python -m src.agent.mcp_server — plugs into Claude or any MCP-compatible agent.Multi-provider (not just DeepSeek): LLMClient(provider=...) for DeepSeek / OpenAI /
Groq / OpenRouter / Together / Mistral / Ollama (local) — all OpenAI-compatible, switch
providers without changing code. Multilingual: entities/relations are opaque
Unicode strings ⟹ works with any language (cha, 父, والد…) with zero configuration.
A real function-calling demo (agent verifies itself, blocks hallucination):
python -m src.experiments.agent_demo. When the graph is noisy (relations
extracted by an LLM from text), use ConformalReasoner for a coverage ≥1−α
guarantee instead of hard precision.
| Path | Content |
|---|---|
grounded_reasoning/ | Public package — GroundedReasoner, verify_relation, TOOL_SPEC, ConformalReasoner, LLMClient |
src/agent/{verifier,tool,mcp_server}.py | Public API implementation — HallucinationGuard, function-calling tool, MCP server |
src/reasoning/abstract_inference.py | FuzzyInferenceEngine, TypedInferenceEngine, HallucinationGuard (Theorem F) |
src/reasoning/operator_algebra.py | Relation operator algebra (Theorem G) |
src/reasoning/relation_spectrum.py | Spectrum, nilpotency, Katz resolvent (Theorem H) |
src/reasoning/conformal_reasoning.py | Conformal — coverage guarantee under noise (Theorem K) |
src/reasoning/composition_algebra.py | Composition-table learning, validated on CLUTRR (Theorem J) |
src/reasoning/horn.py | Horn forward-chaining, least-model semantics (Theorem L) |
src/reasoning/llm_client.py | Provider-agnostic LLM client (key read from an env var) |
src/theory/theorems.py | Seven theorems (F–L) with numerical verification |
src/experiments/{guard_llm,self_grounded,nl_ontology,guard_cost,clutrr,conformal_llm,inference}_eval.py | Real-LLM and benchmark experiments backing every claim above |
examples/hallucination_demo.py | End-to-end function-calling demo |
examples/quickstart.ipynb | Runnable tour of the library (offline, Colab-ready) |
This project began as an attempt to invent an embedding-free retrieval algorithm that could compete with dense/RAG retrieval. That research question reached a rigorous, fully honest negative conclusion (ties BM25, loses significantly to dense embeddings — with a proof of why). The same mathematical toolkit — operator algebra, spectral analysis — turned out to have real, measurable value on a different problem: guaranteeing multi-hop relational reasoning. This repository ships only that validated, tested reasoning system; the full retrieval research trail (including every failed attempt, honestly recorded) lives in a separate research repository and is not part of this package. See PAPER.md §1 for the full framing.
Principle: proof before code, formal definitions, falsifiability, and honest reporting of negative results — see CONTRIBUTING.md.
Be the first to review this server!
by Modelcontextprotocol · Developer Tools
Web content fetching and conversion for efficient LLM usage
by Modelcontextprotocol · Developer Tools
Read, search, and manipulate Git repositories programmatically
by Toleno · Developer Tools
Toleno Network MCP Server — Manage your Toleno mining account with Claude AI using natural language.