Server data from the Official MCP Registry
Tracks conversation health in real time — drift, desync, and causal collapse — for any AI agent.
Remote endpoint (SSE): https://horizon.leocelis.com/sse
Remote Plugin
No local installation needed. Your AI client connects to the remote endpoint directly.
Add this to your MCP configuration to connect:
{
"mcpServers": {
"io-github-leocelis-horizon-fidelity-monitor": {
"url": "https://horizon.leocelis.com/sse"
}
}
}

From the project's GitHub README.
"Quality is not a model property — it is a conversation property."
Horizon is a real-time conversation health monitor for AI agents. It tracks the structural dynamics of multi-turn conversations — semantic drift, information gain, ontological gap width, temporal desynchronisation, circadian cognitive load, conversation velocity, and causal reachability — dimensions that every LLM is architecturally blind to.
Based on the Trans-Horizon Communication Protocol (THCP) research. Three independent no-go theorems prove why no LLM can self-monitor these properties from the inside.
Multi-turn AI agents lose accuracy. Our market research puts the number at 39% accuracy degradation after 5 turns — a structural property of conversations that standard observability tools (LangSmith, RAGAS, DeepEval) cannot see because they measure responses, not conversations.
Horizon was built to close that gap. In A/B experiments across four scenarios, adding Horizon monitoring produced +15.7% composite quality lift and 87% fewer hallucination events when Horizon events triggered interventions. The math is grounded in 173 academic references across information theory, cognitive science, category theory, and Lorentzian geometry.
Further reading: docs/research/market-demand.md · docs/content/naming-the-category-conversation-dynamics-monitoring.md · docs/content/why-every-production-agent-needs-conversation-dynamics-monitoring.md

Three paths — pick the one that fits your workflow:
The fastest way to add Horizon to any Cursor or Claude Desktop workspace. No Python required.
Request an alpha key → GitHub Discussions, then add to ~/.cursor/mcp.json:
{
"mcpServers": {
"horizon": {
"url": "https://horizon.leocelis.com/sse",
"headers": { "Authorization": "Bearer YOUR_KEY_HERE" }
}
}
}
In Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"horizon": {
"url": "https://horizon.leocelis.com/sse",
"headers": { "Authorization": "Bearer YOUR_KEY_HERE" }
}
}
}
That's it. Reload your MCP client and three tools appear: new_conversation, process_turn, configure_session.
Alpha access: Horizon's hosted endpoint is in private alpha. Keys are distributed to agent developers who want to monitor real projects. Open a GitHub Discussion to request one.
pip install horizon-monitor
Verify your install (exercises the full pipeline on 5 canonical scenarios, ~25s):
horizon-validate
pip install 'horizon-monitor[mcp]'
horizon serve # stdio — for Cursor, Claude Desktop
horizon serve --transport sse --port 3847 # SSE — for web/team deployments
Add to ~/.cursor/mcp.json:
{
"mcpServers": {
"horizon": { "command": "horizon", "args": ["serve"] }
}
}
Full Cursor and Claude Desktop setup guides: docs/integrations/
Standard observability tools evaluate individual response quality. Horizon evaluates conversation quality — a structurally different problem:
| Tool | What it sees | What it misses |
|---|---|---|
| LangSmith, Braintrust | Latency, cost, per-response quality | Drift across turns |
| RAGAS, DeepEval | Faithfulness, relevance per turn | Temporal desync, cognitive load |
| Human raters | Subjective quality | Systematic structural decay |
| Horizon | Conversation dynamics | Intentionally nothing |
Horizon does not replace per-response quality tools. It adds the dimension they all lack.
from horizon import FidelityMonitor
from datetime import datetime, timezone
monitor = FidelityMonitor()
session_id = monitor.new_conversation(metadata={"domain": "technical"})
result = monitor.process_turn(
    session_id,
    human_message="How does Python handle memory management?",
    agent_response="Python uses reference counting and a cyclic garbage collector...",
    timestamp=datetime.now(timezone.utc).isoformat(),
)
print(f"Fidelity: {result.fidelity_score:.2f}")
print(f"Health: {result.health_status}")
print(f"Circadian factor: {result.circadian_factor:.2f}")
print(f"Causal horizon: {result.reachable_turns} reachable turns")
for event in result.events:
    print(f"  Event: {event.type} (confidence={event.confidence:.2f})")
from openai import OpenAI
from horizon import FidelityMonitor
monitor = FidelityMonitor()
session_id = monitor.new_conversation()
client = monitor.wrap(OpenAI(), session_id)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Tell me about quantum computing."}]
)
traj = monitor.get_trajectory(session_id)
print(f"Fidelity: {traj.current_fidelity:.2f} T*: {traj.estimated_t_star}")
monitor.wrap() accepts custom timestamp and context providers for testing and replay.
from anthropic import Anthropic
from horizon import FidelityMonitor
monitor = FidelityMonitor()
session_id = monitor.new_conversation()
client = monitor.wrap(Anthropic(), session_id)
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
messages=[{"role": "user", "content": "Explain RLHF."}]
)
from langchain_openai import ChatOpenAI
from horizon import FidelityMonitor
from horizon.integrations.langchain import HorizonCallback
monitor = FidelityMonitor()
session_id = monitor.new_conversation()
callback = HorizonCallback(monitor, session_id)
llm = ChatOpenAI(callbacks=[callback])
llm.invoke("Explain the CAP theorem.")
print(f"Fidelity: {callback.last_result.fidelity_score:.2f}")
from openai_agents import Agent, Runner
from datetime import datetime, timezone
from horizon import FidelityMonitor

monitor = FidelityMonitor()
session_id = monitor.new_conversation()
agent = Agent(name="assistant", model="gpt-4o-mini", instructions="You are helpful.")

conversation = ["What is RLHF?", "How does it differ from fine-tuning?"]  # your turn script
for user_message in conversation:
    result = Runner.run_sync(agent, user_message)
    monitor.process_turn(
        session_id,
        human_message=user_message,
        agent_response=result.final_output,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
Every process_turn() returns a TurnResult with 29 fields across five signal families:
| Signal | Description |
|---|---|
| fidelity_score | Composite conversation health [0, 1] |
| igt_value | Information Gain per Turn — semantic novelty |
| divergence_score | Jensen-Shannon proxy for intent/response gap |
| twr_value | Token Waste Ratio — semantic redundancy |
| consistency_score | Bipredictability — structural coherence |
| epsilon_t | Estimated ontological gap width [0, 1] |
| health_status | healthy / degrading / critical / converged |
| conversation_mode | execute / explore / refine / learn (auto-detected) |
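The divergence_score row above describes a Jensen-Shannon proxy. For reference, the underlying Jensen-Shannon divergence between two discrete distributions can be sketched as follows (base-2 logs, so values land in [0, 1]; Horizon's actual proxy over embedding-derived distributions is not shown in this README):

```python
import math

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two discrete distributions.
    Base-2 logarithm, so the result is bounded in [0, 1]."""
    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i == 0 contribute nothing
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]  # mixture distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical distributions diverge by 0, fully disjoint ones by 1, which is why a bounded divergence makes a convenient gate for the clarification threshold.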
Temporal signals (requires timestamp):
| Signal | Description |
|---|---|
| gap_seconds | Wall-clock gap since last turn |
| estimated_retention | Human memory retention (Ebbinghaus half-life model) |
| circadian_factor | Human cognitive capacity at this hour [0.3, 1.0] |
| temporal_asymmetry | Penalty for temporal desync |
| resumption_cost | none / low / medium / high / extreme |
| temporal_references | Resolved deictic expressions ("yesterday", "last week") |
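The exact decay and circadian curves Horizon uses are not documented here; a self-contained sketch with illustrative constants (the half-life, peak hour, and bucket thresholds are all assumptions) could look like:

```python
import math

def estimated_retention(gap_seconds: float, half_life_hours: float = 24.0) -> float:
    """Ebbinghaus-style decay: retention halves every half_life_hours of silence."""
    return 0.5 ** (gap_seconds / 3600.0 / half_life_hours)

def circadian_factor(hour: int) -> float:
    """Cosine model of cognitive capacity, clamped to [0.3, 1.0],
    peaking mid-afternoon (peak hour and amplitude are assumptions)."""
    raw = 0.65 + 0.35 * math.cos((hour - 14) * math.pi / 12)
    return max(0.3, min(1.0, raw))

def resumption_cost(retention: float) -> str:
    """Bucket retention into the coarse labels listed above (thresholds illustrative)."""
    for threshold, label in [(0.8, "none"), (0.5, "low"), (0.25, "medium"), (0.1, "high")]:
        if retention > threshold:
            return label
    return "extreme"
```

With a 24-hour half-life, a lunch break barely dents retention while a week-long gap pushes resumption_cost toward "extreme", which is the regime where signal.session_reset fires.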
Pace signals (requires timestamp + turn ≥ 2):
| Signal | Description |
|---|---|
| conversation_velocity | Semantic displacement / proper time |
| conversation_acceleration | Velocity delta (requires turn ≥ 3) |
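As a rough sketch of those two definitions (the displacement metric and the zero-gap guard are assumptions, not Horizon's implementation):

```python
def conversation_velocity(semantic_displacement: float, dt_seconds: float) -> float:
    """Semantic displacement per second of elapsed ('proper') time."""
    return semantic_displacement / max(dt_seconds, 1e-6)  # guard zero-length gaps

def conversation_acceleration(v_prev: float, v_curr: float, dt_seconds: float) -> float:
    """Velocity delta per second; two velocities need three turns,
    which is why the table requires turn >= 3."""
    return (v_curr - v_prev) / max(dt_seconds, 1e-6)
```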
Spacetime signals (requires timestamp + turn ≥ 2):
| Signal | Description |
|---|---|
| spacetime_interval | ds² with Minkowski-like signature (−,+,+,+) |
| interval_class | timelike / spacelike / lightlike |
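A minimal sketch of the (−,+,+,+) convention with illustrative coefficients: elapsed time enters negatively and semantic displacement positively, so a negative ds² means time dominates (timelike) and a positive one means the conversation jumped further semantically than time accounts for (spacelike):

```python
def spacetime_interval(dt: float, dx: tuple, alpha: float = 1.0, beta: float = 1.0) -> float:
    """ds^2 with Minkowski-like signature (-,+,+,+); coefficients are illustrative."""
    return -alpha * dt ** 2 + beta * sum(x ** 2 for x in dx)

def interval_class(ds2: float, eps: float = 1e-9) -> str:
    """Classify the interval: timelike (< 0), spacelike (> 0), lightlike (within eps)."""
    if ds2 < -eps:
        return "timelike"
    if ds2 > eps:
        return "spacelike"
    return "lightlike"
```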
Causal signals (requires timestamp):
| Signal | Description |
|---|---|
| reachable_turns | Turns still inside the causal light cone |
| reachable_fraction | Fraction of history still causally reachable |
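One plausible reading, assuming a past turn counts as reachable when its interval to the current turn is timelike or lightlike (ds² ≤ 0):

```python
def reachable_fraction(intervals_to_now: list) -> float:
    """Fraction of past turns whose interval to the current turn satisfies
    ds^2 <= 0, i.e. turns still inside the light cone."""
    if not intervals_to_now:
        return 1.0  # first turn: nothing has dropped out yet
    reachable = sum(1 for ds2 in intervals_to_now if ds2 <= 0)
    return reachable / len(intervals_to_now)
```

Under this reading, signal.light_cone_collapse would fire when this fraction sinks below the configured light-cone threshold.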
Spatial signals (requires client_context):
| Signal | Description |
|---|---|
| location_class | home / office / mobile_transit / unknown |
| spatial_constraint | Attention budget, screen capacity, max response length |
| spatial_frame_shift | Context switch magnitude |
All events default to observe mode (emitted, not acted on). Enable active mode via configure() once an event achieves ≥ 0.7 precision/recall on your domain.
| Event | Fires when |
|---|---|
| checkpoint.clarification | D_JS above clarification threshold |
| checkpoint.comprehension | Consistency drops below threshold |
| alert.drift | Fidelity declining for drift_window consecutive turns |
| alert.contradiction | Bipredictability below consistency threshold |
| alert.verbosity | Token Waste Ratio above verbosity threshold |
| signal.convergence | IGT trend consistently low — natural endpoint approaching |
| signal.optimal_length | T* (estimated optimal length) reached |
| signal.horizon_widening | IGT trend strongly positive — conversation expanding |
| signal.session_reset | Large temporal gap with low retention |
| signal.temporal_desync | Gap + retention drop below desync threshold |
| signal.broken_reference | Reachable fraction drops below broken-reference threshold |
| signal.frame_shift | Spatial constraint shifts significantly |
| signal.pace_shift | Conversation acceleration above pace threshold |
| signal.light_cone_collapse | Reachable fraction below light-cone threshold |
# Per-session override
monitor.configure(
    session_id=session_id,
    clarification_threshold=0.25,          # tighter D_JS gate
    event_modes={"alert.drift": "active"}, # activate one event
)

# Compound weight override
monitor.configure(
    fidelity_weights={"alpha": 0.35, "lambda_r": 0.12, "lambda_i": 0.28, "beta": 0.25},
    temporal_weights={"gamma": 0.08, "delta": 0.04},
    spacetime_coefficients={"alpha": 1.0, "beta": 1.0, "gamma": 0.8, "delta_st": 0.5},
)
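The composite fidelity formula itself is not shown in this README; purely as an illustration of how those fidelity weights could interact, a hypothetical combination might reward coherence and information gain while penalising divergence and token waste:

```python
def fidelity_score(igt: float, divergence: float, twr: float, consistency: float,
                   alpha: float = 0.35, lambda_r: float = 0.12,
                   lambda_i: float = 0.28, beta: float = 0.25) -> float:
    """Hypothetical composite (NOT Horizon's actual formula): reward structural
    coherence and per-turn information gain, penalise intent/response divergence
    and token waste, then clamp to [0, 1]."""
    raw = (alpha * consistency + lambda_i * igt
           + beta * (1.0 - divergence) - lambda_r * twr)
    return max(0.0, min(1.0, raw))
```

Whatever the real formula, raising lambda_r makes verbosity drag the score down faster, which is the kind of trade-off the weight override above exposes.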
# JSON
result = monitor.export_to(session_id, target="json")

# LangSmith / Langfuse / OpenTelemetry / Arize
result = monitor.export_to(
    session_id,
    target="langsmith",
    connection={"api_key": "ls__..."},
)

pip install 'horizon-monitor[langsmith]'  # or langfuse, otel, arize
Input: plain strings (human_message, agent_response, optional timestamp, optional client_context)
Core pipeline (< 50ms on CPU):
1. Embed both turns (local sentence-transformers, lazy-loaded)
2–6. IGT · D_JS · TWR · Bipredictability · Epsilon
7. Temporal signals — gap, retention, circadian, deictic
8. Fidelity dynamics — composite score
9. Health classification
10. Pace signals — velocity, acceleration
11. Spacetime interval — ds² and interval class
12. Causal reachability — light-cone membership
13. Spatial signals — device, location, frame shift
14. Mode detection — auto-classify conversation type
15. Event evaluation — 14 threshold checks
16. Optional: SQLite persistence
Output: TurnResult dataclass (29 fields)
Design constraints (all test-enforced):
All four IVD validation gates pass on a 5,602-record labelled corpus:
| Gate | Constraint | v0.2.0 |
|---|---|---|
| V1 — proxy correlation | per-conv ρ ≥ 0.6, per-turn ρ ≥ 0.5 | 0.685 / 0.659 |
| V2 — per-event P/R | every event P ≥ 0.7 AND R ≥ 0.7 | all 16 events ≥ 0.70 / 0.70 |
| V3 — beats heuristics | rho lift > 25%, structural P ≥ 0.6 | +202.4% lift, P=R=1.00 |
| V5 — cross-domain | per-turn ρ ≥ 0.4 AND per-conv ρ ≥ 0.48 | min 0.517 / 0.718 |
Cross-embedding stability: ρ_conv spread 0.026, ρ_turn spread 0.018 across three sentence-transformer backends (22M / 33M / 110M params). The fidelity signal lives in conversational structure, not in the embedding manifold.
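The gate thresholds above are stated as ρ (rank correlation); assuming Spearman's ρ, a minimal ties-free implementation for reproducing such checks against your own labelled data might be:

```python
import math

def spearman_rho(xs, ys):
    """Spearman rank correlation computed as Pearson correlation on ranks.
    No tie handling, so inputs are assumed to have distinct values."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Per-conversation ρ would compare one fidelity score per conversation against one label per conversation; per-turn ρ compares them turn by turn.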
Full evidence pack: docs/reviews/V0_2_0_EVIDENCE.md
cd deploy/docker
docker compose up
Horizon serves the MCP API via SSE. Point .cursor/mcp.json to http://localhost:3847/sse. The Dockerfile pre-caches the all-MiniLM-L6-v2 weights at build time — zero cold start.
The official hosted endpoint is live at https://horizon.leocelis.com. It runs on DigitalOcean App Platform (single instance, Redis-backed session resumability) and requires a Bearer token. See Path 1 above.
git clone https://github.com/leocelis/horizon.git
cd horizon
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/ -v # full suite
pytest tests/unit tests/integration tests/e2e -v # fast path (~6 min)
ruff check src/ tests/
black --check src/ tests/
horizon/
├── src/horizon/ # package source (PEP 517/518 src/ layout)
│ ├── engines/ # IGT, D_JS, TWR, coherence, fidelity, epsilon, mode
│ ├── spacetime/ # temporal, circadian, deictic, velocity, interval, light cone, spatial
│ ├── events/ # 14-event evaluator
│ ├── integrations/ # OpenAI, Anthropic, LangChain, export targets
│ ├── mcp/ # MCP server + CLI
│ └── storage/ # optional SQLite persistence
├── tests/ # unit / integration / e2e / perf / validation
├── examples/ # runnable framework demos
├── deploy/ # Procfile, build.sh, runtime.txt, docker/
├── docs/
│ ├── research/ # market-demand.md + THCP theoretical framework
│ ├── content/ # published pieces on conversation dynamics monitoring
│ ├── integrations/ # Cursor / Claude Desktop / Copilot setup guides
│ ├── spec/ # HORIZON_TECH_SPEC.md + intent.yaml
│ └── reviews/ # E2E reviews, validation evidence
└── pyproject.toml
Horizon is grounded in the Trans-Horizon Communication Protocol (THCP), a theoretical framework for human–AI communication. Five conjectures establish:
The monitor instruments all five conjectures as computable signals.
MIT — see LICENSE.