Server data from the Official MCP Registry
Unified MCP server for Kubernetes, ArgoCD, Prometheus, PagerDuty, and Loki
This DevOps MCP server provides useful infrastructure management capabilities but has several security concerns that warrant user awareness. The code exhibits moderate input validation gaps, missing credential scope restrictions, and potential for credential exposure through logging. While authentication mechanisms exist for some providers, several operations lack proper authorization controls, and the dry-run safety model relies on AI agent compliance rather than enforcement. The permissions align reasonably with the server's purpose, but the combination of broad infrastructure access (Kubernetes, ArgoCD, PagerDuty) with inadequate input validation and logging controls elevates the risk profile. Supply chain analysis found 5 known vulnerabilities in dependencies (0 critical, 2 high severity). Package verification found 1 issue.
4 files analyzed · 16 issues found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
Set these up before or after installing:
Environment variable: KUBECONFIG
Environment variable: K8S_CONTEXT
Environment variable: ARGOCD_SERVER
Environment variable: ARGOCD_TOKEN
Environment variable: PROMETHEUS_URL
Environment variable: PROMETHEUS_BEARER_TOKEN
Environment variable: PAGERDUTY_TOKEN
Environment variable: LOKI_URL
Environment variable: LOKI_TOKEN
Add this to your MCP configuration file:
{
"mcpServers": {
"io-github-notharshhaa-devops-mcp": {
"env": {
"LOKI_URL": "your-loki-url-here",
"KUBECONFIG": "your-kubeconfig-here",
"LOKI_TOKEN": "your-loki-token-here",
"K8S_CONTEXT": "your-k8s-context-here",
"ARGOCD_TOKEN": "your-argocd-token-here",
"ARGOCD_SERVER": "your-argocd-server-here",
"PROMETHEUS_URL": "your-prometheus-url-here",
"PAGERDUTY_TOKEN": "your-pagerduty-token-here",
"PROMETHEUS_BEARER_TOKEN": "your-prometheus-bearer-token-here"
},
"args": [
"-y",
"@notharshhaa/devops-mcp"
],
"command": "npx"
}
}
}
From the project's GitHub README.
Unified MCP server for DevOps engineers — query and manage Kubernetes, ArgoCD, Prometheus, and PagerDuty from any MCP-compatible AI agent.
devops-mcp is an open source Model Context Protocol server that gives AI agents (Claude, etc.) real-time read and write access to your infrastructure stack — all from a single install.
Instead of copy-pasting kubectl output into a chat window, you can ask:
- "Why is the payments deployment in CrashLoopBackOff?"
- "What changed in the last ArgoCD sync for the auth app?"
- "Show me the p99 latency for the API gateway over the last hour."
- "Who's on call right now and what incidents are open?"
- "Debug the payments service - what's wrong with it?"
...and get live answers, sourced directly from your cluster and tooling.
Providers included:
| Prefix | Provider | Transport |
|---|---|---|
| k8s__* | Kubernetes (via kubeconfig or in-cluster SA) | client-go |
| argo__* | ArgoCD | REST API |
| prom__* | Prometheus | HTTP API (PromQL) |
| pd__* | PagerDuty | REST API v2 |
| helm__* | Helm | CLI (helm binary) |
| devops__* | Cross-provider incident debugging | Aggregates all providers |
| logs__* | Loki | HTTP API (LogQL) |
Add this to ~/.config/claude/claude_desktop_config.json (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"devops": {
"command": "npx",
"args": ["-y", "@notharshhaa/devops-mcp@latest"],
"env": {
"KUBECONFIG": "/home/you/.kube/config",
"ARGOCD_SERVER": "https://argocd.company.com",
"ARGOCD_TOKEN": "your-argocd-token",
"PROMETHEUS_URL": "http://prometheus.monitoring:9090",
"PAGERDUTY_TOKEN": "your-pd-api-token",
"LOKI_URL": "http://loki.monitoring:3100",
"LOKI_TOKEN": "your-loki-token"
}
}
}
}
Restart Claude Desktop. The devops server will appear in the tools list.
claude mcp add devops-mcp -e KUBECONFIG=$HOME/.kube/config \
-e ARGOCD_SERVER=https://argocd.company.com \
-e ARGOCD_TOKEN=... \
-e PROMETHEUS_URL=http://prometheus:9090 \
-e PAGERDUTY_TOKEN=... \
-e LOKI_URL=http://loki.monitoring:3100 \
-e LOKI_TOKEN=... \
-- npx -y @notharshhaa/devops-mcp@latest
npx @notharshhaa/devops-mcp
# or clone and run:
git clone https://github.com/NotHarshhaa/devops-mcp
cd devops-mcp
npm install
cp .env.example .env # fill in your values
npm run dev
All config is via environment variables. Only set the ones for providers you actually use — providers with missing config are silently skipped.
# ── Kubernetes ────────────────────────────────────────────────
KUBECONFIG=/home/user/.kube/config # omit to use in-cluster service account
K8S_CONTEXT=my-prod-context # optional: pin a specific context
K8S_ALLOWED_NAMESPACES=default,backend # optional: restrict namespace access
# ── ArgoCD ───────────────────────────────────────────────────
ARGOCD_SERVER=https://argocd.company.com
ARGOCD_TOKEN=eyJhbGci... # argocd account generate-token
# ── Prometheus ───────────────────────────────────────────────
PROMETHEUS_URL=http://prometheus:9090
PROMETHEUS_BEARER_TOKEN= # optional: for authenticated Prometheus
# ── PagerDuty ────────────────────────────────────────────────
PAGERDUTY_TOKEN=your-api-v2-token
# ── Loki ───────────────────────────────────────────────────
LOKI_URL=http://loki.monitoring:3100
LOKI_TOKEN=your-loki-token
# ── Transport ────────────────────────────────────────────────
# For stdio mode (default): no transport config needed
# For SSE mode: set these env vars
PORT=3000 # SSE mode only
MCP_AUTH_TOKEN=shared-secret # Bearer token for SSE authentication
# ── Safety ───────────────────────────────────────────────────
DEVOPS_MCP_DRY_RUN=false # true = block all mutations globally
DEVOPS_MCP_AUDIT_LOG=/var/log/devops-mcp-audit.jsonl
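The "providers with missing config are silently skipped" behaviour can be pictured with a small sketch. This is not the project's code: the `REQUIRED_VARS` map and the `enabledProviders` helper are hypothetical, and which variables each provider treats as mandatory is an assumption drawn from the list above. Kubernetes is left out because `KUBECONFIG` may be omitted in favour of the in-cluster service account.

```typescript
// Hypothetical mapping from tool prefix to the variables that provider needs.
// The grouping is an assumption, not taken from the devops-mcp source.
type Env = Record<string, string | undefined>;

const REQUIRED_VARS: Record<string, string[]> = {
  argo: ["ARGOCD_SERVER", "ARGOCD_TOKEN"],
  prom: ["PROMETHEUS_URL"],
  pd: ["PAGERDUTY_TOKEN"],
  logs: ["LOKI_URL"],
};

// Returns the prefixes whose required variables are all set and non-empty.
function enabledProviders(env: Env): string[] {
  return Object.entries(REQUIRED_VARS)
    .filter(([, vars]) => vars.every((v) => (env[v] ?? "").trim() !== ""))
    .map(([prefix]) => prefix);
}

// ArgoCD is skipped here because ARGOCD_TOKEN is missing.
console.log(
  enabledProviders({
    PROMETHEUS_URL: "http://prometheus:9090",
    ARGOCD_SERVER: "https://argocd.company.com",
  }),
);
```

A silent skip keeps single-provider setups simple, but it also means a typo in a variable name shows up only as missing tools, so it is worth checking the tools list after the first start.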
All tools follow a three-tier safety model:
- read: no safety flag; the call runs as issued
- mutate: defaults to dry_run: true; set dry_run: false to execute
- destructive: requires confirm: true as an explicit parameter

Kubernetes (k8s__*)

| Tool | Tier | Description |
|---|---|---|
| k8s__list_pods | read | List pods with status, restarts, node, age |
| k8s__get_pod_logs | read | Tail or stream logs from a pod container |
| k8s__describe_resource | read | Full describe for any resource type |
| k8s__get_events | read | Cluster or namespace events, filterable by reason |
| k8s__list_deployments | read | Deployments with replica counts and rollout health |
| k8s__get_resource_usage | read | CPU/mem usage per pod via metrics-server |
| k8s__get_node_status | read | Node health, conditions, capacity, allocatable resources, taints |
| k8s__get_network_policies | read | Network policies with pod selectors and ingress/egress rules |
| k8s__get_ingresses | read | Ingress resources with hosts, paths, backends, TLS config |
| k8s__list_cronjobs | read | CronJobs with schedule, last run, active jobs, suspend status |
| k8s__get_cronjob_status | read | Detailed CronJob status with recent job history |
| k8s__diff_resource | read | Compare current resource state vs last-applied-configuration |
| k8s__get_hpa | read | HorizontalPodAutoscaler with current/target metrics and scaling status |
| k8s__list_pvcs | read | PersistentVolumeClaims with status, capacity, storage class |
| k8s__list_services | read | Services with type, ports, selectors, clusterIP, endpoints |
| k8s__list_contexts | read | All kubeconfig contexts and the active one |
| k8s__switch_context | mutate | Switch active context (session-scoped) |
| k8s__scale_deployment | mutate | Scale replicas with dry-run diff preview |
| k8s__apply_manifest | mutate | Apply a manifest string with server-side dry-run |
| k8s__rollout_restart | mutate | Trigger rolling restart of a deployment or statefulset |
| k8s__delete_resource | destructive | Delete a named resource; requires confirm: true |
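The Tier column in the table above can be read as a gate that runs before every tool call. The sketch below is one plausible way to express that gate, not the project's actual guard: the `decide` function, the `Decision` values, and the choice to turn a global `DEVOPS_MCP_DRY_RUN=true` into a preview rather than a hard rejection are all assumptions.

```typescript
type Tier = "read" | "mutate" | "destructive";

// Safety parameters a caller may pass alongside the tool's own arguments.
interface SafetyArgs {
  dry_run?: boolean;
  confirm?: boolean;
}

type Decision = "execute" | "preview" | "reject";

// Decide what to do with a call, given its tier, its safety arguments,
// and the global dry-run switch (DEVOPS_MCP_DRY_RUN).
function decide(tier: Tier, args: SafetyArgs, globalDryRun: boolean): Decision {
  if (tier === "read") return "execute"; // reads are never gated
  if (globalDryRun) return "preview"; // assumption: global switch downgrades to a preview
  if (tier === "destructive") {
    return args.confirm === true ? "execute" : "reject";
  }
  // mutate: dry-run unless the caller explicitly opts out
  return args.dry_run === false ? "execute" : "preview";
}

console.log(decide("mutate", {}, false)); // "preview"
console.log(decide("destructive", { confirm: true }, false)); // "execute"
```

Enforcing the gate on the server, instead of relying on the agent to pass the right flags, is the design choice that makes the tiers meaningful when the caller is an AI model.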
ArgoCD (argo__*)

| Tool | Tier | Description |
|---|---|---|
| argo__list_apps | read | All apps with health, sync status, source repo |
| argo__get_app | read | Full spec and status for one application |
| argo__get_app_diff | read | Live diff between git and cluster state |
| argo__get_app_history | read | Deployment history with git SHAs and timestamps |
| argo__get_resource_tree | read | Full owned resource tree for an app |
| argo__sync_app | mutate | Trigger sync; supports dry-run, prune, force |
| argo__rollback_app | mutate | Roll back to a specific history revision |
| argo__terminate_op | mutate | Cancel an in-progress sync operation |
Prometheus (prom__*)

| Tool | Tier | Description |
|---|---|---|
| prom__query | read | Instant PromQL query with label + value output |
| prom__query_range | read | Range query with step, returns time-series data |
| prom__list_alerts | read | All alert rules with state (firing / pending / inactive) |
| prom__get_firing_alerts | read | Only currently firing alerts with duration |
| prom__list_targets | read | All scrape targets with health and last scrape |
| prom__label_values | read | Enumerate values for a given label name |
| prom__metric_metadata | read | Type, help text, and unit for a metric |
| prom__compare_periods | read | 📈 Compare metrics between two time windows to detect before/after deployment changes |
| prom__slo_status | read | 🎯 SLO compliance: error budget remaining, burn rate, time to exhaustion |
| prom__summarize_service_health | read | 📊 Smart summary - human-readable service health metrics including latency changes, error rate vs SLO, and traffic patterns |
Example usage:
# Get a human-readable health summary
prom__summarize_service_health(service="payments", timeframeMinutes=30, sloThreshold=0.05)
What it outputs:
Why this matters: Instead of raw PromQL numbers that require interpretation, this tool provides actionable insights that AI agents can use directly in responses, making monitoring data actually useful for incident investigation and communication.
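The quantities that prom__slo_status reports (error budget remaining, burn rate, time to exhaustion) follow from simple arithmetic. The sketch below uses the common SRE definitions and assumes the observed error ratio stays constant over the window; whether devops-mcp computes them exactly this way is not stated in the README, and `sloStatus` and its field names are illustrative.

```typescript
interface SloInput {
  sloTarget: number; // e.g. 0.99 for a 99% availability target
  windowHours: number; // length of the SLO window
  elapsedHours: number; // how far into the window we are
  observedErrorRatio: number; // failed / total requests so far
}

interface SloStatus {
  burnRate: number; // 1.0 means the budget is spent exactly at window end
  budgetRemaining: number; // fraction of the error budget left, 0..1
  hoursToExhaustion: number; // Infinity when nothing is being burned
}

function sloStatus(input: SloInput): SloStatus {
  if (input.sloTarget <= 0 || input.sloTarget >= 1) {
    throw new Error("sloTarget must be strictly between 0 and 1");
  }
  const budget = 1 - input.sloTarget; // allowed error ratio
  const burnRate = input.observedErrorRatio / budget;
  // Fraction of the budget consumed so far at this burn rate.
  const consumed = burnRate * (input.elapsedHours / input.windowHours);
  const budgetRemaining = Math.max(0, 1 - consumed);
  const hoursToExhaustion =
    burnRate > 0 ? (budgetRemaining * input.windowHours) / burnRate : Infinity;
  return { burnRate, budgetRemaining, hoursToExhaustion };
}

// 99% target, 30-day window, 3 days in, 2% errors:
// burn rate is about 2, about 80% of the budget is left, about 288 hours remain.
console.log(sloStatus({ sloTarget: 0.99, windowHours: 720, elapsedHours: 72, observedErrorRatio: 0.02 }));
```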
Loki (logs__*)

| Tool | Tier | Description |
|---|---|---|
| logs__get_recent_errors | read | Get recent error logs from Loki for debugging incidents |
| logs__search | read | Search logs in Loki with custom query for root cause analysis |
Example usage:
# Get recent error logs
logs__get_recent_errors(service="payments", namespace="default", minutes=30, limit=50)
# Search logs with custom query
logs__search(query='{service="payments"} |= "error"', limit=100)
Why this matters:
Output format:
This makes incident investigation complete by combining the "what" (metrics) with the "why" (logs).
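For orientation, a log search like the one above plausibly ends up as a request to Loki's `/loki/api/v1/query_range` endpoint, which takes a LogQL `query`, a `limit`, and `start`/`end` timestamps in nanoseconds since the Unix epoch. The sketch below only builds that URL; `buildLokiQueryRangeUrl` is a hypothetical helper, and how devops-mcp actually maps its parameters onto the request is an assumption.

```typescript
// Build a Loki query_range URL for the last `minutes` minutes.
// `nowMs` must be an integer number of milliseconds (e.g. Date.now()).
function buildLokiQueryRangeUrl(
  baseUrl: string,
  logql: string,
  minutes: number,
  limit: number,
  nowMs: number,
): string {
  const startMs = nowMs - minutes * 60000;
  // Nanosecond timestamps exceed Number.MAX_SAFE_INTEGER, so they are
  // built as strings by appending six zeros to the millisecond value.
  const params: Array<[string, string]> = [
    ["query", logql],
    ["limit", String(limit)],
    ["start", `${startMs}000000`],
    ["end", `${nowMs}000000`],
    ["direction", "backward"], // newest entries first
  ];
  const qs = params.map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join("&");
  return `${baseUrl.replace(/\/+$/, "")}/loki/api/v1/query_range?${qs}`;
}

console.log(
  buildLokiQueryRangeUrl("http://loki.monitoring:3100", '{service="payments"} |= "error"', 30, 100, Date.now()),
);
```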
PagerDuty (pd__*)

| Tool | Tier | Description |
|---|---|---|
| pd__list_incidents | read | Open incidents with severity, status, assignee |
| pd__get_incident | read | Full detail with alerts, notes, timeline |
| pd__who_is_oncall | read | Current on-call per schedule or escalation policy |
| pd__list_services | read | All services with integration keys and status |
| pd__get_log_entries | read | Audit log for an incident (all state changes) |
| pd__acknowledge_incident | mutate | Acknowledge; suppresses further notifications |
| pd__add_note | mutate | Append a note to an incident timeline |
| pd__escalate_incident | destructive | Escalate to a different policy; requires confirm: true |
| pd__summarize_incident | read | 🚨 Incident auto-summary - what happened, affected services, probable root cause, current status |

pd__summarize_incident

Example usage:
# Get an auto-summary of an incident
pd__summarize_incident(id="ABC123")
What it outputs:
Output format:
{
"what_happened": {
"title": "API Gateway High Error Rate",
"description": "5xx error rate exceeded 5% threshold",
"severity": "high",
"urgency": "high",
"status": "acknowledged",
"createdAt": "2025-01-15T10:30:00Z",
"updatedAt": "2025-01-15T11:45:00Z",
"duration": "1h 15m"
},
"affected_services": [
{
"id": "P123456",
"name": "API Gateway",
"status": "critical"
}
],
"probable_root_cause": "Triggered by: High 5xx error rate from API Gateway pods",
"current_status": {
"status": "acknowledged",
"lastUpdated": "2025-01-15T11:45:00Z",
"assignees": ["john.doe@company.com"],
"acknowledgements": 2,
"notes": 3
}
}
Why this matters: Instead of manually piecing together incident details from multiple API calls, this tool provides a comprehensive, human-readable summary from a single tool call.
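In the sample output above, the `duration` value of "1h 15m" is consistent with `updatedAt` minus `createdAt`. A minimal sketch of that derivation follows; `formatDuration` is a hypothetical helper, and the README does not say which timestamps the tool actually subtracts.

```typescript
// Format the gap between two ISO 8601 timestamps as "Xh Ym" (or "Ym").
function formatDuration(startIso: string, endIso: string): string {
  const ms = Date.parse(endIso) - Date.parse(startIso);
  if (Number.isNaN(ms) || ms < 0) {
    throw new Error("invalid or reversed timestamps");
  }
  const totalMinutes = Math.floor(ms / 60000);
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  return hours > 0 ? `${hours}h ${minutes}m` : `${minutes}m`;
}

console.log(formatDuration("2025-01-15T10:30:00Z", "2025-01-15T11:45:00Z")); // "1h 15m"
```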
Helm (helm__*)

| Tool | Tier | Description |
|---|---|---|
| helm__list_releases | read | List Helm releases with status, chart, app version |
| helm__get_status | read | Full status of a Helm release |
| helm__get_values | read | User-supplied or computed values for a release |
| helm__get_history | read | Revision history of a release |
| helm__rollback | mutate | Rollback to a previous revision (dry-run by default) |
Requirements: Helm CLI binary must be available in PATH.
Example usage:
# List all releases in a namespace
helm__list_releases(namespace="production")
# Check what values a release is using
helm__get_values(name="api-gateway", all_values=true)
# Rollback after a bad deploy
helm__rollback(name="api-gateway", revision=5, dry_run=false)
Cross-provider (devops__*)

| Tool | Tier | Description |
|---|---|---|
| devops__debug_service | read | 🔥 Cross-provider incident debugging - aggregates Kubernetes, ArgoCD, Prometheus, and PagerDuty data to diagnose service issues in one command |
| devops__explain_change | read | 🧠 Explain what changed - combines ArgoCD history, Kubernetes rollout history, and Prometheus anomaly window to identify cause of issues |
| devops__runbook | read | 📋 Automated runbook - symptom-based diagnostic that runs targeted checks (crashloop, high-latency, oom, 5xx, pod-pending) |
| devops__health_report | read | 🏥 Cluster health report - one-shot assessment across all providers with overall status (healthy/degraded/critical) |
| devops__incident_timeline | read | 🕐 Incident timeline - unified event timeline across K8s, ArgoCD, Prometheus, and PagerDuty sorted chronologically |

devops__debug_service

Example usage:
# Debug a service across all providers
devops__debug_service(service="payments", namespace="default")
What it checks:
Output format:
This is the most powerful tool for incident investigation - it gives you a complete picture of what's wrong with a service in seconds.
devops__explain_change

Example usage:
# Explain what changed in the last hour
devops__explain_change(service="payments", namespace="default", timeframeMinutes=60)
What it analyzes:
Output format:
Problem it solves: "Everything was working yesterday… what changed?"
This tool answers that question by correlating deployment events with metric anomalies, helping you quickly identify whether a recent deployment, config change, or external factor caused the issue.
devops__runbook

Example usage:
# Diagnose a crashlooping service
devops__runbook(symptom="crashloop", service="payments", namespace="default")
# Investigate high latency
devops__runbook(symptom="high-latency", service="api-gateway")
Supported symptoms:
| Symptom | What it checks |
|---|---|
| crashloop | Pod status → logs (tail 50) → BackOff events → deployment health |
| high-latency | p95 latency → resource usage → firing alerts → recent deploys |
| oom | OOMKilled events → memory usage → pod describe → resource limits |
| 5xx | Error rate → Loki error logs → deployment health |
| pod-pending | Scheduling events → pending pods → node capacity |
Output: Structured JSON with steps_executed[], findings[], and recommended_actions[].
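The symptom table and the output shape can be tied together in a short sketch. The step lists are copied from the table above and the result keys from the Output line; everything else (the `Check` callback, `runRunbook`, and the wording of the recommended actions) is hypothetical and only illustrates the flow, not the project's implementation.

```typescript
type Symptom = "crashloop" | "high-latency" | "oom" | "5xx" | "pod-pending";

// Ordered checks per symptom, taken from the table above.
const RUNBOOK_STEPS: Record<Symptom, string[]> = {
  crashloop: ["pod status", "logs (tail 50)", "BackOff events", "deployment health"],
  "high-latency": ["p95 latency", "resource usage", "firing alerts", "recent deploys"],
  oom: ["OOMKilled events", "memory usage", "pod describe", "resource limits"],
  "5xx": ["error rate", "Loki error logs", "deployment health"],
  "pod-pending": ["scheduling events", "pending pods", "node capacity"],
};

interface RunbookResult {
  steps_executed: string[];
  findings: string[];
  recommended_actions: string[];
}

// Hypothetical check callback: returns a finding, or null when the step is clean.
type Check = (step: string) => string | null;

function runRunbook(symptom: Symptom, check: Check): RunbookResult {
  const result: RunbookResult = { steps_executed: [], findings: [], recommended_actions: [] };
  for (const step of RUNBOOK_STEPS[symptom]) {
    result.steps_executed.push(step);
    const finding = check(step);
    if (finding !== null) {
      result.findings.push(finding);
      result.recommended_actions.push(`Investigate: ${step}`);
    }
  }
  return result;
}

// Stubbed check that reports a problem only for the error-rate step.
console.log(runRunbook("5xx", (step) => (step === "error rate" ? "error rate is 7%" : null)));
```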
devops__health_report

Example usage:
# Get a full cluster health assessment
devops__health_report(namespace="production")
What it gathers:
Output: Overall status (healthy / degraded / critical), per-provider sections, and summary. Perfect for morning standup checks or shift handoffs.
The MCP host launches devops-mcp as a subprocess and communicates over stdin/stdout. Zero network config. Auth comes from the local environment (kubeconfig, env vars). Process lifecycle tied to Claude Desktop.
npx @notharshhaa/devops-mcp
# or with env vars
KUBECONFIG=~/.kube/config npx @notharshhaa/devops-mcp
Server runs as a persistent HTTP service. Claude connects over Server-Sent Events. Enables multiple users sharing one server. Needs TLS + a bearer token or mTLS in front. Deploy via Docker on an internal bastion.
npx @notharshhaa/devops-mcp-sse
# or with env vars
PORT=3000 MCP_AUTH_TOKEN=your-secret npx @notharshhaa/devops-mcp-sse
For team use, put it behind a TLS-terminating reverse proxy (Caddy, nginx, Traefik). A minimal docker-compose.yml is in the examples/ directory.
Run @notharshhaa/devops-mcp with WebSocket transport for real-time bidirectional communication (not in reference implementation).
TRANSPORT=websocket PORT=3000 MCP_AUTH_TOKEN=your-secret npx @notharshhaa/devops-mcp
Connect to ws://localhost:3000/ws with the auth token in the Authorization header.
devops-mcp is designed for internal use inside a trusted network. That said:
- Kubernetes: uses @kubernetes/client-node. Supports exec plugins (AWS EKS, GKE). In-cluster: auto-mounts SA token. Add RBAC rules scoped to your desired permissions, and run devops-mcp under a dedicated ServiceAccount with minimal verbs.
- ArgoCD: generate a token with argocd account generate-token --account devops-mcp. Create a dedicated account in argocd-cm with apiKey capability and a role limited to read + sync.
- Mutating tools default to dry_run: true. The AI must explicitly pass dry_run: false; it won't do this unless the user clearly requests an action.
- Destructive tools require confirm: true. This parameter is never passed by default; it requires the user to explicitly approve.
- Audit log: set DEVOPS_MCP_AUDIT_LOG to a file path. Every tool call is written as a JSONL line with timestamp, tool name, parameters, and outcome. Mutations and destructive calls are flagged.
- Global dry-run: set DEVOPS_MCP_DRY_RUN=true to prevent all mutations, which is useful for read-only team deployments.

Client / UI agents (Claude Desktop, Claude Code, etc.)
│
▼
Transport Layer
┌──────────────────────────────┐
│ stdio | SSE | WebSocket │ ← Multiple transport support
│ Authentication (token/JWT) │ ← Dynamic auth system
└──────────────────────────────┘
│
▼
Server & Auth/Registry
┌──────────────────────────────┐
│ Tool registry & routing │
│ Dynamic auth manager │ ← Session-based auth
│ Request multiplexing │ ← Concurrent request handling
│ Audit logging │
└──────────────────────────────┘
│
▼
┌────┬────┬────┐
k8s argo prom pd ← Provider modules
│ │ │ │
K8s Argo Prom PD ← API clients
API API HTTP API
│
▼
Cross-cutting Concerns
┌──────────────────────────────┐
│ Dry-run guard │
│ Audit logger │
│ Error normalization │
│ Config loader │
└──────────────────────────────┘
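The audit logger in the diagram writes one JSONL line per tool call, with timestamp, tool name, parameters, and outcome, and flags mutations and destructive calls. A minimal sketch of such a line is shown below. The field names (`tool`, `tier`, `flagged`, and so on) and the `auditLine` helper are assumptions; the README does not document the exact schema.

```typescript
type ToolTier = "read" | "mutate" | "destructive";

interface AuditEntry {
  timestamp: string; // ISO 8601, UTC
  tool: string;
  tier: ToolTier;
  flagged: boolean; // true for mutate and destructive calls
  parameters: Record<string, unknown>;
  outcome: "ok" | "error";
}

// Serialize one tool call as a single JSONL line (JSON object plus newline).
function auditLine(
  tool: string,
  tier: ToolTier,
  parameters: Record<string, unknown>,
  outcome: "ok" | "error",
  now: Date,
): string {
  const entry: AuditEntry = {
    timestamp: now.toISOString(),
    tool,
    tier,
    flagged: tier !== "read",
    parameters,
    outcome,
  };
  return JSON.stringify(entry) + "\n";
}

console.log(auditLine("k8s__scale_deployment", "mutate", { name: "payments", replicas: 3 }, "ok", new Date()));
```

Because tool parameters are written verbatim in this sketch, anything sensitive passed as an argument would land in the log file, so the file's permissions deserve the same care as the credentials themselves.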
Key architectural features:
Contributions are welcome. The most useful areas:
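- New tools for existing providers (e.g. k8s__get_node_pressure, argo__get_app_logs)
- New providers: create src/providers/yourprovider/ with index.ts, client.ts, and one file per resource group.
- Register the provider in src/server.ts.
- Add its configuration to .env.example and src/config.ts.

git clone https://github.com/NotHarshhaa/devops-mcp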
cd devops-mcp
npm install
cp .env.example .env
npm run dev # tsx watch — restarts on file change
Run against a local kind/minikube cluster for Kubernetes testing. Use DEVOPS_MCP_DRY_RUN=true to prevent accidental mutations during development.
- Grafana (grafana__*): dashboards, annotations, datasources
- Flux (flux__*): kustomizations, helm releases, image automation
- Terraform Cloud (tfc__*): workspace runs, state, variables
- Vault (vault__*): secret read (never write), lease status
- Datadog (dd__*): metrics, monitors, events

MIT. See LICENSE.
Built for DevOps and platform engineers who want AI that actually knows what's happening in their cluster.