How do I install Devops?

Devops is a local plugin. Install it from the npm package @notharshhaa/devops-mcp, add the generated configuration snippet to your AI app's MCP config file, and then restart your AI app.

Is Devops safe to use?

Devops was flagged by MCP Marketplace's security scan, scoring 3.3/10 (high risk). It has 5 high or critical findings to review. Review the security report on this page carefully before installing it.

What credentials does Devops need?

Devops reads the following credentials and environment variables: KUBECONFIG, K8S_CONTEXT, ARGOCD_SERVER, ARGOCD_TOKEN, PROMETHEUS_URL, PROMETHEUS_BEARER_TOKEN, PAGERDUTY_TOKEN, LOKI_URL, LOKI_TOKEN. Not all of them are required; the What You'll Need section marks each one as required or optional. You can find setup instructions on the server detail page.

What AI apps work with Devops?

Devops uses the Model Context Protocol (MCP) and works with any MCP-compatible AI app, including Claude, ChatGPT / Codex, Gemini, Copilot, Cursor, and more.

Devops MCP Server

by NotHarshhaa

Developer Tools · Use Caution · 3.3 · MCP Registry · Local

Free

Server data from the Official MCP Registry

Unified MCP server for Kubernetes, ArgoCD, Prometheus, PagerDuty, and Loki

Security Report

Score: 3.3/10 · Use Caution · High Risk

This DevOps MCP server provides useful infrastructure management capabilities but has several security concerns that warrant user awareness. The code exhibits moderate input validation gaps, missing credential scope restrictions, and potential for credential exposure through logging. While authentication mechanisms exist for some providers, several operations lack proper authorization controls, and the dry-run safety model relies on AI agent compliance rather than enforcement. The permissions align reasonably with the server's purpose, but the combination of broad infrastructure access (Kubernetes, ArgoCD, PagerDuty) with inadequate input validation and logging controls elevates the risk profile. Supply chain analysis found 5 known vulnerabilities in dependencies (0 critical, 2 high severity). Package verification found 1 issue.

4 files analyzed · 16 issues found

Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.

Permissions Required

This plugin requests these system permissions. Most are normal for its category.

File System Read

Reads files on your machine. Normal for tools that analyze or process local data.

env_vars

Check that this permission is expected for this type of plugin.

HTTP Network Access

Connects to external APIs or services over the internet.

process_spawn

Check that this permission is expected for this type of plugin.

What You'll Need

Set these up before or after installing:

Path to Kubernetes kubeconfig file (Optional)

Environment variable: KUBECONFIG

Kubernetes context to use (Optional)

Environment variable: K8S_CONTEXT

ArgoCD server URL (Optional)

Environment variable: ARGOCD_SERVER

ArgoCD authentication token (Required)

Environment variable: ARGOCD_TOKEN

Prometheus server URL (Optional)

Environment variable: PROMETHEUS_URL

Prometheus bearer token (Required)

Environment variable: PROMETHEUS_BEARER_TOKEN

PagerDuty API token (Required)

Environment variable: PAGERDUTY_TOKEN

Loki server URL (Optional)

Environment variable: LOKI_URL

Loki authentication token (Required)

Environment variable: LOKI_TOKEN

How to Install

Add this to your MCP configuration file:

{
  "mcpServers": {
    "io-github-notharshhaa-devops-mcp": {
      "env": {
        "LOKI_URL": "your-loki-url-here",
        "KUBECONFIG": "your-kubeconfig-here",
        "LOKI_TOKEN": "your-loki-token-here",
        "K8S_CONTEXT": "your-k8s-context-here",
        "ARGOCD_TOKEN": "your-argocd-token-here",
        "ARGOCD_SERVER": "your-argocd-server-here",
        "PROMETHEUS_URL": "your-prometheus-url-here",
        "PAGERDUTY_TOKEN": "your-pagerduty-token-here",
        "PROMETHEUS_BEARER_TOKEN": "your-prometheus-bearer-token-here"
      },
      "args": [
        "-y",
        "@notharshhaa/devops-mcp"
      ],
      "command": "npx"
    }
  }
}

Documentation

From the project's GitHub README.

devops-mcp

Unified MCP server for DevOps engineers: query and manage Kubernetes, ArgoCD, Prometheus, PagerDuty, Helm, and Loki from any MCP-compatible AI agent.

What is this?

devops-mcp is an open source Model Context Protocol server that gives AI agents (Claude, etc.) real-time read and write access to your infrastructure stack — all from a single install.

Instead of copy-pasting kubectl output into a chat window, you can ask:

"Why is the payments deployment in CrashLoopBackOff?"
"What changed in the last ArgoCD sync for the auth app?"
"Show me the p99 latency for the API gateway over the last hour."
"Who's on call right now and what incidents are open?"
"Debug the payments service - what's wrong with it?"

...and get live answers, sourced directly from your cluster and tooling.

Providers included:

Prefix	Provider	Transport
`k8s__*`	Kubernetes (via kubeconfig or in-cluster SA)	client-go
`argo__*`	ArgoCD	REST API
`prom__*`	Prometheus	HTTP API (PromQL)
`pd__*`	PagerDuty	REST API v2
`helm__*`	Helm	CLI (helm binary)
`devops__*`	Cross-provider incident debugging	Aggregates all providers
`logs__*`	Loki	HTTP API (LogQL)

Quick start

Claude Desktop (stdio — recommended)

Add this to ~/.config/claude/claude_desktop_config.json (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "devops": {
      "command": "npx",
      "args": ["-y", "@notharshhaa/devops-mcp@latest"],
      "env": {
        "KUBECONFIG": "/home/you/.kube/config",
        "ARGOCD_SERVER": "https://argocd.company.com",
        "ARGOCD_TOKEN": "your-argocd-token",
        "PROMETHEUS_URL": "http://prometheus.monitoring:9090",
        "PAGERDUTY_TOKEN": "your-pd-api-token",
        "LOKI_URL": "http://loki.monitoring:3100",
        "LOKI_TOKEN": "your-loki-token"
      }
    }
  }
}

Restart Claude Desktop. The devops server will appear in the tools list.

Claude Code (CLI)

claude mcp add devops-mcp -e KUBECONFIG=$HOME/.kube/config \
  -e ARGOCD_SERVER=https://argocd.company.com \
  -e ARGOCD_TOKEN=... \
  -e PROMETHEUS_URL=http://prometheus:9090 \
  -e PAGERDUTY_TOKEN=... \
  -e LOKI_URL=http://loki.monitoring:3100 \
  -e LOKI_TOKEN=... \
  -- npx -y @notharshhaa/devops-mcp@latest

Local dev / test

Requires Node.js 20 or newer.

npx @notharshhaa/devops-mcp
# or clone and run:
git clone https://github.com/NotHarshhaa/devops-mcp
cd devops-mcp
npm install
cp .env.example .env   # fill in your values
npm run dev

Configuration

All config is via environment variables. Only set the ones for providers you actually use — providers with missing config are silently skipped.

# ── Kubernetes ────────────────────────────────────────────────
KUBECONFIG=/home/user/.kube/config       # omit to use in-cluster service account
K8S_CONTEXT=my-prod-context              # optional: pin a specific context
K8S_ALLOWED_NAMESPACES=default,backend   # optional: restrict namespace access

# ── ArgoCD ───────────────────────────────────────────────────
ARGOCD_SERVER=https://argocd.company.com
ARGOCD_TOKEN=eyJhbGci...                 # argocd account generate-token

# ── Prometheus ───────────────────────────────────────────────
PROMETHEUS_URL=http://prometheus:9090
PROMETHEUS_BEARER_TOKEN=                 # optional: for authenticated Prometheus

# ── PagerDuty ────────────────────────────────────────────────
PAGERDUTY_TOKEN=your-api-v2-token

# ── Loki ───────────────────────────────────────────────────
LOKI_URL=http://loki.monitoring:3100
LOKI_TOKEN=your-loki-token

# ── Stateless Streamable HTTP ────────────────────────────────
# For stdio mode (default): no transport config needed
MCP_HTTP_HOST=127.0.0.1                 # use 0.0.0.0 inside a container
PORT=3000
MCP_AUTH_TOKEN=shared-secret            # optional static Bearer token
MCP_REQUEST_STATE_SECRET=32+-byte-secret # optional MRTR signing key shared by all replicas
MCP_ALLOWED_HOSTS=localhost,127.0.0.1   # required with non-loopback binding
MCP_ALLOWED_ORIGINS=                    # optional browser Origin hostname allowlist
MCP_CACHE_TTL_MS=60000                  # discovery/tools catalog TTL; 0 disables caching

# ── Safety ───────────────────────────────────────────────────
DEVOPS_MCP_DRY_RUN=false                # true = block all mutations globally
DEVOPS_MCP_AUDIT_LOG=/var/log/devops-mcp-audit.jsonl
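
To illustrate how "providers with missing config are silently skipped" could work, here is a minimal sketch. The function name `enabledProviders`, the `REQUIRED_VARS` table, and the choice of which variables count as the minimum for each provider are assumptions inferred from the comments above, not the server's actual logic.

```typescript
// Hypothetical sketch: decide which providers to enable from the environment.
// The per-provider rules are assumptions, not taken from the devops-mcp source.
type Env = Record<string, string | undefined>;
type ProviderName = "k8s" | "argo" | "prom" | "pd" | "logs";

// Assumed minimum variables per provider. Kubernetes is always attempted here,
// because an omitted KUBECONFIG falls back to the in-cluster service account.
const REQUIRED_VARS: Record<ProviderName, string[]> = {
  k8s: [],
  argo: ["ARGOCD_SERVER", "ARGOCD_TOKEN"],
  prom: ["PROMETHEUS_URL"],
  pd: ["PAGERDUTY_TOKEN"],
  logs: ["LOKI_URL"],
};

function enabledProviders(env: Env): ProviderName[] {
  const providers = Object.keys(REQUIRED_VARS) as ProviderName[];
  // A provider is kept only when every assumed variable is set and non-empty.
  return providers.filter((provider) =>
    REQUIRED_VARS[provider].every((key) => (env[key] ?? "").trim() !== ""),
  );
}

const exampleProviders = enabledProviders({
  PROMETHEUS_URL: "http://prometheus:9090",
  ARGOCD_SERVER: "https://argocd.company.com", // token missing, so ArgoCD is skipped
});
console.log(exampleProviders); // [ 'k8s', 'prom' ]
```

In a real process, `process.env` could be passed as the `env` argument.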

Tool reference

All tools follow a three-tier safety model:

Read: safe, no side effects, no confirmation needed
Mutate: defaults to `dry_run: true`; set `dry_run: false` to execute
Destructive: requires `confirm: true`, or an interactive MRTR confirmation completed by a client that supports the 2026-07-28 protocol
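
The three tiers can be read as a small decision function. The sketch below is an illustration only: the `decide` function, the `Decision` values, and the `globalDryRun` flag (standing in for `DEVOPS_MCP_DRY_RUN=true`) are hypothetical names, and the real server may order these checks differently.

```typescript
// Hypothetical sketch of the three-tier safety model; names are illustrative.
type Tier = "read" | "mutate" | "destructive";
type Decision = "execute" | "preview" | "needs_confirmation" | "blocked";

interface SafetyArgs {
  dry_run?: boolean; // mutate tier: treated as true when omitted
  confirm?: boolean; // destructive tier: must be true to execute directly
}

function decide(tier: Tier, args: SafetyArgs, globalDryRun: boolean): Decision {
  if (tier === "read") return "execute"; // no side effects, no confirmation
  if (tier === "mutate") {
    if (args.dry_run ?? true) return "preview"; // previews are always allowed
    return globalDryRun ? "blocked" : "execute";
  }
  // Destructive tier: global dry-run blocks before any confirmation prompt.
  if (globalDryRun) return "blocked";
  return args.confirm === true ? "execute" : "needs_confirmation";
}

console.log(decide("mutate", {}, false)); // preview
console.log(decide("mutate", { dry_run: false }, false)); // execute
console.log(decide("destructive", { confirm: true }, true)); // blocked
```

Here `needs_confirmation` stands for the point where a server could ask a capable client for interactive confirmation instead of rejecting the call.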

Kubernetes (`k8s__*`)

Tool	Tier	Description
`k8s__list_pods`	read	List pods with status, restarts, node, age
`k8s__get_pod_logs`	read	Tail or stream logs from a pod container
`k8s__describe_resource`	read	Full describe for any resource type
`k8s__get_events`	read	Cluster or namespace events, filterable by reason
`k8s__list_deployments`	read	Deployments with replica counts and rollout health
`k8s__get_resource_usage`	read	CPU/mem usage per pod via metrics-server
`k8s__get_node_status`	read	Node health, conditions, capacity, allocatable resources, taints
`k8s__get_network_policies`	read	Network policies with pod selectors and ingress/egress rules
`k8s__get_ingresses`	read	Ingress resources with hosts, paths, backends, TLS config
`k8s__list_cronjobs`	read	CronJobs with schedule, last run, active jobs, suspend status
`k8s__get_cronjob_status`	read	Detailed CronJob status with recent job history
`k8s__diff_resource`	read	Compare current resource state vs last-applied-configuration
`k8s__get_hpa`	read	HorizontalPodAutoscaler with current/target metrics and scaling status
`k8s__list_pvcs`	read	PersistentVolumeClaims with status, capacity, storage class
`k8s__list_services`	read	Services with type, ports, selectors, clusterIP, endpoints
`k8s__list_contexts`	read	All kubeconfig contexts and the active one
`k8s__switch_context`	mutate	Preview a context selection; set `K8S_CONTEXT` and restart to apply it safely
`k8s__scale_deployment`	mutate	Scale replicas with dry-run diff preview
`k8s__apply_manifest`	mutate	Apply a manifest string with server-side dry-run
`k8s__rollout_restart`	mutate	Trigger rolling restart of a deployment or statefulset
`k8s__delete_resource`	destructive	Delete a named resource — requires direct or interactive confirmation

ArgoCD (`argo__*`)

Tool	Tier	Description
`argo__list_apps`	read	All apps with health, sync status, source repo
`argo__get_app`	read	Full spec and status for one application
`argo__get_app_diff`	read	Live diff between git and cluster state
`argo__get_app_history`	read	Deployment history with git SHAs and timestamps
`argo__get_resource_tree`	read	Full owned resource tree for an app
`argo__sync_app`	mutate	Trigger sync — supports dry-run, prune, force
`argo__rollback_app`	mutate	Preview rollback to a history revision; set `dry_run: false` to execute
`argo__terminate_op`	mutate	Preview cancellation of an in-progress sync; set `dry_run: false` to execute

Prometheus (`prom__*`)

Tool	Tier	Description
`prom__query`	read	Instant PromQL query with label + value output
`prom__query_range`	read	Range query with step, returns time-series data
`prom__list_alerts`	read	All alert rules with state (firing / pending / inactive)
`prom__get_firing_alerts`	read	Only currently firing alerts with duration
`prom__list_targets`	read	All scrape targets with health and last scrape
`prom__label_values`	read	Enumerate values for a given label name
`prom__metric_metadata`	read	Type, help text, and unit for a metric
`prom__compare_periods`	read	📈 Compare metrics between two time windows — detect before/after deployment changes
`prom__slo_status`	read	🎯 SLO compliance — error budget remaining, burn rate, time to exhaustion
`prom__summarize_service_health`	read	📊 Smart summary - human-readable service health metrics including latency changes, error rate vs SLO, and traffic patterns

Example usage:

# Get a human-readable health summary
prom__summarize_service_health(service="payments", timeframeMinutes=30, sloThreshold=0.05)

What it outputs:

Latency: "Latency increased: 120ms → 480ms (+300%)" or "Latency stable: 125ms"
Error rate: "Error rate crossed SLO (5%): 7.2%" or "Error rate within SLO: 2.1%"
Traffic: "Traffic dropped: 500 → 350 req/s (-30%)" or "Traffic spike detected (+150%)"
Overall assessment: Summary of issues and positive indicators

Why this matters: Instead of raw PromQL numbers that require interpretation, this tool provides actionable insights that AI agents can use directly in responses, making monitoring data actually useful for incident investigation and communication.
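
The summary strings above come down to simple percentage arithmetic. The sketch below reproduces the latency and error-rate examples; the function names and the 10% "stable" threshold are assumptions made for illustration, not values taken from the tool.

```typescript
// Hypothetical sketch of how the summary lines could be derived.
// Assumed threshold: changes smaller than 10% are reported as "stable".
const STABLE_THRESHOLD_PCT = 10;

function percentChange(before: number, after: number): number {
  // Multiply before dividing to limit floating-point drift.
  return ((after - before) * 100) / before;
}

function describeLatency(beforeMs: number, afterMs: number): string {
  const pct = Math.round(percentChange(beforeMs, afterMs));
  if (Math.abs(pct) < STABLE_THRESHOLD_PCT) return `Latency stable: ${afterMs}ms`;
  const verb = pct > 0 ? "increased" : "decreased";
  const sign = pct > 0 ? "+" : "";
  return `Latency ${verb}: ${beforeMs}ms → ${afterMs}ms (${sign}${pct}%)`;
}

function describeErrorRate(rate: number, sloThreshold: number): string {
  const ratePct = (rate * 100).toFixed(1);
  if (rate > sloThreshold) {
    const sloPct = Number((sloThreshold * 100).toFixed(2));
    return `Error rate crossed SLO (${sloPct}%): ${ratePct}%`;
  }
  return `Error rate within SLO: ${ratePct}%`;
}

console.log(describeLatency(120, 480)); // Latency increased: 120ms → 480ms (+300%)
console.log(describeErrorRate(0.072, 0.05)); // Error rate crossed SLO (5%): 7.2%
```

The latency case works out as (480 - 120) / 120 = 3.0, which is the +300% shown in the example output.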

Loki (`logs__*`)

Tool	Tier	Description
`logs__get_recent_errors`	read	Get recent error logs from Loki for debugging incidents
`logs__search`	read	Search logs in Loki with custom query for root cause analysis

Example usage:

# Get recent error logs
logs__get_recent_errors(service="payments", namespace="default", minutes=30, limit=50)

# Search logs with custom query
logs__search(query='{service="payments"} |= "error"', limit=100)

Why this matters:

Metrics tell what: Prometheus shows you that latency increased or error rate crossed SLO
Logs tell why: Loki shows you the actual error messages, stack traces, and context around failures
Complete debugging: Without logs, you can see that something is broken but not understand the root cause

Output format:

Structured log entries with timestamp, message, service, namespace, and extracted log levels
Error count summaries and filtering
Raw LogQL results for detailed analysis

This makes incident investigation complete by combining the "what" (metrics) with the "why" (logs).

PagerDuty (`pd__*`)

Tool	Tier	Description
`pd__list_incidents`	read	Open incidents with severity, status, assignee
`pd__get_incident`	read	Full detail with alerts, notes, timeline
`pd__who_is_oncall`	read	Current on-call per schedule or escalation policy
`pd__list_services`	read	All services with integration keys and status
`pd__get_log_entries`	read	Audit log for an incident (all state changes)
`pd__acknowledge_incident`	mutate	Preview acknowledgement; set `dry_run: false` to execute
`pd__add_note`	mutate	Preview appending a note; set `dry_run: false` to execute
`pd__escalate_incident`	destructive	Escalate to a different policy — requires direct or interactive confirmation
`pd__summarize_incident`	read	🚨 Incident auto-summary - what happened, affected services, probable root cause, current status

`pd__summarize_incident`

Example usage:

# Get an auto-summary of an incident
pd__summarize_incident(id="ABC123")

What it outputs:

What happened: Incident title, description, severity, urgency, status, creation time, and duration
Affected services: Service name, ID, and current status
Probable root cause: Analysis of trigger alerts and log entries to identify likely causes
Current status: Current incident state, assignees, acknowledgements, and notes count

Output format:

{
  "what_happened": {
    "title": "API Gateway High Error Rate",
    "description": "5xx error rate exceeded 5% threshold",
    "severity": "high",
    "urgency": "high",
    "status": "acknowledged",
    "createdAt": "2025-01-15T10:30:00Z",
    "updatedAt": "2025-01-15T11:45:00Z",
    "duration": "1h 15m"
  },
  "affected_services": [
    {
      "id": "P123456",
      "name": "API Gateway",
      "status": "critical"
    }
  ],
  "probable_root_cause": "Triggered by: High 5xx error rate from API Gateway pods",
  "current_status": {
    "status": "acknowledged",
    "lastUpdated": "2025-01-15T11:45:00Z",
    "assignees": ["john.doe@company.com"],
    "acknowledgements": 2,
    "notes": 3
  }
}
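
For clients that consume this output, a type describing the shape can help. The interface below mirrors the sample JSON only; optional fields, value ranges, and the `formatDuration` helper are assumptions. The helper shows one way the `duration` value in the sample could follow from `createdAt` and `updatedAt`.

```typescript
// Hypothetical typing of the sample output above; derived from the example only.
interface IncidentSummary {
  what_happened: {
    title: string;
    description: string;
    severity: string;
    urgency: string;
    status: string;
    createdAt: string; // ISO 8601 timestamp
    updatedAt: string; // ISO 8601 timestamp
    duration: string;
  };
  affected_services: { id: string; name: string; status: string }[];
  probable_root_cause: string;
  current_status: {
    status: string;
    lastUpdated: string;
    assignees: string[];
    acknowledgements: number;
    notes: number;
  };
}

// Assumption: duration is the span between two timestamps, in hours and minutes.
function formatDuration(startIso: string, endIso: string): string {
  const totalMinutes = Math.floor((Date.parse(endIso) - Date.parse(startIso)) / 60000);
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  return hours > 0 ? `${hours}h ${minutes}m` : `${minutes}m`;
}

console.log(formatDuration("2025-01-15T10:30:00Z", "2025-01-15T11:45:00Z")); // 1h 15m
```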

Why this matters: Instead of manually piecing together incident details from multiple API calls, this tool provides a comprehensive, human-readable summary perfect for:

Demos: Shows AI's ability to understand and summarize complex incident data
Real-world use: Quickly understand incident impact without digging through raw data
Communication: Share concise incident summaries with stakeholders

Helm (`helm__*`)

Tool	Tier	Description
`helm__list_releases`	read	List Helm releases with status, chart, app version
`helm__get_status`	read	Full status of a Helm release
`helm__get_values`	read	User-supplied or computed values for a release
`helm__get_history`	read	Revision history of a release
`helm__rollback`	mutate	Rollback to a previous revision (dry-run by default)

Requirements: Helm CLI binary must be available in PATH.

Example usage:

# List all releases in a namespace
helm__list_releases(namespace="production")

# Check what values a release is using
helm__get_values(name="api-gateway", all_values=true)

# Rollback after a bad deploy
helm__rollback(name="api-gateway", revision=5, dry_run=false)

Cross-Provider Debugging (`devops__*`)

Tool	Tier	Description
`devops__debug_service`	read	🔥 Cross-provider incident debugging - aggregates Kubernetes, ArgoCD, Prometheus, and PagerDuty data to diagnose service issues in one command
`devops__explain_change`	read	🧠 Explain what changed - combines ArgoCD history, Kubernetes rollout history, and Prometheus anomaly window to identify cause of issues
`devops__runbook`	read	📋 Automated runbook - symptom-based diagnostic that runs targeted checks (crashloop, high-latency, oom, 5xx, pod-pending)
`devops__health_report`	read	🏥 Cluster health report - one-shot assessment across all providers with overall status (healthy/degraded/critical)
`devops__incident_timeline`	read	🕐 Incident timeline - unified event timeline across K8s, ArgoCD, Prometheus, and PagerDuty sorted chronologically

`devops__debug_service`

Example usage:

# Debug a service across all providers
devops__debug_service(service="payments", namespace="default")

What it checks:

Kubernetes: Pod status, restart counts, readiness, deployment health, recent events
ArgoCD: Sync status, health status, Git diff detection, deployment history
Prometheus: Error rate (5xx responses), latency (p95), firing alerts
PagerDuty: Active incidents matching the service name

Output format:

Human-readable diagnosis with emoji indicators (⚠️ warnings, ❌ errors)
Per-provider status sections
Summary highlighting critical issues
Raw JSON data for detailed analysis

This is the most powerful tool for incident investigation - it gives you a complete picture of what's wrong with a service in seconds.

`devops__explain_change`

Example usage:

# Explain what changed in the last hour
devops__explain_change(service="payments", namespace="default", timeframeMinutes=60)

What it analyzes:

ArgoCD: Deployment history within the timeframe, including revision, author, repo, and chart
Kubernetes: Current rollout status, replica counts, image tags, and deployment readiness
Prometheus: Error rate trends, latency patterns, and traffic spikes over the time window

Output format:

Timeline of recent deployments with full metadata
Kubernetes rollout status and health
Metric anomaly detection (error rate spikes, latency issues, traffic changes)
Correlation analysis that links deployments to metric changes
Summary with root cause hypothesis

Problem it solves: "Everything was working yesterday… what changed?"

This tool answers that question by correlating deployment events with metric anomalies, helping you quickly identify whether a recent deployment, config change, or external factor caused the issue.

`devops__runbook`

Example usage:

# Diagnose a crashlooping service
devops__runbook(symptom="crashloop", service="payments", namespace="default")

# Investigate high latency
devops__runbook(symptom="high-latency", service="api-gateway")

Supported symptoms:

Symptom	What it checks
`crashloop`	Pod status → logs (tail 50) → BackOff events → deployment health
`high-latency`	p95 latency → resource usage → firing alerts → recent deploys
`oom`	OOMKilled events → memory usage → pod describe → resource limits
`5xx`	Error rate → Loki error logs → deployment health
`pod-pending`	Scheduling events → pending pods → node capacity

Output: Structured JSON with steps_executed[], findings[], and recommended_actions[].
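
The symptom table and output fields can be sketched as types. Only the three field names and the five symptom values come from the text above; the string element types, the `RUNBOOK_CHECKS` map, and the `planRunbook` helper are assumptions for illustration.

```typescript
// Hypothetical sketch: symptom values and check order are copied from the table,
// everything else (types, helper names) is assumed.
type Symptom = "crashloop" | "high-latency" | "oom" | "5xx" | "pod-pending";

const RUNBOOK_CHECKS: Record<Symptom, string[]> = {
  "crashloop": ["pod status", "logs (tail 50)", "BackOff events", "deployment health"],
  "high-latency": ["p95 latency", "resource usage", "firing alerts", "recent deploys"],
  "oom": ["OOMKilled events", "memory usage", "pod describe", "resource limits"],
  "5xx": ["error rate", "Loki error logs", "deployment health"],
  "pod-pending": ["scheduling events", "pending pods", "node capacity"],
};

interface RunbookResult {
  steps_executed: string[];
  findings: string[];
  recommended_actions: string[];
}

function isSymptom(value: string): value is Symptom {
  return Object.prototype.hasOwnProperty.call(RUNBOOK_CHECKS, value);
}

// Returns the planned steps only; findings and actions would be filled in
// by the checks themselves.
function planRunbook(symptom: string): RunbookResult {
  if (!isSymptom(symptom)) {
    throw new Error(`Unsupported symptom: ${symptom}`);
  }
  return { steps_executed: [...RUNBOOK_CHECKS[symptom]], findings: [], recommended_actions: [] };
}

console.log(planRunbook("5xx").steps_executed.length); // 3
```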

`devops__health_report`

Example usage:

# Get a full cluster health assessment
devops__health_report(namespace="production")

What it gathers:

Kubernetes: Unhealthy pods, deployments not at desired replicas
Prometheus: Count of firing alerts
ArgoCD: Out-of-sync and unhealthy applications
PagerDuty: Open incident count

Output: Overall status (healthy / degraded / critical), per-provider sections, and summary. Perfect for morning standup checks or shift handoffs.
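
One way to picture how per-provider counts could roll up into a single status is sketched below. The rule itself is an assumption chosen for illustration (any open incident or firing alert is critical, any other problem is degraded); the README does not state the thresholds the tool uses, and all names here are hypothetical.

```typescript
// Hypothetical roll-up rule; the actual thresholds are not documented.
interface HealthCounts {
  unhealthyPods: number;
  deploymentsBelowDesired: number;
  firingAlerts: number;
  outOfSyncOrUnhealthyApps: number;
  openIncidents: number;
}

type OverallStatus = "healthy" | "degraded" | "critical";

function overallStatus(counts: HealthCounts): OverallStatus {
  // Assumed: paging-level signals dominate everything else.
  if (counts.openIncidents > 0 || counts.firingAlerts > 0) return "critical";
  const otherProblems =
    counts.unhealthyPods + counts.deploymentsBelowDesired + counts.outOfSyncOrUnhealthyApps;
  return otherProblems > 0 ? "degraded" : "healthy";
}

console.log(
  overallStatus({
    unhealthyPods: 2,
    deploymentsBelowDesired: 0,
    firingAlerts: 0,
    outOfSyncOrUnhealthyApps: 1,
    openIncidents: 0,
  }),
); // degraded
```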

Deployment options

stdio (recommended for local use)

The MCP host launches devops-mcp as a subprocess and communicates over stdin/stdout. No network configuration is needed. Auth comes from the local environment (kubeconfig, env vars), and the process lifecycle is tied to the MCP host (for example, Claude Desktop).

npx @notharshhaa/devops-mcp
# or with env vars
KUBECONFIG=~/.kube/config npx @notharshhaa/devops-mcp

Stateless Streamable HTTP (shared deployments)

The HTTP entry serves MCP at POST /mcp using the 2026-07-28 stateless protocol. Each request gets a fresh MCP server instance, so requests can land on any replica without session affinity or shared protocol state. The same endpoint also accepts 2025-era Streamable HTTP clients in stateless compatibility mode.

MCP_HTTP_HOST=127.0.0.1 \
PORT=3000 \
MCP_AUTH_TOKEN=your-secret \
npx -y -p @notharshhaa/devops-mcp@latest devops-mcp-http

Connect clients to http://127.0.0.1:3000/mcp. For a container or remote service, set MCP_HTTP_HOST=0.0.0.0 and configure MCP_ALLOWED_HOSTS with the public/proxy hostnames. Put the service behind TLS for team use.

The deprecated devops-mcp-sse binary remains as a temporary alias for the HTTP entry, but /sse, /message, and /ws now return HTTP 410. Legacy HTTP+SSE and non-standard WebSocket clients must migrate to Streamable HTTP.

2026-07-28 behavior

No initialize requirement or Mcp-Session-Id on modern requests.
server/discover, per-request client metadata, and MCP-Protocol-Version are handled by the official SDK.
Mcp-Method and Mcp-Name headers are validated against the JSON-RPC body for gateway routing and authorization.
server/discover and tools/list advertise deterministic, public cache hints using MCP_CACHE_TTL_MS (default 60 seconds).
Destructive tools accept confirm: true; modern clients may instead complete an MRTR interactive confirmation. Confirmation state is HMAC-signed, expires after five minutes, and is bound to the exact tool arguments. Global dry-run blocks before prompting.
MCP_REQUEST_STATE_SECRET must be the same on every replica for MRTR retries to land anywhere. When omitted, the server derives the key from MCP_AUTH_TOKEN, or uses a process-local random key when no token is configured.
2025-era Streamable HTTP and stdio clients remain supported. Sessionful HTTP and legacy HTTP+SSE are not.
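
The confirmation-state properties listed above (HMAC-signed, five-minute expiry, bound to the exact tool arguments) can be sketched with Node's built-in `crypto` module. This is not the server's implementation: the function names, the state shape, and the use of `JSON.stringify` as the serialization are assumptions, and a production version would need a canonical encoding so that key order cannot change the signature.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical sketch of signed, expiring confirmation state.
const FIVE_MINUTES_MS = 5 * 60 * 1000;

interface ConfirmationState {
  expiresAt: number; // epoch milliseconds
  signature: string; // hex-encoded HMAC-SHA256
}

// Assumption: JSON.stringify is stable enough for this illustration.
function signState(secret: string, tool: string, args: unknown, expiresAt: number): string {
  const payload = JSON.stringify({ tool, args, expiresAt });
  return createHmac("sha256", secret).update(payload).digest("hex");
}

function issueConfirmation(secret: string, tool: string, args: unknown, now: number): ConfirmationState {
  const expiresAt = now + FIVE_MINUTES_MS;
  return { expiresAt, signature: signState(secret, tool, args, expiresAt) };
}

function verifyConfirmation(
  secret: string,
  tool: string,
  args: unknown,
  state: ConfirmationState,
  now: number,
): boolean {
  if (now > state.expiresAt) return false; // expired
  const expected = Buffer.from(signState(secret, tool, args, state.expiresAt), "hex");
  const actual = Buffer.from(state.signature, "hex");
  // Constant-time comparison; lengths must match before calling timingSafeEqual.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

const deleteArgs = { kind: "pod", name: "payments-abc", namespace: "default" };
const issued = issueConfirmation("shared-secret", "k8s__delete_resource", deleteArgs, 0);
console.log(verifyConfirmation("shared-secret", "k8s__delete_resource", deleteArgs, issued, 1000)); // true
```

Because the secret is the only shared input, any replica holding the same key can verify state issued by another, which is consistent with the requirement that `MCP_REQUEST_STATE_SECRET` match across replicas.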

The built-in MCP_AUTH_TOKEN is a static bearer-token gate, not an OAuth authorization server. For Internet-facing deployments, terminate TLS and enforce your organization’s OAuth/OIDC policy at a gateway or integrate a dedicated identity provider; do not use deprecated Dynamic Client Registration for new deployments.

A minimal docker-compose.yml is available in examples/.

Security model

devops-mcp is designed for internal use inside a trusted network. That said:

Kubernetes: Uses standard kubeconfig via @kubernetes/client-node. Supports exec plugins (AWS EKS, GKE). In-cluster: auto-mounts SA token. Add RBAC rules scoped to your desired permissions — run devops-mcp under a dedicated ServiceAccount with minimal verbs. Context selection is fixed by K8S_CONTEXT at startup; the tool only previews changes because runtime switching would affect other callers.
ArgoCD: Generate a long-lived token: argocd account generate-token --account devops-mcp. Create a dedicated account in argocd-cm with apiKey capability and a role limited to read + sync.
Prometheus: Usually unauthenticated inside a cluster. If using Grafana Mimir or Thanos with auth, pass a Bearer token. All tools are read-only so minimal permissions are needed.
PagerDuty: Create a dedicated API key in PagerDuty → API Access → Create New API Key. Use Full Access if you want acknowledge/escalate tools; Read-only if you want a safe-only mode.
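Mutations are dry-run by default. Every mutating tool defaults to dry_run: true, and the AI must explicitly pass dry_run: false to execute. This default depends on the agent passing dry_run: false only when the user clearly requests an action; the server does not enforce it unless DEVOPS_MCP_DRY_RUN=true is set.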
Destructive tools require confirmation. Pass confirm: true directly, or use a 2026-07-28 client that supports the server's MRTR confirmation request. DEVOPS_MCP_DRY_RUN=true blocks execution even after confirmation.
Audit log. Set DEVOPS_MCP_AUDIT_LOG to a file path. Every tool call is written as a JSONL line with timestamp, tool name, parameters, and outcome. Mutations and destructive calls are flagged.
Global dry-run mode. Set DEVOPS_MCP_DRY_RUN=true to block every executing mutation, even when a caller passes dry_run: false. Safe previews remain available — useful for read-only team deployments.
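
As an illustration of the audit log format described above, the sketch below builds one JSONL line per tool call. The field names (`timestamp`, `tool`, `parameters`, `outcome`, `flagged`) and the `toAuditLine` helper are assumptions; the README names the kinds of data recorded but not the exact keys.

```typescript
// Hypothetical sketch of one audit log line; key names are assumed.
type Tier = "read" | "mutate" | "destructive";

interface AuditEntry {
  timestamp: string; // ISO 8601
  tool: string;
  parameters: Record<string, unknown>;
  outcome: string;
  flagged: boolean; // true for mutate and destructive calls
}

function toAuditLine(
  tool: string,
  tier: Tier,
  parameters: Record<string, unknown>,
  outcome: string,
  now: Date,
): string {
  const entry: AuditEntry = {
    timestamp: now.toISOString(),
    tool,
    parameters,
    outcome,
    flagged: tier !== "read",
  };
  // JSONL: one JSON object per line, terminated by a newline.
  return JSON.stringify(entry) + "\n";
}

const auditExample = toAuditLine(
  "k8s__scale_deployment",
  "mutate",
  { name: "payments", replicas: 3, dry_run: true },
  "preview",
  new Date("2025-01-15T10:30:00Z"),
);
console.log(auditExample.trimEnd());
```

Appending each line to the path in `DEVOPS_MCP_AUDIT_LOG` would keep the file readable by standard JSONL tooling.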

Architecture

Client / UI agents (Claude Desktop, Claude Code, gateways)
       │
       ▼
  MCP v2 Serving Layer
  ┌──────────────────────────────────────────┐
  │ serveStdio       │ POST /mcp             │
  │ 2025 + 2026 eras │ stateless per request │
  │                  │ routing/header checks │
  └──────────────────────────────────────────┘
       │
       ▼
  Server Factory & Tool Registry
  ┌──────────────────────────────────────────┐
  │ Fresh protocol instance per HTTP request │
  │ Deterministic tool catalog + cache hints │
  │ MRTR confirmation for destructive tools  │
  │ Audit logging and error normalization    │
  └──────────────────────────────────────────┘
       │
       ▼
  ┌─────┬──────┬──────┬──────┬──────┬──────┐
  k8s   argo   prom   pd     logs   helm
       │
       ▼
  Dry-run guard │ Namespace policy │ Provider credentials

Key architectural features:

Stateless remote protocol: each /mcp request creates a fresh server instance; no protocol session ID or sticky load balancing.
Dual-era compatibility: official SDK entries serve 2026-07-28 and compatible 2025-era stdio/Streamable HTTP clients.
Gateway-friendly routing: modern method and tool headers are validated before dispatch.
Safe caching: deterministic tools/list ordering and configurable public cache hints.
Provider isolation: each provider remains independently configured and safely skipped when unavailable.
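
The gateway-friendly routing point can be illustrated with a small check that the routing headers agree with the JSON-RPC body. The exact header semantics are an assumption here: the sketch treats `Mcp-Method` as the JSON-RPC method and `Mcp-Name` as the tool name on `tools/call` requests, with lowercase header keys as Node's HTTP server delivers them. The function name is hypothetical.

```typescript
// Hypothetical sketch: reject requests whose routing headers disagree with the body.
type HeaderMap = Record<string, string | undefined>;

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id?: string | number;
  method: string;
  params?: { name?: unknown; [key: string]: unknown };
}

function headersMatchBody(headers: HeaderMap, body: JsonRpcRequest): boolean {
  if (headers["mcp-method"] !== body.method) return false;
  if (body.method !== "tools/call") return true;
  // Assumed: for tool calls, the Mcp-Name header must equal params.name.
  const toolName = body.params?.name;
  return typeof toolName === "string" && headers["mcp-name"] === toolName;
}

const toolCall: JsonRpcRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: { name: "k8s__list_pods", arguments: { namespace: "default" } },
};
console.log(headersMatchBody({ "mcp-method": "tools/call", "mcp-name": "k8s__list_pods" }, toolCall)); // true
console.log(headersMatchBody({ "mcp-method": "tools/call", "mcp-name": "k8s__delete_resource" }, toolCall)); // false
```

A check like this lets a gateway authorize on headers alone while the server still refuses requests where the headers were spoofed.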

Contributing

Contributions are welcome. The most useful areas:

New providers — Grafana, Datadog, Vault, Terraform Cloud, Flux CD
New tools — within existing providers (e.g. k8s__get_node_pressure, argo__get_app_logs)
Better output formatting — richer structured responses for specific resource types
Tests — unit tests for provider logic using mocked clients

Adding a new provider

Create src/providers/yourprovider/ with index.ts, client.ts, and one file per resource group.
Register it in src/server.ts.
Add config keys to .env.example and src/config.ts.
Document tools in this README following the existing table format.
Open a PR.
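
The steps above could start from a skeleton like the following. The `Provider` and `ToolDefinition` interfaces, the `GRAFANA_URL` variable, and the `grafana__list_dashboards` tool are all hypothetical; the repository's real provider contract may look different, so check an existing provider under src/providers/ before copying this shape.

```typescript
// Hypothetical provider skeleton; interfaces and names are illustrative only.
type Tier = "read" | "mutate" | "destructive";

interface ToolDefinition {
  name: string;
  tier: Tier;
  description: string;
  handler: (args: Record<string, unknown>) => Promise<unknown>;
}

interface Provider {
  prefix: string;
  isConfigured(env: Record<string, string | undefined>): boolean;
  tools(): ToolDefinition[];
}

const grafanaProvider: Provider = {
  prefix: "grafana__",
  // Assumed config key; a real provider would add it to .env.example and src/config.ts.
  isConfigured: (env) => Boolean(env.GRAFANA_URL),
  tools: () => [
    {
      name: "grafana__list_dashboards",
      tier: "read",
      description: "List dashboards (placeholder handler)",
      handler: async () => [], // a real handler would call the provider's client
    },
  ],
};

console.log(grafanaProvider.tools().map((tool) => tool.name)); // [ 'grafana__list_dashboards' ]
```

Keeping the prefix on the provider makes it easy to check that every tool name follows the `prefix__tool` convention used in the tables above.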

Local development

git clone https://github.com/NotHarshhaa/devops-mcp
cd devops-mcp
npm install
cp .env.example .env
npm run dev        # tsx watch — restarts on file change

Run against a local kind/minikube cluster for Kubernetes testing. Use DEVOPS_MCP_DRY_RUN=true to prevent accidental mutations during development.

Roadmap

Grafana provider (grafana__*) — dashboards, annotations, datasources
Flux CD provider (flux__*) — kustomizations, helm releases, image automation
Terraform Cloud provider (tfc__*) — workspace runs, state, variables
HashiCorp Vault provider (vault__*) — secret read (never write), lease status
Datadog provider (dd__*) — metrics, monitors, events
Web UI for provider health, task progress, and audit events

License

MIT — see LICENSE.

Built for DevOps and platform engineers who want AI that actually knows what's happening in their cluster.
