Server data from the Official MCP Registry
AI-powered app test automation for iOS, Android, Flutter, React Native, Web
Valid MCP server (2 strong, 4 medium validity signals). 4 known CVEs in dependencies (0 critical, 3 high severity). Package registry verified. Imported from the Official MCP Registry. Trust signals: trusted author (5/5 approved).
4 files analyzed · 5 issues found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
Add this to your MCP configuration file:
{
"mcpServers": {
"io-github-muse-code-space-test-genie": {
"args": [
"-y",
"test-genie-mcp"
],
"command": "npx"
}
}
}

From the project's GitHub README.
Built for vibe coders: one command, get a prioritized list of what's actually broken about your project.
Self-healing test automation for iOS, Android, Flutter, React Native and Web apps — as an MCP server.
v3.1.1: vibe-check + honest auto-fix. One MCP call, ~30 seconds: race conditions + security issues + memory leaks + logic errors + perf smells, prioritized. Stays on your machine, no telemetry. Pass `autoFix: true` for the small, safe mechanical fixes (weak-hash, simple `Math.random` assignment), applied with backup + syntax-validate + rollback-on-syntax-fail. For test-verified application of harder fixes, use v3.0.0's iterate-fix loop.
You don't read the docs. You open the project, talk to Claude, and want a verdict. Here it is:
In Claude (with test-genie-mcp installed — setup):
/vibe-check /Users/me/my-app
Claude calls diagnose_project under the hood. ~30 seconds later you see:
# vibe-check report
- Project: /Users/me/my-app
- Platform: web
- Findings: 11 total — 4 critical, 4 high, 1 medium, 1 low
- Estimated fix time: ~85 min
## Top 5 issues
### 1. [CRIT] Hardcoded AWS access key id found in source
- File: `server.js:7`
- Category: security / secret (CWE-798)
- Confidence: 95%
- Fix: Move the value to an env var, gitignore the config, rotate the leaked key.
### 2. [CRIT] SQL string built by concatenating user input
- File: `server.js:21`
- Category: security / injection (CWE-89)
- Fix: Use parameterized queries (`db.query("... WHERE id = ?", [id])`).
### 3. [HIGH] useState setter called after await without mount guard
- File: `UserProfile.tsx:16`
- Category: race-condition / react-setstate-after-await (CWE-362)
- Confidence: 78%
- Fix: Use AbortController and check signal.aborted before calling setters.
… (top 5 shown — full list at output: "detailed")
## Next steps
1. Address the critical / high findings above.
2. Re-run diagnose_project after fixing to confirm convergence.
3. Use run_iterative_fix_loop for test-driven verification of each fix.
If any finding is autoFixable: true and is at high/critical severity, the diagnose_project call accepts autoFix: true to apply the mechanical replacement directly (with backup + syntax validation — see SAFETY.md for the exact guards). The v3.1.1 honest scope is narrow: weak hash (createHash('md5'|'sha1') → createHash('sha256')) and standalone Math.random() in security-sensitive files. For broader/structural fixes (race conditions, eval, exec injection) run run_iterative_fix_loop separately — it re-runs tests and auto-rolls-back on regression.
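To make the narrow v3.1.1 scope concrete, here is a minimal sketch of what code could look like after the two mechanical replacements named above. The function names `fingerprint` and `sixDigitCode` are hypothetical and chosen for illustration; the exact rewrite test-genie emits is not shown in this README, so treat this as an assumption about the shape of the result, not its literal output.

```typescript
import { createHash, randomInt } from "node:crypto";

// Before (the kind of code the two rules flag):
//   const digest = createHash("md5").update(input).digest("hex");
//   const code = Math.floor(Math.random() * 1_000_000);

// After: weak hash replaced by sha256.
function fingerprint(input: string): string {
  // A sha256 hex digest is 64 characters long (md5 is 32, sha1 is 40).
  return createHash("sha256").update(input).digest("hex");
}

// After: Math.random() replaced by a CSPRNG-backed integer.
function sixDigitCode(): number {
  // randomInt(max) returns an integer in the range [0, max).
  return randomInt(1_000_000);
}
```

Note that switching the hash algorithm changes the digest length and value, so digests that were stored with the old algorithm will no longer compare equal. That is one reason a backup and a manual review are still worthwhile even for a "mechanical" fix.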
The bottleneck in mobile + cross-platform test automation isn't writing tests — it's the loop between a failing test and a passing test. test-genie closes that loop:
failing test → analyzer flags issue → fix proposed → dry-run + syntax check →
applied with backup → affected tests re-run → regression check → loop or stop
This full loop is the run_iterative_fix_loop tool. The diagnose_project autoFix: true path in v3.1.1 covers a strict subset — backup + dry-run + syntax-validate + apply, without re-running tests (so no test-regression rollback in that path). Use the right tool for the job — and see SAFETY.md for the exact guards on each.
Other tools (Detox, Maestro, Playwright, xcodebuild test) run tests. test-genie runs tests and drives the fix until the bar is met or it can no longer make progress — without you scrubbing through stack traces.
# 1. Install
npm install -g test-genie-mcp
# 2. Add to Claude Desktop config (~/.config/claude/claude_desktop_config.json)
{
"mcpServers": {
"test-genie": {
"command": "npx",
"args": ["test-genie-mcp"],
"env": {
"TEST_GENIE_ALLOWED_ROOT": "/path/to/your/project"
}
}
}
}
# 3. Restart Claude Desktop. From a chat:
# "Run the iterate-fix loop on /Users/me/my-rn-app with autoApply=false"
Expected output (truncated):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Iterative fix loop f8b3… — PAUSED-FOR-CONFIRMATION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Iterations completed: 1
Fixes applied: 0
Regressions rolled back: 0
Final tests: 7/10 passing (3 failing)
Pending confirmations (3):
- 71fbe…: Fix: useEffect missing cleanup for setInterval (confidence: 85)
- 92ad1…: Fix: Force-unwrap on possibly-undefined name (confidence: 85)
- …
Resume token: f8b3…
Re-call with autoApply: true (or resumeToken: "f8b3…") to actually patch the files.
The flows below describe the `run_iterative_fix_loop` path (v3.0 headline): full detect → propose → dry-run → apply-with-backup → re-run-tests → rollback-on-regression. The `diagnose_project autoFix` path in v3.1.1 is the narrower mechanical-replacement-only path; see SAFETY.md §4 for what that one actually touches.
A team adds setInterval(...) in a useEffect and forgets cleanup. test-genie's detect_memory_leaks flags it, suggest_fixes proposes return () => clearInterval(id) (src/tools/fixing/suggestFixes.ts:169-179), the loop dry-runs the patch through the TS compiler, applies with backup, re-runs only the affected snapshot test, confirms 100% pass, stops. Before: 1 failing snapshot. After: 0 failing, 1 fix applied, 1 backup at .test-genie-backups/.
dispose() automation: an AnimationController is left undisposed. test-genie sees the missing dispose() override, generates a Dart @override dispose() { controller.dispose(); super.dispose(); } block (suggestFixes.ts:214-217), runs dart analyze on the patched file, applies, re-runs flutter test, converges.
self.timer = Timer.scheduledTimer(...) { _ in self.tick() } — rule-based detector flags closure self-capture, fixer rewrites to [weak self] _ in guard let self = self else { return }; self.tick() (suggestFixes.ts:239-242). If swiftc is on PATH the syntax check is real; otherwise test-genie reports "downgraded validation" so you know.
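The first scenario above (a `setInterval` without cleanup) can be sketched outside React as a function that starts the timer and returns its own cleanup, which is the same contract a `useEffect` callback follows. `startTicker` is a hypothetical name used only for this illustration; it is not part of test-genie.

```typescript
// Effect-style contract: start the work, return a function that undoes it.
type Cleanup = () => void;

function startTicker(onTick: () => void, intervalMs: number): Cleanup {
  const id = setInterval(onTick, intervalMs);
  // Without this returned cleanup, the interval keeps firing after its
  // owner is gone, which is the leak the detector reports.
  return () => clearInterval(id);
}

// Inside a React component the same function could be used as:
//   useEffect(() => startTicker(tick, 1000), []);
```

Returning the cleanup from the same function that creates the timer keeps the interval id in one scope, so there is no separate state to forget about.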
┌────────────────────┐
│ collect tests │ (run_scenario_test / supplied list)
└─────────┬──────────┘
│
pass-rate ≥ threshold? ── yes ──▶ SUCCESS
│ no
▼
┌────────────────────┐
│ detect issues │ memory + logic analyzers
└─────────┬──────────┘
│
┌────────────────────┐
│ suggest fixes │ rule-based (default) → LLM (hybrid, optional)
└─────────┬──────────┘
│
┌────────────────────┐
│ dry-run + syntax │ TS compiler API / platform compiler / brace check
└─────────┬──────────┘
│
┌────────────────────┐
│ apply with backup │ per-file `.test-genie-backups/`
└─────────┬──────────┘
│
┌────────────────────┐
│ re-run tests │ regression? yes → auto-rollback
└─────────┬──────────┘
│
▼
loop (≤ maxIterations, ≤ totalTimeout)
See docs/ITERATE_FIX_LOOP.md for a sequence diagram and the full safety-guard list.
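The control flow in the diagram can be sketched as a small loop. This is a simplified illustration under stated assumptions, not the project's implementation: every name here (`LoopDeps`, `iterateFixLoop`, the status strings) is hypothetical, the dry-run/syntax step and the total timeout are left out, and "regression" is assumed to mean a lower pass rate than before the fix.

```typescript
// Hypothetical dependencies, one per box in the diagram.
interface LoopDeps {
  runTests: () => number;           // pass rate in [0, 1]
  proposeFix: () => string | null;  // fix id, or null when nothing is left to try
  applyWithBackup: (fixId: string) => void;
  rollback: (fixId: string) => void;
}

interface LoopResult {
  status: "success" | "stalled" | "max-iterations";
  iterations: number;
  applied: string[];
  rolledBack: string[];
}

function iterateFixLoop(deps: LoopDeps, threshold: number, maxIterations: number): LoopResult {
  const applied: string[] = [];
  const rolledBack: string[] = [];
  let passRate = deps.runTests();
  let iterations = 0;

  while (iterations < maxIterations) {
    if (passRate >= threshold) return { status: "success", iterations, applied, rolledBack };
    const fixId = deps.proposeFix();
    if (fixId === null) return { status: "stalled", iterations, applied, rolledBack };

    iterations += 1;
    deps.applyWithBackup(fixId);
    const after = deps.runTests();
    if (after < passRate) {
      // Regression: restore the backup and keep the previous pass rate.
      deps.rollback(fixId);
      rolledBack.push(fixId);
    } else {
      applied.push(fixId);
      passRate = after;
    }
  }

  const status = passRate >= threshold ? "success" : "max-iterations";
  return { status, iterations, applied, rolledBack };
}
```

Passing the test runner and fixer in as functions keeps the loop itself free of I/O, which makes the stop conditions (threshold met, no more proposals, iteration cap) easy to check in isolation.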
| # | Tool | Mode |
|---|---|---|
| 1 | analyze_app_structure | real |
| 2 | generate_scenarios | real |
| 3 | create_test_plan | real |
| 4 | run_scenario_test | hybrid |
| 5 | run_simulation | simulated |
| 6 | run_stress_test | hybrid |
| 7 | detect_memory_leaks | real |
| 8 | detect_logic_errors | real |
| 9 | suggest_fixes | real |
| 10 | confirm_fix | real |
| 11 | apply_fix | real |
| 12 | rollback_fix | real |
| 13 | run_full_automation | hybrid |
| 14 | run_iterative_fix_loop (v3.0 headline) | hybrid |
| 15 | generate_report | real |
| 16 | get_pending_fixes | real |
| 17 | get_test_history | real |
| 18 | analyze_performance | real |
| 19 | analyze_code_deep | real |
| 20 | generate_cicd_config | real |
| 21 | diagnose_project (v3.1 headline — vibe-check) | real |
| 22 | detect_race_conditions | real |
| 23 | detect_security_issues | real |
Mode legend in docs/SIMULATION_VS_REAL.md.
Plus 4 resources (test-genie://iteration-logs, …/test-history/{path}, …/iteration-logs/{loopId}, …/applied-fixes/{path}) and 3 prompts (full-test-pipeline, diagnose-failure, vibe-check).
Race conditions (detect_race_conditions / diagnose_project):
| Pattern | Language | Severity | Auto-fixable (v3.1.1) |
|---|---|---|---|
| `useState` setter called after `await` without mount guard | TS/JS/React | high | no (structural) |
| `useEffect` with async fetch, no `AbortController`/cleanup | TS/JS/React | high | no (structural) |
| `arr.forEach(async ...)` (silent fire-and-forget) | TS/JS | medium | no (ordering-sensitive) |
| Adjacent fetches without `Promise.all` / sequencing | TS/JS | medium | no |
| TOCTOU: `existsSync` then `readFileSync` without lock | TS/JS Node | medium | no |
| Non-atomic counter increment in async context | TS/JS | low | no |
| `@Published` mutation outside `@MainActor` | Swift | medium | no |
| Concurrent `DispatchQueue` writes without `.barrier` | Swift | medium | no |
| `MutableStateFlow` mutated off `Dispatchers.Main` | Kotlin | medium | no |
| `Flow` collected without `flowOn` | Kotlin | low | no |
| Goroutine + shared map without `sync.Mutex` | Go | high | no |
v3.1.1 honesty audit:
`useEffect-no-abort` and `forEach-await` were previously advertised as auto-fixable. They are not: wrapping with `AbortController` or rewriting to `Promise.all(arr.map(...))` changes behavior we can't verify statically. They are now report-only. See SAFETY.md.
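The `forEach(async ...)` case shows why such a rewrite is not purely mechanical. In the sketch below (hypothetical function names, with a short timer standing in for real async work), the two versions record completion in a different order, so any caller that relied on the original timing would behave differently after the rewrite.

```typescript
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// forEach ignores the promises returned by the async callback,
// so this function returns before any item has finished.
function fireAndForget(items: number[], log: string[]): void {
  items.forEach(async (n) => {
    await delay(1);
    log.push(`done ${n}`);
  });
  log.push("returned"); // recorded before any "done" entry
}

// Promise.all over map waits for every callback before continuing.
async function awaited(items: number[], log: string[]): Promise<void> {
  await Promise.all(
    items.map(async (n) => {
      await delay(1);
      log.push(`done ${n}`);
    }),
  );
  log.push("returned"); // recorded after every "done" entry
}
```

The second version also surfaces rejections to the caller, whereas the first turns them into unhandled rejections. Both differences are behavioral, which matches the decision above to keep this pattern report-only.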
Security (detect_security_issues / diagnose_project):
| Pattern | Severity | CWE | Auto-fixable (v3.1.1) |
|---|---|---|---|
| Hardcoded AWS / Stripe / GitHub / Google / Slack token | critical / high | CWE-798 | no (rotate) |
| Hardcoded JWT secret literal | high | CWE-798 | no |
| API token in URL query string | high | CWE-200 | no |
| `.env` file present but not gitignored | high | CWE-538 | no (rotation must follow) |
| SQL string concat with `req.params` / `req.body` | critical | CWE-89 | no |
| `innerHTML` / `dangerouslySetInnerHTML` with dynamic value | high | CWE-79 | no |
| `eval()` / `new Function()` with non-literal | critical | CWE-95 | no |
| `Math.random()` in security-sensitive file, standalone assignment | high | CWE-338 | yes (`crypto.randomInt`) |
| `Math.random()` mixed into arithmetic | high | CWE-338 | no (semantic) |
| `createHash('md5'\|'sha1')` in security-keyword file | high | CWE-327 | yes (`'sha256'`) |
| `createHash('md5'\|'sha1')` elsewhere | medium | CWE-327 | no (below severity floor) |
| `child_process.exec` with user-input template literal | critical | CWE-78 | no |
| `fetch(req.query.url)` (SSRF) | high | CWE-918 | no |
| CORS `*` origin + `Allow-Credentials: true` | high | CWE-942 | no |
| Cookie set without `httpOnly` / `secure` / `sameSite` | low | CWE-1004 | no |
| `yaml.load` without safe schema | medium | CWE-502 | no |
v3.1.1 honesty audit:
`.env` / `Math.random` (general) / `yaml.load` were previously advertised as auto-fixable. They were either too risky to rewrite blindly or no strategy shipped, so they were flipped to report-only. See SAFETY.md §5.
This is a "catch the obvious stuff in 30s" filter, not Snyk / Semgrep / a full SAST tool. We don't catch:
- … `db.query`, the regex won't connect the dots. A real SAST traces taint across the call graph. Roadmap: ts-morph reference walking for top-N entry points.
- … `npm audit`'s job, and bundling a stale advisory list would lie. Run `npm audit --json` in parallel if you want dep-CVE coverage.
- … `Math.random()` named `getNonce` won't fool us; a properly-named `crypto.randomBytes` used with a tiny entropy budget will.
- … `securityAnalyzer.SECRET_PATTERNS`. PR welcome.
- … `run_stress_test` / `run_simulation`, not static analysis.

If you want deeper coverage on top of vibe-check: feed the findings into `run_iterative_fix_loop` for test-verified application, or escalate to Snyk / Semgrep / GitHub Advanced Security for compliance use cases.
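To illustrate what a line-oriented pattern filter can and cannot see, here is a minimal sketch. The rule names and regular expressions are assumptions made up for this example; they are not the contents of `securityAnalyzer.SECRET_PATTERNS`. The AWS rule relies on the commonly used convention that access key ids look like `AKIA` followed by 16 uppercase letters or digits.

```typescript
interface Finding {
  rule: string;
  line: number;
}

// Hypothetical rules for illustration only.
const RULES: { rule: string; pattern: RegExp }[] = [
  { rule: "hardcoded-aws-access-key-id", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  {
    // A SQL keyword followed, on the same line, by a string concatenated
    // with request input.
    rule: "sql-string-concat",
    pattern: /\b(SELECT|INSERT|UPDATE|DELETE)\b.*["'`]\s*\+\s*req\.(params|body|query)/i,
  },
];

// Test every line against every rule and report 1-based line numbers.
function scan(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { rule, pattern } of RULES) {
      if (pattern.test(text)) findings.push({ rule, line: index + 1 });
    }
  });
  return findings;
}
```

A scanner of this shape flags a key literal or a concatenated query when the whole pattern sits on one line. If the request value first passes through a helper on another line, no single line matches, which is the gap that call-graph taint tracking in a full SAST tool is meant to close.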
| | vibe-check (test-genie) | Snyk | Semgrep | GitHub Advanced Security |
|---|---|---|---|---|
| Runs locally | yes | hybrid (cloud) | yes | no (cloud) |
| Telemetry-free | yes (zero network calls) | no | partial | no |
| Fix loop integration | yes (run_iterative_fix_loop) | no | no | no |
| Race-condition detection | yes (JS/Swift/Kotlin/Go) | no | partial | partial |
| Cross-file taint flow | no (roadmap) | yes | yes | yes |
| Setup time | none (already installed if test-genie is installed) | account + auth | install + ruleset | repo-level enable |
If your goal is "before I commit, what's broken?", vibe-check wins on latency. If your goal is "compliance + supply chain audit", use the dedicated tools.
- … `autoApply: false` (the default) and use it as a fix-proposal generator only.

| | test-genie | Detox | Maestro | xcodebuild test |
|---|---|---|---|---|
| Runs E2E / unit tests | ✅ (via Jest/Detox/etc.) | ✅ | ✅ | ✅ |
| Detects code issues | ✅ rule + LLM | ❌ | ❌ | ❌ |
| Iterative fix loop | ✅ (run_iterative_fix_loop) | ❌ | ❌ | ❌ |
| Auto-rollback on test regression | ✅ inside run_iterative_fix_loop only | ❌ | ❌ | ❌ |
| Auto-rollback on syntax failure | ✅ all apply paths | ❌ | ❌ | ❌ |
| MCP-native (talks to Claude / agents) | ✅ | ❌ | ❌ | ❌ |
| Multi-platform | iOS+Android+Web+Flutter+RN | iOS+Android | iOS+Android | iOS only |
Scope note:
`diagnose_project autoFix: true` rolls back on syntax-validate failure (applyFix.ts:185-202) but does not re-run tests, so it cannot detect test regressions. For test-driven rollback use `run_iterative_fix_loop`. See SAFETY.md §2.4.
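The syntax-only rollback described in the scope note can be sketched as follows. This is an illustration under assumptions, not the code in applyFix.ts: the file system is replaced by an in-memory `Map`, all names are hypothetical, and the validator is a plain bracket-balance check of the kind the loop diagram lists as the "brace check" fallback.

```typescript
// In-memory stand-in for the file system so the sketch runs anywhere.
type Files = Map<string, string>;

// Fallback-style check: every closing bracket must match the most recent
// unclosed opener. Strings and comments are not skipped, so this is far
// weaker than a real compiler pass.
function bracketsBalanced(code: string): boolean {
  const closers = new Map<string, string>([[")", "("], ["]", "["], ["}", "{"]]);
  const stack: string[] = [];
  for (const ch of code) {
    if (ch === "(" || ch === "[" || ch === "{") {
      stack.push(ch);
    } else {
      const expected = closers.get(ch);
      if (expected !== undefined && stack.pop() !== expected) return false;
    }
  }
  return stack.length === 0;
}

function applyWithBackup(
  files: Files,
  path: string,
  patched: string,
  validate: (code: string) => boolean,
): { applied: boolean; rolledBack: boolean } {
  const backup = files.get(path);
  if (backup === undefined) return { applied: false, rolledBack: false };
  files.set(path, patched);
  if (!validate(patched)) {
    files.set(path, backup); // restore the original content
    return { applied: false, rolledBack: true };
  }
  return { applied: true, rolledBack: false };
}
```

Nothing in this path runs tests, so a patch that is syntactically valid but behaviorally wrong is kept. That is the gap the test-driven loop covers.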
test-genie uses tools like Jest, Detox, and xcodebuild test under the hood — it sits at the orchestration layer, not the test-runner layer.
- … -typecheck mode. If the compiler isn't on PATH, we fall back to brace-balance validation and surface `downgraded: true` in the result. Install `swiftc` / `kotlinc` / `javac` / `dart` for real validation.
- … `strategy: 'hybrid'` only kicks LLM in when rule-based confidence is below threshold. Without an API key the loop is rule-based-only, with no failure.
- … `$TEST_GENIE_STORAGE_DIR` (defaults to `~/.test-genie-mcp`). Not synced across machines.
- … `run_simulation` returns plausible anomalies, not real ones. Use `run_scenario_test` (hybrid) for real-device runs.

| Env var | Default | Purpose |
|---|---|---|
| `TEST_GENIE_ALLOWED_ROOT` | cwd | Capability-based path safety: the server refuses to read/write outside this root. |
| `TEST_GENIE_STORAGE_DIR` | `~/.test-genie-mcp` | Where scenarios / results / iteration logs live. |
| `TEST_GENIE_LLM_PROVIDER` | auto-detect | `anthropic` / `openai` / `none`. |
| `ANTHROPIC_API_KEY` | (none) | Used when provider = anthropic. |
| `OPENAI_API_KEY` | (none) | Used when provider = openai. |
| `TEST_GENIE_ANTHROPIC_MODEL` | `claude-haiku-4-5` | Override Anthropic model. |
| `TEST_GENIE_OPENAI_MODEL` | `gpt-4o-mini` | Override OpenAI model. |
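The path-safety idea behind `TEST_GENIE_ALLOWED_ROOT` can be sketched with Node's `path` module. `isInsideRoot` is a hypothetical helper written for this example; how the server actually enforces the root is not shown in this README, and this sketch does not resolve symlinks.

```typescript
import * as path from "node:path";

// Returns true only when `target` resolves to the root or somewhere below it.
function isInsideRoot(root: string, target: string): boolean {
  const resolvedRoot = path.resolve(root);
  const resolvedTarget = path.resolve(resolvedRoot, target);
  const relative = path.relative(resolvedRoot, resolvedTarget);
  if (relative === "") return true; // the root itself
  // A leading ".." segment means the target climbed out of the root.
  const escapes = relative === ".." || relative.startsWith(".." + path.sep);
  // An absolute result can occur on Windows when the drives differ.
  return !escapes && !path.isAbsolute(relative);
}

// Possible usage, following the default in the table above:
//   const root = process.env.TEST_GENIE_ALLOWED_ROOT ?? process.cwd();
//   if (!isInsideRoot(root, requestedPath)) throw new Error("outside allowed root");
```

Comparing the relative path, instead of checking whether one string starts with the other, avoids treating a sibling directory such as `/srv/app-other` as if it were inside `/srv/app`.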
`run_full_automation` still works. The `confirmMode` / `autoFix` options are kept for compatibility, but `autoApply: boolean` is the new way: `autoApply: true` is equivalent to `confirmMode: 'auto'`.

Issues, PRs, and ideas welcome; see CONTRIBUTING.md (TODO). Code lives under src/, tests under tests/. Run `npm test` before sending a PR.
@MUSE-CODE-SPACE — Yoonkyoung Gong.
MIT — see LICENSE.