Server data from the Official MCP Registry
Local MCP voice coach with English pronunciation, grammar, and fluency feedback.
Valid MCP server (1 strong, 1 medium validity signals). No known CVEs in dependencies. Package registry verified. Imported from the Official MCP Registry.
4 files analyzed · 1 issue found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
This server requests system permissions that are normal for its category, such as microphone access for voice recording.
Set these up before or after installing:
Environment variable: MCP_PRONUNCIATION_MODEL
Environment variable: HF_HUB_CACHE
Add this to your MCP configuration file:
{
  "mcpServers": {
    "io-github-juhongpark-pronunciation": {
      "command": "uvx",
      "args": ["mcp-server-pronunciation"],
      "env": {
        "HF_HUB_CACHE": "your-hf-hub-cache-here",
        "MCP_PRONUNCIATION_MODEL": "your-mcp-pronunciation-model-here"
      }
    }
  }
}

From the project's GitHub README.
Public beta notice
This project is an early beta and is still under active development. It may contain bugs, runtime errors, inaccurate transcripts, inaccurate pronunciation feedback, or platform-specific recording issues. Use it for experimentation and language-learning practice only, and review outputs carefully before relying on them. See DISCLAIMER.md.
An MCP (Model Context Protocol) server that lets you talk to Claude by voice while getting English pronunciation, grammar, and fluency feedback in the same turn. Use it for casual voice chat with light coaching, or switch to drill mode when you want to practice a specific sentence.
Built for Claude Desktop, Claude Code, and any other MCP client. Everything runs locally — audio is captured with your mic, transcribed by faster-whisper on-device, and never leaves your machine.
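To make "everything runs locally" concrete, here is a minimal sketch of that capture-and-transcribe loop using the same libraries the server builds on (sounddevice and faster-whisper). This is an illustration of the pipeline, not the server's actual code:

```python
import sounddevice as sd
from faster_whisper import WhisperModel

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio

# Record five seconds from the default input device
audio = sd.rec(int(5 * SAMPLE_RATE), samplerate=SAMPLE_RATE,
               channels=1, dtype="float32")
sd.wait()  # block until recording finishes

# Transcribe on-device; int8 quantization keeps CPU memory use modest
model = WhisperModel("base.en", device="cpu", compute_type="int8")
segments, _info = model.transcribe(audio.flatten())
print(" ".join(seg.text.strip() for seg in segments))
```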
mcp-name: io.github.JuhongPark/pronunciation
Voice MCP servers today treat speech as a typing replacement. English tutor MCP servers are text-only. This one combines the two: you speak freely, Claude replies, and feedback on what you just said (pronunciation, grammar, fluency) surfaces inside the same tool call so Claude can weave it into a natural reply — or stay out of the way when you're just chatting.
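Concretely, a single converse call returns the transcript together with coaching material Claude can draw on. The shape below is purely hypothetical, with invented field names, but it illustrates why the reply and the coaching can land in one turn:

```python
# Hypothetical result shape; field names are invented for illustration only.
converse_result = {
    "transcript": "I buyed three apples at the market.",
    "feedback": [
        {"kind": "grammar", "heard": "buyed", "suggestion": "bought"},
    ],
    "for_claude": "Reply to the content first; weave in at most one correction.",
}
```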
- [phoneme] extra: wav2vec2 CTC forced alignment verifies whether the user actually produced each reference word, so rare proper nouns and domain-specific terms that Whisper rewrites toward more common alternatives no longer surface as mispronunciations.
- Drill tools (practice, quick_practice, retry) for focused sentence practice.
- On-device transcription with faster-whisper (default model base.en).
- Optional [phoneme] extra (wav2vec2 weights for forced alignment).
- Implements MCP protocol revision 2025-06-18 via the official Python SDK (mcp>=1.2).

Beta releases are pre-releases. Install the current beta explicitly:
uvx mcp-server-pronunciation@0.3.0b3
For pip users:
pip install --pre mcp-server-pronunciation
Run doctor before relying on the beta in a live session:
mcp-server-pronunciation doctor
# Recommended: uvx (no global install, cached between runs)
uvx mcp-server-pronunciation
# Or install as a uv tool
uv tool install mcp-server-pronunciation
# Or pip
pip install mcp-server-pronunciation
# Optional: forced-alignment upgrade for Whisper-bias mitigation + tighter
# phoneme-level feedback. Adds ~200 MB of torch CPU wheels.
pip install 'mcp-server-pronunciation[phoneme]'
sounddevice ships PortAudio inside the wheel on macOS and Windows, but on Linux you need the system library:
# Debian / Ubuntu
sudo apt-get install libportaudio2
# Fedora / RHEL
sudo dnf install portaudio
# Arch
sudo pacman -S portaudio
# PipeWire-only systems may also need
sudo apt-get install pipewire-alsa
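After installing the system package, you can confirm that sounddevice can see PortAudio and an input device directly from Python; the doctor preflight below performs a fuller version of this check:

```python
import sounddevice as sd

# Raises if PortAudio is missing; otherwise prints all audio devices.
print(sd.query_devices())
print("default input/output device indices:", sd.default.device)
```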
Before wiring the server into Claude, run the preflight:
uvx mcp-server-pronunciation doctor
Optional — pre-download the Whisper model (~150 MB) so the first call is instant:
uvx mcp-server-pronunciation pull-model base.en
For Claude Code:

claude mcp add pronunciation -- uvx mcp-server-pronunciation
For Claude Desktop, edit claude_desktop_config.json:
{
  "mcpServers": {
    "pronunciation": {
      "command": "uvx",
      "args": ["mcp-server-pronunciation"]
    }
  }
}
On macOS, if Claude Desktop can't find uvx (spawn uvx ENOENT), use an absolute path. Find it with which uvx in your terminal.
For Cursor, add to ~/.cursor/mcp.json:
{
  "mcpServers": {
    "pronunciation": {
      "command": "uvx",
      "args": ["mcp-server-pronunciation"]
    }
  }
}
For VS Code, add to .vscode/mcp.json or your user settings:
{
  "servers": {
    "pronunciation": {
      "type": "stdio",
      "command": "uvx",
      "args": ["mcp-server-pronunciation"]
    }
  }
}
You: "Let's have a voice chat. I'll ask you about the weekend. Use the converse tool."
Claude (calls converse): records your speech, transcribes it, notes that you said "buyed" instead of "bought".
Claude: "Oh nice — what kind of apples did you buy? And by the way, the past tense of 'buy' is 'bought' — small thing, but I noticed it."
You: "Give me a sentence to practice with 'th' sounds."
Claude (calls suggest_sentence with focus=th): "Try this: The three brothers thought thoroughly about their future."
You: "Record me reading it."
Claude (calls practice with that reference): returns an alignment table (match / sub / ins / del) with per-word acoustic confidence when the [phoneme] extra is installed, phoneme-level issues with expected vs produced IPA, learner-profile hints when applicable, minimal-pair drills, and prosody notes (word stress, final-rise intonation, intra-clause pauses).
You: "Let me try again."
Claude (calls retry): re-records the same target sentence and compares the attempts.
| Tool | Purpose |
|---|---|
| converse | Primary. Record + transcribe + quick feedback + "For Claude" guidance for natural voice-chat-with-coaching. |
| practice | Drill mode: record the user reading a specific reference sentence, return a detailed assessment. |
| quick_practice | Pick a random sentence (by phoneme focus + difficulty) and drill it. |
| retry | Re-record the last sentence the user was practicing. |
| suggest_sentence | Return a practice sentence without recording. |
| record | Record audio and save a WAV file (raw, no analysis). |
| assess | Assess the last recording (or a specified WAV) without re-recording. When given a reference, runs the full drill pipeline (alignment, phoneme diff, learner-profile hints, prosody). |
| check_mic | List available audio input devices. |
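Beyond Claude clients, the tools can be exercised from a short script using the official MCP Python SDK, which is useful for smoke-testing. A sketch, assuming the focus parameter shown in the example dialogue above (check the published tool schema for exact parameter names):

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    server = StdioServerParameters(command="uvx", args=["mcp-server-pronunciation"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])  # converse, practice, ...
            result = await session.call_tool("suggest_sentence", {"focus": "th"})
            print(result.content)

asyncio.run(main())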
Set MCP_PRONUNCIATION_MODEL to pick a different model size:
# Default — fast, English-only (~150 MB)
export MCP_PRONUNCIATION_MODEL=base.en
# Smaller / faster (~75 MB)
export MCP_PRONUNCIATION_MODEL=tiny.en
# More accurate (~470 MB)
export MCP_PRONUNCIATION_MODEL=small.en
# Multilingual options (larger)
export MCP_PRONUNCIATION_MODEL=small
export MCP_PRONUNCIATION_MODEL=medium
Available: tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v3, large-v3-turbo. For English-only use, the .en variants are faster and more accurate at a given size.
GPU (CUDA 12 + cuDNN 9) is auto-detected when available; otherwise runs on CPU with int8 quantization.
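A sketch of how that device selection can be written with faster-whisper and its ctranslate2 backend; the server's actual logic may differ:

```python
import ctranslate2
from faster_whisper import WhisperModel

# Pick CUDA when a device is visible, else quantized CPU inference.
if ctranslate2.get_cuda_device_count() > 0:
    model = WhisperModel("base.en", device="cuda", compute_type="float16")
else:
    model = WhisperModel("base.en", device="cpu", compute_type="int8")
```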
By default Whisper weights are cached in ~/.cache/huggingface/hub/. Override with HF_HUB_CACHE:
export HF_HUB_CACHE=/path/to/cache
Recordings are written as temporary WAV files so assess can inspect the last
recording. By default they are removed when the server process exits:
export MCP_PRONUNCIATION_AUDIO_RETENTION=session
Set MCP_PRONUNCIATION_AUDIO_RETENTION=keep if you want temporary recordings
to remain on disk for manual inspection.
claude mcp add pronunciation -e MCP_PRONUNCIATION_MODEL=small.en -- uvx mcp-server-pronunciation
Installing mcp-server-pronunciation[phoneme] enables wav2vec2-based CTC forced alignment. It verifies which reference words the user acoustically produced, regardless of how Whisper's language-model-weighted decoder rewrote them — so rare proper nouns and domain terms no longer surface as false mispronunciations. On first run the extra downloads ~360 MB of weights into ~/.cache/torch/hub/ (override via TORCH_HOME). Inference is CPU-only by default and runtime-quantized to int8 (~95 MB RAM).
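If you are curious how CTC forced alignment works under the hood, the sketch below uses torchaudio's built-in aligner with a public wav2vec2 checkpoint. It is illustrative only; the server's checkpoint and label handling may differ:

```python
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
model = bundle.get_model()
labels = bundle.get_labels()            # CTC label set; index 0 is the blank
dictionary = {c: i for i, c in enumerate(labels)}

waveform, sr = torchaudio.load("recording.wav")  # assumes a mono recording
if sr != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    emission, _ = model(waveform)       # frame-level logits over CTC labels

# Force-align the reference text ('|' is the word separator in this label set)
transcript = "THE|THREE|BROTHERS"
targets = torch.tensor([[dictionary[c] for c in transcript]])
frames, scores = torchaudio.functional.forced_align(
    emission.log_softmax(dim=-1), targets, blank=0
)
# Low per-token scores flag reference words the speaker did not clearly produce.
```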
Without the extra, assess / practice still run the full pipeline except for the forced-alignment step: you get Needleman-Wunsch word alignment against the Whisper hypothesis, CMUdict phoneme-sequence diff, learner-profile hints, and prosody.
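The word alignment itself is classic Needleman-Wunsch. A minimal sketch that produces the match / sub / ins / del labels mentioned above (not the server's implementation):

```python
def align_words(ref, hyp, match=1, mismatch=-1, gap=-1):
    """Global alignment of two word lists; returns (op, ref_word, hyp_word) tuples."""
    n, m = len(ref), len(hyp)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if ref[i - 1] == hyp[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    ops, i, j = [], n, m  # trace back from the bottom-right corner
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if ref[i - 1] == hyp[j - 1] else mismatch
        ):
            op = "match" if ref[i - 1] == hyp[j - 1] else "sub"
            ops.append((op, ref[i - 1], hyp[j - 1])); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ops.append(("del", ref[i - 1], None)); i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1])); j -= 1
    return list(reversed(ops))

print(align_words("the three brothers".split(), "the tree brothers".split()))
# [('match', 'the', 'the'), ('sub', 'three', 'tree'), ('match', 'brothers', 'brothers')]
```

The same routine applied to CMUdict phoneme sequences yields the phoneme-level diff.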
| Platform | Recording method | Status |
|---|---|---|
| macOS | sounddevice (bundled PortAudio) | Supported |
| Linux | sounddevice (needs libportaudio2) | Supported |
| Windows | sounddevice (bundled PortAudio) | Supported |
| WSL2 | PowerShell MCI (winmm.dll) | Supported |
WSL2 note: WSLg's PulseAudio does not forward microphone audio from the Windows host. This server detects WSL2 automatically and records through PowerShell on the Windows side instead.
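WSL2 detection of this kind is commonly done by checking the kernel release string, which embeds "microsoft". A sketch of that common approach (the server's own detection logic may differ):

```python
import platform

def running_under_wsl() -> bool:
    # WSL kernels report releases like "5.15.153.1-microsoft-standard-WSL2"
    return "microsoft" in platform.uname().release.lower()
```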
uvx mcp-server-pronunciation doctor is your first stop. It reports on PortAudio, input devices, Whisper model cache, pronunciation resources, optional forced-alignment dependencies, free disk space, and Python version. Run it whenever something feels off.
sounddevice import fails on Linux: you're missing libportaudio2. See the install section above. After installing:
uvx mcp-server-pronunciation doctor
Recordings come back silent: check input levels with pavucontrol (PulseAudio) or pw-cli list-objects (PipeWire). On PipeWire-only systems, install pipewire-alsa.
Slow first start: the Whisper model downloads on first use (~150 MB for base.en). Pre-download it once:
uvx mcp-server-pronunciation pull-model base.en
Subsequent runs reuse the cached weights. If startup still feels slow, try MCP_PRONUNCIATION_MODEL=tiny.en.
spawn uvx ENOENT: Claude Desktop launches MCP servers from a GUI-only environment without ~/.local/bin on PATH. Use the absolute path to uvx in your config (/Users/YOU/.local/bin/uvx or wherever which uvx reports).
Known limitations:

- The [phoneme] extra reduces some reference-sentence false positives but does not eliminate them.
- Run doctor and pull-model before relying on the server in a live session.
- Recordings are temporary by default; set MCP_PRONUNCIATION_AUDIO_RETENTION=keep if you want to inspect them later.

This project is moving toward benchmark-backed scoring. Planned public benchmark work is tracked in ROADMAP.md, the testing methodology lives in docs/TESTING.md, and the current benchmark helper docs live in docs/BENCHMARKS.md. The primary candidate is Speechocean762 because it has a permissive CC BY 4.0 license and multi-level expert pronunciation scores. L2-ARCTIC is useful for phone-error and learner-profile research checks, including Korean-L1 subset review, but its non-commercial license means it should remain optional and separate from default release claims.
The source repository is public. PyPI, GitHub Release, and MCP Registry publication steps are tracked in docs/PUBLICATION.md.
Recordings are stored as .wav files under your system temp directory ($TMPDIR) and are removed when the server exits unless MCP_PRONUNCIATION_AUDIO_RETENTION=keep is set. When the [phoneme] extra is installed, the wav2vec2 forced aligner also runs locally; weights are downloaded once from the PyTorch Hub.

To set up a development environment:

git clone https://github.com/JuhongPark/mcp-server-pronunciation.git
cd mcp-server-pronunciation
uv sync --extra dev
uv run pytest -v
uv run ruff check .
uv run ruff format --check .
To work on the optional wav2vec2 forced-alignment path, install the phoneme extra as well:
uv sync --extra dev --extra phoneme
Issues: https://github.com/JuhongPark/mcp-server-pronunciation/issues
MIT. See LICENSE.
Third-party components (all MIT / permissive):
- faster-whisper — MIT
- sounddevice — MIT
- cmudict — BSD
- g2p-en — Apache 2.0
- librosa — ISC
- With the [phoneme] extra: PyTorch — BSD, torchaudio — BSD, wav2vec2 weights — MIT