Server data from the Official MCP Registry
Pseudonymizes sensitive data before it reaches cloud LLMs and restores it on the way back.
Valid MCP server (2 strong, 4 medium validity signals). No known CVEs in dependencies. Package registry verified. Imported from the Official MCP Registry. Trust signals: trusted author (3/3 approved).
5 files analyzed · 1 issue found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
Add this to your MCP configuration file:
{
"mcpServers": {
"io-github-woladi-pseudonym-mcp": {
"args": [
"-y",
"pseudonym-mcp"
],
"command": "npx"
}
}
}
From the project's GitHub README.
Local pseudonymisation tools for LLM workflows — replace detected PII with opaque tokens before you hand text to a cloud LLM, then restore those tokens afterward.
pseudonym-mcp exposes two MCP tools, mask_text and unmask_text, that your client or agent can call as an explicit privacy step. The server detects PII locally, replaces it with opaque tokens, and keeps the token mapping in memory for later restoration.
It is a defense-in-depth measure, not a compliance silver bullet. Read the Limitations and GDPR & AI Compliance sections before assuming this stack does more than it does.
detectLanguage()) infers the language from text content; --lang remains the authoritative override but is no longer the only input.
[PERSON:1] map back to originals in an isolated, per-request session. Multiple round-trips preserve token coherence.
mask_text returns auto_unmask for clients that want to honor that preference, but this server does not intercept arbitrary LLM responses automatically.
regex only (no Ollama required), llm only, or hybrid (default).
❌ Without pseudonym-mcp:
"John Smith, SSN 123-45-6789, card 4111 1111 1111 1111" → sent verbatim to the LLM provider
✅ With pseudonym-mcp used before the cloud call:
"[PERSON:1], SSN [SSN:1], card [CREDIT_CARD:1]" when you call mask_text first
unmask_text before reaching the user
This is a meaningful reduction in cleartext PII exposure. It is not "no personal data leaves your machine"; see Limitations.
pseudonym-mcp is relevant to compliance work, but it is a technical control, not a compliance product. Whether you are compliant with any specific regulation depends on your full stack, your role (controller/processor), your contracts, your DPIA, and your jurisdiction.
The EU General Data Protection Regulation (GDPR) classifies names, national ID numbers (like SSN or PESEL), bank account numbers (IBAN), email addresses, credit card numbers, and phone numbers as personal data under Article 4(1). Sending this data to a cloud LLM provider constitutes processing under Article 4(2). Pseudonymisation is explicitly recognised under Art. 4(5) as a risk-reduction measure — but, critically, pseudonymised data is still personal data (Recital 26).
| GDPR Article | Obligation | Where pseudonym-mcp helps | Where it doesn't |
|---|---|---|---|
| Art. 5(1)(c) | Data minimisation | Strips detected direct identifiers before transmission | Doesn't minimise context, structure, or undetected PII |
| Art. 25 | Privacy by design and by default | Provides a technical layer that fits into a privacy-by-design architecture | Architecture and policy decisions are still your responsibility |
| Art. 32 | Security of processing | Pseudonymisation is named as a technical measure in Art. 32(1)(a) | One control among many; doesn't replace access control, logging, encryption |
| Art. 44 | Transfers to third countries | Reduces the cleartext PII you transfer | Pseudonymised personal data is still personal data — transfer rules still apply |
| Art. 4(5) | Pseudonymisation definition | The mapping store is opaque to the cloud LLM; re-identification requires the local session | Re-identification is possible from context for anyone with side knowledge |
The honest bottom line: pseudonymisation under GDPR Art. 4(5) is not anonymisation. The data remains personal data in your system, and Art. 44 transfer obligations are not switched off just because you tokenised the name field.
The EU AI Act places additional requirements on high-risk AI systems that process personal data. Using pseudonym-mcp as an intermediary layer can:
It does not change your AI Act risk classification on its own — classification is a function of use-case and deployment context, not of the masking step in front of the model.
The tool is also relevant outside the EU, with the same caveats:
| Sector | Relevant regulation | PII types commonly handled |
|---|---|---|
| Healthcare | GDPR + HIPAA + national health data laws | Patient names, SSN, diagnoses |
| Banking & Finance | GDPR + PCI DSS + PSD2 + DORA | Credit cards, IBAN, SSN, PESEL |
| HR & Recruitment | GDPR Art. 9 (special categories) | Names, national IDs, contact details |
| Legal | GDPR + attorney–client privilege | Names, case numbers, personal details |
| Insurance | GDPR + Solvency II | Personal identifiers, health data |
| Public Sector (US) | CCPA + state privacy laws | SSN, driver's license numbers |
| Public Sector (PL) | GDPR + UODO + KRI | PESEL, NIP, REGON |
In every row of this table, pseudonym-mcp is a useful building block. None of those regimes can be satisfied by a masking tool alone.
Your App / Claude Desktop
│
│ explicit mask_text tool call with PII
▼
┌─────────────────────────┐
│ pseudonym-mcp │
│ │
│ Phase 1: Regex NER │ ← SSN, CREDIT_CARD, EMAIL, PHONE (en)
│ │ ← PESEL, IBAN, EMAIL, PHONE, NIP (pl)
│ Phase 2: Ollama NER │ ← PERSON, ORG (local LLM)
│ MappingStore (session) │ ← [TAG:N] ↔ original value
└────────────┬────────────┘
│ masked text returned to the client/agent
▼
Your workflow sends the masked text
▼
Cloud LLM API
(Claude / GPT-4 / Gemini)
│
│ response with [TAG:N] tokens
▼
┌─────────────────────────┐
│ pseudonym-mcp │
│ unmask_text / revert │ ← tokens → originals
└────────────┬────────────┘
│ restored response
▼
Your App / User
English (--lang en, default):
[PERSON:1] John Smith
[SSN:1] 123-45-6789
[CREDIT_CARD:1] 4111 1111 1111 1111
[ORG:1] Acme Corp
[EMAIL:1] john@acme.com
[PHONE:1] (555) 123-4567
Polish (--lang pl):
[PERSON:1] Jan Kowalski
[PESEL:1] 90010112318
[ORG:1] Auto-Lux
[IBAN:1] PL27114020040000300201355387
[EMAIL:1] jan@example.pl
[PHONE:1] +48 123 456 789
The mapping is stored in a session-scoped in-memory store. Each mask_text call returns a session_id; pass it back to unmask_text to restore originals.
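The session-scoped store described above can be pictured with a minimal sketch. The class name `MappingStore` appears in the architecture diagram, but everything else here (method names, the `openSession` helper, how ids are generated) is an assumption made for illustration, not the project's actual code.

```typescript
// Hypothetical sketch: method and helper names are illustrative, not the project's API.
class MappingStore {
  private readonly tokenByKey = new Map<string, string>();   // "TAG\u0000value" -> token
  private readonly valueByToken = new Map<string, string>(); // token -> original value
  private readonly counters = new Map<string, number>();     // TAG -> last N issued

  // Return the token already issued for this value, or mint the next [TAG:N].
  tokenFor(tag: string, value: string): string {
    const key = tag + "\u0000" + value;
    const existing = this.tokenByKey.get(key);
    if (existing !== undefined) return existing;
    const n = (this.counters.get(tag) ?? 0) + 1;
    this.counters.set(tag, n);
    const token = "[" + tag + ":" + n + "]";
    this.tokenByKey.set(key, token);
    this.valueByToken.set(token, value);
    return token;
  }

  // Replace every token this store knows about; unknown tokens are left untouched.
  unmask(text: string): string {
    return text.replace(/\[[A-Z_]+:\d+\]/g, (token) => this.valueByToken.get(token) ?? token);
  }
}

const sessions = new Map<string, MappingStore>();

// Look up a session by id, or create a new one when no id is passed.
function openSession(sessionId?: string): { sessionId: string; store: MappingStore } {
  const id = sessionId ?? Math.random().toString(16).slice(2);
  let store = sessions.get(id);
  if (store === undefined) {
    store = new MappingStore();
    sessions.set(id, store);
  }
  return { sessionId: id, store };
}
```

Keying tokens by both tag and value is what keeps the same name mapped to the same token within a session. The `Math.random` id is only a placeholder; a real server would want an unguessable identifier.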
You have a note:
Meeting with Jan Kowalski (PESEL: 90010112318) from Acme sp. z o.o.
We discussed a contract for 45 000 zł. Contact: jan.kowalski@acme.pl
In Claude Code you type:
Use mask_text on this note, then summarise the key points of the meeting.
First, call mask_text; pseudonym-mcp replaces detected PII locally:
Meeting with [PERSON:1] ([PESEL:1]) from [ORG:1].
We discussed a contract for 45 000 zł. Contact: [EMAIL:1]
Then ask Claude to work from the masked text. Claude responds with tokens:
Meeting with [PERSON:1] from [ORG:1] covered a contract
for 45 000 zł. Follow up via [EMAIL:1].
pseudonym-mcp restores originals locally:
Meeting with Jan Kowalski from Acme sp. z o.o. covered
a contract for 45 000 zł. Follow up via jan.kowalski@acme.pl
If the masked text is what you send upstream, the cloud provider sees the structure of the meeting and the amount — but not the detected name, PESEL, organisation, or email in cleartext. The swap happens on your machine.
session_id
# mask the entire vault once and save the session_id
Use mask_text on my notes — remember the session_id
# ask Claude anything across multiple prompts
Summarise all meetings from Q1
# Claude replies with tokens; restore originals
Use unmask_text with session_id abc123 on the response
The session_id keeps the token map alive for the session — the same [PERSON:1] always refers to the same person across notes. That consistency is what makes cross-note reasoning possible; it is also what makes a masked corpus potentially re-identifiable to anyone with side knowledge of your work. Use long-lived sessions deliberately.
pseudonym-mcp ships two built-in prompt templates that describe a mask → task → unmask workflow.
Important: MCP prompt templates are convenience helpers, not a privacy boundary. Inline prompt arguments may be visible to the host client or model before tool masking happens. For strongest privacy, call mask_text directly first, then use the returned masked_text in your LLM prompt.
pseudonymize_task: inline text
/pseudonymize_task text="Meeting with Jan Kowalski (PESEL: 90010112318). Contract: 45 000 zł." task="Extract action items"
Intended workflow:
[PERSON:1], [PESEL:1]
Optional lang argument: en (default) or pl.
privacy_scan_file: file / PDF (macOS only)
Requires macos-vision-mcp, a separate MCP server that uses Apple's Vision framework to extract text from PDFs and images on-device. macOS only.
/privacy_scan_file filePath="/Users/me/contracts/nda.pdf" task="Summarise obligations and deadlines"
Intended workflow:
Optional arguments: task (default: summarise the key points), lang (en or pl).
Step 1 — Add to your MCP client (example for Claude Code — no install needed):
claude mcp add pseudonym-mcp -- npx -y pseudonym-mcp --engines hybrid
Step 2 — (Optional) Pull an Ollama model for full hybrid NER:
ollama pull llama3
Skip this step if you only need regex-based masking (--engines regex). Without Ollama, you'll catch structured identifiers (SSN, IBAN, cards, email, phone, PESEL) but not free-form names and organisations.
Global install: if you prefer npm install -g pseudonym-mcp, replace npx -y pseudonym-mcp with pseudonym-mcp in all snippets below.
Restart your client. The mask_text and unmask_text tools appear automatically.
| Tool | What it does | Example prompt |
|---|---|---|
| mask_text | Pseudonymise detected PII in text. Returns masked_text + session_id. | "Use mask_text on this customer letter before summarising it" |
| unmask_text | Restore original values from a session. Pass the session_id returned by mask_text. | "Use unmask_text with session_id X to restore the response" |
mask_text input
{
"text": "John Smith (SSN: 123-45-6789) works at Acme Corp.",
"session_id": "optional — omit to create a new session",
"custom_literals": ["John Smith", "Acme Corp"]
}
mask_text output
{
"session_id": "3f2a1b...",
"masked_text": "[PERSON:1] (SSN: [SSN:1]) works at [ORG:1].",
"auto_unmask": false,
"ner_status": "ready"
}
unmask_text input
{
"text": "The case concerns [PERSON:1] at [ORG:1].",
"session_id": "3f2a1b..."
}
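For clients written in TypeScript, the three payloads above can be captured as types. This is a sketch only: field names are copied from the JSON examples, while optionality, the `string` type for `ner_status`, and the two helper functions are assumptions for illustration.

```typescript
// Field names follow the JSON examples above; types and optionality are assumptions.
interface MaskTextInput {
  text: string;
  session_id?: string;        // omit to create a new session
  custom_literals?: string[];
}

interface MaskTextOutput {
  session_id: string;
  masked_text: string;
  auto_unmask: boolean;
  ner_status: string;         // the example shows "ready"; other values are not documented here
}

interface UnmaskTextInput {
  text: string;
  session_id: string;
}

// Hypothetical helper: build unmask_text arguments from a mask_text result and the LLM reply.
function buildUnmaskInput(masked: MaskTextOutput, llmReply: string): UnmaskTextInput {
  return { text: llmReply, session_id: masked.session_id };
}

// Hypothetical helper: a client that honors auto_unmask only needs to call
// unmask_text when the reply still contains [TAG:N] tokens.
function shouldAutoUnmask(masked: MaskTextOutput, llmReply: string): boolean {
  return masked.auto_unmask && /\[[A-Z_]+:\d+\]/.test(llmReply);
}
```

The server itself never sees the LLM reply unless the client passes it to unmask_text, so a helper of this kind belongs on the client side.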
mcp-config.json (project root)
{
"lang": "en",
"engines": "hybrid",
"ollamaModel": "llama3",
"ollamaBaseUrl": "http://localhost:11434",
"autoUnmask": false,
"strictValidation": true,
"customLiterals": ["Jan Kowalski", "78091512345", "+48 123 456 789"]
}
| Key | Values | Default | Description |
|---|---|---|---|
| lang | en, pl | en | Language pack for regex rules |
| engines | regex, llm, hybrid | hybrid | Which NER engines to run |
| ollamaModel | any Ollama model name | llama3 | Local LLM for entity detection |
| ollamaBaseUrl | URL | http://localhost:11434 | Ollama API endpoint |
| autoUnmask | true, false | false | Report the preferred unmask behavior to clients; this server does not intercept responses |
| strictValidation | true, false | true | Enable checksum / format validation (SSN area check, Luhn for cards, PESEL checksum) |
| customLiterals | string[] | [] | Specific strings always redacted regardless of engine (names, IDs, phone numbers) |
All config keys can be overridden at startup (highest priority):
pseudonym-mcp --lang en --engines regex --ollama-model llama3 --auto-unmask
| Flag | Description |
|---|---|
| --lang | Language for regex rules: en or pl (default: en) |
| --engines | regex, llm, or hybrid (default: hybrid) |
| --ollama-model | Ollama model to use for NER |
| --ollama-base-url | Ollama base URL |
| --config | Path to a custom JSON config file |
| --auto-unmask | Set auto_unmask: true in mask_text output for clients that honor it |
| --custom-literals | Comma-separated strings to always redact, e.g. "Jan Kowalski,78091512345" |
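The precedence described above (built-in defaults, then the config file, then CLI flags) can be sketched as a simple layered merge. The default values are taken from the configuration table; `resolveConfig` and `parseCustomLiterals` are hypothetical names, and the real server may resolve its options differently.

```typescript
interface Config {
  lang: "en" | "pl";
  engines: "regex" | "llm" | "hybrid";
  ollamaModel: string;
  ollamaBaseUrl: string;
  autoUnmask: boolean;
  strictValidation: boolean;
  customLiterals: string[];
}

// Defaults as listed in the configuration table.
const DEFAULTS: Config = {
  lang: "en",
  engines: "hybrid",
  ollamaModel: "llama3",
  ollamaBaseUrl: "http://localhost:11434",
  autoUnmask: false,
  strictValidation: true,
  customLiterals: [],
};

// Later layers win: defaults < config file < CLI flags.
function resolveConfig(fileConfig: Partial<Config>, cliFlags: Partial<Config>): Config {
  return { ...DEFAULTS, ...fileConfig, ...cliFlags };
}

// --custom-literals takes a comma-separated string; split it and drop empty entries.
function parseCustomLiterals(raw: string): string[] {
  return raw
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}
```

With this layering, a project-level mcp-config.json can set `engines` once while a single run still overrides `lang` from the command line.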
claude mcp add pseudonym-mcp -- npx -y pseudonym-mcp --engines hybrid
Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"pseudonym-mcp": {
"command": "npx",
"args": ["-y", "pseudonym-mcp", "--engines", "hybrid"]
}
}
}
Add to ~/.cursor/mcp.json:
{
"mcpServers": {
"pseudonym-mcp": {
"command": "npx",
"args": ["-y", "pseudonym-mcp", "--engines", "regex"]
}
}
}
Detection is best-effort. The patterns below are what the tool looks for — not a guarantee of what it will always catch. See Limitations for known gaps.
| Tag | Detection | Match |
|---|---|---|
| CUSTOM | Exact match (case-insensitive) against customLiterals config or custom_literals tool param | Exact string |
Custom literals are applied after the regex phase and before LLM NER, regardless of engine mode. Longest literals are matched first to prevent partial substitution.
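A minimal sketch of the longest-first, case-insensitive matching described above follows. It is not the project's implementation: `maskCustomLiterals` and its return shape are assumptions, and the sketch uses a single regex alternation so that a longer literal always wins over a shorter one that it contains.

```typescript
// Escape regex metacharacters so literals such as "+48 123 456 789" match verbatim.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: replace each literal with a [CUSTOM:N] token.
function maskCustomLiterals(
  text: string,
  literals: string[]
): { masked: string; mapping: Map<string, string> } {
  const mapping = new Map<string, string>();        // token -> first matched spelling
  const tokenByLiteral = new Map<string, string>(); // lowercased literal -> token
  const sorted = literals
    .filter((literal) => literal.length > 0)
    .sort((a, b) => b.length - a.length);           // longest first
  if (sorted.length === 0) return { masked: text, mapping };

  // In an alternation the earliest alternative that matches wins,
  // so ordering by length prevents partial substitution.
  const pattern = new RegExp(sorted.map(escapeRegExp).join("|"), "gi");
  const masked = text.replace(pattern, (match) => {
    const key = match.toLowerCase();
    let token = tokenByLiteral.get(key);
    if (token === undefined) {
      token = `[CUSTOM:${tokenByLiteral.size + 1}]`;
      tokenByLiteral.set(key, token);
      mapping.set(token, match);
    }
    return token;
  });
  return { masked, mapping };
}
```

Doing the replacement in one pass also means an already-inserted token can never be matched again by a later literal.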
The tables below list patterns active in the current Engine pipeline. Some additional pattern modules exist in the repository for experimentation, but they are not advertised here unless the language rules actually use them.
English (--lang en, default)
| Tag | Pattern | Validation |
|---|---|---|
| SSN | XXX-XX-XXXX (US Social Security Number) | Area number check (rejects 000, 666, 900+) |
| CREDIT_CARD | 13–19 digits (Visa, Mastercard, Amex, Discover) | Luhn checksum |
| EMAIL | RFC 5321-compatible | Format match |
| PHONE | +1 (XXX) XXX-XXXX, XXX-XXX-XXXX, XXX.XXX.XXXX | Format match |
| PERSON | Full names | Ollama NER (hybrid / llm engines) |
| ORG | Company / organisation names | Ollama NER (hybrid / llm engines) |
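The two English-pack validations named in the table, the Luhn checksum and the SSN area check, are standard and can be sketched as follows. Function names are hypothetical, and the project's own validators may accept or reject additional formats.

```typescript
// Luhn checksum over 13 to 19 digits; spaces and dashes are ignored.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{13,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk from the rightmost digit; double every second one.
    let d = digits.charCodeAt(digits.length - 1 - i) - 48;
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// SSN area check: the first three digits must not be 000, 666, or 900 and above.
function hasValidSsnArea(ssn: string): boolean {
  if (!/^\d{3}-\d{2}-\d{4}$/.test(ssn)) return false;
  const area = Number(ssn.slice(0, 3));
  return area !== 0 && area !== 666 && area < 900;
}
```

Checks like these reduce false positives: an arbitrary 16-digit order number is unlikely to pass Luhn, so it is less likely to be tokenised as a card.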
Polish (--lang pl)
| Tag | Pattern | Validation |
|---|---|---|
| PESEL | 11-digit national ID | Full checksum (weights [1,3,7,9,1,3,7,9,1,3]) |
| IBAN | PL + 26 digits, compact or spaced | Format match |
| EMAIL | RFC 5321-compatible | Format match |
| PHONE | +48 / 0048 prefix, 9-digit mobile, landline (XX) XXX-XX-XX | Format match |
| NIP | 10-digit tax ID (strict / paranoid modes) | Checksum (weights [6,5,7,2,3,4,5,6,7]) |
| PERSON | Full names | Ollama NER (hybrid / llm engines) |
| ORG | Company / organisation names | Ollama NER (hybrid / llm engines) |
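The PESEL and NIP checksums use the weights listed in the table. The sketch below shows the standard arithmetic; the function names are hypothetical and the project's validators may differ in detail.

```typescript
const PESEL_WEIGHTS = [1, 3, 7, 9, 1, 3, 7, 9, 1, 3];
const NIP_WEIGHTS = [6, 5, 7, 2, 3, 4, 5, 6, 7];

// Multiply the leading digits by their weights and add the products.
function weightedSum(digits: string, weights: number[]): number {
  return weights.reduce((sum, weight, i) => sum + weight * (digits.charCodeAt(i) - 48), 0);
}

// PESEL: check digit = (10 - (weighted sum mod 10)) mod 10, compared with the 11th digit.
function isValidPesel(pesel: string): boolean {
  if (!/^\d{11}$/.test(pesel)) return false;
  const check = (10 - (weightedSum(pesel, PESEL_WEIGHTS) % 10)) % 10;
  return check === pesel.charCodeAt(10) - 48;
}

// NIP: check digit = weighted sum mod 11; a remainder of 10 is never valid.
function isValidNip(nip: string): boolean {
  if (!/^\d{10}$/.test(nip)) return false;
  const check = weightedSum(nip, NIP_WEIGHTS) % 11;
  return check !== 10 && check === nip.charCodeAt(9) - 48;
}
```

For the sample PESEL used in this README, 90010112318, the weighted sum is 52, so the expected check digit is (10 - 2) mod 10 = 8, which matches the final digit.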
pseudonym-mcp includes a lightweight heuristic language detector based on franc.
It infers the language from text content and returns a structured result:
detectLanguage('Umowa zostaje zawarta na czas nieokreślony')
// → { detected: 'pl', source: 'text', raw: 'pol', confidence: 0.94 }
detectLanguage('Hello')
// → { detected: 'unknown', source: 'fallback', raw: null, confidence: null }
| Field | Description |
|---|---|
| detected | 'pl', 'en', or 'unknown' |
| source | 'text': franc ran and mapped successfully; 'fallback': too short or undetermined |
| raw | Raw ISO 639-3 code from franc (e.g. 'pol'), or null |
| confidence | Score 0–1 from franc, or null when franc was not called |
Texts shorter than 20 characters or with low confidence return detected: 'unknown'.
The detector does not affect the current pseudonymisation pipeline — --lang config remains authoritative.
It is a building block for future multi-language and auto-select modes.
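The fallback rules above can be sketched without depending on franc by injecting the detector as a function. Everything here is an assumption for illustration: the `RawDetector` signature is a stand-in rather than franc's real API, the 0.5 threshold is invented because the text only says "low confidence", and returning the raw code and score in the low-confidence case is one reading of the field table.

```typescript
type DetectedLanguage = "pl" | "en" | "unknown";

interface LanguageResult {
  detected: DetectedLanguage;
  source: "text" | "fallback";
  raw: string | null;
  confidence: number | null;
}

// Stand-in for the franc call (hypothetical signature): ISO 639-3 code plus a 0..1 score.
type RawDetector = (text: string) => { code: string; score: number };

const MIN_LENGTH = 20;      // stated in the text above
const MIN_CONFIDENCE = 0.5; // assumption: the text only says "low confidence"

function detectLanguageSketch(text: string, detect: RawDetector): LanguageResult {
  if (text.trim().length < MIN_LENGTH) {
    // Too short: the detector is never called, so raw and confidence stay null.
    return { detected: "unknown", source: "fallback", raw: null, confidence: null };
  }
  const { code, score } = detect(text);
  const mapped: DetectedLanguage = code === "pol" ? "pl" : code === "eng" ? "en" : "unknown";
  if (mapped === "unknown" || score < MIN_CONFIDENCE) {
    // The detector ran but the result is not usable.
    return { detected: "unknown", source: "fallback", raw: code, confidence: score };
  }
  return { detected: mapped, source: "text", raw: code, confidence: score };
}
```

Injecting the detector keeps the sketch testable offline, in the same spirit as the mocked Ollama calls in the project's test suite.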
| Mode | Requires Ollama | Detects structured PII | Detects names / orgs |
|---|---|---|---|
| regex | No | Yes | No |
| llm | Yes | No | Yes |
| hybrid (default) | Yes (graceful fallback) | Yes | Yes |
In hybrid mode, Ollama runs after the regex pass, so the local NER model receives already-tokenised structured identifiers. If Ollama is unreachable, the server logs a warning to stderr and returns the regex-only masked text — no crash, no hang.
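The hybrid ordering and its graceful fallback can be sketched as below. This is a simplified illustration, not the server's code: `regexPass` handles only emails, `NerFn` is a hypothetical stand-in for the Ollama call, and the `usedNer` flag is an invented field.

```typescript
// Hypothetical stand-in for the local NER call; it receives text that is already regex-masked.
type NerFn = (text: string) => Promise<string>;

// Simplified regex phase: tokenise email addresses only.
function regexPass(text: string): string {
  const seen: string[] = [];
  return text.replace(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, (email) => {
    let index = seen.indexOf(email);
    if (index === -1) {
      seen.push(email);
      index = seen.length - 1;
    }
    return `[EMAIL:${index + 1}]`;
  });
}

// Regex first, then NER. If NER fails, warn on stderr and return the regex-only text.
async function maskHybrid(
  text: string,
  ner: NerFn
): Promise<{ maskedText: string; usedNer: boolean }> {
  const afterRegex = regexPass(text);
  try {
    return { maskedText: await ner(afterRegex), usedNer: true };
  } catch (err) {
    console.warn("NER unavailable, returning regex-only output:", err);
    return { maskedText: afterRegex, usedNer: false };
  }
}
```

Because the NER function only ever receives `afterRegex`, structured identifiers never reach the local model in cleartext, and a failed call costs nothing beyond the missing PERSON and ORG tokens.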
Calibrated claims:
[PERSON:1] will not become [PERSON:2] for the same name on a second occurrence), preserving semantic coherence in LLM reasoning.
What this does not guarantee:
pseudonym-mcp is a technical privacy control, not a legal guarantee of compliance.
mask_text, this tool cannot help you.
Under GDPR Art. 4(5) and Recital 26, pseudonymised data is still personal data. pseudonym-mcp substantially reduces cleartext PII exposure but does not eliminate your legal obligations.
git clone https://github.com/woladi/pseudonym-mcp
cd pseudonym-mcp
npm install
npm run build # tsc compile
npm test # vitest (no Ollama required)
The test suite runs fully offline — Ollama calls are injected via constructor and mocked in all tests. No live LLM required.
src/patterns/locale/<lang>/, where each file exports a PatternRule with id, entityType, pattern, locales, engines, and optional validate
src/patterns/index.ts (add to allPatterns array)
src/languages/<lang>/rules.ts that composes from the new patterns using toPatternDef
LANGUAGE_MAP in src/core/engine.ts
src/language/language-map.ts
See src/patterns/locale/pl/ and src/languages/pl/rules.ts for a complete example.
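As a rough picture of what a pattern file might contain, here is a sketch built only from the field names listed above. The field types, the example rule, and the `findMatches` helper are all assumptions; check the real PatternRule definition in the repository before writing a language pack.

```typescript
// Field names come from the contributor notes above; the types are assumptions.
interface PatternRule {
  id: string;
  entityType: string;
  pattern: RegExp;
  locales: string[];
  engines: string[];
  validate?: (match: string) => boolean;
}

// Hypothetical rule for illustration: US SSN with the area-number check.
const ssnRule: PatternRule = {
  id: "en-ssn",
  entityType: "SSN",
  pattern: /\b\d{3}-\d{2}-\d{4}\b/g,
  locales: ["en"],
  engines: ["regex", "hybrid"],
  validate: (match) => {
    const area = Number(match.slice(0, 3));
    return area !== 0 && area !== 666 && area < 900;
  },
};

// Hypothetical helper: collect every match that passes the rule's optional validator.
function findMatches(text: string, rule: PatternRule): string[] {
  const flags = rule.pattern.flags.indexOf("g") === -1 ? rule.pattern.flags + "g" : rule.pattern.flags;
  const re = new RegExp(rule.pattern.source, flags); // fresh copy so lastIndex state is not shared
  const hits: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    if (m[0].length === 0) {
      re.lastIndex++; // guard against zero-length matches looping forever
      continue;
    }
    if (rule.validate === undefined || rule.validate(m[0])) hits.push(m[0]);
  }
  return hits;
}
```

Keeping validation as an optional callback lets a format-only rule such as EMAIL omit it, while checksum-based identifiers can reject look-alike digit runs.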
Contributions are welcome. Please follow Conventional Commits for commit messages — this project uses release-it with @release-it/conventional-changelog to automate releases.
Language pack contributions are especially welcome — German (Personalausweis, Steuer-ID), French (NIR, SIRET), Spanish (DNI/NIE) and others would significantly expand the tool's usefulness.
MIT — Adrian Wolczuk