Server data from the Official MCP Registry
Deterministic prose linter for LLM text — flags AI slop, filler, buzzwords, hedges, clichés.
Valid MCP server (1 strong, 4 medium validity signals). 2 known CVEs in dependencies. Package registry verified. Imported from the Official MCP Registry.
Add this to your MCP configuration file:
{
  "mcpServers": {
    "io-github-ahmedak-defluff": {
      "args": [
        "defluff"
      ],
      "command": "uvx"
    }
  }
}
From the project's GitHub README.
The deterministic slop check for AI-generated prose. Point it at a changelog, a doc, or an agent's own output and get back the filler phrases to cut — plus a CI exit code and a pinnable score, identical on every run. No model, no API key.
Every flagged span carries no information, so cutting them loses nothing. Clean text, same tool, passes straight through:
What makes defluff worth installing over a one-off grep is the engine around the list: bring your own phrases, per-project overlays, and an MCP server your agents pick up with no wiring.
pip install defluff
Or on macOS/Linux via Homebrew:
brew install ahmedak/defluff/defluff
That's it. No model download. No API key. Runs anywhere Python does.
# Lint a file — exit 1 on slop, 0 when clean
defluff lint essay.md
# Pipe text
cat draft.md | defluff lint
# Get a bare score for scripts (0.0 – 1.0)
defluff score essay.md
# Machine-readable JSON for downstream tooling
defluff lint essay.md --json
Exposes three tools so any MCP-aware agent can self-check prose without bespoke wiring — including its own draft, before returning it.
Zero-install via uvx (recommended) — pulls the package and the mcp extra on first run:
{
  "mcpServers": {
    "defluff": {
      "command": "uvx",
      "args": ["--from", "defluff[mcp]", "defluff-mcp"]
    }
  }
}
Or install it and run the entry point directly:
pip install "defluff[mcp]"
defluff-mcp
{
  "mcpServers": {
    "defluff": { "command": "defluff-mcp" }
  }
}
Published to the MCP Registry as io.github.ahmedak/defluff (see server.json).
mcp-name: io.github.ahmedak/defluff
| Tool | Args | Returns |
|---|---|---|
| slop_detect | text: str | slop_score, spans (text, category, weight, offsets), categories, lexicon_version |
| slop_add | pattern: str, category: str, scope: "user" or "project" | adds a phrase to the lexicon overlay |
| slop_ignore | pattern: str, scope: "user" or "project" | suppresses a phrase (e.g. domain jargon) |
slop_detect on a draft and revise the flagged phrases before returning it. Zero wiring, one session, no second model in the loop.
Every "AI detector" tries to classify whether text was AI-generated, which is a hard, unreliable problem. defluff asks a different question: does this text contain removable filler? That's deterministic, and it's true whether a human or an LLM wrote "at the end of the day."
proselint is the closest prior art — deterministic, no model — but emits yes/no warnings rather than a tunable density score, and isn't built around a list you swap, overlay, or pin.
A grep over a word list gives you raw hits. defluff gives you what you'd otherwise have to build around that list:
"foster" won't fire inside "fostering".
defluff lint draft.md --lexicon team-slop.md
# team-slop.md
- circle back
- low-hanging fruit
- boil the ocean
paradigm shift
One phrase per line (# comments and -/* markers ignored). These layer on top of the built-in defaults and report under a neutral custom category. For real categories and per-phrase weights, use a .json list (see Lexicon overlays).
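The plain-text list format described above is simple enough to sketch. The function below is an illustration only, not defluff's own loader: `parse_phrase_file` is a hypothetical name, and lowercasing the phrases is an assumption based on the matching being case-insensitive.

```python
def parse_phrase_file(text: str) -> list[str]:
    """Parse a one-phrase-per-line list, skipping comments and list markers."""
    phrases = []
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and '#' comments carry no phrase.
        if not line or line.startswith("#"):
            continue
        # Drop a leading '-' or '*' list marker if present.
        if line[0] in "-*":
            line = line[1:].strip()
        if line:
            # Assumption: phrases are normalized to lowercase.
            phrases.append(line.lower())
    return phrases


sample = """# team-slop.md
- circle back
- low-hanging fruit
* boil the ocean
paradigm shift
"""
print(parse_phrase_file(sample))
```

Only the leading marker is stripped, so a hyphen inside a phrase such as "low-hanging fruit" survives.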
defluff lint post.md --pack marketing-growth
defluff lint post.md --pack marketing-growth,ai-llm # stack several
| Pack | Catches | Pack | Catches |
|---|---|---|---|
| corporate-linkedin | office jargon | crypto-web3 | crypto hype |
| startup-vc | pitch-deck speak | pr-press-release | press-release boilerplate |
| marketing-growth | hype copy | academic | research hedging |
| ai-llm | LLM tells | wellness-selfhelp | influencer-speak |
| social-media | X/Twitter engagement-bait | | |
List them with defluff packs. High-false-positive terms (e.g. pivot, detox) ship commented-out so they're inert until you opt in. See the packs README.
~130 curated patterns across five weighted categories, case-insensitive, whole-word matched:
| Category | What it catches | Examples |
|---|---|---|
| ai-vocab | Words disproportionately overused by LLMs | delve, tapestry, nuanced, pivotal, robust, showcase |
| cliche | Hollow idioms that add no information | at the end of the day, move the needle, circle back, game changer |
| hedge | Empty qualifiers | it should be noted that, needless to say, basically, essentially |
| corporate | Buzzword inflation | leverage, synergy, actionable insights, cutting-edge, scalable |
| transition | Filler connectives LLMs reach for by default | furthermore, moreover, in conclusion, first and foremost |
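Case-insensitive, whole-word matching can be approximated with a regular expression. This is a minimal sketch of the idea using Python's standard `re` module, not defluff's matcher; `find_whole_word` is a hypothetical helper.

```python
import re


def find_whole_word(phrase: str, text: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets where phrase occurs as a whole word or phrase."""
    # The lookarounds reject matches glued to other word characters,
    # so "foster" does not fire inside "fostering".
    pattern = re.compile(
        r"(?<!\w)" + re.escape(phrase) + r"(?!\w)",
        re.IGNORECASE,
    )
    return [m.span() for m in pattern.finditer(text)]


print(find_whole_word("foster", "We foster trust by fostering candor."))  # [(3, 9)]
print(find_whole_word("at the end of the day", "At the end of the day, it ships."))
```

Lookarounds are used instead of `\b` so that phrases starting or ending in punctuation still anchor correctly.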
Some AI tells are sentence shapes, not fixed phrases — the antithesis: it's not X, it's Y, not just a list but a runtime. A regex pattern layer catches these under a rhetoric category:
| Mode | What it looks for | Default | Why |
|---|---|---|---|
| Compound (confident) | full shape: it's not X, it's Y · not just X but Y | on | the second clause proves the rhetorical move — rarely a false alarm |
| Fragment (guessing) | bare X, not Y — e.g. a signal, not a verdict | off, --pack rhetoric | also fires on plain corrections (shipped Tuesday, not Wednesday) |
Each match counts as one unit toward the score regardless of length. Turning on fragment mode changes the lexicon hash.
defluff lint draft.md # compound antithesis caught by default
defluff lint draft.md --pack rhetoric # also catch the punchy "X, not Y" fragment
defluff lint draft.md --category rhetoric # gate CI on antithesis alone
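To make the compound mode concrete, here is a rough sketch of how the two full shapes could be expressed as regular expressions. The patterns are illustrative assumptions, not the ones defluff ships, and `find_antithesis` is a hypothetical name.

```python
import re

# Hypothetical patterns for the two compound shapes named in the table above.
COMPOUND_ANTITHESIS = re.compile(
    r"\bit(?:['’]s| is) not\b[^.!?]*?,\s*it(?:['’]s| is)\b"  # it's not X, it's Y
    r"|\bnot just\b[^.!?]*?\bbut\b",                          # not just X but Y
    re.IGNORECASE,
)


def find_antithesis(text: str) -> list[str]:
    """Return the matched compound-antithesis spans, staying inside one sentence."""
    return [m.group(0) for m in COMPOUND_ANTITHESIS.finditer(text)]


print(find_antithesis("It's not a list, it's a runtime."))
print(find_antithesis("Not just a list but a runtime."))
print(find_antithesis("We shipped Tuesday, not Wednesday."))  # plain correction, no match
```

Requiring the second clause is what keeps a plain correction like the last example from firing, which is the trade-off the fragment mode gives up.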
import defluff
text = "It is worth noting that we should leverage synergies."
report = defluff.detect(text)
print(report.slop_score) # 0.0 – 1.0
print(report.spans) # flagged phrase locations + categories
score = defluff.score(text) # bare float
flagged = defluff.is_slop(text) # bool at default threshold
lex = defluff.load_lexicon() # pin for reproducible runs
score = defluff.score(text, lexicon=lex)
SlopReport fields:
| Field | Type | Notes |
|---|---|---|
| slop_score | float | Clamped to [0, 1]; use for thresholds |
| slop_density | float | Raw, unclamped; better gradient for reward loops |
| spans | list[Span] | Per hit: text, category, weight, char offsets into the cleaned text (see Limitations) |
| categories | dict[str, float] | Per-category density |
| n_words | int | Token count |
| low_confidence | bool | True when n_words < 20 |
| lexicon_version | str | SHA prefix of resolved entry set |
The bundled lexicon is the baseline; layer on top without editing the package:
# "synergy" is slop on this machine
defluff lexicon add "synergy" --category corporate --scope user
# "leverage" is fine in this repo (finance context) — commit this with the repo
defluff lexicon rm "leverage" --scope project
git add .defluff/ignore.json && git commit -m "allow 'leverage' in finance context"
- user (~/.config/defluff/): machine-wide, not committed
- project (.defluff/ at git root): per-repo, commit it for your whole team
Writes are atomic and cross-process locked; a corrupt overlay is warned about and skipped, so detect() never crashes.
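One way to picture how the overlays combine with the bundled list is the sketch below. The precedence (adds applied in order, then ignores removed) is an assumption for illustration, and `resolve_lexicon` is a hypothetical function, not part of defluff's API.

```python
def resolve_lexicon(
    bundled: dict[str, str],
    adds: list[dict[str, str]],
    ignores: list[set[str]],
) -> dict[str, str]:
    """Layer add-overlays on the bundled phrase->category map, then drop ignored phrases."""
    resolved = dict(bundled)  # copy, so the bundled lexicon is never mutated
    for layer in adds:
        resolved.update(layer)
    for layer in ignores:
        for phrase in layer:
            resolved.pop(phrase, None)
    return resolved


bundled = {"leverage": "corporate", "delve": "ai-vocab"}
user_adds = {"synergy": "corporate"}   # user scope, machine-wide
project_ignores = {"leverage"}         # project scope, committed with the repo
print(resolve_lexicon(bundled, [user_adds], [project_ignores]))
```

Working on a copy mirrors the point that ignoring a phrase adds to an ignore list rather than deleting anything from the bundled lexicon.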
Every resolved lexicon (bundled + overlays + packs) carries a short content hash, printed on every run (lexicon: 2cc05ba84457) and exposed as SlopReport.lexicon_version. Pass lexicon=defluff.load_lexicon() once and every call scores against the same ruler — pinnable for CI baselines and RL rewards, and auditable since the hash changes if and only if the resolved entry set changes. Each release ships a dated lexicon with a changelog (CHANGELOG.md); ai-vocab is expected to turn over release to release, cliche/hedge/corporate/transition are near-stable.
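The "changes if and only if the resolved entry set changes" property follows from hashing a canonical, order-independent serialization of the entries. The sketch below shows the general technique; the choice of SHA-256, the JSON encoding, and the 12-character prefix are assumptions, and `lexicon_hash` is a hypothetical name.

```python
import hashlib
import json


def lexicon_hash(entries: dict[str, tuple[str, float]]) -> str:
    """Short content hash over phrase -> (category, weight) entries."""
    # Sorting makes the serialization independent of insertion order.
    canonical = json.dumps(
        sorted((phrase, cat, weight) for phrase, (cat, weight) in entries.items()),
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


a = {"delve": ("ai-vocab", 1.2), "synergy": ("corporate", 1.0)}
b = {"synergy": ("corporate", 1.0), "delve": ("ai-vocab", 1.2)}  # same set, different order
c = {**a, "circle back": ("cliche", 1.0)}                        # one extra entry

print(lexicon_hash(a) == lexicon_hash(b))  # True
print(lexicon_hash(a) == lexicon_hash(c))  # False
```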
slop_score = flagged words ÷ total, weighted by category (ai-vocab counts a little more, transition a little less). 0.20 ≈ "a fifth of this text is listed filler." Usually 0.0–1.0; can edge slightly above on text that's almost nothing but slop. It is not a quality quantifier, just a tripwire that drives the CI exit code.
Pick a threshold based on how filler-dense the text is allowed to be:
| --threshold | Meaning | Good for |
|---|---|---|
| 0.05 | ~5% filler (strict) | marketing copy, landing pages, customer-facing text |
| 0.08 (default) | ~8% filler, a tripwire for triage | general prose, blog drafts |
| 0.12–0.20 | only flag heavy padding | technical docs that legitimately use robust, scalable, in order to |
The default 0.08 is provisional — hand-chosen, not yet calibrated on a labeled corpus (hence the [threshold provisional] tag). For a hard CI gate, set your own threshold and suppress your domain's vocabulary first (below).
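The arithmetic behind the score can be sketched as follows. The category weights here are made-up placeholders (the README only says ai-vocab counts a little more and transition a little less), the 20-word floor comes from the low_confidence note, and the clamped/unclamped split follows the SlopReport field table. None of this is defluff's actual code.

```python
# Hypothetical weights; the real values are not stated in this document.
WEIGHTS = {"ai-vocab": 1.2, "transition": 0.8}
MIN_WORDS = 20  # denominator floor, so tiny inputs cannot read as 100% slop


def slop_density(flagged: list[tuple[int, str]], n_words: int) -> float:
    """Weighted flagged-word count divided by the (floored) total word count."""
    weighted = sum(n * WEIGHTS.get(category, 1.0) for n, category in flagged)
    return weighted / max(n_words, MIN_WORDS)


def slop_score(flagged: list[tuple[int, str]], n_words: int) -> float:
    """Density clamped to [0, 1] for threshold comparisons."""
    return min(1.0, max(0.0, slop_density(flagged, n_words)))


# Two one-word corporate hits in a 10-word text: 2 / max(10, 20) = 0.1
print(slop_score([(1, "corporate"), (1, "corporate")], n_words=10))
# A 30-word text that is all ai-vocab: density 1.2, score clamped to 1.0
print(slop_density([(30, "ai-vocab")], n_words=30), slop_score([(30, "ai-vocab")], n_words=30))
```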
Technical writing (API docs, ADRs, RFCs) legitimately uses words like robust, scalable, in order to. Suppress your project's domain vocabulary first:
defluff lexicon rm "scalable" --scope project
defluff lexicon rm "in order to" --scope project
git add .defluff/ignore.json && git commit -m "defluff: allow domain vocabulary"
Then gate only on the categories you trust, not all five:
- name: Check AI-generated content for slop
  # --category ai-vocab,hedge gates only on the highest-precision categories
  run: cat generated_output.md | defluff lint --category ai-vocab,hedge --threshold 0.1
Exit code 1 fails the step, 0 passes.
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: defluff
        name: defluff slop check
        entry: defluff lint
        language: system
        types: [markdown]
A deterministic, non-differentiable scalar for filler density can be a small component of a reward mix — but it's gameable alone: a model optimized purely against a fixed phrase list learns to paraphrase the filler rather than remove it. Pair it with a real quality signal (human or LLM judge); we don't yet have a published training run showing it helps.
lex = defluff.load_lexicon() # pin once
reward = lambda text: -defluff.detect(text, lexicon=lex).slop_density # unclamped, better gradient
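As a rough illustration of "a small component of a reward mix", the sketch below subtracts a lightly weighted density penalty from a separate quality score. `mixed_reward`, the `quality` input, and the 0.1 weight are all assumptions for the example; the density would come from a call like the one above.

```python
def mixed_reward(quality: float, slop_density: float, slop_weight: float = 0.1) -> float:
    """Combine a quality signal with a small filler-density penalty."""
    # The quality term dominates; the penalty only nudges against filler.
    return quality - slop_weight * slop_density


# Same judged quality, different filler density
print(mixed_reward(quality=0.8, slop_density=0.05))
print(mixed_reward(quality=0.8, slop_density=0.40))
```

Keeping the weight small is one way to limit the incentive to paraphrase filler just to dodge the list.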
delta = defluff.compare(draft_v1, draft_v2, lexicon=lex)
# {"score_a": 0.31, "score_b": 0.18, "delta": -0.13,
# "improved": [...], "regressed": [...]} # set diff of flagged phrases, not a semantic diff
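The "set diff of flagged phrases" can be pictured with plain set operations. This sketch assumes that "improved" means phrases flagged in the first draft but gone from the second, and "regressed" the reverse; that reading, and the name `diff_flagged`, are assumptions rather than documented behavior.

```python
def diff_flagged(flagged_a: set[str], flagged_b: set[str]) -> dict[str, list[str]]:
    """Set difference of flagged phrases between two drafts (not a semantic diff)."""
    return {
        "improved": sorted(flagged_a - flagged_b),   # removed in draft b
        "regressed": sorted(flagged_b - flagged_a),  # newly introduced in draft b
    }


v1 = {"leverage", "at the end of the day", "delve"}
v2 = {"delve", "moreover"}
print(diff_flagged(v1, v2))
```

A phrase that was merely reworded into unlisted filler still shows up as "improved", which is why this is not a semantic diff.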
defluff lint [FILE] [--json] [--threshold FLOAT] [--category CATS] [--lexicon PATH] [--pack NAMES] [--no-project-overlay]
defluff score [FILE] [--pack NAMES] [--no-project-overlay]
defluff packs # list bundled domain packs
defluff lexicon list [--category CATEGORY] [--scope SCOPE] [--json]
defluff lexicon add PATTERN --category CATEGORY [--scope SCOPE] [--weight FLOAT]
defluff lexicon rm PATTERN [--scope SCOPE]
- --category: comma-separated; only spans in those categories count toward the exit-code decision (still reports all hits). Valid: ai-vocab, cliche, hedge, corporate, transition, custom, rhetoric.
- --lexicon PATH: layers your phrases on top of the defaults. .txt/.md is one phrase per line (lands in custom); .json carries explicit categories and weights.
- --pack NAMES: comma-separated domain packs. rhetoric is reserved for the pattern pack, enabling the opt-in X, not Y antithesis fragment.
Exit codes for defluff lint: 0 = clean · 1 = slop · 2 = bad input.
defluff is a deterministic matcher, not a trained classifier, so the metric that matters is precision — when it flags something, is it actually removable filler? On a 50-example hand-labeled set (eval/validation.jsonl) spanning clear slop, clean prose, and jargon-as-content traps (e.g. "the robust standard errors", "pivotal trials"), at the default threshold:
| Metric | Score | Reading |
|---|---|---|
| Precision | 1.00 | 0 false positives — clean prose and legitimate jargon were not flagged |
| Recall | 0.65 | bounded by lexicon coverage |
Reproduce: python eval/score.py eval/validation.jsonl
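For readers who want the definitions behind the two metrics, here is a minimal sketch. The counts in the example are toy values, not the contents of eval/validation.jsonl, and `precision_recall` is a hypothetical helper rather than the logic in eval/score.py.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    # Convention used here: an empty denominator yields 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy counts: 3 slop examples flagged, none flagged wrongly, 1 slop example missed
print(precision_recall(tp=3, fp=0, fn=1))  # (1.0, 0.75)
```

A list-based matcher tends toward this profile: no false positives on text it has no entry for, and misses wherever the list has a gap.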
Misses are novel buzzwords the lexicon hasn't seen yet (e.g. "operationalize the ideation funnel"): the known limit of a list-based matcher, not noise. Recall on listed filler is 1.00; overall recall will rise as the lexicon grows, but won't reach 1.00 against open-ended novel jargon without a semantic layer.
Caveats: the set is small and labeled by the author, so treat it as a sanity check on precision, not an independently adjudicated benchmark.
- "leverage" in a finance document is real content. Read the flagged spans; suppress false positives with defluff lexicon rm (adds to an ignore list, doesn't delete from the bundled lexicon).
- span.text, not raw offsets, against marked-up source.
- custom is read-only-via-file. Phrases from a --lexicon file land in custom, but defluff lexicon add --category custom is rejected; add only takes the five curated categories.
- low_confidence: true (the denominator is floored at 20 so one phrase can't read as 100% slop on a two-sentence input).
The easiest contribution is adding a missed filler phrase:
- src/defluff/data/lexicon-v1.json with the right category
- pytest (smoke tests catch boundary errors)
See CONTRIBUTING.md for code setup and guidelines.
MIT