Which MCP servers actually earn their place in a delivery workflow.
Over 10,000 MCP servers exist. Most are weekend projects. Here is the three-filter test Sant uses to decide which ones earn a place in real client delivery work.
20 April 2026 · 9 min read
Over ten thousand MCP servers are listed across directories right now. Most of them are weekend projects that break the first time they are used in a real production context. A search for "MCP servers every developer needs" returns lists of thirty, thirty-five, fifty. Some are genuinely useful. Most are demos that work in a screen capture and nowhere else.
The Model Context Protocol itself is sound. The idea that an AI assistant should be able to connect to external systems in a standardised way, reading from databases, writing to repositories, querying APIs, managing infrastructure, is genuinely useful for delivery work. The protocol is not the problem. The ecosystem around it is in the same state any ecosystem is two years after a new standard ships: full of options, short on production-tested signal.
This post covers the three filters Sant uses to decide whether an MCP server earns a place in a delivery workflow, names the ones that pass, names the ones that do not, and explains the Sant compliance checker server as an example of why building your own sometimes outperforms installing someone else's.
What an MCP server actually does in a real delivery context
The demo version of an MCP server is impressive. Connect it to Claude, type a natural language request, watch it retrieve data or trigger an action. This is real. It works. It is also not what delivery work looks like.
Delivery work is repetitive, high-stakes, and intolerant of unreliable tools. A developer running a pull request review workflow, a technical lead auditing a codebase before a WordPress submission, a team managing deployments across multiple client environments: none of them can absorb tool failures mid-process. An MCP server that works most of the time is a liability, not an asset.
The useful question is not "what can this server do?" It is "what specific friction does this server remove from a real recurring task, and is it reliable enough that adding it to the workflow creates net positive value rather than a new category of failure to manage?"
Those are different questions, and most MCP servers on any list cannot answer the second one.
Filter one: does it remove a specific, named friction. Not a general capability, a specific friction. "Searching GitHub" is a capability. "Retrieving all open pull requests on a client repository filtered by label, without leaving the editor" is a specific friction. The distinction matters because capability-framing lets any server onto the list, while friction-framing requires the server to justify its place in an actual workflow.
If the server cannot be described in terms of a task that currently takes longer than it should, it is not solving a delivery problem. It is adding a dependency to satisfy curiosity about what AI-assisted development could look like in theory.
Filter two: is it reliable enough to trust. Reliability here means: works consistently across real use conditions, including authentication edge cases, rate limits, partial failures, and network interruptions. A server that works perfectly in a development environment and fails when a token expires mid-session is not reliable. A server that handles five of the ten documented actions correctly and silently fails on the other five is not reliable.
The test is straightforward: use the server for real work across multiple sessions, under conditions that are not ideal, and observe the failure rate. A server with any meaningful silent failure mode does not belong in a delivery workflow.
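Observing the failure rate over real sessions does not need tooling beyond a tally, but the tally has to distinguish visible failures from silent ones. A minimal sketch of the kind of record worth keeping per server (the class, field names, and the two-percent threshold are illustrative assumptions, not anything from the MCP ecosystem):

```python
from dataclasses import dataclass

@dataclass
class ReliabilityLog:
    """Per-server tally kept across real working sessions."""
    calls: int = 0
    hard_failures: int = 0    # errored visibly (auth failure, timeout, crash)
    silent_failures: int = 0  # reported success but the result was wrong

    def record(self, ok: bool, silent: bool = False) -> None:
        self.calls += 1
        if not ok:
            if silent:
                self.silent_failures += 1
            else:
                self.hard_failures += 1

    def trustworthy(self) -> bool:
        # Any silent failure disqualifies outright; hard failures are
        # tolerable below a small rate because they are at least visible.
        if self.silent_failures > 0:
            return False
        return self.calls == 0 or self.hard_failures / self.calls < 0.02
```

The asymmetry is the point of the sketch: a hard failure costs a retry, while a silent failure costs whatever was built on top of the wrong result.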
Filter three: no dependency you cannot explain. An MCP server that connects to an external service, stores credentials, or operates through a cloud function creates a dependency. That dependency has to be legible. What data does it access? Where does it send requests? What happens if the service goes down? Who operates the server, and what is the liability model if it handles client data?
For individual developers working on personal projects, this filter is light. For delivery teams working on client systems, it is not. A server that touches client credentials, client repositories, or client infrastructure needs to be understood before it is deployed, not after.
The servers that earn their place
Each entry below is in the stack because it passed all three filters in real delivery use.
GitHub. Removes the friction of context-switching between editor and browser for repository operations. Pull request review, issue retrieval, branch management, and commit inspection are all tasks that happen repeatedly in a delivery workflow. The official GitHub MCP server is reliable, the authentication model is well-documented, and the dependency is one most delivery teams already have. This is the first server worth installing and the last one worth removing.
Filesystem. Read and write local files without leaving the assistant context. For tasks like auditing a codebase, generating documentation from source, or batch editing configuration files, this removes genuine friction. It operates locally, with no external dependency, and the risk model is visible because it is your own machine.
PostgreSQL. Direct database querying for any delivery workflow that involves data analysis or migration work. The friction it removes is the round-trip between the editor and a separate database client. Reliable when configured correctly, the dependency is the database connection itself.
Playwright. Browser automation for testing and auditing work. Useful specifically for end-to-end test scripting and for accessibility audits. Not a general-purpose server, but in the contexts where it applies it removes significant manual effort. The dependency is local, which keeps the risk model clean.
Stripe. For any delivery team that manages subscription or payment systems for clients. Retrieving subscription states, debugging payment events, and auditing webhook activity without leaving the development context removes real friction. The authentication dependency requires careful management.
Memory. Persistent context across sessions for ongoing projects. Without persistent memory, every session with a long-running client engagement requires re-establishing context that was established in a previous session. A memory server that correctly persists and retrieves project context reduces that overhead. The caveat is that the specific server matters: session-level memory that resets is not the same as persistent memory that survives across restarts.
Cloudflare. For delivery teams managing client infrastructure on Cloudflare. DNS management, worker deployment, and cache operations are tasks that happen frequently enough in some delivery contexts to justify the server. The dependency is the Cloudflare account, which most teams using Cloudflare infrastructure already manage.
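For reference, servers like these are registered in the MCP client's configuration file as local processes. A rough sketch of what that looks like for two of the local servers above, in the Claude Desktop-style JSON format (the package names follow the official reference-server naming, but check the current registry before copying; the token placeholder and filesystem path are illustrative):

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "<token>" }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]
    }
  }
}
```

Note that the filesystem server is scoped to an explicit directory argument, which is filter three in miniature: the dependency is visible in the config itself.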
The servers that do not
AI-optimised search servers. Useful for research tasks and answer generation. Not useful in a production delivery workflow where the task is deterministic and the source of truth is a specific codebase or database, not the open web. Adding a search server to a delivery workflow introduces latency and variability where neither is helpful.
Notion, Linear, and similar workspace servers. Useful for project management and documentation retrieval, in contexts where those tools are the actual source of truth. In many delivery workflows, they are not. A server that retrieves project notes from a workspace that is not reliably maintained is a server that retrieves stale information. The value of the server depends entirely on the discipline of the team maintaining the underlying tool.
The novelty servers. Any server whose primary use case is demonstrating what MCP can do rather than removing a friction that exists in real work. These are the majority of the ten thousand. They are fine as educational tools. They do not belong in a delivery stack.
The Sant compliance checker server
The WordPress Plugin Compliance Checker that Sant Limited published includes an MCP server. The reason it was built rather than assembled from existing servers is that the specific friction it removes, auditing a WordPress plugin against WordPress.org submission rules and WPCS standards and generating a structured report, did not have a reliable existing solution.
The server takes a plugin directory as input, runs the compliance analysis, and returns a structured result that an AI assistant can reason over and act on. It was built to be used in Sant's own release workflow, which meant the reliability bar was set by the cost of a failed compliance check going unnoticed before a submission.
Building a narrow, well-specified server for a specific recurring task produces a more reliable tool than adapting a general-purpose server to cover the same ground. The compliance checker server is an example of when "build your own" is the right answer to an MCP tooling question, not because the general-purpose options are bad, but because the specific friction required a specific solution.
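The value of a narrow server like this is mostly in the shape of its output: a result an assistant can reason over rather than a wall of linter text. A minimal sketch of that shape (the class names, rule identifiers, and the readme check are hypothetical illustrations, not the actual Sant implementation; WordPress.org does require a readme.txt, which is why it serves as the example rule):

```python
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class Finding:
    rule: str        # e.g. "wporg.readme.missing" (hypothetical rule id)
    severity: str    # "error" blocks submission, "warning" does not
    file: str
    message: str

@dataclass
class ComplianceReport:
    plugin_dir: str
    findings: list[Finding] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # One error-severity finding fails the check outright; a silent
        # partial pass is exactly the failure mode filter two rejects.
        return not any(f.severity == "error" for f in self.findings)

def check_plugin(plugin_dir: Path) -> ComplianceReport:
    """Run the (illustrative) rule set and return a structured report."""
    report = ComplianceReport(plugin_dir=str(plugin_dir))
    # Illustrative rule: WordPress.org requires readme.txt in the plugin root.
    if not (plugin_dir / "readme.txt").exists():
        report.findings.append(Finding(
            rule="wporg.readme.missing",
            severity="error",
            file="readme.txt",
            message="readme.txt is required for WordPress.org submission",
        ))
    return report
```

An MCP server wrapping a function like this hands the assistant a list of findings it can act on per file, instead of free text it has to re-parse.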
Three questions. Run them on every server currently installed.
What specific friction does this server remove, in which specific recurring task? If the answer is vague, the server is occupying a slot in the configuration without earning it.
When was this server last used in real work, not a demo or experiment? If the answer is more than thirty days ago, the server is maintenance overhead without current benefit.
What does this server touch, and what is the dependency chain? If the answer requires investigation to produce, the server was added before the risk model was understood.
Any server that cannot answer all three questions clearly either gets removed or gets a defined probation period with a specific task it is expected to demonstrate value on.
The goal is a small stack of reliable tools, not a large stack of available ones. Over ten thousand servers exist. A delivery workflow needs eight to twelve, well-understood and well-maintained.
Frequently asked questions
How do you decide between installing an existing MCP server and building your own. If an existing server passes all three filters for the specific friction you are trying to remove, install the existing one. If no existing server passes all three, or if the friction is narrow enough that a general-purpose server would require significant workarounds to cover it, build your own. The compliance checker is an example of a case where the friction was specific enough that a purpose-built server was cheaper to maintain than adapting a general one.
Is there a risk in having too many MCP servers installed. Yes. Each server is a dependency, an authentication credential to manage, and a potential failure mode. A large stack of unreliable or underused servers creates more noise than signal. Most developers who maintain a large MCP configuration spend more time managing it than benefiting from it. Start with the smallest set that covers the real frictions, and add servers only when a specific use case justifies it.
Do these servers work with Claude Code, Claude Desktop, and other MCP clients. The configuration format is the same across Claude Desktop, Claude Code, and most other MCP-compatible clients. A server that works in one client works in the others without modification. The transport type matters: HTTP is the recommended option for remote servers, stdio for local processes. SSE transport is deprecated.
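As a sketch of the transport distinction, a config entry for a local stdio server next to a remote HTTP one might look like the following (the exact keys for remote servers vary between clients and client versions, so treat the "type" and "url" fields as illustrative rather than canonical):

```json
{
  "mcpServers": {
    "local-tool": {
      "command": "node",
      "args": ["./build/server.js"]
    },
    "remote-tool": {
      "type": "http",
      "url": "https://example.com/mcp"
    }
  }
}
```

The stdio entry spawns a local process the client owns; the HTTP entry points at a server someone else operates, which is exactly where filter three earns its keep.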
What is the security posture for MCP servers that access client systems. Any server that accesses client repositories, databases, or credentials requires the same security controls as any other integration with those systems. Authentication tokens should be scoped to the minimum permissions required, rotated on a schedule, and not shared between projects. Servers that touch client data should be documented as part of the client engagement's data handling posture.
Closing
The protocol is useful. The ecosystem is noisy. Most lists of MCP servers worth installing are lists of capabilities, not lists of frictions removed. The three-filter test (specific friction, reliability, explicable dependency) produces a shorter and more useful list.
The Sant stack is eight to twelve servers, depending on the engagement. Each one was added because it removed a real friction from a recurring task. The compliance checker was built because no existing server did the specific job. That is the shape of a production MCP configuration rather than a demo configuration, and it is the difference between tooling that helps delivery work and tooling that adds to the maintenance load.