10 Tips for Shipping Production MCP Servers

A free preview from the Claude Code Playbook — Albino Geek Services Ltd.

1. Nail stdio framing before anything else

Every MCP server over stdio speaks JSON-RPC 2.0 on newline-delimited frames. A raw newline inside a message splits it into two frames, neither of which parses as JSON. Depending on the client, the request then fails with a parse error or hangs until something times out.

Serialize each message to a single line (JSON.stringify with no indent argument, which escapes newlines inside string values), and write exactly one trailing \n per frame.

Before you ship, write a five-line framing test that sends a crafted response and asserts that the client parses it.
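A minimal sketch of such a framing check, assuming a line-based reader on the client side. The names encodeFrame and decodeFrames are hypothetical helpers for illustration, not part of any MCP SDK.

```typescript
// Hypothetical helpers; not part of any MCP SDK.
type JsonRpcMessage = {
  jsonrpc: "2.0";
  id?: number | string | null;
  [key: string]: unknown;
};

// Serialize one message as exactly one line plus one trailing newline.
function encodeFrame(message: JsonRpcMessage): string {
  // No indent argument: newlines inside strings are escaped, never emitted raw.
  const line = JSON.stringify(message);
  if (line.includes("\n")) {
    throw new Error("frame contains a raw newline");
  }
  return line + "\n";
}

// Split a received buffer into frames the way a line-based reader would.
function decodeFrames(buffer: string): JsonRpcMessage[] {
  return buffer
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as JsonRpcMessage);
}
```

Encoding a result whose text contains a newline and decoding two such frames back to back should yield exactly two messages with the text intact.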

2. Mirror the JSON-RPC error envelope exactly

Clients parse {jsonrpc:"2.0", id, error:{code, message}} and nothing else on failure paths. If your server panics and writes a plain text stack trace instead, the client throws a parse error and your actual failure is invisible.

Wrap your top-level handler in a try/catch that always emits a valid error envelope — use code -32603 (Internal error) as the catch-all. Structured logging of the real error goes to stderr; stdout is for protocol only.
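The wrapper described above might look like the following sketch. The safeDispatch name and the handler signature are assumptions made for illustration; only the envelope shape and the -32603 code come from JSON-RPC 2.0.

```typescript
type JsonRpcId = number | string | null;

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: JsonRpcId;
  method: string;
  params?: unknown;
}

type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: JsonRpcId; result: unknown }
  | { jsonrpc: "2.0"; id: JsonRpcId; error: { code: number; message: string } };

const INTERNAL_ERROR = -32603; // JSON-RPC 2.0 "Internal error"

// Hypothetical wrapper: every thrown error becomes a valid error envelope.
async function safeDispatch(
  request: JsonRpcRequest,
  handler: (request: JsonRpcRequest) => Promise<unknown>,
): Promise<JsonRpcResponse> {
  try {
    const result = await handler(request);
    return { jsonrpc: "2.0", id: request.id, result };
  } catch (err) {
    // The real error goes to stderr (console.error writes to stderr in Node.js).
    console.error(
      JSON.stringify({ level: "error", method: request.method, error: String(err) }),
    );
    return {
      jsonrpc: "2.0",
      id: request.id,
      error: { code: INTERNAL_ERROR, message: "Internal error" },
    };
  }
}
```

The client only ever sees the generic message; the stack trace or upstream detail stays in the stderr log.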

3. Version your tool schemas from day one

Clients typically fetch tool definitions once and reuse them for the rest of the session. If you rename a parameter mid-session, the client can keep sending the old name, and your handler receives undefined for the new one.

Treat tool input schemas the same as REST API contracts: bump a version field when you make breaking changes, and keep old parameter names as deprecated aliases for at least one release cycle. The cost of schema discipline on day one is thirty minutes; the cost of retrofitting it after callers exist is a week.
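One way to keep deprecated aliases working is to normalize arguments before validation. This is a sketch under assumptions: the ToolSchemaInfo shape, the version field, and the path to file_path rename are all hypothetical examples, not something MCP defines.

```typescript
// Hypothetical schema metadata: "path" was renamed to "file_path" in version 2.
interface ToolSchemaInfo {
  version: number;
  deprecatedAliases: Map<string, string>; // old name -> current name
}

const readFileSchema: ToolSchemaInfo = {
  version: 2,
  deprecatedAliases: new Map([["path", "file_path"]]),
};

// Map deprecated parameter names onto their current names.
function normalizeArgs(
  args: Record<string, unknown>,
  schema: ToolSchemaInfo,
): Record<string, unknown> {
  const normalized: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(args)) {
    const current = schema.deprecatedAliases.get(key) ?? key;
    // If a caller sends both the old and the new name, the new name wins.
    if (current !== key && current in args) continue;
    normalized[current] = value;
  }
  return normalized;
}
```

Removing an entry from deprecatedAliases after the release cycle ends is then a one-line change, and the version bump tells integrators when it happened.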

4. Prevent tool-name collisions across servers

When an agent loads multiple MCP servers, all tool names can land in one flat namespace unless the client prefixes them by server. In that case, read_file from your server and read_file from a community server silently collide, and which one the model ends up calling is unpredictable.

Prefix every tool name with a short namespace: agt_read_file, db_read_row. A single underscore after the prefix is enough; a double underscore is a common convention but not required. Document your namespace in your server's README so integrators know before they connect.
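A small sketch of the prefixing rule plus a collision check an integrator could run at startup. The function names and the agt default are illustrative assumptions.

```typescript
const NAMESPACE = "agt"; // hypothetical prefix for this server

// Add the namespace prefix unless the name already carries it.
function namespaced(name: string, namespace: string = NAMESPACE): string {
  const prefix = namespace + "_";
  return name.startsWith(prefix) ? name : prefix + name;
}

// Return every tool name that is exposed by more than one server.
function findCollisions(toolNamesByServer: Record<string, string[]>): string[] {
  const counts = new Map<string, number>();
  for (const names of Object.values(toolNamesByServer)) {
    for (const name of names) {
      counts.set(name, (counts.get(name) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count > 1)
    .map(([name]) => name);
}
```

Running findCollisions over the tool lists of every configured server turns a silent collision into a startup error you can act on.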

5. Make every tool idempotent where possible

Agents retry on transient failures without asking. If your create_record tool is not idempotent, the agent may call it twice and you silently create two records.

Accept a client-generated idempotency_key parameter on any mutating tool. Store it with a short TTL (24 hours is enough) and return the original result on a duplicate key instead of executing again. Idempotency is not optional on tools that touch money, email, or external state.
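The replay behaviour can be sketched with an in-memory store. This assumes a single server process; IdempotencyStore is a hypothetical class, and a real deployment would likely back it with a shared database or cache so that keys survive restarts.

```typescript
interface StoredResult<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory store keyed by the client's idempotency_key.
class IdempotencyStore<T> {
  private readonly entries = new Map<string, StoredResult<T>>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 24 * 60 * 60 * 1000) {
    this.ttlMs = ttlMs;
  }

  // Run `action` once per key; a duplicate key replays the stored result.
  run(key: string, action: () => T, now: number = Date.now()): T {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > now) {
      return hit.value;
    }
    const value = action();
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
    return value;
  }
}
```

The now parameter exists only to make expiry testable without waiting. Expired entries are overwritten on reuse but never swept, so a long-running server would also want periodic cleanup.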

6. Choose transport based on deployment topology

stdio is for local tools that the client launches as a subprocess on the same machine: file access, shell commands, local databases. HTTP+SSE (superseded by Streamable HTTP in newer revisions of the MCP specification) is for remote tools or any tool that needs to survive an agent restart without losing state.

Be cautious about using stdio for a server that mainly fronts a remote API: each client launches its own server process, so connection pools, caches, and rate-limit state are not shared between clients. Pick the transport once per server and document it in the server manifest so integrators do not have to read source.

7. Set explicit timeout budgets on every tool call

The default MCP timeout in most clients is generous enough that a hung database query will block the agent for minutes before the call is abandoned. Set a timeout on every outbound I/O call in your tool handler — database queries, HTTP requests, subprocess calls.

Return a structured error before the client times out: {error:"upstream_timeout",retryable:true,timeout_ms:5000}. The retryable field lets the agent decide whether to retry immediately or surface the failure to the user.
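A sketch of a timeout budget built on Promise.race, returning the structured error shape shown above. The withTimeout helper is a hypothetical name, and note the limitation: racing does not cancel the underlying work, so where the I/O library accepts an AbortSignal you would pass one as well.

```typescript
interface TimeoutError {
  error: "upstream_timeout";
  retryable: true;
  timeout_ms: number;
}

// Race the work against a timer; resolve with a structured error if the timer wins.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T | TimeoutError> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<TimeoutError>((resolve) => {
    timer = setTimeout(
      () => resolve({ error: "upstream_timeout", retryable: true, timeout_ms: timeoutMs }),
      timeoutMs,
    );
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Pick timeoutMs comfortably below the client's own timeout so that the structured error, not a client-side abort, is what the agent sees.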

8. Log structured JSON to stderr, nothing to stdout

stdout is the protocol channel. Any non-protocol write to stdout corrupts the frame stream. All diagnostic output — request IDs, timing, upstream errors — goes to stderr as newline-delimited JSON.

Include at minimum: {ts,level,tool,duration_ms,error?} on every tool invocation. DigitalOcean App Platform and most container runtimes capture stderr separately; structured lines give you grep-able, alertable observability without extra infrastructure.
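The minimum log line above could be produced by a wrapper like this sketch. The logged helper and its injectable write parameter are assumptions for illustration; the default writer uses console.error, which writes to stderr in Node.js.

```typescript
interface ToolLog {
  ts: string;
  level: "info" | "error";
  tool: string;
  duration_ms: number;
  error?: string;
}

// Hypothetical wrapper: time a tool invocation and emit one JSON line per call.
function logged<T>(
  tool: string,
  fn: () => T,
  write: (line: string) => void = (line) => console.error(line),
): T {
  const started = Date.now();
  const emit = (level: ToolLog["level"], error?: string): void => {
    const entry: ToolLog = {
      ts: new Date().toISOString(),
      level,
      tool,
      duration_ms: Date.now() - started,
    };
    if (error !== undefined) entry.error = error;
    write(JSON.stringify(entry)); // one line, no raw newlines
  };
  try {
    const result = fn();
    emit("info");
    return result;
  } catch (err) {
    emit("error", String(err));
    throw err; // logging must not swallow the failure
  }
}
```

This version wraps synchronous handlers only; an async variant would await fn() before emitting so that duration_ms covers the full call.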

9. Truncate large tool results at the boundary

Returning a 200 KB file blob in a tool result can consume a large share of the context window and degrade agent reasoning for the rest of the session. Decide on a result size budget per tool (4 KB is a safe default; 16 KB for content tools).

When the result exceeds the budget, truncate and append a {truncated:true,total_bytes:N,hint:"use offset/limit params to page"} envelope at the end of the result. Never let a single tool call consume more than 10% of the model's context window.
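A sketch of byte-budget truncation with the envelope appended as a final JSON line. The truncateResult name is hypothetical, and placing the envelope on its own trailing line is one possible layout rather than a requirement.

```typescript
interface TruncationNotice {
  truncated: true;
  total_bytes: number;
  hint: string;
}

// Cut `text` to at most `budgetBytes` of UTF-8 and append a truncation notice.
function truncateResult(text: string, budgetBytes: number = 4096): string {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= budgetBytes) {
    return text;
  }
  let end = budgetBytes;
  // Back up while the first excluded byte is a UTF-8 continuation byte,
  // so the cut never lands inside a multi-byte character.
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) {
    end--;
  }
  const notice: TruncationNotice = {
    truncated: true,
    total_bytes: bytes.length,
    hint: "use offset/limit params to page",
  };
  const kept = new TextDecoder().decode(bytes.slice(0, end));
  return kept + "\n" + JSON.stringify(notice);
}
```

The budget here covers the kept content only; the appended notice adds a fixed overhead of under a hundred bytes on top.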

10. Classify errors before surfacing them

Not every error deserves the same treatment. Transient errors (network timeout, rate limit, lock contention) should be returned with retryable: true so the agent can back off and retry. Permanent errors (invalid input, not found, permission denied) should be returned with retryable: false so the agent stops retrying and surfaces the issue to the user.

Unknown errors default to retryable: false — it is safer to surface a failure than to loop forever. Define an error taxonomy in your server's types file and enforce it; ad-hoc error strings are unactionable for agents and operators alike.
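A minimal error taxonomy along these lines might look like the sketch below. The ErrorKind names mirror the examples in this tip; the ToolError shape and both helper functions are assumptions for illustration.

```typescript
type ErrorKind =
  | "network_timeout"
  | "rate_limit"
  | "lock_contention"
  | "invalid_input"
  | "not_found"
  | "permission_denied"
  | "unknown";

// One table decides retryability, so handlers cannot disagree about it.
const RETRYABLE: Record<ErrorKind, boolean> = {
  network_timeout: true,
  rate_limit: true,
  lock_contention: true,
  invalid_input: false,
  not_found: false,
  permission_denied: false,
  unknown: false, // safer to surface a failure than to loop forever
};

interface ToolError {
  error: ErrorKind;
  retryable: boolean;
  message: string;
}

// Build a classified error from a known kind.
function toolError(kind: ErrorKind, message: string): ToolError {
  return { error: kind, retryable: RETRYABLE[kind], message };
}

// Pass classified errors through; everything else becomes "unknown".
function classify(err: unknown): ToolError {
  if (typeof err === "object" && err !== null && "error" in err && "retryable" in err) {
    return err as ToolError;
  }
  return toolError("unknown", err instanceof Error ? err.message : String(err));
}
```

Because RETRYABLE is typed as Record<ErrorKind, boolean>, adding a new kind without deciding its retry policy fails at compile time.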

Want the full picture?

The Claude Code Playbook covers end-to-end MCP server architecture, production hardening, testing patterns, and deployment automation — everything you need to ship a server you can trust in production.


Albino Geek Services Ltd. · BC, Canada