Six months of shipping Model Context Protocol servers against real Claude clients — not tutorial fixtures, live ones with timeouts, retries, and customers downstream. The gap between "my hello-world tool returns a string" and "this server has been up thirty days handling concurrent calls" is bigger than the docs let on. Most of it is not SDK bugs; it's the shape of the protocol leaking into your assumptions.
These are the five gotchas I keep hitting on new servers, and on new contributors' servers when I'm asked to look at one. Each one cost me hours before I had a name for it. If you're past the first server, one of these is already in your stack.
Symptom. The client connects, lists tools fine, then on the first tool call you get one of: a hard disconnect with no error, a SyntaxError: Unexpected token from the client's JSON parser, or — worst — silent message drops where the call appears to succeed but the result never arrives. Restarting the server "fixes" it for one call.
Root cause. On the stdio transport (what the Claude desktop client and most local runners use), stdout is the JSON-RPC channel: every byte on it is supposed to be part of a JSON-RPC message. Anything else written to stdout corrupts the stream. The corrupting writer is rarely your own console.log; by ship time you've stripped those. It's a dependency: a DB driver printing a deprecation warning on connect, dotenv complaining about an override, a transitive package's banner. Anything that logs through console.log or writes to process.stdout directly lands in the protocol stream.
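Fix. Treat stdout as a binary pipe and re-route everything else to stderr before any dependency gets a chance to write. If your build emits ES modules, static imports are hoisted and evaluated before the rest of the file, so put the shim in its own module and import that module first.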
// server.ts: first lines of your entrypoint, BEFORE any other import.
const origStdoutWrite = process.stdout.write.bind(process.stdout);
process.stdout.write = ((chunk: string | Uint8Array, ...rest: unknown[]) => {
  const s = typeof chunk === "string" ? chunk : Buffer.from(chunk).toString("utf8");
  // Only let JSON-RPC frames through; everything else goes to stderr.
  if (s.startsWith("{") || s.startsWith("Content-Length:")) return origStdoutWrite(chunk, ...(rest as []));
  // Assumed completion: forward the stray write to stderr and report success.
  process.stderr.write(chunk, ...(rest as []));
  return true;
}) as typeof process.stdout.write;
Receipt. I shipped an internal customer-data MCP that worked on my machine for a week, then broke on a teammate's because his Postgres driver version printed a one-line connect warning. Same code, different node_modules. The shim above is the first 20 lines of every server I write now.
Symptom. Your tool handler crashes with TypeError: Cannot read properties of undefined or returns nonsense because the input you assumed was a string is null, or the array you assumed had at least one element is []. In the worst case the tool returns success and writes garbage downstream.
Root cause. With the MCP TypeScript SDK's low-level Server API (the setRequestHandler style used here), you declare inputSchema as JSON Schema, but inside the handler the arguments arrive as unknown values. Nothing at the framework boundary enforces at runtime that the inbound arguments match the schema you declared. Clients send malformed payloads when the model hallucinates an argument shape, and the SDK forwards them straight to your handler.
Fix. Parse at the boundary with Zod, return a structured error instead of throwing, and derive the JSON Schema from the same Zod object so they can't drift.
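import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
import { CallToolRequestSchema } from "@modelcontextprotocol/sdk/types.js";
const SearchInput = z.object({
  query: z.string().min(1).max(500),
  limit: z.number().int().min(1).max(100).default(10),
});
server.setRequestHandler(CallToolRequestSchema, async (req) => {
  // Assumed completion past the unknown-tool message: isError results instead of throws; runSearch is hypothetical.
  if (req.params.name !== "search") {
    return { content: [{ type: "text", text: `unknown tool: ${req.params.name}` }], isError: true };
  }
  const parsed = SearchInput.safeParse(req.params.arguments);
  if (!parsed.success) {
    return { content: [{ type: "text", text: `invalid arguments: ${parsed.error.message}` }], isError: true };
  }
  return { content: [{ type: "text", text: await runSearch(parsed.data) }] };
});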
Receipt. A buyer-bot webhook handler I shipped in the spring kept returning empty results until I logged the raw req.params.arguments — Claude was sending { query: ["foo"] } (an array) about one call in twenty. The Zod parse caught it; the structured error let Claude self-correct on the next call instead of getting a 500.
Symptom. A tool that does something with a side effect — insert a row, send a webhook, append to a file, charge a card — fires twice (or three times) for what looks like one call from the user. Logs show the same arguments with different request IDs, seconds apart.
Root cause. When a tool call exceeds the client's timeout, the client treats it as failed and retries. Your server can't distinguish the retry from a genuine new call — the arguments are identical because the model produced them identically. Non-idempotent handlers double-fire; it looks like an agent bug but it's a protocol-shape bug.
Fix. Derive an idempotency key from the call inputs, cache the result for a short TTL, and short-circuit duplicates.
import { createHash } from "node:crypto";
const inflight = new Map<string, Promise<unknown>>();
const recent = new Map<string, { result: unknown; expires: number }>();
const TTL_MS = 60_000;
const idemKey = (name: string, args: unknown) =>
  createHash("sha256").update(name + JSON.stringify(args)).digest("hex");
async function
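A minimal sketch of how the key, the in-flight map, and the TTL cache could be wired into one wrapper. `runOnce` and `stableStringify` are hypothetical names, not SDK APIs. One deliberate difference from the key function above: plain JSON.stringify is sensitive to property order, so `{a, b}` and `{b, a}` would hash differently; this sketch sorts keys before hashing.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: serialize with object keys sorted so that
// {a:1,b:2} and {b:2,a:1} produce the same string.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value) ?? "null";
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const body = Object.keys(obj)
    .sort()
    .filter((k) => obj[k] !== undefined)
    .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
  return `{${body.join(",")}}`;
}

const inflight = new Map<string, Promise<unknown>>();
const recent = new Map<string, { result: unknown; expires: number }>();
const TTL_MS = 60_000;

const idemKey = (name: string, args: unknown): string =>
  createHash("sha256").update(`${name}\n${stableStringify(args)}`).digest("hex");

// Hypothetical wrapper: run `fn` at most once per (tool name, arguments) within the TTL.
async function runOnce<T>(name: string, args: unknown, fn: () => Promise<T>): Promise<T> {
  const key = idemKey(name, args);
  const hit = recent.get(key);
  if (hit && hit.expires > Date.now()) return hit.result as T; // recent duplicate
  if (hit) recent.delete(key); // expired entry

  const pending = inflight.get(key);
  if (pending) return pending as Promise<T>; // duplicate of a call still running

  const p = fn()
    .then((result) => {
      // Only successful results are cached; a failed call can be retried.
      recent.set(key, { result, expires: Date.now() + TTL_MS });
      return result;
    })
    .finally(() => inflight.delete(key));
  inflight.set(key, p);
  return p;
}
```

In a tool handler this would wrap the side-effecting call, for example `runOnce("notify", args, () => postWebhook(args))`, where `postWebhook` stands in for your own function. The trade-off is that a genuinely intended repeat inside the TTL is also swallowed, so keep the TTL close to the client timeout.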
Receipt. A notification tool I shipped posted to a webhook on every call; for a week the team channel got duplicate posts on slow calls and nobody could figure out why the agent was "spamming". It wasn't the agent — it was a 30-second downstream crossing the client timeout. Idempotency key + 60-second TTL, gone.
Symptom. Two tool calls that touch the same file, row, or counter race; one of them silently wins and the other's write is lost. The user sees inconsistent state the next time they read. There is no error.
Root cause. MCP does not serialize tool calls. A client (or multi-agent setup fanning out through one client) can dispatch parallel calls, and the SDK runs your handlers concurrently. Two handlers doing read → mutate → write against the same resource is a classic check-then-act race. Nothing in the protocol warns you.
Fix. Per-resource mutex inside the server. For anything you can't redesign as append-only, serialize writes to it with a keyed lock.
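const locks = new Map<string, Promise<void>>();
async function withLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const prev = locks.get(key) ?? Promise.resolve();
  let release!: () => void;
  const next = new Promise<void>((r) => (release = r));
  locks.set(key, prev.then(() => next)); // assumed completion from here on
  await prev; // wait for the previous holder of this key
  try { return await fn(); } finally { release(); }
}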
If the contended resource is a database row, use a DB-level optimistic lock (version column + conditional update) instead — the in-process mutex stops working the moment you run more than one server instance.
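The version-column pattern can be sketched without a database. In this sketch the Map stands in for a table, and `conditionalUpdate` plays the role of an `UPDATE ... WHERE id = ? AND version = ?` whose affected-row count tells you whether you won the race. All names here are hypothetical.

```typescript
// In-memory stand-in for a table with a version column. Assumption: a real
// server would run something like
//   UPDATE notes SET body = $1, version = version + 1
//   WHERE id = $2 AND version = $3
// and inspect the affected-row count.
type Row = { body: string; version: number };
const notes = new Map<string, Row>([["n1", { body: "", version: 0 }]]);

async function readNote(id: string): Promise<Row> {
  const row = notes.get(id);
  if (!row) throw new Error(`no such note: ${id}`);
  return { ...row };
}

// Returns the number of rows updated; 0 means another writer got there first.
async function conditionalUpdate(id: string, body: string, expectedVersion: number): Promise<number> {
  const row = notes.get(id);
  if (!row || row.version !== expectedVersion) return 0;
  notes.set(id, { body, version: expectedVersion + 1 });
  return 1;
}

// Read, mutate, conditionally write; on conflict, re-read and try again.
async function appendWithRetry(id: string, line: string, maxAttempts = 5): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const row = await readNote(id);
    const updated = await conditionalUpdate(id, `${row.body}${line}\n`, row.version);
    if (updated === 1) return;
  }
  throw new Error(`gave up after ${maxAttempts} conflicting writes to ${id}`);
}
```

The losing writer never overwrites anything: its conditional update matches zero rows, it re-reads the fresh row, and it appends on top of the winner's change. Because the check happens in the store rather than in process memory, the same logic holds across multiple server instances.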
Receipt. A shared-notes MCP I shipped to a small internal team lost about one note in fifty for a month before anyone noticed: two teammates would ask the assistant to append at the same time and one append clobbered the other. A keyed mutex on the note ID took the loss rate to zero overnight.
Symptom. During dev, after a few hot-reloads of your server, your machine starts complaining about too many open files, or the client wedges and won't reconnect without a full restart. In prod, after a crash-loop, the supervisor reports the process exited but lsof shows the stdio pipes still held.
Root cause. On stdio transport the parent (Claude desktop, editor, dev runner) opened a pair of OS pipes to talk to your server. When the server exits, the parent-side pipe FDs stay open until the parent notices and reaps the child. Hard exits (uncaught exception, SIGKILL from a hot-reloader) skip the SDK's transport-close path, and the parent's read end blocks on a pipe that will never get more data. Ten hot-reloads in, you're out of FDs.
Fix. Explicit close handlers on the lifecycle signals, a best-effort flush on exit, and a server-side keepalive ping so dead transports are detected fast on the client side too.
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
const transport = new StdioServerTransport();
await server.connect(transport);
let shuttingDown = false;
async function shutdown(signal: string) {
  if (shuttingDown) return;
  shuttingDown = true;
  process.stderr.write(`[mcp] ${signal}, closing transport\n`);
  try { await server.close(); await transport.close(); }
  catch (e) { process.stderr.write(`[mcp] shutdown error: ${(e as Error).message}\n`); }
  // Assumed completion: exit once the transport is closed, and register the signal handlers.
  process.exit(0);
}
for (const sig of ["SIGINT", "SIGTERM"] as const) process.on(sig, () => void shutdown(sig));
If you're running under a hot-reloader in dev, make sure it sends SIGTERM (not SIGKILL) and waits for the process to actually exit before starting the new one. tsx watch and node --watch do this correctly; some homegrown nodemon configs do not.
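If you do maintain your own reload script, the ordering described above could look roughly like this. `stopGracefully` and `restart` are hypothetical helpers built on node:child_process, the 2-second grace period is an arbitrary default, and the sketch assumes a POSIX host where the child can catch SIGTERM.

```typescript
import { spawn, ChildProcess } from "node:child_process";

// Hypothetical helper: ask the old server to exit with SIGTERM, wait for its
// "exit" event, and fall back to SIGKILL only after the grace period.
function stopGracefully(
  child: ChildProcess,
  graceMs = 2_000,
): Promise<{ code: number | null; signal: NodeJS.Signals | null }> {
  return new Promise((resolve) => {
    // Already gone: report how it ended instead of signalling a dead PID.
    if (child.exitCode !== null || child.signalCode !== null) {
      resolve({ code: child.exitCode, signal: child.signalCode });
      return;
    }
    const timer = setTimeout(() => child.kill("SIGKILL"), graceMs);
    child.once("exit", (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal });
    });
    child.kill("SIGTERM");
  });
}

// Hypothetical reload step: the new process starts only after the old one has exited.
async function restart(
  current: ChildProcess | undefined,
  command: string,
  args: string[],
): Promise<ChildProcess> {
  if (current) await stopGracefully(current);
  return spawn(command, args, { stdio: ["pipe", "pipe", "inherit"] });
}
```

The point of awaiting the "exit" event, rather than sleeping for a fixed interval, is that the old and new processes never overlap, so they cannot race for the same pipes or ports.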
Receipt. I lost half a day to "the client stops responding after a while" on a server I was iterating with a sloppy reload script — it was sending SIGKILL and racing the new process. Switching to SIGTERM with a 2-second drain window killed the symptom, and the shutdown handler above made the same code safe in prod where the supervisor sends SIGTERM on deploy.
None of this is a critique of the protocol — JSON-RPC over stdio is a sensible substrate and the SDK does the heavy lifting. These are the seams that show up when you push past the example servers, and they show up the same way for everyone.
If you want the patterns above (and the rest of the playbook) as a guide instead of a blog post, see https://albinogeek.com/#offers.