5 production gotchas after shipping MCP servers for 6 months

Six months of stdio framing bugs, FD leaks, and retry-storm misfires — the gotchas that cost me hours before I had names for them.

May 12, 2026·6 min read·By Damon Blais

MCP Claude Code Production Debugging


Six months of shipping Model Context Protocol servers against real Claude clients — not tutorial fixtures, live ones with timeouts, retries, and customers downstream. The gap between "my hello-world tool returns a string" and "this server has been up thirty days handling concurrent calls" is bigger than the docs let on. Most of it is not SDK bugs; it's the shape of the protocol leaking into your assumptions.

These are the five gotchas I keep hitting on new servers, and on new contributors' servers when I'm asked to look at one. Each one cost me hours before I had a name for it. If you're past the first server, one of these is already in your stack.

1. stdout pollution kills the JSON-RPC channel

Symptom. The client connects, lists tools fine, then on the first tool call you get one of: a hard disconnect with no error, a SyntaxError: Unexpected token from the client's JSON parser, or — worst — silent message drops where the call appears to succeed but the result never arrives. Restarting the server "fixes" it for one call.

Root cause. On the stdio transport (what the Claude desktop client and most local runners use), stdout is the protocol channel: it carries JSON-RPC messages, one per line, and nothing else. Anything written to stdout that isn't one of those messages corrupts the stream. The corrupting writer is rarely your own console.log; by ship time you've stripped those. It's a dependency: a DB driver printing a deprecation warning on connect, dotenv complaining about an override, a transitive package's banner. Plenty of them write to stdout by default.
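To make the failure concrete: a reader on the other end of the pipe splits stdout on newlines and parses each line as one JSON message. The sketch below is a simplified model of that read loop, not any specific client's code (`readFrames` is an illustrative name), showing what a single stray banner line does to it. Whether a given client then throws, disconnects, or quietly drops the line is up to that client.

```typescript
// Simplified model of a newline-delimited JSON-RPC reader: every non-empty line on the
// server's stdout must parse as exactly one JSON message. Not any specific client's code.
type ReadResult = { messages: unknown[]; errors: string[] };

function readFrames(stdout: string): ReadResult {
  const messages: unknown[] = [];
  const errors: string[] = [];
  for (const line of stdout.split("\n")) {
    if (line.trim() === "") continue; // the newline after the last frame leaves an empty tail
    try {
      messages.push(JSON.parse(line));
    } catch (e) {
      errors.push((e as Error).message); // a real client might throw, disconnect, or drop it
    }
  }
  return { messages, errors };
}

// One tool result, then a stray banner from some dependency (illustrative text).
const stream =
  '{"jsonrpc":"2.0","id":1,"result":{"content":[]}}\n' +
  "pg: connect option is deprecated\n";

const { messages, errors } = readFrames(stream);
console.log(messages.length, errors.length); // 1 1
```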

Fix. Treat stdout as a binary pipe and re-route everything else to stderr before importing anything that writes to stdout.

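// stdout-guard.ts: its own module, imported FIRST by your entrypoint (see server.ts below).
// ES module imports are hoisted: every import is evaluated before any code in the importing
// file runs, so a shim written above an `import` line in server.ts would still run too late.
import { Console } from "node:console";

const origStdoutWrite = process.stdout.write.bind(process.stdout);
process.stdout.write = ((chunk: string | Uint8Array, ...rest: unknown[]) => {
  const s = typeof chunk === "string" ? chunk : Buffer.from(chunk).toString("utf8");
  // MCP's stdio transport sends one JSON message per line; only those reach stdout.
  // Heuristic: a dependency that logs JSON lines to stdout will still slip through.
  if (s.startsWith("{")) return origStdoutWrite(chunk, ...(rest as []));
  return process.stderr.write(chunk, ...(rest as [])); // everything else goes to stderr
}) as typeof process.stdout.write;

// Route console output to stderr too: console.log({ a: 1 }) prints "{ a: 1 }", which
// starts with "{" and would pass the filter above.
const errConsole = new Console({ stdout: process.stderr, stderr: process.stderr });
console.log = errConsole.log.bind(errConsole);
console.info = errConsole.info.bind(errConsole);
console.debug = errConsole.debug.bind(errConsole);
console.warn = errConsole.warn.bind(errConsole);

// server.ts: the guard's side-effect import comes before everything else.
import "./stdout-guard.js";
import { Server } from "@modelcontextprotocol/sdk/server/index.js";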

Receipt. I shipped an internal customer-data MCP that worked on my machine for a week, then broke on a teammate's because his Postgres driver version printed a one-line connect warning. Same code, different node_modules. The shim above is the first 20 lines of every server I write now.

2. Schema validation gap between the SDK and real payloads

Symptom. Your tool handler crashes with TypeError: Cannot read properties of undefined or returns nonsense because the input you assumed was a string is null, or the array you assumed had at least one element is []. In the worst case the tool returns success and writes garbage downstream.

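Root cause. The MCP TypeScript SDK lets you declare inputSchema as JSON Schema, but with the low-level Server API that schema is only advertised to the client. Inside a CallTool handler, req.params.arguments is typed as an optional Record<string, unknown>, and there is no runtime check at the framework boundary that it matches the schema you declared. Clients send malformed payloads when the model hallucinates an argument shape, and the SDK forwards them straight to your handler. (The higher-level McpServer class does validate arguments when you register tools with Zod schemas; the pattern below is for the low-level Server.)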

Fix. Parse at the boundary with Zod, return a structured error instead of throwing, and derive the JSON Schema from the same Zod object so they can't drift.

import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
import { CallToolRequestSchema, ListToolsRequestSchema } from "@modelcontextprotocol/sdk/types.js";

// `server` is your Server instance; `doSearch` is your own search function.
const SearchInput = z.object({
  query: z.string().min(1).max(500),
  limit: z.number().int().min(1).max(100).default(10),
});

// Advertise the tool with a JSON Schema derived from the same Zod object, so the two can't drift.
const tools = [{ name: "search", description: "Search the index", inputSchema: zodToJsonSchema(SearchInput) }];
server.setRequestHandler(ListToolsRequestSchema, async () => ({ tools }));

server.setRequestHandler(CallToolRequestSchema, async (req) => {
  if (req.params.name !== "search") {
    return { content: [{ type: "text", text: `unknown tool: ${req.params.name}` }], isError: true };
  }
  const parsed = SearchInput.safeParse(req.params.arguments);
  if (!parsed.success) {
    // Return an MCP error result, NOT a thrown exception: Claude can read this and retry sanely.
    return {
      content: [{ type: "text", text: `invalid arguments: ${parsed.error.message}` }],
      isError: true,
    };
  }
  const rows = await doSearch(parsed.data);
  return { content: [{ type: "text", text: JSON.stringify(rows) }] };
});

Receipt. A buyer-bot webhook handler I shipped in the spring kept returning empty results until I logged the raw req.params.arguments — Claude was sending { query: ["foo"] } (an array) about one call in twenty. The Zod parse caught it; the structured error let Claude self-correct on the next call instead of getting a 500.

3. Tool-call timeouts vs Claude's retry semantics

Symptom. A tool that does something with a side effect — insert a row, send a webhook, append to a file, charge a card — fires twice (or three times) for what looks like one call from the user. Logs show the same arguments with different request IDs, seconds apart.

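Root cause. When a tool call exceeds the client's timeout, the client treats it as failed and the call gets re-issued, whether by the client itself or by the model reacting to the timeout error. Meanwhile the original handler doesn't know it was abandoned: unless it watches for cancellation, it runs to completion and its side effect still lands. Your server can't distinguish the retry from a genuine new call, because the arguments are identical; the model produced them identically. Non-idempotent handlers double-fire; it looks like an agent bug but it's a protocol-shape bug.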
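A toy model of that shape, under deliberately simplified assumptions: the caller gives up after `timeoutMs` and immediately re-issues the identical call, and nothing cancels the abandoned attempt. `postWebhookTool` and `callWithTimeoutAndRetry` are hypothetical names, not SDK or client APIs. The caller sees one "posted"; the webhook fires twice.

```typescript
// Toy model of the timeout/retry shape, not any real client's code. Assumptions: the caller
// gives up after `timeoutMs` and re-issues the identical call; nothing cancels the first run.
let webhookPosts = 0;

async function postWebhookTool(latencyMs: number): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, latencyMs)); // slow downstream
  webhookPosts++; // the side effect lands whether or not anyone is still waiting
  return "posted";
}

// `latencies` holds the downstream latency for each attempt, in order.
async function callWithTimeoutAndRetry(latencies: number[], timeoutMs: number): Promise<string> {
  for (const latency of latencies) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error("timeout")), timeoutMs);
    });
    try {
      return await Promise.race([postWebhookTool(latency), timeout]);
    } catch {
      // The caller gave up on this attempt, but the handler it started is still running.
    } finally {
      clearTimeout(timer);
    }
  }
  throw new Error("all attempts timed out");
}

// First attempt takes 50 ms against a 20 ms timeout; the retry takes 5 ms and "succeeds".
```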

Fix. Derive an idempotency key from the call inputs, cache the result for a short TTL, and short-circuit duplicates.

import { createHash } from "node:crypto";

const inflight = new Map<string, Promise<unknown>>();
const recent = new Map<string, { result: unknown; expires: number }>();
const TTL_MS = 60_000;
const MAX_RECENT = 1000;

// NUL separator so the tool name can't run into the serialized arguments.
// Note: JSON.stringify is key-order sensitive, so { a, b } and { b, a } hash differently.
const idemKey = (name: string, args: unknown) =>
  createHash("sha256").update(`${name}\0${JSON.stringify(args)}`).digest("hex");

async function withIdempotency<T>(name: string, args: unknown, fn: () => Promise<T>): Promise<T> {
  const key = idemKey(name, args);
  const cached = recent.get(key);
  if (cached && cached.expires > Date.now()) return cached.result as T;
  const existing = inflight.get(key);
  if (existing) return existing as Promise<T>; // duplicate arrived while the first is running
  const p = fn()
    .then((result) => {
      recent.delete(key); // re-insert so Map order stays oldest-first for eviction
      recent.set(key, { result, expires: Date.now() + TTL_MS });
      if (recent.size > MAX_RECENT) recent.delete(recent.keys().next().value as string);
      return result;
    })
    .finally(() => inflight.delete(key)); // clear on failure too, or a failed call sticks forever
  inflight.set(key, p);
  return p;
}

// Usage (mailgun.send stands in for any side-effecting call):
const result = await withIdempotency("send_email", parsed.data, () => mailgun.send(parsed.data));
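One caveat with keying on JSON.stringify(args): it depends on key order, so `{ a: 1, b: 2 }` and `{ b: 2, a: 1 }` produce different hashes. If you see retries that reorder keys, a canonical serializer closes that gap. A minimal sketch, assuming the arguments are plain JSON values (which parsed tool arguments are); `canonicalJson` and `canonicalIdemKey` are illustrative names, not from any library.

```typescript
import { createHash } from "node:crypto";

// Canonical JSON: object keys sorted recursively, so semantically equal argument objects
// serialize (and therefore hash) identically. Assumes plain JSON values (no Dates, Maps, etc.).
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .filter(([, v]) => v !== undefined) // JSON.stringify drops undefined-valued keys too
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalJson(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value) ?? "null"; // undefined inside arrays becomes null, as in JSON
}

const canonicalIdemKey = (name: string, args: unknown) =>
  createHash("sha256").update(`${name}\0${canonicalJson(args)}`).digest("hex");

// canonicalIdemKey("send_email", { to: "a", subject: "b" })
//   === canonicalIdemKey("send_email", { subject: "b", to: "a" })  // true
```

Swapping it in for `idemKey` is the only change the idempotency wrapper needs.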

Receipt. A notification tool I shipped posted to a webhook on every call; for a week the team channel got duplicate posts on slow calls and nobody could figure out why the agent was "spamming". It wasn't the agent — it was a 30-second downstream crossing the client timeout. Idempotency key + 60-second TTL, gone.

4. Concurrent tool execution against shared mutable state

Symptom. Two tool calls that touch the same file, row, or counter race; one of them silently wins and the other's write is lost. The user sees inconsistent state the next time they read. There is no error.

Root cause. MCP does not serialize tool calls. A client (or multi-agent setup fanning out through one client) can dispatch parallel calls, and the SDK runs your handlers concurrently. Two handlers doing read → mutate → write against the same resource is a classic check-then-act race. Nothing in the protocol warns you.
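The race needs nothing more than one await between the read and the write. A minimal in-memory sketch; `store`, `readList`, `writeList` and `appendUnsafely` are stand-ins for your own file or row access, and the awaits simulate I/O.

```typescript
// In-memory stand-ins for a shared file or row. Each await simulates I/O and is a point
// where another concurrently running handler gets to execute.
const store = new Map<string, string[]>();

async function readList(id: string): Promise<string[]> {
  await Promise.resolve();
  return [...(store.get(id) ?? [])];
}

async function writeList(id: string, list: string[]): Promise<void> {
  await Promise.resolve();
  store.set(id, list);
}

// read -> mutate -> write with no coordination.
async function appendUnsafely(id: string, item: string): Promise<void> {
  const list = await readList(id);
  list.push(item);
  await writeList(id, list);
}
```

Run two appends concurrently and only one item survives: both handlers read the empty list before either writes, and the second write overwrites the first. No error is raised anywhere.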

Fix. Per-resource mutex inside the server. For anything you can't redesign as append-only, serialize writes to it with a keyed lock.

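const locks = new Map<string, Promise<void>>();

async function withLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const prev = locks.get(key) ?? Promise.resolve();
  let release!: () => void;
  const next = new Promise<void>((r) => (release = r));
  const tail = prev.then(() => next); // the next caller on this key waits for us
  locks.set(key, tail);
  try {
    await prev; // wait for the previous holder of this key
    return await fn();
  } finally {
    release();
    // Nobody queued behind us: drop the entry so the map doesn't keep one entry per key forever.
    if (locks.get(key) === tail) locks.delete(key);
  }
}

// Usage, inside your CallTool handler: every write against a given list-id serializes
// on its own key (listId, item, readList and writeList are your own).
return withLock(`todo:${listId}`, async () => {
  const list = await readList(listId);
  list.push(item);
  await writeList(listId, list);
  return { content: [{ type: "text", text: "ok" }] };
});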

If the contended resource is a database row, use a DB-level optimistic lock (version column + conditional update) instead — the in-process mutex stops working the moment you run more than one server instance.
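A minimal sketch of that optimistic-lock loop, with an in-memory class standing in for a table that has a version column. `VersionedStore`, `updateIfVersion` and `appendWithRetry` are hypothetical names, and the limit of 5 attempts is arbitrary. The shape is the point: read the row and its version, write only if the version hasn't moved, and re-read and re-apply on conflict.

```typescript
// In-memory stand-in for a table with a `version` column. In SQL the conditional write
// would be something like:
//   UPDATE notes SET body = $1, version = version + 1 WHERE id = $2 AND version = $3
// and you would check the affected-row count instead of the boolean returned here.
type Row = { body: string[]; version: number };

class VersionedStore {
  private rows = new Map<string, Row>();

  async read(id: string): Promise<Row> {
    await Promise.resolve(); // simulate a DB round trip
    const row = this.rows.get(id) ?? { body: [], version: 0 };
    return { body: [...row.body], version: row.version };
  }

  // Succeeds only if nobody bumped the version since we read it.
  async updateIfVersion(id: string, body: string[], expectedVersion: number): Promise<boolean> {
    await Promise.resolve();
    const current = this.rows.get(id)?.version ?? 0;
    if (current !== expectedVersion) return false;
    this.rows.set(id, { body, version: expectedVersion + 1 });
    return true;
  }
}

async function appendWithRetry(db: VersionedStore, id: string, item: string, maxAttempts = 5): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const row = await db.read(id);
    if (await db.updateIfVersion(id, [...row.body, item], row.version)) return;
    // Someone else wrote in between: loop, re-read, and re-apply our change.
  }
  throw new Error(`gave up on ${id} after ${maxAttempts} conflicting attempts`);
}
```

Unlike the in-process mutex, the check lives in the database, so it holds across any number of server instances.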

Receipt. A shared-notes MCP I shipped to a small internal team lost about one note in fifty for a month before anyone noticed; two teammates would ask the assistant to append at the same time and one append clobbered the other. A keyed mutex on the note ID took the loss rate to zero overnight.

5. Long-lived client lifecycle vs server restart leaks FDs

Symptom. During dev, after a few hot-reloads of your server, your machine starts complaining about too many open files, or the client wedges and won't reconnect without a full restart. In prod, after a crash-loop, the supervisor reports the process exited but lsof shows the stdio pipes still held.

Root cause. On stdio transport the parent (Claude desktop, editor, dev runner) opened a pair of OS pipes to talk to your server. The kernel closes the server's own ends when it dies, but the parent-side FDs stay open until the parent notices the exit and reaps the child, and if anything else still holds the write end (a subprocess your server spawned with inherited stdio, say), the parent's read end never sees EOF and just blocks. Hard exits (uncaught exception, SIGKILL from a hot-reloader) skip the SDK's transport-close path entirely. Ten hot-reloads in, you're out of FDs.

Fix. Explicit close handlers on the lifecycle signals, a best-effort flush on exit, and a server-side keepalive ping so dead transports are detected fast on the client side too.

import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

const transport = new StdioServerTransport();
await server.connect(transport);

let shuttingDown = false;
async function shutdown(reason: string, exitCode = 0) {
  if (shuttingDown) return;
  shuttingDown = true;
  process.stderr.write(`[mcp] ${reason}, closing transport\n`);
  // Hard deadline: if close() hangs, exit anyway rather than wedge the parent.
  setTimeout(() => process.exit(exitCode), 2_000).unref();
  try { await server.close(); await transport.close(); }
  catch (e) { process.stderr.write(`[mcp] shutdown error: ${(e as Error).message}\n`); }
  process.exit(exitCode);
}

process.on("SIGINT", () => void shutdown("SIGINT"));
process.on("SIGTERM", () => void shutdown("SIGTERM"));
// EOF on stdin means the client closed its end: exit instead of lingering as an orphan.
process.stdin.on("end", () => void shutdown("stdin closed"));
process.on("uncaughtException", (e) => {
  process.stderr.write(`[mcp] uncaught: ${e.stack ?? e.message}\n`);
  void shutdown("uncaughtException", 1); // non-zero, so the supervisor records a crash
});
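The Fix also calls for a keepalive ping, which the handlers above don't include. MCP defines a ping request that either side can send. A minimal server-side sketch, assuming your SDK version's Server exposes a ping() method (verify against yours); `startKeepalive` is a hypothetical helper, and the interval, timeout and miss count are placeholders to tune.

```typescript
// Keepalive sketch. `ping` is whatever round-trips to the client (assumed: () => server.ping()).
// After `maxMisses` consecutive failures or timeouts, `onDead` runs once (e.g. your shutdown()).
function startKeepalive(
  ping: () => Promise<unknown>,
  onDead: () => void,
  { intervalMs = 15_000, timeoutMs = 5_000, maxMisses = 2 } = {},
): () => void {
  let misses = 0;
  let inFlight = false;
  const timer = setInterval(async () => {
    if (inFlight) return; // don't stack pings on a wedged transport
    inFlight = true;
    let t: ReturnType<typeof setTimeout> | undefined;
    try {
      await Promise.race([
        ping(),
        new Promise((_, reject) => {
          t = setTimeout(() => reject(new Error("ping timeout")), timeoutMs);
        }),
      ]);
      misses = 0;
    } catch {
      misses++;
      if (misses >= maxMisses) {
        clearInterval(timer);
        onDead();
      }
    } finally {
      clearTimeout(t);
      inFlight = false;
    }
  }, intervalMs);
  timer.unref(); // don't keep the process alive just for pings
  return () => clearInterval(timer);
}
```

Started right after server.connect(), that would look like `startKeepalive(() => server.ping(), () => void shutdown("keepalive failed"))`; the returned function stops it.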

If you're running under a hot-reloader in dev, make sure it sends SIGTERM (not SIGKILL) and waits for the process to actually exit before starting the new one. tsx watch and node --watch do this correctly; some homegrown nodemon configs do not.
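For a homegrown reload script, the SIGTERM-then-wait behavior can be sketched like this. `stopGracefully` and `reload` are hypothetical helpers, and the 2-second drain before falling back to SIGKILL is an arbitrary choice; the sketch only uses Node's child_process and events modules.

```typescript
import { spawn, type ChildProcess } from "node:child_process";
import { once } from "node:events";

// Ask the old server to shut down, wait until it has actually exited, and only
// fall back to SIGKILL if it ignores SIGTERM for `drainMs`.
async function stopGracefully(child: ChildProcess, drainMs = 2_000): Promise<void> {
  if (child.exitCode !== null || child.signalCode !== null) return; // already gone
  const exited = once(child, "exit");
  child.kill("SIGTERM");
  const timer = setTimeout(() => child.kill("SIGKILL"), drainMs); // last resort
  await exited;
  clearTimeout(timer);
}

// Start the new process only after the old one is fully gone.
async function reload(old: ChildProcess | undefined, start: () => ChildProcess): Promise<ChildProcess> {
  if (old) await stopGracefully(old);
  return start();
}

// Example `start`: spawn(process.execPath, ["dist/server.js"], { stdio: "inherit" })
export { spawn, stopGracefully, reload };
```

The ordering is the point: the old process has exited, and the kernel has closed its descriptors, before the new one opens its own.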

Receipt. I lost half a day to "the client stops responding after a while" on a server I was iterating with a sloppy reload script — it was sending SIGKILL and racing the new process. Switching to SIGTERM with a 2-second drain window killed the symptom, and the shutdown handler above made the same code safe in prod where the supervisor sends SIGTERM on deploy.

None of this is a critique of the protocol — JSON-RPC over stdio is a sensible substrate and the SDK does the heavy lifting. These are the seams that show up when you push past the example servers, and they show up the same way for everyone.

If you want the patterns above (and the rest of the playbook) as a guide instead of a blog post, see https://www.albinogeek.com/#offers.


Damon Blais

Operator, Albino Geek Services Ltd.

Runs MCP servers in daily production. Ships open-source MCP tooling and orchestrates parallel agentic projects across Claude Code and multi-agent stacks — not toy demos.
