Originally generated by OpenAI ChatGPT-5.1 and edited by myself. Dummy test text.

Shipping an async service is equal parts ergonomics and detective work. This week’s bug: sporadic timeouts when querying an upstream API. The culprit was a mix of unbounded buffering and quietly dropped spans.

Minimal repro

use axum::{routing::get, Router};
use tokio::sync::mpsc;

async fn handler(tx: mpsc::Sender<String>) -> String {
    // fire and forget; errors ignored
    let _ = tx.send("ping".into()).await;
    "ok".into()
}

#[tokio::main]
async fn main() {
    // Assumes the tracing-subscriber crate; without a subscriber installed,
    // tracing::info! output is silently dropped.
    tracing_subscriber::fmt::init();
    let (tx, mut rx) = mpsc::channel::<String>(1);
    tokio::spawn(async move {
        while let Some(msg) = rx.recv().await {
            tracing::info!(%msg, "got msg");
        }
    });

    // `move` gives the closure its own Sender; it clones a fresh handle per request.
    let app = Router::new().route("/", get(move || handler(tx.clone())));
    axum::Server::bind(&"0.0.0.0:3000".parse().unwrap())
        .serve(app.into_make_service())
        .await
        .unwrap();
}

This “works,” but under load the single-slot channel applies backpressure: once the slot is full, every handler’s send().await parks until the receiver drains a message, so requests queue up behind one another.

What fixed it

  1. Make backpressure explicit: choose a bounded channel size that matches downstream throughput and surface SendError.
  2. Bubble up errors: returning a 503 when the queue is full made timeouts disappear and made alerts honest.
  3. Instrument the queue: per-queue metrics and tracing spans revealed that the bad path lived in the spawn, not the handler.

Retrospective checklist

  • No unbounded channels without justification
  • No spawn without error handling
  • Per-endpoint timeout budget logged
  • Property tests for message ordering
  • Benchmarks for queue length vs P99 latency