Originally generated by OpenAI ChatGPT-5.1 and edited by me. Dummy test text.
Shipping an async service is equal parts ergonomics and detective work. This week’s bug: sporadic timeouts when querying an upstream API. The culprit was a mix of unbounded buffering and quietly dropped spans.
Minimal repro
use axum::{routing::get, Router};
use tokio::sync::mpsc;

async fn handler(tx: mpsc::Sender<String>) -> String {
    // Fire and forget: a SendError here is silently discarded.
    let _ = tx.send("ping".into()).await;
    "ok".into()
}

#[tokio::main]
async fn main() {
    // Single-slot queue: one in-flight message is enough to block the next sender.
    let (tx, mut rx) = mpsc::channel::<String>(1);

    tokio::spawn(async move {
        while let Some(msg) = rx.recv().await {
            // No subscriber is installed, so these events are quietly dropped.
            tracing::info!(%msg, "got msg");
        }
    });

    // `move` so the closure owns a Sender it can clone for each request.
    let app = Router::new().route("/", get(move || handler(tx.clone())));

    // axum 0.6-style server (axum 0.7 replaced this with axum::serve).
    axum::Server::bind(&"0.0.0.0:3000".parse().unwrap())
        .serve(app.into_make_service())
        .await
        .unwrap();
}
This “works,” but under load the single-slot channel applies backpressure, so each request’s send waits on the receiver and handler latency tracks whatever the consumer is doing.
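A small standalone timing sketch shows the same effect outside the HTTP path. The numbers are illustrative assumptions (capacity 1, a 50 ms consumer), not measurements from the service: once the slot is occupied, each send waits roughly one consumer iteration.

use std::time::Instant;
use tokio::{sync::mpsc, time::{sleep, Duration}};

#[tokio::main]
async fn main() {
    // Capacity 1, as in the repro; the receiver simulates 50 ms of downstream work.
    let (tx, mut rx) = mpsc::channel::<u32>(1);
    tokio::spawn(async move {
        while let Some(_msg) = rx.recv().await {
            sleep(Duration::from_millis(50)).await; // slow consumer
        }
    });
    for i in 0..5u32 {
        let start = Instant::now();
        // After the first couple of sends, each send blocks until the
        // receiver frees the single slot.
        tx.send(i).await.unwrap();
        println!("send {i} took {:?}", start.elapsed());
    }
}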
What fixed it
- Make backpressure explicit: choose a bounded channel size that matches downstream throughput and surface SendError instead of discarding it.
- Bubble up errors: returning a 503 when the queue is full made the timeouts disappear and made alerts honest (see the sketch after this list).
- Instrument the queue: per-queue metrics and tracing spans revealed that the bad path lived in the spawn, not the handler.
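Here is a minimal sketch of the first two points, assuming axum and the tokio mpsc queue from the repro; the message type and status mapping are illustrative, not the production values. try_send fails fast instead of parking the request on a full queue, and the full/closed cases show up in both the response code and the logs.

use axum::http::StatusCode;
use tokio::sync::mpsc::{self, error::TrySendError};

// Fail fast: a full queue becomes a 503 instead of a slow handler.
async fn handler(tx: mpsc::Sender<String>) -> Result<&'static str, StatusCode> {
    match tx.try_send("ping".into()) {
        Ok(()) => Ok("ok"),
        Err(TrySendError::Full(_)) => {
            tracing::warn!("queue full, shedding load");
            Err(StatusCode::SERVICE_UNAVAILABLE)
        }
        Err(TrySendError::Closed(_)) => {
            tracing::error!("queue receiver gone");
            Err(StatusCode::INTERNAL_SERVER_ERROR)
        }
    }
}

The route wiring from the repro is unchanged: Result<&'static str, StatusCode> implements IntoResponse, so this handler is a drop-in replacement; only the send path differs.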
Retrospective checklist
- No unbounded channels without justification
- No spawn without error handling
- Per-endpoint timeout budget logged
- Property tests for message ordering (a simpler deterministic version is sketched after this list)
- Benchmarks for queue length vs P99 latency
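As a starting point for the ordering item, here is a deterministic check rather than a full property test: it asserts that a bounded tokio mpsc channel delivers messages in send order even while the producer is backpressured. The capacity and message count are arbitrary choices for illustration.

#[cfg(test)]
mod tests {
    use tokio::sync::mpsc;

    #[tokio::test]
    async fn bounded_mpsc_preserves_send_order_under_backpressure() {
        // Capacity 1 forces the producer to wait on the consumer repeatedly.
        let (tx, mut rx) = mpsc::channel::<u32>(1);

        let producer = tokio::spawn(async move {
            for i in 0..100u32 {
                tx.send(i).await.expect("receiver dropped");
            }
        });

        // Messages must arrive exactly in send order.
        for expected in 0..100u32 {
            assert_eq!(rx.recv().await, Some(expected));
        }

        producer.await.unwrap();
    }
}

A proptest variant would randomize message counts and consumer delays, but the invariant under test stays the same.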