Tracing Async Python: How to Instrument FastAPI and Celery in the Same Trace
Did you know that more than 30% of production-grade Python services miss critical latency bugs simply because their async work isn’t traced end-to-end? Imagine a user-facing FastAPI endpoint that fires a background Celery task: without a unified trace you see two isolated timings, not the full story. In this article we’ll stitch those pieces together so you can watch the request flow from HTTP call to worker execution in a single, searchable trace.
Why Unified Tracing Matters for Async Python
Visibility across process boundaries is the first win. FastAPI runs in an ASGI server, Celery spins in separate worker processes. A single trace shows the true request latency, not just a split snapshot. Root‑cause diagnostics become a breeze: you can spot a slow pandas DataFrame crunch next to a queue back‑pressure spike. Business impact? Faster incident response, lower SLO breach risk, and measurable ROI for observability investments.
Core Concepts: Tracing, Spans, and Context Propagation
- Trace vs. Span – A trace is the whole journey; individual spans are the building blocks. In FastAPI, each HTTP request becomes a root span, while Celery tasks are child spans.
- Context propagation mechanisms – HTTP headers carry trace IDs; message brokers attach them to task payloads. OpenTelemetry’s Context object glues everything together.
- Async-aware instrumentation – The event loop can lose the active span if not handled. Libraries make sure asyncio keeps the context alive across awaits.
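To make the async-aware point concrete, here is a minimal, library-free sketch of the mechanism that keeps a "current span" alive across awaits. It uses only the standard library's contextvars module, which is what OpenTelemetry's Python context is built on as far as I know. The current_span variable and the span names are hypothetical stand-ins, not OpenTelemetry objects.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the tracer's "current span" slot.
current_span = contextvars.ContextVar("current_span", default=None)


async def query_db():
    await asyncio.sleep(0)  # yield to the event loop
    return current_span.get()  # still the caller's span after the await


async def handle_request(name):
    current_span.set(name)  # visible only inside this request's context
    seen_inline = await query_db()
    # A task copies the context at creation, so it sees the same span.
    seen_in_task = await asyncio.create_task(query_db())
    return name, seen_inline, seen_in_task


async def main():
    # Two concurrent requests must not see each other's span.
    return await asyncio.gather(
        handle_request("span-A"), handle_request("span-B")
    )


results = asyncio.run(main())
print(results)
```

Each request keeps its own value even though both run interleaved on one event loop. What contextvars cannot do is cross a process boundary, which is why the Celery hop needs explicit propagation.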
Setting Up the Toolchain (pip, OpenTelemetry, Jaeger/Tempo)
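First, grab the right packages. In your requirements.txt add:
opentelemetry-sdk
opentelemetry-instrumentation-fastapi
opentelemetry-instrumentation-celery
opentelemetry-exporter-jaeger
and run pip install -r requirements.txt. Recent OpenTelemetry releases deprecate the dedicated Jaeger exporter in favor of OTLP, which Jaeger can ingest directly, so you may need to pin an older version for the code below. Then configure the tracer provider in tracing.py: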
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
resource = Resource.create({"service.name": "fastapi-celery-demo", "service.version": "1.0.0"})
trace.set_tracer_provider(TracerProvider(resource=resource))
jaeger = JaegerExporter(agent_host_name="localhost", agent_port=6831)
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(jaeger)
)
Run a local Jaeger instance with Docker Compose. Once up, all spans will surface in the UI.
Step‑by‑Step Walkthrough: Instrument FastAPI and Celery in One Trace
- Create a FastAPI app and add the OpenTelemetry middleware: FastAPIInstrumentor.instrument_app(app).
- Define a Celery task and enable the Celery instrumentation: CeleryInstrumentor().instrument().
- Propagate the trace context when queuing: task.apply_async(headers=carrier), where carrier is a dict that the current trace context has been injected into.
- Verify in Jaeger: a single root span with child spans for the HTTP request, a pandas/numpy compute span, and the background task.
- Troubleshooting: missing spans often mean the context wasn’t injected; duplicate IDs usually stem from manual span creation without start_as_current_span.
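When chasing the "context wasn’t injected" case, it can help to check whether a carrier actually holds a usable trace header. The sketch below is a standard-library-only helper, assuming the default W3C Trace Context format (a traceparent key shaped like version-traceid-parentid-flags). The function name parse_traceparent is hypothetical, and the check is deliberately strict: it accepts only the four-field layout.

```python
import re

# W3C Trace Context: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(headers):
    """Return (trace_id, parent_span_id, sampled), or None if missing/invalid."""
    value = headers.get("traceparent")
    if value is None:
        return None
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid per the spec.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    sampled = bool(int(flags, 16) & 0x01)
    return trace_id, span_id, sampled


good = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
print(parse_traceparent(good))
print(parse_traceparent({}))  # nothing was injected
```

Logging the result on both the publishing side and the worker side shows quickly which hop dropped the context.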
Here’s a minimal runnable example:
# tracing.py
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
resource = Resource.create({"service.name": "fastapi-celery-demo"})
trace.set_tracer_provider(TracerProvider(resource=resource))
jaeger = JaegerExporter(agent_host_name="localhost", agent_port=6831)
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(jaeger)
)
tracer = trace.get_tracer(__name__)
# main.py
import numpy as np
import pandas as pd
from celery import Celery
from fastapi import FastAPI
from opentelemetry.instrumentation.celery import CeleryInstrumentor
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.propagate import inject

from tracing import tracer

app = FastAPI()
celery_app = Celery("worker", broker="redis://localhost:6379/0")

# instrument() takes keyword arguments only; it hooks Celery's signals globally.
CeleryInstrumentor().instrument()
FastAPIInstrumentor().instrument_app(app)


@app.get("/process")
async def process():
    # FastAPIInstrumentor already opened a request span; these become children.
    with tracer.start_as_current_span("http-request"):
        # optional data crunch
        with tracer.start_as_current_span("numpy.compute"):
            arr = np.random.rand(1_000_000)
            arr.sum()
        # enqueue the task with the current trace context
        carrier = {}
        inject(carrier)  # writes the active context (W3C traceparent) into the dict
        # The Celery instrumentation should also inject at publish time,
        # so passing headers explicitly is a belt-and-braces step.
        background_task.apply_async(headers=carrier)
    return {"status": "queued"}


@celery_app.task
def background_task():
    with tracer.start_as_current_span("celery.task"):
        # pretend heavy db query
        df = pd.DataFrame({"x": range(1000)})
        df["y"] = df["x"] * 2
        df.sum().sum()
Run uvicorn main:app --reload and the worker with celery -A main.celery_app worker --loglevel=info. Hit /process and watch the trace unfold.
Actionable Takeaways & Best Practices
- Automate instrumentation – add it to requirements.txt and your CI pipeline. The moment a new service is spun up, it ships with tracing.
- Standardize naming conventions – use fastapi.api.request for HTTP spans, celery.task.process for tasks. It keeps the UI readable.
- Monitor key metrics – keep an eye on 95th percentile latencies, error rates, and queue depth together with traces. One dashboard, one view.
- Scale safely – switch to batch exporters for high‑throughput, limit the number of spans per trace, and prune non‑essential attributes.
Frequently Asked Questions
How do I trace async FastAPI endpoints with OpenTelemetry?
Add opentelemetry-instrumentation-fastapi and call FastAPIInstrumentor.instrument_app(app). The middleware automatically creates a span for each incoming request and preserves context across await calls.
Can Celery tasks share the same trace as the HTTP request that started them?
Yes. Inject the current trace context into the task message (for example, through the headers argument of task.apply_async) and enable opentelemetry-instrumentation-celery; the worker then extracts the context and continues the same trace.
What exporters work best for Python async tracing in development?
Jaeger (Docker) and Grafana Tempo are both easy to spin up locally; they understand the OpenTelemetry protocol and display async spans hierarchically.
Do pandas or numpy operations automatically appear in traces?
Not by default. Wrap heavy data‑processing blocks in manual spans (with tracer.start_as_current_span("pandas.compute"):) to capture their duration and link them to the surrounding request trace.
How can I view traces from a Jupyter notebook?
Use the opentelemetry-sdk to export spans to an in‑memory exporter or to a local collector, then query the collector’s API from the notebook (e.g., via requests) and render with pyvis or plotly.