OpenTelemetry & Distributed Tracing Training

What We Cover

One day from zero to production-ready tracing

Module 1

Observability Fundamentals & Tracing Concepts

The three pillars: Logs, metrics, traces — what each delivers and where each falls short
Trace anatomy: Spans, trace IDs, parent-child relationships, span attributes and events
W3C Trace Context: How context is propagated across service boundaries via the traceparent and tracestate headers
Why grep isn't enough: live demo of a distributed failure without and with traces
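To make the propagation topic concrete, here is a minimal sketch of what a service does with an incoming `traceparent` header. It assumes the version-00 layout from the W3C Trace Context specification (`version-traceid-parentid-flags`). The names `TraceParent`, `parse_traceparent`, and `child_traceparent` are hypothetical helpers for illustration, not part of any OpenTelemetry SDK.

```python
import re
from typing import NamedTuple, Optional

# Version-00 layout: 2 hex chars, 32 hex chars, 16 hex chars, 2 hex chars.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    """Hypothetical container for the parsed header fields."""
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is not a valid version-00 value."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # This sketch only handles version 00; all-zero IDs are invalid per the spec.
    if version != "00" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    # Bit 0 of the flags byte is the "sampled" flag.
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: TraceParent, new_span_id: str) -> str:
    """Build the header for an outgoing call: same trace ID, new span ID."""
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{new_span_id}-{flags}"
```

The trace ID stays constant across every hop, while each service substitutes its own span ID as the new parent. That is what lets a backend reassemble the parent-child tree later.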

Module 2

OpenTelemetry SDK & Instrumentation

Auto-instrumentation: Java agent for Spring Boot, plus auto-instrumentation for Node.js and Python, with zero code changes
Manual spans: Making business logic traceable — which spans are actually valuable
Attributes & events: Enriching spans with relevant context (user.id, order.value, error.type)
Context propagation: HTTP headers, gRPC metadata, Kafka message headers
Hands-on: instrumenting a Spring Boot microservice from scratch
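As a rough illustration of what a manual span records, the sketch below models spans with the Python standard library only. It is deliberately not the OpenTelemetry API: `Span`, `start_span`, `finished`, and `place_order` are hypothetical names, and a real SDK adds exporters, status codes, and much more. The attribute keys mirror the examples in the module outline.

```python
import contextvars
import secrets
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    start: float = 0.0
    end: float = 0.0


# The span that is currently active in this execution context.
_current = contextvars.ContextVar("current_span", default=None)
finished = []  # stands in for an exporter


@contextmanager
def start_span(name, **attributes):
    parent = _current.get()
    span = Span(
        name=name,
        # A child inherits the trace ID; a root span starts a new trace.
        trace_id=parent.trace_id if parent else secrets.token_hex(16),
        span_id=secrets.token_hex(8),
        parent_id=parent.span_id if parent else None,
        attributes=dict(attributes),
        start=time.monotonic(),
    )
    token = _current.set(span)
    try:
        yield span
    except Exception as exc:
        span.attributes["error.type"] = type(exc).__name__
        raise
    finally:
        span.end = time.monotonic()
        _current.reset(token)
        finished.append(span)


def place_order(user_id, order_value):
    # One span for the business operation, enriched with business context.
    with start_span("place_order", **{"user.id": user_id, "order.value": order_value}) as span:
        with start_span("charge_card"):
            pass  # the payment call would go here
        span.events.append("order.confirmed")
```

The point of the sketch is the shape of the data: one span per meaningful business step, a parent link, and a handful of attributes that answer "which user, which order, which error" when reading the trace later.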

Module 3

OTel Collector, Sampling & Backends

OTel Collector: Architecture, pipeline configuration (receivers → processors → exporters)
Tail-based sampling: Keep all errors, sample healthy requests — configuration and trade-offs
Processors: Filtering sensitive data and adding labels with the attributes processor, tuning batching with the batch processor
Backend integration: Exporting to Grafana Tempo, Jaeger, Datadog, Honeycomb via OTLP
Latency dashboard in Grafana: combining traces and metrics, p99 alerting
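The tail-based sampling rule "keep all errors, sample healthy requests" can be sketched as a decision made once all spans of a trace have arrived. This is an illustration of the idea under simplifying assumptions, not the Collector's implementation: `SpanRecord` and `keep_trace` are hypothetical names, and the hash-based bucket is one possible way to make the decision deterministic per trace ID.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class SpanRecord:
    trace_id: str
    name: str
    is_error: bool


def keep_trace(spans: List[SpanRecord], healthy_ratio: float) -> bool:
    """Decide for a complete trace: keep every error, sample the rest."""
    if not spans:
        return False
    # Policy 1: any failing span means the whole trace is kept.
    if any(span.is_error for span in spans):
        return True
    # Policy 2: keep roughly `healthy_ratio` of healthy traces.
    # Hashing the trace ID gives the same answer for the same trace everywhere.
    digest = hashlib.sha256(spans[0].trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < int(healthy_ratio * 2**64)
```

The trade-off is visible even in this small form: the decision needs the whole trace, so spans must be buffered until the trace is considered complete, which costs memory and delays export.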

Hands-on

Workshop: Diagnosing a production problem

Provided microservice application with a built-in latency problem
Reading and interpreting traces: waterfall view, critical path, span gaps
Locating an N+1 query, a slow external call, and a race condition using traces alone
Setting up an alert rule on trace-based metrics: error rate and latency SLOs
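One of the workshop findings, the N+1 query, has a recognizable shape in trace data: many sibling database spans under the same parent with the same statement. The sketch below shows that check on already-collected spans. `DbSpan` and `find_n_plus_one` are hypothetical names, the `statement` field is assumed to hold parameterized query text, and the threshold of 5 is an arbitrary choice for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DbSpan:
    span_id: str
    parent_id: Optional[str]
    statement: str  # parameterized SQL, e.g. "SELECT * FROM items WHERE order_id = ?"


def find_n_plus_one(spans: List[DbSpan], threshold: int = 5) -> List[Tuple[Optional[str], str, int]]:
    """Return (parent_id, statement, count) for statements repeated under one parent."""
    counts = Counter((span.parent_id, span.statement) for span in spans)
    return [
        (parent_id, statement, count)
        for (parent_id, statement), count in counts.items()
        if count >= threshold
    ]
```

In a waterfall view the same pattern shows up as a staircase of short, identical spans, which is usually how it gets spotted by eye before anyone writes a check for it.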

Learning Outcomes

What your team walks away with

Building observability sounds like an infrastructure concern. In practice, it changes how teams think about production systems. After this day, your team can:

Instrument services with OpenTelemetry without any vendor lock-in
Read distributed traces and name the slow or failing service precisely
Configure the OTel Collector for sampling, routing, and multi-backend export
Build Grafana dashboards that combine traces and metrics in one view
Diagnose production problems that hide behind logs — in minutes instead of hours

For teams who want a 2-day version: the one-day training can be expanded with Prometheus, Grafana, and log aggregation with Loki, making it a complete observability stack training in two focused days.

Book the observability training

Whether it's a focused OpenTelemetry introduction or a full observability stack workshop, we start with a brief conversation about your setup and follow up with a concrete proposal.

Get in touch