Training Agenda

OpenTelemetry & Distributed Tracing

In microservice architectures, a single log entry is no longer enough to understand what went wrong with a request. OpenTelemetry, a CNCF project, is the vendor-neutral standard for observability. It provides a unified data model and SDKs for traces, metrics, and logs: you instrument once and export to any compatible backend. Distributed tracing makes the complete journey of a request through every involved service visible, so production problems that used to take hours to diagnose can often be pinned down in minutes.

1 day (expandable to 2) | Remote or on-site | Up to 20 participants | German or English
What We Cover
One day from zero to production-ready tracing
Module 1

Observability Fundamentals & Tracing Concepts

  • The three pillars: Logs, metrics, traces — what each delivers and where each falls short
  • Trace anatomy: Spans, trace IDs, parent-child relationships, span attributes and events
  • W3C TraceContext: How context is propagated across service boundaries
  • Why grep isn't enough: live demo of a distributed failure without and with traces
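Module 1's W3C TraceContext topic comes down to one HTTP header, `traceparent`. The following is a minimal sketch using only the Python standard library. It parses and builds version-00 headers to show what actually crosses a service boundary. In practice the OpenTelemetry SDK's propagators do this for you. The helper names (`parse_traceparent`, `format_traceparent`, `child_of`) and the `SpanContext` type are hypothetical, not part of any OpenTelemetry API.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# Version-00 layout: version "-" trace-id "-" parent-id "-" trace-flags, lowercase hex.
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # 32 hex chars, shared by every span in the trace
    span_id: str   # 16 hex chars, unique per span
    sampled: bool  # bit 0 of trace-flags


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Parse a version-00 traceparent header; return None if it is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def format_traceparent(ctx: SpanContext) -> str:
    """Serialize a span context as the header sent to the next service."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def child_of(parent: SpanContext) -> SpanContext:
    """A downstream span keeps the trace ID and sampling flag but gets a fresh span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
assert incoming is not None
outgoing = format_traceparent(child_of(incoming))
print(outgoing)  # same trace ID, new random 16-hex-char span ID
```

The trace ID never changes along the call chain. Each hop only swaps in its own span ID, and this is what lets a backend stitch spans from different services into one parent-child tree.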
Module 2

OpenTelemetry SDK & Instrumentation

  • Auto-instrumentation: the Java agent for Spring Boot, plus the Node.js and Python auto-instrumentation packages, with no changes to application code
  • Manual spans: Making business logic traceable — which spans are actually valuable
  • Attributes & events: Enriching spans with relevant context (user.id, order.value, error.type)
  • Context propagation: HTTP headers, gRPC metadata, Kafka message headers
  • Hands-on: instrumenting a Spring Boot microservice from scratch
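Message queues have no HTTP request to carry `traceparent`, so the context has to travel inside the message itself. Here is a stdlib-only sketch of that idea for Kafka. It assumes record headers are modelled as `(key, bytes)` pairs, the shape many Kafka clients expose. No real Kafka client is used, and `inject`/`extract` here are hypothetical helpers. In a real service, the OpenTelemetry propagators and Kafka instrumentation libraries handle this step.

```python
import secrets
from typing import List, Optional, Tuple

# Assumption: record headers modelled as (key, bytes) pairs, the shape many Kafka
# clients expose. No real Kafka client or OpenTelemetry API is used here.
Headers = List[Tuple[str, bytes]]


def inject(headers: Headers, trace_id: str, span_id: str, sampled: bool = True) -> Headers:
    """Producer side: attach the current span context as a traceparent header."""
    value = f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}".encode("ascii")
    # Replace any stale traceparent (e.g. when a record is re-published).
    kept = [(key, val) for key, val in headers if key != "traceparent"]
    return kept + [("traceparent", value)]


def extract(headers: Headers) -> Optional[Tuple[str, str]]:
    """Consumer side: return (trace_id, parent_span_id), or None if absent or malformed."""
    for key, value in headers:
        if key != "traceparent":
            continue
        parts = value.decode("ascii", errors="replace").split("-")
        if len(parts) == 4 and len(parts[1]) == 32 and len(parts[2]) == 16:
            return parts[1], parts[2]
    return None


# Producer: a "publish order" span runs inside an existing trace.
trace_id, producer_span_id = secrets.token_hex(16), secrets.token_hex(8)
record_headers = inject([("content-type", b"application/json")], trace_id, producer_span_id)

# Consumer (possibly much later, in another service): continue the same trace.
ctx = extract(record_headers)
if ctx is not None:
    consumer_span_id = secrets.token_hex(8)
    print(f"trace {ctx[0]}: span {consumer_span_id} is a child of {ctx[1]}")
```

The producer's span ID becomes the consumer span's parent. This link is what keeps an asynchronous hop from splitting one request into two unrelated traces.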
Module 3

OTel Collector, Sampling & Backends

  • OTel Collector: Architecture, pipeline configuration (receivers → processors → exporters)
  • Tail-based sampling: Keep all errors, sample healthy requests — configuration and trade-offs
  • Processors: filtering sensitive data and adding labels with the attributes processor, tuning the batch processor for throughput
  • Backend integration: Exporting to Grafana Tempo, Jaeger, Datadog, Honeycomb via OTLP
  • Latency dashboard in Grafana: combining traces and metrics, p99 alerting
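In the Collector, tail-based sampling is configured declaratively. The contrib `tail_sampling` processor combines policies such as `status_code`, `latency`, and `probabilistic`. To make the trade-off concrete, the sketch below shows only the decision logic, in plain Python. The `Span` type, the 500 ms threshold, and the 10 % share are illustrative assumptions, not Collector defaults. This is not how the Collector is implemented internally.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float
    is_error: bool = False


def keep_trace(spans: List[Span], latency_threshold_ms: float = 500.0,
               sample_percent: float = 10.0) -> bool:
    """Tail-based decision: runs only after all spans of one trace have been buffered."""
    if not spans:
        return False
    # Policy 1: keep every trace that contains an error.
    if any(span.is_error for span in spans):
        return True
    # Policy 2: keep every slow trace (the longest span approximates end-to-end latency).
    if max(span.duration_ms for span in spans) >= latency_threshold_ms:
        return True
    # Policy 3: keep a fixed share of healthy traces. Hashing the trace ID makes the
    # decision reproducible for a given trace instead of depending on a random draw.
    bucket = int(hashlib.sha256(spans[0].trace_id.encode("ascii")).hexdigest(), 16) % 10_000
    return bucket < sample_percent * 100
```

The price of deciding at the tail is that every span of a trace must be held in memory until the trace is considered complete. Head-based sampling avoids that cost but cannot know in advance which requests will fail.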
Hands-on

Workshop: Diagnosing a production problem

  • A provided microservice application with deliberately built-in production problems
  • Reading and interpreting traces: waterfall view, critical path, span gaps
  • Locating an N+1 query, a slow external call, and a race condition using traces alone
  • Setting up an alert rule on trace-based metrics: error rate and latency SLOs
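In a trace waterfall, an N+1 query shows up as many sibling database spans under one parent, all running the same statement with different literals. The sketch below detects that pattern from span data. It assumes each database span carries its SQL in a `db.statement` attribute, an older OpenTelemetry semantic-convention name; newer convention versions use `db.query.text`. The `Span` type and the threshold of 5 repeats are hypothetical choices for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


def query_shape(statement: str) -> str:
    """Replace quoted and numeric literals so 'id = 1' and 'id = 2' have the same shape."""
    statement = re.sub(r"'[^']*'", "?", statement)
    statement = re.sub(r"\b\d+\b", "?", statement)
    return " ".join(statement.split())


def find_n_plus_one(spans: List[Span], min_repeats: int = 5) -> List[Tuple[Optional[str], str, int]]:
    """Return (parent_span_id, query_shape, count) for query shapes repeated under one parent."""
    counts = Counter(
        (span.parent_id, query_shape(span.attributes["db.statement"]))
        for span in spans
        if "db.statement" in span.attributes
    )
    return [(parent, shape, n) for (parent, shape), n in counts.items() if n >= min_repeats]


# One request loads an order list, then fetches items once per order: the classic N+1.
trace = [Span("root", None, "GET /orders"),
         Span("q0", "root", "SELECT orders",
              {"db.statement": "SELECT * FROM orders WHERE customer_id = 7"})]
for i in range(6):
    trace.append(Span(f"q{i + 1}", "root", "SELECT items",
                      {"db.statement": f"SELECT * FROM items WHERE order_id = {i}"}))
print(find_n_plus_one(trace))
# [('root', 'SELECT * FROM items WHERE order_id = ?', 6)]
```

Grouping by parent span rather than by trace keeps legitimate repetition, such as the same query issued by two different endpoints, from being flagged as one N+1 pattern.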
Learning Outcomes
What your team walks away with

Building observability sounds like an infrastructure concern — in practice, it changes how teams think about production systems. After this day:

For teams who want a 2-day version: a second day can add Prometheus, Grafana, and log aggregation with Loki, for a complete observability stack training in two focused days.

Book the observability training

Whether you need a focused OpenTelemetry introduction or a full observability stack workshop, we start with a brief conversation about your setup and then send you a concrete proposal.
