Training Agenda

Big Data Engineering

Big Data Engineering is the discipline of designing, building, and operating data infrastructure at scale: the pipelines, storage, processing, and serving layers that handle the volume, velocity, and variety of modern data. This training takes a technology-agnostic, architectural approach. It covers the lakehouse pattern, data pipeline design, orchestration, data quality, and the batch-versus-streaming trade-offs that determine which technology fits which problem.

2 days | On-site, remote, or hybrid | Up to 20 participants | German or English

What We Cover
Architecture and practice for data platforms at scale

Day 1

Data Architecture & Pipeline Design

  • Big data architecture patterns: Lambda, Kappa, Lakehouse — comparing models
  • Medallion architecture: Bronze, Silver, Gold layers — data quality by zone
  • Batch vs streaming: choosing the right processing model for each use case
  • Data pipeline design: idempotency, late arrival handling, SLAs
  • File formats: Parquet, ORC, Avro — columnar vs row, compression, schema
  • Data catalog and metadata: Apache Atlas, AWS Glue, DataHub
  • Data quality: Great Expectations, dbt tests, schema validation
  • Data observability: lineage tracking, anomaly detection on data
  • Workflow orchestration: Apache Airflow — DAGs, sensors, operators
  • dbt: data transformation as code (models, tests, documentation)

Day 2

Cloud Data Platforms & Operations

  • Cloud data warehouses: Snowflake, BigQuery, Redshift — architecture and cost model
  • Delta Lake, Iceberg, Hudi: open table format comparison
  • Streaming ingestion patterns: Kafka to data lake, CDC pipelines
  • Data partitioning and clustering strategies for query performance
  • Incremental processing: change data capture, watermarks, snapshot isolation
  • Cost optimization: storage format selection, partition pruning, query result caching
  • Data governance: access control, column masking, row-level security
  • SLA monitoring: pipeline latency dashboards, data freshness alerts
  • DataOps: CI/CD for data pipelines — testing, versioning, deployment
  • Team topology: data engineering, data platform, analytics engineering roles

Learning Outcomes
What your team walks away with

Data engineers and architects who can design and evaluate data platform architectures and make intentional decisions about storage, processing, orchestration, and governance.

Book the Big Data Engineering training

Works as a standalone architecture and design training, or as a complement to hands-on Spark, Flink, or Kafka courses.
