Data Architecture & Pipeline Design
- Big data architecture patterns: Lambda, Kappa, Lakehouse — comparing models
- Medallion architecture: Bronze, Silver, Gold layers — data quality by zone
- Batch vs streaming: choosing the right processing model for each use case
- Data pipeline design: idempotency, late arrival handling, SLAs
- File formats: Parquet, ORC, Avro — columnar vs row, compression, schema
- Data catalog and metadata: Apache Atlas, AWS Glue, DataHub
- Data quality: Great Expectations, dbt tests, schema validation
- Data observability: lineage tracking, anomaly detection on data
- Workflow orchestration: Apache Airflow — DAGs, sensors, operators
- dbt: data transformation as code — models, tests, documentation
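To make the idempotency and late-arrival items above more concrete, here is a minimal sketch rather than a production design. The `Event` type, the `apply_batch` helper, and the ten-minute allowed lateness are hypothetical names and values chosen for illustration. The idea is that writes are keyed overwrites, so replaying a batch leaves the target unchanged, and events older than the watermark minus the allowed lateness are set aside instead of being merged.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    event_id: str          # business key used for the upsert
    event_time: datetime   # when the event happened, not when it arrived
    value: int


def apply_batch(table, batch, watermark, allowed_lateness):
    """Upsert events by key and return the events that arrived too late."""
    cutoff = watermark - allowed_lateness
    too_late = []
    for ev in batch:
        if ev.event_time < cutoff:
            # Assumption: late events go to a side output for separate handling.
            too_late.append(ev)
            continue
        # Keyed overwrite: writing the same event twice has no extra effect.
        table[ev.event_id] = ev
    return too_late


table = {}
watermark = datetime(2024, 1, 1, 12, 0)
batch = [
    Event("a", datetime(2024, 1, 1, 11, 58), 10),  # within allowed lateness
    Event("b", datetime(2024, 1, 1, 11, 30), 20),  # older than the cutoff
]

late_first = apply_batch(table, batch, watermark, timedelta(minutes=10))
late_second = apply_batch(table, batch, watermark, timedelta(minutes=10))  # replay
```

Running the same batch twice leaves `table` with a single entry for key `"a"`, which is the property a retry-safe pipeline relies on.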
Cloud Data Platforms & Operations
- Cloud data warehouses: Snowflake, BigQuery, Redshift — architecture and cost model
- Delta Lake, Iceberg, Hudi: open table format comparison
- Streaming ingestion patterns: Kafka to data lake, CDC pipelines
- Data partitioning and clustering strategies for query performance
- Incremental processing: change data capture, watermarks, snapshot isolation
- Cost optimization: storage format selection, partition pruning, query result caching
- Data governance: access control, column masking, row-level security
- SLA monitoring: pipeline latency dashboards, data freshness alerts
- DataOps: CI/CD for data pipelines — testing, versioning, deployment
- Team topology: data engineering, data platform, analytics engineering roles
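The partitioning and partition-pruning items above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not how any particular engine implements pruning: the `dt=YYYY-MM-DD` directory layout, the row counts, and the `prune` helper are all hypothetical. It shows why a filter on the partition column reduces the data a query has to scan.

```python
from datetime import date

# Hypothetical date-partitioned layout: path -> number of rows stored there.
partitions = {
    "events/dt=2024-01-01": 1200,
    "events/dt=2024-01-02": 900,
    "events/dt=2024-01-03": 1500,
    "events/dt=2024-01-04": 800,
}


def prune(partitions, start, end):
    """Keep only the partitions whose date falls inside [start, end]."""
    selected = {}
    for path, rows in partitions.items():
        # Parse the partition value out of the path instead of reading the data.
        partition_date = date.fromisoformat(path.split("dt=")[1])
        if start <= partition_date <= end:
            selected[path] = rows
    return selected


selected = prune(partitions, date(2024, 1, 2), date(2024, 1, 3))
rows_scanned = sum(selected.values())
rows_total = sum(partitions.values())
```

With this sample layout the filter touches two of the four partitions, so `rows_scanned` is 2400 out of a `rows_total` of 4400.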
The training produces data engineers and architects who can design and evaluate data platform architectures, making intentional decisions about storage, processing, orchestration, and governance. Participants learn to:
- Design a lakehouse architecture with appropriate bronze/silver/gold separation
- Choose batch vs streaming processing based on latency, cost, and complexity trade-offs
- Implement data quality checks and observability across a data pipeline
- Evaluate cloud data warehouse and open table format trade-offs for a given workload
Book the Big Data Engineering training
Works as a standalone architecture and design training, or as a complement to hands-on Spark, Flink, or Kafka courses.