Training Agenda

Apache Iceberg

Apache Iceberg is an open table format that brings database-grade capabilities to data lakes on top of object storage such as S3 or GCS: ACID transactions, schema evolution, partition evolution, time travel, and hidden partitioning. Because Iceberg decouples the storage format from the compute engine, the same tables can be queried from Spark, Flink, Trino, Dremio, and DuckDB concurrently. This training covers Iceberg's table format, catalog integration, and practical usage with Spark and Flink.

1 day | On-site, remote, or hybrid | Up to 20 participants | German or English
What We Cover
Open table format for reliable, queryable data lakes
Module 1

Iceberg Table Format & Operations

  • Iceberg architecture: the table format spec, covering metadata files, manifests, and data files
  • Catalog options: Hive Metastore, AWS Glue, Nessie, REST catalog
  • ACID transactions: optimistic concurrency, snapshot isolation
  • Schema evolution: safely adding, renaming, dropping, and reordering columns
  • Partition evolution: changing partition strategy without rewriting data
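  • Hidden partitioning: partition values are derived from column transforms, so queries filter on the source columns instead of separate partition columns
  • Time travel and rollback: TIMESTAMP AS OF and VERSION AS OF queries in Spark SQL, rollback to earlier snapshot IDs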
  • Maintenance operations: expire_snapshots, remove_orphan_files, rewrite_data_files
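
As a rough illustration of the snapshot model behind ACID commits, time travel, and rollback, the following Python sketch uses a toy in-memory table. The names `ToyTable`, `Snapshot`, `CommitConflict`, `commit`, `scan_as_of`, and `rollback_to` are hypothetical and are not part of any Iceberg API. Real Iceberg tracks snapshots in metadata files and manifests, not in Python lists.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Snapshot:
    """An immutable view of the table: the set of data files visible at commit time."""
    snapshot_id: int
    timestamp_ms: int
    data_files: Tuple[str, ...]


class CommitConflict(Exception):
    """Raised when a writer's base snapshot is no longer the current one."""


@dataclass
class ToyTable:
    """Hypothetical toy model of a snapshot-based table (not an Iceberg API)."""
    snapshots: List[Snapshot] = field(default_factory=list)
    current_id: Optional[int] = None

    def _by_id(self, snapshot_id: int) -> Snapshot:
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap
        raise KeyError(snapshot_id)

    def current(self) -> Optional[Snapshot]:
        return None if self.current_id is None else self._by_id(self.current_id)

    def commit(self, base_id: Optional[int], new_files: Sequence[str],
               timestamp_ms: int) -> Snapshot:
        # Optimistic concurrency: the commit succeeds only if nobody else
        # has moved the table forward since this writer read base_id.
        if base_id != self.current_id:
            raise CommitConflict(f"base {base_id} is stale, current is {self.current_id}")
        base_files = () if base_id is None else self._by_id(base_id).data_files
        snap = Snapshot(len(self.snapshots) + 1, timestamp_ms,
                        base_files + tuple(new_files))
        self.snapshots.append(snap)
        self.current_id = snap.snapshot_id
        return snap

    def scan_as_of(self, timestamp_ms: int) -> Tuple[str, ...]:
        # Time travel: use the latest snapshot committed at or before the timestamp.
        candidates = [s for s in self.snapshots if s.timestamp_ms <= timestamp_ms]
        if not candidates:
            raise ValueError("no snapshot at or before the requested timestamp")
        return max(candidates, key=lambda s: s.timestamp_ms).data_files

    def rollback_to(self, snapshot_id: int) -> None:
        # Rollback only moves the current pointer; no data files are rewritten.
        self._by_id(snapshot_id)  # raises KeyError for unknown snapshots
        self.current_id = snapshot_id


table = ToyTable()
first = table.commit(None, ["a.parquet"], timestamp_ms=1000)
second = table.commit(first.snapshot_id, ["b.parquet"], timestamp_ms=2000)
print(table.scan_as_of(1500))       # files visible before the second commit
table.rollback_to(first.snapshot_id)
print(table.current().data_files)   # table state after rollback
```

In this toy model a rollback is only a pointer change, which is why earlier snapshots remain readable until a maintenance step such as expire_snapshots removes them.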
Module 2

Iceberg with Spark, Flink & Query Engines

  • Iceberg with Spark: reading and writing Iceberg tables, merge-on-read vs copy-on-write
  • Iceberg with Flink: streaming writes, changelog mode
  • Trino and DuckDB: ad-hoc querying of Iceberg tables
  • Row-level operations: UPDATE, DELETE, MERGE INTO with Spark
  • Incremental reads: streaming reads with append and changelog mode
  • Iceberg vs Delta Lake vs Hudi: format comparison and migration paths
  • AWS integration: Glue catalog, S3 storage, Athena, EMR
  • Lakehouse patterns: medallion architecture with Iceberg tables
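
The difference between copy-on-write and merge-on-read for row-level deletes can be sketched with a toy model in plain Python. The function names below are hypothetical and do not correspond to any Iceberg or Spark API. The sketch assumes a single data file represented as a list of rows, and it keeps only row positions, whereas Iceberg's position delete files also record the data file path.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


def delete_copy_on_write(data_file: List[Row],
                         predicate: Callable[[Row], bool]) -> List[Row]:
    """Copy-on-write: rewrite the whole data file without the deleted rows."""
    return [row for row in data_file if not predicate(row)]


def delete_merge_on_read(data_file: List[Row],
                         predicate: Callable[[Row], bool]) -> Set[int]:
    """Merge-on-read: leave the data file untouched and record deleted positions."""
    return {pos for pos, row in enumerate(data_file) if predicate(row)}


def read_with_deletes(data_file: List[Row], deleted_positions: Set[int]) -> List[Row]:
    """Readers apply the recorded deletes while scanning the unchanged data file."""
    return [row for pos, row in enumerate(data_file) if pos not in deleted_positions]


orders = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "cancelled"},
    {"id": 3, "status": "open"},
]


def is_cancelled(row: Row) -> bool:
    return row["status"] == "cancelled"


rewritten = delete_copy_on_write(orders, is_cancelled)   # write cost paid up front
deletes = delete_merge_on_read(orders, is_cancelled)     # cheap write, small delete set
merged = read_with_deletes(orders, deletes)              # read cost paid at scan time
print(rewritten == merged)
```

Both strategies give readers the same rows. The trade-off is where the cost lands: copy-on-write makes writes more expensive, while merge-on-read shifts work to readers until the delete files are compacted.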
Learning Outcomes
What your team walks away with

Data engineers who can design and operate Iceberg-based data lakes with reliable writes, schema and partition evolution, and multi-engine query access.

Book the Apache Iceberg training

Best combined with Apache Spark or Apache Flink for a complete data lakehouse engineering day.
