Apache Iceberg Training — Raffael Hühnerschulte

What We Cover

Open table format for reliable, queryable data lakes

Module 1

Iceberg Table Format & Operations

Iceberg architecture: the table format spec (metadata files, manifest lists, manifests, data files)
Catalog options: Hive Metastore, AWS Glue, Nessie, REST catalog
ACID transactions: optimistic concurrency, snapshot isolation
Schema evolution: adding, renaming, dropping, reordering columns — safely
Partition evolution: changing partition strategy without rewriting data
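Hidden partitioning: partition values derived from column transforms, so queries filter on source columns and never reference partition columns
Time travel and rollback: TIMESTAMP AS OF and VERSION AS OF queries, snapshot IDs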
Maintenance operations: expire_snapshots, remove_orphan_files, rewrite_data_files
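
To make the snapshot-related topics above concrete, here is a minimal toy model in plain Python. It is not the Iceberg API: ToyTable, Snapshot, CommitConflict and all method names are hypothetical and exist only for illustration. It sketches three ideas from the list: every commit produces a new immutable snapshot, a commit is rejected if the table moved since the writer read it (optimistic concurrency), and time travel or rollback simply selects an older snapshot. A real writer would typically refresh and retry after a conflict rather than fail.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the table: which data files were live at commit time."""
    snapshot_id: int
    timestamp_ms: int
    data_files: tuple


class CommitConflict(Exception):
    """Raised when the table changed since the writer read its base snapshot."""


@dataclass
class ToyTable:
    # Hypothetical in-memory stand-in for table metadata: snapshot_id -> Snapshot
    snapshots: dict = field(default_factory=dict)
    current_id: Optional[int] = None

    def current_files(self):
        if self.current_id is None:
            return ()
        return self.snapshots[self.current_id].data_files

    def commit_append(self, base_id, new_files, timestamp_ms):
        # Optimistic concurrency: succeed only if nobody committed after base_id.
        if base_id != self.current_id:
            raise CommitConflict(
                f"writer based on {base_id}, table is at {self.current_id}"
            )
        new_id = len(self.snapshots) + 1
        files = self.current_files() + tuple(new_files)
        self.snapshots[new_id] = Snapshot(new_id, timestamp_ms, files)
        self.current_id = new_id
        return new_id

    def files_as_of(self, timestamp_ms):
        # Time travel: latest snapshot committed at or before the timestamp.
        candidates = [
            s for s in self.snapshots.values() if s.timestamp_ms <= timestamp_ms
        ]
        if not candidates:
            raise LookupError("no snapshot at or before that timestamp")
        return max(candidates, key=lambda s: s.timestamp_ms).data_files

    def rollback_to(self, snapshot_id):
        # Rollback moves the current pointer; older snapshots stay in history.
        if snapshot_id not in self.snapshots:
            raise KeyError(snapshot_id)
        self.current_id = snapshot_id


table = ToyTable()
s1 = table.commit_append(None, ["a.parquet"], timestamp_ms=1000)
s2 = table.commit_append(s1, ["b.parquet"], timestamp_ms=2000)

try:
    # This writer still believes the table is at s1, so the commit is rejected.
    table.commit_append(s1, ["c.parquet"], timestamp_ms=3000)
    conflict = False
except CommitConflict:
    conflict = True

files_at_1500 = table.files_as_of(1500)
table.rollback_to(s1)
files_after_rollback = table.current_files()
```

Because snapshots are never modified in place, rollback is only a pointer change, which is also why expiring old snapshots is a separate maintenance step.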

Module 2

Iceberg with Spark, Flink & Query Engines

Iceberg with Spark: reading and writing Iceberg tables, merge-on-read vs copy-on-write
Iceberg with Flink: streaming writes, changelog mode
Trino and DuckDB: ad-hoc querying of Iceberg tables
Row-level operations: UPDATE, DELETE, MERGE INTO with Spark
Incremental reads: streaming reads with append and changelog mode
Iceberg vs Delta Lake vs Hudi: format comparison and migration paths
AWS integration: Glue catalog, S3 storage, Athena, EMR
Lakehouse patterns: medallion architecture with Iceberg tables
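
The difference between copy-on-write and merge-on-read mentioned above can be sketched with a small toy model in plain Python. This is an illustration under simplifying assumptions, not engine code: data files are modelled as a dict of lists, and the function names cow_delete, mor_delete and mor_read are hypothetical. Copy-on-write rewrites every data file that contains an affected row, while merge-on-read leaves data files untouched and records the deleted row positions, which readers then apply at query time.

```python
def cow_delete(files, predicate):
    """Copy-on-write: rewrite each file that contains at least one matching row."""
    new_files = {}
    for name, rows in files.items():
        if any(predicate(r) for r in rows):
            # The whole file is replaced by a rewritten copy without the rows.
            new_files[name + ".rewritten"] = [r for r in rows if not predicate(r)]
        else:
            # Untouched files are carried over as they are.
            new_files[name] = rows
    return new_files


def mor_delete(files, predicate):
    """Merge-on-read: keep data files, record (file, position) of deleted rows."""
    deletes = set()
    for name, rows in files.items():
        for pos, r in enumerate(rows):
            if predicate(r):
                deletes.add((name, pos))
    return deletes


def mor_read(files, deletes):
    """Readers merge data files with the recorded deletes at query time."""
    return [
        r
        for name, rows in files.items()
        for pos, r in enumerate(rows)
        if (name, pos) not in deletes
    ]


files = {"f1": [1, 2, 3], "f2": [4, 5]}

cow_result = cow_delete(files, lambda r: r == 2)
mor_deletes = mor_delete(files, lambda r: r == 2)
mor_result = mor_read(files, mor_deletes)
```

The trade-off this shows: copy-on-write pays at write time and keeps reads simple, merge-on-read makes writes cheap and moves the cost to readers until the deletes are compacted away.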

Learning Outcomes

What your team walks away with

Data engineers who can design and operate Iceberg-based data lakes — with reliable writes, schema and partition evolution, and multi-engine query access.

Create and manage Iceberg tables with schema evolution and partition evolution
Perform time travel queries and roll back to previous snapshots
Write Iceberg tables from Spark and Flink with correct commit semantics
Set up catalog integration with AWS Glue or Nessie for multi-engine access

Book the Apache Iceberg training

Best combined with the Apache Spark or Apache Flink training for a complete data lakehouse engineering day.

Get in touch