Iceberg Table Format & Operations
- Iceberg architecture: the table format spec, covering metadata files, manifest lists, manifests, and data files
- Catalog options: Hive Metastore, AWS Glue, Nessie, REST catalog
- ACID transactions: optimistic concurrency, snapshot isolation
- Schema evolution: adding, renaming, dropping, reordering columns — safely
- Partition evolution: changing partition strategy without rewriting data
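- Hidden partitioning: queries filter on source columns and Iceberg derives partition values through transforms, so no separate partition columns are needed
- Time travel and rollback: TIMESTAMP AS OF and VERSION AS OF queries, snapshot IDs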
- Maintenance operations: expire_snapshots, remove_orphan_files, rewrite_data_files
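To make the snapshot-related topics above more concrete (optimistic concurrency, time travel, rollback), here is a minimal Python sketch of a toy table. It is not the Iceberg API: `ToyTable`, `Snapshot`, `CommitConflict`, `commit_append`, `snapshot_as_of`, and `rollback_to` are hypothetical names invented for illustration, and the real format tracks files through manifests rather than storing a full file list per snapshot.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the table: the files visible at one point in time."""
    snapshot_id: int
    timestamp_ms: int
    data_files: Tuple[str, ...]


class CommitConflict(Exception):
    """The table moved on since the writer read its base snapshot."""


@dataclass
class ToyTable:
    # Hypothetical in-memory stand-in for table metadata (assumption, not Iceberg code).
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    current_id: Optional[int] = None

    def current_files(self) -> Tuple[str, ...]:
        if self.current_id is None:
            return ()
        return self.snapshots[self.current_id].data_files

    def commit_append(self, base_id: Optional[int],
                      new_files: Iterable[str], timestamp_ms: int) -> int:
        # Optimistic concurrency: the commit succeeds only if the table is
        # still at the snapshot the writer started from (compare-and-swap).
        if base_id != self.current_id:
            raise CommitConflict(
                f"expected base {base_id}, table is at {self.current_id}")
        new_id = len(self.snapshots) + 1
        files = self.current_files() + tuple(new_files)
        self.snapshots[new_id] = Snapshot(new_id, timestamp_ms, files)
        self.current_id = new_id
        return new_id

    def snapshot_as_of(self, timestamp_ms: int) -> Optional[Snapshot]:
        # Time travel: latest snapshot committed at or before the timestamp.
        candidates = [s for s in self.snapshots.values()
                      if s.timestamp_ms <= timestamp_ms]
        return max(candidates, key=lambda s: s.timestamp_ms, default=None)

    def rollback_to(self, snapshot_id: int) -> None:
        # Rollback only moves the current pointer; no data is rewritten.
        if snapshot_id not in self.snapshots:
            raise KeyError(snapshot_id)
        self.current_id = snapshot_id
```

In this sketch, a writer that hits `CommitConflict` would re-read the current snapshot and retry its commit. Because rollback only moves the pointer, older snapshots stay readable until a maintenance step such as snapshot expiry removes them.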
Iceberg with Spark, Flink & Query Engines
- Iceberg with Spark: reading and writing Iceberg tables, merge-on-read vs copy-on-write
- Iceberg with Flink: streaming writes, changelog mode
- Trino and DuckDB: ad-hoc querying of Iceberg tables
- Row-level operations: UPDATE, DELETE, MERGE INTO with Spark
- Incremental reads: streaming reads with append and changelog mode
- Iceberg vs Delta Lake vs Hudi: format comparison and migration paths
- AWS integration: Glue catalog, S3 storage, Athena, EMR
- Lakehouse patterns: medallion architecture with Iceberg tables
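The difference between copy-on-write and merge-on-read listed above can be sketched with a toy model in Python. This is an illustration under simplifying assumptions, not engine code: `delete_copy_on_write`, `delete_merge_on_read`, and `scan` are hypothetical helpers, data files are plain dictionaries, and deletes are recorded by row id, whereas real engines may record them by file position instead.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[int, str]             # (id, value)
DataFiles = Dict[str, List[Row]]  # file name -> rows stored in that file


def delete_copy_on_write(files: DataFiles, ids_to_delete: Set[int]) -> DataFiles:
    """Rewrite each data file that holds a deleted row; keep other files as-is."""
    result: DataFiles = {}
    for name, rows in files.items():
        if any(row[0] in ids_to_delete for row in rows):
            kept = [row for row in rows if row[0] not in ids_to_delete]
            if kept:  # a file left with no rows is simply dropped
                result[name + ".rewritten"] = kept
        else:
            result[name] = rows
    return result


def delete_merge_on_read(
    files: DataFiles, delete_files: List[Set[int]], ids_to_delete: Set[int]
) -> Tuple[DataFiles, List[Set[int]]]:
    """Leave data files untouched; record the delete in a new delete file."""
    return files, delete_files + [set(ids_to_delete)]


def scan(files: DataFiles, delete_files: List[Set[int]]) -> List[Row]:
    """Read all rows, applying any recorded deletes at read time."""
    deleted: Set[int] = set().union(*delete_files)
    return sorted(
        row for rows in files.values() for row in rows if row[0] not in deleted
    )
```

Both paths return the same rows on a scan. Copy-on-write pays the cost at write time by rewriting files, while merge-on-read keeps writes cheap and shifts the work to readers until compaction folds the delete files back in.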
The result: data engineers who can design and operate Iceberg-based data lakes with reliable writes, schema and partition evolution, and multi-engine query access.
- Create and manage Iceberg tables with schema evolution and partition evolution
- Perform time travel queries and roll back to previous snapshots
- Write Iceberg tables from Spark and Flink with correct commit semantics
- Set up catalog integration with AWS Glue or Nessie for multi-engine access
Book the Apache Iceberg training
Best combined with Apache Spark or Apache Flink for a complete data lakehouse engineering day.