Flink Architecture & DataStream API
- Flink architecture: JobManager, TaskManagers, parallelism, task slots
- Flink execution model: dataflow graphs, operators, task chaining
- DataStream API: map, filter, flatMap, keyBy, window
- Event time vs processing time vs ingestion time
- Watermarks: generating watermarks, handling late elements
- Windowing: tumbling, sliding, session windows
- Stateful processing: keyed state (ValueState, ListState, MapState) and broadcast state
- Checkpointing: RocksDB state backend, checkpoint interval, exactly-once guarantees
- Kafka source and sink: the FLIP-27 KafkaSource, KafkaSink, and migrating from the legacy FlinkKafkaConsumer/FlinkKafkaProducer
- Side outputs: routing late data and error records
Table API, SQL & Production Deployment
- Flink SQL and Table API: CREATE TABLE, SELECT, JOIN, aggregations
- Temporal joins: joining streams with slowly-changing dimension tables
- CDC ingestion: Debezium-based Flink CDC connectors for MySQL and PostgreSQL
- Iceberg sink: writing to Iceberg tables from Flink
- Savepoints: stateful upgrades, rescaling, migration
- Flink on Kubernetes: Flink Kubernetes Operator, application mode, session mode
- Backpressure analysis: Flink Web UI metrics, identifying bottlenecks
- Metrics and monitoring: Prometheus reporter, Grafana dashboards for Flink
- End-to-end exactly-once: Kafka transactions committed via two-phase commit on Flink checkpoints
- Flink vs Spark Structured Streaming: when to choose each
For data and platform engineers who need to build, tune, and operate stateful Flink streaming pipelines, from the first source to exactly-once sinks in production.
- Build stateful Flink DataStream pipelines with event-time processing and watermarks
- Write Flink SQL for streaming joins, aggregations, and CDC ingestion
- Configure checkpointing with the RocksDB state backend for fault-tolerant, exactly-once processing
- Deploy Flink on Kubernetes with the Flink Kubernetes Operator and monitor it with Prometheus and Grafana
Book the Apache Flink training
Available as a standalone 2-day course, or combined with our Apache Kafka and Apache Iceberg courses for a complete streaming data platform week.
Get in touch