Vorarlberg University of Applied Sciences

Information on individual educational components (ECTS-Course descriptions) per semester

Stream Processing Systems

Degree programme	Computer Science
Subject area	Engineering Technology
Type of degree	Master, Full-time, Summer Semester 2024
Course unit title	Stream Processing Systems
Course unit code	024913020504
Language of instruction	English
Type of course unit (compulsory, optional)	Elective
Teaching hours per week	2
Year of study	2024
Level of the course / module according to the curriculum
Number of ECTS credits allocated	3
Name of lecturer(s)	Peter REITER

Requirements and Prerequisites

Sound knowledge of at least one object-oriented programming language (preferably Java).
Basic knowledge of algorithms and data structures.

Course content

In addition to databases and batch processing systems, stream processing systems form another class of tools used to store and process data. They are used in particular for processing unbounded and/or out-of-order data and/or when low latency is required. Knowledge of the associated concepts, models and algorithms is the basis for using the processing tools (e.g. Apache Spark, Flink, Apex and Kafka) efficiently.

Introduction to general concepts of stream processing: differences and similarities between services (online systems), batch processing systems (offline systems) and stream processing systems (near real-time systems); event streams; messaging systems; unbounded and/or out-of-order data; latency
The concept of “time”: event time vs. processing time, stateful operations, handling of “late data” and watermarking, windowed operations based on event time
Relationships between state, streams and immutability
Fault tolerance: micro-batching and checkpointing, idempotency, recovery
Introduction to Apache Spark and Apache Kafka as stream processing platforms
Typical use cases for stream processing systems: complex event processing, stream analytics
Architectures, programming models, challenges in processing data streams: parallel processing (e.g. load balancing, acknowledgment and redelivery), partitioned logs
Algorithms and patterns for efficient processing of data streams: stream joins (stream-table join, table-table join, time dependency of joins), probabilistic data structures
Best practices when using state-of-the-art stream processing platforms

Learning outcomes

Students can:

name essential concepts which form the basis for stream processing systems and their tasks.
explain the similarities / differences between stream processing and batch processing systems as well as the relationships between the two paradigms.
recognize typical use cases for stream processing systems and implement them with the help of state-of-the-art frameworks (e.g. Spark Structured Streaming, Apache Kafka).
explain the concept of "time" in the context of stream processing and take into account the associated challenges when implementing algorithms.
name fundamental patterns for efficient processing of data streams and use them in a targeted manner.
compare different frameworks for stream processing based on their architecture or properties in practical use.
implement simple practical tasks with the help of state-of-the-art frameworks.

Planned learning activities and teaching methods

Lecture plus in-class exercises (supplemented with flipped classroom elements)
Guided processing of relevant literature
Processing (analysis, design, implementation, testing, documenting and presentation) of case studies in small groups
Plenary discussions for giving and receiving effective feedback

Assessment methods and criteria

Evaluation of the exercises (written report / documentation, quality of the solution achieved, ...)
Evaluation of the presentation of the exercises
Evaluation of participation in the discourse (e.g. peer feedback)

Comment

None

Internal
