Data Stream Processing and Analytics - Spring 2019


Course Information

Code: 263-3826-00, 6 ECTS credits
Language of instruction: English

Lecturer: Vasiliki Kalavri (kalavriv@inf.ethz.ch) CAB E 73.1
Teaching Assistants: Zaheer Chothia (zchothia@inf.ethz.ch) and Michal Wawrzoniak (michal.wawrzoniak@inf.ethz.ch).

Lectures: Mondays, 10-12, CHN E42
Exercises: Mondays, 13-15, CHN F46
Submissions: Moodle

What is this course about? Lecture 0 is now available.

This course is generously supported by a Google Cloud Platform Education Grant.


Overview

Modern data-driven applications require continuous, low-latency processing of large-scale, rapidly arriving data events such as videos, images, emails, chats, clicks, search queries, financial transactions, traffic records, and sensor measurements. Extracting knowledge from these data streams is particularly challenging due to their high speed and massive volume.

Distributed stream processing has recently become highly popular across industry and academia because it can both improve established data processing tasks and facilitate novel applications with real-time requirements. In this course, we will study the design and architecture of modern distributed streaming systems as well as fundamental algorithms for analyzing data streams.

Specifically, we will cover the following topics:

  • Distributed streaming systems design and architectures
  • Fault-tolerance and processing guarantees
  • State management
  • Windowing semantics and optimizations
  • Basic data stream mining algorithms (e.g. sampling, counting, filtering)
  • Query languages and libraries for stream processing (e.g. Complex Event Processing, online machine learning)
  • Streaming applications and use cases
  • Modern streaming systems: Apache Flink, Apache Beam, Apache Kafka, Timely dataflow

Recommended readings


Exercise Sessions

The exercise sessions will be a mixture of (1) reviews, discussions, and evaluation of research papers on data stream processing, and (2) programming assignments on implementing data stream mining algorithms and analysis tasks.


Examination

The course consists of lectures, exercises, and a final semester project. There will be no formal examination at the end of the course. Students are graded continuously based on their participation in class (10%), weekly assignments (50%), and the semester project (40%).