Data Processing on Modern Hardware - Fall 2012

Course Description

The roots of many production database systems in use today date back thirty years or more. Prominent systems such as IBM's System R or the then-research prototype Ingres were first developed in the 1970s and were designed for the hardware landscape of the time: disks or even tapes were the only media that could hold reasonable amounts of data; main memory could be considered truly random-access; and the major cost factor in database processing was I/O.

Since that time, computer architectures have changed significantly. RAM chips have become cheap enough to make in-memory processing feasible; caches and other architectural details lead to non-uniform memory access cost (an increasingly relevant performance factor); and the omnipresence of multi-core systems adds a whole new class of complexity to the problem.

In this course we look at how architectural changes affect database systems. For instance, rather than suffering from the growing latency gap between the CPU and main memory, we can use the available CPU caches to our advantage. A cache-aware design can improve the performance of a database operation by orders of magnitude. Likewise, modern CPU features (such as vector instructions) or specialized processors (like IBM's Cell processor or the nVidia CUDA architecture) can accelerate database tasks if the respective implementation has been designed carefully.

We will complement the lectures with many hands-on exercises in which we verify some of the presented techniques on commodity hardware.

Lecture Slides

Lecture slides will be made available here as the semester progresses.

Description   PDF   last updated
Introduction   PDF   Sep 20, 2012
Cache Awareness   PDF   Oct 4, 2012
Instruction Execution   PDF   Oct 10, 2012
Vectorization   PDF   Oct 25, 2012
Execution on Multiple Cores   PDF   Nov 1, 2012
Graphics Processors (GPUs)   PDF   Nov 29, 2012
FPGAs for Data Processing   PDF   Dec 13, 2012

Exercises

Exercise sheets will be made available here as the semester progresses.

Description   PDF   last updated   Resources   Solutions
Assignment 1: Probing your Cache   assignment01.pdf   Sep 19, 2012   none   src-solution-01.tar.bz2
Assignment 2: Cache Awareness in Databases   assignment02.pdf   Sep 26, 2012   src-handout-02_tar.bz2   src-solution-02.tar.bz2
Assignment 3: Partitioned Hash Join   assignment03.pdf   Oct 11, 2012   src-handout-03.tar.bz2   src-solution-03.tar.bz2
Assignment 4: Data Level Parallelism   assignment04.pdf   Oct 25, 2012   src-handout-04.tar.bz2   src-solution-04_tar.bz2
Assignment 5: Parallel Partitioned Hash Join   assignment05.pdf   Nov 8, 2012   src-handout-05.tar.bz2   src-solution-05_tar.bz2
Assignment 6: Query Execution on GPGPUs   assignment06.pdf   Nov 29, 2012   src-handout-06.tar.bz2   src-solution-06_tar.bz

Slides Exercise Class

Slides shown in the exercise groups will be made available here.

Description   PDF   last updated
Exercise Session 1   ex01.pdf   Sep 19, 2012
Exercise Session 2   ex02.pdf   Sep 26, 2012
Exercise Session 3   ex03.pdf   Oct 3, 2012
Exercise Session 4   ex04.pdf   Oct 11, 2012
Exercise Session 5   ex05.pdf   Oct 18, 2012
Exercise Session 6   ex06.pdf   Oct 25, 2012
Exercise Session 7   ex07.pdf   Nov 1, 2012
Exercise Session 8   ex08.pdf   Nov 8, 2012
Exercise Session 9   ex09.pdf   Nov 15, 2012
Exercise Session 10   ex10.pdf   Nov 29, 2012
Exercise Session 11   ex11.pdf   Dec 6, 2012
Exercise Session 12   ex12.pdf   Dec 13, 2012

Material

This course is mostly based on very recent research work that is not (yet) covered in textbooks. Throughout the course we will give references to research papers (mostly published at major database conferences like VLDB or SIGMOD) for background reading.

The lecture slides for this course, which we will post on this website as the semester progresses, are another important source.

Course Hours

Lecture
Thu, 13–15h, room CAB G 59; Instructor: Jens Teubner
Exercises
Thu, 15–16h, room CAB G 59; Instructor: Cagri Balkesen

Course Evaluation

Thanks for participating in the electronic evaluation of this course. You can find the results here.

Additional Information

This course will be taught in English and is listed as course number 263-3502-00 in the ETH course catalog. You will receive 4 credit points for this 2V + 1Ü course (two hours of lectures and one hour of exercises per week).