Data Processing on Modern Hardware
Course Description
The roots of many production database systems in use today date back thirty years or more. Prominent systems such as IBM's System R or Ingres (then a research prototype) were first developed in the 1970s and were designed for the hardware landscape of the time: disks or even tapes were the only media that could hold reasonable amounts of data; main memory could be considered truly random-access; and the dominant cost factor in database processing was I/O.
Since that time, computer architectures have changed significantly. RAM chips have become cheap enough to make in-memory processing feasible; caches and other architectural details lead to non-uniform memory access cost (an increasingly relevant performance factor); and the omnipresence of multi-core systems adds a whole new class of complexity to the problem.
In this course we look at how architectural changes affect database systems. Rather than suffering from the growing latency gap for main-memory accesses, for instance, we can use the available CPU caches to our advantage: a cache-aware design can improve the performance of a database operation by orders of magnitude. Likewise, modern CPU features (such as vector instructions) or specialized processors (like IBM's Cell processor or the nVidia CUDA architecture) can accelerate database tasks if the respective implementation has been designed carefully.
We will complement this course with many hands-on exercises, where we verify some of the presented techniques on commodity hardware.
Lecture Slides
| Description | Last updated |
|---|---|
| Introduction | Sep 25, 2011 |
| Cache Awareness | Oct 3, 2011 |
| Instruction Execution | Oct 17, 2011 |
| Vectorization | Oct 31, 2011 |
| Multiple Cores | Nov 13, 2011 |
| GPUs | Nov 28, 2011 |
| FPGAs | Dec 12, 2011 |
Exercises
| Description | Last updated | Resources | Solutions |
|---|---|---|---|
| Assignment 1: Probing your Cache | Oct 10, 2011 | none | src-solution-01.tar.bz2 |
| Assignment 2: Cache Awareness in Databases | Oct 24, 2011 | src-handout-02.tar.bz2 | src-solution-02.tar.bz2 |
| Assignment 3: Partitioned Hash Join | Nov 07, 2011 | src-handout-03.tar.bz2 | src-solution-03.tar.bz2 |
| Assignment 4: Vectorized (de)compression | Oct 29, 2011 | src-handout-04.tar.bz2 | src-solution-04.tar.bz2 |
| Assignment 5: Parallel Partitioned Hash Join | Dec 05, 2011 | src-handout-05.tar.bz2 | src-solution-05.tar.bz2 |
| Assignment 6: Query Execution on GPGPUs | Dec 13, 2011 | src-handout-06.tar.bz2 | src-solution-06.tar.bz2 |
Exercise Class Slides
| Description | PDF 2x4 | Last updated |
|---|---|---|
| Exercise Session 1 | PDF 2x4 | Sep 26, 2011 |
| Exercise Session 2 | PDF 2x4 | Oct 03, 2011 |
| Exercise Session 3 | PDF 2x4 | Oct 10, 2011 |
| Exercise Session 4 | PDF 2x4 | Oct 17, 2011 |
| Exercise Session 5 | PDF 2x4 | Oct 24, 2011 |
| Exercise Session 6 | PDF 2x4 | Oct 31, 2011 |
| Exercise Session 7 | PDF 2x4 | Nov 07, 2011 |
| Exercise Session 8 | PDF 2x4 | Nov 18, 2011 |
| Exercise Session 9 | PDF 2x4 | Nov 21, 2011 |
| Exercise Session 10 | PDF 2x4 | Nov 28, 2011 |
| Exercise Session 11a | PDF 2x4 | Dec 05, 2011 |
| Exercise Session 11b | PDF 2x4 | Dec 05, 2011 |
| Exercise Session 12 | PDF 2x4 | Dec 12, 2011 |
Material
This course is mostly based on very recent research work that is not (yet) covered in textbooks. Throughout the course we will give references to research papers (mostly published at major database conferences like VLDB or SIGMOD) for background reading.
The lecture slides for this course, which we will post on this website as the semester progresses, will be another important source.
- PDFs that I showed in class on Oct 3, 2011: Figure 7, Figure 37. These files are © Elsevier Science.
- Slides that I showed in class to illustrate the "handshake join" operator.
- Course Evaluation Results along with your hand-written comments.
Course Hours
- Lecture
  - Mon, 9–11h, room CAB G 59 (instructor: Jens Teubner)
- Exercises
  - Mon, 11–12h, room CAB G 59 (instructor: Louis Woods)
Additional Information
This course will be taught in English and is listed as course number 263-3502-00 in the ETH course catalog. You will get 4 credit points for this 2V + 1Ü course (two hours of lectures and one hour of exercises per week).



