Data Processing on Modern Hardware

Course Description

The roots of many of today's production database systems date back thirty years or more. Prominent systems such as IBM's System R or the then-research prototype Ingres were first developed in the 1970s and were designed to address the hardware landscape of the time: disks or even tapes were the only medium to hold reasonable amounts of data; main memory could be considered truly random access; and the major cost factor in database processing was I/O.

Since that time, computer architectures have changed significantly. RAM chips have become cheap enough to make in-memory processing feasible; caches and other architectural details lead to non-uniform memory access cost (an increasingly relevant performance factor); and the omnipresence of multi-core systems adds a whole new class of complexity to the problem.

In this course we look at how architectural changes affect database systems. Rather than suffering from the increasing latency gap for accesses to main memory, for instance, we can use available CPU caches to our advantage. A cache-aware design can improve the performance of a database operation by orders of magnitude. Likewise, modern CPU features (such as vector instructions) or specialized processors (like IBM's Cell processor or the nVidia CUDA architecture) can accelerate database tasks if the respective implementation has been designed carefully.
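
As one illustration of what a cache-aware design can look like, consider a partitioned hash join (the topic of Homework 3). The idea is to split both inputs by the low bits of the join key's hash first, so that each per-partition hash table stays small enough to fit in a CPU cache. The Python sketch below shows only the control flow; it does not reproduce the cache effect itself, which would need a low-level language and real measurements. The function names and the `bits` parameter are illustrative assumptions and are not taken from the course material.

```python
from collections import defaultdict


def partition(rows, key_index, bits):
    """Split rows into 2**bits partitions using the low bits of the key's hash."""
    mask = (1 << bits) - 1
    parts = [[] for _ in range(1 << bits)]
    for row in rows:
        parts[hash(row[key_index]) & mask].append(row)
    return parts


def partitioned_hash_join(left, right, left_key=0, right_key=0, bits=4):
    """Equi-join two lists of tuples, one pair of partitions at a time.

    Hypothetical helper for illustration: rows with equal keys always land
    in partitions with the same index, so only matching partitions are joined.
    """
    result = []
    left_parts = partition(left, left_key, bits)
    right_parts = partition(right, right_key, bits)
    for l_part, r_part in zip(left_parts, right_parts):
        # Build phase: hash table over a single (small) left partition.
        table = defaultdict(list)
        for l_row in l_part:
            table[l_row[left_key]].append(l_row)
        # Probe phase: look up each row of the matching right partition.
        for r_row in r_part:
            for l_row in table.get(r_row[right_key], ()):
                result.append(l_row + r_row)
    return result
```

In a real system the number of partitions would be chosen so that one partition's hash table fits into the targeted cache level; here `bits` is simply a parameter of the sketch.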

We will complement this course with many hands-on exercises, where we verify some of the presented techniques on commodity hardware.

Lecture Slides

Description PDF PDF 2x4 last updated
Introduction PDF PDF 2x4 21.09.2009
Cache Awareness PDF PDF 2x4 12.10.2009
Instruction Execution PDF PDF 2x4 19.10.2009
Execution on Multiple Cores PDF PDF 2x4 16.11.2009
Data Processing on GPUs and GPGPUs PDF PDF 2x4 09.11.2009
CUDA and OpenCL PDF PDF 2x4 15.11.2009
FPGAs for Data Processing PDF PDF 2x4 30.11.2009
High-Speed Networks for Distributed Data Processing PDF PDF 2x4 07.12.2009

Exercises

Description PDF last updated Resources
Homework 1: Cache PDF 21.09.2009
Homework 2: Row vs. Column Store PDF 05.10.2009 store.tar.bz2
Homework 3: Partitioned Hash Join PDF 26.10.2009 phj.tar.gz
Homework 4: Query Execution in CUDA PDF 15.11.2009 cuda-store.tar.bz2
Homework 5: Regular Expressions in Hardware PDF 22.11.2009 regex-testbench.tar.bz2

Slides Exercise Class

Description PDF PDF 2x4 last updated
Exercise 1 PDF PDF 2x4 21.09.2009
Exercise 2 no slides
Exercise 3 PDF PDF 2x4 11.10.2009
Exercise 4 PDF PDF 2x4 19.10.2009
Exercise 5 no slides
Exercise 6 PDF PDF 2x4 02.11.2009
Exercise 7 PDF PDF 2x4 23.11.2009
Exercise 9 PDF PDF 2x4 30.11.2009
Exercise 10 no slides

Material

This course is mostly based on very recent research work that is not (yet) covered in textbooks. Throughout the course we will give references to research papers (mostly published at major database conferences like VLDB or SIGMOD) for background reading.

Another important source will be the lecture slides for this course, which we will post on this website as the semester progresses.

Source code for GPU examples discussed in class.

Georgios Giannikis presented his work on Crescando in the lecture. His presentation slides are available in PDF and PPTX format.

Course Evaluation

This course was evaluated on November 23, 2009. Here are the aggregated evaluation results:

Course Hours

Lecture
Mon, 9−11h, room IFW B 42 ― Instructor: Jens Teubner
Exercises
Mon, 11−12h, room IFW B 42 ― Instructors: René Müller, Ionut Subasu, Louis Woods

Additional Information

This course will be taught in English and is listed as course number 263-3502-00 in the ETH course catalog. You'll get 4 credit points for this 2V + 1Ü course.