Big Data

Overview

This course gives an overview of Big Data technologies. It revisits the classic ways of answering "tough questions" using data warehouses and parallel database systems. It then presents the modern technologies of cloud computing and massively parallel data analysis that have led to the Big Data hype. The course also studies a number of other technologies that have been associated with Big Data (e.g., push-based, real-time data processing and unstructured data models). It includes a guest lecture that presents Big Data use cases from personalized medicine.

 

Content

  1. What is Big Data
  2. Data Warehouses (Star Schemas, Column Stores, Parallel Databases)
  3. MapReduce & the Hadoop Ecosystem
  4. Cloud Computing
  5. Semi-structured Data
  6. Streaming Data
  7. Other Topics (e.g., Visualizations, Data Cleaning, Statistics with R)
  8. Applications

 

Announcements

 

Project

Milestone 1: description (due on September 26th 10am)

Milestone 2: description (due on October 30)

Milestone 3: description (due on November 27)

Milestone 4: description (due on December 17)

Exercises

Exercise 1: exercise1.pdf  Short example on summarizability exercise

Exercise 2: exercise2.pdf exercise-02-logistics.tar.gz Hadoop Tutorial sample data

Exercise 3: exercise3.pdf

Exercise 4: exercise4.pdf

Exercise 5: exercise5.pdf solution5.pdf

Exemplary Exam Questions

 

Slides

Week Slides
1. Sept. 18/19 a. Introduction; b. What is Big Data
2. Sept 25/2 Data Warehouses I; updated version of Data Warehouses I (Oct 3)

3. Oct 9/10 MapReduce and Hadoop Part 1
4. Oct 16/17 MapReduce and Hadoop Part 2
5. Oct 23/24 Cloud Computing
6. Oct 31 Guest Lecture: Big Data in Health Care – Genome meets iPhone, Ernst Hafen
7. Nov 13/14 Data Stream Processing Part 1
8. Nov 19/20 Data Stream Processing Part 2
9. Nov 27/28 XML
10. Dec 19 SPARQL

 

Course Hours

Lecture:
Tuesdays 10:00 to 12:00, CAB G 51
Wednesdays 13:00 to 14:00, ML F 36

Exercise:
Wednesdays 14:00 to 15:00, ML F 36
Fridays 14:00 to 15:00, CHN D 46

 

 

Lecturers

Donald Kossmann, Nesime Tatbul

 

Assistants

Martin Kaufmann, Besmira Nushi