
Job scheduling for data-intensive cluster computing

Motivation:

Data-intensive cluster computing is becoming increasingly relevant to a wide range of applications, such as data mining, network traffic analysis, and stream processing. MapReduce, Hadoop, Dryad, and other grid computing platforms are used today to support such applications. This project investigates scheduling techniques for data-intensive tasks that can be used in highly dynamic and heterogeneous overlay networks.

Goals:

Get familiar with the Hadoop platform and identify its performance bottlenecks and trade-offs in a wide-area environment such as PlanetLab. Design and implement new scheduling techniques that allow for greedy, fair, and location-based scheduling of computing tasks within the Rhizoma platform. Jobs can expose constraints, such as data locality, as well as specific performance requirements. Identify the new sensor and resource management components, beyond those already provided by Rhizoma, that are needed to support the proposed scheduling techniques.
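To illustrate what a greedy, location-based placement that honors a data-locality constraint might look like, here is a minimal Python sketch. It is not Hadoop's or Rhizoma's scheduler. The `Node`, `Task`, and `schedule_greedy_locality` names are hypothetical; each task is assumed to read a single input block, and each node is assumed to advertise a fixed number of free slots and the set of blocks it stores locally.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    # Hypothetical node descriptor: name, free task slots, locally stored blocks.
    name: str
    free_slots: int
    local_blocks: Set[str] = field(default_factory=set)


@dataclass
class Task:
    # Hypothetical task descriptor: each task is assumed to read one input block.
    task_id: str
    input_block: str


def schedule_greedy_locality(tasks: List[Task], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """Greedily assign tasks in order, preferring nodes that hold the task's input.

    Returns a mapping task_id -> node name, or None when no slot is free.
    Node free_slots counts are decremented in place as slots are used.
    """
    assignment: Dict[str, Optional[str]] = {}
    for task in tasks:
        # Only nodes with at least one free slot are eligible.
        candidates = [n for n in nodes if n.free_slots > 0]
        if not candidates:
            assignment[task.task_id] = None
            continue
        # Locality constraint: prefer nodes that already store the input block.
        local = [n for n in candidates if task.input_block in n.local_blocks]
        pool = local if local else candidates
        # Greedy choice: the node with the most free slots (first one wins ties).
        chosen = max(pool, key=lambda n: n.free_slots)
        chosen.free_slots -= 1
        assignment[task.task_id] = chosen.name
    return assignment
```

For example, with nodes `A` (1 slot, stores `b1`), `B` (2 slots, stores `b2`), and `C` (1 slot, stores nothing), a task reading `b1` lands on `A` and a task reading `b2` lands on `B`. The sketch deliberately ignores fairness across jobs and wide-area costs such as bandwidth and latency between sites; a scheduler for an environment like PlanetLab would need that information from sensor components, which is exactly what the goals above ask to identify.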

Contact:

Gustavo Alonso
