Motivation:
Data-intensive cluster computing is becoming increasingly relevant to
a large number of applications, such as data mining, network traffic
analysis, and stream processing. MapReduce, Hadoop, Dryad, and other grid
computing platforms are used today to support such applications. This
project investigates scheduling techniques for data-intensive tasks
that can be used in the context of a very dynamic and heterogeneous
overlay network.
Goals:
Get familiar with the Hadoop platform and identify performance
bottlenecks and trade-offs in a wide-area environment like PlanetLab.
Design and implement new scheduling techniques that allow for greedy,
fair, location-based scheduling of computing tasks in the context of
the Rhizoma platform. Jobs can expose constraints such as data locality
as well as specific performance requirements. Identify new sensor and
resource management components, beyond those provided by Rhizoma, that
are necessary to support the proposed scheduling techniques.
Contact:
Gustavo Alonso