Projects List
Overview of research projects in the Systems Group
Current Project Areas
Anzere
Anzere is a storage system which replicates a user's personal data (photos, music, etc.) across an ensemble of physical and virtual devices owned (or rented on demand from cloud infrastructures) by a single user. With Anzere we show how to flexibly replicate data at scale in response to a complex, user-specified set of replication policies. Anzere is built on the Rhizoma platform, and includes an overlay network, monitoring infrastructure, CLP solver, data replication based on PRACTI, and Paxos for consistency. Anzere currently runs on mobile phones, laptops and desktops, and VMs on PlanetLab and Amazon EC2.
Project members: Qin Yin, Ercan Ucan, Timothy Roscoe
Alpenrhein
We explore how an interface between the OS and the managed runtime system should look in terms of information flow in both directions to improve hardware resource utilization. By extending the interface between the runtime system and the OS, legacy applications written on top of the runtime system can benefit from modern manycore systems without modification. Semantic information about the application within the runtime system provides deeper insight into a new or legacy application and allows the system to automatically derive resource needs in terms of memory, caches, cores and interconnect. Communication, synchronization, computation and memory allocation information is provided to the SKB, which then gets a global view and can provide feedback to all running instances in the system.
Project members: Adrian Schuepbach, Timothy Roscoe
Avalanche: Data Processing on Bare Metal
The limitations of today's computing architectures are well known: high power consumption, heat dissipation, network and I/O bottlenecks, and the memory wall. Field-programmable gate arrays (FPGAs), user-configurable hardware chips, are promising candidates to overcome these limitations. With tailor-made and software-configured hardware circuits it is possible to process data at very high throughput rates and with extremely low latency. Yet, FPGAs consume orders of magnitude less power than conventional systems. Thanks to their high configurability, they can be used as co-processors in heterogeneous multi-core architectures, and/or directly be placed in critical data paths to reduce the load that hits the system CPU.
In Avalanche we work on a realization of these promises. To this end, we are building a processing stack for FPGA-based database processing. Our Glacier compiler translates a useful subset of a stream query language into hardware circuits, e.g., to make the latency and throughput advantages of FPGAs accessible to financial trading applications. Our recent hardware solution to the frequent item problem significantly outperforms known software solutions—at a fraction of the resource consumption.
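For context, the frequent item problem asks which items occur more often than a given fraction of a stream, using bounded memory. The following is a minimal software sketch using the classic Misra-Gries summary as an illustrative baseline. It is an assumption chosen for illustration: the text above does not say which software solutions were compared or how the hardware solution works, and the function name and parameters are hypothetical.

```python
def misra_gries(stream, k):
    """Return candidate frequent items using at most k - 1 counters.

    Any item occurring more than len(stream) / k times is guaranteed to be
    among the returned candidates. The stored counts are underestimates.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

The summary needs a second pass over the data (or an error bound) to confirm which candidates are truly frequent; the sketch only shows the bounded-memory counting step.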
Project members: Gustavo Alonso, Jens Teubner, Louis Woods
Barrelfish
The Barrelfish project is performing operating systems research in the context of a new, written-from-scratch kernel, targeted at multi- and many-core processor systems. The OS will use virtualization techniques in some processes or domains in order to support legacy applications and devices, but will export a new OS ABI to most domains and will manage all the hardware itself.
Project members: Simon Peter, Timothy Roscoe, Adrian Schuepbach
Cloudy/Smoky
Cloud computing has changed the view on data management by focusing primarily on cost, flexibility and availability instead of consistency and performance at any price, as traditional DBMSs do. As a result, cloud data stores run on commodity hardware and are designed to be scalable, easy to maintain and highly fault-tolerant, often providing relaxed consistency guarantees. The success of key-value stores like Amazon's S3 and the variety of open-source systems reflects this shift. Existing solutions, however, still lack substantial functionality provided by a traditional DBMS (e.g., support for transactions and a declarative query language) and are tailored to specific scenarios, creating a jungle of services. That is, users have to commit to a specific service and are later locked into it, which prevents the evolution of the application and leads to misuse of services and expensive migrations to other services. With Cloudy we have started to build our own highly scalable database, which provides a completely modularized architecture and is not tailored to a specific use case. For example, Cloudy supports stream processing, as well as SQL and simple key-value requests.
Project members: Donald Kossmann, Simon Loesing
ClockScan on Column Stores Toolkit (CSCSTK)
The project studies the behavior of Main Memory resident Column Stores as data is processed using Shared Scans. The project investigates algorithms and data structures.
Project members: Tudor Salomie, Jana Giceva
Crescando - ECC Project
Crescando is a scalable, distributed relational table implementation designed to perform large numbers of queries and updates with guaranteed access latency and data freshness. To this end, Crescando leverages a number of modern query processing techniques and hardware trends. Specifically, Crescando is based on parallel, collaborative scans in main memory and so-called "query-data" joins known from data-stream processing. While the approach is not always optimal for a given workload, it provides latency and freshness guarantees for all workloads. Thus, Crescando is particularly attractive if the workload is unknown, changing, or involves many different queries.
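The idea of a collaborative scan combined with a "query-data" join can be sketched as follows: instead of running each query as its own pass over the table, one pass reads every record once and matches it against the whole batch of pending queries. This is a simplified illustration, not Crescando's implementation; the function name, the record layout and the use of plain predicates are assumptions.

```python
def shared_scan(records, queries):
    """Evaluate a batch of queries in one pass over the records.

    `queries` maps a query id to a predicate (a callable on a record).
    Each record is read once and joined against all pending queries.
    """
    results = {qid: [] for qid in queries}
    for record in records:
        for qid, predicate in queries.items():
            if predicate(record):
                results[qid].append(record)
    return results


records = [
    {"id": 1, "city": "Zurich", "seats": 2},
    {"id": 2, "city": "Geneva", "seats": 0},
    {"id": 3, "city": "Zurich", "seats": 5},
]
queries = {
    "q1": lambda r: r["city"] == "Zurich",
    "q2": lambda r: r["seats"] == 0,
}
results = shared_scan(records, queries)
```

Because the scan always touches the full table, its cost does not depend on which queries happen to arrive. This is what makes the latency predictable, at the price of not being optimal for every workload.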
Project members: Philipp Unterbrunner, Georgios Giannikis, Gustavo Alonso, Donald Kossmann
CrowdDB: Integrating Human Input into Databases
The goal of this project is to develop a set of novel techniques that make it possible to integrate human resources into a database system, in order to process some of the queries that Google and Oracle cannot answer today and to address some of the notoriously hard database research problems in a very different way than has been done in the past. Specifically, we plan to build an extended relational database system, called CrowdDB.
Project members: Donald Kossmann, Sukriti Ramesh, Florian Widmer (Master student)
Data Cyclotron
The transport mechanisms offered by modern network cards that support remote direct memory access (RDMA) significantly shift the priorities in distributed systems. Complex and sophisticated machinery designed only to avoid network traffic can now be replaced by schemes that can use the available bandwidth to their advantage. One such scheme is Data Cyclotron, a research effort that we pursue jointly with the database group at CWI Amsterdam. Based on a simple ring-shaped topology, Data Cyclotron offers ad-hoc querying over data of arbitrary shape and arbitrary size.
Project members: Romulo Gonçalves (CWI), Martin Kersten (CWI), Jens Teubner
DejaVu
The DejaVu project explores scalable complex event processing techniques for streams of events. The goal is to provide a system that can seamlessly integrate pattern detection over live and historical streams of events behind a common, declarative interface. We are investigating various optimization ideas for efficient data access and query execution.
Project members: Nihal Dindar, Nesime Tatbul
flowSGI
flowSGI combines the paradigm of Fluid Computing with the dynamics of the OSGi service platform. Data and applications can be shared and kept synchronized among different peers, including small mobile devices. Offline operations on data are permitted; changes are reconciled as soon as a network connection is available again. flowSGi includes the following subprojects:
- Concierge OSGi: Concierge is an implementation of the OSGi R3 technology, optimized for mobile and embedded devices. It runs on all J2SE and J2ME CDC VMs and performs well even on less optimized virtual machines. With a footprint of only 85 kBytes, it is one of the smallest OSGi implementations available. Project members: Gustavo Alonso
- jSLP is a pure Java implementation of RFC 2608: Service Location Protocol. It provides service discovery at the packet level and can run either in managed environments or in ad-hoc networks using multicast requests. Project members: Gustavo Alonso
- R-OSGI (Past Project)
Limmat: Analytics for the Real-Time Web
Today, with the growing use of mobile devices constantly connected to the Internet, the nature of user-generated data has changed: it has become more real-time. People share their thoughts and discuss breaking news on Twitter and Facebook; they share their current locations and activities on location-based social networks such as Foursquare. The difference is that, today, people share more often and the lifespan of the data has become shorter.
Analyzing this data leads to new requirements for analytical systems: real-time processing and database-intensive workloads. Driven by these requirements, we have developed Limmat. Limmat extends a key-value store architecture with push-based processing, transactional task execution, and synchronization. We modified the MapReduce programming model to support push-style data processing.
Current Project members: Martin Hentschel, Donald Kossmann
Former Project members: Maxim Grinev, Maria Grineva
Mapping Data to Queries
Mapping Data to Queries (MDQ) is a radically different approach to processing data with many different schemas. MDQ differs from traditional approaches to data integration by integrating data at the latest possible point in time: at the runtime of a query. This opens up great potential for optimization because at query runtime both the data and the query are known, and we can exploit this knowledge to apply fewer mapping rules than traditional approaches. Consequently, MDQ scales well with the number of schemas and outperforms traditional approaches by orders of magnitude in extreme cases.
Project members: Martin Hentschel, Laura Haas (IBM Almaden), Donald Kossmann, and Renée Miller (University of Toronto)
MaxStream - ECC Project
Despite the availability of several commercial data stream processing engines (SPEs), it remains hard to develop and maintain streaming applications. A major difficulty is the lack of standards and the wide (and changing) variety of application requirements. Consequently, existing SPEs vary widely in data and query models, APIs, functionality, and optimization capabilities. This has led some organizations to use multiple SPEs, depending on their application needs. Furthermore, the management of stored data and of streaming data are still mostly separate concerns, although applications increasingly require integrated access to both. In the MaxStream project, our goal is to design and build a federated stream processing architecture that seamlessly integrates multiple autonomous and heterogeneous SPEs with traditional databases behind a common SQL-based declarative query interface and a common API, in a way that facilitates the incorporation of new functionality and requirements.
Project members: Nihal Dindar, Laura Haas (IBM Almaden), Renée Miller (University of Toronto), Nesime Tatbul
Multimed
Multicore computers pose a substantial challenge to infrastructure software such as operating systems or databases. These platforms typically evolve more slowly than the underlying hardware, but with multicore they face structural limitations that can be solved only with radical architectural changes. In this project we argue that, as has been suggested for operating systems, databases could treat multicore architectures as a distributed system rather than trying to hide the parallel nature of the hardware. We first analyze the limitations of database engines when running on multicores using MySQL and PostgreSQL as examples. We then show how to deploy several replicated engines within a single multicore machine to achieve better scalability and stability than a single database engine operating on all cores. When combined with options like virtualization and the ability to tune the system configuration to the load and number of available cores, the approach we propose becomes an appealing alternative to having to entirely redesign the database engine.
Project members: Tudor Salomie, Ionut Subasu, Jana Giceva
Privacy in the Cloud - ECC Project
Cloud computing is the next big thing. But many potential users hesitate to outsource their computing needs to a cloud service provider because they do not want to outsource control. This project addresses the need to encrypt databases in the cloud and at the same time execute complex SQL queries efficiently. The goal is to use the computing power of a cloud service and at the same time preserve privacy. A dictionary-based encoding is used to achieve this goal.
Project members: Stefan Hildenbrand, Donald Kossmann, Tahmineh Sanamrad, Carsten Binnig (SAP), Franz Färber (SAP), Johannes Wöhler (SAP)
Privacy for Cloud Applications
Cloud application services, or "Software as a Service", deliver software over the Internet, eliminating the need to install and run the application on the customer's own computers and simplifying maintenance and support. Despite these advantages, the privacy of the users remains a big question mark. In this project we implement a security layer between the user and the service. The first project is the Encrypted Google Calendar Service, which is currently in progress and can be tested using the following link and information.
Encrypted Calendar User Manual
Project members: Tahmineh Sanamrad, Daniel Widmer, Donald Kossmann
Rhizoma
Rhizoma is a constraint-based runtime system for distributed applications which is self-hosting. The application manages itself to the extent of acquiring and releasing resources (in particular, virtual machines) in response to failures, offered load, or changing policy. Operators developing and deploying applications using Rhizoma specify the desired application deployment using a form of constraint logic programming, and the Rhizoma runtime uses this to drive resource requests continuously during the lifetime of the application.
Semantic Data Warehouse Search - ECC Project
During the financial crisis of 2008, several financial institutions needed to search their data warehouses for investment products related to Lehman Brothers. Often that information was not readily available. The goal is to design and implement novel (semantic) search strategies that enable easy-to-use keyword search over the data warehouse. One of the challenges is to combine (semantic) search technology on metadata with base data stored in terabyte-scale relational databases.
This project is a joint research challenge between ETH Zurich and Credit Suisse.
The project started in summer 2009. Initially, we worked on the following topics:
- Problem definition and use cases
- Query classification (business objects, values, operators)
- Graph construction
- Graph search
Currently, we are working on two main issues:
- Discovering and translating patterns in the metadata graph to executable SQL queries.
- Searching and ranking on the metadata graph.
Project members: Gustavo Alonso, Lukas Blunschi, Claudio Jossen (CS), Donald Kossmann, Magdalini Mori, Kurt Stockinger (CS)
Snapshot Isolation in Distributed Column Stores - ECC Project
Snapshot Isolation is a widely adopted technique for transaction handling in database systems. This project explores the possibilities of this technique in two directions: in a distributed setting and on column stores. In the distributed setting, we assume that most transactions can be processed locally (i.e., only one node is involved, e.g., because of partitioning), while only very few need to access multiple nodes. In current implementations, all transactions are coordinated globally. This is a bottleneck: all transactions pay a large overhead that only a few of them cause. The goal is to find an implementation of snapshot isolation that enables local transactions to be coordinated locally (i.e., without contacting a global coordinator) and at the same time provides global snapshot semantics for the distributed transactions. In terms of column stores, we are investigating at which granularity the snapshot should be provided.
Project members: Andreas Morf (Master Student), Stefan Hildenbrand, Donald Kossmann, Carsten Binnig (SAP), Franz Färber (SAP), Juchang Lee (SAP), Michael Mühle (SAP)
Time Travel in Column Stores - ECC Project
Column stores have been shown to outperform row stores significantly in a number of recent studies. In this project we investigate alternative approaches to extend column stores with versioning, i.e., the maintenance of historic data and time-travel queries. On the positive side, adding versioning can simplify the design of a column store because it provides a solution for the implementation of updates, traditionally a weak point in the design of column stores. On the negative side, implementing a versioned column store is challenging because it poses a two-dimensional clustering problem.
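To illustrate why versioning can turn updates into appends, here is a minimal sketch of a versioned column with time-travel reads. It is an assumption made for illustration, not one of the approaches investigated in the project; the class and method names are hypothetical.

```python
import bisect


class VersionedColumn:
    """Append-only column where each row keeps a history of values."""

    def __init__(self):
        # row id -> (ascending list of timestamps, list of values)
        self._history = {}

    def write(self, row_id, timestamp, value):
        """Record a new version; existing versions are never overwritten."""
        times, values = self._history.setdefault(row_id, ([], []))
        if times and timestamp <= times[-1]:
            raise ValueError("timestamps must increase per row")
        times.append(timestamp)
        values.append(value)

    def read_as_of(self, row_id, timestamp):
        """Return the value visible at `timestamp`, or None if none exists."""
        times, values = self._history.get(row_id, ([], []))
        pos = bisect.bisect_right(times, timestamp)
        return values[pos - 1] if pos else None


col = VersionedColumn()
col.write(7, 10, "CHF 100")
col.write(7, 20, "CHF 120")
```

With this data, `col.read_as_of(7, 15)` returns the first version and `col.read_as_of(7, 5)` returns `None`. The two lookup dimensions, row id and time, are what make the physical clustering question two-dimensional.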
Project members: Martin Kaufmann, Donald Kossmann
UpStream
Most data stream processing systems model streams as append-only sequences of data elements. In this model, the application expects to receive a query answer on the complete stream. However, there are many situations in which each data element in the stream is in fact an update to a previous one, and therefore, the most recent value is all that really matters to the application. In UpStream, we explore how to efficiently process continuous queries under such an update-based stream data model.
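The difference between the two stream models can be sketched in a few lines. In the example below, each element is a (key, value) pair and a later pair for the same key replaces the earlier one, so a continuous query only has to track current values. This is a simplified illustration under those assumptions, not UpStream's processing model; the function names are hypothetical.

```python
def latest_values(stream):
    """Collapse an update stream to the most recent value per key."""
    current = {}
    for key, value in stream:
        current[key] = value
    return current


def average_of_current(stream):
    """Yield the average over the current values after each update."""
    current = {}
    total = 0.0
    for key, value in stream:
        # Replace the key's old contribution instead of appending a new one.
        total += value - current.get(key, 0.0)
        current[key] = value
        yield total / len(current)


updates = [("car1", 50.0), ("car2", 70.0), ("car1", 90.0)]
averages = list(average_of_current(updates))
```

Under an append-only model the third element would add a third reading to the average; here it replaces the earlier reading for `car1`.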
Project members: Alexandru Moga, Nesime Tatbul
Xadoop - ECC Project
Due to legal requirements, all queries against the production data warehouse (DWH) databases need to be audited. Hence, all queries of the DWH are logged and written out into compressed XML files. The log files are currently kept for a certain number of days before they are archived. The total volume of the log files before archiving is estimated at some 6 TB. Due to the large data volume, processing these queries is not straightforward. In this project we evaluate a cloud approach as a cost-effective alternative to a typical DWH-based approach. The idea is to take advantage of the Hadoop distributed filesystem and to process the queries with a data-parallel approach using the query language Pig. The solution is based on MapReduce technology. It efficiently processes terabytes of log files on commodity hardware and scales well.
Project members: Donald Kossmann, Georg Polzer (CS), Kurt Stockinger (CS)
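The data-parallel pattern behind this approach can be sketched in plain Python. The project itself uses Pig on Hadoop; the sketch below only illustrates the map and reduce steps on a single machine. The log schema is not described in the text, so the `query` element and `user` attribute are hypothetical, as are the function names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def map_log(xml_text):
    """Map step: emit (user, 1) for every logged query in one log file."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("query"):  # hypothetical element name
        yield entry.get("user"), 1    # hypothetical attribute name


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def count_queries_per_user(log_files):
    """Run the map step on every file, then reduce all emitted pairs."""
    pairs = (pair for text in log_files for pair in map_log(text))
    return reduce_counts(pairs)


logs = [
    '<log><query user="alice">SELECT 1</query>'
    '<query user="bob">SELECT 2</query></log>',
    '<log><query user="alice">SELECT 3</query></log>',
]
per_user = count_queries_per_user(logs)
```

Each log file can be mapped independently of the others, which is what allows a framework such as Hadoop to spread the work across many machines.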
XQuery in the Browser
Over the years, the browser has become a complete runtime environment for client-side programs. The main scripting language used for this purpose is JavaScript, which was designed to program the browser. Many extensions and new layers have been built on top of it to allow, e.g., DOM navigation and manipulation. However, JavaScript has become a victim of its own success and is used well beyond its original scope, leading to increased code complexity. We propose XQuery as a client-side programming language to reduce programming complexity. We wrote an extension for Microsoft Internet Explorer, based on the Zorba XQuery engine, which allows execution of XQuery scripts in the browser. An extension for Firefox is on the way as well.
Project members: Dana Florescu (Oracle), Ghislain Fourny (28msec), Donald Kossmann
Past members: Peter Fischer
- Project page: http://www.xqib.org
Zorba and MXQuery
Zorba and MXQuery are XQuery processors written in C++ and Java, respectively. Both systems implement the whole XQuery family of standards (XQuery 1.0, XQuery Updates, XQuery Scripting, XQuery Fulltext) with some extensions (e.g., REST, Web Services, windows and streaming capabilities, group by, etc.). Both engines can be embedded into other software systems. For instance, Zorba has been embedded in Web browsers, and both Zorba and MXQuery have been embedded in an Eclipse plug-in. Currently, there are efforts to integrate Zorba into a database engine and thus obtain an integrated database and application server. The overall goal of the project is to make declarative database application programming ubiquitous and to simplify the database application programming stack by providing a uniform programming environment for all application layers (presentation, application logic, and database backend).
Project members: Dana Florescu (Oracle), Ghislain Fourny (28msec), Donald Kossmann
Past members: Peter Fischer, Kyumars Sheykh Esmaili
Project pages:
Past Projects



