Industry Retreat 2024
The Industry Retreat 2024 will take place between the 14th and 17th of January 2024 at the Hotel Bellevue-Terminus, Engelberg.
Location
The Systems Group’s Industry Retreat 2024 will be held in Engelberg at the Hotel Bellevue-Terminus.
Program
The program, presentations, and posters will be available as PDF documents; a link will be sent by e-mail.
Sunday 14th January
18:30 – 19:30 Dinner
19:30 – 20:30 Introductory session: Welcome, introduction, logistics, agenda,
and format.
Monday 15th January
08:00 – 09:00 Breakfast
09:00 – 10:30 Session 1: Serverless Computing
09:00 – 09:30 Ana Klimovic: Rethinking System Software for the New Era of
Cloud Computing
09:30 – 09:50 Tom Kuchler: Dandelion: fast, efficient function execution
09:50 – 10:10 Lazar Cvetkovic: Dirigent: cluster manager for serverless
10:10 – 10:30 David Cock: How to build a faster RPC with first-class message passing
10:30 – 11:00 Coffee Break
11:00 – 12:30 Session 2: New Hardware
11:00 – 11:30 Timothy Roscoe: Enzian update and overview
11:30 – 11:50 Anastasiia Ruzhanskaia: The return of Programmed I/O
11:50 – 12:10 Pengcheng Xu: Merging the OS and the NIC
12:10 – 12:30 Jasmin Schult: Formal validation of coherent interconnect protocols
12:30 ~ Lunch, meetings, and free time
17:00 – 18:30 Session 3: Heterogeneous computing
17:00 – 17:30 Gustavo Alonso: Data processing on heterogeneous platforms
17:30 – 17:50 Marko Kabić: Maximus: Modular Query Engine for Heterogeneous
Accelerated Systems
17:50 – 18:10 Bowen Wu: Efficiently processing large relational joins on GPUs
18:10 – 18:30 Vasileios Mageirakos: Towards Unstructured Analytics with Vector Data
19:00 – 20:00 Dinner
20:00 ~ Poster Session
Tuesday 16th January
08:00 – 09:00 Breakfast
09:00 – 10:30 Session 4: ML and the cloud
09:00 – 09:30 Tim Harris (Microsoft): Everything I should have known about large
language models -- a systems perspective
09:30 – 09:50 Xiaozhe Yao: DeltaZip: efficient LLM inference serving
09:50 – 10:10 Foteini Strati: Orion: GPU scheduling for high-utilization
10:10 – 10:30 Maximilian Böther: Modyn: platform for ML training on dynamic data
10:30 – 11:00 Coffee Break
11:00 – 12:30 Session 5: Accelerators and Smart NICs
11:00 – 11:30 Michaela Blott (AMD): Datacentric Computing
11:30 – 11:50 Dario Korolija: Virtualization of FPGAs in the cloud
11:50 – 12:10 Maximilian Heer: RoCE-BALBOA: Current and future work with RDMA
on FPGAs
12:10 – 12:30 Benjamin Ramhorst: Distributed and Heterogeneous Hardware for
Machine Learning
12:30 ~ Lunch, meetings, and free time
17:00 – 18:30 Session 6: OS4RC
17:00 – 17:30 Roman Meier: Kirsch: a heterogeneous CHERI OS
17:30 – 17:50 Ben Fiedler: Finding cross-SoC vulnerabilities with Sockeye3
17:50 – 18:10 Daniel Schwyn: Declarative board management
18:10 – 18:30 Zikai Liu: Trustworthy hw/sw drivers
19:00 – 20:00 Dinner
20:00 ~ Poster Session
Wednesday 17th January
08:00 – 09:00 Breakfast
09:00 – 10:30 Session 7: Privacy and Noodles
09:00 – 09:50 Eric Sedlar (Oracle) & Nora Hossle: Lightweight in-process multi-tenant
isolation in a cloud runtime based on Java
09:50 – 10:10 Nicolas Küchler: Managing Differential Privacy in Large Scale Systems
10:10 – 10:30 Hidde Lycklama: RoFL: Robustness of Secure Federated Learning
10:30 – 11:00 Coffee Break
11:00 – 12:30 Open session, brainstorming, and feedback.
12:30 – Lunch
14:02 – Departure of the group train to Zurich
Industry Talks
Everything I should have known about large language models -- a systems perspective
Tim Harris, Microsoft – Tuesday 16th at 9:00
For the last few years I have been working on optimizing the performance of serving large language models. In this talk I'll introduce exactly what that involves, going from the kinds of challenges that exist at the top of the stack (routing and scheduling user requests), down through intermediate levels (decomposing the process of text generation to actual execution of the model), and on to the implementation of the individual kernels used on GPUs for computation and communication. I will highlight some of the work we are doing in Microsoft to address these challenges, but my main aim here is to provide a whole-stack systems perspective on what this kind of workload looks like, and the kinds of resource demands that it has.
Tim Harris works at Microsoft, focused on performance optimization for inference of large language models, both for their services in AzureML, and for the ONNX runtime system on edge devices. More generally, his research interests span the stack encompassing distribution, language runtime systems, and operating systems, with a particular emphasis on scalability and performance analysis. He is also an Affiliated Lecturer at the University of Cambridge Computer Lab, where he teaches the undergraduate distributed systems course and masters-level lectures on concurrency and synchronization. Prior to Microsoft he was with AWS, where he worked on large-scale storage performance and data analytics with Amazon S3. Further back, he led the Oracle Labs group in Cambridge, UK, working on runtime systems for in-memory graph analytics and the confluence of work on “big data” and ideas from high-performance computing.
Datacentric Computing
Michaela Blott, AMD – Tuesday 16th at 11:00
As compute performance and efficiency improve while workloads continue to grow, data movement rapidly becomes the number one performance and energy bottleneck in traditional compute architectures. An approach to tackle these challenges is to judiciously move bandwidth-constrained operations closer to the data. As a result, new, more intelligent, system components emerge, such as SmartNICs and computational memory. In this talk, we discuss these trends in more detail along with some of our more specific research in this space.
Michaela Blott is a Senior Fellow at AMD Research. She heads a team of international scientists driving exciting research in computer architectures for AI, green AI and agile compiler stacks. She earned a PhD from Trinity College Dublin and her Master’s degree from the University of Kaiserslautern, Germany, and brings over 25 years of experience in leading-edge computer architecture, advanced FPGA design and AI.
Lightweight in-process multi-tenant isolation in a cloud runtime based on Java
Eric Sedlar, Oracle and Nora Hossle, ETH – Wednesday 17th at 9:00
Current approaches to cloud isolation are based on applications written to a POSIX abstraction. In other words, any facility available in Linux must be supported for those apps. This is problematic because there are many ways to leak hardware state into a program's address space via Linux APIs (making VM migration error-prone and slow), and because there are many ways to share state with other tenants in the same OS outside the scope of process isolation. For these reasons, container isolation is considered insecure for cloud runtimes. Beyond that, however, even process isolation is too expensive for many multi-tenant use cases. For example, Oracle multi-tenant applications such as NetSuite and Fusion in some cases want to allow customers to upload custom code into a shared cloud runtime. While that code is generally very simple (e.g., a side-effect-free expression to define a new column in a table), it can sometimes be quite complex, involving stateful caches and deep codebases such as ML inference libraries.
GraalOS is an approach that restricts the isolation abstraction to a Java model, allowing for more reliable virtualization and much lower overhead when crossing tenancy boundaries. Oracle Labs has been collaborating with the ETH Systems Group to explore the design space for very lightweight in-process isolation of untrusted code (with some help from compiler toolchains) for use in GraalOS.
Eric Sedlar is Vice President & Technical Director of Oracle Labs. This position entails figuring out how to transfer research results from Labs research into Oracle products & services, as well as setting overall technical direction for new research projects in Oracle Labs. Major research areas include compilers, language runtimes, and frameworks (generally based on GraalVM and Graal Cloud Native, the Oracle distribution of Micronaut); application-level and cloud operations security; machine learning, AI, and large language models; and databases and large-scale analytics. Previously, he led the effort for XML- and JSON-native storage inside Oracle. Eric has held various architecture and development management positions at Oracle since starting there in 1990. He holds over 68 patents and has served on standards organizations for Oracle in the W3C and IETF. He co-authored the Best Paper at SIGMOD 2010 on architecture-sensitive search trees.