COMPASS TALK by Tamás Hauer (Google Zurich): Meaningful Availability

21.11.2019 10:00

Thursday, 21. November 2019, 10:00-11:00 in CAB E 72

Speaker: Tamás Hauer, Technical Program Manager, SRE, Google Zurich

Title: Meaningful Availability


High availability is a critical requirement for cloud applications; a metric that meaningfully captures it is useful for both users and system developers. Commonly used benchmarks are either too complex to be actionable or fail to capture the true perception of users, which leads to miscommunication and suboptimal engineering decisions. We propose two improvements to availability measurements. A novel metric, "user-uptime", directly models user-perceived availability and avoids the bias often found in alternatives. A presentation paradigm, "windowed availability", supports a holistic view by integrating timescales from per-minute to monthly granularity and makes it possible to distinguish between many short periods of unavailability and fewer longer ones. We demonstrate the benefits of windowed user-uptime on synthetic models and on production data from Google's G Suite. Today, all G Suite products are instrumented with this novel metric; it is used both to support engineering decisions and to communicate system health to enterprise customers.
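
The abstract names the two ideas without giving formulas, so the following Python sketch is only an illustration under stated assumptions. It assumes that user-uptime is the ratio of total user-perceived uptime to total uptime plus downtime, and that windowed availability reports the worst such ratio over every run of consecutive minutes of a given length. The function names and the per-minute `(up_seconds, down_seconds)` input format are hypothetical, not taken from the talk.

```python
from typing import Sequence, Tuple

# Hypothetical input format: one (up_seconds, down_seconds) pair per minute,
# already summed over all users active in that minute.
Minute = Tuple[float, float]


def user_uptime(minutes: Sequence[Minute]) -> float:
    """Assumed definition: total uptime divided by total uptime + downtime."""
    up = sum(u for u, _ in minutes)
    down = sum(d for _, d in minutes)
    total = up + down
    # With no recorded user activity there is nothing to count as downtime.
    return up / total if total > 0 else 1.0


def windowed_user_uptime(minutes: Sequence[Minute], window: int) -> float:
    """Worst user-uptime over any run of `window` consecutive minutes."""
    if window <= 0 or window > len(minutes):
        raise ValueError("window must be between 1 and len(minutes)")
    return min(
        user_uptime(minutes[i:i + window])
        for i in range(len(minutes) - window + 1)
    )


# Two synthetic 20-minute traces with the same overall user-uptime:
# one 4-minute outage versus four isolated 1-minute outages.
UP, DOWN = (60.0, 0.0), (0.0, 60.0)
burst = [UP] * 8 + [DOWN] * 4 + [UP] * 8
scattered = [DOWN if i % 5 == 2 else UP for i in range(20)]

for name, trace in (("burst", burst), ("scattered", scattered)):
    worst = {w: round(windowed_user_uptime(trace, w), 2) for w in (1, 4, 20)}
    print(name, worst)
```

Both traces give the same figure over the full 20-minute window, but the 4-minute window separates them: the single long outage drives the worst window to zero, while the scattered outages do not. This is the distinction between many short and fewer longer periods of unavailability that the abstract mentions.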


Tamás Hauer holds a PhD in theoretical physics. After a brief career as a physicist at the Max-Planck-Institut and CERN, he worked as a research associate at the department of Applied Computer Science of the University of the West of England, leading a group in the area of semantic web, grid services and health informatics. He was work package leader of the European FP5 Mammogrid and FP6 Health-e-Child projects. As the CTO of the Swiss startup Prodema Medical / McMRI, he led the development of appMRI, a brain MRI image analysis platform, through launch and certification as a class IIa medical device. He joined Site Reliability Engineering at Google Zürich in 2016 as a Technical Program Manager; his current interest is data analysis of service level indicators and service level objectives.