Ce Zhang gave a talk entitled "Accessible Data Sciences with Efficient Data Systems" at the IBM Zurich Research Lab.
One important problem in data science today is that many of the techniques needed to unleash the next big thing are available but still far from accessible. Specifically, current machine-learning ecosystems are difficult to use for people without a computer science background, and they still fall short of exploiting the full potential of modern hardware. With more than five ongoing data science applications at ETH Zurich, spanning genomics, the social sciences, and astronomy, our dream is to design the next generation of data science ecosystems: fast, scalable, and easier to use. In this talk, I will first describe the abundant opportunities for data science at ETH Zurich, and then describe two enabling techniques that my group is developing. Both follow the same general direction: co-designing machine learning (or artificial intelligence) with modern hardware and systems. First, I will present our recent work introducing a data structure for dense linear regression, which can potentially reduce memory bandwidth during training by 20x. Then I will introduce a novel database architecture that made the production system of a leading security company 100x faster. The architecture incorporates an SMT solver, which lets it answer queries that the system was not originally designed for.