Hardware Architectures for Machine Learning







The seminar will start on February 23rd with an overview of the general topics and the intended format of the seminar. Each student is expected to present one paper in a 30-minute talk and to complete a 4-page report on the main idea of the paper and how it relates to the other papers presented at the seminar and the discussions around them. The presentation will be given during the semester in the allocated time slot. The report is due on the last day of the semester (June 2nd).

Attendance at the seminar is mandatory to meet the credit requirements. Active participation is also expected: read every paper before it is presented, and contribute to the questions and discussion of each paper during the seminar.

Course Material




 Forsberg Björn Alexander  GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks, HPCA 2017. http://nailifeng.org/pubs/graphpim.pdf  16.3.2017   Torsten Hoefler
 Jokic Petar  Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory, ISCA 2016. http://dl.acm.org/citation.cfm?id=3001178  16.3.2017  Onur Mutlu
 Mayrhofer Lucas Bastien  Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures, HPCA 2017. http://plaza.ufl.edu/huyang.ece/papers/hpca_final.pdf  23.3.2017  Torsten Hoefler
 Nielsen Carsten Lau  Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, FPGA 2015. http://dl.acm.org/citation.cfm?id=2689060  23.3.2017  Tal Ben Nun
 Tifrea Alexandru  Project Adam: Building an Efficient and Scalable Deep Learning Training System, OSDI 2014. https://www.usenix.org/node/186213  30.3.2017  Ce Zhang
 Shekhar Saurav  Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs. https://arxiv.org/pdf/1606.04487.pdf  30.3.2017  Ce Zhang
 Hu Yuhuang  Deep Learning with Limited Numerical Precision, ICML 2015. https://arxiv.org/abs/1502.02551  6.4.2017  Tal Ben Nun
 Andri Renzo  EIE: Efficient Inference Engine on Compressed Deep Neural Network, ISCA 2016. http://dl.acm.org/citation.cfm?id=3001163  13.4.2017  Onur Mutlu
 Cavigelli Lukas Arno Jakob  FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks, HPCA 2017. http://www.carch.ac.cn/~yan/download/LuW_HPCA_2017.pdf  13.4.2017  Muhsen Owaida
 Palossi Daniele  Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems, ISCA 2016. http://ieeexplore.ieee.org/document/7551394/  27.4.2017  Onur Mutlu
 Patel Minesh Hamenbhai  Triggered Instructions: A Control Paradigm for Spatially-Programmed Architectures, ISCA 2013. http://dl.acm.org/citation.cfm?id=2485935  27.4.2017  Gustavo Alonso
 Burman Gregory  Multi-FPGA Accelerator for Scalable Stencil Computation with Constant Memory Bandwidth, TPDS 2014. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6470606  4.5.2017  Muhsen Owaida
 Scheidegger Florian Michael  Efficient Frequent Item Counting in Multi-Core Hardware, KDD 2012. http://wan.poly.edu/KDD2012/docs/p1451.pdf  4.5.2017  Gustavo Alonso
 Wegmayr Viktor  Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations. https://arxiv.org/pdf/1609.07061.pdf  11.5.2017  Ce Zhang


Seminar Hours

Thursdays, 15:00-17:00 in LEE C 104