HADP - Materials - Fall 2017

Talks

"Parallel and Distributed Deep Learning" - Dr. Tal Ben-Nun

"Opportunities and Challenges of Emerging Memory Technologies" - Prof. Onur Mutlu

"How to present research in a seminar" - Prof. Torsten Hoefler

 


Reading material

 

  1. [1] “A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing”. ISCA 2015 
     
  2. [2] “Heterogeneity-aware Distributed Parameter Servers”. SIGMOD 2017 
     
  3. [3] “Asynchronous Methods for Deep Reinforcement Learning”. ICML 2016 
     
  4. [4] “ZNNi: Maximizing the Inference Throughput of 3D Convolutional Networks on CPUs and GPUs”. SC 2016
     
  5. [5] “A Many-core Architecture for In-Memory Data Processing”. MICRO 2017
     
  6. [6] “Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent”. NIPS 2011
     
  7. [7] “Scalable Kernel Density Classification via Threshold-Based Pruning”. SIGMOD 2017
     
  8. [8] “Energy-Efficient Acceleration of Big Data Analytics Applications Using FPGAs”. BIG DATA 2015
     
  9. [9] “DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning”. ASPLOS 2014
     
  10. [10] “XGBoost: A Scalable Tree Boosting System”. KDD 2016
     
  11. [11] “Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent”. NIPS 2017
     
  12. [12] “Genomic Data Clustering on FPGAs for Compression”. ARC 2017
     
  13. [13] “Scalable Multi-Access Flash Store for Big Data Analytics”. FPGA 2014
     
  14. [14] “The microarchitecture of a real-time robot motion planning accelerator”. MICRO 2016
     
  15. [15] “Train longer, generalize better: closing the generalization gap in large batch training of neural networks”. NIPS 2017
     
  16. [16] “Genetic CNN” and “Evolving Deep Neural Networks”. arXiv 2017
     
  17. [17] “In-Datacenter Performance Analysis of a Tensor Processing Unit”. ISCA 2017
     
  18. [18] “Hybrid Row-Column Partitioning in Teradata”. PVLDB 2016
     
  19. [19] “Demystifying automata processing: GPUs, FPGAs or Micron's AP”. ICS 2017
     
  20. [20] “WarpLDA: a Cache Efficient O(1) Algorithm for Latent Dirichlet Allocation”. PVLDB 2016
     
  21. [21] “Bridging the Gap between HPC and Big Data frameworks”. PVLDB 2017
     
  22. [22] “PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning”. HPCA 2017
     
  23. [23] “Accelerating Apache Spark Big Data Analysis with FPGAs”. CBDCom 2017
     
  24. [24] “HippogriffDB: Balancing I/O and GPU Bandwidth in Big Data Analytics”. PVLDB 2016
     
  25. [25] “Predicting statistics of asynchronous SGD parameters for a large-scale distributed deep learning system on GPU supercomputers”. BIG DATA 2016
     
  26. [26] “Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds”. NSDI 2017
     
  27. [27] “Live Video Analytics at Scale with Approximation and Delay-Tolerance”. NSDI 2017
     
  28. [28] “FASTEN: An FPGA-based Secure System for Big Data Processing”. IEEE Design & Test (early access)
     
  29. [29] “YellowFin and the Art of Momentum Tuning”. arXiv 2017
     
  30. [30] “BlueDBM: An Appliance for Big Data Analytics”. ISCA 2015
     
  31. [31] “Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent”. ISCA 2017
     
  32. [32] “Compressed Linear Algebra for Large-Scale Machine Learning”. PVLDB 2016
     
  33. [33] “LDA*: A Robust and Large-scale Topic Modeling System”. PVLDB 2017
     
  34. [34] “Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology”. MICRO 2017
     
  35. [35] “Scalable Linear Algebra on a Relational Database System”. ICDE 2017
     
  36. [36] “SERF: Efficient Scheduling for Fast Deep Neural Network Serving via Judicious Parallelism”. SC 2016
     
  37. [37] “A Cost-based Optimizer for Gradient Descent Optimization”. SIGMOD 2017
     
  38. [38] “The TileDB Array Data Storage Manager”. PVLDB 2017
     
  39. [39] “Deep learning with Elastic Averaging SGD”. NIPS 2015