HAML - Materials (Spring 2018)

Reading material

  1. [1] Regularized Evolution for Image Classifier Architecture Search. arXiv 2018.
  2. [2] Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data. SC 2017.
  3. [3] Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters. USENIX ATC 2017.
  4. [4] Population Based Training of Neural Networks. arXiv 2017.
  5. [5] AMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks. arXiv 2017.
  6. [6] Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training. arXiv 2017.
  7. [7] QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding. arXiv 2017. (See the sketch below the list.)
  8. [8] Predicting statistics of asynchronous SGD parameters for a large-scale distributed deep learning system on GPU supercomputers. Big Data 2016.
  9. [9] Model Accuracy and Runtime Tradeoff in Distributed Deep Learning: A Systematic Study. arXiv 2017.
  10. [10] UDP: A Programmable Accelerator for Extract-Transform-Load Workloads and More. MICRO 2017.
  11. [11] KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC. SOSP 2017.
  12. [12] A Cloud-Scale Acceleration Architecture. MICRO 2016.
  13. [13] A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services. ISCA 2014.
  14. [14] Caribou: Intelligent Distributed Storage. VLDB 2017.
  15. [15] Big Data Processing: Scalability with Extreme Single-Node Performance. Big Data 2016.
  16. [16] SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. ISCA 2017.
  17. [17] Multi-GPU Graph Analytics. IPDPS 2017.
  18. [18] GraphBIG: Understanding Graph Computing in the Context of Industrial Solutions. SC 2015.
  19. [19] On the Performance and Energy Efficiency of Sparse Linear Algebra on GPUs. International Journal of High Performance Computing Applications, 2016.
  20. [20] GraphOps: A Dataflow Library for Graph Analytics Acceleration. FPGA 2016.
  21. [21] High-Performance Recommender System Training Using Co-Clustering on CPU/GPU Clusters. ICPP 2017.
  22. [22] Graph Analytics and Storage. Big Data 2014.
  23. [23] Accelerating Large-Scale Graph Analytics with FPGA and HMC. FCCM 2017.
  24. [24] PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory. ISCA 2016.
  25. [25] Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks. ASPLOS 2018.
  26. [26] Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology. MICRO 2017.
  27. [27] DRISA: A DRAM-Based Reconfigurable In-Situ Accelerator. MICRO 2017.
  28. [28] BlueDBM: Distributed Flash Storage for Big Data Analytics. ACM Transactions on Computer Systems (TOCS) 2016.
  29. [29] In-Memory Data Flow Processor. PACT 2017.
  30. [30] PIM-Enabled Instructions: A Low-Overhead, Locality-Aware Processing-in-Memory Architecture. ISCA 2015.
  31. [31] GraphR: Accelerating Graph Processing Using ReRAM. HPCA 2018.
  32. [32] SCALEDEEP: A Scalable Compute Architecture for Learning and Evaluating Deep Networks. ISCA 2017.
  33. [33] Gather-Scatter DRAM: In-DRAM Address Translation to Improve the Spatial Locality of Non-unit Strided Accesses. MICRO 2015.
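
A recurring idea in the communication-efficiency readings ([6], [7]) is trading gradient precision for bandwidth. As a rough illustration only (not code from any of the papers above), the sketch below implements the unbiased stochastic quantizer at the core of QSGD [7]; the function name and the sanity check are our own, and the full scheme additionally entropy-codes the integer levels rather than sending floats.

```python
import numpy as np

def qsgd_quantize(v, num_levels=4, rng=None):
    """Stochastically quantize a gradient vector to num_levels uniform
    levels (QSGD-style). A worker would transmit only the norm, signs,
    and small integer levels instead of full-precision gradients."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * num_levels   # magnitudes mapped to [0, num_levels]
    lower = np.floor(scaled)
    # Round up with probability equal to the fractional part, so that
    # E[quantized] == v (the quantizer is an unbiased estimator).
    levels = lower + (rng.random(v.shape) < (scaled - lower))
    return np.sign(v) * levels * (norm / num_levels)

# Sanity check: averaging many quantized copies recovers the original gradient.
g = np.array([0.3, -1.2, 0.05, 2.0])
est = np.mean([qsgd_quantize(g) for _ in range(20000)], axis=0)
print(g, est)  # est should be close to g
```

The unbiasedness is what lets SGD converge on quantized gradients at the cost of extra variance; fewer levels means fewer bits per coordinate but noisier updates, which is the tradeoff these readings analyze.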