With Intel’s $16.7B acquisition of Altera and the deployment of FPGAs by major cloud service providers including Microsoft, Amazon, and IBM, we are entering a new era of customized computing. It is anticipated that future architectures and systems will feature a sea of heterogeneous accelerators customized for important application domains, such as machine learning and personalized healthcare, to provide better performance and energy efficiency. Many research problems remain open, such as how to efficiently integrate accelerators into future chips and commodity datacenters, and how to program such accelerator-rich architectures and systems.

In this talk, I will first briefly explain how customized accelerators can achieve orders-of-magnitude performance improvements, based on our open-source simulator PARADE [ICCAD 2015, tutorials at ISCA 2015 & MICRO 2016].

Second, I will present our initial work on CPU-accelerator co-design, where we provide efficient and unified address translation support between CPU cores and accelerators [HPCA 2017 Best Paper Nominee]. We show that a simple two-level TLB design for accelerators, combined with the host core MMU for accelerator page walks, can be very efficient: on average, it achieves a 7.6x speedup over the naïve IOMMU approach, with only a 6.4% performance gap to ideal address translation.

Third, I will present the open-source Blaze system, which provides programming and runtime support to enable easy and efficient FPGA accelerator deployment in datacenters [HotCloud 2016, ACM SOCC 2016]. Blaze abstracts accelerators-as-a-service and bridges the gap between big data applications (e.g., Apache Spark programs) and emerging accelerators (e.g., FPGAs). By plugging a PCIe-based FPGA board into each CPU server, it can improve system throughput severalfold for a range of applications.
Finally, I will discuss future research directions that can enhance architecture, programming, compiler, runtime, and security support for accelerator-rich architectures and systems.
Dr. Zhenman Fang is a postdoc in the Computer Science Department at UCLA, working with Prof. Jason Cong and Prof. Glenn Reinman. He is a member of the NSF/Intel-funded multi-university Center for Domain-Specific Computing (CDSC) and the SRC/DARPA-funded multi-university Center for Future Architectures Research (C-FAR). Zhenman received his PhD in June 2014 from Fudan University, China, and spent the last 15 months of his PhD program visiting the University of Minnesota, Twin Cities. Zhenman's research lies at the intersection of heterogeneous and energy-efficient accelerator-rich architectures, big data workloads and systems, and system-level design automation. He has published 10+ papers in top venues spanning computer architecture (HPCA, TACO, ICS), design automation (DAC, ICCAD, FCCM), and cloud computing (ACM SOCC). He has received several awards, including a postdoc fellowship from the UCLA Institute for Digital Research and Education, a Best Paper nomination at HPCA 2017, and a best demo award (3rd place) at the C-FAR center annual review. More details can be found on his personal website.