Advancement: Performance Improvement of Storage Systems using Machine Learning

Speaker Name: 
Chandranil Chakraborttii
Speaker Title: 
Ph.D. Candidate
Start Time: 
Friday, December 13, 2019 - 1:00pm
Location: 
Engineering 2, Room 213

Flash-based storage drives such as solid-state drives (SSDs) are replacing traditional spinning disk drives in an increasing number of applications. User-facing cloud-based applications benefit from the low, sub-millisecond access latency of SSDs. Virtually all smartphones use Flash as their storage medium because of its high density, small footprint, low power consumption, and shock resistance. SSDs provide faster boot times, higher read and write bandwidth, and improved durability. Nevertheless, Flash storage devices have several disadvantages. Technology scaling, 3D integration, and multi-level bit cells have continuously increased storage density and capacity; however, they have also reduced the reliability of Flash. Flash also suffers from overheads such as garbage collection, which can reduce write bandwidth and introduce high tail latency. Furthermore, while NAND flash devices provide significantly lower latency than spinning disks, Flash latency is still orders of magnitude higher than that of DRAM.

As Flash storage devices become increasingly complex and need to serve a wide range of applications with different characteristics, we find that traditional algorithms fail to address the above challenges efficiently. On the other hand, we find that the data collection techniques implemented in modern SSDs and operating systems provide a large amount of data for analysis. We observe that machine learning (ML) techniques offer an excellent solution to the problems above. Simulations in hardware and software can produce large, clean, and automatically labeled data sets that are useful for training, while ML techniques can optimize algorithms in an application-specific way.

In this report, we propose three novel approaches to improve the performance of flash-based storage systems: predicting failures, reducing the average response latency, and reducing the tail response latency. We use anomaly detection techniques to predict failures in flash drives. Initial results indicate that our machine learning models predict failures accurately, achieving a recall of 1, meaning that no actual failure was missed. To reduce the average response latency, we train neural network-based prefetchers to learn the complex and dynamic I/O access patterns in real-time workloads. Initial results indicate that our time series machine learning models successfully learn the spatial I/O access patterns, achieving up to 82.5% accuracy. We also propose a novel machine learning-based approach to reduce the garbage collection overhead in flash drives, thereby alleviating the problem of high tail latency in flash devices.
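
For readers unfamiliar with the recall metric mentioned above, the following minimal Python sketch shows how it is computed for a failure predictor. It is an illustration only, not the models described in this talk: the z-score detector, the function names, the threshold, and the per-drive error counts are all hypothetical and chosen to keep the example small.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=2.0):
    """Flag values whose absolute z-score exceeds the threshold.

    A simple stand-in for an anomaly detector (an assumption for this
    sketch, not the method used in the talk).
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


def recall(actual, predicted):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical per-drive error counts; only the drive at index 5 failed.
error_counts = [1, 2, 1, 3, 2, 40, 1, 2, 2, 1]
actual_failures = [count == 40 for count in error_counts]

predicted_failures = zscore_anomalies(error_counts)
print("recall:", recall(actual_failures, predicted_failures))
```

A recall of 1 says that every drive that actually failed was flagged. It says nothing about false alarms, so in practice it is usually reported alongside precision.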

Event Type: 
Advancement/Defense
Advisor: 
Prof. Heiner Litz
Graduate Program: 
Computer Science PhD