Advancement: Zoroaster: Using Generative Adversarial Networks to generate synthetic workloads

Speaker Name: Sinjoni Mukhopadhyay
Speaker Title: PhD Student (Advisor: Darrell Long)
Speaker Organization: Computer Science
Start Time: Thursday, December 13, 2018 - 1:00pm
End Time: Thursday, December 13, 2018 - 3:00pm
Location: Engineering 2, Room 380
Organizer: Darrell Long

Abstract: Evaluating design alternatives for future storage systems, tuning parameter values of existing systems, and assessing capacity and performance requirements when setting up systems for production use all require the ability to capture the essence of how a system is typically used. Collecting and disseminating real-time enterprise workloads is difficult from a logistical, security, and privacy standpoint, yet obtaining synthetic workloads that adequately represent real-time workloads for tuning tasks is just as difficult. Most companies test their systems with small workloads that focus on a subset of system-specific characteristics, which gives them an incomplete picture of the system's performance on different or larger workloads. Even when a trace is successfully collected and publicly shared, translating it from one architecture to another is typically done in an ad hoc manner that leaves room for misinterpretation, leading to costly over-provisioning and inaccurate system projections.

To address these challenges, we propose Zoroaster, a self-improving synthetic storage workload generator that can produce workloads of customizable scale and hybrid types, given a set of system characteristics. Zoroaster will use Generative Adversarial Networks (GANs) to create complex synthetic workloads that dynamically and accurately map to the characteristics of the workload class each model is built to emulate. A GAN is an adversarial framework consisting of a pair of deep neural networks: a generative model and a discriminative model. Our system will use real-time enterprise workloads to train the discriminative model, which learns to determine whether a sample comes from the data distribution or from the model distribution. A randomly initialized generative model is then pitted against this pre-trained discriminative model and, as its output, creates synthetic workloads whose characteristics resemble those of the real-time workloads. Zoroaster will be considered successful if the generated workloads are statistically indistinguishable from real trace data at scale.
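The adversarial setup described above can be illustrated on a single workload feature. The following is a minimal, hypothetical sketch, not Zoroaster's implementation. It makes several assumptions that are not in the abstract. It uses one numeric feature (e.g. log request size) modeled in 1-D. The generator is a linear map `x = a*z + b`. The discriminator is logistic in `x` and `x**2`. The generator uses the non-saturating loss, and gradients are clipped per step. The discriminator is first pre-trained against the untrained generator's output, since a discriminator needs negative samples; after that, the two models alternate. "Statistically indistinguishable" is checked here with the two-sample Kolmogorov-Smirnov statistic, which is our choice of test rather than one named in the abstract. All function names are illustrative.

```python
# Hypothetical toy sketch of GAN training on one workload feature;
# not Zoroaster's actual design.
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def clip(g, bound=1.0):
    """Clip one gradient component to [-bound, bound] for stability."""
    return max(-bound, min(bound, g))


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_xs(t) - F_ys(t)|."""
    xs, ys = sorted(xs), sorted(ys)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        t = min(xs[i], ys[j])
        # Advance both empirical CDFs past every sample equal to t.
        while i < len(xs) and xs[i] <= t:
            i += 1
        while j < len(ys) and ys[j] <= t:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


def train_toy_gan(real, steps=2000, pretrain=200, batch=32, lr=0.01, seed=0):
    """Fit generator x = a*z + b (z ~ N(0, 1)) to a 1-D list of real values.

    Discriminator: D(x) = sigmoid(w1*x + w2*x**2 + c). A quadratic logit can
    separate two Gaussians by both mean and spread.
    """
    rng = random.Random(seed)
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)  # randomly initialized generator
    w1 = w2 = c = 0.0                        # discriminator starts neutral

    def logit(x):
        return w1 * x + w2 * x * x + c

    for step in range(pretrain + steps):
        reals = [rng.choice(real) for _ in range(batch)]
        zs = [rng.gauss(0, 1) for _ in range(batch)]
        fakes = [a * z + b for z in zs]

        # Discriminator step: minimize -log D(real) - log(1 - D(fake)).
        g1 = g2 = gc = 0.0
        for xr, xf in zip(reals, fakes):
            dr, df = sigmoid(logit(xr)), sigmoid(logit(xf))
            g1 += (dr - 1) * xr + df * xf
            g2 += (dr - 1) * xr * xr + df * xf * xf
            gc += (dr - 1) + df
        w1 -= lr * clip(g1 / batch)
        w2 -= lr * clip(g2 / batch)
        c -= lr * clip(gc / batch)

        if step < pretrain:
            continue  # pre-training: only the discriminator is updated

        # Generator step (non-saturating loss): minimize -log D(G(z)).
        ga = gb = 0.0
        for z, xf in zip(zs, fakes):
            dlogit_dx = w1 + 2 * w2 * xf
            coeff = -(1 - sigmoid(logit(xf))) * dlogit_dx
            ga += coeff * z  # dx/da = z
            gb += coeff      # dx/db = 1
        a -= lr * clip(ga / batch)
        b -= lr * clip(gb / batch)

    return a, b


if __name__ == "__main__":
    # Synthetic stand-in for a real trace feature (e.g. log10 of request size);
    # these numbers are made up for illustration, not measured data.
    data_rng = random.Random(42)
    real = [data_rng.gauss(2.0, 0.5) for _ in range(1000)]
    a, b = train_toy_gan(real)
    gen_rng = random.Random(7)
    synthetic = [a * gen_rng.gauss(0, 1) + b for _ in range(1000)]
    print(f"generator: x = {a:.3f} * z + {b:.3f}")
    print(f"KS statistic vs. real feature: {ks_statistic(real, synthetic):.3f}")
```

Running the script prints the learned generator parameters and the KS statistic between the toy "real" feature and 1,000 generated samples. Smaller values mean the two empirical distributions are closer. Clipping each gradient step keeps every parameter bounded, which trades convergence speed for stability in this toy setting. A real workload generator would also have to capture many correlated features and their ordering in time, which a single 1-D model does not attempt.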