Peter Alvaro to work with eBay on Lineage-Driven Fault Injection project

Thursday, September 6, 2018
Erin Foley

Peter Alvaro, assistant professor in the Baskin School of Engineering at UC Santa Cruz, will collaborate with eBay on a project with the goal of improving the reliability of distributed systems. The project, “Lineage-Driven Fault Injection” (LDFI), is also supported by the National Science Foundation through Alvaro’s five-year CAREER grant.

Distributed systems, networks of computers that work together and appear to the end user as a single unit, scale to meet growing demand for data. Despite this advantage, they are prone to inherent problems such as machine crashes and interrupted communication.

Alvaro, Principal Investigator of Disorderly Labs, will address these issues with LDFI, which utilizes observability infrastructure such as logs, traces, and data lineage to build models of how a distributed system tolerates faults. Then, using these models, LDFI selects and deliberately injects faults into the system in order to identify bugs.

Unlike other state-of-the-art techniques, which involve injecting faults manually or at random, LDFI observes successful runs to learn how the system masks or works around faults. This allows it to avoid injecting faults that the system is already known to tolerate and to focus instead on faults likely to affect the system.
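The selection idea can be pictured with a small Python sketch. It is an illustration under stated assumptions, not Disorderly Labs' implementation: it assumes the lineage of successful runs has been summarized as a list of "support paths" (sets of components that together were enough to produce the correct outcome), and the function name, the component names, and the `max_faults` limit are all hypothetical. A fault combination is worth injecting only if it breaks every known path; anything less is already known to be tolerated.

```python
from itertools import combinations


def candidate_fault_sets(support_paths, max_faults=2):
    """Return minimal sets of components whose failure breaks every known support path.

    Assumption: each support path is a set of components that, together,
    were sufficient for a successful outcome in an observed run.
    """
    components = sorted(set().union(*support_paths))
    candidates = []
    for size in range(1, max_faults + 1):
        for combo in combinations(components, size):
            fault_set = frozenset(combo)
            # Skip supersets of a smaller candidate: they are not minimal.
            if any(found <= fault_set for found in candidates):
                continue
            # Keep the set only if it hits at least one component in every path.
            # If some path survives, the system is already known to tolerate it.
            if all(fault_set & path for path in support_paths):
                candidates.append(fault_set)
    return candidates


# Hypothetical lineage: a request succeeded via either replica, always through the cache.
observed_paths = [
    frozenset({"replica_a", "cache"}),
    frozenset({"replica_b", "cache"}),
]
experiments = candidate_fault_sets(observed_paths)
```

In this toy example, crashing only `replica_a` is never proposed, because the path through `replica_b` would still succeed. The sketch proposes two experiments instead: failing `cache` alone, or failing both replicas together.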

With his research, Alvaro hopes to show that LDFI is more effective at identifying user-visible bugs in production systems and software than other experiment selection techniques. “Ideally, we would like to measure the effect of LDFI on eBay’s overall availability,” Alvaro explained.

Faults in distributed systems can have a profound impact on e-commerce companies like eBay. An effective method of identifying user-visible faults in distributed networks would help eBay prevent, detect, and act on bugs that can negatively affect the experience of its 175 million users.

eBay’s investment in the LDFI project will enable the company to deliver better customer experiences with healthy software pipelines, says Ravi Punati, Senior Manager of Site Reliability Engineering at eBay. “Funding Disorderly Labs will enable Dr. Alvaro to focus on unsolved problems in distributed systems,” he said.

The partnership will give Alvaro’s research team access to a large distributed system on which to test LDFI and will allow Alvaro to stay in contact with industry practitioners. “In systems research, periodic ‘re-grounding’ with industry is essential. Otherwise, we researchers would have misgivings about whether we are identifying and solving real-world problems facing practitioners in the field,” Alvaro said.

Alvaro is currently seeking two graduate students to assist him with his research. The project will provide real-world experience to these students, who will spend a full academic year focusing on innovation, followed by a summer spent focusing on results.