
AM Seminar: Winnowing with Gradient Descent

Speaker Name: 
Manfred Warmuth
Speaker Title: 
Professor Emeritus
Speaker Organization: 
Visiting faculty member, Google Brain
Start Time: 
Monday, March 9, 2020 - 4:00pm
End Time: 
Monday, March 9, 2020 - 5:00pm
Location: 
BE 372
Organizer: 
Abhishek Halder

Abstract

The performance of multiplicative updates is typically logarithmic in the number of features when the targets are sparse. Strikingly, we show that the same property can also be achieved with gradient descent (GD) updates. We obtain this result by rewriting the non-negative weights w_{i} of multiplicative updates as ρ_{i}^{2} and then performing a gradient descent step w.r.t. the new ρ_{i} parameters. We apply this method to the Winnow update, the Hedge update, and the unnormalized and normalized exponentiated gradient (EG) updates for linear regression. When the original weights w_{i} are scaled to sum to one (as done for Hedge and normalized EG), then in the corresponding reparameterized update, the ρ_{i} parameters are divided by ||ρ||_{2} after the gradient descent step. We show that these reparameterizations closely track the original multiplicative updates by proving in each case the same online regret bounds (albeit in some cases with slightly different constants). As a side result, our work exhibits a simple two-layer linear neural network that, when trained with gradient descent, can solve a certain sparse linear problem (known as the Hadamard problem) with exponentially fewer examples than any kernel method. This is joint work with Ehsan Amid.
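To make the reparameterization concrete, the following is a minimal Python/NumPy sketch of one GD step on the ρ_{i} parameters with w_{i} = ρ_{i}^{2}, including the division by ||ρ||_{2} used for the normalized variants (Hedge / normalized EG). The single-example squared loss, the function name, and the hyperparameters are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def reparam_gd_step(rho, x, y, lr, normalize=False):
    # One gradient descent step in the rho parameters, where the
    # non-negative weights are w_i = rho_i**2 (illustrative sketch;
    # single example (x, y) with squared loss is an assumption).
    w = rho ** 2                        # recover the non-negative weights
    residual = w @ x - y                # prediction error for this example
    grad_w = 2.0 * residual * x         # dL/dw for L(w) = (w . x - y)^2
    grad_rho = 2.0 * rho * grad_w       # chain rule: dL/drho_i = 2 rho_i * dL/dw_i
    rho = rho - lr * grad_rho           # plain gradient descent on rho
    if normalize:
        # Normalized variant (Hedge / normalized EG): divide by ||rho||_2
        # after the step so that the weights w_i = rho_i**2 sum to one.
        rho = rho / np.linalg.norm(rho)
    return rho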

Bio

Manfred Warmuth was a Professor of Computer Science at UC Santa Cruz for over 35 years. His main research area is machine learning. He is well known for analyzing algorithms that are online in the sense that they continuously adapt to a changing data stream. He developed a technique for motivating and analyzing online algorithms using Bregman divergences and devised (with co-authors) key algorithms such as Weighted Majority, Exponentiated Gradient, and Last Step Minimax. He also introduced the matching loss and compression schemes, and posed a list of core open problems in machine learning. He is currently a visiting faculty member at Google Brain in Mountain View.

Event Type: 
Event