Defense: Optimization of Item Selection with Prediction Uncertainty

Speaker Name: 
Liang Dai
Speaker Title: 
Ph.D. Candidate
Start Time: 
Thursday, December 12, 2019 - 10:00am
Engineering 2, Room 280

Selecting items from a candidate pool to maximize the total return is a classical problem that people face frequently in real life, especially in the IT industry. For example, web UI designers try to find the best web design among many options to display to users. Sometimes the selection needs to be personalized, e.g., Google needs to select the ads each user is most interested in, given that user's historical online behavior. Whether personalized or not, this selection problem becomes a big challenge when the true value of each item is unknown and can only be estimated from observed historical data.

There is a large volume of significant research on building prediction models that are trained on historical data to estimate item values. Given data volume and computation resource restrictions, people can choose different models, e.g., deep neural networks, gradient boosted trees, or logistic regression. We will not dive deeply into this area here. Instead, our focus is how to maximize the total return given these predictions, especially by taking the prediction uncertainties into account during value optimization.

In this dissertation, two kinds of item selection scenarios are discussed. The first is inexplorable item selection: we do not have the opportunity to obtain interactive feedback after selecting items, so no exploration is possible. The second is explorable item selection, which gives us opportunities to explore during the selection process.
For the first scenario, the candidate pool is usually too large for exploration. In fact, even estimating each item's value through a complex prediction model is sometimes nearly impossible because of the need for real-time responses. For example, Apple needs to estimate which apps users are interested in and recommend them when users visit the App Store, and Google needs to select ads to display given a user's search queries. Millions of candidates need to be scored by prediction models, and it is very challenging to support model prediction at such a scale under low-latency constraints. Moreover, to achieve good prediction accuracy, the models used in industry are growing more and more complex; e.g., the number of hidden neurons and layers in deep neural networks used in real applications is increasing rapidly. All of this makes it infeasible to evaluate every candidate with one single complex model in a large-scale application. To solve this problem, people usually adopt a cascading waterfall filtering method: instead of using one complex model to estimate the values of all candidates and then sorting and filtering them, multiple waterfalls filter out candidates sequentially. For example, a simple model in the first waterfall estimates the candidates' values, which are sorted and filtered to obtain a small subset; the survivors are then passed to the second waterfall to be scored by a more complex model. Intuitively, this waterfall filtering provides a good trade-off between infrastructure cost and prediction accuracy: it saves large amounts of computation resources while still selecting the most promising items accurately. However, there is no systematic study of how to efficiently choose the number of waterfalls and how many items to keep in each one. People usually tune these settings heuristically through personal experience or online experiments. We discuss these problems and propose solutions under different constraints in this dissertation.
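The cascading idea can be illustrated with a minimal sketch. Here `cheap_score` and `precise_score` are hypothetical stand-ins for a lightweight model (e.g., logistic regression) and a complex model (e.g., a deep network); the candidate pool, feature values, and stage sizes are all made up for illustration.

```python
import random

random.seed(0)
# A large synthetic candidate pool; each item's true value is unknown
# and can only be approximated by a model score.
items = [{"id": i, "features": random.random()} for i in range(100_000)]

def cheap_score(item):
    # Fast, rough estimate: inexpensive enough to run on every candidate.
    return item["features"]

def precise_score(item):
    # Slower, more accurate estimate: affordable only on a small subset.
    return item["features"] + 0.01 * random.random()

def waterfall_select(candidates, stage_sizes):
    """Filter candidates through successive waterfalls of shrinking size."""
    survivors = candidates
    for scorer, k in zip([cheap_score, precise_score], stage_sizes):
        # Each waterfall scores the current survivors, sorts them,
        # and keeps only the top k for the next (more expensive) stage.
        survivors = sorted(survivors, key=scorer, reverse=True)[:k]
    return survivors

top = waterfall_select(items, stage_sizes=[10_000, 10])
print(len(top))  # 10
```

The stage sizes (100,000 → 10,000 → 10) are exactly the knobs the dissertation studies: how many waterfalls to use and how aggressively each one should filter, trading infrastructure cost against the risk of the cheap model discarding a truly high-value item.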

There are also scenarios in which we are able to explore to help us select items. A typical case is online experimentation, which is widely used to test and select items in real applications and provides interactive feedback for evaluating items. In an online experiment, we usually randomly segment users into several groups, show each group a different candidate, and then compare the overall performance of each candidate to find the item with the largest value. Among all designs, A/B testing, which usually segments users into two groups with equal probability to measure the difference between two versions of a single variable, is the most popular. For example, in order to compare the impact of one ad versus another (the counterfactual), we need to observe the effect of exposing a user to the first ad and not the second, and compare it with the converse situation. However, a user cannot both see the first ad and not see it. Consequently, we need to create two "statistically equivalent populations" and expose users randomly to one or the other. This method is straightforward, but its defect is also obvious: because it must measure both versions, it cannot expose all users to the best version, which leads to potential value loss. Some multi-armed bandit algorithms, which aim to maximize the total return during the experiment, have been proposed as improvements. However, these methods do not take into account the reliability of the final experimental result and its impact on the subsequent item selection. To solve this problem, we develop algorithms that balance the trade-off between two goals, reducing statistical uncertainty and maximizing cumulative reward, in order to maximize the total reward of item selection both during the experiment and in the post-experiment stage.
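The exploration/exploitation trade-off described above can be sketched with a simple epsilon-greedy bandit over two candidate items. This is only an illustrative simulation, not the dissertation's algorithm: the Bernoulli reward rates are invented, and a uniform A/B test would instead split traffic 50/50 for the whole experiment.

```python
import random

random.seed(1)
# True conversion rates of the two candidates. Unknown in practice;
# used here only to simulate user feedback.
TRUE_RATES = [0.05, 0.08]

def pull(arm):
    """Simulate one user's binary feedback (e.g., a click) for an arm."""
    return 1 if random.random() < TRUE_RATES[arm] else 0

def epsilon_greedy(rounds=10_000, eps=0.1):
    """Explore with probability eps; otherwise exploit the best arm so far."""
    pulls, wins, total = [0, 0], [0, 0], 0
    for _ in range(rounds):
        if random.random() < eps or 0 in pulls:
            arm = random.randrange(2)  # explore (or force an initial pull)
        else:
            # exploit: pick the arm with the highest empirical rate
            arm = max((0, 1), key=lambda a: wins[a] / pulls[a])
        reward = pull(arm)
        pulls[arm] += 1
        wins[arm] += reward
        total += reward
    return total, pulls

total_reward, pulls = epsilon_greedy()
print(total_reward, pulls)
```

Compared with a 50/50 A/B split, this allocation concentrates traffic on the apparently better arm, raising cumulative reward, but the under-sampled arm's estimate stays noisier. Balancing that residual statistical uncertainty against in-experiment reward is exactly the trade-off the proposed algorithms target.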

Advisor: 
Ram Akella
Graduate Program: 
Technology & Information Management PhD