Advancement: Bayesian Structured Variable Selection for High Dimensional Data

Speaker Name: 
Laura Baracaldo
Speaker Title: 
PhD Student
Speaker Organization: 
Applied Mathematics & Statistics
Start Time: 
Wednesday, June 13, 2018 - 9:00am
End Time: 
Wednesday, June 13, 2018 - 11:00am
Location: 
Engineering 2, Room 553
Rajarshi Guhaniyogi

Abstract: With recent technological progress, big data have become ubiquitous in many areas, including biology, genetics, medicine, finance, social science, and environmental science. To extract useful information from such data and build an interpretable model with high predictive power, variable selection needs to be employed. Most of the existing literature has focused exclusively on the selection of main effects; however, main effects alone may not be sufficient to characterize the relationship between the response and the covariates in complex situations. Predictors may act in a more complex way, for example by working together to explain the response. In these cases, including interactions between predictors as covariates may help explain the response surface more accurately. The motivation for our work arises from the need to develop a Bayesian, model-based algorithm capable of capturing the hierarchical structure generated by the interactions between predictors in the context of high-dimensional data. We propose a new sequential spike-and-slab prior distribution on the model coefficients that ensures the strong heredity condition, under which an interaction can enter the model only if both of its parent main effects are included. We then present simulation studies based on different levels of sparsity of the design matrix and different covariance structures for the predictors, and compare the results with those of other state-of-the-art methods. Finally, we illustrate the method using a data set on the longitude of the location of origin of musical tracks, predicted from continuous audio features.
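
As a rough illustration of the strong heredity condition mentioned in the abstract, the following Python sketch draws inclusion indicators so that an interaction can only be switched on when both of its parent main effects are included, then draws coefficients from a generic spike-and-slab form. This is not the speaker's sequential prior, whose details are not given here; the function names, the Bernoulli inclusion probabilities `pi_main` and `pi_inter`, and the Gaussian slab with standard deviation `slab_sd` are all assumptions chosen for illustration.

```python
import numpy as np


def sample_strong_heredity_indicators(p, pi_main, pi_inter, rng):
    """Draw inclusion indicators for p main effects and their pairwise interactions.

    Assumption (illustrative only): independent Bernoulli(pi_main) indicators
    for main effects, and Bernoulli(pi_inter) for an interaction (j, k) only
    when both main effects j and k are included.
    """
    gamma_main = rng.random(p) < pi_main
    gamma_inter = np.zeros((p, p), dtype=bool)  # only the upper triangle is used
    for j in range(p):
        for k in range(j + 1, p):
            if gamma_main[j] and gamma_main[k]:
                gamma_inter[j, k] = rng.random() < pi_inter
    return gamma_main, gamma_inter


def sample_coefficients(gamma_main, gamma_inter, slab_sd, rng):
    """Spike at exactly zero for excluded terms, Gaussian slab for included ones."""
    beta_main = np.where(
        gamma_main, rng.normal(0.0, slab_sd, gamma_main.shape), 0.0
    )
    beta_inter = np.where(
        gamma_inter, rng.normal(0.0, slab_sd, gamma_inter.shape), 0.0
    )
    return beta_main, beta_inter


def satisfies_strong_heredity(beta_main, beta_inter):
    """True if every nonzero interaction has both parent main effects nonzero."""
    j, k = np.nonzero(beta_inter)
    return bool(np.all((beta_main[j] != 0) & (beta_main[k] != 0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g_main, g_inter = sample_strong_heredity_indicators(10, 0.5, 0.5, rng)
    b_main, b_inter = sample_coefficients(g_main, g_inter, 1.0, rng)
    print("main effects included:", int(g_main.sum()))
    print("interactions included:", int(g_inter.sum()))
    print("strong heredity holds:", satisfies_strong_heredity(b_main, b_inter))
```

Because interaction indicators are drawn only after their parents are known to be included, every draw from this sketch satisfies strong heredity by construction; a full method would also need a likelihood and a posterior sampler, which are outside the scope of this example.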