Abstract:

P-values are used in scientific studies to determine whether a null hypothesis, formulated before the study is conducted, should be rejected. In exploratory studies, p-values help flag statistically noteworthy findings. An understanding of p-values is essential for evaluating scientific articles. In the first part of my talk, I cover the definition, calculation, interpretation, and criticism of p-values.



In the second part of my talk, I present how my research interests have expanded to massive data. Spline smoothing is a promising approach for extracting information and identifying subtle patterns from noisy data. One of the limitations of smoothing splines is their heavy computational cost. To alleviate the storage and computational burden, my doctoral research developed two approaches for fitting smoothing splines to large datasets: a divide and recombine (D&R) method and a low-rank approximation method via the eigensystem. Extensive simulations show that both approaches are scalable and perform comparably to the method that uses the whole dataset. We provide approximation error bounds for the low-rank approach in both the univariate and multivariate cases. As an extension of developing statistical methods for big data, my postdoctoral research involves predicting the functional effects of noncoding regions in the human genome and quantifying phenotype risk from electronic health records (EHR). We describe a co-localization approach that aims to identify constrained sequences that co-localize with tissue/cell-type-specific regulatory regions. For phenotype derivation, we propose unsupervised methods and apply them to phenotypic and genomic data on approximately 100,000 individuals in the eMERGE network, focusing on several complex diseases, including chronic kidney disease and coronary artery disease, among others. We show that the proposed methods can be helpful in patient risk stratification and can help identify undiagnosed cases based on phenotypic features available in the EHR.

Bio:

Dr. Danqing Xu is a Postdoctoral Research Scientist in the Department of Biostatistics at Columbia University. Before joining Columbia University, Danqing received her Ph.D. in Statistics from the University of California, Santa Barbara (UCSB) in June 2018, supervised by Yuedong Wang. Danqing’s current research interests lie at the intersection of statistical learning, statistics, and genetics. The application areas of her research include genetic and genomic data analysis (e.g., predicting the functional effects of noncoding variants), integrative analysis of genetic and electronic health records data, and biomedical data analysis. Her goal as a scientist is to develop and promote robust statistical learning methods to advance the science of public health and medicine.