One-Day Short Course on Bayesian Modeling, Inference and Prediction

Presenter: David Draper
           Department of Applied Mathematics and Statistics
           University of California, Santa Cruz

Fri 10 Dec 2004, 8am-5.30pm (7 hours of material covered in a 9.5-hour time slot, with a 1-hour check-in and coffee period first thing, 15-20 minutes of breaks in each of the morning and afternoon sessions, and a one-hour break for lunch)

Location: Hotel@MIT
          20 Sidney Street, Cambridge, Massachusetts USA 02139
          Telephone: 617.577.0200
          www.hotelatmit.com/home/home.html

Sponsored by the Boston Chapter of the American Statistical Association (ASA)

Initial registration deadline: --> Tuesday 23 Nov 2004 (please see below for details)

Summary of Short Course Contents

This is an award-winning short course on Bayesian modeling, inference and prediction, based on a series of case studies and assuming no previous exposure to Bayesian ideas or methods. Topics will include: a review of classical, frequentist, and Bayesian definitions of probability; sequential learning via Bayes' Theorem; coherence as a form of internal calibration; Bayesian decision theory via maximization of expected utility; a review of frequentist modeling and maximum-likelihood inference; exchangeability as a Bayesian concept parallel to frequentist independence; prior, posterior, and predictive distributions; Bayesian conjugate analysis of binary outcomes, and comparison with frequentist modeling; integer-valued outcomes (Poisson modeling); continuous outcomes (Gaussian modeling); multivariate unknowns and marginal posterior distributions; an introduction to simulation-based computation, including rejection sampling and Markov chain Monte Carlo (MCMC) methods; MCMC implementation strategies; an introduction to Bayesian hierarchical modeling; fitting and interpreting fixed- and random-effects Poisson regression models; hierarchical modeling with latent variables as an approach to mixture modeling; and Bayesian model specification via out-of-sample predictive validation (as a form of external calibration) and the deviance information criterion (DIC).

The case studies will be drawn from medicine (diagnostic screening for HIV; hospital-specific prediction of patient-level mortality rates; hospital length of stay for premature births; a randomized controlled trial of in-home geriatric assessment) and the physical sciences (measurement of physical constants), but the methods illustrated apply to a broad range of subject areas in the natural and social sciences, business (including topics of direct relevance to pharmaceutical companies), and public policy. The course will liberally illustrate user-friendly implementations of MCMC sampling via the freeware program WinBUGS.

The course is intended mainly for people who often use statistics in their research; some graduate coursework in statistics will provide sufficient mathematical background for participants. To get the most out of the course, participants should be comfortable hearing the presenter mention (at least briefly) (a) differentiation and integration of functions of several variables and (b) discrete and continuous probability distributions (joint, marginal, and conditional) for several variables at a time, but all necessary concepts will be approached in a sufficiently intuitive manner that rustiness on these topics will not prevent understanding of the key ideas.

Registration Fee:

  $ 95 for full-time students
  $145 for non-student members of the Boston ASA chapter
  $195 for all other participants

(The registration fee includes extensive materials [see below], lunch, and refreshments for the AM and PM breaks. As a point of reference, the LearnSTAT program run by the national office of the American Statistical Association charges $500 for ASA members and $600 for non-members for one-day courses like this one, with no special fee for students.)
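As a small taste of the opening session's material on sequential learning via Bayes' Theorem, here is a minimal Python sketch of posterior updating in diagnostic screening. The prevalence, sensitivity, and specificity values below are hypothetical illustration numbers, not figures from the course's HIV case study.

```python
# Hypothetical illustration of Bayes' Theorem for diagnostic screening;
# all numerical inputs are made up for this sketch.

def posterior_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' Theorem."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1.0 - specificity
    numerator = p_pos_given_disease * prevalence
    denominator = numerator + p_pos_given_healthy * (1.0 - prevalence)
    return numerator / denominator

# Even a highly accurate test yields a modest posterior probability
# when the condition is rare in the screened population:
print(posterior_positive(prevalence=0.01, sensitivity=0.99, specificity=0.98))
```

The point the sketch makes is the one the case study dramatizes: with a rare condition, most positive screens come from the large healthy group, so the posterior probability of disease given a positive test can be surprisingly low.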
Participants will be provided with 225-250 pages of materials (essentially, a copy of the draft book the short-course presenter is writing on this topic), including detailed computer sessions with (a) a leading freeware statistical computing package (R); (b) one of the two most widely used symbolic computing packages (Maple); and (c) WinBUGS.

Initial registration deadline: Tuesday 23 Nov 2004

If not enough people have paid their registration fees by this date, the course may need to be postponed (in that case, people who have registered by 23 Nov will have their checks returned by mail). Assuming that the course does go ahead, as is highly likely, registration will remain open until the day of the short course as long as there is still room for additional participants, but --> to ensure yourself a place, please register early.

Registration: Please send a check payable to "Boston Chapter of the ASA" or "BCASA" to

  BCASA, c/o Michael Posner, Treasurer
  313 Summit Ave #3, Brighton, MA 02135

Please include your name, address, phone number, and e-mail address with your check.

Additional information may be found online at www.amstat.org/chapters/boston/
The web page for the short course is www.ams.ucsc.edu/~draper/Boston2004.html

Brief Biography of Instructor

David Draper is a Professor in, and Chair of, the Department of Applied Mathematics and Statistics in the Baskin School of Engineering at the University of California, Santa Cruz. From 2001 to 2003 he served as President-Elect, President, and Past President of the International Society for Bayesian Analysis (ISBA). His research is in the areas of Bayesian inference and prediction, model uncertainty and empirical model-building, hierarchical modeling, Markov chain Monte Carlo methods, and Bayesian semi-parametric methods, with applications mainly in health policy, education, and environmental risk assessment.
When he gave an earlier version of this short course at the Anaheim Joint Statistical Meetings (JSM) in 1997, it received the 1998 ASA Excellence in Continuing Education award, and a short course he gave on intermediate- and advanced-level topics in Bayesian hierarchical modeling at the San Francisco JSM in 2003 received the 2004 ASA Excellence in Continuing Education award. He has won or been nominated for major teaching awards everywhere he has taught (the University of Chicago; the RAND Graduate School of Public Policy Studies; the University of California, Los Angeles; the University of Bath (UK); and the University of California, Santa Cruz). He has a particular interest in the exposition of complex statistical methods and ideas in the context of real-world applications.

Approximate Structure of the Short Course

8.00-9.00am: Check-in and coffee

9.00-9.30am: Quantification of uncertainty. Classical, frequentist, and Bayesian definitions of probability. Subjectivity and objectivity. Sequential learning; Bayes' Theorem. Inference (science) and decision-making (policy and business). Bayesian decision theory; coherence. Maximization of expected utility. Case study: diagnostic screening for HIV.

9.30-11.00am: Exchangeability and conjugate modeling. Probability as quantification of uncertainty about observables. Binary outcomes. Review of frequentist modeling and maximum-likelihood inference. Exchangeability as a Bayesian concept parallel to frequentist independence. Prior, posterior, and predictive distributions. Inference and prediction. Coherence and calibration. Conjugate analysis. Comparison with frequentist modeling. Case study: hospital-specific prediction of patient-level mortality rates.

11.00-11.15am: Coffee break

11.15am-noon: Integer-valued outcomes; Poisson modeling. Case study: hospital length of stay for premature births.

noon-12.30pm: Continuous outcomes; Gaussian modeling. Multivariate unknowns; marginal posterior distributions.
Case study: measurement of physical constants (NB10).

12.30-1.30pm: Lunch break

1.30-3.30pm: Simulation-based computation. IID sampling; rejection sampling. Introduction to Markov chain Monte Carlo (MCMC) methods: the Metropolis-Hastings algorithm and Gibbs sampling. User-friendly implementation of Gibbs and Metropolis-Hastings sampling via BUGS and WinBUGS. MCMC implementation strategies. Case study: the NB10 data revisited.

3.30-3.45pm: Coffee break

3.45-4.40pm: Hierarchical models: formulation, selection, and diagnostics. Poisson fixed-effects modeling. Additive and multiplicative treatment effects. Expansion of a simple model that does not satisfy all diagnostic checks, by embedding it in a richer class of models of which it is a special case. Random-effects Poisson regression: hierarchical modeling with latent variables as an approach to mixture modeling. Case study: a randomized controlled trial of in-home geriatric assessment (IHGA).

4.40-4.45pm: Get-up-and-move-around break

4.45-5.30pm: Bayesian model specification. Predictive diagnostics. Model selection as a decision problem. Bayesian cross-validation as an approach to diagnostics: comparing outcomes from omitted cases with their predictive distributions given the rest of the data. 3CV: 3-way cross-validation. The log score as a model-selection method, and its relationship to the deviance information criterion (DIC). Case study: continuation of the IHGA example.

(Yes, this looks like a lot to cover in a single day :-), but I've given this course a number of times to more than 600 participants, and almost everybody seems happy with how things go.)
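For readers curious about the afternoon's simulation-based material, the random-walk Metropolis algorithm mentioned above can be sketched in a few lines of Python. The standard normal target, step size, and chain length here are hypothetical illustration choices, not values from the course materials; the course itself demonstrates MCMC via BUGS and WinBUGS.

```python
import math
import random

# A minimal random-walk Metropolis sketch, assuming a standard normal
# target; all tuning constants are hypothetical illustration values.

def log_target(x):
    """Log density of the (assumed) standard normal target, up to a constant."""
    return -0.5 * x * x

def metropolis(n_iter, step=1.0, seed=1):
    """Return n_iter draws from a random-walk Metropolis chain started at 0."""
    rng = random.Random(seed)
    x, draws = 0.0, []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < min(1.0, math.exp(log_target(proposal) - log_target(x))):
            x = proposal
        draws.append(x)
    return draws

draws = metropolis(20000)
kept = draws[5000:]  # discard burn-in
mean = sum(kept) / len(kept)
var = sum((d - mean) ** 2 for d in kept) / len(kept)
print(mean, var)  # should be near the target's mean 0 and variance 1
```

The design choice the course emphasizes is that tools such as WinBUGS hide this machinery behind a model-specification language; the hand-rolled loop above just makes the accept/reject mechanics visible.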