-----------------------------------------------------------------------
Thanasis Kottas
Applied Math & Statistics
University of California, Santa Cruz
-----------------------------------------------------------------------

************************************************************************
R program to fit the location normal Dirichlet process (DP) mixture
model:

   y_i | theta_i,phi  ind.  Normal(y_i|theta_i,phi),  i=1,...,n
   theta_i | G        ind.  G,                        i=1,...,n
   G | alpha,mu,tau2   ~    DP(alpha,G_0),  G_0 = Normal(mu,tau2)
   phi                 ~    IGamma(a.phi,b.phi)

hyperpriors:

   alpha ~ Gamma(a.alpha,b.alpha)
   mu    ~ Normal(a.mu,b.mu)
   tau2  ~ IGamma(a.tau2,b.tau2)
************************************************************************

The file "mcmc.r" includes the R code. After reading it into R, with
source("mcmc.r"), the Gibbs sampler can be run by calling the function

   mcmc(IB,B,y,a.alpha,b.alpha,a.mu,b.mu,a.tau2,b.tau2,a.phi,b.phi)

where:

--> IB is the burn-in period for the Gibbs sampler (say, IB=5000)

--> B is the posterior sample size after burn-in (say, B=10000)
    (note that, for this model, a burn-in period as small as, say,
    1000 iterations should be sufficient for convergence; also, B
    between 2000 and 5000 yields posterior inference very similar to
    that obtained with larger posterior sample sizes)

--> y is the data vector (say, y <- scan("sim_data.txt")) for the data
    set provided in the file "sim_data.txt", which includes simulated
    data (250 observations) from a mixture of three normals,
    0.2 N(-5,1) + 0.5 N(0,1) + 0.3 N(3.5,1)

--> a.alpha,b.alpha,a.mu,b.mu,a.tau2,b.tau2 are the parameters of the
    hyperpriors, and a.phi,b.phi are the prior parameters for phi

For example, the following prior specification can be used for the
simulated data:

   a.alpha = 2.0, b.alpha = 1.8  (which yields, approximately,
                                  E(n^{*}) = 6)
   a.mu   = 0.0,  b.mu   = 4.0
   a.tau2 = 2.0,  b.tau2 = 4.0
   a.phi  = 2.0,  b.phi  = 4.0

(These choices yield infinite prior variance for tau2 and phi, and
marginal prior moments for y that correspond to rough prior guesses of 0
and 14 for the "center" and the "range" of the data, respectively,
based on the limiting case of the DP mixture model with one cluster,
i.e., for alpha --> 0. Note that under the model with one cluster,
marginally, we have E(y) = a.mu and Var(y) = b.phi + b.tau2 + b.mu.
Therefore, a.mu = 0, and the choice b.phi = b.tau2 = b.mu = 4 arises
by setting b.phi = b.tau2 = b.mu and estimating Var(y) by
(range/4)^2.)

Sensitivity of posterior predictive inference to the prior
specification can be studied by considering other prior choices.

The starting values for the Gibbs sampler are specified in the "mcmc"
function of the R program. Their choice is not that important for this
model, but they can be changed as needed.

The output from the R code includes 8 files. Files "alpha.txt",
"mu.txt", "tau2.txt", "phi.txt", and "n_star.txt" include the
posterior samples for the hyperparameters and for the number of
clusters. File "post_pred_theta.txt" includes the sample from the
posterior predictive distribution, p(theta_0|data), for a ``new''
theta_0 associated with a ``new'' observation y_0. For instance, a
histogram of this sample indicates the clustering in the imputed means
theta_i. Finally, files "prior_pred_y.txt" and "post_pred_y.txt"
include samples from the prior, p(y_0), and posterior, p(y_0|data),
predictive distributions, respectively, for the ``new'' observation
y_0. (Recall that the prior predictive distribution for y_0 is defined
as the average of the Normal(y_0|theta,phi) distribution over the
prior for phi and theta, where theta|mu,tau2 ~ G_0 = Normal(mu,tau2),
with the specified priors for mu and tau2.) For instance, a density
plot based on the sample in "post_pred_y.txt" provides the Bayesian
density estimate arising from the DP mixture model. The corresponding
plot based on the sample in "prior_pred_y.txt" indicates the amount of
prior-to-posterior learning based on the data.
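The claim above that a.alpha = 2.0, b.alpha = 1.8 gives, approximately,
E(n^{*}) = 6 can be checked numerically with the standard DP identity
E(n^{*}|alpha) = sum_{i=1}^{n} alpha/(alpha+i-1), averaged over the
Gamma prior for alpha. A minimal standalone sketch (in Python for
illustration; this is not part of "mcmc.r"):

```python
# Monte Carlo check that alpha ~ Gamma(shape a.alpha=2, rate b.alpha=1.8)
# gives E(n*) close to 6 for n = 250 observations, using the DP identity
# E(n* | alpha) = sum_{i=1}^{n} alpha / (alpha + i - 1).
import numpy as np

rng = np.random.default_rng(0)
n = 250

# Gamma(shape=2, rate=1.8) corresponds to scale = 1/1.8 in NumPy
alpha = rng.gamma(shape=2.0, scale=1.0 / 1.8, size=20000)

i = np.arange(1, n + 1)
# E(n* | alpha) for each prior draw of alpha, then average over the prior
e_nstar = (alpha[:, None] / (alpha[:, None] + i - 1)).sum(axis=1).mean()
print(round(float(e_nstar), 2))
```

The printed value should be close to 6, consistent with the prior
specification suggested above.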
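The prior predictive p(y_0) described above can also be simulated
directly, by averaging Normal(y_0|theta,phi) over the priors. The
following standalone sketch (in Python for illustration; "mcmc.r"
itself produces this sample in "prior_pred_y.txt") uses the example
hyperparameter values a.mu = 0, b.mu = b.tau2 = b.phi = 4,
a.tau2 = a.phi = 2:

```python
# Sketch of drawing from the prior predictive p(y_0): draw mu, tau2, phi
# from their priors, then theta | mu,tau2 ~ G_0 = Normal(mu,tau2), and
# finally y_0 | theta,phi ~ Normal(theta,phi). Here b.mu, tau2, and phi
# are variances. IGamma(a,b) draws are obtained as b / Gamma(a, scale=1).
import numpy as np

rng = np.random.default_rng(1)
B = 50000

mu = rng.normal(0.0, np.sqrt(4.0), size=B)      # mu ~ Normal(a.mu=0, b.mu=4)
tau2 = 4.0 / rng.gamma(2.0, 1.0, size=B)        # tau2 ~ IGamma(2, 4)
phi = 4.0 / rng.gamma(2.0, 1.0, size=B)         # phi ~ IGamma(2, 4)
theta = rng.normal(mu, np.sqrt(tau2))           # theta | mu,tau2 ~ G_0
y0 = rng.normal(theta, np.sqrt(phi))            # y_0 | theta,phi

# Under the one-cluster limit, E(y) = a.mu = 0 and
# Var(y) = b.phi + b.tau2 + b.mu = 12, matching (range/4)^2 = (14/4)^2.
print(round(float(y0.mean()), 2))
```

The sample mean should be near the prior guess of 0 for the center of
the data; a density plot of y0 mimics the plot based on
"prior_pred_y.txt".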
For instance, for the simulated data:

> y <- scan("sim_data.txt")
> y.0.post <- scan("post_pred_y.txt")
> y.0.prior <- scan("prior_pred_y.txt")
> hist(y,probability=T,xlim=c(-8,7),ylim=c(0,0.2),main="",xlab="",ylab="")
> par(new=T)
> plot(density(y.0.prior),xlim=c(-8,7),ylim=c(0,0.2),main="",ylab="",type="l",lty=5,
+ xlab="p(y_0) dashed line, p(y_0|data) solid line, overlaid on data histogram")
> par(new=T)
> plot(density(y.0.post),xlim=c(-8,7),ylim=c(0,0.2),main="",xlab="",ylab="",type="l",lty=1)

********* NOTE *********
The R program for the simulated data executes at a rate of about 100
iterations per minute (on a SunBlade 1000 workstation). The computing
time increases with the data sample size. Substantial gains in
computing speed can be obtained by using C or Fortran instead.