##### R code for Bayesian analysis in the length-of-stay (LOS) case study

##### the outcome variable y = length of hospital stay (rounded to
##### the nearest integer) for a random sample of n = 14 women 
##### who came to a hospital in Santa Monica, CA, in 1988 to
##### give birth to premature babies (0 is a possible outcome of
##### the woman left the hospital, presumably without delivering,
##### after 4 hours or less)

y.los <- c( 1, 2, 1, 1, 1, 2, 2, 4, 3, 6, 2, 1, 3, 0 )

table( y.los )

# 0 1 2 3 4 6 
# 1 5 4 2 1 1 

stem( y.los, scale = 2 )

# The decimal point is at the |

#   0 | 0
#   1 | 00000
#   2 | 0000
#   3 | 00
#   4 | 0
#   5 | 
#   6 | 0

mean( y.los )

# 2.071429

sd( y.los )

# 1.54244

var( y.los )

# 2.37912

var( y.los ) / mean( y.los )

# 1.148541

##### y.los is a non-negative integer-valued quantity whose
##### variance-to-mean ratio is around 1; this suggest a
##### Poisson model for the sampling distribution as a first model

##### recall that the MLE of the population mean lambda
##### in the Poisson model is the sample mean

lambda.hat <- mean( y.los )

dpois( 0:7, lambda.hat )

# 0.126005645 0.261011693 0.270333539 0.186658872 0.096662630 
# 0.040045947 0.013825386 0.004091186

print( n.los <- length( y.los ) )

# 14

table( y.los ) / n.los

#          0          1          2          3          4          6 
# 0.07142857 0.35714286 0.28571429 0.14285714 0.07142857 0.07142857 

cbind( c( dpois( 0:6, lambda.hat ), 
  1 - sum( dpois( 0:6, lambda.hat ) ) ),
  apply( outer( y.los, 0:7, '==' ), 2, sum ) / n.los )

# 0.126005645 0.07142857
# 0.261011693 0.35714286
# 0.270333539 0.28571429
# 0.186658872 0.14285714
# 0.096662630 0.07142857
# 0.040045947 0.00000000
# 0.013825386 0.07142857
# 0.005456286 0.00000000

##### the Poisson seems to be a reasonably good fit to the data

##### with an IID sample of size n from the Poisson distribution,
##### the likelihood function is of the form 
##### c * lambda^s * exp( - n * lambda ), where s is the sum of the
##### y values

##### if there were a conjugate prior for this likelihood, it would
##### also have to be of the form c * lambda^( non-negative power ) * 
##### exp[ - ( non-negative number ) * lambda ] 

##### fortunately, there is such a distribution; it's is the Gamma family,
##### which is usually written 

#####   c * lambda^( alpha - 1 ) * exp( - beta * lambda )

##### for positive alpha and beta

##### let's explore this family a bit as a function of alpha and beta

help( dgamma )

##### The Gamma Distribution

##### Description

##### Density, distribution function, quantile function and 
##### random generation for the Gamma distribution with parameters shape and scale.

##### Usage

##### dgamma(x, shape, rate = 1, scale = 1/rate, log = FALSE)
##### pgamma(q, shape, rate = 1, scale = 1/rate, lower.tail = TRUE, log.p = FALSE)
##### qgamma(p, shape, rate = 1, scale = 1/rate, lower.tail = TRUE, log.p = FALSE)
##### rgamma(n, shape, rate = 1, scale = 1/rate)

##### Arguments

##### x, q	
##### vector of quantiles.

##### p	
##### vector of probabilities.

##### n	
##### number of observations. If length(n) > 1, the length is taken to be 
##### the number required.

##### rate	
##### an alternative way to specify the scale.

##### shape, scale	
##### shape and scale parameters. Must be positive, scale strictly.

##### log, log.p	
##### logical; if TRUE, probabilities/densities p are returned as log(p).

##### lower.tail	
##### logical; if TRUE (default), probabilities are P[X = x], otherwise, P[X > x].

##### Details

##### If scale is omitted, it assumes the default value of 1.

##### The Gamma distribution with parameters shape = a and scale = s has density

##### f( x ) = 1 / ( s^a Gamma( a ) ) x^( a - 1 ) e^-( x / s )

##### unfortunately this is not the same way I wrote the gamma density 
##### in the lecture notes: I used the alternative parameterization

##### f( x ) = 1 / ( s^alpha Gamma( alpha ) ) x^( alpha - 1 ) e^-( beta * x )

##### so the relationship between the short course notation and the notation
##### in R is

#####   ( alpha, beta ) = ( a, 1 / s )

##### but notice that the people who wrote R have anticipated this:
##### they've made the definitions

#####   s = scale, rate = 1 / scale

##### so, for example, when we want what in the short course is called the
##### Gamma( alpha = 1, beta = 2 ) distribution, we can get it in R with

#####   dgamma( lambda.grid, shape = 1, rate = 2 )

lambda.grid <- seq( 0, 14, length = 500 )

plot( lambda.grid, dgamma( lambda.grid, shape = 1, rate = 1 ),
  type = 'l', xlab = 'lambda', ylab = 'Density' )

lines( lambda.grid, dgamma( lambda.grid, shape = 2, rate = 1 ), lty = 2 )

lines( lambda.grid, dgamma( lambda.grid, shape = 3, rate = 1 ), lty = 3 )

lines( lambda.grid, dgamma( lambda.grid, shape = 6, rate = 1 ), lty = 4 )

text( 3, 0.8, '( alpha, beta ) = ( 1, 1 )' )

text( 4.25, 0.35, '( alpha, beta ) = ( 2, 1 )' )

text( 6, 0.225, '( alpha, beta ) = ( 3, 1 )' )

text( 11.5, 0.1, '( alpha, beta ) = ( 6, 1 )' )

##### alpha controls the shape of the gamma distribution

lambda.grid <- seq( 0, 7, length = 500 )

plot( lambda.grid, dgamma( lambda.grid, shape = 2, rate = 1 ),
  type = 'l', xlab = 'lambda', ylab = 'Density',
  ylim = c( 0, 1.1 ) )

lines( lambda.grid, dgamma( lambda.grid, shape = 2, rate = 2 ), lty = 2 )

lines( lambda.grid, dgamma( lambda.grid, shape = 2, rate = 3 ), lty = 3 )

text( 4.5, 0.2, '( alpha, beta ) = ( 2, 1 )' )

text( 2.5, 0.5, '( alpha, beta ) = ( 2, 2 )' )

text( 2, 0.9, '( alpha, beta ) = ( 2, 3 )' )

##### 1 / beta controls the scale of the gamma distribution

##### prior-likelihood-posterior plot with the LOS data
##### and the gamma( epsilon, epsilon ) prior with epsilon = 0.001

lambda.grid <- seq( 0.001, 4, length = 500 )

##### likelihood first

plot( lambda.grid, dgamma( lambda.grid, shape = 30, rate = 14 ),
  type = 'l', xlab = 'lambda', ylab = 'Density', col = 'blue' )

##### posterior next

lines( lambda.grid, dgamma( lambda.grid, shape = 29.001, rate = 14.001 ),
  lty = 2, col = 'black' )

##### prior last

lines( lambda.grid, dgamma( lambda.grid, shape = 0.001, rate = 0.001 ),
  lty = 3, col = 'red' )

text( 0.5, 0.1, 'prior', col = 'red' )

text( 1.25, 0.9, 'posterior', col = 'black' )

text( 2.75, 0.8, 'likelihood', col = 'blue' )

##### prior-likelihood-posterior plot with the LOS data
##### and the gamma( epsilon * y.bar, epsilon ) prior,
##### epsilon = 0.001

lambda.grid <- seq( 0.001, 4, length = 500 )

##### likelihood first

plot( lambda.grid, dgamma( lambda.grid, shape = 30, rate = 14 ),
  type = 'l', xlab = 'lambda', ylab = 'Density', col = 'blue' )

##### posterior next

epsilon <- 0.001

y.bar <- 29 / 14

lines( lambda.grid, dgamma( lambda.grid, 
  shape = 29 + epsilon * y.bar, rate = 14 + epsilon ),
  lty = 2, col = 'black' )

##### prior last

lines( lambda.grid, dgamma( lambda.grid, 
  shape = epsilon * y.bar, rate = epsilon ), lty = 3, col = 'red' )

text( 0.5, 0.1, 'prior', col = 'red' )

text( 1.25, 0.9, 'posterior', col = 'black' )

text( 2.75, 0.8, 'likelihood', col = 'blue' )

##### virtually identical to the results from the previous prior