Application and Analysis of Microarrays
BME 210, Winter 2004

Lab 3: Finding Archae genes responsive to stress


 

 

Objective

 

The goal of this lab is to find out which Pyrobaculum aerophilum (Pae) genes are significantly up- or down-regulated after exposure to stressfully cold conditions (stressfully cold that is, for a heat-adapted extremophile!).

 

I. Get the data

 

Download the 5 Pae cold-shock hybridization results (Pae-A007.gpr,  Pae-A026.gpr,  Pae-A027.gpr, Pae-A028.gpr,  Pae-A029.gpr) and the two reference versus reference hybridization results (Pae-A012.gpr and Pae-A013.gpr).  Download all 7 of the files into a new directory called “cold”.  Put the Pae-printA.gal and Pae-arrays.txt files into the cold directory as well.  Finally, make sure all of the GenePix result files end with a *.gpr extension.  If any file ends in something else (e.g. sometimes they’re saved with a *.gpr.txt extension by Internet Explorer), change the file names to end with .gpr.  Note that when you download the following scripts they should end in .R (not .R.txt for example).

 

Within R, change your working directory to the cold directory.  Run the load-pae.R script to load all of the GenePix result files in the cold directory into a single microarray object called pae.raw.  To run the script type source(“load-pae.R”) in R.  Check that you now have the object pae.raw loaded into your workspace.

 

II. Visualize and normalize

 

As a refresher, let’s make the standard log-ratio versus intensity scatterplots (“MA-plots”) for each of the 7 hybridizations.

 

a. Make a MA-plot of each hybridization result.  Color the genes with log-ratios in the top and bottom 5% a different color so you can see where they are on the MA-plot.  Try it on your own first and then, if you need help, download and run the maplot-pae.R script.

 

b. See if you can identify common trends in the data, indicating the need for normalization.  Normalize each array using print-tip loess.  Again, try to do this yourself first and then, if you need help, download and walk through the norm-pae.R script.  Make the same MA-plots you made in part a, but this time use the normalized data.

 

c. Again, make a MA-plot of the raw (non-normalized) log-ratios as in part a but this time color those genes whose normalized log-ratios are in the top or bottom 5%.  Explain what you see.

 

d. Make a MA-plot of the normalized log-ratios and color genes whose raw (non-normalized) log-ratios are in the top or bottom 5%.  Explain what you see.

 

e. Make a heat map of the entire cold-shock expression matrix (Hint: type ?image).  Missing data should appear as blanks in the matrix.

 

f. To make subsequent analysis easier, remove genes containing any missing data in any of the 7 hybridizations.   How many genes did you eliminate with this filter?  Make a heat map of the resulting data to verify that you eliminated the right genes (all of the blank entries should be gone now).

 

Here’s Josh’s script for making a heat-map, heatmap-pae.R, in case you need it.

 

III. Identify cold-shock responsive genes

 

It is now the moment of truth!  We will next identify genes that are candidates for involvement in the Archeal cold-shock response.  We want to find which genes have log-ratios consistently higher in the cold-shock hybridizations relative to the reference hybridizations.  For each gene, we’ve observed 5 log-ratios under cold-shock and 2 log-ratios under the control conditions.  We want to test whether the mean level in the 5 cold-shock conditions is different than the mean level in the 2 control conditions.

 

a. Use the t-test.  Follow along with Josh to perform a t-test that will identify genes up- and down-regulated with respect to the cold-shock condition.  We will be using the R function t.test() to do the computation.  Write each step of the procedure into a text file called ttest-cold.R using your favorite editor.  You will need this file later so make sure you save it.  Print out a subset of the normalized expression matrix that shows the normalized log-ratios of the top 10 and bottom 10 candidates across the 7 conditions.  Print it out in such a way so that the reference conditions are on the left and the cold-shock conditions are on the right.

 

If you need a working t-test routine, download Josh’s ttest-cold.R script.

 

My top 10 genes were: PAE1557 PAE3383 PAE.i277 EMPTY YBR244W PAE1279 PAE2028 PAE0604 PAE0991 PAE1199

 

My bottom 10 genes were: PAE.i317 PAE2282 EMPTY EMPTY PAE1695 PAE0524 PAE3003 PAE1293 PAE.i280 PAE2549

 

You plot of the expression profiles of these genes should look like my plot.

 

b. Use a more conservative (non-parametric) test.  The t-test assumes the log-ratios observed within each group are normally distributed.  This is often not the case with microarray data.  However, we still would like to know which genes have significantly higher (or lower) log-ratios under cold-shock relative to the control conditions even if the log-ratios are not normally distributed.  Non-parametric tests allow us to compute significance levels and confidence intervals without assuming a particular distribution.  The Wilcoxon Rank Sum test is the non-parametric version of the t-test.  It will give us a more conservative estimate of whether the mean log-ratio of the cold-shock measurements for a particular gene are more extreme relative to the gene’s values in the reference conditions.

 

Perform a Wilcoxin Rank Sum test (Hint: type ?wilcox.test for help).  Copy your ttest-cold.R file to another file called wilcox-cold.R.  Modify the wilcox-cold.R file to perform the Wilcoxin Rank-Sum test on the cold-shock dataset (Hint: you should have to change very little!).  Print out the top 10 scoring genes (and the bottom 10 scoring genes) along with their expression across the 7 hybridizations and make a heat-map as we did in part a.  Email Josh (jstuart@soe.ucsc.edu) print outs of each.

 

Make a scatterplot in which t-test P-values are plotted against the Wilcoxin P-values (Hint: type ?plot).  What’s reassuring about the plot?  What seems to be funny about the plot?  As a follow-up, plot the actual P-values of the top candidates.  What do you think is going on?  Given this finding, do you think it’s appropriate to perform a Wilcoxin test in this case?

 

You’re done!

 

Comments to Josh Stuart