| Date | Lecture Topic(s) | Due |
|---|---|---|
| Fri 26 Sept 2008 | started late because of fire alarm, administrivia, fundamental dogma. Error rates of replicases (1/125,000 for Taq to 1/2,300,000 for Pfu, 1/300,000--1/500,000 for mitochondrial pol gamma, 1/18-1/38 for pol-eta), reverse transcriptases (1/1,700-1/8,000 for HIV, 1/19,000 for SIV, 1/62,000 for Accuscript RT), and RNA->RNA transcriptase (1/10,000 for flu, 1/3,000 for Qbeta). | intake survey |
| Mon 29 Sept 2008 | DNA bases and structure. Ambiguity codes for DNA. Some talk of RNA processing. | |
| Wed 1 Oct 2008 | protein backbones, primary/secondary/tertiary/quaternary hierarchy, CORN heuristic for handedness, phi/psi/omega torsion angles (but forgot to define how to measure torsion angles!), residues GPFYA | |
| Fri 3 Oct 2008 | alpha helix, anti-parallel and parallel beta sheet. Rest of residues. I was somewhat spacey on the structures of the residues. | |
| Mon 6 Oct 2008 | what makes a good fellowship application, writing a (request for) recommendation letter, stochastic model=probability mass function, i.i.d. models of sequences, need for length modeling (and when it can be skipped), need for log(prob) to avoid floating-point underflow | |
| Wed 8 Oct 2008 | Maximum likelihood estimation. Lagrangian multipliers, deriving MLE for i.i.d. models. first-order Markov model. MLE for first-order model (not derived). Efficient computation of ln(e^x+e^y) for summation in log-prob space. | |
| Fri 10 Oct 2008 | Using stop character in first-order Markov. Geometric length model implied by stop character. Bayes' Rule. Pseudocounts. Dirichlet distributions as reason for pseudocounts (informal). Add-one prior (all models equally likely). | perl1 |
| Mon 13 Oct 2008 | Went over homework 1, commenting on common problems with perl programming and documentation. train/crosstrain/test, n-fold cross-validation. | |
| Wed 15 Oct 2008 | Detailed example of training and testing a 1st-order Markov chain, using both MLE and MAP estimates (add-one prior). Discussion of "encoding cost" (-log2 P(seq)) and encoding cost per character (-log2 P(seq))/number_of_characters. Higher-order Markov chains. | |
| Fri 17 Oct 2008 | Meet in PSB 305. P-value, E-value, E-value same as P_N for sufficiently small E-value. Gaussian distribution, Gumbel distribution, difference in fatness of tails. Use of log-P for plotting distributions. Warnings about danger of using Z-value for anything but Gaussian. Warnings about how far out we extrapolate when we are getting reasonable E-values. Entropy. | fellowship |
| Mon 20 Oct 2008 | Entropy, relative entropy, information gain. Sequence logos. Examples of sequence logos from multiple sequence alignments (height=conservation) and from secondary structure prediction (height=confidence). Examples taken from http://www.soe.ucsc.edu/~karplus/casp8/T0387/summary.html | |
| Wed 22 Oct 2008 | Contingency tables, joint, marginal, and conditional probabilities. Mutual information (with example). Discussion of use of MI for contact prediction and evaluating similarity of alphabets. Classifiers: TP, FP, FN, TN, sensitivity, accuracy, ... Pointed to Wikipedia page: Receiver operating characteristic for definitions. | |
| Fri 24 Oct 2008 | specificity, positive predictive value, negative predictive value, false discovery rate, Matthews correlation. Argument for true positive rate (=recall=sensitivity) and positive predictive value (=precision) as appropriate measures when positives and negatives are as imbalanced as we usually see. ROC plots, area under ROC curve, reasons for using TP vs. log(FP) instead of ROC plot. Substitution matrices: how the BLOSUM matrices are derived. | perl2 |
| Mon 27 Oct 2008 | Go over homework, explain common problems in Perl. PAM matrices, rate matrices, e^(Rt). | |
| TUES 28 Oct 2008 | 10-12 Perl help session, PSB 313 | |
| Wed 29 Oct 2008 | | |
| Fri 31 Oct 2008 | Bring laptop! Using the Science Library. Christy Hightower. | Darling models |
| Mon 3 Nov 2008 | review of Darling model homework---main problems were those expected from non-spacefilling models (mainly CB too close) and from wanting to make H-bonds with the D and K residues of the beta hairpin (which are usually solvated). Gave pointer to average-spacing.html, which has approximate average distances between CA atoms and CB atoms. Discussed upcoming perl3 assignment a bit, in response to questions requesting clarification. Discussed global and local alignment, and got out full recurrence (with initial and end conditions) for global alignment with arbitrary gaps. | |
| Wed 5 Nov 2008 | Local alignment with arbitrary gap costs, both recurrence relations and pseudocode. Traceback for arbitrary gap cost. Explained constant/linear/affine gap costs, developed recurrence for linear gap cost. | |
| Fri 7 Nov 2008 | Smith-Waterman alignment (local, affine gap cost) recurrences and example, including traceback | perl3 |
| Mon 10 Nov 2008 | perl tips based on perl3 (including List::Util). Global affine gap cost, recurrences, initial conditions, example, traceback. Contrasted with local alignment. | |
| Wed 12 Nov 2008 | Overview of the contents of the summary page returned by SAM-T08. Introduction to HMMs, using biased coin example. | |
| Fri 14 Nov 2008 | Some discussion of search assignment using VCA0176 as "unknown"; HMMs forward algorithm | web |
| Mon 17 Nov 2008 | Feedback on results of search assignment; some questions about meaning of SAM sequence logos; repeat of forward algorithm to introduce notation in book; Viterbi algorithm; backward algorithm | |
| Wed 19 Nov 2008 | Baum-Welch (EM) training for HMMs, Posterior decoding, something else??? | |
| Fri 21 Nov 2008 | Better than chance: the importance of null models | perl4 |
| Mon 24 Nov 2008 | Review of alignment homework. Short repeat of traceback to explain (again) the classic error in affine-gap traceback. Local alignment to HMMs. Henikoff sequence weighting. Progressive alignment. | |
| Wed 26 Nov 2008 | Tree building: UPGMA and neighbor joining | |
| Fri 28 Nov 2008 | Thanksgiving break. | |
| Mon 1 Dec 2008 | David Bernick: Regulatory RNA in prokaryotes Slides (Kevin leaving for CASP8) | perl5 |
| Wed 3 Dec 2008 | Jim Kent: BLAST, BLAT, and other efficient sequence search methods (Kevin at CASP8) | |
| Fri 5 Dec 2008 | Jim Kent (Kevin at CASP8) | |
| Tuesday 9 Dec 2008 | All redone assignments due | perl7 |
| THURSDAY 11 Dec 2008 8a.m.--11a.m. | final exam slot | |
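
The Wed 8 Oct entry mentions efficient computation of ln(e^x+e^y) for summation in log-prob space. A minimal sketch of one common way to do this is given below; it is not taken from the lecture, and the function names `log_add` and `log_sum` are hypothetical. The idea is to factor out the larger argument so that only a non-positive difference is ever exponentiated.

```python
import math


def log_add(x: float, y: float) -> float:
    """Return ln(e^x + e^y) without exponentiating x or y directly.

    Uses ln(e^x + e^y) = hi + ln(1 + e^(lo - hi)), where hi = max(x, y)
    and lo = min(x, y), so the exponent passed to exp() is never positive.
    """
    # ln(0) is represented as -inf; adding zero probability changes nothing.
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    hi, lo = (x, y) if x >= y else (y, x)
    return hi + math.log1p(math.exp(lo - hi))


def log_sum(log_probs):
    """Sum probabilities given as natural logs; return the log of the sum."""
    total = -math.inf  # log of an empty sum (probability 0)
    for lp in log_probs:
        total = log_add(total, lp)
    return total
```

For example, `log_add(-1000.0, -1000.0)` gives `-1000 + ln 2`, whereas computing `math.log(math.exp(-1000.0) + math.exp(-1000.0))` directly fails because both exponentials underflow to zero.
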
Questions about page content should be directed to
Kevin Karplus
Biomolecular Engineering
University of California, Santa Cruz
Santa Cruz, CA 95064
USA
karplus@soe.ucsc.edu
1-831-459-4250
318 Physical Sciences Building