CMPE 280B Winter 2006 Home Page

Bioinformatics Research Seminar

(Last Update: 07:54 PST 13 March 2006)

This course is the department seminar series for the Biomolecular Engineering Department.

It is a weekly research seminar that assumes that students have substantial background in biology, chemistry, computer science, or statistics (but not all of them). Because of the varied backgrounds of the audience, the bioinformatics research seminar is probably the most accessible research seminar for undergraduates of all the science and engineering seminars.

Room:
Baskin 330
Time:
2-3:45 Thursdays

The seminar will consist of a mixture of student presentations, outside speakers, and UCSC speakers. The student presentations will be primarily from their own research, though students who wish to take the seminar but have no research of their own to report may present papers from the literature.

Because this is faculty recruiting season and we would like to have faculty recruiting talks in the weekly seminar series, we may have to rearrange the schedule on short notice.

I would like a title and abstract from each presenter at least a week ahead of time to put on this web page.

Evaluation will be based on the student presentation and on attendance and participation at other presentations. (Note: presentation by students is not a requirement—those students who do not present will be evaluated just on attendance.)


Tentative schedule

5 Jan 2006 Administrative details, choosing dates for student speakers

12 Jan 2006
Jim Kent BLAT and BLAST: fast search using approximate matching

19 Jan 2006
Corey Powell (M.S. thesis presentation) An Iterative Bayesian Updating Method for Biological Pathway Prediction
There is a diversity of functional genomics data, such as gene expression data from microarray experiments, phenotypic data from gene deletion experiments, protein-protein interaction data, and data from manually curated databases of gene function. Each experimental method finds certain types of functional relationships between genes and misses others. A biological pathway prediction method that can combine multiple data sources might uncover more functional relationships than a pathway prediction method that depends on a single data source.

This talk presents a biological pathway prediction method that uses an iterative Bayesian updating technique to combine data from multiple sources, represented as undirected weighted graphs, in order to estimate the probability that a gene is part of a given biological pathway.

This method resulted in improved performance over a guilt-by-association approach, which makes pathway membership predictions for a gene based directly on the pathway membership of the neighbors of the gene, for several well characterized biological pathways.
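For readers unfamiliar with the guilt-by-association baseline mentioned above, here is a minimal sketch of one common form of it: a gene is scored by the weighted fraction of its graph neighbors that are already known pathway members. The function name, the dictionary-based graph layout, and the toy gene names are assumptions made for illustration. This is not the speaker's code, and it does not attempt the iterative Bayesian updating method itself.

```python
def guilt_by_association(graph, pathway_genes, gene):
    """Score a gene by the weighted fraction of its neighbors in the pathway.

    graph: dict mapping gene -> {neighbor: edge_weight} (undirected, weighted).
    pathway_genes: set of genes already known to be in the pathway.
    Both the data layout and the scoring rule are illustrative assumptions.
    """
    neighbors = graph.get(gene, {})
    total_weight = sum(neighbors.values())
    if total_weight == 0:
        return 0.0  # isolated or unknown gene: no evidence either way
    pathway_weight = sum(w for n, w in neighbors.items() if n in pathway_genes)
    return pathway_weight / total_weight


# Hypothetical toy network: g1 has three neighbors, two of them in the pathway.
toy_graph = {
    "g1": {"g2": 1.0, "g3": 0.5, "g4": 0.5},
    "g2": {"g1": 1.0},
    "g3": {"g1": 0.5},
    "g4": {"g1": 0.5},
}
toy_pathway = {"g2", "g3"}
score = guilt_by_association(toy_graph, toy_pathway, "g1")
```

Here `score` is the pathway-linked edge weight (1.0 + 0.5) divided by the total edge weight (2.0). Because the prediction depends only on direct neighbors in a single graph, it cannot by itself combine evidence from several data sources, which is the gap the talk's method addresses.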

Craig Lowe (15-minute talk—practice for conference)
Origin of an Ultra-conserved Element in the Human Genome

Abstract removed at Craig's request, to avoid problems with journal having strict media embargo

26 Jan 2006 (meet in E2-280, since Baskin 330 reserved)
George Shackelford Current Research in Using Multiple Statistics to Predict Correlated Mutations

CASP (Critical Assessment of Techniques for Protein Structure Prediction) has a category for predicting residue-residue contacts in protein structures. One of the methods for making such predictions involves correlated mutations: the assumption that a residue mutation in a protein is likely to have a nearby (compensating) mutation. This talk will analyze several correlation statistics used in making predictions and discuss the advantage (or disadvantage) of using multiple statistics in a neural network.
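As an illustration of what a correlation statistic for correlated mutations can look like, the sketch below computes the mutual information between two columns of a multiple sequence alignment. Mutual information is one widely used statistic of this kind; the abstract does not say which statistics the talk covers, so the choice here, along with the function name and toy columns, is an assumption.

```python
import math
from collections import Counter


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a, col_b: equal-length sequences of residues, one entry per aligned
    sequence. High values suggest the two positions vary together.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same alignment")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b) simplifies to c * n / (count_a * count_b)
        mi += p_ab * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


# Toy columns: positions 1 and 2 change together, positions 1 and 3 do not.
covarying = column_mutual_information("AAVV", "LLII")
independent = column_mutual_information("AAVV", "LILI")
```

In this toy example the covarying pair scores higher than the independent pair. On real alignments, raw mutual information is also affected by phylogeny and column entropy, which is one reason several statistics might be combined.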

2 Feb 2006
Josue Samayoa Utilizing Rosetta and NMR Data in a Homology-based Modeling Approach

Given the rate of novel protein folds deposited into the Protein Data Bank (PDB) and empirical estimates on the number of unique folds in nature, it is generally believed that, other than membrane proteins, most of the naturally occurring protein folds already have a representative of known structure. In addition, the field of structural genomics is attempting to rapidly solve the structures of the remaining orphan protein folds. This fact means that in the future most structure predictions will be made using information derived from a parent structure. Therefore, the development of methods that improve homology-based structure predictions will enable the field of structure prediction to advance further and produce higher quality models.

Given that the alignment problem is such a vital component of the homology-based process, it makes sense to investigate new methods for creating better alignments between query and parent. One strategy is to use limited amounts of experimental data to help.

The focus of this research is to combine experimental nuclear magnetic resonance (NMR) data, in the form of residual dipolar couplings (RDCs), with a computational energy potential (Rosetta) to discriminate quality alignments from an ensemble of potential alignments in a homology-modeling-based strategy. Specifically, we hoped to show a utility for discriminating good alignments from a pool of possible alignments for pairs of remote homologs.

My research shows that RDC data has only a limited utility for selecting an optimal alignment from an ensemble of alignments. RDC data appeared to be slightly helpful at best and potentially devastating at worst. Additional work with NOEs for the protein 1d2b showed that experimental data can be used to select quality alignments from a pool of possible alignments. Therefore, our approach can work with the right source of experimental data. RDCs might still off

9 Feb 2006
Alex Williams Computational Prediction of Gene Targets in Yeast

Determining the specific effect of a drug on an organism is a challenging and time-consuming undertaking. Even for drugs whose physiological effects are well characterized, the exact mechanism by which the drug acts is often unknown. Identifying the specific gene products whose functions are disrupted by a drug of interest can shed light on the molecular pathways affected. We have developed a computational method that uses genome-wide knockouts to predict drug targets. As input, it takes both the chemical sensitivity profiles of a drug and also a network of known yeast synthetic lethal interactions, and uses this information to output a list of genes that are predicted targets. It uses a statistical approach to match the drug's sensitivity profile to each gene's set of synthetic lethal partners. Similarities between a drug's profile and a specific gene knockout's profile may indicate that the drug targets the product of that gene.
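The abstract above does not name the statistical test used to match a drug's sensitivity profile to a gene's synthetic lethal partners. As a sketch under that caveat, one plausible choice is a hypergeometric enrichment test on the overlap between the two gene sets. The function names, the gene identifiers, and the choice of test are all assumptions for illustration, not the speaker's method.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` items without replacement
    from `population` items, `successes` of which are marked."""
    upper = min(successes, draws)
    total = comb(population, draws)
    hits = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return hits / total


def rank_candidate_targets(sensitive, sl_partners, all_genes):
    """Rank genes by how strongly their synthetic lethal partners overlap
    the set of deletion strains that are sensitive to the drug.

    sensitive: set of genes whose knockout is drug-sensitive (assumed input).
    sl_partners: dict gene -> set of synthetic lethal partners (assumed input).
    all_genes: the background set; both inputs are assumed to lie within it.
    """
    results = []
    for gene, partners in sl_partners.items():
        overlap = len(sensitive & partners)
        p = hypergeom_tail(len(all_genes), len(sensitive), len(partners), overlap)
        results.append((gene, p))
    return sorted(results, key=lambda item: item[1])


# Hypothetical toy data: geneA's partners match the sensitivity profile exactly.
background = {f"g{i}" for i in range(10)}
drug_sensitive = {"g1", "g2", "g3"}
partners = {"geneA": {"g1", "g2", "g3"}, "geneB": {"g4", "g5", "g6"}}
ranked = rank_candidate_targets(drug_sensitive, partners, background)
```

The gene at the top of `ranked` is the one whose synthetic lethal partners are most enriched among the drug-sensitive strains, which mirrors the reasoning in the abstract: a drug that inhibits a gene product should resemble a knockout of that gene.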

To gauge the generality of this approach, we tested our procedure on actinomycin and gliotoxin, two commonly used pharmaceuticals with different physiological effects. In both cases, the method identified novel targets that were supported by follow-up experiments. For actinomycin, a gene involved in Golgi trafficking was found to be a putative target, raising the possibility that this DNA intercalator has a side effect on protein secretion. Our preliminary results indicate that the approach will be an effective method for finding targets of various drugs with a wide variety of mechanisms of action.

16 Feb 2006
Charlie Vaske Inferring Causal Transcript Networks

Gene networks have been applied to infer gene function and interaction from microarray data, by grouping genes that are expressed in similar contexts. However, by combining all of a gene's splice forms into a single entity, they lose important information about the contextual expression of a protein isoform, as well as possible functional variations of protein isoforms. I present an analysis of a splicing microarray that shows this loss of information. In addition, gene networks can only weakly predict interactions, but RNAi and focussed perturbation now allow causal interactions to be inferred. I show the application of such a method on a cancer microarray dataset. Finally, I present thesis aims for combining both methods to infer causal transcriptional networks.

23 Feb 2006
Brian Raney Paleogenomics—climbing back up the tree of life

About 100 million years ago the ancestors to modern elephants branched off from the rest of the mammals. Since that time humans have branched off from existing mammals twelve times. The sequencing of existing mammals on the other side of these branch points allows us to infer the genomes of each of the twelve ancestral nodes leading to human. Methods for determining ancestral nucleotides and rearrangement histories are presented.

2 Mar 2006
Colin Dewey (E2-599) Whole-genome alignments and polytopes for comparative genomics

Whole-genome sequencing of many species has presented us with the opportunity to deduce the evolutionary relationships between each and every nucleotide. In this talk, I will present algorithms for this problem, which is that of multiple whole-genome alignment. The sensitivity of whole-genome alignments to parameter values can be ascertained through the use of alignment polytopes, which will be explained. I will also show how whole-genome alignments are used in comparative genomics, including the identification of novel genes, the location of micro-RNA targets, and the elucidation of cis-regulatory element and splicing signal evolution.

Colin Dewey was an undergraduate at the University of California, Berkeley, where he majored in Electrical Engineering and Computer Sciences with an honors breadth area in Molecular Biology. After receiving his B.S. with high honors in 2001, he continued on as a graduate student at Berkeley under the guidance of Lior Pachter. He will receive his Ph.D. in Electrical Engineering and Computer Sciences with a Designated Emphasis in Computational and Genomic Biology in May 2006. Driven by his interests in molecular evolution and algorithm design, Colin has focused his graduate research on the development of algorithms for comparing multiple whole genome sequences. He has participated in the international sequencing projects for the mouse, rat, and chicken genomes and is currently a member of the ENCODE Consortium, which aims to construct a catalog of all functional elements in the human genome. He has also collaborated with scientists at the National Center for Biotechnology Information.

Extra talk Mon 6 March 2006 11 a.m. E2-599
Dean Ho Cytomimicry: Fabrication of Biofunctionalized Devices Through Biotic-Abiotic Interfacing

9 Mar 2006 (E2-599)
Adam Pavliceck Repetitive elements and rearrangements in the human genome

Genomic rearrangements represent an important source of genetic variability in mammals. On one hand, they create evolutionary novelties and can contribute to speciation by formation of reproductive barriers. On the other hand, deletions, duplications, and chromosomal translocations represent an important source of both germline and somatic mutations associated with genetic disorders and cancer. Many such rearrangements are stimulated by repair of DNA breaks via homologous recombination between ubiquitous repetitive elements. Using sequence comparisons and analysis of human mutations, I investigated the role of interspersed repeats such as Alu elements in human deletions and duplications. During this seminar, I will focus on three specific questions: (1) what are the specific properties of repeats prone to homologous recombination, (2) which molecular pathways of homologous recombination are involved, and (3) which factors stimulate DNA breaks repaired by homologous recombination between repeats.

Extra talk Mon 13 March 2006 11 a.m. E2-599
Ben Raphael
Rearrangements and Duplications in Tumor Genomes: Towards a Cancer Genome Project

Cancer is a disease driven by mutations in the genome that alter the structure, function, and regulation of genes. These mutations range from single letter changes in the DNA sequence to more drastic rearrangements, gains, or losses of large pieces of DNA. In some types of cancer these large-scale alterations are directly implicated in the pathogenesis of cancer and provide targets for cancer diagnostics and therapeutics.

I will describe computational methods for reconstructing tumor genome architectures and analyzing rearrangements in tumor genomes at high resolution using a technique called End Sequence Profiling (ESP). These methods are inspired by techniques in comparative genomics and view a tumor genome as a rearranged version of the normal human genome. In this framework, both the human and tumor genomes are represented by permutations and the problem is to find a parsimonious sequence of rearrangement operations that transform one permutation into another. I will also describe how computational analysis of ESP data suggests mechanisms that produce complicated patterns of overlapping rearrangement and duplication events that are observed in some tumor genomes.
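To make the permutation view above concrete, here is a small sketch that represents a rearranged genome as a permutation of numbered blocks, counts breakpoints relative to the reference order, and applies a single reversal. It uses unsigned permutations and reversals only, which is a deliberate simplification. The function names and the toy permutation are hypothetical, and the ESP analysis described in the talk is considerably more involved.

```python
def count_breakpoints(perm):
    """Count adjacencies in `perm` that are not adjacent in the reference.

    perm: list containing each of 1..n exactly once (unsigned blocks).
    The permutation is framed with 0 and n+1 so the ends are also checked.
    """
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if abs(a - b) != 1)


def reverse_segment(perm, i, j):
    """Return a copy of `perm` with positions i..j (inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


# Hypothetical tumor genome: blocks 2..4 of the reference are inverted.
tumor = [1, 4, 3, 2, 5]
before = count_breakpoints(tumor)
repaired = reverse_segment(tumor, 1, 3)
after = count_breakpoints(repaired)
```

A single reversal can remove at most two breakpoints, so half the breakpoint count is a simple lower bound on the number of reversals separating two unsigned permutations. Finding a parsimonious sequence of operations, as the abstract describes, builds on this kind of bookkeeping.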

Another experimental technique called array comparative genomic hybridization (aCGH) has become indispensable in the identification of duplicated and deleted segments of DNA in tumor genomes. ESP provides an effective complement to aCGH, and I will discuss how to combine data from both types of experiments using network flow techniques in order to obtain a comprehensive view of tumor genome architecture. I will demonstrate the application of these methods to ESP and aCGH data from breast cancers. Finally, I will describe the implications of this work for the recently proposed Cancer Genome Atlas, a genome project for cancer.

Extra talk Tues 14 March 2006 2 p.m. Baskin 330
Dietlind Gerloff Protein-Protein Interactions in Modern Biology and Bioinformatics—Appreciating Molecular-Structural Context

Many important regulatory functions in biology involve interactions between several proteins. Interestingly, the notion of what constitutes an interaction differs between structural biologists and functional genomics researchers. Finding effective ways in which all the various types of post-genomic data can be utilized to yield biological insight can be challenging, but it is obviously crucial. My group's research is concerned with facilitating this task. In my talk I will highlight some recent, and some ongoing, work pertaining to this goal.

Part of my talk will place particular emphasis on how structural models of proteins can be helpful for elucidating (physical) protein-protein interactions, e.g. for predicting interacting pairs amongst sets of paralogous proteins (for example: which CDK is likely to interact with which cyclin) using contemporary bioinformatics programs. Our recent structure prediction for the malaria gamete surface protein Pfs230 allows us to generate models for the members of its protein family, and explore hypotheses regarding their interactions in similar ways.

In addition to such applied bioinformatic investigations we are seeking to develop new methods for predicting protein-protein interactions. Promising first results have emerged from looking for differences between sequence-based evolutionary trees and trees depicting similarities between surface profiles, 1-D representations of the electrostatic properties on their molecular surfaces. When applied to Complement Receptor 1 as an example of a multimodular protein, the results obtained through this new approach seem to pin-point modules that are functionally interesting.

As a non-structural example of our research I will introduce YETI (Yeast Exploration Tool Integrator), an uncomplicated Java workbench for investigating S. cerevisiae functional genomics data visually (www.yetibio.com).

16 Mar 2006 (E2-599)
Phil Bradley Predicting Protein Structure from Sequence

I will describe recent work in ab initio protein structure prediction, highlighting progress toward high-resolution structure prediction as well as the challenges that remain and current efforts to overcome them.

Extra talk Tues 21 March 2006 2 p.m. E2-599
Jason McDermott Exploration of Biological Systems with the Bioverse

Biological systems are composed of components including proteins, DNA, RNA, and small molecules, with large numbers of direct and indirect relationships between them. The Bioverse, a computational framework for representation, exploration and use of biological data and systems, incorporates sequence, structure and function information for biological components in over 50 organisms. It also predicts relationships between these components, including evolutionary relationships and protein-protein interaction and regulatory networks. Understanding the behavior of these systems on both a local and global scale is essential for the understanding of disease, development and evolution. My research focuses on elucidation of the underlying pattern and organization of the biological world using the burgeoning amount of large-scale genomic and proteomic data and methodologies. An overview of the concepts behind the Bioverse, its implementation and some current applications will be discussed.


Past end of quarter:

23 Mar 2006 (E2-599)
Dannie Durand title??

6 April 2006 (E2-599)
Alexander Zambon title??


SoE home
Kevin Karplus's home page
Biomolecular Engineering Department
Karplus's lab page UCSC Bioinformatics research

Questions about page content should be directed to

Kevin Karplus
Biomolecular Engineering
University of California, Santa Cruz
Santa Cruz, CA 95064
USA
karplus@soe.ucsc.edu
1-831-459-4250
318 Physical Sciences Building