The room the Registrar assigned us was unsuitable, but we have a new room: Baskin 330.
The seminar will consist mainly of student presentations about their own research, though students who wish to take the seminar but have no research of their own to report may present papers from the literature. When outside speakers are available, they will present instead of students.
I would like a title and abstract from each presenter at least a week ahead of time to put on this web page.
Evaluation will be based on the student presentation and on attendance and participation at other students' presentations.
Development of lab-on-a-chip devices for cellular biology is focused on increasing the throughput and quality of physiological data at the single-cell level. The main thrust is the production of microfabricated elements that can manipulate single cells and perform physiological measurements using both electrical and optical data collection, coupled with automated data processing. This talk will present the collection of ion-channel functional data in response to pharmacological agents, fast reagent application to the extracellular and intracellular space, and automated morphometric analysis of cell behavior in controlled microenvironments. These microfluidic elements can be easily integrated into platforms capable of performing complex cell-based experiments automatically, and will lead to order-of-magnitude increases in throughput over current laboratory practice.
Mass spectrometry (MS) has emerged as the fastest and most sensitive technique for analyzing protein samples. Typically, a protein sample (from a tissue, organelle, bodily fluid, etc.) is digested into a complex mixture of peptides, approximately 10 to 30 residues long, and peptide identifications are combined to achieve protein identification (and very rough quantitation). Peptide identification can be inferred from tandem spectra with either "database search" or "de novo" methods, or some hybrid of the two. Database search can make use of lower-quality spectra, but de novo offers greater flexibility and robustness to protein modifications and mutations. In this talk, I will show how to use graph algorithms (most notably the eigenvectors of a matrix associated with the graph) to achieve unprecedented accuracy in de novo sequencing.
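For readers new to de novo sequencing, here is a minimal sketch of the classic "spectrum graph" idea. Peaks become nodes, and two peaks are joined by an edge when their mass difference matches one amino-acid residue. The peptide is then read off as the longest path through the graph. This is a textbook illustration, not the speaker's eigenvector-based method. It assumes idealized, noise-free prefix masses (ion-type offsets already removed). The function names and the 0.02 Da tolerance are hypothetical choices.

```python
# Monoisotopic residue masses in daltons. I and L have identical mass,
# so only L is listed.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def build_spectrum_graph(prefix_masses, tol=0.02):
    """Hypothetical helper: join peaks i < j whose mass difference
    matches a single residue mass within `tol` daltons."""
    nodes = sorted(prefix_masses)
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            diff = nodes[j] - nodes[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    edges[i].append((j, aa))
    return nodes, edges


def longest_path_sequence(nodes, edges):
    """Longest path from the first to the last node. Edges always go to
    heavier peaks, so the graph is a DAG and one ordered pass suffices.
    Returns None if the last node is unreachable."""
    best = [None] * len(nodes)  # best[i]: longest residue string ending at node i
    best[0] = ""
    for i in range(len(nodes)):
        if best[i] is None:
            continue
        for j, aa in edges[i]:
            candidate = best[i] + aa
            if best[j] is None or len(candidate) > len(best[j]):
                best[j] = candidate
    return best[-1]


# Usage: rebuild idealized prefix masses for a known peptide and recover it.
seq = "PEPTLDE"
prefix = [0.0]
for aa in seq:
    prefix.append(prefix[-1] + RESIDUE_MASS[aa])
nodes, edges = build_spectrum_graph(prefix)
print(longest_path_sequence(nodes, edges))  # PEPTLDE
```

Some pairs of residues have nearly the same total mass as a single residue (for example, G+A is almost exactly Q). Preferring the longest path resolves those ambiguities here only because every prefix peak is present. Real spectra have missing and spurious peaks, which is why more sophisticated graph scoring is needed.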
Automatic Identification and Classification of Protein Domains (or aiming for an automatic Pfam)
A very large number of protein sequences are already known; however, our knowledge of higher-level properties of proteins, such as their structure and function, is scarce. A large-scale classification of all proteins into families can help bridge this gap by facilitating homology modeling (e.g., inferring a protein's function from the functions of other proteins in the same family) and by identifying previously unknown protein families as targets for future research.
Proteins are typically composed of several domains: (semi-)autonomous functional subunits that are shuffled in a mix-and-match evolutionary process to generate new proteins.
I will present EVEREST, a process we have developed for identifying and classifying protein domains in a comprehensive database of protein sequences. EVEREST combines sequence-similarity identification, graph-based clustering, machine learning, statistical modeling, and iterative refinement. We achieve state-of-the-art results, recovering 63% of the known domain families and suggesting new families with about 40% fidelity.
This is joint work with Michal Linial and Nati Linial.
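To make the "graph-based clustering" step concrete, here is a minimal sketch. Sequence segments are nodes, and a significant similarity hit between two segments is an edge. Clusters are then the connected components, found with a union-find structure. This is a generic single-linkage illustration; the abstract does not say which clustering EVEREST uses. The function name, the `(i, j, evalue)` hit format, and the E-value cutoff are hypothetical.

```python
def cluster_segments(n_segments, hits, max_evalue=1e-3):
    """Hypothetical single-linkage clustering of segments 0..n_segments-1.

    `hits` is an iterable of (i, j, evalue) similarity hits. Only hits with
    evalue <= max_evalue become edges, and clusters are the connected
    components of the resulting graph."""
    parent = list(range(n_segments))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, evalue in hits:
        if evalue <= max_evalue:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri  # merge the two components

    clusters = {}
    for s in range(n_segments):
        clusters.setdefault(find(s), []).append(s)
    return sorted(clusters.values())


# Usage: segments 0-1-2 are linked by strong hits; 3 and 4 only by a weak one.
print(cluster_segments(5, [(0, 1, 1e-10), (1, 2, 1e-8), (3, 4, 0.5)]))
```

Single linkage is fast, but one spurious hit can fuse two unrelated families. That weakness is part of why a real pipeline adds statistical modeling and iterative refinement on top of the raw clusters.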
How much do our genes determine our sexual identity? I will review the impact of genes on sexuality, discuss and dispel common myths, and address key questions about how genetic information impacts our lives today. Now that we understand so much more about our DNA, we have to be careful about how this knowledge is used. As scientists, engineers, and members of the human community, we need to ensure that genetic information is used for purposes that benefit all people equally, without discrimination, and that it isn't used to cause harm.
I will describe the Biozon system (biozon.org), a knowledge resource of heterogeneous biological data. Informally, Biozon can be described as Amazon and Google combined and applied to the diverse domain of biological knowledge.
Biozon merges the holdings of more than a dozen molecular biology collections, including SwissProt, KEGG, PDB, BIND, and others, and augments this data with novel in-house derived data such as sequence or structure similarity, predicted interactions, and predicted domains. Currently, Biozon holds more than 90 million biological documents and 2.5 billion relations between them.
Biozon allows complex searches on the data graph that specify desired interrelationships between types (for example, 3D structures for all proteins that interact with the protein BRCA1). Moreover, Biozon has a fuzzy search engine that extends complex searches to include homologous sequences or structures as a search step, or even genes with similar expression profiles. One can search, for example, for all proteins that are known to take part in a specific pathway, or for proteins whose expression profiles (associated with the corresponding mRNA sequences) are similar to those of these proteins. Biozon also integrates a first-of-its-kind biological ranking system that resembles the methods used by Google.
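Google's best-known ranking method is PageRank, so here is a minimal sketch of textbook PageRank over a graph of documents and relations. It is an illustration of the general idea only; Biozon's actual ranking scheme is not described in this abstract. The node names, damping factor 0.85, and fixed iteration count are hypothetical choices.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    `out_links` maps every node to the list of nodes it links to; every
    link target must also be a key. Ranks sum to 1."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Random-jump share that every node receives.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank


# Usage on a hypothetical four-document graph: "A" is cited by B, C, and D.
ranks = pagerank({"A": ["B"], "B": ["A"], "C": ["A"], "D": ["A", "C"]})
print(max(ranks, key=ranks.get))  # A
```

The intuition carries over to biological data: a document related to many well-connected documents ranks higher than one with few or weakly connected relations.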
If time permits I will also talk about other research projects in my lab, such as pathway prediction, domain-based protein hierarchy and detection of semantically significant domain architectures.
In the human, mouse, and rat genomes, there are 481 regions of at least 200 base pairs that are completely conserved (ultra-conserved elements). These elements are dispersed along the genomes and are found on all human chromosomes except 21 and Y, yet they often occur in clusters. Many of the ultra-conserved elements are believed to function as transcriptional enhancers for key developmental genes or genes encoding transcription factors. In this presentation I will discuss my computational efforts to characterize clusters of ultra-conserved elements as possible enhancers and to identify potential targets of their action. I will also discuss experimental methods that may be used to verify enhancer activity. This work is part of my current rotation project in the Haussler lab.
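The definition above ("at least 200 base pairs that are completely conserved") can be made concrete with a small scan over a multiple alignment. The scan looks for runs of columns where every sequence has the same base and no gap. This is a minimal sketch assuming a precomputed, gapped alignment given as equal-length strings; the function name is hypothetical, and real whole-genome alignment pipelines are far more involved.

```python
def conserved_runs(aligned, min_len=200):
    """Hypothetical helper: return (start, end) alignment-column ranges
    (end exclusive) of at least `min_len` columns in which every sequence
    has the same base, with no gaps ('-') or unknown bases ('N')."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    runs = []
    start = None
    for col in range(length):
        bases = {s[col].upper() for s in aligned}
        identical = len(bases) == 1 and "-" not in bases and "N" not in bases
        if identical:
            if start is None:
                start = col  # a new conserved run begins
        else:
            if start is not None and col - start >= min_len:
                runs.append((start, col))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and length - start >= min_len:
        runs.append((start, length))
    return runs


# Usage with a toy three-way alignment (human, mouse, rat) and a small threshold.
print(conserved_runs(["ACGTACGTAA", "ACGTACGTCA", "ACGTACGTAA"], min_len=5))
```

Gapless runs span the same number of bases in every genome, so a run length in alignment columns is also its length in base pairs. Clusters of elements could then be found by grouping runs whose genomic coordinates lie close together.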
David Ng (short presentation)
Investigation of Enhancements to tRNAscan-SE
tRNAscan-SE is a program that identifies tRNA genes in genomic sequences. It has high sensitivity (identifying 99-100% of tRNA genes), high selectivity (fewer than one false positive per 15 gigabases), and good speed. Fast "prefilters" quickly identify candidate tRNA genes, and a highly selective covariance model then screens the candidates. In this talk I will describe the architecture of tRNAscan-SE, discuss approaches to improving the program's performance, and focus on my investigation of the program Aragorn for possible use as an additional prefilter. This was my Fall 2004 rotation project in the Lowe Lab.
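The prefilter-then-screen architecture described above can be sketched as a generic two-stage cascade. Candidate regions from all prefilters are pooled, which is how an additional prefilter such as Aragorn could raise sensitivity. The expensive scorer then runs only on those candidates. This is a minimal sketch, not tRNAscan-SE's code. The toy motif prefilter and the scoring function passed in are hypothetical stand-ins for the real prefilters and the covariance model.

```python
from typing import Callable, Iterable, List, Set, Tuple

Region = Tuple[int, int]  # (start, end) of a candidate window, end exclusive


def motif_prefilter(motif: str, width: int) -> Callable[[str], List[Region]]:
    """Toy stand-in for a fast prefilter: a window of `width` bases
    starting at each occurrence of `motif`."""
    def run(genome: str) -> List[Region]:
        hits = []
        pos = genome.find(motif)
        while pos != -1:
            hits.append((pos, min(pos + width, len(genome))))
            pos = genome.find(motif, pos + 1)
        return hits
    return run


def cascade(genome: str,
            prefilters: Iterable[Callable[[str], Iterable[Region]]],
            slow_score: Callable[[str], float],
            cutoff: float) -> List[Tuple[Region, float]]:
    """Pool candidates from every prefilter (union, so each added prefilter
    can only raise sensitivity), then keep those the slow, selective scorer
    accepts."""
    candidates: Set[Region] = set()
    for prefilter in prefilters:
        candidates.update(prefilter(genome))
    hits = []
    for start, end in sorted(candidates):
        score = slow_score(genome[start:end])  # stand-in for a covariance-model score
        if score >= cutoff:
            hits.append(((start, end), score))
    return hits
```

The design trade-off is visible in the structure. Each extra prefilter adds candidates, and therefore calls to the slow scorer, so the prefilter's sensitivity gain has to be weighed against the extra screening time.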
The C-terminal domain of the agouti-related protein is 45 residues and contains five disulfide bonds. This domain forms a structural motif known as a cysteine knot and is responsible for antagonizing many of the melanocortin receptors. The C-terminal domain has been shortened to 34 residues and four disulfide bonds while keeping the same structure and receptor specificity. The domain is now being simplified further by removing all cysteine residues, with the intent of preserving the same functional signature and three-dimensional structure. The design is being done computationally and will then be verified experimentally. This presentation will cover the first half of my current rotation in the Millhauser Lab.
The current pace of discovery in the biological sciences demands increasingly sophisticated mathematical tools for cataloging, sharing, modeling, and, ultimately, understanding biological systems. However, constructing mathematical models can be tedious and error-prone and requires a broad range of expertise. As a result, modeling is still largely limited to experts, and most biological models are currently written in a monolithic, unsharable form. For modeling to become part of mainstream biology, it is important to develop tools that allow models to be shared and reused.
Little b (b) is a programming language that enables the construction of models from reusable fragments of knowledge. This approach is inspired by pioneering work in qualitative physics (QP), a branch of artificial intelligence. Whereas the goal of QP was to provide human-like qualitative reasoning about physical situations, b is intended for constructing precise mathematical models that may be used to discover potentially non-obvious properties of systems.
In b, a mathematical model is formulated by describing a physical situation in terms of shared objects, relations, and quantitative data, as well as choices of theoretical approach. The language reasons about an initial description of a situation, inferring the presence of new objects and relations (e.g., complexes, species, reactions, equations). The inference procedure, together with type checking and symbolic mathematics, makes it easier to write concise and correct models based on shared knowledge. The resulting mathematics may be translated into a form suitable for analysis, such as simulation, nullcline analysis, flux balance analysis, or metabolic control analysis. I'll talk about the current state of the work on b, as well as aspects of computer science that I think will play an important role in mathematical systems biology.
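To give a flavor of "inferring new objects and relations" and then generating equations, here is a minimal Python sketch. It is not b syntax or b's inference engine. A hypothetical rule turns each declared binding pair into a complex plus reversible association and dissociation reactions. Mass-action rate equations are then generated symbolically. The function names, the `X:Y` complex naming, and the `kon_`/`koff_` rate-constant names are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (reactants, products, rate-constant name)
Reaction = Tuple[List[str], List[str], str]


def infer_binding_reactions(binding_pairs: List[Tuple[str, str]]) -> List[Reaction]:
    """Hypothetical inference rule: each binding pair (X, Y) implies a
    complex X:Y and reversible association/dissociation reactions."""
    reactions: List[Reaction] = []
    for x, y in binding_pairs:
        cplx = f"{x}:{y}"
        reactions.append(([x, y], [cplx], f"kon_{x}_{y}"))
        reactions.append(([cplx], [x, y], f"koff_{x}_{y}"))
    return reactions


def mass_action_odes(reactions: List[Reaction]) -> Dict[str, str]:
    """Build symbolic right-hand sides of d[S]/dt under mass-action
    kinetics: each reaction's rate is k times the product of its reactant
    concentrations, subtracted from reactants and added to products."""
    terms = defaultdict(list)
    for reactants, products, k in reactions:
        rate = "*".join([k] + [f"[{s}]" for s in reactants])
        for s in reactants:
            terms[s].append(f"- {rate}")
        for s in products:
            terms[s].append(f"+ {rate}")
    return {s: " ".join(t) for s, t in terms.items()}


# Usage: declaring only that A binds B yields three species and their ODEs.
for species, rhs in mass_action_odes(infer_binding_reactions([("A", "B")])).items():
    print(f"d[{species}]/dt = {rhs}")
```

Even this toy shows the appeal of the approach. The modeler states a fact ("A binds B"), and the species, reactions, and equations follow mechanically, so they stay consistent when the description changes.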
Questions about page content should be directed to
Kevin Karplus
Biomolecular Engineering
University of California, Santa Cruz
Santa Cruz, CA 95064
USA
karplus@soe.ucsc.edu
1-831-459-4250
318 Physical Sciences Building