Genome Assembly: From Short To Long Reads

Speaker Name: 
Pavel A. Pevzner
Speaker Title: 
Department of Computer Science and Engineering, University of California at San Diego
Start Time: 
Thursday, February 27, 2020 - 12:00pm
End Time: 
Thursday, February 27, 2020 - 1:15pm
Location: 
E2-180

Abstract

Long-read assemblies have improved on short-read assemblies because long reads are better able to disambiguate genomic repeats. However, most long-read assembly algorithms construct contiguous genomic segments (contigs) but do not provide the accurate repeat characterization (the repeat graph) needed to produce optimal assemblies. We present the Flye algorithm (Kolmogorov et al., Nature Biotechnology 2019), which does not attempt to construct contigs at the initial assembly stage. Instead, it generates arbitrary paths (disjointigs) in the unknown repeat graph and constructs the repeat graph from these error-riddled disjointigs. Counterintuitively, this seemingly reckless approach yields an accurate repeat graph and improves on state-of-the-art long-read assemblers. We also describe the development of the Flye assembly toolkit, which includes metaFlye (metagenome assembly), centroFlye (centromere assembly), and mosaicFlye (assembly of segmental duplications).
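The idea that arbitrary paths can still recover the repeat structure can be illustrated with a toy sketch under strong simplifying assumptions. It uses error-free sequences and a de Bruijn graph (each k-mer is an edge) as a stand-in for the repeat graph. This is not Flye's construction, which works from alignments of error-prone disjointigs. The genome, the repeat `GATTACA`, `K = 4`, and the function names are all hypothetical. The sketch shows that a chimeric path, which jumps from one repeat copy to the sequence following the other copy, together with a second path, rebuilds exactly the genome's graph, because that graph depends only on the set of k-mers.

```python
# Toy illustration (hypothetical example, not Flye's algorithm): in an
# error-free de Bruijn graph the graph depends only on the set of k-mers,
# so arbitrary paths that cover the genome's k-mers rebuild the same graph.

def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def de_bruijn_edges(seqs: list[str], k: int) -> set[tuple[str, str]]:
    """Each k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    edges = set()
    for s in seqs:
        for km in kmers(s, k):
            edges.add((km[:-1], km[1:]))
    return edges


K = 4
REPEAT = "GATTACA"                   # hypothetical repeat, longer than K - 1
A, B, C = "ACGTT", "CCTGA", "TTGCA"  # hypothetical unique segments
genome = A + REPEAT + B + REPEAT + C

# Disjointig-like paths: d1 jumps from the first repeat copy to the segment
# that follows the second copy, so it is not a substring of the genome.
d1 = A + REPEAT + C
d2 = REPEAT + B + REPEAT
paths = [d1, d2]

if __name__ == "__main__":
    print(d1 in genome)  # False: d1 is chimeric
    print(de_bruijn_edges([genome], K) == de_bruijn_edges(paths, K))  # True
```

Running it prints `False` and then `True`. With error-prone long reads, exact k-mer sets no longer agree, so this toy only conveys the intuition rather than the actual difficulty the abstract addresses.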

This is joint work with Mikhail Kolmogorov, Anton Bankevich, Andrey Bzikadze, and Jeffrey Yuan.

Biography

Pavel Pevzner is the Ronald R. Taylor Chair and Distinguished Professor of Computer Science and Engineering at the University of California, San Diego, where he directs the NIH Center for Computational Mass Spectrometry. He holds a Ph.D. from the Moscow Institute of Physics and Technology in Russia.

He was named a Howard Hughes Medical Institute Professor in 2006. He was elected a Fellow of the Association for Computing Machinery (2010) for "contribution to algorithms for genome rearrangements, DNA sequencing, and proteomics", a Fellow of the International Society for Computational Biology (2012), a member of the European Academy of Sciences (2016), and a Fellow of the American Association for the Advancement of Science (2018). He received an honorary doctorate (Honoris Causa, 2011) from Simon Fraser University in Vancouver, the Senior Scientist Award from the International Society for Computational Biology (2017), and the Kanellakis Theory and Practice Award from the Association for Computing Machinery (2019).

Dr. Pevzner co-authored the textbooks "Computational Molecular Biology: An Algorithmic Approach", "Introduction to Bioinformatics Algorithms", "Bioinformatics Algorithms: An Active Learning Approach", and "Learning Algorithms Through Programming and Puzzle Solving" (2019). In 2015, jointly with Phillip Compeau, he developed the Bioinformatics Specialization on Coursera. In 2016, he co-developed the Data Structures and Algorithms Specialization on Coursera and a MicroMasters program on edX.

Learn more at https://bioalgorithms.ucsd.edu/.

Event Type: 
Event