Date: 2020-02-27

Abstract:

Genome analysis with next-generation short-read sequencing technology is limited to the unambiguous portions of the genome and is unable to resolve complex variants. Third-generation long-read sequencing technologies such as Oxford Nanopore are portable and affordable, and can provide better resolution of the complex regions of the genome. Recent advances in nanopore-based assembly tools have demonstrated the ability to assemble human-scale genomes efficiently. However, genome analysis using nanopore-based genome assemblies remains challenging due to the high error rate. For my thesis, I propose to develop a pipeline based on deep neural networks, designed to polish erroneous diploid human genome assemblies and then identify genomic variations. I have developed HELEN, a genome assembly polisher based on a recurrent neural network, that can improve the base quality of haploid genome assemblies. I have also developed a variant caller, FRIDAY, that uses a deep neural network to identify true variants. First, I intend to extend HELEN into a haplotype-aware error-correction method so that it can provide candidate variants for accurate variant calling. Next, I will use FRIDAY to classify which candidates are true variations in the genome. Finally, I aim to design an encoder based on a deep neural network that embeds the current signals of the Oxford Nanopore sequencing machine as features for downstream analysis tools.