Human-Inspired Structured Prediction for Language and Biology

Speaker Name: 
Liang Huang
Speaker Title: 
Principal Scientist; Assistant Professor
Speaker Organization: 
Baidu Research USA; Oregon State University
Start Time: 
Wednesday, February 6, 2019 - 11:00am
End Time: 
Wednesday, February 6, 2019 - 12:15pm
Location: 
E2-599
Organizer: 
Luca de Alfaro

Abstract:

Human sentence processing is well known to be incremental and linear-time: you never wait until the end of a sentence to start parsing it. This is in sharp contrast with computer processing of natural language, which often needs a full sentence as input (and is thus not incremental) and is slow (superlinear time). Can we teach computers to understand, generate, and translate human languages the way we do every day? This talk presents two success stories of human-inspired incremental algorithms in NLP. First, we showcase our recent breakthrough in the extremely challenging task of simultaneous translation, where translation happens concurrently with the source-language speech. Inspired by human interpreters, I invented a prefix-to-prefix framework tailored to this problem that naturally enables anticipation and controllable latency. This result has been covered by numerous news media (https://simultrans-demo.github.io/). Second, inspired by psycholinguistics and compiler theory, I designed the first linear-time dynamic programming algorithm for incremental parsing. It searches over exponentially many candidates in linear time, mimicking local ambiguity packing in psycholinguistics and generalizing linear-time parsing from (unambiguous) programming languages to (ambiguous) human languages.

More interestingly, incrementality (and linear-time processing) is also ubiquitous in nature, where biological sequences such as proteins and RNAs fold incrementally (and instantly) while being generated. Thanks to a deep connection between natural language syntax and biological structures (both are modeled by context-free grammars), the same linear-time dynamic programming algorithm I developed for natural language parsing can be easily adapted to RNA structure prediction, yielding the very first linear-time prediction algorithm. The result is prediction that is orders of magnitude faster (and even slightly more accurate) than the standard cubic-time non-incremental algorithm (which also came from NLP). Our algorithm is being used by the Stanford Medical School to speed up RNA design for disease detection.

Bio:

Liang Huang is Principal Scientist at Baidu Research USA and Assistant Professor at Oregon State University. He received his PhD from the University of Pennsylvania in 2008 (under the late Aravind Joshi) and his BS from Shanghai Jiao Tong University in 2003. He has been a research scientist at Google, a research assistant professor at USC/ISI, an assistant professor at CUNY, a part-time research scientist at IBM, and an assistant professor at Oregon State University. He is a leading expert in natural language processing (NLP), where he is best known for his work on fast algorithms and provable theory in parsing, machine translation, and structured prediction. He received a (single-authored) Best Paper Award at ACL 2008, a Best Paper Honorable Mention at EMNLP 2016, several best paper nominations (ACL 2007, EMNLP 2008, ACL 2010, and SIGMOD 2018), two Google Faculty Research Awards (2010 and 2013), a Yahoo! Faculty Research Award (2015), and a University Teaching Prize at Penn (2005). The NLP group he runs at Oregon State University ranks 15th on csrankings.org. He also enjoys teaching algorithms and co-authored a best-selling textbook in China on algorithms for programming contests. As a professor, he is most proud of the four PhD students he has graduated and of his MS-level Algorithms class, which became the most popular graduate CS course everywhere he taught.