Advancement: Generalization of Neural Natural Language Generation Models

Speaker Name: 
Lena Reed
Speaker Title: 
PhD Student (Advisor: Marilyn Walker)
Speaker Organization: 
Computer Science
Start Time: 
Monday, December 3, 2018 - 9:00am
End Time: 
Monday, December 3, 2018 - 11:00am
Location: 
Engineering 2, Room 280
Organizer: 
Marilyn Walker

Abstract: Traditional statistical NLG systems require substantial hand-engineering: many of the components, such as content planners, sentence planners, and surface realizers, must be designed and created manually and updated whenever a new type of utterance is required. Neural natural language generation (NNLG) models, on the other hand, learn to generate text by processing massive amounts of data in end-to-end encoder-decoder frameworks, where syntactic properties are learned automatically by the model. Although most of the learning in NNLG models is automatic, they still require that the training data be collected and labeled, which is often a laborious process. The advantage of not needing handcrafted templates and syntax-to-semantics dictionaries may thereby be offset by the need to retrain neural models with new data as new domains are added, effectively replacing a knowledge bottleneck with a data bottleneck. To overcome the data bottleneck, we experiment with methods that leverage existing datasets to allow our NNLG models to generalize to novel meaning representations and sentence planning operations. We explore the generation of artificial data and the mixing of data from different sources as ways to augment the training data available to the NNLG. Given these augmentation methods, we evaluate whether NNLGs can learn stylistic and semantic generalizations. We propose experimenting with two types of generalization. The first is generalization of sentence planning operations. Sentence planning is the NLG module that determines how individual propositions are combined into sentences, which usually affects the final style of the realization.
Our work to date has shown that we can generalize specific NNLG sentence planning operations, such as sentence scoping and contrastive discourse structuring, beyond what was seen in the original training data. The second form of generalization we propose to study is the ability to use training data from different sources (restaurant data from different databases) to produce outputs for meaning representations different from those seen in either source alone. We will present several planned experiments that directly test whether different representations and architectures are needed to help models generalize. Characterizing the ability of NNLG models to generalize, and thereby expand beyond their original training data, will be an important extension of our results to date and a significant step toward making NNLG models more useful at low data cost.
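As a toy illustration of what sentence scoping controls, the sketch below realizes a simple restaurant meaning representation (MR) either as one aggregated sentence or as two separate sentences. The MR format and the `realize` helper are hypothetical examples for exposition, not the speaker's system:

```python
# Toy illustration of sentence scoping as a sentence-planning operation.
# The MR format and function below are hypothetical, not the speaker's system.

def realize(mr, scope="combine"):
    """Realize a restaurant MR as one sentence or two, depending on scoping."""
    props = [
        f"{mr['name']} serves {mr['food']} food",
        f"it is located in the {mr['area']}",
    ]
    if scope == "combine":  # one sentence: both propositions aggregated
        return props[0] + " and " + props[1] + "."
    # "split": each proposition becomes its own sentence
    second = props[1].replace("it is", "It is", 1)
    return props[0] + ". " + second + "."

mr = {"name": "Aromi", "food": "Italian", "area": "city centre"}
print(realize(mr, scope="combine"))
# Aromi serves Italian food and it is located in the city centre.
print(realize(mr, scope="split"))
# Aromi serves Italian food. It is located in the city centre.
```

A rule-based sentence planner makes this choice with hand-written logic like the above; the question posed in the abstract is whether an NNLG can learn such operations from (possibly augmented) data and apply them to MRs unseen in training.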