Humans learn to perceive the world through multiple modalities,
including visual, auditory, and kinesthetic stimuli. The need for
perception is self-evident, while humans invented language
for communication and documentation. Language and perception therefore
lay the foundations for artificial intelligence, and grounding natural
language in real-world perception is a fundamental challenge for
practical applications that require human-machine
communication.
In this talk, I will mainly present two of my research thrusts on
developing intelligent embodied agents that connect language, vision,
and actions, and that communicate with humans in the real world. First,
moving beyond natural language understanding from text-only corpora, I
have situated natural language inside interactive environments where
communication often takes place (language → vision). I will discuss
how to effectively ground natural language instructions and visual
inputs to actions in real-world navigation tasks with reinforcement
learning and imitation learning. Second, to enable an agent to
describe its visual surroundings to humans (vision → language), I will
explore the challenges of language generation conditioned on visual context
and present novel solutions ranging from coarse-grained to fine-grained caption
generation, and then to human-like story generation. Finally, I will
conclude with my future research plan.
Xin Wang is a Ph.D. candidate at the University of California, Santa
Barbara. His research interests include natural language processing,
computer vision, and machine learning, especially their
intersection. He works on fundamental research directions that enable
intelligent embodied agents to communicate with humans in the real
world. He has published over 17 papers (including 7 oral presentations) at
top-tier CV, NLP, and ML venues such as CVPR, ICCV, ECCV, ACL, NAACL,
EMNLP, AAAI, and TPAMI. He received the CVPR Best Student Paper Award in
2019. Xin is also professionally active and has organized multiple
academic events on the topic of his research, including workshops at ACL
2020, CVPR 2020, and ICCV 2019, and a tutorial at AACL-IJCNLP 2020. He
also served as a session chair for the NLP session at AAAI 2019. He
has worked at Google AI, Facebook AI Research, Microsoft Research (Redmond),
and Adobe Research.
