Abstract:

Humans learn to perceive the world through multiple modalities

including visual, auditory, and kinesthetic stimuli. The need for

perception is self-evident, whereas language is a human invention

for communication and documentation. Therefore, language and perception

lay the foundations for artificial intelligence, and grounding natural

language in real-world perception is a fundamental challenge for

enabling practical applications that require human-machine

communication.



In this talk, I will mainly present two of my research thrusts on

developing intelligent embodied agents that connect language, vision,

and actions, and that communicate with humans in the real world. First,

moving beyond natural language understanding from text-only corpora, I

have situated natural language inside interactive environments where

communication often takes place (language -> vision). I will discuss

how to effectively ground natural language instructions and visual

inputs to actions in real-world navigation tasks with reinforcement

learning and imitation learning. Second, in order to enable an agent to

describe its visual surroundings to humans (vision -> language), I will

explore challenges of language generation conditioned on visual context,

and present novel solutions from coarse-grained to fine-grained caption

generation, and then to humanlike story generation. Finally, I will

conclude with my future research plan.

Bio:

Xin Wang is a Ph.D. candidate at the University of California, Santa

Barbara. His research interests include natural language processing,

computer vision, and machine learning, especially the intersection of

them. He works on fundamental research directions that enable

intelligent embodied agents to communicate with humans in the real

world. He has published over 17 papers (including 7 oral presentations) at

top-tier CV, NLP, and ML venues such as CVPR, ICCV, ECCV, ACL, NAACL,

EMNLP, AAAI, and TPAMI. He received the CVPR Best Student Paper Award in

2019. Xin is also professionally active and has organized multiple

academic events on the topic of his research, including workshops at ACL

2020, CVPR 2020, and ICCV 2019, and a tutorial at AACL-IJCNLP 2020. He

also served as a session chair for the NLP session at AAAI 2019. He has

worked at Google AI, Facebook AI Research, Microsoft Research (Redmond),

and Adobe Research.

As a gentle reminder, please respect the privacy of faculty recruitment by not sharing the candidate status of our guests with anyone outside our organization.