Sharing Knowledge For Large Scale Visual Recognition

Speaker Name: 
Lamberto Ballan
Speaker Title: 
Postdoctoral Researcher / Artificial Intelligence Lab
Speaker Organization: 
Stanford University
Start Time: 
Wednesday, September 28, 2016 - 2:00pm
End Time: 
Wednesday, September 28, 2016 - 3:00pm
Location: 
E2 475
Organizer: 
Roberto Manduchi

Abstract

In this talk I'll present two models for "sharing" prior and contextual knowledge to solve large-scale visual recognition problems. In the first part of the talk, I'll show that images that are very difficult to recognize on their own may become clearer in the context of a neighborhood of related images with similar social-network metadata. We build on this intuition to improve multi-label image annotation and show state-of-the-art results on the NUS-WIDE dataset. Our model uses image metadata nonparametrically to generate neighborhoods of related images, then uses a deep neural network to blend visual information from the image and its neighbors.
In the second part of the talk, I'll present our recent work on knowledge transfer for scene-specific motion prediction. When given a single frame of a video, humans can not only interpret the content of the scene but also forecast the near future. This ability is mostly driven by their rich prior knowledge about the visual world, covering both the dynamics of moving agents and the semantics of the scene. We exploit the interplay between these two key elements to predict scene-specific motion patterns on a novel large dataset collected by UAV on the Stanford campus.

Bio

Lamberto Ballan is a senior postdoctoral researcher at Stanford University working in the Artificial Intelligence Lab, supported by a prestigious Marie Curie Fellowship from the European Commission. He received the Laurea and Ph.D. degrees in computer engineering in 2006 and 2011, both from the University of Florence, Italy. He was also a visiting scholar at the Signal and Image Processing department at Telecom ParisTech, Paris, in 2010. His research interests lie at the boundary of computer vision and multimedia, with a specific focus on exploiting big data for visual recognition problems. The primary aim of his current research is to design learning algorithms that make the most effective use of prior and contextual knowledge in the presence of sparse and noisy labels.