Research Topics
Visual Tracking, Stereo and Sparse IBR, Facial Modeling and Analysis, Image and Video Processing
Visual Tracking
Object Tracking with Dynamic Feature Graphs
A good object representation describes an object (or a set of objects)
with high-level features from all perspectives, both spatial and temporal.
Many representations have been proposed to model objects. They can
roughly be categorized as global and local representations. Examples of
global representations include color appearance models and subspace
methods (such as PCA). A local representation describes the object
using a set of local features, which are usually selected to
characterize object parts. This kind of representation usually
incorporates relations between local features to capture the object
structure; a typical example is the region adjacency graph. However,
relatively little work has been done on modeling dynamic changes of the
object model. The dynamic feature graph is designed as a representation
that models both the spatial and the temporal characteristics of an
object. Spatially, the object is represented as an attributed relational
graph, with features as nodes and their relations as edges. Temporally,
the graph adaptively updates itself to keep good features and eliminate
unstable ones.
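To make the spatial/temporal split concrete, here is a minimal Python sketch of an adaptively updated feature graph. It is not the authors' implementation: the node attribute (a 2-D position), the edge attribute (relative displacement), the nearest-neighbor matching, and the stability score with `gain`, `decay` and `min_score` are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    pos: tuple[float, float]  # hypothetical node attribute: feature location (x, y)
    score: float = 1.0        # hypothetical stability score


@dataclass
class FeatureGraph:
    nodes: dict[int, Node] = field(default_factory=dict)
    next_id: int = 0

    def add(self, pos):
        self.nodes[self.next_id] = Node(pos)
        self.next_id += 1

    def edges(self):
        """Spatial part: each node pair carries its relative displacement."""
        ids = sorted(self.nodes)
        return {(a, b): (self.nodes[b].pos[0] - self.nodes[a].pos[0],
                         self.nodes[b].pos[1] - self.nodes[a].pos[1])
                for i, a in enumerate(ids) for b in ids[i + 1:]}

    def update(self, observed, radius=2.0, gain=1.0, decay=1.0, min_score=0.0):
        """Temporal part: reward matched features, penalize and prune unstable ones."""
        unmatched = list(observed)
        for node in self.nodes.values():
            # Nearest still-unclaimed observation (greedy matching, an assumption).
            best = min(unmatched, key=lambda p: math.dist(p, node.pos), default=None)
            if best is not None and math.dist(best, node.pos) <= radius:
                node.pos = best
                node.score += gain
                unmatched.remove(best)
            else:
                node.score -= decay
        # Eliminate features whose stability has dropped too low.
        self.nodes = {k: n for k, n in self.nodes.items() if n.score > min_score}
        # Observations that matched nothing enter the graph as new candidate features.
        for p in unmatched:
            self.add(p)
```

In this sketch a feature that is missed once with the default parameters is dropped immediately; a slower `decay` would keep it through short occlusions.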
Trajectory-Based Multiple Object Tracking
Most tracking algorithms compute the maximum a posteriori (MAP) estimate
within a probabilistic framework based on the hidden Markov model (HMM),
where the distribution of the object state at the current time instant is
estimated from current and previous observations. However, this approach
is prone to errors caused by temporal distractions such as occlusion,
background clutter and multi-object confusion. In this paper we propose a
multiple object tracking algorithm that seeks the optimal state sequence,
that is, the one that maximizes the joint state-observation probability.
We call this algorithm trajectory tracking because it estimates the state
sequence, or trajectory, instead of only the current state. The algorithm
can track multiple objects whose number is unknown and varies during
tracking. We introduce an observation model composed of the original
image, the foreground mask given by background subtraction, and the
object detection map generated by an object detector. The image provides
the object appearance information. The foreground mask allows the
likelihood computation to consider the multi-object configuration as a
whole. The detection map consists of pixelwise object detection scores,
which drive the tracking algorithm to efficiently perform joint inference
on both the number of objects and their configurations.
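The difference between estimating only the current state and estimating the whole trajectory can be illustrated with the Viterbi algorithm on a toy hidden Markov model with a small discrete state space. This is a sketch under that simplifying assumption, not the paper's multi-object inference; the state, transition and emission tables used with it are hypothetical.

```python
import math


def viterbi(init, trans, emit, obs):
    """Return the state sequence maximizing the joint probability P(states, observations).

    init[s]     = P(s_0 = s)
    trans[s][t] = P(s_k = t | s_{k-1} = s)
    emit[s][o]  = P(o_k = o | s_k = s)
    All probabilities must be > 0 here because the recursion works in log space.
    """
    n = len(init)
    score = [math.log(init[s]) + math.log(emit[s][obs[0]]) for s in range(n)]
    back = []  # back[k][t] = best predecessor of state t at step k + 1
    for o in obs[1:]:
        prev, ptr, score = score, [], []
        for t in range(n):
            best = max(range(n), key=lambda s: prev[s] + math.log(trans[s][t]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans[best][t]) + math.log(emit[t][o]))
        back.append(ptr)
    # Backtrack from the best final state to recover the whole trajectory.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With "sticky" transitions such as `[[0.9, 0.1], [0.1, 0.9]]` and emissions `[[0.8, 0.2], [0.2, 0.8]]`, the observation sequence `[0, 0, 1, 0, 0]` decodes to `[0, 0, 0, 0, 0]`: the single contradictory observation, a stand-in for a temporal distraction, does not flip the trajectory, whereas a decision based only on that frame's observation would.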
Background layer model for tracking through occlusion
In this work, we extend previous research on layer-based tracking by
introducing the concept of background occluding layers and by explicitly
inferring the depth ordering of foreground layers. Experimental results
show that under various occlusion conditions, including moving objects
that undergo complex motions or interact in complex ways, our tracking
algorithm reliably handles many difficult tracking tasks.
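Once a depth order is known, deciding which layer owns each pixel is a painter's-algorithm composite. The 1-D sketch below assumes binary masks and a known depth rank per layer; the layer names used with it, including a static "pillar" standing in for a background occluding layer, are hypothetical, and inferring the order itself is not shown.

```python
def visible_owner(masks, depth, background="bg"):
    """Return, for each pixel, the id of the nearest layer covering it.

    masks: dict layer_id -> list of 0/1 per pixel (same length for every layer)
    depth: dict layer_id -> depth rank (smaller = closer to the camera)
    Pixels covered by no layer are assigned `background`.
    """
    width = len(next(iter(masks.values())))
    owner = [background] * width
    # Paint far-to-near so nearer layers overwrite farther ones.
    for lid in sorted(masks, key=lambda k: depth[k], reverse=True):
        for x, covered in enumerate(masks[lid]):
            if covered:
                owner[x] = lid
    return owner
```

For example, with a person (rank 1) partly in front of a car (rank 2) and a static pillar (rank 0) in front of both, the pillar pixel is owned by the pillar even though both moving objects cover it.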
Dynamic layer representation and its applications to tracking
A dynamic layer representation is proposed for tracking moving objects.
Previous work on layered representations has largely concentrated on
two-frame or multi-frame batch formulations, and tracking research has
not addressed the joint estimation of object motion, ownership and
appearance. This work extends layer estimation in a dynamic scene to an
incremental formulation and demonstrates how this naturally solves the
tracking problem.
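A minimal sketch of one incremental step for a single 1-D layer: estimate the motion with an ownership-weighted SSD search, then blend the motion-compensated frame into the appearance template. The integer-shift motion model, the fixed per-pixel ownership weights and the blending rate `alpha` are assumptions for illustration; in the described work, ownership is estimated jointly with motion and appearance.

```python
def estimate_shift(template, frame, owner, max_shift):
    """Integer shift minimizing the ownership-weighted SSD between template and frame."""
    best, best_cost = 0, float("inf")
    for d in range(-max_shift, max_shift + 1):
        cost, weight = 0.0, 0.0
        for x, (a, w) in enumerate(zip(template, owner)):
            if 0 <= x + d < len(frame):
                cost += w * (frame[x + d] - a) ** 2
                weight += w
        # Normalize by total ownership so shifts near the border are not favored.
        if weight > 0 and cost / weight < best_cost:
            best, best_cost = d, cost / weight
    return best


def update_appearance(template, frame, owner, shift, alpha=0.5):
    """Blend the motion-compensated frame into the template, weighted by ownership."""
    out = list(template)
    for x in range(len(template)):
        if 0 <= x + shift < len(frame):
            k = alpha * owner[x]  # pixels the layer does not own are left unchanged
            out[x] = (1 - k) * template[x] + k * frame[x + shift]
    return out
```

Calling `estimate_shift` and then `update_appearance` once per frame gives the incremental loop; the estimated shift also serves as the tracked motion.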
Sampling methods for tracking and detecting multiple objects
The CONDENSATION algorithm and its variants enable the estimation of
arbitrary multi-modal posterior distributions that can potentially
represent multiple tracked objects. However, the specific state
representation adopted in earlier work does not explicitly support
counting objects, adding and deleting them, or handling their occlusion.
Furthermore, as estimation progresses over many frames, this
representation may increasingly bias the posterior density estimate
toward the objects with dominant likelihood.
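For reference, one CONDENSATION cycle (resample, predict, reweight) for a single object with a 1-D position state can be sketched as below. The random-walk dynamics, the Gaussian likelihood and the noise levels are assumptions. The sketch deliberately uses the simple single-object state, so it does not address counting, addition, deletion or occlusion.

```python
import math
import random


def condensation_step(particles, weights, z, rng, motion_std=0.5, obs_std=1.0):
    """One CONDENSATION iteration for a 1-D position state.

    1. select:  resample particles in proportion to their weights
    2. predict: apply a random-walk dynamic model (an assumption here)
    3. measure: reweight with a Gaussian likelihood of observation z (also assumed)
    """
    n = len(particles)
    chosen = rng.choices(particles, weights=weights, k=n)
    predicted = [x + rng.gauss(0.0, motion_std) for x in chosen]
    new_w = [math.exp(-0.5 * ((z - x) / obs_std) ** 2) for x in predicted]
    total = sum(new_w)
    return predicted, [w / total for w in new_w]


def estimate(particles, weights):
    """Posterior mean of the weighted particle set."""
    return sum(x * w for x, w in zip(particles, weights))
```

Starting from particles spread uniformly over the image, a few observations near one position concentrate the particle set there. If two objects produced observations, the same loop could keep two modes, but nothing in this state representation says how many objects there are.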
Stereo Computation and Image-Based New View Rendering
Learning-Based Stereo
This paper describes a novel learning-based approach for improving the
performance of stereo computation. The behavior of a given window-based
matching method is characterized by whether the matching scores lead to
the true depth, the nearby foreground depth, or random depth values. The
probabilities that a matching result falls into each of these three
categories are determined by the original stereo images, the underlying
scene structure and the size of the matching window. This conditional
probability is learned from training data and integrated into a depth
estimation algorithm using the MAP-MRF framework. Preliminary
experimental results show that the learning process captures common
errors in SSD matching, including the fattening effect, the aperture
effect, and mismatches in occluded or low-texture regions. The results
also demonstrate that the proposed approach significantly improves the
accuracy of the depth computation.
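The window-based SSD matching whose error modes the learning stage characterizes can be sketched along one rectified scanline. This is only a baseline winner-take-all matcher under that assumption; the learned conditional probabilities and the MAP-MRF integration are not shown, and the window size and disparity range are illustrative parameters.

```python
def ssd_disparity(left, right, max_disp, half_win=1):
    """Winner-take-all SSD matching along one rectified scanline.

    For each left pixel x, pick the disparity d in [0, max_disp] that minimizes the
    mean squared difference between left[x-h..x+h] and right[x-d-h..x-d+h].
    Window samples falling outside either image are skipped.
    """
    width = len(left)
    disp = []
    for x in range(width):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            cost, count = 0.0, 0
            for k in range(-half_win, half_win + 1):
                xl, xr = x + k, x + k - d
                if 0 <= xl < width and 0 <= xr < width:
                    cost += (left[xl] - right[xr]) ** 2
                    count += 1
            # Strict '<' breaks ties toward the smallest disparity.
            if count and cost / count < best_cost:
                best_d, best_cost = d, cost / count
        disp.append(best_d)
    return disp
```

On a textured stretch of the scanline the true disparity gives zero cost and wins clearly. In flat regions several disparities can tie, and the arbitrary tie-break is one source of the "random depth" category described above.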
Direct range space rendering
We propose an algorithm that addresses the sparse image-based rendering
(IBR) problem. Unlike traditional stereo or sparse IBR approaches, our
method does not explicitly recover the scene geometry or the pixel-wise
correspondences between the two images. Instead, we solve the problem
with a range space rendering algorithm, in which depth information is
computed only implicitly for each new view. We show that the rendering
results are good even when the local depth maps are not recovered
correctly.
Depth recovery from unsynchronized cameras
An algorithm is proposed for estimating dense depth information of
dynamic scenes from multiple video streams captured by unsynchronized,
fixed cameras. We solve this problem by first imposing two assumptions,
one about the scene motion and one about the time difference between
cameras. The scene motion is represented by a local constant-velocity
model, and the temporal difference between cameras is modeled as a
constant within a short period of time. Based on these models, we
investigate the geometric relations among the images of moving scene
points, the scene depth, the scene motion, and the camera temporal
offset, and we develop a method to estimate the camera temporal
difference. The algorithm is tested on both synthetic data and real
images. Promising quantitative and qualitative experimental results are
demonstrated in the paper.
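Under the two stated models the offset enters linearly: if a point moves with locally constant velocity v and camera B samples δ later than camera A, then B sees x + vδ where A sees x. The toy sketch below estimates δ by least squares from that relation. It assumes, unrealistically, that both cameras' measurements are already expressed in a common coordinate frame, so it omits the depth and projection geometry that the described method relates to the offset.

```python
def estimate_offset(track_a, track_b, dt):
    """Least-squares time offset between two sampled tracks of moving points.

    track_a[i][k], track_b[i][k]: (x, y) of point i at frame k in camera A / B,
    assumed (toy simplification) to share one coordinate frame.
    Camera B samples at t_k + delta. With locally constant velocity v,
    b = a + v * delta, hence delta = sum(v . (b - a)) / sum(v . v).
    """
    num = den = 0.0
    for a, b in zip(track_a, track_b):
        for k in range(len(a) - 1):
            # Forward-difference velocity from camera A's own samples.
            vx = (a[k + 1][0] - a[k][0]) / dt
            vy = (a[k + 1][1] - a[k][1]) / dt
            num += vx * (b[k][0] - a[k][0]) + vy * (b[k][1] - a[k][1])
            den += vx * vx + vy * vy
    return num / den
```

The estimate is only meaningful for short windows where the constant-velocity and constant-offset assumptions both hold, and it is undefined if no point moves (`den` would be zero).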
Dynamic depth recovery from synchronized video streams
This work addresses the problem of extracting depth information of
nonrigid dynamic 3D scenes from multiple synchronized video streams.
Three main issues are discussed in this context: (i) temporally
consistent depth estimation, (ii) sharp depth discontinuity estimation
around object boundaries, and (iii) enforcement of the global visibility
constraint. We present a framework in which the scene is modeled as a
collection of 3D piecewise planar surface patches induced by color-based
image segmentation. This representation is continuously estimated using
an incremental formulation in which the 3D geometric, motion, and global
visibility constraints are enforced over space and time. The proposed
algorithm optimizes a cost function that incorporates the spatial color
consistency constraint and a smooth scene motion model.
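For a rectified stereo pair, a 3-D plane induces a disparity that is an affine function of image position, d = a·x + b·y + c, so each segment's planar patch can be summarized by three coefficients. The sketch below fits them by least squares from (x, y, d) samples in one segment. It illustrates that parameterization only; it is not the described incremental estimator and ignores the motion and visibility constraints.

```python
def fit_plane(points):
    """Least-squares plane d = a*x + b*y + c through (x, y, d) samples of one segment."""
    # Accumulate the normal equations M @ [a, b, c] = r.
    M = [[0.0] * 3 for _ in range(3)]
    r = [0.0] * 3
    for x, y, d in points:
        row = (x, y, 1.0)
        for i in range(3):
            r[i] += row[i] * d
            for j in range(3):
                M[i][j] += row[i] * row[j]
    # Gaussian elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        r[col], r[piv] = r[piv], r[col]
        for i in range(col + 1, 3):
            f = M[i][col] / M[col][col]
            for j in range(col, 3):
                M[i][j] -= f * M[col][j]
            r[i] -= f * r[col]
    # Back substitution.
    coef = [0.0] * 3
    for i in (2, 1, 0):
        coef[i] = (r[i] - sum(M[i][j] * coef[j] for j in range(i + 1, 3))) / M[i][i]
    return tuple(coef)
```

The fit needs at least three non-collinear samples. Once a segment has its plane, the disparity at any of its pixels, including pixels with no reliable match, follows from evaluating a·x + b·y + c.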
Color segmentation based stereo and the global matching criteria
We present a new analysis-by-synthesis computational framework for
stereo vision. It is designed to achieve the following goals: (1)
enforcing global visibility constraints, (2) obtaining reliable depth at
depth boundaries and in thin structures, (3) obtaining correct depth in
textureless regions, and (4) hypothesizing correct depth for unmatched
regions. The framework employs depth- and visibility-based rendering
within a global matching criterion to compute depth, in contrast with
approaches that rely on local matching measures and relaxation. A
color-segmentation-based depth representation guarantees smoothness in
textureless regions. Hypothesizing depth from neighboring segments
enables propagation of correct depth and produces reasonable depth
values for unmatched regions. A practical algorithm that integrates all
these aspects is presented in this paper. Comparative experimental
results are shown for real images. Results on new view rendering based
on a single stereo pair are also demonstrated.
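Propagating depth from neighboring segments can be illustrated with a per-segment hypothesis test: each segment compares its own disparity with those of its neighbors and keeps the one with the lowest warping error. In this 1-D sketch, the mean absolute difference is a local stand-in for the paper's global, rendering-based matching criterion, and the segments, disparities and adjacency are hypothetical inputs.

```python
def segment_cost(left, right, pixels, d):
    """Mean absolute difference after warping a segment's pixels by disparity d."""
    errs = [abs(left[x] - right[x - d]) for x in pixels if 0 <= x - d < len(right)]
    return sum(errs) / len(errs) if errs else float("inf")


def refine_segments(left, right, segments, disp, neighbors):
    """For each segment, test its own disparity and its neighbors' and keep the best.

    segments:  dict seg_id -> list of pixel columns on a rectified scanline
    disp:      dict seg_id -> current disparity hypothesis (None if unmatched)
    neighbors: dict seg_id -> list of adjacent seg_ids
    """
    out = dict(disp)
    for s, pixels in segments.items():
        cands = {disp[s]} | {disp[n] for n in neighbors[s]}
        cands.discard(None)  # unmatched segments contribute no hypothesis
        if cands:
            # sorted() makes tie-breaking deterministic (smallest disparity first).
            out[s] = min(sorted(cands), key=lambda d: segment_cost(left, right, pixels, d))
    return out
```

For example, with segments A (correct), B (wrong disparity) and C (unmatched) in a row, one pass fixes B from A's hypothesis, while C only receives the correct value on a second pass, after B's corrected disparity is available. Repeating the pass is what lets correct depth spread across segments.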
Image and Video Processing
Image hallucination with primal sketch priors (Collaboration with Microsoft Research Asia)
We propose a Bayesian approach to image hallucination. Given a generic
low-resolution image, we hallucinate a high-resolution image using a set
of training images. Our work is inspired by recent progress in natural
image statistics showing that the priors of image primitives can be
well represented by examples. Specifically, primal sketch priors (e.g.,
edges, ridges and corners) are constructed and used to enhance the
quality of the hallucinated high-resolution image. Moreover, a contour
smoothness constraint, enforced by a Markov-chain-based inference
algorithm, keeps the primitives in the hallucinated image consistent. A
reconstruction constraint is also applied to further improve the quality
of the hallucinated image. Experiments demonstrate that our approach can
hallucinate high-quality super-resolution images.
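The text does not say how the reconstruction constraint is enforced. One standard way is iterative back-projection: repeatedly downsample the high-resolution estimate, compare it with the low-resolution input, and add the residual back. The sketch assumes a 1-D signal and a box-filter downsampling model; both are illustrative choices, not the authors' imaging model.

```python
def downsample(hr, factor=2):
    """Box-filter downsampling (assumed imaging model): average each block of samples."""
    return [sum(hr[i:i + factor]) / factor for i in range(0, len(hr), factor)]


def back_project(hr, lr, factor=2, iters=10, step=1.0):
    """Push a high-res estimate toward satisfying downsample(hr) == lr.

    Requires len(hr) == factor * len(lr).
    """
    hr = list(hr)
    for _ in range(iters):
        err = [l - s for l, s in zip(lr, downsample(hr, factor))]
        # Spread each low-res residual back over the high-res samples it covers.
        hr = [h + step * err[i // factor] for i, h in enumerate(hr)]
    return hr
```

For example, a hallucinated estimate `[8, 14, 25, 19]` for the input `[10, 20]` becomes `[7.0, 13.0, 23.0, 17.0]`: the detail inside each block (the 6-unit step from 8 to 14) is kept, while the block means now match the input. With this particular box model and `step=1.0` the constraint is met after one iteration; blurrier imaging models need several.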
Facial Modeling, Animation, Analysis, and Transmission
The piecewise Bézier volume deformation model (PBVD)
Capturing real facial motions from videos enables automated construction
of dynamic models for facial animation. We proposed an explanation-based
facial motion tracking algorithm based on a piecewise Bézier volume
deformation model (PBVD). The PBVD is suitable for both synthesis and
analysis of facial images. With this model, basic facial movements, or
action units, are first defined interactively. Then, various facial
movements are synthesized by linearly combining these action units. The
magnitudes of the action units can be estimated from real videos using a
model-based tracking algorithm. The predefined PBVD action units may also
be adaptively modified to customize the dynamic model for a particular
face. Experimental results on PBVD-based animation, model-based
tracking, and explanation-based tracking are demonstrated.
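A Bézier volume maps parameters (u, v, w) to 3-D points through Bernstein-weighted control points, so moving control points deforms every embedded mesh vertex smoothly. The sketch evaluates a single trivariate volume and applies a linear combination of action units as control-point displacements. The action-unit arrays and magnitudes are hypothetical, and the piecewise structure of the PBVD (many volumes stitched together) is not modeled.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial B_{i,n}(t)."""
    return comb(n, i) * t ** i * (1 - t) ** (n - i)


def bezier_volume(ctrl, u, v, w):
    """Evaluate a trivariate Bezier volume at (u, v, w) in [0, 1]^3.

    ctrl[i][j][k] is a 3-D control point; the degrees follow from the array shape.
    """
    n, m, l = len(ctrl) - 1, len(ctrl[0]) - 1, len(ctrl[0][0]) - 1
    p = [0.0, 0.0, 0.0]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(l + 1):
                b = bernstein(n, i, u) * bernstein(m, j, v) * bernstein(l, k, w)
                for c in range(3):
                    p[c] += b * ctrl[i][j][k][c]
    return p


def deform(ctrl, action_units, magnitudes):
    """Control points displaced by sum_a magnitudes[a] * action_units[a] (hypothetical AUs)."""
    return [[[[ctrl[i][j][k][c]
               + sum(mag * au[i][j][k][c] for au, mag in zip(action_units, magnitudes))
               for c in range(3)]
              for k in range(len(ctrl[i][j]))]
             for j in range(len(ctrl[i]))]
            for i in range(len(ctrl))]
```

Because a vertex's position is linear in the control points, and therefore in the action-unit magnitudes, the magnitudes can in principle be recovered from tracked vertex displacements by linear least squares.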