| Complete list of publications | |||||
Learning Outdoor Color Classification IEEE Trans. PAMI, November 2006 We present an algorithm for color classification with explicit illuminant estimation and compensation. A Gaussian classifier is trained with color samples from just one training image. Then, using a simple diagonal illumination model, the illuminants in a new scene that contains some of the surface classes seen in the training image are estimated in a Maximum Likelihood framework using the Expectation Maximization algorithm. We also show how to impose priors on the illuminants, effectively computing a Maximum A Posteriori estimate. Experimental results are provided to demonstrate the performance of our classification algorithm in the case of outdoor images. PDF Version (1.2 MB) |
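The diagonal illumination model mentioned in this abstract scales each color channel by its own gain. The following is a minimal sketch of that model only, not the paper's EM procedure: it assumes the class of every pixel is already known, so the per-channel gains reduce to a closed-form least-squares fit. The function names (`apply_illuminant`, `compensate`, `estimate_gains`) and the sample colors are hypothetical.

```python
def apply_illuminant(color, gains):
    """Diagonal model: each channel is scaled by its own gain."""
    return tuple(g * c for g, c in zip(gains, color))


def compensate(color, gains):
    """Undo a diagonal illuminant by dividing each channel by its gain."""
    return tuple(c / g for g, c in zip(gains, color))


def estimate_gains(reference_colors, observed_colors):
    """Per-channel least-squares gain, assuming known correspondences.

    reference_colors: colors as seen under the training illuminant.
    observed_colors:  the same surfaces under the new illuminant.
    This stands in for a single step with hard class assignments; the
    paper instead estimates illuminants with EM over soft assignments.
    """
    gains = []
    for k in range(3):
        num = sum(r[k] * o[k] for r, o in zip(reference_colors, observed_colors))
        den = sum(r[k] * r[k] for r in reference_colors)
        gains.append(num / den)
    return tuple(gains)
```

With noise-free observations the fit recovers the gains exactly (up to rounding), and `compensate` maps the new-scene colors back to the training colors so that a classifier trained on the first image can be reused.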
|
Fast Image Motion Computation on an Embedded Computer with X. Lu ECV 2006 Wireless, battery-powered camera networks are becoming of interest for surveillance and monitoring applications. The computational power of these platforms is often limited in order to reduce energy consumption. Among the visual tasks that the onboard processor may be required to perform, motion analysis is one of the most basic and relevant. Knowledge of the direction of motion and velocity of a moving body may be used to take actions such as sending an alarm or triggering other camera nodes in the network. We present a fast algorithm for identifying moving areas in an image and computing the average velocity in such areas. The algorithm, which was implemented and tested on a Crossbow Stargate embedded platform, comprises three stages. First, local differential measurements are used to determine an initial labeling of image blocks. A total least squares approach is proposed, with fast implementation inspired by the work of Benedetti and Perona. Then, belief propagation is used to impose spatial coherence and resolve the aperture effect inherent in textureless areas. Finally, the velocity of the resulting blobs is estimated via least squares regression. A detailed analysis of timing and power consumption characteristics of this algorithm is also presented. PDF Version (300 KB) |
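The final stage described above, least squares regression of a blob's velocity from differential measurements, can be illustrated with the standard brightness-constancy constraint `Ix*u + Iy*v + It = 0`. This is a hedged sketch under that assumption, using ordinary least squares over one blob; it does not reproduce the paper's total least squares labeling or its belief propagation stage. The function name and the determinant threshold are hypothetical.

```python
def estimate_velocity(ix, iy, it, min_det=1e-12):
    """Least-squares velocity (u, v) from spatial and temporal derivatives.

    ix, iy, it: sequences of derivatives sampled at the pixels of one blob.
    Solves the 2x2 normal equations of  Ix*u + Iy*v = -It.
    Returns None when the system is degenerate, i.e. when the gradients
    all point the same way and only one velocity component is observable.
    """
    sxx = sum(a * a for a in ix)
    syy = sum(b * b for b in iy)
    sxy = sum(a * b for a, b in zip(ix, iy))
    sxt = sum(a * c for a, c in zip(ix, it))
    syt = sum(b * c for b, c in zip(iy, it))
    det = sxx * syy - sxy * sxy
    if abs(det) < min_det:
        return None
    # Cramer's rule on [[sxx, sxy], [sxy, syy]] [u, v]^T = [-sxt, -syt]^T
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v
```

The `None` branch corresponds to the aperture effect mentioned in the abstract: with gradients along a single direction, the normal matrix is singular and the full velocity cannot be recovered from local measurements alone.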
|
Cell Phone-based Wayfinding for the Visually Impaired with J. Coughlan and X. Shen IMV 2006 A major challenge faced by the blind and visually impaired population is that of wayfinding: the ability of a person to find his or her way to a given destination. We propose a new wayfinding aid based on a camera cell phone, which is held by the user to find and read aloud specially designed machine-readable signs in the environment (labeling locations such as offices and restrooms). Our main technical innovation is that we have designed these machine-readable signs to be detected and located in fractions of a second on the cell phone CPU, even at a distance of several meters. A linear barcode printed on the sign is read using novel decoding algorithms that are robust to noisy images. The information read from the barcode is then read aloud using pre-recorded or synthetic speech. We have implemented a prototype system on the Nokia 7610 cell phone, and preliminary experiments with blind subjects demonstrate the feasibility of using the system as a real-time wayfinding aid. PDF Version (300 KB) |
|
Rotational Invariant Operators based on Steerable Filter Banks with X. Shi, A.L. Ribeiro Castro, and R. Montgomery IEEE Signal Processing Letters We introduce a technique for designing rotation invariant operators based on steerable filter banks. Steerable filters are widely used in Computer Vision as local descriptors for texture analysis. Rotation invariance has been shown to improve texture-based classification in certain contexts. Our approach to invariance is based on solving the PDE associated with the formulation of invariance in a Lie group framework. PDF Version (360 KB) |
|
Energy Consumption Tradeoffs in Visual Sensor Networks with C. Margi and K. Obraczka SBRC 2006 Visual sensor networks are being increasingly employed as a tool for monitoring and surveillance of wide areas. Due to the relatively high power consumption characteristics of cameras, as well as their more stringent processing, storage, and communication requirements, it is important to carefully evaluate the different modes of operation of these systems, in order to devise energy-aware resource management policies. The ultimate goal is to deliver adequate application-level performance (e.g., maximize the probability of detecting events), while maximally prolonging the system's operational lifetime. In this paper we present an accurate power consumption analysis for the different elementary tasks forming the duty cycle of a visual sensor node in a wireless camera network testbed. We also present a number of different duty cycle configurations, and provide direct energy consumption measurements for each one of them. Based on the energy consumption characterization we conducted for the elementary visual sensing tasks, we explore the possibility of predicting the lifetime of a visual sensor network system. PDF Version (250 KB) |
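The lifetime prediction described in this abstract can be illustrated with simple arithmetic: add up the energy of the elementary tasks in one duty cycle, then divide the available battery energy by it. The sketch below assumes an ideal battery and constant per-task power. The task list and all numbers are hypothetical placeholders, not measurements from the paper.

```python
def energy_per_cycle(tasks):
    """Energy in joules of one duty cycle.

    tasks: list of (power_watts, duration_seconds) pairs, one per
    elementary task (e.g. capture, process, transmit, sleep).
    """
    return sum(power * duration for power, duration in tasks)


def predicted_lifetime(battery_joules, tasks):
    """Predicted lifetime in seconds, assuming an ideal battery."""
    period = sum(duration for _, duration in tasks)
    cycles = battery_joules / energy_per_cycle(tasks)
    return cycles * period


# Hypothetical duty cycle: 1 s of active sensing at 2.0 W, 9 s asleep at 0.5 W.
example_cycle = [(2.0, 1.0), (0.5, 9.0)]
example_lifetime = predicted_lifetime(6500.0, example_cycle)
```

Comparing `predicted_lifetime` across several candidate duty cycles is one way to weigh event-detection performance against operational lifetime.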
|
Hybrid Joint-Separable Multibody Tracking with O. Lanz CVPR 2005 Statistical models for tracking different moving bodies must be able to reason about occlusions in order to be effective. Representing the joint statistics across different bodies is computationally hard, since the size of the representation grows exponentially with the number of bodies being tracked. Separable tracking, with one tracker per body, cannot deal with occlusions effectively. We propose a new model, dubbed Hybrid Joint-Separable (HJS), that uses a representation size that grows linearly with the number of bodies, and a computational complexity that grows quadratically. This model can reason explicitly about occlusions. We describe a particle filter implementation of this model, and present promising experimental results. PDF Version (1.2 MB) |
|
Detection and Localization of Curbs and Stairways Using Stereo Vision with X. Lu ICRA 2005 We present algorithms to detect and precisely localize curbs and stairways for autonomous navigation. These algorithms combine brightness information (in the form of edgels) with 3-D data from a commercial stereo system. The overall system (including stereo computation) runs at about 4 Hz on a 1 GHz laptop. We show experimental results and discuss advantages and shortcomings of our approach. PDF Version (508 KB) |
|
Obstacle Detection and Terrain Classification for Autonomous Off-Road Navigation with A. Castano, A. Talukder, L. Matthies Autonomous Robots, 18:81-102, 2005 Autonomous navigation in cross-country environments presents many new challenges with respect to more traditional, urban environments. The lack of highly structured components in the scene complicates the design of even basic functionalities such as obstacle detection. In addition to the geometric description of the scene, terrain typing is also an important component of the perceptual system. Recognizing the different classes of terrain and obstacles enables the path planner to choose the most efficient route toward the desired goal. This paper presents new sensor processing algorithms that are suitable for cross-country autonomous navigation. We consider two sensor systems that complement each other in an ideal sensor suite: a color stereo camera, and a single axis ladar. We propose an obstacle detection technique, based on stereo range measurements, that does not rely on typical structural assumptions on the scene (such as the presence of a visible ground plane); a color-based classification system to label the detected obstacles according to a set of terrain classes; and an algorithm for the analysis of ladar data that allows one to discriminate between grass and obstacles (such as tree trunks or rocks), even when such obstacles are partially hidden in the grass. These algorithms have been developed and implemented by the Jet Propulsion Laboratory (JPL) as part of its involvement in a number of projects sponsored by the US Department of Defense, and have enabled safe autonomous navigation in highly vegetated, off-road terrain. PDF Version (4.7 MB) |
|
A Tool for Range Sensing and Environment Discovery for the Blind with D. Yuan IEEE Workshop on Real-Time 3-D Sensors and Their Use, 2004 This paper describes the development of a hand-held environment discovery tool for the blind. The final device will be composed of a laser-based range sensor and of an onboard processor. As the user swings the hand-held system around, he/she will receive local range information by means of a tactile interface. In addition, the time profile of the range will be analyzed by the onboard processor to detect environmental features that are critical for mobility, such as curbs, steps and drop-offs. In our current implementation, range is collected by a short-baseline triangulation system formed by a point laser and a miniaturized camera, producing readings at frame rate. An Extended Kalman filter is used to track the range data and detect environmental features of interest. PDF Version (1.1 MB) |
|
Wide-Baseline Feature Matching Using the Cross-Epipolar Ordering Constraint with X. Lu CVPR 2004 Robust feature matching across different views of the same scene taken by two cameras with wide baseline and arbitrary rotation is still an open problem. Matching based on appearance alone is unreliable, because a surface point changes its appearance depending on the viewpoint. As a result, this approach may generate "unrealizable" correspondence sets, that is, sets of pairwise correspondences that are not consistent with the epipolar geometry. We propose a novel technique, which is applicable when the epipolar lines in the two images are approximately parallel. Our algorithm only searches in the space of realizable matchings, thereby reducing the likelihood of mismatches as well as the dimension of the search domain. This approach can use any given feature descriptor with an associated distance function, and assumes no knowledge of the intrinsic parameters of the two cameras. The extension to the general case of epipoles in finite position is the object of current research. PDF Version (1.1 MB) |
|
Probabilistic 3D Data Fusion for Multiresolution Surface Generation with A. Johnson 3DPVT 2002 In this paper we present an algorithm for multiresolution integration of 3D data collected from multiple distributed sensors. The input to the algorithm is a set of 3D surface samples and associated sensor models. Using a probabilistic rule, a surface probability function is generated that represents the probability that a particular volume of space contains the surface. The surface probability function is represented using an octree data structure; regions of space with samples of large covariance are stored at a coarser level than regions of space containing samples with smaller covariance. The algorithm outputs a multiresolution surface generated by connecting points that lie on the ridge of surface probability with triangles scaled to match the local discretization of space given by the octree. To demonstrate the performance of our algorithm, we present results from 3D data generated by scanning lidar and structure from motion. PDF Version (4.3 MB) |
|
Onboard Science Processing and Buffer Management for Intelligent Deep Space Communications with S. Dolinar, A. Matache, F. Pollara IEEE Aerospace 2000 We present an integrated system for the intelligent progressive transmission of data for deep space communications. This work is motivated by the realization that much more information can be collected by imaging and remote sensing equipment than can be transmitted through downlink channels. Suitable onboard science processing allows us to introduce semantics to the data collected by the imaging and remote sensing equipment. The data stream is then prioritized according to its significance in the image, and the most significant segments of data are transmitted first by means of a prioritized buffer management strategy. We show that this system allows optimal exploitation of the limited onboard resources (downlink data rate, buffer size) and therefore maximizes the science return of a mission. PDF Version (848 KB) |
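A prioritized buffer of the kind described above can be sketched with a bounded heap: when the buffer is full, the least significant segment is discarded, and each downlink opportunity sends the most significant segments first. This is an illustrative sketch only. The class `PriorityBuffer`, its eviction rule, and the tie-breaking by arrival order are assumptions, not the strategy analyzed in the paper.

```python
import heapq
import itertools


class PriorityBuffer:
    """Fixed-capacity buffer that keeps the most significant segments."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []                    # min-heap: least significant on top
        self._counter = itertools.count()  # arrival order, used to break ties

    def push(self, priority, payload):
        """Store a segment; return the payload that was dropped, if any."""
        # Negated arrival index: among equal priorities, older segments rank higher.
        item = (priority, -next(self._counter), payload)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
            return None
        # Buffer full: keep the larger items, drop the smallest one
        # (which may be the new segment itself).
        dropped = heapq.heappushpop(self._heap, item)
        return dropped[2]

    def drain(self, budget):
        """Remove and return up to `budget` payloads, most significant first."""
        chosen = heapq.nlargest(budget, self._heap)
        chosen_set = set(id(item) for item in chosen)
        self._heap = [item for item in self._heap if id(item) not in chosen_set]
        heapq.heapify(self._heap)
        return [payload for _, _, payload in chosen]
```

Here `capacity` plays the role of the onboard buffer size and `budget` the number of segments that fit in one downlink pass, the two limited resources named in the abstract.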
|
Terrain Perception for DEMO III with P. Bellutta, L. Matthies, K. Owens, A. Rankin Intelligent Vehicle Symposium 2000 The Demo III program has as its primary focus the development of autonomous mobility for a small rugged cross country vehicle. Enabling vision based terrain perception technology for classification of scene geometry and material is currently under development at JPL. In this paper we report recent progress on both stereo-based obstacle detection and terrain cover color-based classification. Our experiments show that the integration of geometric description and terrain cover characterization may be the key to enabling successful autonomous navigation in cross-country vegetated terrain. PDF Version (256 KB) |
|
Independent Component Analysis of Textures with J. Portilla ICCV 1999 A common method for texture representation is to use the marginal probability densities over the outputs of a set of multi-orientation, multi-scale filters as a description of the texture. We propose a technique, based on Independent Components Analysis, for choosing the set of filters that yield the most informative marginals, meaning that the product over the marginals most closely approximates the joint probability density function of the filter outputs. The algorithm is implemented using a steerable filter space. Experiments involving both texture classification and synthesis show that compared to Principal Components Analysis, ICA provides superior performance for modeling of natural and synthetic textures. PDF Version (228 KB) |
|
Bayesian Fusion of Color and Texture Segmentations ICCV 1999 In many applications one would like to use information from both color and texture features in order to segment an image. We propose a novel technique to combine "soft" segmentations computed for two or more features independently. Our algorithm merges models according to a maximum descriptiveness criterion, and allows one to choose any number of classes for the final grouping. This technique can also improve the quality of supervised classification based on one feature (e.g., color) by merging information from unsupervised segmentation based on another feature (e.g., texture). PDF Version (208 KB) |
|
Bilateral Filtering for Gray and Color Images with C. Tomasi ICCV 1998 Bilateral filtering smooths images while preserving edges, by means of a nonlinear combination of nearby image values. The method is non-iterative, local, and simple. It combines gray levels or colors based on both their geometric closeness and their photometric similarity, and prefers near values to distant values in both domain and range. In contrast with filters that operate on the three bands of a color image separately, a bilateral filter can enforce the perceptual metric underlying the CIE-Lab color space, and smooth colors and preserve edges in a way that is tuned to human perception. Also, in contrast with standard filtering, bilateral filtering produces no phantom colors along edges in color images, and reduces phantom colors where they appear in the original image. PDF Version (2.9 MB) HTML Version |
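The combination of geometric closeness and photometric similarity can be made concrete on a one-dimensional gray-level signal. The sketch below assumes Gaussian weights in both domain and range, a common choice; the function name, the default window radius, and the parameter names are hypothetical, and the color (CIE-Lab) case is not covered.

```python
import math


def bilateral_filter_1d(signal, sigma_space, sigma_range, radius=None):
    """Bilateral filtering of a 1-D gray-level signal.

    Each output sample is a normalized weighted average of its neighbors,
    where the weight is the product of a closeness term (distance in
    position) and a similarity term (difference in gray level).
    """
    if radius is None:
        radius = max(1, int(3 * sigma_space))  # assumed window size
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            closeness = math.exp(-((i - j) ** 2) / (2.0 * sigma_space ** 2))
            similarity = math.exp(
                -((signal[i] - signal[j]) ** 2) / (2.0 * sigma_range ** 2)
            )
            weight = closeness * similarity
            num += weight * signal[j]
            den += weight
        out.append(num / den)  # den > 0: the center sample has weight 1
    return out
```

Across a step edge much larger than `sigma_range`, the similarity term is negligible, so samples on the far side of the edge contribute almost nothing and the edge is preserved; small fluctuations within a region are averaged as in ordinary Gaussian smoothing.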
|
Stereo Matching as a Nearest Neighbor Problem with C. Tomasi IEEE Trans. PAMI, April 1998 We propose a representation of images, called intrinsic curves, that transforms stereo matching from a search problem into a nearest-neighbor problem. Our approach combines the ideas of associative storage of images with connectedness of the representation: intrinsic curves are the paths that a set of local image descriptors trace as an image scanline is traversed from left to right. Curves become surfaces when full images are considered instead of scanlines. Because only the path in the space of descriptors is used for matching, intrinsic curves lose track of space, and are invariant with respect to disparity under ideal circumstances. Establishing stereo correspondences then becomes a trivial lookup problem. We also show how to use intrinsic curves to match real images in the presence of noise, brightness bias, contrast fluctuations, and moderate geometric distortion. In this case, matching becomes a nearest-neighbor problem. We also show how intrinsic curves can be used to deal with image ambiguity and occlusions. We carry out experiments on single-scanline matching to prove the feasibility of the approach and to illustrate its main features. PDF Version (956 KB) |
|
Efficient Implementation of Deformable Filter Banks with P. Perona, D. Shy Technical Report CNS-TR-97-04, California Institute of Technology, 1997 (a shorter version appears in IEEE Trans. Signal Processing, April 1998) This paper describes efficient schemes for the computation of a large number of multiscale/multioriented filtered versions of an image. We generalize the well-known steerable/scalable ("deformable") filter bank structure by imposing X-Y separability on the basis filters. These systems, designed by an iterative projections technique, achieve substantial reduction of the computational cost. To reduce the memory requirement, we adopt a multirate implementation. The resulting structure, however, is not shift-invariant: it gives rise to "aliasing". We introduce a design criterion for multirate deformable structures that jointly controls the approximation error and the shift-variance. PDF Version (976 KB) |
|
2-D IFIR Structures Using Generalized Factorable Filters IEEE Trans. Circuits and Systems, July 1997 The extension of the idea of Interpolated FIR filters to the two-dimensional case is presented. Such systems allow for lower computational weight, in terms of number of elementary operations per input sample. We have considered 2-D IFIR filters with parallelogram-shaped spectral support. To design the filters in the two stages, we have used a technique recently developed by Chen and Vaidyanathan. The resulting filters belong to the class of Generalized Factorable filters, for which an efficient implementation exists. An interesting problem peculiar to the multidimensional case is the choice of the sublattice which represents the definition support of the first-stage filter. We present a strategy for choosing (given the spectral support of the desired frequency response) the optimal sublattice, and for designing the second-stage (interpolator) filter in order to achieve low overall computational complexity. PDF Version (616 KB) |
|
Spectral Characteristics and Motion-Compensated Restoration of Composite Frames with G.M. Cortelazzo IEEE Trans. Image Processing, January 1995 The practice of superimposing the fields of a frame is applied in various fields, for example, thermographic and biomedical imaging. The pictures obtained in this way, which are termed composite frames, are severely degraded if the scene's objects are not perfectly still. The restoration of composite frames affected by motion-induced blurring requires the ability to estimate the field displacement from composite frames. The frequency domain analysis of composite frames proposed in this work suggests a displacement estimation technique of a phase-correlation type that can be applied to composite frames. PDF Version (684 KB) |
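For readers unfamiliar with phase correlation, the following sketch shows the standard form of the technique on two separate one-dimensional signals related by a circular shift. It is not the composite-frame variant developed in the paper, where the two fields are superimposed in a single picture. The naive DFT, the function names, and the zero-magnitude threshold are assumptions made to keep the example self-contained.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(
            x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n)
        )
        out.append(s / n if inverse else s)
    return out


def phase_correlation_shift(a, b):
    """Estimate d such that b[t] == a[(t - d) % n] (circular shift).

    The cross-power spectrum is normalized to unit magnitude, so only the
    phase difference between the two spectra is kept; its inverse
    transform peaks at the displacement.
    """
    spec_a = dft(a)
    spec_b = dft(b)
    cross = []
    for fa, fb in zip(spec_a, spec_b):
        c = fb * fa.conjugate()
        mag = abs(c)
        cross.append(c / mag if mag > 1e-12 else 0j)
    corr = dft(cross, inverse=True)
    return max(range(len(corr)), key=lambda t: corr[t].real)
```

For an exact circular shift the correlation surface is an impulse at the displacement; with real images the peak is broader, and its location is still taken as the estimate.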
|
Multistage Sampling Structure Conversion of Video Signals with G.M. Cortelazzo, G.A. Mian IEEE Trans. Circuits and Systems for Video Technology, October 1993 This work extends multistage implementation of sampling structure conversion to the multidimensional case. The issues arising in this task are usefully addressed on the basis of lattice theory. Numerical data supporting the advantages of multistage sampling conversion are presented, and the case of format conversion from 4/3 to 16/9 aspect ratio is examined as a case study. The main indication of this work is that multistage implementation, in the case of systems for sampling structure conversion of video signals, may improve the system characteristics and visual rendition. PDF Version (1.2 MB) |
|
On the Determination of All the Sublattices of Preassigned Index and Its Application to Multidimensional Subsampling with G.M. Cortelazzo IEEE Trans. Circuits and Systems for Video Technology, August 1993 Subsampling offers an effective and simple way to implement a data compression technique. Its use in the video context is rather attractive, as its application to current video schemes, such as MUSE and HD-MAC, clearly shows. A sensible exploitation of subsampling's potential requires the systematic evaluation of the image degradation effects due to the specific subsampling lattice. Such a possibility is equivalent to the determination of all the sublattices of a given index within a lattice, a problem that can be solved, as the work shows, on the basis of the Hermite normal-form theorem. The practical use of the result in the subsampling context is exemplified for a data compression case. PDF Version (316 KB) |
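The enumeration via the Hermite normal form can be illustrated in two dimensions. The sketch assumes the parent lattice is the integer lattice Z^2 and that basis vectors are written as matrix rows: every sublattice of index n then has exactly one basis of the form (a, b), (0, d) with a*d = n and 0 <= b < d. The function name is hypothetical, and the paper treats the general multidimensional case.

```python
def sublattices_2d(index):
    """All sublattices of Z^2 with the given index, one basis per sublattice.

    Each basis is returned in (row-style) Hermite normal form:
        ((a, b), (0, d))  with  a * d == index  and  0 <= b < d.
    The entry b can be reduced modulo d by subtracting integer multiples
    of the second basis vector, which is why b ranges over 0..d-1.
    """
    bases = []
    for a in range(1, index + 1):
        if index % a != 0:
            continue
        d = index // a
        for b in range(d):
            bases.append(((a, b), (0, d)))
    return bases
```

For index 2 this yields three sublattices: `((1, 0), (0, 2))`, `((1, 1), (0, 2))`, and `((2, 0), (0, 1))`. The second is the quincunx pattern and the other two are line subsampling along each axis. In general the number of sublattices of index n in Z^2 is the sum of the divisors of n, so each candidate subsampling structure can be evaluated in turn.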