Saturday, January 14, 2012

Real-Time Stereo Visual Odometry for Autonomous Ground Vehicles




Link to the paper:
http://www-robotics.jpl.nasa.gov/publications/Andrew_Howard/howard_iros08_visodom.pdf

Authors & Group
Andrew Howard from JPL, NASA.

Intuition
Consider the figure below containing images of a rubix cube taken from two view points. We want to find the match of a point in the left image to a point in the right image where both points come from the same physical location in the scene. This simple problem has been worked upon by various people from various fields for two decades now!
Computer Vision community - Tried to come with best feature descriptors that represent the point uniquely.
Signal/Image Processing community - Various methods to match two feature descriptors (Sum of absolute differences, Normalized cross correlation etc.)
Machine Learning - Finding features that are specific to the task ( learning on the fly! )
This paper builds on a very simple idea that in case of rigid motion euclidean distances between two physical points remains constant! So, if you have a stereo camera which can give you depth at each point then other than doing feature match at a per point level you can also add additional constraints such as point A and point B are 2 meters away in frame1 and so they should be 2 meters away in frame2 as well. If not then either of the points is not matched correctly.

Figures




Previous work
All the previous methods worked on outlier rejection but this paper talks about inlier detection. RANSAC ( Random sample consensus ) is the most prominent outlier rejection technique which assumes a model and finds the points that satisfy the models and discards the remaining points as outliers and does this iteratively finding the best fit and the points that match the best fit ( inliers ). This would fail if the inlier/outlier ratio is very low which puts lot of emphasis on having a good feature detector. Using the method in this paper even if the inlier/outlier ratio is less than one you could still find inliers reliably.

Gory details
The math in this paper is actually pretty easy and the details are also straightforward. Once you compute matches of points in frame1 and frame2 you create a graph. Each node in this graph is a match and we join two nodes if the euclidean distances between the two points in both frames remains consistant. Given this graph if you find the maximum clique(One of Karp's 21-NP complete problems so they use an approximate method to compute it) then you get all the inliers as these are the only set of points that have the same euclidean distances between them in both frames.

Future work ( according to authors )
The authors claim that a systematic bias is introduced when you use this method to estimate the visual odometry of the robot due to inaccuracies in camera calibration. So, they plan to use a ground-truth trajectory and tweak the calibration until the estimated trajectory does not match the ground-truth.

Future ideas by me ( if I can think of any )
I am thinking about two really cool extensions to this paper:
1) Extending this to multi-frame in which case it will be a spatio-temporal graph and modify the proposed approximate technique to make the inliers more robust. 
2) I was little disappointed about the consistency check used in the paper where they check that the euclidean distance is less than a threshold and they join the edge. The change I propose is to use a Gaussian distribution with a dynamic variance based on the match score of the feature and then assign a probability to the edge and then find the maximal clique that has the highest probability. ( This will be a really coool paper! )

No comments:

Post a Comment