Wednesday, February 15, 2012

Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration

Link to the paper 
pdf

Authors & Group:

Marius Muja, David G. Lowe
University of British Columbia, Vancouver, B.C., Canada

Intuition:

Imagine a problem with many, many solutions! Some of them work well in some circumstances and poorly in others, and in the worst case they all degenerate to roughly brute-force search. That is exactly the state of the nearest neighbor problem and its approximate variant. This paper makes two contributions. One is fairly incremental: a modified approach for finding approximate nearest neighbors. The other, more significant and interesting one, is to define a cost function and use it to pick the best approximate nearest neighbor algorithm for a given dataset, and on top of that to fine-tune the algorithm's parameters to achieve the lowest cost. The cost is defined in terms of the time to build the index, the time to search the index for a given query (or set of queries), and the memory footprint. They define weights that can be tuned to prioritize speed versus memory, or index-build time versus search time. The cost function is very intuitive, so I am including it below instead of the various graphs from the paper to give an idea.
In the cost function, s stands for the search time, b for the time to build the index, and m for the memory used. The weights w_b and w_m control how much importance is given to the index build time and to the memory used by the method, respectively. The denominator is the optimal time cost you get when you use as much memory as you can (w_m = 0).
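The image with the formula did not survive the post, so here it is as I transcribed it from the paper, in the same notation as above (any slip is mine, not the authors'):

\[
\mathrm{cost} \;=\; \frac{s + w_b\, b}{(s + w_b\, b)_{\mathrm{opt}}} \;+\; w_m\, m
\]

The first term is the time cost (search time plus weighted build time) normalized by the best time cost achievable when memory is free, and the second term penalizes the memory footprint.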

Figures




Previous work



Gory details
This paper is pretty straightforward to follow if you know the data structures and clustering algorithms it builds on, like the kd-tree, the hierarchical k-means tree, and Voronoi diagrams. A quick read of the topics on Wikipedia is enough!
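As a concrete illustration of the automatic configuration, here is a rough sketch using the pyflann bindings. The parameter names are from my memory of the FLANN manual, so treat them as assumptions rather than gospel; the point is that build_weight and memory_weight map to the w_b and w_m weights in the cost function above.

```python
import numpy as np
from pyflann import FLANN  # Python bindings shipped with the FLANN library

# Toy data: 10k random 128-D descriptors (SIFT-like) and 100 query vectors.
dataset = np.random.rand(10000, 128).astype(np.float32)
queries = np.random.rand(100, 128).astype(np.float32)

flann = FLANN()
# "autotuned" asks FLANN to pick the algorithm (kd-tree vs. k-means tree, ...)
# and its parameters by minimizing the cost function from the paper.
params = flann.build_index(
    dataset,
    algorithm="autotuned",
    target_precision=0.9,   # desired fraction of true nearest neighbors returned
    build_weight=0.01,      # w_b: how much index build time matters
    memory_weight=0.0,      # w_m: 0 means "use as much memory as you like"
    sample_fraction=0.1,    # fraction of the data used during auto-tuning
)
print("chosen parameters:", params)

# Search the tuned index for the 5 nearest neighbors of each query.
indices, dists = flann.nn_index(queries, 5, checks=params["checks"])
```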
 
Future work ( according to authors )
The authors make a good point of supporting upcoming algorithms in their framework/library. This is a very helpful contribution to the entire community working on similar problems as they provide a nice framework and also take the pain of incorporating the state of the art algorithms to the library.

Future ideas by me ( if I can think of any )
I have a very simple idea specific to a problem I am working on, which was the main reason I ended up reading this paper. I am interested in finding nearest neighbors between consecutive 3D scans captured by a robot. There are two things specific to this problem that can be exploited:
1) Since the two scans are taken within a span of 50 milliseconds, the robot won't move much, so corresponding points will be very close; since the points are indexed, the previous index could be used to make the search faster!
2) I did not find/read any paper that talks about building an index incrementally, and since I have to run the ANN algorithm for every pair of frames, I am thinking of updating the index incrementally instead of recreating it every time.
 
Misc
The first thing I did was try out the library, and I found out how omnipresent it is! The FLANN library is included in ROS (Robot Operating System), OpenCV, and many other libraries. I created a quick example based on OpenCV so that it might serve as a useful starting point for anyone trying to use it. Please visit the related post on my tech blog: cvKNN
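For reference, here is a minimal sketch along the same lines with a recent OpenCV (the names below are from today's cv2 API, and the image filenames are placeholders):

```python
import cv2

# Load two images (placeholder filenames) and compute SIFT descriptors.
img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# FLANN-based matcher: algorithm=1 selects the randomized kd-tree index.
index_params = dict(algorithm=1, trees=4)
search_params = dict(checks=50)  # leaves to visit; more = slower but more accurate
matcher = cv2.FlannBasedMatcher(index_params, search_params)

# For each descriptor in image 1, find its 2 nearest neighbors in image 2
# and keep only matches that pass Lowe's ratio test.
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]
print(len(good), "good matches")
```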

Sunday, January 15, 2012

Ensemble of Exemplar-SVMs for Object Detection and Beyond

Link to the paper 
pdf

Authors & Group:

Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros from CMU



Intuition:
Building classifiers for various tasks has been studied extensively by the computer vision and machine learning communities. There are various parametric methods like Support Vector Machines, Neural Networks, and Linear/Quadratic Discriminants, and various non-parametric methods like Voronoi diagrams, K-nearest neighbors, etc. Each has its own advantages and disadvantages. One fundamental question when training a classifier for a specific task on a lot of data is: does the training data correctly parameterize the model, and is the classifier general enough to cover the possible variations? This paper tries to address both issues by using just one positive training image per classifier. They build an ensemble (collection) of classifiers, each trained with a single positive instance, i.e. the exemplar (example), and millions of negative images. The key idea is that this represents the positives non-parametrically and the negatives semi-parametrically or parametrically. In the first figure you can see that they learn a linear SVM classifier for each positive example instead of trying to come up with one classifier for the whole set of positive examples.
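To make the "one positive, many negatives" setup concrete, here is a minimal toy sketch of training a single exemplar classifier with scikit-learn. This is my own approximation, not the authors' code: the real system uses HOG templates, hard-negative mining, and separate regularization constants for the two classes, which I crudely mimic with a class weight.

```python
import numpy as np
from sklearn.svm import LinearSVC

# One positive exemplar and many negatives, as fixed-length feature vectors
# (random vectors here just to show the setup; the paper uses HOG features).
rng = np.random.default_rng(0)
exemplar = rng.normal(size=(1, 512))          # the single positive instance
negatives = rng.normal(size=(10000, 512))     # features from many negative windows

X = np.vstack([exemplar, negatives])
y = np.array([1] + [0] * len(negatives))

# Heavily up-weight the lone positive so it is not drowned out by the negatives.
clf = LinearSVC(C=0.01, class_weight={1: 100.0, 0: 1.0})
clf.fit(X, y)

# Raw decision value of a new window under this particular exemplar's classifier.
score = clf.decision_function(rng.normal(size=(1, 512)))
print(score)
```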

Figures



Previous work
The body of previous work is large because the authors apply this idea to various problems like meta-data transfer, segmentation, geometry estimation, 3D model transfer, and Related Object Priming. But previous techniques trained classifiers with some positive data and a lot of negative data, and the downfall is the same question as above: does the trained classifier accurately capture the model, and does it generalize well enough to detect variations of it?


Gory details
The math is non-existent or very simple, but one important detail is that these exemplar classifiers produce scores on different scales, so comparing them directly would be like comparing apples and oranges. So there is an additional calibration step where they modify the decision function of each SVM. According to the authors, a simple way to interpret the calibration step is as a scaling and shifting of the decision boundary. See the figure below (and the sketch after it):
Final Output:
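Since the figure did not survive, here is roughly what the calibration does as I understand it: each exemplar's raw SVM score is passed through a fitted sigmoid so that all exemplars output comparable values (my notation, not necessarily the paper's):

\[
f_E(x) \;=\; \frac{1}{1 + \exp\!\big(-\alpha_E\,(w_E^{\top} x) - \beta_E\big)}
\]

Here w_E is exemplar E's learned SVM weight vector and (alpha_E, beta_E) are fit per exemplar on validation data; rescaling by alpha_E and shifting by beta_E is exactly the "scaling and shifting of the decision boundary" mentioned above.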

Future work ( according to authors )
There is no specific future plan given by the authors, but this contribution opens the door to many exciting applications in object recognition, scene understanding, and computer graphics.

Future ideas by me ( if I can think of any )
I had two ideas that I wanted to explore:
1) The idea is to start with an exemplar SVM, incrementally add training data to this model, and retrain it periodically. This way you cautiously add only examples that fit the underlying parametric model.
2) The other idea was to build relations between these exemplars and make a hierarchical classifier that will make the problem of scene understanding more semantic and might even improve the accuracy over existing methods.

Saturday, January 14, 2012

Real-Time Stereo Visual Odometry for Autonomous Ground Vehicles




Link to the paper:
http://www-robotics.jpl.nasa.gov/publications/Andrew_Howard/howard_iros08_visodom.pdf

Authors & Group
Andrew Howard from JPL, NASA.

Intuition
Consider the figure below, containing images of a Rubik's Cube taken from two viewpoints. We want to match a point in the left image to the point in the right image that comes from the same physical location in the scene. This simple problem has been worked on by people from various fields for two decades now!
Computer Vision community - Tried to come up with the best feature descriptors that represent a point uniquely.
Signal/Image Processing community - Various methods to match two feature descriptors (Sum of absolute differences, Normalized cross-correlation, etc.)
Machine Learning community - Finding features that are specific to the task (learning on the fly!)
This paper builds on a very simple idea: under rigid motion, the Euclidean distance between two physical points remains constant! So if you have a stereo camera that gives you depth at each point, then in addition to matching features point by point, you can add constraints such as: points A and B are 2 meters apart in frame 1, so they should be 2 meters apart in frame 2 as well. If not, at least one of the two points is matched incorrectly.

Figures




Previous work
All the previous methods worked on outlier rejection, but this paper talks about inlier detection. RANSAC (Random Sample Consensus) is the most prominent outlier rejection technique: it assumes a model, finds the points that satisfy the model, discards the remaining points as outliers, and iterates to find the best fit and the points consistent with it (the inliers). This fails when the inlier/outlier ratio is very low, which puts a lot of emphasis on having a good feature detector. Using the method in this paper, even if the inlier/outlier ratio is less than one you can still find inliers reliably.

Gory details
The math in this paper is actually pretty easy and the details are straightforward. Once you have matches between points in frame 1 and frame 2, you build a graph. Each node in this graph is a match, and you join two nodes with an edge if the Euclidean distance between the two points is consistent across both frames. Given this graph, if you find the maximum clique (one of Karp's 21 NP-complete problems, so they use an approximate method to compute it), you get all the inliers, since these are the largest set of matches whose pairwise Euclidean distances agree in both frames.
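Here is a minimal sketch of that consistency graph plus a greedy clique approximation. It is my own simplification of the inlier detection step, and the array names and threshold are made up for illustration:

```python
import numpy as np

def inlier_matches(pts_prev, pts_curr, tol=0.05):
    """pts_prev, pts_curr: (N, 3) arrays of matched 3D points from two frames.
    Returns indices of matches kept by a greedy max-clique approximation."""
    # Pairwise distance matrices within each frame.
    d_prev = np.linalg.norm(pts_prev[:, None] - pts_prev[None, :], axis=-1)
    d_curr = np.linalg.norm(pts_curr[:, None] - pts_curr[None, :], axis=-1)
    # Adjacency: two matches are compatible if the rigid-motion constraint holds.
    adj = np.abs(d_prev - d_curr) < tol
    np.fill_diagonal(adj, False)

    # Greedy clique: start from the match with the most compatible partners,
    # then keep adding the candidate compatible with everything chosen so far.
    clique = [int(np.argmax(adj.sum(axis=1)))]
    candidates = set(np.flatnonzero(adj[clique[0]]))
    while candidates:
        best = max(candidates, key=lambda i: adj[i, list(candidates)].sum())
        clique.append(best)
        candidates = {i for i in candidates if adj[best, i]} - {best}
    return clique
```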

Future work ( according to authors )
The authors note that a systematic bias is introduced when this method is used to estimate the visual odometry of the robot, due to inaccuracies in camera calibration. So they plan to use a ground-truth trajectory and tweak the calibration until the estimated trajectory matches the ground truth.

Future ideas by me ( if I can think of any )
I am thinking about two really cool extensions to this paper:
1) Extend this to multiple frames, in which case the graph becomes a spatio-temporal one, and modify the proposed approximate technique to make the inliers more robust.
2) I was a little disappointed by the consistency check used in the paper, where an edge is added whenever the difference in Euclidean distance is below a threshold. The change I propose is to use a Gaussian with a variance that depends on the feature match score, assign a probability to each edge, and then find the maximal clique with the highest probability. (This would be a really cool paper!)

Sunday, January 8, 2012

The Belief Roadmap - Efficient planning in belief space

So, here is the link to the paper I am talking about: pdf

Group:
Robust Robotics Group from MIT
Samuel Prentice and Nicholas Roy are the authors


Contribution: 
Planning in belief space can be done efficiently for linear Gaussian systems using a factored form of the covariance matrix.


Intuition:
The idea of the paper is simple and yet awesome! To explain it I will take a simple example. Let's say you want to drive from point A to point B in a city and you have a continuous GPS signal available. There are multiple ways to get from A to B, but the catch is that the GPS signal is accurate to millimeters in some places (open spaces?) and not so accurate in others (surrounded by tall buildings?). Given this information, you have to plan a route and execute it. Your gut tells you to pick the route where the GPS information is more accurate even if it is longer, while not minding the occasional lapse in the signal. So how do you capture this mathematically? That is exactly what this paper solves, and it is uber cool!!

Figures:
The figure below captures the essence of the paper. In figure (a) you sample mean positions from the space (note the inherent assumption that it is easy to classify a sampled position as free space or obstacle), and in figure (b) you add edges between nearby points and then pick the shortest path to the goal. In figure (c), after sampling the means you also estimate the covariance matrix at each position (covariance is the inverse of the information matrix, so the smaller the norm of the covariance, the more certain you are about the position) and you pick the path that maximizes the certainty, i.e. minimizes the covariance, upon reaching the goal (instead of the shortest path picked by the PRM algorithm).





Previous Techniques:
Prior to this paper, people used a technique called PRM - the Probabilistic Roadmap algorithm. Since your current position cannot be determined exactly, you only have an estimate of it. In PRM the planner uses only the mean, whereas in the BRM you also incorporate additional statistics such as the covariance. So in effect you are simply enlarging the state space of your PRM configuration. But this also complicates things: how do you compute the covariances at the various mean positions given the starting position? That is what the latter part of the paper handles with its lemmas and theorems, and the intuition is that they factor the covariance matrix to make this efficient (not important for understanding the idea!).
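A naive sketch of the difference, as a toy illustration of my own (the paper's algorithm searches the roadmap far more cleverly): propagate the covariance along each candidate path and pick the one with the least uncertainty at the goal, instead of the shortest one.

```python
import numpy as np

def propagate(cov, edge_noise):
    """Toy belief propagation along one edge: uncertainty grows with motion noise.
    In the real BRM this step is an EKF update expressed via the factored covariance."""
    return cov + edge_noise

def best_path(graph, start, goal, cov0):
    """graph: dict mapping node -> list of (neighbor, edge_noise_matrix).
    Returns the path whose terminal covariance has the smallest trace."""
    best = (None, np.inf)
    stack = [(start, [start], cov0)]
    while stack:
        node, path, cov = stack.pop()
        if node == goal:
            if np.trace(cov) < best[1]:
                best = (path, np.trace(cov))
            continue
        for nbr, noise in graph[node]:
            if nbr not in path:  # simple paths only
                stack.append((nbr, path + [nbr], propagate(cov, noise)))
    return best

# Two routes from A to D: a short noisy edge (A-D) vs. a longer accurate route (A-B-C-D).
noisy, accurate = 2.0 * np.eye(2), 0.1 * np.eye(2)
graph = {
    "A": [("D", noisy), ("B", accurate)],
    "B": [("C", accurate)],
    "C": [("D", accurate)],
    "D": [],
}
print(best_path(graph, "A", "D", 0.01 * np.eye(2)))
# Picks A-B-C-D: longer, but the robot arrives far more certain of where it is.
```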

Gory details:
They borrow tricks from linear systems / control theory (the structure of the Kalman covariance update) to factorize the covariance matrix, so that propagating uncertainty along an edge of the roadmap becomes a linear operation that can be precomputed once per edge.
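From memory (so treat the exact form as an assumption rather than a quote from the paper), the key trick looks like this: write the covariance in factored form, and the Kalman covariance update becomes linear in the factors,

\[
\Sigma_t \;=\; B_t\, C_t^{-1},
\qquad
\begin{bmatrix} B_t \\ C_t \end{bmatrix}
\;=\;
\zeta_t
\begin{bmatrix} B_{t-1} \\ C_{t-1} \end{bmatrix},
\]

where \(\zeta_t\) is a transfer matrix built from the motion and measurement models. Because the update is linear, the transfer matrices along an edge can be multiplied together once and reused for any starting covariance, which is what makes planning in belief space efficient.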


Future Work according to authors:
This paper only considers kinematic motion planning, but the authors plan to extend it to kinodynamic planning.


My ideas on future work:
Can we combine this with D* Lite and newer search techniques to get a better planning algorithm?


Saturday, January 7, 2012

The only thing that is constant is Change!

Yet another blog I started. This time to vent out my knowledge and not frustration :-)
I think I have a very good knack for abstracting away the math and details to understand the basic idea of a paper, and I want to put that to good use! (Self-claimed trait, BTW!) Moreover, I am fed up with people complaining about my reading frequency and habits, and sadly they are right, so this is also meant to increase my reading ability and frequency. Yet another change in my life!!

I love certain papers even though they are simple, and forget some even though they are awesome. The ones I remember are the ones with intuitive ideas (the math can be gory!), while the ones that are not so simple just get lost in the past. In any case, my objective in starting this blog is threefold:

1) To document various awesome ideas I come across from research papers.
2) To actually remember the authors and groups ( I suck at this and by going through the exercise of visiting their webpage and the groups webpage I think I will become better over time ).
3) I want to convey the intuition behind the paper and not necessarily explain the paper, and if I cannot find a way to do that easily, it means either the paper sucks or I did not understand it properly. Either way it's helpful to me.

The basic structure of each paper I review will be:


Link to the paper ( if applicable )
Authors & Group
Intuition
Figures
Previous work
Gory details
Future work ( according to authors )
Future ideas by me ( if I can think of any )

Here we go. Research community, brace yourself for some awesomeness from Manohar Paluri...