Sunday, January 15, 2012

Ensemble of Exemplar-SVMs for Object Detection and Beyond

Link to the paper 
pdf

Authors & Group:

Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros from CMU



Intuition:
Building classifiers for various tasks has been researched extensively by the computer vision and machine learning communities. There are parametric methods like Support Vector Machines, Neural Networks and Linear/Quadratic Discriminants, and non-parametric methods like Voronoi diagrams, K-nearest neighbors etc. Each has its own advantages and disadvantages. One fundamental concern when training a single classifier on a lot of training data is whether the data correctly parameterizes the model, and whether the classifier is general enough to cover the possible variations. This paper tries to address both problems by using just one positive training example for each classifier. So, they build an ensemble (collection) of classifiers, each trained with one positive instance, i.e. the exemplar (example), and millions of negatives. The key motivation is that this method represents the positives in a non-parametric way and the negatives in a semi-parametric or parametric way. In the first figure you can see that they have a linear SVM classifier for each positive example, instead of trying to come up with one classifier for the whole set of positive examples.
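
To make the "one classifier per positive example" idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it uses toy 2-D vectors instead of real image features, a hand-rolled subgradient solver, and made-up hyperparameters (`c_pos`, `c_neg`, `epochs`, `lr` are all assumptions chosen for illustration). The only point is the structure: each exemplar gets its own linear SVM, trained against the same pool of negatives.

```python
def score(w, b, x):
    """Raw linear SVM score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_exemplar_svm(exemplar, negatives, c_pos=1.0, c_neg=0.1,
                       epochs=300, lr=0.01):
    """Fit one linear SVM: a single positive (the exemplar) vs. all negatives.

    Minimizes 0.5*||w||^2 + c_pos*hinge(positive) + c_neg*sum(hinge(negatives))
    with plain batch subgradient descent. All hyperparameters are illustrative.
    """
    dim = len(exemplar)
    w, b = [0.0] * dim, 0.0
    data = [(exemplar, 1.0, c_pos)] + [(x, -1.0, c_neg) for x in negatives]
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # start from the regularizer's gradient
        for x, y, c in data:
            if y * score(w, b, x) < 1.0:  # margin violated: hinge is active
                for i in range(dim):
                    grad_w[i] -= c * y * x[i]
                grad_b -= c * y
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_ensemble(exemplars, negatives):
    """One independent classifier per positive example."""
    return [train_exemplar_svm(e, negatives) for e in exemplars]


# Toy 2-D "features": two exemplars and a shared pool of negatives.
exemplars = [(2.0, 0.0), (0.0, 2.0)]
negatives = [(-1 + 0.1 * i, -1 + 0.1 * j) for i in range(5) for j in range(5)]
ensemble = train_ensemble(exemplars, negatives)
```

Each entry of `ensemble` is a `(w, b)` pair that scores its own exemplar above every negative. Note that nothing here ties the score ranges of different classifiers together, since each one is trained on its own.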

Figures



Previous work
The body of previous work is huge because they apply this idea to various problems like meta-data transfer, segmentation, geometry estimation, 3D model transfer and related object priming. But previous techniques trained classifiers with some positive data and a lot of negative data. The downside is the question raised above: does the trained classifier accurately capture the model, and does it generalize well enough to detect variations of the model?


Gory details
The math is non-existent or very simple, but one important detail is that the classifiers in the ensemble are trained independently, so they give scores on different scales, and comparing them would be like comparing apples and oranges. So, there is an additional calibration step where they modify the distance function of each SVM. According to the authors, a simple way to interpret the calibration step is to think of it as a scaling and shifting of the decision boundary. See figure below:
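
In code, the scale-and-shift view of calibration might look like the sketch below. It assumes a sigmoid with one scale `alpha` and one shift `beta` per exemplar, which is my reading of the calibration step; the raw scores and the `alpha`/`beta` values are invented for illustration, and in practice the parameters would be fit on held-out data rather than set by hand.

```python
import math


def calibrated_score(raw_score, alpha, beta):
    """Shift the raw SVM score by beta, scale by alpha, squash into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-alpha * (raw_score - beta)))


# Hypothetical raw scores from two different exemplar SVMs.
raw = {"exemplar_a": 0.9, "exemplar_b": 0.4}

# Hypothetical per-exemplar (alpha, beta) calibration parameters.
params = {"exemplar_a": (1.0, 1.0), "exemplar_b": (5.0, 0.0)}

calibrated = {
    name: calibrated_score(s, *params[name]) for name, s in raw.items()
}

best_raw = max(raw, key=raw.get)
best_calibrated = max(calibrated, key=calibrated.get)
```

With these made-up numbers, `exemplar_a` wins on raw scores but `exemplar_b` wins after calibration, which is exactly why raw scores from different exemplars should not be compared directly.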
Final Output:

Future work ( according to authors )
There is no specific future plan given by the authors, but this contribution opens doors to many exciting applications in object recognition, scene understanding and computer graphics.

Future ideas by me ( if I can think of any )
I had two ideas that I wanted to explore:
1) The idea is to start with an exemplar SVM, incrementally add training data to this model, and retrain it periodically. This way you add, with caution, examples that satisfy the underlying parametric model.
2) The other idea is to build relations between these exemplars and make a hierarchical classifier, which would make the problem of scene understanding more semantic and might even improve the accuracy over existing methods.

2 comments:

  1. Hi Manohar,

    Tomasz[http://www.cs.cmu.edu/~tmalisie/] (first author on paper) here:

    Your future ideas are good and we have been thinking along these lines for quite some time. In fact, we started out working on (1) but the problem is that when you keep adding positives, the linear SVM gets too happy too fast. The separation between a small number of positives and a large set of negatives is so easy that you can do it with just about any small set of positive windows. Most recently, I have been working on ways to determine when to stop incrementally adding positives during learning based on some type of cross-validation termination condition.

    But I'm 100% with you on the "proceed with caution idea" and you can check out papers on self-paced learning to get a better idea of how people have done this successfully in practice.

    Regarding (2), you can go back to one of my older papers to see an earlier algorithm of mine for building such a hierarchy:

    Tomasz Malisiewicz, Alexei A. Efros. Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships. In NIPS, 2009.

    In fact, this entire exemplar-based approach to recognition is motivated by the desire to build such a graph structure to represent visual concepts. At CMU, we have called this the Visual Memex, and you can think of the Exemplar-SVMs as a sort of glue which can be used to piece together different visual pieces. Now the trick is using the glue (learned similarity functions) and the building blocks (exemplars) to arrange them in a tree. There's definitely a lot of merit to this idea, and I think you're thinking along the same lines that Efros and I have been.

  2. Hi Tomasz,

    That's awesome! I will definitely check out the references you pointed out. I am glad you came across this post and took the time to share your thoughts! I think it's a really cool paper and loved reading it :-) My friend Santosh also works with Efros and we talked about Visual Memex during our conversations. Thanks again and good luck!
