**Link to the paper**

**Authors & Group:**

Marius Muja, David G. Lowe

University of British Columbia, Vancouver, B.C., Canada


**Intuition:**

Imagine a problem with many, many solutions, some of which work well in some circumstances and poorly in others, and all of which degrade to something close to the brute-force algorithm in the worst case. That is exactly the state of the nearest neighbor problem and its variant, approximate nearest neighbor. This paper makes two contributions. One is fairly incremental: a different approach to finding approximate nearest neighbors. The other, more significant and interesting one, is to define a cost function and use it to pick the best approximate nearest neighbor algorithm for the given data and, on top of that, to fine-tune the algorithm's parameters to achieve the lowest cost. The cost is defined in terms of building the index, searching the index for a given query or queries, and the memory footprint. They define weights which can be tuned to give priority to speed versus memory, index-build time versus search time, and so on. The cost function is also very intuitive, so I am including it below, instead of the various graphs from the paper's figures, to give an idea.

In the cost function,

cost = (s + w_b * b) / (s + w_b * b)_opt + w_m * m

*s* stands for search time, *b* stands for the time to build the index, and *m* is the memory used. *w_b* and *w_m* are the weights used to give importance to the time spent building the index and to the memory used by the method, respectively. The denominator is the optimal value you get when you use as much memory as you can (w_m = 0).

**Figures**
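In place of the paper's graphs, here is a small sketch of how that cost could be evaluated in code. The function mirrors the formula above; all timing and memory numbers in the usage example are made-up values for illustration, not measurements from the paper.

```python
def flann_cost(s, b, m, s_opt, b_opt, w_b=0.01, w_m=0.0):
    """Combined cost of one index configuration.

    s, b, m      -- search time, index build time, memory used by this config
    s_opt, b_opt -- search/build times of the best time-only config (w_m = 0),
                    i.e. the denominator of the cost function
    w_b, w_m     -- weights trading build time and memory against search speed
    """
    return (s + w_b * b) / (s_opt + w_b * b_opt) + w_m * m


# An autotuner would evaluate this cost over candidate configurations
# and keep the cheapest one (all numbers below are invented):
slow_build = flann_cost(s=1.0, b=10.0, m=2.0, s_opt=1.0, b_opt=10.0)
fast_search = flann_cost(s=1.0, b=10.0, m=2.0, s_opt=1.0, b_opt=10.0, w_m=0.5)
```

With the default `w_m = 0` the best time-only configuration has cost exactly 1, and raising `w_m` penalizes memory-hungry configurations, which matches the intuition in the paper.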

**Previous work**

The previous work is vast, because these ideas have been applied to various problems like meta-data transfer, segmentation, geometry estimation, 3D model transfer and related object priming. But previous techniques trained classifiers with some positive data and a lot of negative data. The downside is: does the trained classifier accurately capture the model, and does it generalize well enough to detect variations of the model?

**Gory details**

This paper is pretty straightforward to follow if you know the data structures and clustering algorithms mentioned, like the KD-tree, the k-means clustering tree, Voronoi diagrams, etc. A quick read of these topics on Wikipedia is enough!
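To make the KD-tree part concrete, here is a tiny textbook KD-tree with exact nearest-neighbor search. This is my own toy illustration of the data structure the paper builds on, not FLANN's implementation (FLANN uses multiple randomized KD-trees with a priority queue for the approximate search).

```python
def build_kdtree(points, depth=0):
    """Recursively split the points on alternating axes (textbook KD-tree)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Exact nearest-neighbor search; returns (squared_distance, point)."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    dist = sum((p - q) ** 2 for p, q in zip(point, query))
    if best is None or dist < best[0]:
        best = (dist, point)
    diff = query[axis] - point[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    if diff ** 2 < best[0]:  # the far half-space may still hold a closer point
        best = nearest(far, query, best)
    return best


# Example with six 2-D points:
tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
# nearest(tree, (9, 2)) -> (2, (8, 1)): point (8, 1) at squared distance 2
```

The pruning step in `nearest` is the part FLANN relaxes: instead of always descending into the far half-space when it might contain a closer point, the approximate search only examines a fixed number of leaves, which is where the speed/accuracy trade-off comes from.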

**Future work ( according to authors )**

The authors make a good point of supporting upcoming algorithms in their framework/library. This is a very helpful contribution to the entire community working on similar problems, as they provide a nice framework and also take the pain of incorporating the state-of-the-art algorithms into the library.

**Future ideas by me ( if I can think of any )**

I have a very simple idea specific to a problem I am working on, which was the main reason I ended up reading this paper. I am interested in finding nearest neighbors between consecutive 3D scans captured by a robot. There are two things specific to this problem that can be exploited:

1) Since the two scans are taken within a span of 50 milliseconds, the robot won't move a lot, hence corresponding points will be pretty close; and since the points are indexed, the previous index could be used to make the search faster!

2) I did not find/read any paper that talks about creating an incremental index, and since I have to run the ANN algorithm for every pair of frames, I am thinking of updating the index incrementally instead of recreating it every time.
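Both ideas can be sketched with a simple hashed voxel grid: it supports incremental insertion and removal between frames (idea 2), and a query only inspects the cells around it, which is cheap precisely because consecutive scans barely move (idea 1). The class, its cell size and its API are my own invention for illustration; this is not part of FLANN or the paper.

```python
from collections import defaultdict
from math import floor, inf


class GridIndex:
    """Toy incrementally-updatable voxel-grid index for 3-D points.

    Points are hashed into cubic cells of side `cell`; a nearest query only
    scans the 27 cells around the query point, so it can miss neighbors
    farther than one cell away -- acceptable when consecutive scans are close.
    """

    def __init__(self, cell=0.5):
        self.cell = cell
        self.cells = defaultdict(list)

    def _key(self, p):
        return tuple(floor(c / self.cell) for c in p)

    def insert(self, p):
        # Incremental: points are added one at a time, no rebuild needed.
        self.cells[self._key(p)].append(p)

    def remove(self, p):
        # Incremental update between frames: drop points that left the scan.
        self.cells[self._key(p)].remove(p)

    def nearest(self, q):
        kx, ky, kz = self._key(q)
        best, best_d = None, inf
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for p in self.cells.get((kx + dx, ky + dy, kz + dz), ()):
                        d = sum((a - b) ** 2 for a, b in zip(p, q))
                        if d < best_d:
                            best, best_d = p, d
        return best
```

The cell size would have to be chosen from how far the robot can move in 50 ms; if a neighbor lies more than one cell away, this sketch returns `None`, which is exactly the approximate flavor of trade-off the paper's cost function is meant to tune.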

**Misc**

The first thing I did was try out the library, and I found out how omnipresent it is! The FLANN library is included in ROS (Robot Operating System), OpenCV and many other libraries. I created a quick example based on OpenCV so that it might serve as a useful starting point for anyone trying to use it. Please visit the related post on my tech blog: cvKNN