Monthly Archives: January 2015

Love affair with CBIR Part 3

Love affair with CBIR (Content base image retrieval system) Part 3

I never figure out, this would be series of article, here are links to rest of it

Love affair with CBIR – Part 1 

Love affair with CBIR – Part 2

Love affair with CBIR – Part 3 (The one you are reading)

Love affair with CBIR – Part 4

This is continuation of the series of article, so, if you are not able to catch with this post, read the previous ones to understand the context.

Similar Images

Last article, we had discuss about the De-duplication of image, and how to get identical across thousand of images. But, when it some to similar images, those algorithm does work. My quest for find some algorithm, which works for similar images with has different lighting condition, rotation, affine transformation, yet could identify the image.

List of CBIR

As, I was going through this rabbit hole, I found there are several research has been done in this area and tons of research paper has been publish. There are several open and close source CBIR already developed.  Many are the outcome of the University research, while others are commercial.

Google already has reverse image search, if you haven’t tried it, I recommend to do so. It fast, robust, and works really well in finding similar image. But, this won’t work for me, as it search entire web for images, and prerequisites was, that image set in which you are trying to find similar need to be crawl by google search bots. Mine was internal website (wasn’t expose to internet), and image set was sort of copyrighted.

Next best commercial solution was tineye matchengine. It have sets of API, where I could submit my image, it would run it’s indexing algorithm on my set of images, then I can pass the query image, and it would return in set of similar images. As you can see above, paint of American Gothic, where farmer in different mask would be detect as similar. This is big deal in computer vision research. But, sigh, Tineye cost the bomb ($1000 for starter) and my project stakeholder required some free solution.

List of free ones (CBIR)

I look to various open source CBIR, WindSurf is desktop base application in Java,  LIRE is web implementation using lucene and solr in Java, FIRE (Flexible Image Retrieval on Image Parts) developed in C++ and phyton. But, none suit my needs, first I need something in .NET, second, it need to work on part image search. I mean, I give just the portion of image, and it should retrieve the complete image, if it able to match the part.

It looks like, there isn’t good CBIR in .NET and I need to built one myself. Sigh! To built one CBIR, I need to know how to tell computer to identify images, next question stump be was how does human eye identify images?

Image Feature


I did some research and found out, that human eye tries to find feature in the images, and try to identify same set of features in another image, to tell if the images are similar. There is excellent  tutorial on OpenCV, which takes more about the image feature detection.

Now, my next problem was, how do I tell computer to detect feature in image. Images are going to be random, and i need some algorithm, which can detect significant feature in any image I throw to it, not just one type. e.g. In picture of a cat, it should detect cat eyes, nose, whisker, pointed ear. But, in some beach picture, it should detect people on beach, coconut trees, sea waves. You got the problem?

Now, I was going down hot waters , research field of Computer Vision, Image Processing, mathematics, matching learning, artificial intelligence. In short, areas I was afraid of. But, its love affair, and I knew, its not going to be easy.

Corner Point Detection

In computer vision, they call this problem as corner or interesting point detection. Their are several algorithm, like Harris corner detection, Susan corner detection, LoG, DoG, and DoH feature detection and several others. One that catch my attention was Harris corner detection and FREAK (Fast Retina Keypoint) detection.

This algorithm detect corner something like this

Cool, knowing algorithm name is one thing, and implementing is other. I was searching for .NET implementation of these corner detection algorithm, where I can to know about two beautiful open source libraries / framework.

  1. Accord Framework
  2. Emgu CV

This libraries have set of functions which you need to for Computer Vision, Image Processing, Machine Learning, Artificial Neuron networking, etc.

Correlation base matching

Now, I know, how to detect corner, I was stump with another problem, one I detect corners, from Image, how do I compare them. I mean, corners are just set of (x,y) points in space. Now, I need to detect this points in other image. You would say, just compare their x coordinate and y coordinate. Cool, but remember rotation and affine transformation in first post, it means one image points x,y no longer remain the same in other image.


Phew, it gonna we tough than I though. Here, come another algorithm, know as correlation based matching. It use matrices, and whole lots of math to find the distance between points and matching points. Accord framework has the Correlation matching algorithm, which approximately find the matching points.

   1: CorrelationMatching matcher = new CorrelationMatching(9, img1, img2);

   2: IntPoint[][] matches = matcher.Match(harrisPoints1, harrisPoints2);

It detect matching points, somewhat similar to image below

Figure: Top Left: Corners detect in Image A, Top Right: Corners detect in ImageB. Bottom Left: Correlation mapping of points in both image. Bottom Right: Remain after RANSAC estimation.

Cools, now, I know, how to match points in one image with another. But, how do I find set of matching points with thousand of images in database. Stump again!

Another idea struck, I need to say to describe the points detect in image. If the description of the point match with any of the description of points in thousand image, it means, it match. Wow! I am genius.

SIFT descriptors extractor

But, wait, how do describe the points? Few googling, and found out computer vision scientist David Lowe, though about this problem long back in 2004. He came up with SIFT algorithm (Scale-invariant feature transform), which works on LoG, DoG, and DoH feature detection, detects the interesting keypoints (corner), and describe it with arrays of double of 128 bit length.


Now, Image comparison problem has been reduce to math, where you need to compare one matrix with other


Problem: Need for speed

Cool, but this there is problem. First, the SIFT detection of points and descriptors is not fast, I need some algorithm, which is fast enough to detect points and descriptors of thousand of images in matter of minutes.

Second, method to compare fast with the thousands of images in database


Other descriptor extractor

Lets, take problem, one by one. SIFT fast alternative? There is another algorithm, SURF (Speeded Up Robust Features), which is fast compare to SIFT and in default mode, computes descriptor of 64 bit length. But, there is another problem with SIFT and SURF, both are patented and can’t be use in commercial application without loyalty fee to patent holder. Life in never easy. Quest of open source algorithms continues. 

There are several other algorithm, which are patent free, and can be use in commercial application, like BRISK, FREAK and ORB. You can get their .NET implementation in EMGU CV library.

Cool, now, I have way to reduce image in set of jagged array (read Matrix) and now, I can find a particular feature in other images, irrespective to lighting, slight rotation and affine transformation. But, second problem till remain, how to compare descriptors to sets of hundred millions even billion (descriptors of thousand of images) of descriptors.


This can be tackle in two ways.

Trees and Nearest Neighbor

Trees – Answer like in tree structure. There need to create the tree data structure which stores the descriptors in such fashion, that it can be fast matched when searched in tree. Something similar to binary tree, but here challenges are more. In order to compare two descriptors say [d1,d2,d3,…, d64] which is one row in matrix to another row is via [n1,n2,n3…..n64] is done via different distance metrics formula like Euclidian, Manhattan or Minkowski. You arrange the descriptors in such way that it compliment the distance with each and other in the tree.

R-Tree, BK-Tree, Cover tree, MVP tree are possible solution.I had gone with k-d tree. You can kind the implementation of the k-d tree in the Accord Framework libraries. Once, you arrange the descriptors in the k-d tree, you perform the k-nearest neighbor to find the descriptor in skyscraper like matrix. 

In my personal experience, I found k-d tree do not perform well in higher dimension 64 and 128. Accord framework had the Approximate Nearest Neighbor implementation, but the result were Random. I found FLANN library, which there in Emgu CV as Flann Index, give better result. FLANN uses k-d forest to perform approximate nearest neighbor search.

If all this stuff makes you dizzy, no problem. You can checkout my implementation on GitHub where you can play around the application to understand all the Jargons which I have thrown here. 

Till, problem lies with the volume of descriptors, in my trial, 1000 images, which cumulates to approx 40 MB in size had SURF descriptors 120 MB in size. Phew, three times more. What if, I had ten thousands of images, descriptors size would result in GBs. Maintaining such large dataset of descriptors is problem, even on Hard disk, plus loading into memory during search is big pain.

Bag of Visual Words

Bag of Visual Words – In Bag of Visual Words, you read sets of images, and calculate mean sets of feature out of it. Now, you get histogram of mean visual descriptors which appear number of times in image. It will get the array of fix length which describes the image. This is brief description of bag of words. If you don’t understand the bit, you can look for detail explanation on this blog, this one or google out for more post which describes the Bag of Visual Words.

Plus, applied the document and query weighting scheme to query out the images. You can find more about this in the implementation of term frequency, document frequency and normalization. There is good paper on the why to use of weighting scheme of vector space (Bag of visual words, in our case)

Now, we have reduce to statistical problem, and compare thousands, even millions images in database. Here, each image is just the representation of visual bag of words, and base on the query image, which again is converted into visual bag of words, and let the similar ones for it.

Enough Words, Action

You would be thinking, enough of gyan, where the implementation/code of all these? That’s my plan for next post. Stay tuned.


Love affair with CBIR – Part 2

Love affair with CBIR (Content base image retrieval system) Part 1

I never figure out, this would be series of article, here are links to rest of it

Love affair with CBIR – Part 1

Love affair with CBIR – Part 2 (The one you are reading)

Love affair with CBIR – Part 3

Love affair with CBIR – Part 4

This is continuation of the first post, if you not getting the context, read this first post.


In my last blog, I have describe the different solution angles for this, De-duplication and similar. Let’s start with de-duplication, Identical images with different names, different format, but same resolution, same scale, no rotation, no affine transformation, etc.


How do you start, do what rest of the people do, google for the library in .NET, which already do this. .NET was the prefer choice of framework, C# was my first love. I was much interest in libraries written for .NET rather than c++, python or MATLAB.

First idea to solve this problem was to do pixel by pixel comparison of both the image, if any of the pixels are different, I can get the percentage of the number of different pixels and then get the difference

Code for this, what describe on the below site

This methods compare pixel by pixel or compare the byte hash to determine the difference, can detect the exact duplicate. But, I wasn’t in for this approaches, as it has two problems.


  1. They are slow for what I need. I need to compare 10000’s of images from database, not just two images, and it would take while, to compare them all
  2. Second, they aren’t fault tolerant, like same image with difference in jpeg compression result will result in false match. I want match to compensates this, and my need wasn’t to match pixel to pixel, just visually similar


Now, next milestone is to look for method/algorithm to address my problems

Another idea was to convert image into some kind of signature/fingerprint and compare the signatures with 10000’s of signatures to see if the image matches.


This would be faster, as it just need to compute few bytes compare to full images. But, now, challenge was to identify slight difference in image by signature and classify them as same or different.

I wasn’t the first guy to have this idea, and I found out after few googling, there is already an algorithm which does this. It known as “Perceptual hashing”. There is post on the hackerfactor, which describes the algorithm to phashing.

Now, I need to find the same implementation in C#. I found, there is guy name, ‘David Oftedal’ who posted the C# implementation at image hash. Sigh, this link doesn’t work anymore. No problem, I have got the implementation code in my ImageCompare application on GitHub. You can get it from there.

I got the reference to another library ‘ImageMagick.NET’, which is the .NET wrapper around the C++ ImageMagick. It has various algorithm to compare image, here below is the List

  1. Absolute
  2. Fuzz
  3. MeanAbsolute
  4. MeanErrorPerPixel
  5. MeanSquared
  6. NormalizedCrossCorrelation
  7. PeakAbsolute
  8. PeakSignalToNoiseRatio
  9. PerceptualHash,
  10. RootMeanSquared

You can compare the image with following code

   1: MagickImage orgImage = new MagickImage(orgImagePath);

   2: MagickImage dupImage = new MagickImage(dupImagePath);

   3: var percentage = orgImage.Compare(dupImage, imageAlgo);

See here, at number nine, there is PerceptualHash algorithm, but I found, when compare with David Oftedal implementation, it doesn’t give good result.

Further searching, I found the another article by ‘Jakob XnaFan Krarup’ on the code project. He compares the image with RGB histogram and get the difference using Bhattacharyya co-oefficient distance formula. You can read his article for more information about this methods.

So far, I have describe three implementation in this post

  1. David Oftedal implementation of pHash
  2. Perceptual Hash implementation of the ImageMagick library, along with other algorithms
  3. Jakob XnaFan Krarup implementation of perceptual hash at code project.

I decide to play around with this algorithm to test the deep waters myself. For this, I had Image compare utility


You can get the code from this on my GitHub account


To compare, I have created sample images from flicker.

  1. Original – 200px by 120px resolution picture
  2. Original2 – Copy of Original
  3. Resize – 400px by 240px resolution picture
  4. FormatType – Change the type from jpg to png keeping resolution same

Rests images are self explanatory


This utility has Main form, as you can see in the below Image.


To use it, select the first image, second image and click Image Compare. You get the percentage difference in the textbox.

If you need Grid view of all the percentage difference, open Data Compare form, select the algorithm, and get the percentage difference of all the images.


You can download the code and play around to understand details of each compare algorithm.

David Oftedal pHash Algorithm

David Oftedal pHash gives you the unsign long integer, which you can calculate and store in the database for all the images,

To compare all image to new image, just calculate the phash for new image and compare it all the phashes in table, that’s it. If the hash matches, it exactly the same image. Simple and Superfast method.


If you need to get the similarity distance function, when you pass the hash, you get the difference in percentage, 100% mean identical, 0 mean completely different.

   1: public static double Similarity(ulong hash1, ulong hash2)

   2: {

   3:     return ((64 - BitCount(hash1 ^ hash2)) * 100) / 64.0;

   4: }

Jakob XnaFan Krarup Implementation

Jakob KnaFan Krarup use RGB histogram approach to compare. It calculates the histogram of grayscale hue values, and compare them


You can store normalize histogram either in SQL table or NOSQL database and compare using bhattacharya coefficient  distance formula to get the percentage difference.

Using either of the two algorithm, I have address the problems state above. But still, if you need to detect similar images with invariance like different lighting, rotation, affine transformation, scale etc, these algorithm won’t suit. Even, for simple problem, like user upload the part image of the whole image present, these algorithms would consider it to be different image.

For visual similar matching, you need some point to point matching algorithm like below. Hang on to my third post, which will unleash the power of point to point compare.



Love affair with CBIR – Part 1

Love affair with CBIR (Content base image retrieval system) Part 1

I never figure out, this would be series of article, here are links to rest of it

Love affair with CBIR – Part 1 (The one you are reading)

Love affair with CBIR – Part 2

Love affair with CBIR – Part 3

Love affair with CBIR – Part 4

Recently, I was architecting for the project for one of the American client, and they had the one simple requirement, “Ability to detect duplicate images from set of image files during upload”.

I was thinking, common, this is simple. When the user uploads an image to website, just check if the image already exists, that all. What the big deal in it. If this was some string, it would be

   1: SELECT * FROM TABLEA WHERE Field1 = ‘String’

Same thing, I need to do with Image. It here, I was stump! In MS SQL, I can’t compare images, as image if store in table, would be the blob object. I can’t just compare the byte size of two blob object, as there would be possibility, where two different images could have same byte size. See below example


Now, the challenges start increasing, what if there are two identical image, but of different file type. Say, one is jpg, other is png


There are other challenges too, invariance

Different type of invariance

  1. Illumination
    image image
  2. Scale
    image image
  3. Rotation
    image image
  4. Affine
    image image
  5. Full Perspective
    image image

So now, things started to complicate, there is no simple way to compare two images, as you do for string, integer and any primitive type. This was the start of love affair. It mystical, it esoteric and magical, you don’t know ‘HOW’ to get it, just follow the with ‘WHAT’

For the solution, there are two angle to it, De-duplication and similar. With this, there are two new terms, 

  1. De-duplication: Identical images, multiple copies of the same photo in with different names, different format. This is simple and everybody understands this.
  2. Similar Image: This is important one, and this defines the game, what do you call similar? Is same color similar for you? Is same shape similar for you? Is same texture similar for you? Or is mix bag of all. For my love affair, it’s invariance which was shown above, which are visually appears to be same, but different exposure plus slight difference in rotation, angle and scale.

In the next post, we will see how we will address each of it.