Love affair with CBIR – Part 2

Love affair with CBIR (Content base image retrieval system) Part 1

Update:
I never figured this would turn into a series of articles. Here are links to the rest of it:

Love affair with CBIR – Part 1

Love affair with CBIR – Part 2 (The one you are reading)

Love affair with CBIR – Part 3

Love affair with CBIR – Part 4

This is a continuation of the first post. If you are not getting the context, read the first post before this one.

De-duplication

In my last post, I described the different solution angles for this problem: de-duplication and similarity. Let’s start with de-duplication, meaning identical images with different names or different formats, but the same resolution, the same scale, no rotation, no affine transformation, etc.


How do you start? Do what everyone else does: google for a .NET library that already does this. .NET was my preferred framework and C# was my first love, so I was much more interested in libraries written for .NET than in ones for C++, Python or MATLAB.

My first idea was to do a pixel-by-pixel comparison of the two images: count the pixels that differ and express that count as a percentage of all pixels to get the difference.

Code for this approach is described on the sites below:

http://blogs.msdn.com/b/domgreen/archive/2009/09/06/comparing-two-images-in-c.aspx

http://www.codeproject.com/Articles/9299/Comparing-Images-using-GDI
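To make the pixel-by-pixel idea concrete, here is a minimal sketch of it. It is not the code from either article above. It assumes both images are already decoded into flat arrays of packed 32-bit pixel values of the same size, and the names `PixelCompare` and `PercentDifferent` are made up for this illustration.

```csharp
using System;

public static class PixelCompare
{
    // Returns the percentage (0..100) of pixels that differ between two images.
    // Each image is a flat array of packed 32-bit pixel values. This in-memory
    // layout is an assumption and is not tied to any particular imaging library.
    public static double PercentDifferent(int[] pixelsA, int[] pixelsB)
    {
        if (pixelsA == null) throw new ArgumentNullException(nameof(pixelsA));
        if (pixelsB == null) throw new ArgumentNullException(nameof(pixelsB));
        if (pixelsA.Length != pixelsB.Length)
            throw new ArgumentException("Images must have the same number of pixels.");
        if (pixelsA.Length == 0)
            return 0.0;

        int different = 0;
        for (int i = 0; i < pixelsA.Length; i++)
        {
            // Any change in any channel counts as a different pixel.
            if (pixelsA[i] != pixelsB[i])
                different++;
        }
        return different * 100.0 / pixelsA.Length;
    }
}
```

Because every pixel is visited and any tiny change counts as a difference, this sketch shows both the cost and the strictness of the approach.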

These methods compare pixel by pixel, or compare a byte hash, to determine the difference, and they can detect exact duplicates. But I wasn’t in for these approaches, as they have two problems.

Problems

  1. They are slow for what I need. I need to compare tens of thousands of images from a database, not just two images, and comparing them all would take a while.
  2. They aren’t fault tolerant. The same image saved with a different JPEG compression will be reported as a different image, a false mismatch. I want the match to compensate for this; my need wasn’t to match pixel to pixel, just to find visually similar images.


The next milestone was to look for a method or algorithm that addresses these problems.

Another idea was to convert each image into some kind of signature or fingerprint, and then compare a new image’s signature against tens of thousands of stored signatures to see if the image matches.


This would be faster, as it only needs to compare a few bytes instead of full images. But now the challenge was to make the signature tolerate slight differences in an image and still classify images as same or different.

I wasn’t the first guy to have this idea. After a bit of googling, I found that there is already an algorithm which does this. It is known as “perceptual hashing”. There is a post on hackerfactor which describes the pHash algorithm.

Next, I needed to find an implementation in C#. I found that a guy named ‘David Oftedal’ had posted a C# implementation at image hash. Sigh, that link doesn’t work anymore. No problem, I have a copy of the implementation code in my ImageCompare application on GitHub. You can get it from there.
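To give a feel for what such a hash computes, here is a minimal sketch of one simple perceptual hash, the average hash. It is an illustration only and is not claimed to be the exact variant from the hackerfactor post or from David Oftedal’s code. It assumes the image has already been shrunk to 8x8 pixels and converted to grayscale; the class and method names are made up.

```csharp
using System;

public static class AverageHashSketch
{
    // Computes a 64-bit average hash from an image that has ALREADY been
    // shrunk to 8x8 and converted to grayscale (64 brightness values, 0..255).
    // Bit i is set when pixel i is at least as bright as the mean brightness.
    public static ulong Compute(byte[] gray8x8)
    {
        if (gray8x8 == null) throw new ArgumentNullException(nameof(gray8x8));
        if (gray8x8.Length != 64)
            throw new ArgumentException("Expected exactly 64 grayscale values (8x8).");

        int sum = 0;
        foreach (byte value in gray8x8) sum += value;
        double mean = sum / 64.0;

        ulong hash = 0;
        for (int i = 0; i < 64; i++)
        {
            if (gray8x8[i] >= mean)
                hash |= 1UL << i;
        }
        return hash;
    }
}
```

Because each bit only records whether a pixel is brighter or darker than the average, small changes such as a uniform brightness shift or mild compression noise tend to leave most bits unchanged.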

I also found a reference to another library, ‘ImageMagick.NET’, which is a .NET wrapper around the C++ ImageMagick library. It has various algorithms to compare images. Here is the list:

  1. Absolute
  2. Fuzz
  3. MeanAbsolute
  4. MeanErrorPerPixel
  5. MeanSquared
  6. NormalizedCrossCorrelation
  7. PeakAbsolute
  8. PeakSignalToNoiseRatio
  9. PerceptualHash
  10. RootMeanSquared

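You can compare two images with the following code:

   // Load both images, then compare them with the chosen metric
   // (imageAlgo is one of the algorithms listed above).
   MagickImage orgImage = new MagickImage(orgImagePath);
   MagickImage dupImage = new MagickImage(dupImagePath);
   var percentage = orgImage.Compare(dupImage, imageAlgo);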

Notice that number nine in the list is the PerceptualHash algorithm, but I found that, compared with David Oftedal’s implementation, it doesn’t give good results.

Searching further, I found another article, by ‘Jakob XnaFan Krarup’, on CodeProject. He compares images using their RGB histograms and gets the difference using the Bhattacharyya coefficient distance formula. You can read his article for more information about this method.

So far, I have described three implementations in this post:

  1. David Oftedal’s implementation of pHash
  2. The Perceptual Hash implementation in the ImageMagick library, along with its other algorithms
  3. Jakob XnaFan Krarup’s RGB histogram comparison on CodeProject

I decided to play around with these algorithms to test the deep waters myself. For this, I built an image compare utility.


You can get the code for it from my GitHub account: https://github.com/sbrakl/ImageCompare

ImageSet

To compare, I created sample images from Flickr.

  1. Original – 200px by 120px resolution picture
  2. Original2 – Copy of Original
  3. Resize – 400px by 240px resolution picture
  4. FormatType – The same picture converted from JPG to PNG, keeping the resolution the same

The rest of the images are self-explanatory.


The utility opens with its Main form.

To use it, select the first image and the second image, then click Image Compare. The percentage difference appears in the textbox.

If you need a grid view of all the percentage differences, open the Data Compare form and select the algorithm to get the percentage difference between all the images.


You can download the code and play around with it to understand the details of each comparison algorithm.

David Oftedal pHash Algorithm

David Oftedal’s pHash gives you an unsigned long integer (a 64-bit ulong), which you can calculate and store in the database for all the images.

To compare a new image against all the stored images, just calculate the pHash for the new image and compare it with all the pHashes in the table. That’s it. If the hashes match, it is treated as the same image. A simple and superfast method.
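As a rough sketch of that lookup, the class below keeps the hashes in memory instead of a database table. Everything here is assumed for illustration: the class name `HashIndex`, its methods, and the idea of keying by the hash value. In a real database, the same idea would be an equality lookup on the hash column.

```csharp
using System;
using System.Collections.Generic;

public class HashIndex
{
    // Maps each stored 64-bit hash to the names of the images that produced it.
    // This is an in-memory stand-in for the database table described in the post.
    private readonly Dictionary<ulong, List<string>> _byHash =
        new Dictionary<ulong, List<string>>();

    public void Add(string imageName, ulong hash)
    {
        if (!_byHash.TryGetValue(hash, out List<string> names))
        {
            names = new List<string>();
            _byHash[hash] = names;
        }
        names.Add(imageName);
    }

    // Returns the names of stored images whose hash equals the given hash,
    // or an empty list when there is no exact match.
    public IReadOnlyList<string> FindExact(ulong hash)
    {
        if (_byHash.TryGetValue(hash, out List<string> names))
            return names;
        return new List<string>();
    }
}
```

Each lookup is a single dictionary probe, so the cost of checking a new image does not grow with the number of stored images.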


If you need a similarity measure instead, use the function below. Pass it two hashes and you get their similarity as a percentage: 100% means identical, 0% means completely different.

   // Percentage of the 64 hash bits that agree between the two hashes.
   public static double Similarity(ulong hash1, ulong hash2)
   {
       return ((64 - BitCount(hash1 ^ hash2)) * 100) / 64.0;
   }
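The snippet above calls a BitCount helper that is not shown. The sketch below fills it in with one possible version and adds a threshold check. The `BitCount` body, the `IsMatch` method and the class name are assumptions for illustration, not necessarily what the original implementation uses.

```csharp
using System;

public static class HashSimilarity
{
    // Counts the bits set to 1 in a 64-bit value (Kernighan's method:
    // each loop iteration clears the lowest set bit).
    public static int BitCount(ulong value)
    {
        int count = 0;
        while (value != 0)
        {
            value &= value - 1;
            count++;
        }
        return count;
    }

    // Same formula as in the post: percentage of the 64 bits that agree.
    public static double Similarity(ulong hash1, ulong hash2)
    {
        return ((64 - BitCount(hash1 ^ hash2)) * 100) / 64.0;
    }

    // Treats two hashes as the same image when their similarity reaches
    // the given threshold. The threshold value is the caller's choice.
    public static bool IsMatch(ulong hash1, ulong hash2, double thresholdPercent)
    {
        return Similarity(hash1, hash2) >= thresholdPercent;
    }
}
```

For example, two hashes that differ in 8 of their 64 bits have a similarity of 87.5, so they would match at a threshold of 85 but not at 90.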

Jakob XnaFan Krarup Implementation

Jakob XnaFan Krarup uses an RGB histogram approach to compare images. It calculates a histogram of the pixel values of each image and compares the histograms.


You can store the normalized histograms in either a SQL table or a NoSQL database and compare them using the Bhattacharyya coefficient distance formula to get the percentage difference.
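Here is a minimal sketch of that comparison, assuming one 256-bin histogram per color channel built from raw 8-bit channel values. It is not taken from Jakob XnaFan Krarup’s article; the class and method names are made up, and the article may bin or combine the channels differently.

```csharp
using System;

public static class HistogramCompare
{
    // Builds a normalized 256-bin histogram from 8-bit channel values
    // (for example, the red channel of every pixel). The bins sum to 1.
    public static double[] Normalize(byte[] channelValues)
    {
        if (channelValues == null) throw new ArgumentNullException(nameof(channelValues));
        if (channelValues.Length == 0)
            throw new ArgumentException("At least one value is required.");

        var histogram = new double[256];
        foreach (byte value in channelValues) histogram[value] += 1.0;
        for (int i = 0; i < histogram.Length; i++) histogram[i] /= channelValues.Length;
        return histogram;
    }

    // Bhattacharyya coefficient: the sum over all bins of sqrt(p[i] * q[i]).
    // 1.0 means identical distributions, 0.0 means no overlap at all.
    public static double BhattacharyyaCoefficient(double[] p, double[] q)
    {
        if (p == null) throw new ArgumentNullException(nameof(p));
        if (q == null) throw new ArgumentNullException(nameof(q));
        if (p.Length != q.Length)
            throw new ArgumentException("Histograms must have the same number of bins.");

        double sum = 0.0;
        for (int i = 0; i < p.Length; i++) sum += Math.Sqrt(p[i] * q[i]);
        return sum;
    }
}
```

Multiplying the coefficient by 100 gives a similarity percentage, and subtracting that from 100 gives a percentage difference. Since a histogram ignores where pixels sit in the image, two different pictures with the same color distribution would score as identical.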

Using either of the two algorithms, I addressed the problems stated above. But if you need to detect similar images under variations such as different lighting, rotation, affine transformation, scale, etc., these algorithms won’t suit. Even for a simple case, such as a user uploading a cropped part of an image that is already present, these algorithms would consider it a different image.

For visually similar matching, you need a point-to-point matching algorithm such as SURF. Hang on for my third post, which will unleash the power of point-to-point comparison.
