Category Archives: CBIR

Beyond CBIR

In my last post, I wrote about the series of articles covering my odyssey in creating a CBIR application in .NET. CBIR is part of Computer Vision. As I researched this area, I found that Computer Vision has many applications beyond just detecting similar images.

The libraries (Emgu CV and Accord Framework) which I introduced during the CBIR series have a lot to offer apart from similar image detection.

There are salient articles on the web if you google for the capabilities of both. What I try to do here is just give a brief overview of each framework's capabilities, so you can get the idea and google more about applications for them.

First on the list is

Face Detection

In my last series of posts, I described finding similar images, but those similar-image algorithms won't work for face detection. Face detection is a separate area with its own set of algorithms, which are available in both Emgu CV and the Accord Framework.

Emgu CV face detection – There is an excellent article on CodeProject which describes how to detect faces using Emgu CV. You can get the sample face detection application in the [Path where EMGU CV is installed]\Emgu.CV.Example\FaceDetection directory.
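
For flavor, here is a minimal sketch of Haar-cascade face detection, assuming the Emgu CV 2.4-style CascadeClassifier API; the cascade file ships with OpenCV, and the image path is illustrative:

    using System.Drawing;
    using Emgu.CV;
    using Emgu.CV.Structure;

    // Minimal sketch: detect frontal faces with a Haar cascade
    using (Image<Bgr, byte> image = new Image<Bgr, byte>("people.jpg"))
    using (Image<Gray, byte> gray = image.Convert<Gray, byte>())
    {
        var cascade = new CascadeClassifier("haarcascade_frontalface_default.xml");
        Rectangle[] faces = cascade.DetectMultiScale(gray, 1.1, 10, new Size(20, 20), Size.Empty);
        foreach (Rectangle face in faces)
            image.Draw(face, new Bgr(Color.Red), 2);   // outline each detected face
    }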


The Accord Framework has two sample applications, for face detection and for face tracking. Face detection is for recognizing faces, while tracking is more for gesture control.


Gesture Control

As you have seen, the Accord Framework uses face tracking for gestures; both Emgu CV and Accord can track faces, hands, etc. Below are YouTube videos showing tracking with Emgu CV.

The Accord Framework even has gesture control, something similar to a touch-screen device, where you draw a gesture and it identifies what the gesture is.

Object detection

In Emgu CV, there are various object detection examples, such as license plate detection. There is another example using the Accord/AForge framework to detect traffic signs.

OCR

Optical character recognition, which everyone is familiar with, has even been used in license plate detection. In the Accord Framework, there is an example of learning deep neural networks using restricted Boltzmann machines.

Motion Detection

Emgu CV has certain motion detection algorithms which can pick pedestrians out of a frame.

Photograph Stitching

Emgu CV as well as the Accord Framework has photo stitching capability. In Emgu CV, you can find the sample application at [Path where EMGU CV is installed]\Emgu.CV.Example\Stitching. For the Accord Framework, the sample code is at [Directory where Accord.NET is installed]\Framework\Samples\Imaging\Panorama.


Love affair with CBIR Part 4

Love affair with CBIR (Content-based image retrieval system) Part 4

Update:
I never figured out that this would become a series of articles; here are links to the rest of it.

Love affair with CBIR – Part 1 

Love affair with CBIR – Part 2

Love affair with CBIR – Part 3

Love affair with CBIR – Part 4 (The one you are reading)

In this article, I will walk through the sample application I created to demonstrate how CBIR works. This is a continuation of the series, so if you are not able to catch up with this post, read the previous ones to understand the context.

In the last article, you probably read about the theory involved in finding similar images. But you won't get it until you see working code. That was my own frustration when I was researching CBIR: I found tons of papers from various universities around the world, but it was hard to find a working demo, especially in .NET. I googled for days and found blogs and articles, but most of them don't share implementation code. Most Computer Vision work happens in C++/MATLAB/Python; there are very few takers for .NET languages.

That's where I decided to help poor souls like me who are searching for a CBIR implementation in .NET: they can benefit from this proof-of-concept application. Especially college students who are trying to learn the C# language, the .NET framework and how to build a CBIR all at the same time. This would be a boon for them, like an oasis in the desert of the internet.

I will show you a demo on the Wang image dataset. You can download the 1000 test images, which contain 10 sets of 100 images each. This will help you understand how reverse image search works, and how efficient each algorithm is at finding similar images.

You can download Image Database application from https://github.com/sbrakl/ImageDatabase

License

Note: This application, which I have built, is under the GPLv3 license, but the libraries it uses (Emgu CV, Accord Framework, etc.) aren't under the same license. If you need to use it in a commercial application, I request that you check the license of each individual library used in this application and act accordingly.

This utility has been developed in WPF on .NET Framework 4.5 as a proof of concept for various image recognition algorithms.

This utility consists of three areas:


1 – This is the index section. Before you query an image with your favorite algorithm, you need to index the image directory for that algorithm. Select the image directory which contains all the images, and click 'Calculate Index'. Note: the image directory must contain all the images at the root of the folder; if images are nested in sub-directories, they won't be taken for indexing.

This index calculation needs to be done only once per selected folder, after which you can query it multiple times. With this utility, you can only search images that have been indexed, meaning that if an image is missed during indexing, it can't be queried either.

2 – This is the query image section. Select an image and search. 'Only Search' is for comparing the same query image against multiple algorithms: just change the algorithm in the dropdown and click 'Only Search' if you need to query the same image with the newly selected algorithm.

3 – This is the result area, where images visually similar to the one queried are displayed.


Let's analyze each algorithm in detail, starting with the binary algorithms.

Binary Algorithms

Binary algorithms are good at finding duplicates, but can't help with similar-image search. You can read more about this in Part 2 of this series.

  • Perceptual Hash

pHash has been implemented using the SimilarImageDotNet library, written by 'Carlos Ballesteros Velasco'. It computes the hash as a string.

    using (Bitmap bmp = new Bitmap(Image.FromFile(fi.FullName)))
    {
        compressHash = SimilarImage.GetCompressedImageHashAsString(bmp);
    }

You can compare two images as follows:

    var dist = SimilarImage.CompareHashes(queryImageCompressHash, imgInfo.CompressHash);

The distance ranges between 0 and 1, where 0 means completely different and 1 means completely identical.

  • RGB Histogram

The RGB approach uses the color histogram for comparison. I have used the Eye.Open similar image finder library. It converts the image into a histogram of RGB color projections (a two-dimensional array). During comparison, it calculates the mean squared distance between the two histograms and returns that as the distance.

    queryProjections = new RgbProjections(ImageUtility.GetRgbProjections(bitmap));
    var dist = imgInfo.RGBProjection.CalculateSimilarity(queryProjections);

  • Bhattacharyya histogram algorithm

In the second post, I described 'Jakob XnaFan Krarup's' implementation on CodeProject. It resizes the image to 16×16, converts it to grayscale, and takes the R channel to calculate the normalized histogram (a jagged array of doubles) of the image. Then it compares normalized histograms using the Bhattacharyya coefficient distance formula to get the distance. In my implementation, I convert the difference into a percentage: 0% difference means the exact same image, 100% difference means completely different images, and 30% means the query image is 30% different from the searched image. The lower the number, the better the result for this algorithm.

    queryHistogram = BhattacharyyaCompare.Bhattacharyya.CalculateNormalizedHistogram(img);
    var dist = BhattacharyyaCompare.Bhattacharyya.CompareHistogramPercDiff(queryHistogram, norHist);

Those were the binary algorithms; now it is time for the global descriptors.

Global Descriptors

  • CEDD

Global descriptors are signatures which describe the image as a whole. They differ from the binary algorithms in that they consider color, edges and texture to describe the image, rather than the simple projections the binary algorithms use. CEDD was created by Savvas A. Chatzichristofis and is available on his website http://chatzichristofis.info/?page_id=15

The CEDD (Color and Edge Directivity Descriptor) consists of 144 bins (a double array of length 144), where the first 24 bins represent the 24 shades of color used by the system, and the remaining 5 sets of 24 bins represent the texture units. These sets classify the image by texture area. You can download the code to calculate the CEDD descriptor from Savvas's site.


You can get the CEDD descriptor from a bitmap with the following code:

    double[] queryCeddDescriptor;
    using (Bitmap bmp = new Bitmap(Image.FromFile(queryImagePath)))
    {
        queryCeddDescriptor = cedd.Apply(bmp);
    }

It compares the two descriptors using the Tanimoto coefficient, as follows:

    var dist = CEDD_Descriptor.CEDD.Compare(queryCeddDescriptor, ceddDescriptor);

During my experimentation, CEDD gave much better results than the binary algorithms, as you can see below. When you search for a bus, it brings back all the remaining bus images from the database.


Notice the green 'Show/Hide Setting' button that appears when you select CEDD. While developing CEDD, I posed myself a hypothetical problem: say I have a million images. In the linear approach, every one of the million descriptors must be compared before getting the right ones. This would be slow. But I had a hint from RDBMS indexes: a database does not compare each row in a table, it builds an index around the column. MS SQL Server builds a B-tree clustered/non-clustered index for a particular column, hence queries are blazing fast.

I applied the same concept here and created a CEDD BK-tree index. Since comparing two descriptors is not as simple as comparing strings or integers, I need the Tanimoto distance. The idea is to arrange the descriptors in such a fashion that similar ones, i.e. those with a lower distance to each other, are placed near each other. At query time, I then need to compare only a fraction of the total descriptors (say 10% of the million in our case) to get the nearby values. Searching down the tree takes roughly O(log n) comparisons instead of O(n), where n is the total number of descriptors. Can't understand a word of this paragraph? In short and plain English: search is blazing fast using a BK-tree index.
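
To make the idea concrete, here is a minimal BK-tree sketch (my illustration, not the exact code from the application); it assumes the Tanimoto distance has been scaled to a small integer, since BK-tree edges are keyed by discrete distance values:

    using System;
    using System.Collections.Generic;

    // Minimal BK-tree over CEDD descriptors, keyed by an integer-scaled distance
    class BkTree
    {
        class Node
        {
            public double[] Descriptor;
            public Dictionary<int, Node> Children = new Dictionary<int, Node>();
        }

        readonly Func<double[], double[], int> dist;   // e.g. (int)(100 * (1 - Tanimoto))
        Node root;

        public BkTree(Func<double[], double[], int> distance) { dist = distance; }

        public void Add(double[] descriptor)
        {
            if (root == null) { root = new Node { Descriptor = descriptor }; return; }
            Node node = root;
            while (true)
            {
                int d = dist(descriptor, node.Descriptor);
                Node child;
                if (!node.Children.TryGetValue(d, out child))
                {
                    node.Children[d] = new Node { Descriptor = descriptor };
                    return;
                }
                node = child;   // walk down the edge keyed by this distance
            }
        }

        // Triangle inequality: only subtrees whose edge lies in
        // [d - radius, d + radius] can contain a match, so most of
        // the tree is never visited.
        public List<double[]> Search(double[] query, int radius)
        {
            var results = new List<double[]>();
            if (root != null) Search(root, query, radius, results);
            return results;
        }

        void Search(Node node, double[] query, int radius, List<double[]> results)
        {
            int d = dist(query, node.Descriptor);
            if (d <= radius) results.Add(node.Descriptor);
            foreach (var edge in node.Children)
                if (edge.Key >= d - radius && edge.Key <= d + radius)
                    Search(edge.Value, query, radius, results);
        }
    }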

Note: You need to index for each algorithm type, Linear and BK-Tree. Selecting Linear and clicking 'Calculate Index' will calculate the linear CEDD index, and selecting BK-Tree and clicking 'Calculate Index' will calculate the BK-tree index.


Vector Descriptors

  • SURF

I mentioned the SURF image descriptors in Part 3 of this series. Here I am using the Emgu CV implementation. The SURF algorithm runs in two parts: first it detects keypoints, and second it calculates descriptors for those keypoints.

    // Initializing the SURF detector
    double hessianThresh = 500;
    SURFDetector surfDetector = new SURFDetector(hessianThresh, false);

    // Calculating the keypoints and descriptors together
    using (Image<Gray, byte> observerImage = new Image<Gray, byte>(strImagePath))
    {
        VectorOfKeyPoint observerKeyPoints = new VectorOfKeyPoint();
        Matrix<float> observerDescriptor = surfDetector.DetectAndCompute(observerImage, null, observerKeyPoints);
    }

In this POC application, the user can define the SURF settings.


Threshold is the Hessian threshold used to initialize the SURF detector; Uniqueness threshold is the distance ratio used when matching similar descriptors; and Good match is the number such that at least that many descriptors must match for an image to be considered similar.
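
As an illustration of how the last two settings interact (this is the idea, not the application's exact code): for each query descriptor, take its two nearest matches, keep the match only when the best distance is clearly smaller than the second best, and call the image similar when enough matches survive.

    // bestTwoDistances: for each query descriptor, d[0] = nearest match
    // distance, d[1] = second nearest. Illustrative sketch only.
    static bool IsSimilar(float[][] bestTwoDistances,
                          double uniquenessThreshold,   // e.g. 0.8
                          int goodMatchCount)           // e.g. 10
    {
        int kept = 0;
        foreach (float[] d in bestTwoDistances)
            if (d[0] < uniquenessThreshold * d[1])      // Lowe-style ratio test
                kept++;
        return kept >= goodMatchCount;
    }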

SURF also has two approaches, similar to CEDD:

  • Linear
  • Flann

Linear compares the descriptors of all the images in linear fashion. FLANN is the library I mentioned in Part 3 of this series; I am using the Emgu CV implementation of the Flann Index. It takes the descriptors of all the images, creates one super matrix of them, and indexes this matrix with FLANN's algorithms (a randomized k-d tree forest). When you perform a k-NN search, it finds the approximate neighbor descriptors in a matter of milliseconds. You can play around with both implementations to understand the performance impact.
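
Roughly, the flow looks like the sketch below. I am assuming the Emgu CV 2.x Emgu.CV.Flann.Index API here (a constructor taking the number of randomized k-d trees, and a KnnSearch taking a checks parameter), so treat it as an outline rather than drop-in code:

    using Emgu.CV;
    using Emgu.CV.Flann;

    // superMatrix: one row per SURF descriptor, pooled from all indexed images.
    // Returns, for each query descriptor, the row indices of its knn neighbors;
    // a separate lookup table maps rows back to image files.
    static Matrix<int> FlannKnn(Matrix<float> superMatrix,
                                Matrix<float> queryDescriptors, int knn)
    {
        Index flann = new Index(superMatrix, 4);   // 4 randomized k-d trees
        var indices = new Matrix<int>(queryDescriptors.Rows, knn);
        var distances = new Matrix<float>(queryDescriptors.Rows, knn);
        flann.KnnSearch(queryDescriptors, indices, distances, knn, 24);  // 24 checks
        return indices;
    }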

If you query and want to check what SURF has matched, you can right-click the query image and see in a window which keypoints have been matched, as shown below.


Note: The SURF implementation in OpenCV is patented. Royalties need to be paid before using it in a commercial application. The Emgu CV library is under a dual license; if you are developing a commercial project, an Emgu CV license also needs to be purchased.

  • Accord SURF

Here, I am using the Accord Framework version of SURF. The Accord Framework implementation is based on the open source implementation in the OpenSURF computer vision library by Christopher Evans (http://www.chrisevansdev.com), used under the LGPL.

The concept remains the same: it first detects keypoints and then calculates descriptors for those keypoints. Below is a code sample of how it looks.

    double hessianThresh = 500;
    float hessianThreshold2 = (float)hessianThresh / 1000000;

    // OpenSURF detector; the Hessian value is a float, i.e. 0.0005f
    SpeededUpRobustFeaturesDetector surf = new SpeededUpRobustFeaturesDetector(hessianThreshold2);
    using (Bitmap observerImage = (Bitmap)Image.FromFile(fi.FullName))
    {
        List<SpeededUpRobustFeaturePoint> observerImageSurfPoints = surf.ProcessImage(observerImage);
    }

In this POC application, the user can define the Accord SURF settings as follows.


Similar to the Emgu CV SURF, in Accord SURF too I have implemented two search algorithms. Linear search compares every descriptor of every image with those of the query image to find similar images, while with k-d tree I am trying to build some kind of index over these descriptors, using the Accord Framework's k-d tree implementation.
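
A short sketch of the k-d tree side, assuming Accord.NET's KDTree.FromData/Nearest API (the names are how I understand that API, so verify against your version); each descriptor is tagged with the id of the image it came from:

    using System;
    using Accord.MachineLearning.Structures;

    static void QueryKdTree(double[][] descriptors, int[] imageIds, double[] query)
    {
        // Build the tree once over all 64-d SURF descriptors
        KDTree<int> tree = KDTree.FromData(descriptors, imageIds);

        // Find the 5 nearest descriptors to one query descriptor
        var neighbors = tree.Nearest(query, neighbors: 5);
        foreach (var n in neighbors)
            Console.WriteLine("image {0}, distance {1}", n.Node.Value, n.Distance);
    }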

In my experience, the k-d tree wasn't good for this implementation. k-d tree k-NN search performs poorly in higher dimensions (more than 10), and here I have SURF descriptors of 64 dimensions. I need some algorithm similar to FLANN which does approximate nearest neighbor search, but I couldn't find such an implementation in .NET. If anybody reading this blog finds one, do let me know in the comments.

If you query and want to check what Accord SURF has matched, you can right-click the query image and see in a window which keypoints have been matched, as shown below.


  • LoCATe

LoCATe is an algorithm which uses SURF for keypoint detection and CEDD to compute the local descriptors. It was created by Savvas A. Chatzichristofis, the same guy who built CEDD, and is available on his website http://chatzichristofis.info/?page_id=1479.

In order to use LoCATe, you need to do the following.

If you are creating the index for the first time, you can create the codebook by ticking the 'Create CodeBook' checkbox. The textbox below it will display the path where the codebook will be created.


If you are creating the index on a subsequent run, you can select the previously created codebook as the reference for indexing.

You may be wondering what that codebook is; for that, you need to follow my Part 3, where I explained the Bag of Visual Words algorithm. This is how the LoCATe algorithm works:

  1. It computes the descriptors of all the images in the index directory.
  2. It takes 10% of the total computed descriptors, on a sampling basis, for vector quantization.
  3. Vector quantization is performed by running the k-means clustering algorithm found in the Accord Framework. There is a numeric up-down textbox for the codebook length; this length is taken as the number of clusters to calculate. Normally I take binary-series numbers like 64, 128, 256, 512, etc.
  4. K-means clustering finds the 128 (as in the given example) mean cluster descriptors from the 10% sample of descriptors. (This step takes a lot of time; for 1000 images, it took 3 hrs to calculate the means.)
  5. I take these 128 mean descriptors and save them in the LocateCookBook_{codebook length}_{number of image}.cb file.
  6. Then I start calculating the inverted index of the visual bag of words for each image from these mean descriptors (see the sketch after this list).
  7. I store this inverted index in another file, which forms the histogram index I will use to perform the query.
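
Steps 3 to 6 boil down to something like the following sketch, assuming the classic Accord.NET KMeans API (Compute to learn the centroids, Clusters.Nearest to assign a descriptor to its codeword); treat the names as assumptions:

    using Accord.MachineLearning;

    // sampleDescriptors: the 10% sample used for vector quantization.
    // imageDescriptors: the LoCATe descriptors of one image.
    static double[] BuildBovwHistogram(double[][] sampleDescriptors,
                                       double[][] imageDescriptors,
                                       int codebookLength)   // e.g. 128
    {
        // Learn the codebook: k-means centroids over the sampled descriptors
        var kmeans = new KMeans(codebookLength);
        kmeans.Compute(sampleDescriptors);

        // Bag of visual words: count how often each codeword occurs in the image
        double[] histogram = new double[codebookLength];
        foreach (double[] d in imageDescriptors)
            histogram[kmeans.Clusters.Nearest(d)] += 1;

        return histogram;
    }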

Here is what happens when the user searches for an image:

  1. It computes the LoCATe descriptors for the query image.
  2. Taking the codebook created earlier as reference, it computes the inverted index of the visual bag of words for the query image.
  3. If the 'Apply LTC scheme' checkbox is ticked, it applies the query weighting scheme both to the inverted index of the query image and to all the image indexes.
  4. Based on the Euclidean distance, it finds the nearest matches and displays the results. It returns at most 25 results; I have hard-coded this value (a sketch of this ranking step follows the list).
  5. If the 'Extended' query algorithm is selected, it performs a further search over the results returned from the bag-of-words search. It runs the Emgu CV SURF to find exact or partial matches, and weights the ranking based on the CEDD and LoCATe percentage scores.
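
The ranking in step 4 is essentially the following (a sketch of mine; the limit of 25 is the hard-coded value mentioned above):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // indexedHistograms: image path -> its bag-of-visual-words histogram
    static List<KeyValuePair<string, double>> Rank(double[] queryHistogram,
        Dictionary<string, double[]> indexedHistograms)
    {
        return indexedHistograms
            .Select(kv => new KeyValuePair<string, double>(
                kv.Key, Euclidean(queryHistogram, kv.Value)))
            .OrderBy(kv => kv.Value)          // smaller distance = more similar
            .Take(25)                         // hard-coded result limit
            .ToList();
    }

    static double Euclidean(double[] a, double[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.Sqrt(sum);
    }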

If you are not able to catch up with this article, I strongly advise you to google the "bag of visual words" algorithm and try to understand it.

The point of interest here is that the inverted index of the visual bag of words describes what the image is as a histogram. In my experiments, the LoCATe algorithm outperforms all the other algorithms mentioned above; it is the best algorithm I could come up with in my research. Extended gives better results in terms of partial matches. A partial match is where the query image is part of one of the actual images in the set being searched.


On the whole, with this research I found the Computer Vision field to be interesting, and the work on CBIR could be improved further. But I ran out of time, and this research had to stop. If you are reading this and this article helped you, or you have any suggestions to improve it, please leave a comment. I would be happy to help you out, if my busy schedule permits 😦

Love affair with CBIR Part 3

Love affair with CBIR (Content-based image retrieval system) Part 3

Update:
I never figured out that this would become a series of articles; here are links to the rest of it.

Love affair with CBIR – Part 1 

Love affair with CBIR – Part 2

Love affair with CBIR – Part 3 (The one you are reading)

Love affair with CBIR – Part 4

This is a continuation of the series, so if you are not able to catch up with this post, read the previous ones to understand the context.

Similar Images

In the last article, we discussed de-duplication of images and how to find identical ones across thousands of images. But when it comes to similar images, those algorithms don't work. My quest was to find an algorithm that works for similar images with different lighting conditions, rotation or affine transformation, yet could still identify the image.

List of CBIR systems

As I was going down this rabbit hole, I found that a lot of research has been done in this area and tons of research papers have been published. Several open and closed source CBIR systems have already been developed. Many are the outcome of university research, while others are commercial.

Google already has reverse image search; if you haven't tried it, I recommend you do so. It is fast, robust, and works really well at finding similar images. But it wouldn't work for me, as it searches the entire web for images, and the prerequisite is that the image set in which you are trying to find similar images must have been crawled by Google's search bots. Mine was an internal website (not exposed to the internet), and the image set was sort of copyrighted.

The next best commercial solution was TinEye's MatchEngine. It has a set of APIs where I could submit my images, it would run its indexing algorithm on my set of images, then I could pass a query image and it would return a set of similar images. As you can see above with the painting of American Gothic, the farmer in different masks would be detected as similar. This is a big deal in computer vision research. But, sigh, TinEye costs a bomb ($1000 for a starter plan) and my project stakeholders required a free solution.

List of free CBIR systems

I looked at various open source CBIR systems: WindSurf is a desktop application in Java, LIRE is a web implementation in Java using Lucene and Solr, and FIRE (Flexible Image Retrieval Engine) is developed in C++ and Python. But none suited my needs: first, I needed something in .NET; second, it needed to work with part-image search. I mean, I give just a portion of an image, and it should retrieve the complete image if it is able to match the part.

It looked like there wasn't a good CBIR in .NET, and I needed to build one myself. Sigh! To build a CBIR, I needed to know how to tell a computer to identify images. The next question that stumped me was: how does the human eye identify images?

Image Feature


I did some research and found that the human eye tries to find features in an image, and tries to identify the same set of features in another image to tell whether the images are similar. There is an excellent tutorial on OpenCV which talks more about image feature detection.

Now, my next problem was how to tell the computer to detect features in an image. The images are going to be random, and I need an algorithm which can detect significant features in any image I throw at it, not just one type. E.g., in a picture of a cat, it should detect the cat's eyes, nose, whiskers and pointed ears; but in a beach picture, it should detect the people on the beach, the coconut trees, the sea waves. You get the problem?

Now I was heading into hot waters: the research fields of Computer Vision, Image Processing, mathematics, machine learning and artificial intelligence. In short, areas I was afraid of. But it's a love affair, and I knew it was not going to be easy.

Corner Point Detection

In computer vision, this problem is called corner or interest point detection. There are several algorithms, like Harris corner detection, SUSAN corner detection, and LoG, DoG and DoH feature detection, among others. The ones that caught my attention were Harris corner detection and FREAK (Fast Retina Keypoint) detection.

These algorithms detect corners something like this:

Cool, but knowing an algorithm's name is one thing, and implementing it is another. While searching for .NET implementations of these corner detection algorithms, I came to know about two beautiful open source libraries/frameworks:

  1. Accord Framework
  2. Emgu CV

These libraries have the sets of functions you need for Computer Vision, Image Processing, Machine Learning, artificial neural networks, etc.
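
For example, Harris corner detection is a few lines with the Accord Framework. Here is a sketch, assuming the Accord.Imaging.HarrisCornersDetector API with its default parameters; the image path is illustrative:

    using System;
    using System.Collections.Generic;
    using System.Drawing;
    using Accord.Imaging;
    using AForge;   // IntPoint

    using (Bitmap image = (Bitmap)Image.FromFile("cat.jpg"))
    {
        var harris = new HarrisCornersDetector();            // default parameters
        List<IntPoint> corners = harris.ProcessImage(image); // (x, y) corner points
        Console.WriteLine("Found {0} corner points", corners.Count);
    }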

Correlation-based matching

Now that I knew how to detect corners, I was stumped by another problem: once I detect corners in an image, how do I compare them? I mean, corners are just sets of (x, y) points in space, and I need to detect those points in the other image. You would say, just compare their x and y coordinates. Cool, but remember the rotation and affine transformation from the first post: they mean one image's (x, y) points no longer stay the same in the other image.


Phew, it was going to be tougher than I thought. Here comes another algorithm, known as correlation-based matching. It uses matrices, and a whole lot of math, to find the distances between points and the matching points. The Accord Framework has the correlation matching algorithm, which approximately finds the matching points.

    CorrelationMatching matcher = new CorrelationMatching(9, img1, img2);
    IntPoint[][] matches = matcher.Match(harrisPoints1, harrisPoints2);

It detects matching points somewhat like the image below.

Figure: Top left: corners detected in image A. Top right: corners detected in image B. Bottom left: correlation mapping of points in both images. Bottom right: matches remaining after RANSAC estimation.

Cool, now I knew how to match points in one image with another. But how do I find sets of matching points across thousands of images in a database? Stumped again!

Another idea struck: I needed a way to describe the points detected in an image. If the description of a point matches the description of a point in any of the thousand images, it means a match. Wow! I am a genius.

SIFT descriptor extractor

But wait, how do you describe the points? After a little googling, I found that computer vision scientist David Lowe had thought about this problem long back, in 2004. He came up with the SIFT (Scale-Invariant Feature Transform) algorithm, which works on LoG, DoG and DoH feature detection, detects the interesting keypoints (corners), and describes each one with an array of doubles of length 128.


Now the image comparison problem has been reduced to math, where you need to compare one matrix with another.


Problem: Need for speed

Cool, but there is a problem. First, SIFT's detection of points and descriptors is not fast; I needed an algorithm fast enough to detect points and descriptors for thousands of images in a matter of minutes.

Second, I needed a method to compare quickly against the thousands of images in the database.


Other descriptor extractors

Let's take the problems one by one. A fast SIFT alternative? There is another algorithm, SURF (Speeded Up Robust Features), which is fast compared to SIFT and, in default mode, computes descriptors of length 64. But there is another problem with SIFT and SURF: both are patented and can't be used in a commercial application without royalty fees to the patent holders. Life is never easy. The quest for open source algorithms continues.

There are several other algorithms which are patent-free and can be used in commercial applications, like BRISK, FREAK and ORB. You can get their .NET implementations in the Emgu CV library.

Cool. Now I have a way to reduce an image to a set of jagged arrays (read: a matrix), and I can find a particular feature in other images, irrespective of lighting, slight rotation and affine transformation. But the second problem still remains: how to compare descriptors against sets of hundreds of millions, even billions, of descriptors (the descriptors of thousands of images)?


This can be tackled in two ways.

Trees and Nearest Neighbor

Trees – The answer lies in a tree structure. You need to create a tree data structure which stores the descriptors in such a fashion that they can be matched quickly when the tree is searched. Something similar to a binary tree, but here the challenges are greater. Comparing two descriptors, say [d1, d2, d3, …, d64] (one row in the matrix) to another row [n1, n2, n3, …, n64], is done via a distance metric formula like Euclidean, Manhattan or Minkowski (see the sketch below). You arrange the descriptors in such a way that they complement the distances to each other in the tree.
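
For reference, these metrics share one general form; a quick sketch (p = 1 gives Manhattan distance, p = 2 gives Euclidean):

    using System;

    // Minkowski distance of order p between two descriptor rows
    static double Minkowski(float[] d, float[] n, double p)
    {
        double sum = 0;
        for (int i = 0; i < d.Length; i++)
            sum += Math.Pow(Math.Abs(d[i] - n[i]), p);
        return Math.Pow(sum, 1.0 / p);
    }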

R-trees, BK-trees, cover trees and MVP trees are possible solutions. I went with the k-d tree; you can find an implementation of the k-d tree in the Accord Framework libraries. Once you arrange the descriptors in the k-d tree, you perform a k-nearest-neighbor search to find a descriptor in the skyscraper-like matrix.

In my personal experience, I found the k-d tree does not perform well at higher dimensions like 64 and 128. The Accord Framework had an approximate nearest neighbor implementation, but the results were random. I found that the FLANN library, available in Emgu CV as the Flann Index, gives better results. FLANN uses a k-d forest to perform approximate nearest neighbor search.

If all this stuff makes you dizzy, no problem. You can check out my implementation on GitHub, where you can play around with the application to understand all the jargon I have thrown around here.

Still, a problem lies with the volume of descriptors. In my trial, 1000 images, which cumulate to approx 40 MB in size, had SURF descriptors 120 MB in size. Phew, three times more. What if I had ten thousand images? The descriptors would run into GBs. Maintaining such a large dataset of descriptors is a problem even on a hard disk, plus loading it into memory during search is a big pain.

Bag of Visual Words

Bag of Visual Words – In the Bag of Visual Words approach, you read the set of images and calculate a mean set of features out of it. You then get a histogram of how many times each mean visual descriptor appears in an image. This yields an array of fixed length which describes the image. This is a brief description of bag of words; if you don't understand a bit of it, you can look for a detailed explanation on this blog, this one, or google for more posts describing the Bag of Visual Words.

Plus, I applied document and query weighting schemes when querying the images. You can find more about this in implementations of term frequency, document frequency and normalization. There is a good paper on why to use weighting schemes on a vector space (the bag of visual words, in our case).

Now we have reduced it to a statistical problem and can compare thousands, even millions, of images in a database. Here, each image is just a representation of a visual bag of words; the query image is again converted into a visual bag of words, and we fetch the similar ones for it.

Enough Words, Action

You must be thinking: enough gyan, where is the implementation/code for all of this? That's my plan for the next post. Stay tuned.


Love affair with CBIR – Part 2

Love affair with CBIR (Content-based image retrieval system) Part 2

Update:
I never figured out that this would become a series of articles; here are links to the rest of it.

Love affair with CBIR – Part 1

Love affair with CBIR – Part 2 (The one you are reading)

Love affair with CBIR – Part 3

Love affair with CBIR – Part 4

This is a continuation of the first post; if you are not getting the context, read the first post first.

De-duplication

In my last blog, I described the different solution angles for this: de-duplication and similarity. Let's start with de-duplication: identical images with different names and different formats, but the same resolution, same scale, no rotation, no affine transformation, etc.


How do you start? Do what the rest of the people do: google for a .NET library which already does this. .NET was my preferred framework and C# my first love, so I was much more interested in libraries written for .NET than in C++, Python or MATLAB ones.

The first idea to solve this problem was to do a pixel-by-pixel comparison of both images: if any of the pixels differ, I can get the percentage of differing pixels and from that the difference.

Code for this is described on the sites below:

http://blogs.msdn.com/b/domgreen/archive/2009/09/06/comparing-two-images-in-c.aspx

http://www.codeproject.com/Articles/9299/Comparing-Images-using-GDI

These methods, which compare pixel by pixel or compare byte hashes to determine the difference, can detect exact duplicates. But I wasn't in for these approaches, as they have two problems.

Problems

  1. They are slow for what I need. I need to compare 10,000s of images from a database, not just two, and it would take a while to compare them all.
  2. They aren't fault tolerant: the same image with a different JPEG compression level will result in a false mismatch. I wanted the matching to compensate for this; my need wasn't to match pixel for pixel, just visually similar.


Now, the next milestone was to look for a method/algorithm to address my problems.

Another idea was to convert the image into some kind of signature/fingerprint, and compare that signature with 10,000s of signatures to see if the image matches.


This would be faster, as it only needs to compare a few bytes instead of full images. But now the challenge was to identify slight differences between images from their signatures and classify them as same or different.

I wasn't the first guy to have this idea: after a little googling, I found out there is already an algorithm which does this, known as "perceptual hashing". There is a post on HackerFactor which describes the pHash approach.
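
To make the idea concrete, here is a minimal sketch of the simple "average hash" variant from that post (my own illustration written for this article, not Oftedal's library code): shrink to 8x8, grayscale, then set one bit per pixel depending on whether it is brighter than the mean.

    using System.Drawing;

    static ulong AverageHash(Bitmap source)
    {
        using (Bitmap small = new Bitmap(source, new Size(8, 8)))
        {
            byte[] gray = new byte[64];
            int total = 0;
            for (int y = 0; y < 8; y++)
                for (int x = 0; x < 8; x++)
                {
                    Color c = small.GetPixel(x, y);
                    // Cheap integer luminance approximation
                    gray[y * 8 + x] = (byte)((c.R * 30 + c.G * 59 + c.B * 11) / 100);
                    total += gray[y * 8 + x];
                }
            int mean = total / 64;

            ulong hash = 0;
            for (int i = 0; i < 64; i++)
                if (gray[i] > mean)        // one bit per pixel vs. the mean
                    hash |= 1UL << i;
            return hash;
        }
    }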

For the real thing, I needed a proper C# implementation. I found that a guy named 'David Oftedal' had posted a C# implementation at image hash. Sigh, this link doesn't work anymore. No problem: I have the implementation code in my ImageCompare application on GitHub, and you can get it from there.

I also got a reference to another library, 'ImageMagick.NET', which is a .NET wrapper around the C++ ImageMagick. It has various algorithms to compare images; here is the list:

  1. Absolute
  2. Fuzz
  3. MeanAbsolute
  4. MeanErrorPerPixel
  5. MeanSquared
  6. NormalizedCrossCorrelation
  7. PeakAbsolute
  8. PeakSignalToNoiseRatio
  9. PerceptualHash
  10. RootMeanSquared

You can compare two images with the following code:

    MagickImage orgImage = new MagickImage(orgImagePath);
    MagickImage dupImage = new MagickImage(dupImagePath);
    var percentage = orgImage.Compare(dupImage, imageAlgo);

See, at number nine there is a PerceptualHash algorithm, but I found that compared with David Oftedal's implementation, it doesn't give good results.

Searching further, I found another article, by 'Jakob XnaFan Krarup', on CodeProject. He compares images with an RGB histogram and gets the difference using the Bhattacharyya coefficient distance formula. You can read his article for more information about this method.

So far, I have described three implementations in this post:

  1. David Oftedal's implementation of pHash
  2. The Perceptual Hash implementation in the ImageMagick library, along with its other algorithms
  3. Jakob XnaFan Krarup's histogram comparison on CodeProject

I decided to play around with these algorithms to test the deep waters myself. For this, I built an image compare utility.


You can get the code for this from my GitHub account: https://github.com/sbrakl/ImageCompare

Image Set

For comparison, I have created sample images from Flickr:

  1. Original – a picture at 200px by 120px resolution
  2. Original2 – a copy of Original
  3. Resize – the picture at 400px by 240px resolution
  4. FormatType – the type changed from jpg to png, keeping the resolution the same

The rest of the images are self-explanatory.


This utility has a main form, as you can see in the image below.


To use it, select the first image and the second image, and click 'Image Compare'. You get the percentage difference in the textbox.

If you need a grid view of all the percentage differences, open the 'Data Compare' form, select an algorithm, and get the percentage differences across all the images.


You can download the code and play around with it to understand the details of each comparison algorithm.

David Oftedal's pHash Algorithm

David Oftedal's pHash gives you an unsigned long integer, which you can calculate and store in the database for all the images.

To compare all images to a new image, just calculate the pHash for the new image and compare it with all the pHashes in the table; that's it. If the hash matches, it is exactly the same image. A simple and super-fast method.


If you need a similarity distance function: when you pass the two hashes, you get the difference as a percentage, where 100% means identical and 0 means completely different.

    public static double Similarity(ulong hash1, ulong hash2)
    {
        return ((64 - BitCount(hash1 ^ hash2)) * 100) / 64.0;
    }
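
BitCount here is a population count over the XOR of the two hashes, which yields the Hamming distance between them. A typical implementation looks like this (an assumption on my part; Oftedal's exact helper may differ):

    // Counts the set bits (popcount); applied to hash1 ^ hash2 this
    // gives the Hamming distance between the two 64-bit hashes.
    private static int BitCount(ulong value)
    {
        int count = 0;
        for (; value != 0; value >>= 1)
            count += (int)(value & 1);
        return count;
    }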

Jakob XnaFan Krarup Implementation

Jakob XnaFan Krarup uses an RGB histogram approach for comparison. It calculates the histogram of grayscale hue values and compares them.


You can store the normalized histogram either in a SQL table or a NoSQL database, and compare using the Bhattacharyya coefficient distance formula to get the percentage difference.
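
The coefficient itself is a one-liner over the two normalized histograms; a sketch (1 means identical distributions, 0 means no overlap):

    using System;

    // Bhattacharyya coefficient between two normalized histograms
    // (each summing to 1). Higher = more similar.
    static double BhattacharyyaCoefficient(double[] h1, double[] h2)
    {
        double sum = 0;
        for (int i = 0; i < h1.Length; i++)
            sum += Math.Sqrt(h1[i] * h2[i]);
        return sum;
    }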

Using either of the two algorithms, I have addressed the problems stated above. But still, if you need to detect similar images with invariance to different lighting, rotation, affine transformation, scale, etc., these algorithms won't suit. Even for a simple case, like a user uploading a part of a whole image that is already present, these algorithms would consider it a different image.

For visually similar matching, you need some point-to-point matching algorithm like the one below. Hang on for my third post, which will unleash the power of point-to-point comparison.



Love affair with CBIR – Part 1

Love affair with CBIR (Content-based image retrieval system) Part 1

Update:
I never figured out that this would become a series of articles; here are links to the rest of it.

Love affair with CBIR – Part 1 (The one you are reading)

Love affair with CBIR – Part 2

Love affair with CBIR – Part 3

Love affair with CBIR – Part 4

Recently, I was architecting a project for an American client, and they had one simple requirement: "Ability to detect duplicate images from a set of image files during upload".

I thought, come on, this is simple. When the user uploads an image to the website, just check whether the image already exists; that's all. What's the big deal? If this were a string, it would be:

    SELECT * FROM TABLEA WHERE Field1 = 'String'

I needed to do the same thing with an image. And here I was stumped! In MS SQL, I can't compare images: an image stored in a table would be a blob object, and I can't just compare the byte sizes of two blob objects, as there is the possibility that two different images have the same byte size. See the example below.


Now the challenges start mounting: what if there are two identical images, but of different file types? Say one is jpg, the other png.


There are other challenges too: invariance.

Different types of invariance

  1. Illumination
  2. Scale
  3. Rotation
  4. Affine
  5. Full perspective

So now things started to get complicated: there is no simple way to compare two images as you do strings, integers or any primitive type. This was the start of the love affair. It's mystical, esoteric and magical; you don't know 'HOW' to get it, you just follow along with the 'WHAT'.

For the solution, there are two angles to it, de-duplication and similarity. With this, there are two new terms:

  1. De-duplication: identical images, multiple copies of the same photo with different names and different formats. This is simple, and everybody understands it.
  2. Similar image: this is the important one, and it defines the game. What do you call similar? Is the same color similar for you? The same shape? The same texture? Or a mixed bag of all? For my love affair, it's the invariances shown above: images which visually appear to be the same, but with different exposure plus slight differences in rotation, angle and scale.

In the next post, we will see how to address each of them.