I've been playing around with VLFeat, specifically its SIFT implementation, to compare images using SIFT feature extraction. A while back I was looking at comparing files and images using sdhash and ssdeep, and they did not work well with images (which completely makes sense!).

So I looked at some computer vision implementations and found the book Programming Computer Vision with Python. Using a basic example from the book, we can now visually compare similarity on the kitty corpus used last time.

5a762d8cdf4f1beae208595e79990a01 /corpus/kitty_hex.jpg
1704cd46c5c0f994278769e533015525 /corpus/kitty_sm.jpg
bcbed42be68cd81b4d903d487d19d790 /corpus/kitty_text.jpg
6d5663de34cd53e900d486a2c3b811fd /corpus/kitty_orig.jpg
4312932e8b91b301c5f33872e0b9ad98 /corpus/kitty_whirl.jpg

comparing corpus/kitty_text.jpg corpus/kitty_sm.jpg number of matches = 107
comparing corpus/kitty_text.jpg corpus/kitty_orig.jpg number of matches = 375
comparing corpus/kitty_text.jpg corpus/kitty_hex.jpg number of matches = 375
comparing corpus/kitty_text.jpg corpus/kitty_whirl.jpg number of matches = 358
comparing corpus/kitty_sm.jpg corpus/kitty_orig.jpg number of matches = 108
comparing corpus/kitty_sm.jpg corpus/kitty_hex.jpg number of matches = 108
comparing corpus/kitty_sm.jpg corpus/kitty_whirl.jpg number of matches = 88
comparing corpus/kitty_orig.jpg corpus/kitty_hex.jpg number of matches = 389
comparing corpus/kitty_orig.jpg corpus/kitty_whirl.jpg number of matches = 343
comparing corpus/kitty_hex.jpg corpus/kitty_whirl.jpg number of matches = 343
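
The ten lines above are every unordered pair of the five images. A minimal sketch of that loop follows; `compare_all` and its `count_matches` argument are hypothetical names used for illustration, not functions from VLFeat or the book, and the descriptors are assumed to be already extracted.

```python
from itertools import combinations


def compare_all(features, count_matches):
    """Compare every unordered pair of images.

    features: dict mapping an image path to its descriptors (assumed
              to be extracted already, e.g. by VLFeat's SIFT).
    count_matches: callable taking two descriptor sets and returning
                   the number of matches (hypothetical helper).
    """
    results = {}
    # combinations() yields each pair once, in insertion order of the dict.
    for path_a, path_b in combinations(features, 2):
        n = count_matches(features[path_a], features[path_b])
        results[(path_a, path_b)] = n
        print(f"comparing {path_a} {path_b} number of matches = {n}")
    return results
```

For five images this produces 5 * 4 / 2 = 10 comparisons, which is the number of lines in the listing.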

Just by extracting SIFT features and counting which features match, we can do pretty well at identifying similar images. As a reference, here is an unrelated image compared to a kitty image:

comparing corpus/kitty_text.jpg corpus/cheese.jpg number of matches = 0
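
For readers wondering how a "number of matches" like this can be computed, here is a minimal sketch of one common approach: a nearest-neighbour ratio test applied in both directions. It is not necessarily the code used for the numbers in this post. The function names, the use of Euclidean distance and the 0.8 ratio are all assumptions, and the descriptors are assumed to be NumPy arrays with one row per feature.

```python
import numpy as np


def match_ratio(desc1, desc2, ratio=0.8):
    """For each row of desc1, return the index of its match in desc2, or -1.

    A match is accepted only if the nearest neighbour is clearly closer
    than the second nearest (ratio test). ratio=0.8 is an assumed value.
    """
    desc1 = np.asarray(desc1, dtype=float)
    desc2 = np.asarray(desc2, dtype=float)
    matches = np.full(len(desc1), -1, dtype=int)
    if len(desc1) == 0 or len(desc2) < 2:
        return matches  # the ratio test needs at least two candidates
    # Pairwise Euclidean distances via |a|^2 + |b|^2 - 2 a.b
    sq = ((desc1 ** 2).sum(axis=1)[:, None]
          + (desc2 ** 2).sum(axis=1)[None, :]
          - 2.0 * desc1 @ desc2.T)
    dists = np.sqrt(np.maximum(sq, 0.0))
    order = np.argsort(dists, axis=1)
    rows = np.arange(len(desc1))
    best = dists[rows, order[:, 0]]
    second = dists[rows, order[:, 1]]
    accept = best < ratio * second
    matches[accept] = order[accept, 0]
    return matches


def count_two_sided_matches(desc1, desc2, ratio=0.8):
    """Count matches that agree in both directions (1 -> 2 and 2 -> 1)."""
    m12 = match_ratio(desc1, desc2, ratio)
    m21 = match_ratio(desc2, desc1, ratio)
    count = 0
    for i, j in enumerate(m12):
        if j >= 0 and m21[j] == i:
            count += 1
    return count
```

Requiring agreement in both directions discards many accidental matches, which is one plausible reason an unrelated pair can score zero.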

It is interesting to note that even when the image is modified (the swirled face in kitty_whirl.jpg, for example), similarity to the original image is still relatively high. The lowest match counts came from the reduced-size image, probably because fewer features are extracted from a smaller image. Note that in this experiment the only image pre-processing is conversion to grayscale. We are not resizing, doing PCA or anything like that.
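
To illustrate the one pre-processing step mentioned here, this is a minimal sketch of a grayscale conversion with NumPy. It assumes the image is already loaded as an H x W x 3 RGB array of 8-bit values, and it uses the ITU-R 601 luma weights, the same weighting Pillow's `Image.convert("L")` is documented to use. The function name `to_grayscale` is hypothetical, and this may differ from what the original script did.

```python
import numpy as np

# ITU-R 601 luma weights for R, G and B.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def to_grayscale(rgb):
    """Convert an H x W x 3 (or x 4) uint8 RGB array to an H x W uint8 array."""
    rgb = np.asarray(rgb, dtype=float)
    # Weighted sum over the colour axis; any alpha channel is ignored.
    gray = rgb[..., :3] @ LUMA_WEIGHTS
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)
```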

I also want to point out something:
comparing corpus/kitty_text.jpg corpus/cheese_text.jpg number of matches = 2

In this case, two different images each have the same small bit of text inserted. Feature detection was able to find some similarity (the text looked the same) between otherwise completely different images. This could potentially be used to determine whether a string or watermark was added to a group of pictures in a directory.