After the last sdhash test showed that fuzzy hashing multiple sizes of the same picture file did not appear to work well, I decided to try images of the same size with slight modifications like one might see in the real world. So, again, there is an original image, the same image with text added, and the same image with a swirl pattern on the face.
Kitty Orig: 75K, MD5 6d5663de34cd53e900d486a2c3b811fd
Kitty Text: 82K, MD5 bcbed42be68cd81b4d903d487d19d790
Kitty Whirl: 92K, MD5 4312932e8b91b301c5f33872e0b9ad98
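For anyone who wants to reproduce the MD5 values listed for the three images, here is a minimal Python sketch using the standard hashlib module. The function name md5_of_file and the chunk size are my own choices for illustration, not part of any tool used in this test.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so large images do not have to fit in memory.
    `md5_of_file` and `chunk_size` are illustrative names, not from sdhash.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    # Print one "<name> <md5>" line per file given on the command line.
    for name in sys.argv[1:]:
        print(name, md5_of_file(name))
```

Run against the three kitty files, this should print three different digests, which is expected: a standard hash like MD5 changes completely with any modification to the file.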
On this test, I hypothesized that there would be a high match between the original kitty and the text kitty, and a low match, if any, between the original kitty and the whirl kitty. My reasoning was that the features of the data would be similar enough, excluding the text area.
Unfortunately, I was wrong. sdhash did not find any similarity between the pictures (and neither did ssdeep).
$ sdhash -g *
kitty_orig.jpeg|kitty_text.jpeg|000
kitty_orig.jpeg|kitty_whirl.jpeg|000
kitty_text.jpeg|kitty_whirl.jpeg|000
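So, neither sdhash nor ssdeep detected any similarity between the picture files. One likely reason is that JPEG data is compressed, so a small visual change can alter much of the byte stream that these tools compare. Perhaps these tools are not suitable for picture file analysis, or as a replacement for standard hashes like MD5, when looking for similar pictures.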
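If you want to repeat this comparison over a larger set of pictures, the sdhash output is easy to script. The sketch below assumes the output format shown above (file_a|file_b|score, one pair per line) and that sdhash is on your PATH and accepts -g followed by file names, as in the command I ran. The function names parse_sdhash_scores and compare_all are hypothetical helpers of my own, not part of sdhash.

```python
import subprocess


def parse_sdhash_scores(output):
    """Parse 'file_a|file_b|score' lines into (file_a, file_b, score) tuples.

    Lines that do not have exactly three '|'-separated fields, or whose
    score is not an integer, are skipped.
    """
    results = []
    for line in output.splitlines():
        parts = line.strip().split("|")
        if len(parts) != 3:
            continue
        file_a, file_b, score = parts
        try:
            results.append((file_a, file_b, int(score)))
        except ValueError:
            continue
    return results


def compare_all(paths):
    """Run 'sdhash -g' on the given files and return the parsed scores.

    Assumption: sdhash is installed and on PATH, and prints pairs in the
    'file_a|file_b|score' format shown in this post.
    """
    completed = subprocess.run(
        ["sdhash", "-g", *paths],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sdhash_scores(completed.stdout)


if __name__ == "__main__":
    # The output from this test, pasted in so the parser can be tried
    # without sdhash installed.
    sample = (
        "kitty_orig.jpeg|kitty_text.jpeg|000\n"
        "kitty_orig.jpeg|kitty_whirl.jpeg|000\n"
        "kitty_text.jpeg|kitty_whirl.jpeg|000\n"
    )
    for file_a, file_b, score in parse_sdhash_scores(sample):
        print(file_a, file_b, score)
```

Keeping the parsing separate from the subprocess call means the parser can be checked against saved output, and the scores can then be filtered or sorted however you like.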