md5sum vs. md5deep vs. openssl md5: MD5 calculation speed test

The following experiment is conducted to determine whether md5sum, md5deep, or openssl md5 calculates MD5 hashes faster than the others.

Methodology:
<div>Test 1: A directory of test files consisting of disk images and extracted Windows Registry files will be scanned with the command ‘find -type f | xargs -n 1 md5program’, where md5program is each of the three programs in turn. xargs feeds each line of find's output (one file name) to a separate run of the md5 program. The time for the entire process to finish will be tracked with the time command.</div><div>Test 1.1: After all programs have finished, the computer will be restarted and the order in which the programs run will be changed. For example, if md5sum was the first program in the first round, it will be the second program in the second round. This is an attempt to account for any caching by the operating system.</div><div>
</div><div>The result will be 9 timed runs (3 programs × 3 rounds), each calculating the MD5 sum of every file in the directory.</div><div>
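A minimal bash sketch of Tests 1 and 1.1, assuming GNU find and xargs. The function names (round_order, time_program, run_round) are hypothetical and are not the commands used to produce the results below. Unlike the original command, it uses find -print0 with xargs -0 so that file names containing spaces are passed intact. The rotation moves each program one place later in every round, matching the md5sum example above.

```bash
#!/usr/bin/env bash
# Sketch of Test 1 / Test 1.1: time each program hashing every file in a
# directory (one process per file via xargs -n 1), rotating the order of
# the programs from one round to the next.

PROGRAMS=(md5sum md5deep openssl)

# Print the program order for round N (1-based). Each round moves every
# program one place later, wrapping around: round 2 starts with the
# program that was last in round 1.
round_order() {
  local round=$1 n=${#PROGRAMS[@]} i j
  local -a order=()
  for (( i = 0; i < n; i++ )); do
    j=$(( ((i - round + 1) % n + n) % n ))   # keep the index non-negative
    order+=("${PROGRAMS[j]}")
  done
  echo "${order[*]}"
}

# Time one program over every regular file under DIR.
# -print0 / -0 keep file names that contain spaces intact.
time_program() {
  local prog=$1 dir=$2
  local -a cmd
  case $prog in
    openssl) cmd=(openssl md5) ;;
    *)       cmd=("$prog") ;;
  esac
  echo "== $prog" >&2
  time (find "$dir" -type f -print0 | xargs -0 -n 1 "${cmd[@]}" > /dev/null)
}

# Run one round (all three programs) in the rotated order.
run_round() {
  local round=$1 dir=$2 prog
  for prog in $(round_order "$round"); do
    time_program "$prog" "$dir"
  done
}
```

Usage: run `run_round 1 /path/to/testfiles 2> round1.log`, reboot, then repeat with rounds 2 and 3. The program labels and the real/user/sys reports go to stderr, so each log holds the timings for one round.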
</div><div>Test 2: Big files. In Test 1, many files of different sizes are fed to the hashing program one after another. In Test 2, a single 10 GB disk image will be given to each program and timed.</div><div>
</div><div>Test machine:</div><div>Intel(R) Core(TM) i7-3537U CPU @ 2.00GHz (2 cores, 4 threads)</div><div>8 GB RAM</div><div>Timing cached reads (hdparm -T): 8892.40 MB/sec</div><div><div>Timing buffered disk reads (hdparm -t): 403.98 MB/sec</div></div><div>
</div><div>Results:</div><div><table border="1" style="text-align: center;"><tbody><tr><td>md5sum, run 1</td><td>md5sum, run 2</td><td>md5sum, run 3</td></tr><tr><td>real 3m46.457s</td><td>real 3m46.117s</td><td>real 3m59.198s</td></tr><tr><td>user 0m42.183s</td><td>user 0m39.254s</td><td>user 0m42.235s</td></tr><tr><td>sys 3m3.595s</td><td>sys 3m6.300s</td><td>sys 3m16.096s</td></tr><tr><td>md5deep, run 1</td><td>md5deep, run 2</td><td>md5deep, run 3</td></tr><tr><td>real 3m57.595s</td><td>real 3m57.666s</td><td>real 4m2.255s</td></tr><tr><td>user 0m43.743s</td><td>user 0m43.567s</td><td>user 0m44.551s</td></tr><tr><td>sys 3m17.552s</td><td>sys 3m17.192s</td><td>sys 3m22.881s</td></tr><tr><td>openssl md5, run 1</td><td>openssl md5, run 2</td><td>openssl md5, run 3</td></tr><tr><td>real 3m48.130s</td><td>real 3m43.142s</td><td>real 3m50.619s</td></tr><tr><td>user 0m37.878s</td><td>user 0m37.462s</td><td>user 0m38.558s</td></tr><tr><td>sys 3m9.436s</td><td>sys 3m5.112s</td><td>sys 3m11.136s</td></tr></tbody></table>
Overall processor time (user + sys):
Run 1: md5sum 3m45.78s; md5deep 4m01.30s; openssl 3m47.31s.
Run 2: md5sum 3m45.55s; md5deep 4m00.76s; openssl 3m42.57s.
Run 3: md5sum 3m58.33s; md5deep 4m07.43s; openssl 3m49.69s.

Average (rounded to the nearest second):
md5sum: 3m50s
md5deep: 4m03s
openssl: 3m47s
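As a worked example for openssl, using the user and sys columns of the table above (3m9.436s = 189.436 s, and so on), the overall processor time T of each run and the three-run average are:

```latex
\begin{align*}
T   &= t_{\text{user}} + t_{\text{sys}} \\
T_1 &= 37.878 + 189.436 = 227.314\ \text{s} \\
T_2 &= 37.462 + 185.112 = 222.574\ \text{s} \\
T_3 &= 38.558 + 191.136 = 229.694\ \text{s} \\
\bar{T} &= \frac{T_1 + T_2 + T_3}{3} = \frac{679.582}{3} \approx 226.53\ \text{s} \approx 3\ \text{min}\ 46.53\ \text{s}
\end{align*}
```

Rounded to the nearest second, 226.53 s is 3m47s. The same procedure gives about 229.89 s for md5sum and 243.16 s for md5deep.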

Test 1 Conclusions
While the ‘real’ (wall-clock) time may be of interest to investigators, we are more interested in the time the program spent using the processor. ‘user’ is the processor time spent in user mode, and ‘sys’ is the processor time spent in kernel mode (for example, in the system calls that read the files); their sum is the overall processor time listed above. The basic idea is that we want the program that spends the least time on the processor to do the same amount of work.
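To get the overall processor time directly instead of adding the columns by hand, bash's built-in time keyword can be told to print only user and sys through its TIMEFORMAT variable. A minimal sketch; the function name cpu_seconds is hypothetical, and it assumes bash and a locale whose decimal point is '.'.

```bash
#!/usr/bin/env bash
# Print the processor time (user + sys, in seconds) used by a command,
# using bash's built-in `time` keyword and its TIMEFORMAT variable.
# Assumes a locale whose decimal point is '.'.
cpu_seconds() {
  local TIMEFORMAT='%3U %3S'   # user and sys seconds, 3 decimal places
  local report
  # The group braces capture the report `time` writes to stderr;
  # the command's own output is thrown away.
  report=$( { time "$@" > /dev/null 2>&1; } 2>&1 )
  awk '{ printf "%.3f\n", $1 + $2 }' <<< "$report"
}
```

For example, `cpu_seconds md5sum disk.img` (disk.img being any test file) reports user + sys in one number, while `cpu_seconds sleep 2` prints a value close to zero: sleeping takes real time but almost no processor time, which is exactly the difference between the ‘real’ column and user + sys.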

Note: all three programs recorded their highest times in run 3, which suggests that another process, perhaps an OS update, was running at the time and affected all of them.

Based on the average of the three runs, it appears that openssl is slightly faster, followed by md5sum and then md5deep.

It should be noted, however, that md5deep is not really being used the way it was designed. For example, md5sum has no recursive mode, whereas md5deep does (-r). If we run md5deep in its recursive mode instead of through find and xargs, its time is actually slightly better than the others:

md5deep recursive (restricted to 1 thread):
real 3m43.963s
user 0m41.327s
sys 3m2.103s

Overall processor time (user + sys): 3m43s
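The two ways of running md5deep can be sketched as two hypothetical bash helpers. hash_per_file mirrors the find | xargs -n 1 pipeline, starting one process per file, so process start-up is paid for every file, which may account for part of the difference. hash_recursive_md5deep lets a single md5deep process walk the tree with -r. The -j1 flag is an assumption: md5deep 4.x uses -j to set the number of hashing threads, and the post does not say how the 1-thread limit was applied.

```bash
#!/usr/bin/env bash
# One process per file, as in Test 1: find lists the files and xargs
# runs the given hashing command once for each of them.
# Usage: hash_per_file DIR COMMAND [ARGS...]
hash_per_file() {
  local dir=$1
  shift
  find "$dir" -type f -print0 | xargs -0 -n 1 "$@"
}

# One md5deep process for the whole tree, using its recursive mode.
# -j1 is assumed to limit md5deep 4.x to a single hashing thread.
hash_recursive_md5deep() {
  md5deep -r -j1 "$1"
}
```

To compare them: `time hash_per_file testdir md5deep > /dev/null` versus `time hash_recursive_md5deep testdir > /dev/null`.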

In conclusion, if you are feeding a list of files into an MD5 hashing program, openssl appears to be a slightly better choice than md5sum and md5deep. However, if you can choose how the file list is ingested, md5deep is probably the better choice because of its speed and available features.

Test 2: The time for each program to hash a single 10 GB disk image. This test does not use find and xargs; each program is run directly on the image. md5deep is restricted to 1 thread.
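Test 2 can be sketched the same way, timing each program directly on one image file. The function name time_single_file is hypothetical, programs not found on the PATH are skipped, and md5deep's -j1 is again an assumption about how the single-thread restriction was applied.

```bash
#!/usr/bin/env bash
# Sketch of Test 2: time each program hashing one large file directly,
# without find or xargs. Programs that are not installed are skipped.
time_single_file() {
  local img=$1 prog
  local -a cmd
  for prog in md5sum md5deep openssl; do
    if ! command -v "$prog" > /dev/null; then
      echo "skipping $prog (not installed)" >&2
      continue
    fi
    case $prog in
      md5deep) cmd=(md5deep -j1) ;;   # assumed: one hashing thread (md5deep 4.x)
      openssl) cmd=(openssl md5) ;;
      *)       cmd=("$prog") ;;
    esac
    echo "== $prog" >&2
    time "${cmd[@]}" "$img"
  done
}
```

Usage: `time_single_file /path/to/image.dd`. The digests go to stdout and the real/user/sys reports to stderr.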

<table border="1" style="text-align: center;"><tbody><tr><td>md5sum</td><td>md5deep</td><td>openssl md5</td></tr><tr><td>real 1m44.558s</td><td>real 1m49.491s</td><td>real 1m43.475s</td></tr><tr><td>user 0m19.369s</td><td>user 0m20.421s</td><td>user 0m18.233s</td></tr><tr><td>sys 1m24.877s</td><td>sys 1m30.534s</td><td>sys 1m24.889s</td></tr></tbody></table>
Overall processor time (user + sys), rounded to the nearest second:
md5sum: 1m44s
md5deep: 1m51s
openssl: 1m43s

For a single large file, it appears that openssl is also the fastest, followed by md5sum and then md5deep.

Please note: all of these tests are, at best, quite shallow. Proper testing in a controlled environment, with many more runs, is needed before drawing firm conclusions.
<hr />
Bonus observation: With -n 1, xargs sends one line (one file name) to the hashing program and spawns one process per file. Each process seems to switch between processors, even while the same hash is being calculated.

</div>


[How to] Install pHash on Ubuntu

pHash is an open source software library released under the GPLv3 license that implements several perceptual hashing algorithms, and provides a C-like API to use those functions in your own programs. pHash itself is written in C++.

From pHash.org: A perceptual hash is a fingerprint of a multimedia file derived from various features from its content. Unlike cryptographic hash functions which rely on the avalanche effect of small changes in input leading to drastic changes in the output, perceptual hashes are “close” to one another if the features are similar.
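The "closeness" of two perceptual hashes is usually measured as a Hamming distance, the number of bits in which they differ; pHash's C API provides ph_hamming_distance for comparing its 64-bit DCT image hashes. As a rough illustration only (this is not pHash code), here is a bash function that computes the Hamming distance between two hashes written as equal-length hex strings. The name hamming_hex is hypothetical.

```bash
#!/usr/bin/env bash
# Hamming distance between two equal-length hex strings: the number of
# bit positions in which they differ.
hamming_hex() {
  local a=$1 b=$2 dist=0 i x
  if (( ${#a} != ${#b} )); then
    echo "hamming_hex: hashes must have the same length" >&2
    return 1
  fi
  for (( i = 0; i < ${#a}; i++ )); do
    # XOR one hex digit (4 bits) from each hash...
    x=$(( 16#${a:i:1} ^ 16#${b:i:1} ))
    # ...and count the bits that are set in the result.
    while (( x > 0 )); do
      dist=$(( dist + (x & 1) ))
      x=$(( x >> 1 ))
    done
  done
  echo "$dist"
}
```

For example, `hamming_hex ffffffffffffffff 0000000000000000` prints 64 and `hamming_hex a5a5 a5a4` prints 1. A smaller distance means more similar content; what counts as "similar enough" depends on the application.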

Ubuntu has libphash0 and libphash0-dev packages, but in this tutorial we will install from source.

Installing pHash on Ubuntu from source: this tutorial assumes you already have build tools, such as build-essential, installed.

Install the required packages (as root, for example with sudo): apt-get install libavformat-dev libmpg123-dev libsamplerate-dev libsndfile-dev cimg-dev libavcodec-dev ffmpeg libswscale-dev

Next, download the pHash source from: http://phash.org/download/
Version at the time of this writing: 0.9.6

Extract the tarball and change into the source directory: tar -xvf pHash-0.9.6.tar.gz && cd pHash-0.9.6
Standard configure and make (installing needs root): ./configure && make && sudo make install
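The steps above can be collected into one bash function, as a sketch. The function name install_phash and its version argument are hypothetical; it assumes the tarball has already been downloaded into the current directory and unpacks into a pHash-0.9.6 directory. The final sudo ldconfig is an addition to the original steps: it refreshes the shared-library cache so that programs can find the newly installed library.

```bash
#!/usr/bin/env bash
# Build and install pHash from source. The body runs in a subshell with
# errexit, so a failing step stops the build without affecting the
# calling shell. Needs sudo and network access for apt-get.
install_phash() (
  set -euo pipefail
  version=${1:-0.9.6}
  # Build dependencies listed above, plus build-essential.
  sudo apt-get install -y build-essential libavformat-dev libmpg123-dev \
    libsamplerate-dev libsndfile-dev cimg-dev libavcodec-dev ffmpeg libswscale-dev
  # Assumes pHash-$version.tar.gz was downloaded from
  # http://phash.org/download/ into the current directory.
  tar -xvf "pHash-$version.tar.gz"
  cd "pHash-$version"
  ./configure
  make
  sudo make install
  # Refresh the shared-library cache (the default prefix installs the
  # library under /usr/local/lib).
  sudo ldconfig
)
```

Usage: `install_phash` (or `install_phash 0.9.6`) from the directory containing the tarball.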

Remember, this installs only the library. For PHP bindings, try this: http://useranswer.com/answer/compile-phash-on-centos-php-extension/


[Webinar] Malware Triage and Analysis with AccessData

When: Tuesday, May 28, 2013 10:00 AM - 12:00 PM PDT
Speaker: Chris Sanft
Registration Form

Attend this session to learn about the industry’s first automated malware triage and analysis module, Cerberus. This is the first solution of its kind that allows you to detect and triage malicious code without using signatures or running it in a sandbox. The first step towards automated reverse engineering, Cerberus provides threat scores and disassembly analysis to determine both the behavior and intent of suspect binaries. During this demonstration, you will learn how Cerberus works and how to use it to gain immediate actionable intelligence without waiting for days or even weeks for the results of traditional analysis.


4th Annual The Sleuth Kit and Open Source Digital Forensics Conference

When: 4 Nov. 2013

Where: Chantilly, Virginia, U.S.A.


<blockquote>The 4th Annual Open Source Digital Forensics Conference will be held on November 5, 2013 in Chantilly, VA. We’ll be sending out updates via e-mail and Twitter at #osdfcon.
The conference will be attended by digital forensic investigators and developers. This conference sold out quickly last year, and registration closed early due to fire code occupancy limits. This event is a unique opportunity to make investigators aware of your tools, get feedback from users, meet fellow developers, and help direct the future of open source digital forensics software.</blockquote>
Unfortunately, I missed the call for papers, but this conference is very interesting and a lot of fun. Highly recommended!


The 7th International Symposium on Digital Forensics and Information Security (DFIS-13)

When: September 4-6, 2013
Where: Gwangju, South Korea
More information can be found here

<blockquote class="tr_bq">Digital Forensics and Information Security (DFIS) are key elements for the next generation communication and networking environments where all applications and services are focused on users. In addition, the DFIS has emerged rapidly an exciting new paradigm to provide reliable and comfortable life services. Furthermore, the benefits of DFIS will only be realized if security issues can be appropriately addressed. Specially, forensics for DFIS is very important in the security fields.
This workshop is intended to foster state-of-the-art research Digital Forensics and Information Security in the area of DFIS including information and communication technologies, law, social sciences and business administration. The DFIS-13 is the next event in a series of highly successful the 6th International Symposium on Digital Forensics and Information Security (DFIS-12, Canada, 2012), which was the next extended event of the International Workshop on Forensics for Future Generation Communication environments : F2GC-11 (Loutraki, Greece), F2GC-10 (Cebu, Philippines), F2GC-09 (Jeju , Korea), F2GC-08 (Sanya, China), F2GC-09 (Jeju , Korea). Papers presented in the DFIS-13 will be published by Springer Lecture Notes in Electrical Engineering (Indexed by EI & SCOPUS).</blockquote>


Global Essay Competition for Seoul Conference on Cyberspace 2013

From the Korean Ministry of Foreign Affairs:
1,500 to 2,000 word essay
Deadline: 12 July, 2013
Notification: 'end of July'


Global Essay Competition for Seoul Conference on Cyberspace 2013 is a special competition for youth aged 18 to 32 from across the world, hosted by Korea’s Ministry of Foreign Affairs (MOFA), the Korea Information Society Development Institute (KISDI), and the Center for International Studies of Yonsei University. The event forms a crucial part of the Seoul Conference on Cyberspace 2013 (SeoulCyber 2013); the awards ceremony will be held at the SeoulCyber 2013 Youth Forum, scheduled for 2 September 2013 at Yonsei University, Korea. The aims of the competition are twofold: first, to encourage ambitious and talented students all over the world to present their own ideas on a set of ongoing cyberspace issues; and second, to build their capacity to cope with a wide variety of (inter)national cyber threats.

Candidates are invited to submit an essay of between 1,500 and 2,000 words (excluding footnotes and references) on one of the three questions below.


<ol><li>Internet as an engine for economic growth and social/cultural benefit</li><li>Securing yourself from cyber threats</li><li>Towards a shared prosperity (capacity building)</li></ol>
Submissions must be received by the deadline of 12 July (email entry to [email protected]). Those intending to apply should also complete an application form which is to be included with the submission. Results will be announced by the end of July.
