The following experiment was conducted to determine which of md5sum, md5deep and openssl md5 calculates MD5 hashes the fastest.

Methodology:
Test 1: A directory of test files consisting of disk images and extracted Windows Registry files will be scanned with the command 'find -type f | xargs -n 1 md5program', where md5program is replaced by each program in turn. xargs is used to pass each line of find's output (one file path) to a separate invocation of the md5 program. The time for the entire pipeline to finish will be measured with the time command.
Test 1.1: After all programs have finished, the computer will be restarted, and the order in which the programs are run will be changed. For example, if md5sum was the first program in the first round, it will be the second program in the second round. This is an attempt to account for any caching that takes place in the operating system.

The result will be 9 timed runs (3 programs, 3 rounds each) for calculating the md5 sum of all files in a directory.
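
The Test 1 procedure can be sketched as a small shell script. This is only an illustration, not the script used for the timings reported here: TEST_DIR, hash_tree and run_round are hypothetical names, and the sketch uses find -print0 with xargs -0 (a variation on the pipeline described in Test 1) so that file names containing spaces are handled safely. It assumes bash, where time is a shell keyword that can time a function.

```shell
#!/usr/bin/env bash
# Hypothetical location of the test files; adjust as needed.
TEST_DIR="${TEST_DIR:-./testfiles}"

# Run the given command once per regular file under a directory,
# mirroring 'find -type f | xargs -n 1 md5program'.
hash_tree() {
    local dir="$1"
    shift
    find "$dir" -type f -print0 | xargs -0 -n 1 "$@"
}

# Time each hashing program over the same directory.
run_round() {
    for prog in "md5sum" "md5deep" "openssl md5"; do
        # Skip programs that are not installed ("openssl md5" -> "openssl").
        command -v "${prog%% *}" >/dev/null 2>&1 || continue
        echo "== $prog =="
        # $prog is unquoted on purpose so that "openssl md5" becomes two words.
        time hash_tree "$TEST_DIR" $prog >/dev/null
    done
}
```

Calling run_round once per round, with a restart and a reordered program list in between, would approximate the three rounds described in Test 1.1.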

Test 2: Big files - In test 1, multiple files of different sizes will be continuously fed to the hashing program. In test 2, a single 10 GB disk image will be given to each program and timed.

Test machine:
Intel(R) Core(TM) i7-3537U CPU @ 2.00GHz (4 cores)
8 GB RAM
Timing cached reads: 8892.40 MB/sec
Timing buffered disk reads: 403.98 MB/sec

Results:
md5sum        run 1        run 2        run 3
real          3m46.457s    3m46.117s    3m59.198s
user          0m42.183s    0m39.254s    0m42.235s
sys           3m3.595s     3m6.300s     3m16.096s

md5deep       run 1        run 2        run 3
real          3m57.595s    3m57.666s    4m2.255s
user          0m43.743s    0m43.567s    0m44.551s
sys           3m17.552s    3m17.192s    3m22.881s

openssl md5   run 1        run 2        run 3
real          3m48.130s    3m43.142s    3m50.619s
user          0m37.878s    0m37.462s    0m38.558s
sys           3m9.436s     3m5.112s     3m11.136s

Overall processor time (user + sys):
Run 1: md5sum 3m45.778s; md5deep 4m01.295s; openssl 3m47.314s.
Run 2: md5sum 3m45.554s; md5deep 4m00.759s; openssl 3m42.574s.
Run 3: md5sum 3m58.331s; md5deep 4m07.432s; openssl 3m49.694s.

Average (rounded to second):
md5sum: 3m50s
md5deep: 4m03s
openssl: 3m47s

Test 1 Conclusions
While the 'real' (wall-clock) time may be an interesting factor for investigators, we are more interested in the time that the program spent using the processor. Here 'user' is the time the program spent in user mode, and 'sys' is the time spent in kernel mode; the processor time reported above is their sum. The basic idea is that we want the program that takes the least processor time to do the same amount of work.
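
The user + sys sum can be computed from the output of time with a short helper. This is a minimal sketch: to_seconds and cpu_seconds are hypothetical names, and it assumes the XmY.YYYs format shown in the results.

```shell
# Convert a 'time' field such as 3m3.595s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Processor time = user + sys, both given in the XmY.YYYs format.
cpu_seconds() {
    awk -v u="$(to_seconds "$1")" -v s="$(to_seconds "$2")" \
        'BEGIN { printf "%.3f\n", u + s }'
}

# md5sum, run 1: user 0m42.183s, sys 3m3.595s
cpu_seconds 0m42.183s 3m3.595s   # prints 225.778
```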

Note: all three programs were slower in run 3, which suggests that another process, perhaps an OS update, was running at the time and affected all of them.

Based on the average of the three runs, it appears that openssl is slightly faster, followed by md5sum and then md5deep.

It should be noted, however, that md5deep is not really being used in the way it was designed. For example, md5sum does not include a recursive mode, whereas md5deep does. If we run md5deep using its recursive mode, instead of find and xargs, then its time is actually slightly better than the others:

md5deep recursive - restricted to 1 thread
real 3m43.963s
user 0m41.327s
sys 3m2.103s

md5deep: 3m43s
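
For reference, the recursive run might be invoked as sketched below. The exact command is not given here, so the flags are an assumption: -r enables recursive mode, and -j sets the number of hashing threads in md5deep 4.x (-j0 disables multi-threading altogether). TEST_DIR and hash_recursive are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical location of the test files; adjust as needed.
TEST_DIR="${TEST_DIR:-./testfiles}"

# Hash a whole directory tree with a single md5deep process,
# limited to one hashing thread (assumed flag: -j1).
hash_recursive() {
    md5deep -r -j1 "$1"
}

# Only attempt the timed run if md5deep and the directory are present.
if command -v md5deep >/dev/null 2>&1 && [ -d "$TEST_DIR" ]; then
    time hash_recursive "$TEST_DIR" >/dev/null
fi
```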

So, in conclusion, if you are feeding a list of files into an md5 hash program, openssl appears to be a slightly better choice than md5sum or md5deep. However, if you can choose how the file list is ingested, md5deep is probably the better choice because of its speed and available features.

Test 2: Testing the time for each program to hash a 10 GB disk image. This test will not use find and xargs - it will use the program directly. md5deep will be restricted to 1 thread.

              md5sum       md5deep      openssl md5
real          1m44.558s    1m49.491s    1m43.475s
user          0m19.369s    0m20.421s    0m18.233s
sys           1m24.877s    1m30.534s    1m24.889s

md5sum: 1m44s
md5deep: 1m51s
openssl: 1m43s

For a single large file, it appears that openssl is also the fastest, followed by md5sum and then md5deep.

Please note: all of these tests are, at best, quite shallow. Proper testing would require a controlled environment and many more runs.


Bonus observation: As configured (-n 1), xargs passes one line at a time to the hashing program and spawns one process per file. Each process seems to switch between processors, even while a single hash is being calculated.
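
The one-process-per-line behaviour of xargs -n 1 can be seen with a harmless stand-in command. In this sketch echo takes the place of the hashing program, and count_invocations is a hypothetical helper; the file names are made up.

```shell
# Count how many times xargs starts the command for the given input.
# Each invocation of 'echo' prints exactly one line, so counting
# output lines counts the processes that were spawned.
count_invocations() {
    xargs "$@" echo | wc -l | tr -d ' '
}

printf 'a.img\nb.img\nc.img\n' | count_invocations -n 1   # prints 3
printf 'a.img\nb.img\nc.img\n' | count_invocations        # prints 1
```

Without -n 1, xargs passes many paths to a single invocation, which all three programs accept. That configuration was not part of the tests above.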