The following experiment tests whether md5sum, md5deep, or openssl md5 calculates MD5 hashes faster than the others.
Test 1: A directory of test files, consisting of disk images and extracted Windows Registry files, will be scanned with the command 'find -type f | xargs -n 1 md5program', where md5program is replaced by each of the three programs in turn. xargs passes each line of find's output (one file path) to a separate invocation of the md5 program. The time for the entire pipeline to finish will be measured with the time command.
Test 1.1: After all programs have finished, the computer will be restarted, and the order in which the programs are run will be changed. For example, if md5sum was the first program in the first round, it will be the second program in the second round. This is an attempt to account for any caching that takes place in the operating system.
The result will be 9 timed runs (3 programs, 3 rounds each) for calculating the md5 sum of all files in a directory.
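The reordering in Test 1.1 can be sketched as a simple rotation. The text only states that the first program moves to second place in the next round; shifting every program one slot later per round is an assumption, and `rotated_orders` is a hypothetical helper name.

```python
def rotated_orders(programs, rounds):
    """Return one run order per round, shifting each program one slot later per round."""
    n = len(programs)
    orders = []
    for r in range(rounds):
        k = r % n
        # Move the last k programs to the front so every other program slides back by k.
        orders.append(programs[n - k:] + programs[:n - k])
    return orders


if __name__ == "__main__":
    programs = ["md5sum", "md5deep", "openssl md5"]
    for round_no, order in enumerate(rotated_orders(programs, 3), start=1):
        print(f"round {round_no}: {order}")
```

With three programs and three rounds, each program runs first exactly once, which gives the 9 timed runs.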
Test 2: Big files - In test 1, multiple files of different sizes will be continuously fed to the hashing program. In test 2, a single 10 GB disk image will be given to each program and timed.
Intel(R) Core(TM) i7-3537U CPU @ 2.00GHz (4 cores)
8 GB RAM
Timing cached reads: 8892.40 MB/sec
Timing buffered disk reads: 403.98 MB/sec
| md5sum, run 1 | md5sum, run 2 | md5sum, run 3 |
|---|---|---|
| real 3m46.457s | real 3m46.117s | real 3m59.198s |
| user 0m42.183s | user 0m39.254s | user 0m42.235s |
| sys 3m3.595s | sys 3m6.300s | sys 3m16.096s |

| md5deep, run 1 | md5deep, run 2 | md5deep, run 3 |
|---|---|---|
| real 3m57.595s | real 3m57.666s | real 4m2.255s |
| user 0m43.743s | user 0m43.567s | user 0m44.551s |
| sys 3m17.552s | sys 3m17.192s | sys 3m22.881s |

| openssl md5, run 1 | openssl md5, run 2 | openssl md5, run 3 |
|---|---|---|
| real 3m48.130s | real 3m43.142s | real 3m50.619s |
| user 0m37.878s | user 0m37.462s | user 0m38.558s |
| sys 3m9.436s | sys 3m5.112s | sys 3m11.136s |

Overall processor time (user + sys):
Run 1: md5sum 3m45.778s; md5deep 4m01.295s; openssl 3m47.314s.
Run 2: md5sum 3m45.554s; md5deep 4m00.759s; openssl 3m42.574s.
Run 3: md5sum 3m58.331s; md5deep 4m07.432s; openssl 3m49.694s.
Average (rounded to second): md5sum 3m50s; md5deep 4m03s; openssl 3m47s.
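The processor-time figures are just user + sys for each run, averaged per program. A minimal sketch of that arithmetic follows, using the user and sys values from the Test 1 tables. The helper names (`parse_time`, `cpu_seconds`, `average_cpu`, `format_time`) are hypothetical and only serve this illustration.

```python
import re

# (user, sys) pairs copied from the Test 1 tables, runs 1 to 3.
RUNS = {
    "md5sum": [("0m42.183s", "3m3.595s"), ("0m39.254s", "3m6.300s"), ("0m42.235s", "3m16.096s")],
    "md5deep": [("0m43.743s", "3m17.552s"), ("0m43.567s", "3m17.192s"), ("0m44.551s", "3m22.881s")],
    "openssl md5": [("0m37.878s", "3m9.436s"), ("0m37.462s", "3m5.112s"), ("0m38.558s", "3m11.136s")],
}


def parse_time(text):
    """Convert a `time`-style duration such as '3m3.595s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def cpu_seconds(user, sys_time):
    """Processor time for one run: user-mode plus kernel-mode seconds."""
    return parse_time(user) + parse_time(sys_time)


def average_cpu(runs):
    """Mean processor time in seconds across a list of (user, sys) pairs."""
    totals = [cpu_seconds(user, sys_time) for user, sys_time in runs]
    return sum(totals) / len(totals)


def format_time(seconds):
    """Format seconds back into the 'XmY.YYYs' style used by `time`."""
    minutes, rest = divmod(round(seconds, 3), 60)
    return f"{int(minutes)}m{rest:06.3f}s"


if __name__ == "__main__":
    for program, runs in RUNS.items():
        per_run = ", ".join(format_time(cpu_seconds(u, s)) for u, s in runs)
        print(f"{program}: runs {per_run}; average {round(average_cpu(runs))}s")
```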
Test 1 Conclusions
While the 'real' (wall-clock) time may be an interesting factor for investigators, what we are more interested in is the time that the program was using the processor. In this case 'user' is the time taken by the program in user mode, and 'sys' is the time spent in kernel mode on its behalf. The basic idea is that we want the program that takes the least amount of time on the processor to do the same amount of work.
Note: all three programs were slower in run 3, which suggests that another process, perhaps an OS update, was running at the time and affected all the programs.
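To make the distinction between real, user, and sys time concrete, the sketch below measures all three for a child process. It is not the shell's time command: it uses Python's `resource.getrusage(RUSAGE_CHILDREN)` as a stand-in, which is available on Unix-like systems only, and `time_command` is a hypothetical helper name.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (real, user, sys) in seconds, similar to the shell's `time`."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN accumulates CPU time of children that have been waited for,
    # so the difference isolates the command that just finished.
    user = after.ru_utime - before.ru_utime
    sys_time = after.ru_stime - before.ru_stime
    return real, user, sys_time


if __name__ == "__main__":
    # A harmless stand-in workload; replace with e.g. ["md5sum", "disk.img"] (hypothetical path).
    real, user, sys_time = time_command([sys.executable, "-c", "sum(range(10**6))"])
    print(f"real {real:.3f}s  user {user:.3f}s  sys {sys_time:.3f}s  cpu {user + sys_time:.3f}s")
```

The processor time compared in this test corresponds to `user + sys_time`. The real time additionally includes any time the process spent waiting, for example on disk reads.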
Based on the average of the three runs, it appears that openssl is slightly faster, followed by md5sum and then md5deep.
It should be noted, however, that md5deep is not really being used in the way it was designed. For example, md5sum does not include a recursive mode, whereas md5deep does. If we run md5deep using its recursive mode, instead of find and xargs, then the time is actually slightly better than the others:
md5deep recursive - restricted to 1 thread
So, in conclusion, if you are feeding a list of files into an md5 hash program, openssl appears to be a slightly better choice than md5sum or md5deep. However, if you can choose how the file list is ingested, md5deep is probably a better choice because of its speed and available features.
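The difference between the two ingestion styles is that find with xargs -n 1 starts one hashing process per file, while a recursive mode walks the directory inside a single process. The sketch below illustrates the single-process approach with Python's hashlib. It is not md5deep's implementation, and `md5_file` and `md5_tree` are hypothetical helper names.

```python
import hashlib
import os
import tempfile


def md5_file(path, chunk_size=1024 * 1024):
    """Hash one file in fixed-size chunks so large images never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_tree(root):
    """Walk root recursively in one process and return {path: md5 hex digest}."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Mirror `find -type f`: regular files only, symlinks skipped.
            if os.path.isfile(path) and not os.path.islink(path):
                results[path] = md5_file(path)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub"))
        with open(os.path.join(root, "sub", "sample.bin"), "wb") as handle:
            handle.write(b"abc")
        for path, digest in sorted(md5_tree(root).items()):
            print(digest, path)
```

Reading in chunks is also how a single 10 GB image can be hashed without loading it whole.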
Test 2: Testing the time for each program to hash a 10 GB disk image. This test will not use find and xargs - it will use the program directly. md5deep will be restricted to 1 thread.
| md5sum | md5deep | openssl md5 |
|---|---|---|
| real 1m44.558s | real 1m49.491s | real 1m43.475s |
| user 0m19.369s | user 0m20.421s | user 0m18.233s |
| sys 1m24.877s | sys 1m30.534s | sys 1m24.889s |

For a single large file, it appears that openssl is also the fastest, followed by md5sum and then md5deep.
Please note: all of these tests are, at best, quite shallow. A proper testing environment with many more runs should be used.
Bonus observation: With -n 1, xargs passes one file path per invocation, so it spawns one hashing process per file. Each process seems to switch between processor cores, even while the same hash is being calculated.