InfoSecurity Russia 2012


Last week, Pavel and I gave an invited talk at InfoSecurity Russia 2012. From Digital FIRE:
<blockquote class="tr_bq">Our talk explored the issues of digital forensics in the cloud environment. The first part of the talk introduced the concepts of cyber crime investigations and the challenges faced by the digital forensic practitioners. The second part of the talk explored investigative difficulties posed by cloud computing. A possible approach to dealing with some of these difficulties based on I-STRIDE methodology was then outlined.</blockquote><div class="separator" style="clear: both; text-align: center;"></div>
The security challenges of cloud environments discussed in the talk are elaborated further in our chapter "Digital Forensics and Cloud Computing," which can be found in Cybercrime and Cloud Forensics: Applications for Investigation Processes. Some investigation challenges were introduced based on the work of our friends at CloudForensicsResearch.org, with a few of my own thoughts added. Finally, a very quick overview of the Investigation STRIDE (I-STRIDE) model was given to help investigators and first responders identify potential sources of evidence, their jurisdiction, and other factors that may affect the admissibility of evidence extracted from a Cloud Service Provider.

http://infosecurityrussia.ru/speakers/


Creating and Configuring a Large Redundant Storage Array for a Network Share in Ubuntu using MDADM

We had a hardware RAID card that worked well in Windows but was giving us some issues in Linux (Ubuntu 12.04). So we decided to try to set up a software array using mdadm. This is how we did it.

First, make sure your hardware RAID card has a "non-RAID" mode; that is, it lets each of the attached drives show up as an individual drive. Ensure this mode is enabled. On some cards you will have to flash the BIOS with a non-RAID version.
  • Do not create a hardware RAID array using the RAID configuration menu
  • Make sure no drives are assigned to a RAID array
  • Install the newest version of Ubuntu Server
    • When asked for partitioning information:
    • Select manual partitioning
    • Select "Create Software RAID"
    • Select "Create MD device"
    • Select RAID5 (we are using RAID5)
    • Active devices = # (where # is the number of drives you want in the array)
    • Spare devices = # (the number of hot-spare drives, if any)
    • Select all the drives you want in the array
    • Select OK

After the array is created, our new device has about 21 TB. In versions of Ubuntu before 12.04 it was difficult to create a partition using the whole 21 TB, but now you should be able to do it from the install menu.
  • Create a partition on the newly created device (usually md0)
  • Format it as ext4
  • Save and continue with the install as usual
    • You might want to select "ssh" from the package selection

Once the install is done and you boot into the OS:
  • Make sure the array device has been created
    • sudo fdisk -l
    • Look for /dev/md0
  • Check the status of the software array
    • sudo mdadm --detail /dev/md0
    • If the status of the array is "building" or "syncing", let the process finish (it may take several hours)
  • Create a mount point for the array
    • sudo mkdir /media/RAIDStorage
  • Modify fstab to mount the partition on boot
    • sudo nano /etc/fstab
    • Add a new line: /dev/md0 /media/RAIDStorage ext4 defaults 0 0
    • Save and exit
    • Remount: sudo mount -a
    • Check the device was mounted: sudo mount | grep RAIDStorage
  • Share RAIDStorage with NFS
    • sudo apt-get install nfs-kernel-server
    • Edit the exports file: sudo nano /etc/exports
    • Add the line: /media/RAIDStorage 10.1.1.0/24(rw,insecure,async,no_subtree_check)
    • Save and exit
    • Restart the service: sudo /etc/init.d/nfs-kernel-server restart

All of the post-install commands above are collected in the sketch below. Note that in this case, permissions are NOT being set up on NFS. If you need a more secure environment, make sure you set them up.
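Here is that consolidated sketch (same device, mount point, and subnet as above; to be run after booting into the installed OS):

    # verify the md device exists and check the array's health
    sudo fdisk -l | grep md0
    sudo mdadm --detail /dev/md0

    # create the mount point and mount the array on every boot
    sudo mkdir /media/RAIDStorage
    echo '/dev/md0 /media/RAIDStorage ext4 defaults 0 0' | sudo tee -a /etc/fstab
    sudo mount -a
    mount | grep RAIDStorage

    # share the mount over NFS to the local subnet
    sudo apt-get install nfs-kernel-server
    echo '/media/RAIDStorage 10.1.1.0/24(rw,insecure,async,no_subtree_check)' | sudo tee -a /etc/exports
    sudo /etc/init.d/nfs-kernel-server restart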

Also, we are using 'async' instead of 'sync'. For some reason, writing very large files with 'sync' gave very, very bad performance, while 'async' allowed for maximum write speeds.

If there are write permission errors, check that the permissions on the folder (/media/RAIDStorage) on the server are set correctly for the user.
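For example, something like this (a sketch; 'examiner' is a hypothetical user, substitute whichever account writes over NFS):

    # show the current owner, group, and mode bits of the share
    ls -ld /media/RAIDStorage
    # give the share to the writing user ('examiner' is hypothetical)
    sudo chown -R examiner:examiner /media/RAIDStorage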


Korean National Police University International Cybercrime Research Center


<div class="p1">Today is the inauguration of the Korean National Police University (KNPU) International Cybercrime Research Center (ICRC). The inauguration ceremony will immediately be followed by the 1st International Cybercrime Research Seminar.</div><blockquote class="tr_bq">Grand opening: The International Cybercrime Research Center, Korean National Police University.
The opening ceremony will be hosted by the president of the Korean National Police University on September 18th, 2012. To commemorate the event, the Police orchestra will perform, and the 1st International Cybercrime Research Seminar will be held. The International Cybercrime Research Center will focus on multi-disciplinary research and support dealing with cyber-policing strategy, quality education and training. The official website for the International Cybercrime Research Center will be available soon.
For more information or to submit proposals for partnerships and collaboration, please contact: [email protected]</blockquote><div class="p1">The 1st International Cybercrime Research Seminar will cover current trends and the future of training, education and research, child exploitation in South Korea (International Centre for Missing and Exploited Children: South Korean Chapter), and criminological aspects of digital crime.

Update:
<blockquote class="tr_bq">경찰대학(학장 치안정감 서천호)에서는 2012. 9. 18(화) 14:00 영상강의실에서 오전 1부행사로 국제 사이버범죄 연구센터 개소식에 이어 국내외 전문가들이 ‘유럽 사이버수사의 교육훈련 동향’ , ‘호주의 정보보안 침해범죄의 범죄학적 연구’, ‘인터넷 아동음란물 실태와 대응방안’ 등에 대한 제1회 국제사이버 범죄 세미나를 개최하였다.
경찰대학(학장 치안정감 서천호)에서는 2012. 9. 18(화) 14:00 영상강의실에서 오전 1부행사로 국제 사이버범죄 연구센터 개소식에 이어 국내외 전문가들이 ‘유럽 사이버수사의 교육훈련 동향’ , ‘호주의 정보보안 침해범죄의 범죄학적 연구’, ‘인터넷 아동음란물 실태와 대응방안’ 등에 대한 제1회 국제사이버 범죄 세미나를 개최하였다.</blockquote>
http://www.police.ac.kr/open/photo_news.jsp?SEQ=33307&BoardMode=view</div>


Another SDHASH Test with Picture Files

The last SDHASH test showed that fuzzy hashing did not appear to work well on multiple sizes of the same picture file. This time, I decided to try same-sized images with slight modifications, like one might see in the real world. So, again, there is an original image, the same image with text added, and the same image with a swirl pattern over the face.

<div class="separator" style="clear: both; text-align: center;"><table cellpadding="0" cellspacing="0" class="tr-caption-container" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"></td></tr><tr><td class="tr-caption" style="text-align: center;">Kitty Orig: 75K MD5 6d5663de34cd53e900d486a2c3b811fd</td></tr></tbody></table><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"></td></tr><tr><td class="tr-caption" style="text-align: center;">Kitty Text: 82K MD5 bcbed42be68cd81b4d903d487d19d790</td></tr></tbody></table></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"></td></tr><tr><td class="tr-caption" style="text-align: center;">Kitty whirl: 92K MD5 4312932e8b91b301c5f33872e0b9ad98</td></tr></tbody></table>On this test, I hypothesized that there would be a high match between the original kitty, and the text kitty, and a low, if any, match between the original kitty and the whirl kitty. My reasoning for this is because I think the features of the data would be similar enough - excluding the text area.<div>
Unfortunately, I was wrong. sdhash did not find similarity between any of the pictures (and neither did ssdeep):

    $ sdhash -g *
    kitty_orig.jpeg|kitty_text.jpeg|000
    kitty_orig.jpeg|kitty_whirl.jpeg|000
    kitty_text.jpeg|kitty_whirl.jpeg|000

So, neither sdhash nor ssdeep detected any similarity between the picture files. Perhaps these tools are not suitable for picture file analysis, nor are they a replacement for standard hashes such as MD5 when looking for matching pictures.
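For completeness, the ssdeep check was along these lines (a sketch using ssdeep's match mode with the same files; the hash file name is arbitrary):

    # hash the original, then match the modified images against it
    ssdeep kitty_orig.jpeg > kitty_orig.ssdeep
    ssdeep -m kitty_orig.ssdeep kitty_text.jpeg kitty_whirl.jpeg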


Comparing Fuzzy Hashes of Different Sizes of the Same Picture (SDHASH)

In a previous post, we looked at setting up and using SDHASH. After comparing modified files and getting a high similarity score, we started wondering how well fuzzy hashing works on different-sized versions of the same image. So today, we have a little experiment.

First, we have 4 images: one original, and 3 smaller versions of the original.
Original: 75K, MD5 6d5663de34cd53e900d486a2c3b811fd
1/2 Original: 44K, MD5 87ec8d4b69293161bca25244ad4ff1ac
1/4 Original: 14K, MD5 978f28d7da1e7c1ba23490ed8e7e8384
1/8 Original: 3.6K, MD5 3e8e0d049be8938f579b04144d2c3594

So, if we have an original image, we can take its hash like so:
<blockquote class="tr_bq">$sdhash kitty_orig.jpeg > kitty_orig.sdbf</blockquote>Now, we want to take the hashes of other other files (manual way):
<blockquote class="tr_bq">$sdhash kitty_2.jpeg >> kitties.sdbf
$sdhash kitty_4.jpeg >> kitties.sdbf
$sdhash kitty_8.jpeg >> kitties.sdbf</blockquote>Now we can compare the hashes of the smaller versions to the hash of the original. Note: set the threshold to negative one (-t -1) if you want to see results below 1.
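With more files, a quick shell loop does the same thing (a sketch; same file names as above):

    # append an SDBF entry for each resized image to one file
    for f in kitty_2.jpeg kitty_4.jpeg kitty_8.jpeg; do
        sdhash "$f" >> kitties.sdbf
    done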
<blockquote class="tr_bq">$sdhash -t -1 -c kitty_orig.sdbf kitties.sdbf</blockquote>Unfortunately, but as expected, the results were not so good. Feature selection for the hash is done at the bit level, and those features do not carry over to smaller files since there are less bytes.
<blockquote class="tr_bq">kitty_2.jpeg
kitty_orig.jpeg 000
kitty_4.jpeg
kitty_orig.jpeg 000
kitty_8.jpeg
kitty_orig.jpeg 000 </blockquote>If you were working with more images, and you wanted to hash and compare at the same time, you could use the -g switch. For example:
<blockquote class="tr_bq">$sdhash -t -1 -g *</blockquote>The output of which (in this case) would be:
<blockquote class="tr_bq">kitty_2.jpeg
kitty_4.jpeg 000
kitty_2.jpeg
kitty_8.jpeg 000
kitty_2.jpeg
kitty_orig.jpeg 000
kitty_4.jpeg
kitty_8.jpeg 000
kitty_4.jpeg
kitty_orig.jpeg 000
kitty_8.jpeg
kitty_orig.jpeg 000</blockquote>So, in conclusion, sdhash’s feature selection does not allow for comparison of greatly different sized picture files. Note that a text file would be quite different, and would probably produce better results.

Similarity Comparison with SDHASH (fuzzy hashing)

Being a fan of ssdeep for fuzzy hashing, I was interested in this article comparing ssdeep to sdhash.

As the article says, ssdeep basically breaks a file into pieces, hashes each piece, and combines the piece hashes to create a hash for the whole file. Because small chunks of the file are examined, similarity can be calculated per piece: when two files share similar chunks, their hashes will be similar.

sdhash, on the other hand, uses probabilistic feature extraction to find features that are unlikely to occur by chance. The extracted features are hashed and put into a Bloom filter. To compare two files, their Bloom filters are compared using a Hamming distance measure: the more similar the filters, the lower the Hamming distance.

The article An Evaluation of Forensic Similarity Hashes demonstrated that, given these different approaches, sdhash appears to outperform ssdeep in many instances. Because of this, I took a closer look at sdhash.

One of the more interesting features of sdhash is the ability to tell the command how many threads to spawn. In his paper, Roussev discusses parallelization; his test machine had 12 processors with 24 cores, and in many of his tests he spawned 24 threads.

So, wanting to get a feel for sdhash, I built it on a 2.4 GHz Intel Core 2 Duo machine. Building was straightforward on OS X: make && sudo make install

    $ sdhash --version
    sdhash 2.3alpha by Vassil Roussev, Candice Quates, August 2012, rev 476
           http://sdhash.org, license Apache v2.0

When doing a comparison, you first need to create a list of SDBFs (similarity digest Bloom filters) for the files of interest. The switches I was most interested in were:

<blockquote class="tr_bq">  -r [ –deep ]                         generate SDBFs from directories and files
  -g [ –gen-compare ]             generate SDBFs and compare all pairs
  -c [ –compare ]                   compare all pairs in SDBF file, or
                                            compare two SDBF files to each other</blockquote>
So, let’s say I had a corpus of illegal images in a folder (and potentially many sub-folders) named “evidence”. I could create the hashes of each of the images using the command:
<blockquote class="tr_bq">$sdhash -r evidence/ > illegal_images.sdbf</blockquote>In this case I am outputting the results to the file “illegal_images.sdbf” using a standard >, but you can also use -o to specify the output file.

If I then want to compare all the hashes in illegal_images.sdbf to hashes created from a new case, I would have to create an SDBF file (test_images.sdbf) for the new hashes on the new machine, and then compare:
<blockquote class="tr_bq">$sdhash -c illegal_images.sdbf test_images.sdbf</blockquote>The output will be the names of the two compared files, plus a score of similarity all separated by pipes (
). For an explanation about the scoring system please see Content triage with similarity digests: the M57 case study:
<blockquote class="tr_bq">warnings.py
warnings.py 100
opera8-2png1txt/file2.png
flask.ui/static/jquery.js 005
opera8-2png1txt/file2.png
flask.ui/venv/bin/python 009
opera8-2png1txt/file2.png
opera8-2png1txt/request.txt 066</blockquote>As Roussev says in his papers, the score is a calculation of similarity. In other words, even if the score is 100, the files may not be exactly the same.

To test this, I have a 3.3K text file named "Makefile". First, I create an MD5 and a fuzzy hash of the file:

    $ sdhash Makefile > make.sdbf

Result: MD5 (Makefile) = 4fe15b4cf1591cdfe92b7efd65d291ec

    sdbf:03:8:Makefile:3417:sha1:256:5:7ff:160:1:50:AICBAiDMAAAAAoEACMAQAAAICA5gACQIgBAAQYAAAAACAhAAAFAEACIAIAoEAACCAAICAAAAAEgEEMCgQAACBAAACAAQAABCCCBSQFgACAAAABLAAiBCciCAAgAIAACIBCQAAgGAAAASJhAgAAAUAAgAACQAAIEAEAAIAAAIAIAAAAaQCAAQEgABAWABAAAAEAAAAAAik6CgAQEAAAAABRQUxwAAAEEGAAAACAAQAEAAABAAAAA8CAiAAAEpgIAkAAQGAAICCAYSBsIFEQCoiAAAQIYCAAAIAASAAAAMSCABOCAAAARAAAQAIAAEFIABAgACgAACAEAIDAIAAACEEg==

Next, I modify "Makefile" to remove the first "# " (a hash and a space), create an MD5 and fuzzy hash again, and compare.

Result: MD5 (Makefile) = d7e9182ee1d8cb7c6ad41157637c7d62

    sdbf:03:8:Makefile:3415:sha1:256:5:7ff:160:1:50:AICBAiDMAAAAAoEACMAQAAAICA5gACQIgBAAQYAAAAACAhAAAFAEACIAIAoEAACCAAICAAAAAEgEEMCgQAACBAAACAAQAABCCCBSQFgACAAAABLAAiBCciCAAgAIAACIBCQAAgGAAAASJhAgAAAUAAgAACQAAIEAEAAIAAAIAIAAAAaQCAAQEgABAWABAAAAEAAAAAAik6CgAQEAAAAABRQUxwAAAEEGAAAACAAQAEAAABAAAAA8CAiAAAEpgIAkAAQGAAICCAYSBsIFEQCoiAAAQIYCAAAIAASAAAAMSCABOCAAAARAAAQAIAAEFIABAgACgAACAEAIDAIAAACEEg==

Comparison of the fuzzy hashes:
<blockquote class="tr_bq">$sdhash -c make.sdbf make_change.sdbf </blockquote>Resulting in:
<blockquote class="tr_bq">Makefile
Makefile 100 </blockquote>So from this we see that the MD5 hashes (4fe15b4cf1591cdfe92b7efd65d291ec and d7e9182ee1d8cb7c6ad41157637c7d62) changed dramatically - as expected. The result of sdhash, however, still returned a score of 100, meaning that the file is very similar, even if the content is slightly different. I may run more experiments to see how much the file can change before the score is reduced, but that is for a later day.
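As a starting point, a loop like this could automate that follow-up (a sketch reusing the sdhash commands above; the names orig.sdbf, cut.sdbf, and Makefile.cut are hypothetical):

    # hash the original file once
    sdhash Makefile > orig.sdbf
    # drop an increasing number of leading lines, re-hash, and compare
    for n in 1 5 10 20 40; do
        tail -n +"$((n + 1))" Makefile > Makefile.cut
        sdhash Makefile.cut > cut.sdbf
        echo "removed first $n lines:"
        sdhash -c orig.sdbf cut.sdbf
    done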

Another feature I am interested in is specifying the number of threads:

    -p [ --threads ] arg (=1)   compute threads to use

Using this, we can (hopefully) easily utilize server farms to make hash generation and comparison an easier task.

My test machine has 1 processor with 2 cores. Running sdhash without the -p flag used 1 thread and around 80%-90% of the CPU. When 'sdhash -p 2' was specified, 2 threads were spawned and 170%-190% of the processor was used (both cores). So, using multiple cores is quite easy; however, it may not always be the best option. In one of his papers, Roussev notes that running threads on multiple cores of the same processor may not increase performance, and may instead create competition for the processor's time. This appeared to be the case in our brief experiment:

<blockquote class="tr_bq">$ time sdhash -r . > test.hashes
real 2m20.365s
user 0m57.093s
sys 1m11.767s
$ time sdhash -p 2 -r . > test.hashes
real 2m38.744s
user 2m20.538s
sys 2m41.520</blockquote> So in other words, you may want to test to make sure they way you are processing is the most efficient. I was hashing only a few hundred files, and not doing any comparison. If this were thousands, the a lot of time might have been wasted.

Roussev suggested creating hashes at the same time as imaging, by redirecting the output of the suspect device both to an image file and to sdhash as input; he suggested dcfldd for this. I've not tried it, but if you could hash and image at the same time, there could be some definite time savings depending on your forensic process.
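The idea might look something like this with plain dd and tee (a sketch I have not tested; /dev/sdb and evidence.dd are hypothetical, and whether your sdhash build reads from stdin as "-" is an assumption to verify):

    # write the device image to disk while feeding the same bytes
    # to sdhash for a fuzzy hash in a single pass
    # ASSUMPTION: sdhash accepts "-" for stdin; if not, hash the
    # finished image file afterwards instead
    sudo dd if=/dev/sdb bs=4M | tee evidence.dd | sdhash - > evidence.sdbf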

Overall, fuzzy hashing appears interesting for finding similarity; however, I would like to test this method against full-size images vs. thumbnails. Since this method is NOT content analysis, but rather feature selection over raw data, I would be surprised if it found much similarity between an image and a corresponding thumbnail. If anyone has tested this, please leave a comment with your results.