[Hash sets] Korea University DFRC Reference Data Set
The National Software Reference Library (NSRL) is designed to collect software from various sources and incorporate file profiles computed from this software into a Reference Data Set (RDS) of information. The RDS can be used by law enforcement, government, and industry organizations to review files on a computer by matching file profiles in the RDS. This will help alleviate much of the effort involved in determining which files are important as evidence on computers or file systems that have been seized as part of criminal investigations.
In other words, the NSRL is a very large collection of file hashes for 'known' software. In most cases it can be treated as a known-good hash set for filtering potentially uninteresting files out of a case.
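As a rough illustration of that kind of filtering, here is a minimal Python sketch. It assumes the known hashes have already been exported to a plain text file with one hex SHA-1 per line; the real RDS distribution uses its own file format, so that loader is a stand-in. The function names (`sha1_of_file`, `load_known_hashes`, `unknown_files`) are made up for this example and are not part of any NSRL tool.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex SHA-1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def load_known_hashes(hash_list_path):
    """Load one hex hash per line into a set, normalised to uppercase.

    Assumption: a plain text list, not the native RDS format.
    """
    with open(hash_list_path, "r", encoding="utf-8") as handle:
        return {line.strip().upper() for line in handle if line.strip()}


def unknown_files(root, known_hashes):
    """Yield files under root whose SHA-1 is NOT in the known set."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and sha1_of_file(path) not in known_hashes:
            yield path
```

Whatever `unknown_files` yields is what is left to look at after the known software has been filtered out. A match only says the file content is known, so how far to trust it still depends on the hash set you loaded.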
The NSRL hashes are also hosted on hashsets.com, allowing you to query their database for a particular hash if you don't want to store the files locally.
NSRL is very useful, but it does have some limitations. The Korea University Digital Forensic Research Center (DFRC) is attempting to address some of these limitations by providing the DFRC Reference Data Set.
Their Reference Data Set includes hash values from software used in South Korea as well as the NSRL hashes. Currently, they have over 27 million hashes. Best of all, they provide a number of interfaces for checking your data against it. You can upload the suspect file directly, upload a list of hashes, search for a single SHA-1 or MD5 hash, or query their RDS via their REST interface! This way you can call their hash database directly from your tools.
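If you go the "list of hashes" route, you first need to produce that list. The sketch below is one way to do it in Python. The one-digest-per-line output is an assumption, so check what format the DFRC upload form actually expects. The function names are made up for this example, and it deliberately does not call the REST interface, since its endpoints and parameters are not described here.

```python
import hashlib
from pathlib import Path


def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for a file, reading it only once."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def write_hash_list(root, out_path, algorithm="sha1"):
    """Write one hex digest per line for every file under root.

    Assumption: one digest per line is an acceptable list format.
    Write out_path somewhere outside root so it is not hashed itself.
    Returns the number of digests written.
    """
    if algorithm not in ("md5", "sha1"):
        raise ValueError("algorithm must be 'md5' or 'sha1'")
    index = 0 if algorithm == "md5" else 1
    # Hash everything first, then write, so a partial list is never left behind
    # if a file turns out to be unreadable.
    digests = [
        file_digests(path)[index]
        for path in sorted(Path(root).rglob("*"))
        if path.is_file()
    ]
    with open(out_path, "w", encoding="utf-8") as out:
        for digest in digests:
            out.write(digest + "\n")
    return len(digests)
```

Calling `write_hash_list("/path/to/export", "hashes.txt")` would produce a SHA-1 list. Pass `algorithm="md5"` if you would rather search by MD5.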
Of course, you can also directly download their entire RDS.