Compatible MD5/SHA-1 file hashes in Ruby, Java and .NET / C#
SHA-1 and MD5 are hashing algorithms. They read some content and produce a short, fixed-length value (usually written as a hexadecimal string) that is, for practical purposes, unique to that content.
The purpose of a hash is to identify a file as being consistent with your expectations. The two reasons I use a hash are: to ensure that a file I transmit (usually over a web service) is exactly the same as the file on the server, and also to create a ‘base-line’ hash of a downloaded file to compare against the same file at a later date (if the user has made changes the hash will change).
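The 'base-line' use case can be sketched in a few lines of Ruby. This is only an illustration under my own assumptions: the helper names `file_sha1` and `unchanged?` are hypothetical, and a temporary file stands in for the downloaded file.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: SHA-1 of a file as a lowercase hex string.
def file_sha1(path)
  Digest::SHA1.file(path).hexdigest
end

# Hypothetical helper: true if the file still matches a previously recorded hash.
def unchanged?(path, baseline_hash)
  file_sha1(path) == baseline_hash
end

# Demo: a temporary file stands in for a downloaded file.
Tempfile.create('download') do |f|
  f.binmode
  f.write('original content')
  f.flush
  baseline = file_sha1(f.path)       # record the base-line hash
  puts unchanged?(f.path, baseline)  # true: nothing has changed yet

  f.write(' plus a user edit')
  f.flush
  puts unchanged?(f.path, baseline)  # false: the content differs now
end
```

The same comparison covers the transfer case: hash the file on both sides and compare the two strings.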
Note that these hashes, while fairly robust, are not bulletproof. Two files with completely different content can have the same hash. These cases are called collisions, and they are extremely rare when they happen by chance. Be aware, however, that a malicious user can deliberately construct different files with matching hashes (easily for MD5, with considerably more effort for SHA-1), so this is not intended as a security mechanism. If you need to prove that a document has not changed, consider digital signatures with a key length of at least 2048 bits.
The primary purpose of this article is to give programmers a quick reference implementation of the SHA-1 hash that reads a binary file on the local disk and returns a lowercase hexadecimal string. I have provided implementations for Ruby, Java and .NET that all return identical values for a given file.
If you have written your own functions and are having problems matching your hashes, ensure that you are opening your files in binary mode (applies to Ruby).
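As a rough illustration of the binary-mode point, the sketch below writes some bytes that text mode can alter on Windows (a CRLF pair) and reads them back with "rb". The helper name `read_binary` is my own and purely hypothetical.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: read a whole file in binary mode ("rb"),
# which suppresses line-ending conversion on Windows.
def read_binary(path)
  File.open(path, 'rb') { |io| io.read }
end

# Sample bytes: a CRLF pair and a byte that is not valid text.
data = "line one\r\nline two\xFF".b

Tempfile.create('binary_demo') do |f|
  f.binmode
  f.write(data)
  f.flush

  from_disk = read_binary(f.path)
  # The bytes come back exactly as stored, so the hashes agree.
  puts from_disk.bytesize == data.bytesize
  puts Digest::SHA1.hexdigest(from_disk) == Digest::SHA1.hexdigest(data)
end
```

If a hash computed this way still differs from another implementation, the difference is in the file content itself rather than in how it was opened.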
Here’s the code:
.NET (C#):

    using System.IO;
    using System.Security.Cryptography;
    using System.Text;

    public string GenerateHash(string filePathAndName)
    {
        byte[] fileData = File.ReadAllBytes(filePathAndName);
        byte[] hashData;
        using (var hasher = SHA1.Create()) // SHA1 or MD5
        {
            hashData = hasher.ComputeHash(fileData);
        }

        var hashText = new StringBuilder(hashData.Length * 2);
        foreach (byte b in hashData)
        {
            // "x2" gives two zero-padded hex digits, lowercase for compatibility on case-sensitive systems
            hashText.Append(b.ToString("x2"));
        }
        return hashText.ToString();
    }
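Ruby:

    require 'digest'

    def generate_hash(file_path_and_name)
      hash_func = Digest::SHA1.new # SHA1 or MD5
      # "rb" opens the file in binary mode so the bytes are read unmodified
      File.open(file_path_and_name, "rb") do |io|
        # io.read returns nil at end of file, which ends the loop
        while (read_buf = io.read(1024))
          hash_func.update(read_buf)
        end
      end
      hash_func.hexdigest
    end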
Java:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public String generateHash(File file) throws NoSuchAlgorithmException, IOException {
        MessageDigest md = MessageDigest.getInstance("SHA-1"); // SHA-1 or MD5
        byte[] buffer = new byte[8192];
        // Read in chunks: a single read() call is not guaranteed to return the whole file
        try (InputStream in = new FileInputStream(file)) {
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                md.update(buffer, 0, bytesRead);
            }
        }
        StringBuilder hash = new StringBuilder();
        for (byte b : md.digest()) {
            // Mask to 0..255 so negative bytes still format as two lowercase hex digits
            hash.append(String.format("%02x", b & 0xff));
        }
        return hash.toString();
    }
Note that all examples use SHA-1 by default, but can easily be changed to use MD5 by changing one line in each case (see comments).
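If you would rather not edit the function each time, one option is to pass the algorithm in. The Ruby sketch below is a hypothetical variant (the name `generate_hash_with` is mine, not part of the code above) and checks itself against the well-known digests of the three-byte content "abc", which is also a handy way to confirm that the other languages agree.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical variant: the digest class is a parameter
# (Digest::SHA1 or Digest::MD5).
def generate_hash_with(digest_class, file_path_and_name)
  digest = digest_class.new
  File.open(file_path_and_name, 'rb') do |io|
    # io.read returns nil at end of file, which ends the loop
    while (chunk = io.read(1024))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end

# Demo: a temporary file containing exactly the bytes "abc".
Tempfile.create('abc') do |f|
  f.binmode
  f.write('abc')
  f.flush
  puts generate_hash_with(Digest::SHA1, f.path) # a9993e364706816aba3e25717850c26c9cd0d89d
  puts generate_hash_with(Digest::MD5, f.path)  # 900150983cd24fb0d6963f7d28e17f72
end
```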
