TL;DR; I want to see some code

Authentication systems generalized in many ways over years. You can find a lot of materials in web how to securely design an authentication system. System itself became robust, schemes, designs, architectures, diagrams, papers, lots of things. But behind the system you also need to securely save and share your messages in between actors(users of system). Nowadays computers getting faster, and upgrading those not costs too much. Our well designed robust system working good but some of attacks still having a chance to break it. What I want to talk about here is how not to let hackers to breach our security using our own messages.

Here is Moore’s Law shows us even home computers getting better CPUs with less money in years:

That is means the passwords that we keep securely in our systems are no longer secure, at least getting less secure by time because of computationally needs less time to solve them.

In our case, we as builder of system need to save user passwords safely. Storing them safely and disabling against modifications are two separate topics. Our focus must be increasing solubility time of encrypted passwords.

What we do is generally hashing the important input, and in case of a flaw of our secret data, we guarantee that it takes very long time for hacker to solve secret data and get into system. There are so many ways to break a simple hashed password. So what we need to do is searching for more sophisticated method.

Before getting into the solution just lets remember what is hashing:

A good hashing function have some basic properties:

+ Generates fixed length output from given arbitrary data. (Some hashing functions can be dynamically resizable by input, because of a necessity I will explain later!)

+ For given input, always produces same output.

+ Produces well separated hash space.

And a hash function’s most known use case reducing time complexity of given input elements computations. If you have good hashing algorithm, you mostly make searching operations in input space with O(1) constant time. Hash functions and its variations are around here at least for 60 years, and still performing much better when comparing with new algorithms in some topics.

Maybe the biggest enemy of a hash function is collisions. A collision happens when given two different input produces same output by the Hashing function. Collisions are inevitable if Hash function is not injective. And guess what, most of hashing functions are not injective. There are some dynamic hash functions also to prevent high number of collisions but still nothing guarantees that a good generalized (not domain specific) hashing function has no collisions. Fixed hashing functions has no chance, of course! Input space are vast (I mean infinite), and hash output just have some known number of possibilities.

As an example for collisions, just think that you hashing some IPv4 addresses with those functions:

H([x].[y].[z].[t]) = ([x] << 8) + [y]

For instance, both 1.2.3.4 and 1.2.100.100 addresses will have same hash output which is (1 << 8) + 2 = 258. So, this hash output will define two different input for you. You have to handle it by yourself in your application. Because your hashing algorithm limits you to distinguish these two different inputs. Your inputs have 2^32 different values but hashing function have 2^16 different values.

Or another way, just find a better hashing function like this:

H([x].[y].[z].[t]) = ([x] << 24) + ([y] << 16) + ([z] << 8) + [t]

So, same examples 1.2.3.4 and 1.2.100.100 will have different hash outputs which are (1 << 24) + (2 << 16) + (3 << 8) + 4 = 16909060 and (1 << 24) + (2 << 16) + (100 << 8) + 100 = 16933988. This function produces a perfect hash output(injective) and it has no collision in its domain.

OK. What about our problem, hashing passwords, and the solution is?

No worries, we will get to the point, just need to dig into cryptographic hashing functions a little.

### Cryptographic Hashing Functions

If a hashing function described as “cryptographic”, it must gives us some properties like these:

+ Computationally efficient: If it is not fast enough to compute hash, we cannot use it everywhere. Time – complexity trade-off must be in understandable level. We don’t like too slow or too fast algorithms neither because of generalization purposes.

+ Collision Resistant: It must not produce same hash outputs (theoretically impossible). This was what we worried about hashing functions already, and never want to see it in a cryptographic hashing function. And we know mathematically it is impossible to run from that.

+ Unpredictable:

+ It must be infeasable to generate message from hash.

+ It must be infeasable to modify message without modifying hash (one-wayness).

+ It must be infeasable to find two different message with same hash.

+ Randomness: Almost same messages must produce two very different looking ashes.

So, those are way of simple explanation. What about defining these properties theoretically?

### Preimage Resistance

It’s another name is one-wayness. For given a hash output z it must be computationally infeasible to find an input message x such that z = H(x).

**Why important:** This way it prevents to resolve the given hashed output. You can verify what the message is, but cannot achieve itself. Of course it is not guaranteed, but it must resist long enough time to decrypting it.

### Second Preimage Resistance

As you understand it’s name, for a given second preimage, we want to see if it outputs the same hash. It’s known as Weak Collision Resistance.

We expect that for given x, and thus H(x), it is computationally infeasible to find any y such that H(x) = H(y).

**Why important:** It is very obvious x and y are two different message but their digest(hash) are same output! Think about you send correct message x through a channel and it replaced by somebody else to message y. In this case, the person on other line will get the wrong message’s digest which is the replacement of the actual message of attacker, and you won’t know it.

### Collision Resistance

In this case, we expect it is computationally infeasible to find any pairs x != y such that H(x) = H(y). We expect that randomly chosen any two different input should have two different hash outputs for very long time.

**Why important:** The Birthday attack shows us we need bigger hash space to reduce collision attack probabilities. More probability needs more time, at least for today!

Those are all we expect from a good cryptographic hashing function. But unfortunately there is no analytically resolution for some problems right now.

A known problem for collision resistance is Pigeonhole Rule. We have an infinite input space and we want our outputs not to be collided. It’s impossible mathematically. We have at least countable infinite input space but what we shifting it to a finite output space (some number of bits). It is what hashing functions do. Those spaces cardinalities cannot be equal, no matter.

Another issue is about infeasibilty. It is deadly important for a hashing function to have Avalanche Effect. In terms of unpredictability and randomness, the algorithm should produce very different looking outputs for very similar inputs.

### Applications, Algorithms and Attacks

**Here are some headlines of where they used:**

- Digital Signing
- Message Authentication
- CSPRNG (Cryptographically secure pseudo random number generator)
- Password Security
- Encryption

**And here are some of them:**

- MD4 Family (Message Digest)
- MD4 (128 bit)
- MD5 (128 bits)
- MD6 (up to 512 bits)

- SHA Family (Secure Hashing)
- SHA-1 (160 bits)
- SHA-2 (variants: SHA-256 and SHA-512)
- SHA-3 (KECCAK: arbitrary, variants: up to 512 bits)

- crypt, bcrypt, scrypt, PBKDF2
**(some details)**

**And some types of attacks:**

- Collision : Tries to find some collisions
- Birthday Attack : Randomly tried inputs with some probability.
- Brute-force : Tries every possibility
- Dictionary : In a given set of inputs compares the output with actual output
- Rainbow tables: reduced space of dictionary tables.

Now, lets talk about some defense mechanisms:

### Salting the Hash and Further

Since the hashes one way, there is no way back to produce message from hash. So hash attacks as you can see mostly brute-force and it’s derivatives. Attacker can pre-compute many hash outputs. This causes to crack hashes with lookup tables very quickly. As a solution, we can add some random input to message before it is hashed. This way we can prevent rainbow table and brute-force attacks.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 | static void Main() { var hashFunc = new SHA256Managed(); var message = "password"; byte[] salt = null; byte[] messageByte = Encoding.UTF8.GetBytes(message); byte[] messageAndSalt = SaltMessage(messageByte, out salt); var hashNoSalt = hashFunc.ComputeHash(messageByte); var hashWithSalt = hashFunc.ComputeHash(messageAndSalt); Console.WriteLine("H({0}) --> {1}", message, Convert.ToBase64String(hashNoSalt)); Console.WriteLine("H({0} + {1}) --> {2}", message, Convert.ToBase64String(salt), Convert.ToBase64String(hashWithSalt)); //somehow attacker read our passwords from our database. Console.WriteLine(); Console.WriteLine(); Dictionary<string, byte[]> hashDict = SetAttackerDictionary(hashFunc); //which hash output more probable to be in attacker's dictionary CheckIfPasswordExistInDictionary(hashNoSalt, hashDict); CheckIfPasswordExistInDictionary(hashWithSalt, hashDict); } private static void CheckIfPasswordExistInDictionary(byte[] hash, Dictionary<string, byte[]> hashDict) { //comparison of two hash var hashResult = hashDict.Where(passHash => passHash.Value.Zip(hash, (b1, b2) => b1 == b2).All(x => x)).FirstOrDefault(); if (!hashResult.Equals(default(KeyValuePair<string, byte[]>))) { Console.ForegroundColor = ConsoleColor.Green; Console.WriteLine("Found Password: {0} --> {1}", Convert.ToBase64String(hash), hashResult.Key); } else { Console.ForegroundColor = ConsoleColor.Red; Console.WriteLine("Nothing Found: {0}", Convert.ToBase64String(hash)); } Console.ResetColor(); } private static Dictionary<string, byte[]> SetAttackerDictionary(HashAlgorithm hashFunc) { //attacker's dictionary Dictionary<string, byte[]> hashDict = new Dictionary<string, byte[]>() { {"pass", null }, {"message", null }, {"mypass", null }, {"Password", null }, {"sa1234", null }, {"rootroot", null }, {"password", null } //what is the possibility of a long random text's hash being in this dictionary? }; //we set dictionary with hashes foreach (var key in hashDict.Keys.ToList()) { hashDict[key] = hashFunc.ComputeHash(Encoding.UTF8.GetBytes(key)); } return hashDict; } |

1 2 3 4 5 6 7 8 9 10 11 | private static byte[] SaltMessage(byte[] message, out byte[] salt, int saltLength = 8) { //we create a cryptographic random number generator and append it's generated values to our password. salt = new byte[saltLength]; RNGCryptoServiceProvider random = new RNGCryptoServiceProvider(); random.GetNonZeroBytes(salt); byte[] messageAndSalt = new byte[message.Length + salt.Length]; Array.Copy(message, messageAndSalt, message.Length); Array.Copy(salt, 0, messageAndSalt, message.Length, salt.Length); return messageAndSalt; } |

When we try to break the produced outputs of “only hash” and “salted hash” methods, we for sure need to spend much more time for breaking salted hash version of same password. And think about how much space attacker needs to store for each password + 8 byte. Sorry, It’s too much.

But, is it enough. Salting the passwords. Actually it’s pretty enough for basic systems. Remember, attacker now have to recompute each possibility of password and random text pairs. He has no given dictionary for those. But, still our hashes can be computed very fast. So, what we need to do is making our hashing mechanism slower. For this purpose defenders use a term called “key stretching” which means for us is recursively iterating our hashing functions to hash it again with salt. Key stretching is pain for an attacker. Because for each iteration you have to go one step backward and find its hash again. If you are lucky and your precomputed hash table contains it, then you can go and use it for going one step backward again. But today, with using any kind of brute-force attack, it takes way long time to find a single password.

A simple iterative hashing is basically like this:

1 2 3 4 5 6 7 8 9 10 11 | private static byte[] KeyStretching(byte[] hash, byte[] salt, HashAlgorithm hashFunc, int iterations = 10000) { byte[] newHash = new byte[hash.Length]; Array.Copy(hash, newHash, hash.Length); for (int i = 0; i < iterations; i++) { //for each iteration we salt the hash again!!! newHash = hashFunc.ComputeHash(newHash.Concat(salt).ToArray()); } return newHash; } |

It is only written for showing you to how a key stretching method simply working. If you want to go with a good setup, there is a .NET implementation of PBFDK2 algorithm. So I would recommend you to use that instead. If the problem is about cryptographic hashing functions, any new made-in-home applications will not welcome! Play with them, but as long as you are not the cryptography guru, never try to find something new. It is not about software principles like DRY or YAGNI or KISS. It is definitely a big challenge that people spend their lives to find a “good algorithm”. So stick with something robust.

Hope you liked this note. I want to share some well written articles in references section. Good luck and stay secure.

Some References

+ https://crackstation.net/hashing-security.htm

+ http://en.wikipedia.org/wiki/Cryptographic_hash_function

+ http://security.stackexchange.com/questions/211/how-to-securely-hash-passwords/31846#31846