From ‘hi’ to $2y$10$TmNKr…: What Happened to Your Password?

Say that I decide to sign up for an account with an incredibly insecure password, ‘hi’. How does this become something stored in the database like this:

$2y$10$TmNKrCzcsgVeIS/DOdQ6JeyhZUePie/yaiBQHMrN0tk4THZhgHyW6

Passwords are sequences of characters that carry your wallet, personal information, and online history. There is a tremendous need for these to be secure, not only from the user standpoint but from the website’s, in order to be legally compliant and to ensure user trust.

For many reasons, it’s dangerous to directly store the user’s password in the database. The most obvious is that if someone else were to gain access to the database, they would be able to see everyone’s password. Many websites also use cookies, or data stored in your browser that lasts even when you leave the site, to help auto-login on your next site visit. Leaving the raw-text password in a cookie leaves it accessible to the next computer user and to sites and programs that can read cross-site cookies.

A hash is used to convert a string into another representation of the string. The purpose of a hashing function is to create an encoded version of the password such that:

Going from password-to-encoded (encoding) is easy (hi_there to sjiu3s9*@slajsk).
Going from encoded-to-password (decoding) is impossible (or, at least, very very far from easy) (sjiu3s9*@slajsk to hi_there).

Hence, even if a hacker were to access our databases, they would not be able to convert sjiu3s9*@slajsk into the real entered password, hi_there. Hashing is a difficult idea, however, in that somehow information must only be able to flow in one direction.

This is analogous to representing all of The Odyssey in five pages of text: the transformation is one-way, as it’s impossible to reconstruct The Odyssey from a five-page summary. Hence, an inherent part of hashing is the discarding of information. Because of this fact, sometimes vastly different inputted strings will have the same hash because the discarded information results in the same processed string, but by design the chance this possibility leads to some actual security threat is negligible.

Although there are many hashing functions, they generally follow a three-step design. First, the password is broken into several components, which are passed into a compression function. This compression function is the part that squeezes out some information and condenses the components. Lastly, the output of the compression is encoded (represented) as a long string of characters, which can contain numbers, letters, and symbols.
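To make the three steps concrete, here is a deliberately tiny sketch. It is a toy, not a real hashing algorithm and not safe for actual passwords; the function name `toy_hash`, the block size, and the constants are all assumptions chosen for illustration.

```python
def toy_hash(password: str, block_size: int = 4) -> str:
    """Toy illustration of split / compress / encode. NOT secure."""
    data = password.encode("utf-8")

    # Step 1: break the input into fixed-size components (blocks).
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]

    # Step 2: compression. Fold every block into one 32-bit state.
    state = 0x811C9DC5  # arbitrary starting value (an assumption of this sketch)
    for block in blocks:
        value = int.from_bytes(block, "big")
        # The modulo throws away everything above 32 bits: information is lost here.
        state = ((state ^ value) * 16777619) % (2 ** 32)

    # Step 3: encode the state as a fixed-length string of characters.
    return format(state, "08x")


print(toy_hash("hi"))
print(toy_hash("a much longer password than hi"))
```

However long the input is, the output is always eight hex characters, which is exactly why the original text cannot be rebuilt from it.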

Through the controlled loss of information, which can be represented mathematically through operations like rounding or modulo, hashing creates a one-way function to secure passwords. Hashing algorithms are designed such that even small changes in the input will drastically affect the end result (the hash for ‘fox123’ looks nothing like the hash for ‘fox122’).
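You can see this effect for yourself with a few lines of Python. The sketch below uses SHA-256 from the standard `hashlib` module purely as a convenient example of a hashing function; the helper names are my own.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of text as a 64-character hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_positions(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


h1 = sha256_hex("fox123")
h2 = sha256_hex("fox122")
print(h1)
print(h2)
print(differing_positions(h1, h2), "of", len(h1), "hex characters differ")
```

The two inputs differ by a single character, yet the two hashes share almost nothing.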

Then, instead of storing the raw password in the database, we store its hash. The next time a user logs in, we hash their input with the same algorithm and see if the hashes match. Since hashes will always yield the same result for the same inputs, we can be confident that the passwords are the same without ever storing the password in an open, vulnerable format.
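As a minimal sketch of that sign-up and log-in flow, assume the "database" is just a Python dictionary and the hashing function is SHA-256. Both are stand-ins for illustration, and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical in-memory stand-in for the database: username -> stored hash.
users = {}


def hash_password(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def sign_up(username: str, password: str) -> None:
    # Only the hash is stored; the raw password is never written anywhere.
    users[username] = hash_password(password)


def log_in(username: str, attempt: str) -> bool:
    stored = users.get(username)
    if stored is None:
        return False
    # Hash the attempt with the same algorithm and compare the two hashes.
    return hmac.compare_digest(stored, hash_password(attempt))


sign_up("alice", "hi")
print(log_in("alice", "hi"))   # True
print(log_in("alice", "Hi"))   # False
```

A fast, general-purpose hash like SHA-256 keeps the example short. The `$2y$10$` prefix on the hash at the top of this article is the format used by bcrypt, a function built specifically for passwords.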

Hashing greatly increases database security, but it is still vulnerable to the classic try-and-see strategy of hacking. Although hashing has made it impossible to access the raw text password directly, as long as we know which hashing algorithm is being used — nothing a reasonably good hacker can’t access — we can try millions of inputs and store their hashes, then see if any of the hashes match the ones recorded. This is the practice of building rainbow tables.

Figure: The construction of a rainbow table.

Although this may seem far too manual, people unfortunately aren’t as good at creating passwords as you would hope, and many passwords are the same across accounts. A hacker who has access to the database may then spot a hash that is identical to one recorded in their rainbow tables.

Suppose the stolen table shows that id 4 has the same hash as id 6, and that both hashes are identical to the one recorded for the input ‘hi’ in the rainbow table. The hacker now has access to two accounts, because they know both passwords are ‘hi’. Although this method may seem too laborious and inefficient, many hackers are constantly building rainbow tables:

Efficient rainbow tables take into account the chance a password is a real password. For instance, ‘si*S&3ljksna’ is probably not a used password and not worth checking, but ‘my_Doggo_3_2020’ is.
A standard computer alone can check hashes for almost 600,000 passwords per second, depending on the hashing algorithm. A GPU (graphics card) can perform at three times that pace, not to mention the specialized systems some hackers operate.
To reiterate the point above: people suck at generating unique passwords. ‘123456’ is still used as a password by 23 million account holders, and undoubtedly there are many other common passwords. 59% of people use the same password everywhere, meaning that if a rainbow table manages to land on one account, the hacker has a good shot at successfully logging into another of your accounts on a different site.

A well-designed rainbow-table generator that has been generating for several months nonstop would have a massive dictionary, and a large portion of accounts are almost guaranteed to be matched in it.
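The attack described above can be sketched in a few lines. This is a simplified illustration: it builds a plain precomputed lookup table, whereas rainbow tables in the strict sense use chains of hashes to save storage space. The password list, the leaked rows, and SHA-256 as the site's algorithm are all made-up assumptions.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Attacker's side: precompute hashes for plausible passwords.
common_passwords = ["123456", "password", "hi", "qwerty", "my_Doggo_3_2020"]
lookup = {sha256_hex(p): p for p in common_passwords}

# Hypothetical leaked database rows: user id -> stored (unsalted) hash.
leaked = {
    4: sha256_hex("hi"),
    5: sha256_hex("si*S&3ljksna"),
    6: sha256_hex("hi"),
}

# Any stored hash that appears in the table reveals its password.
cracked = {uid: lookup[h] for uid, h in leaked.items() if h in lookup}
print(cracked)  # {4: 'hi', 6: 'hi'}
```

Ids 4 and 6 fall immediately because their identical hashes are in the table, while the random-looking password behind id 5 is not.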

In order to address this issue, we salt our hashes. (Hungry yet?)

Salting is a brilliant idea: it is the adding of a long random string to the end of the password before it is hashed. Whenever someone’s account is created, a salt is generated, and the stored password is the hash of the password and the salt. For example, if the password was ‘hi’ and the randomly generated salt was ‘3s8S72l3’, then the stored hash would not be the hash for ‘hi’ but the hash for ‘hi3s8S72l3’. Both the salt and the hash are stored, so that when a user logs in, the salt is appended to the password they enter, the result is hashed, and the hash is compared with the stored one.
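Here is a minimal sketch of that scheme, assuming SHA-256 as the hashing function, a dictionary as the database, and Python's `secrets` module as the source of random salts. The function names and the salt length are illustrative choices, not a recommendation.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory database: username -> (salt, hash of password + salt).
users = {}


def hash_with_salt(password: str, salt: str) -> str:
    # The salt is appended to the end of the password before hashing.
    return hashlib.sha256((password + salt).encode("utf-8")).hexdigest()


def sign_up(username: str, password: str) -> None:
    salt = secrets.token_hex(8)  # a fresh random 16-character salt per account
    users[username] = (salt, hash_with_salt(password, salt))


def log_in(username: str, attempt: str) -> bool:
    record = users.get(username)
    if record is None:
        return False
    salt, stored = record
    return hmac.compare_digest(stored, hash_with_salt(attempt, salt))


sign_up("alice", "hi")
sign_up("bob", "hi")
print(users["alice"][1] == users["bob"][1])  # same password, different hashes
print(log_in("alice", "hi"))
```

Because every account gets its own salt, two users who both chose ‘hi’ no longer end up with the same stored hash.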

With a salt that is sufficiently long, salting hashes can defeat rainbow tables. Rainbow tables are constructed by keeping in mind ideas for ‘common passwords’, like names, adjectives, nouns, and sequences of numbers (like dates or digits of pi), since it is impossible to search for all possible passwords as combinations of characters within a reasonable amount of time.

Think of salts as adding complexity to your password. A five-character password with a 15-character salt becomes a twenty-character string, which drastically reduces the chance that a rainbow table would contain its hash. So, technically, if you signed up with the password ‘hi’ on a site that used very heavy salting, you would be safe from precomputed tables. An attacker who also obtains the salt, which is stored next to the hash, could still guess a password as weak as ‘hi’ directly.
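Some back-of-the-envelope arithmetic shows how quickly the numbers grow. The sketch assumes an alphabet of 95 printable ASCII characters (my assumption) and reuses the rate of 600,000 hashes per second quoted earlier. It only measures how many strings a precomputed table would need to cover. It says nothing about an attacker who reads the salt from the database and guesses the password itself.

```python
ALPHABET_SIZE = 95            # assumption: printable ASCII characters
GUESSES_PER_SECOND = 600_000  # the rate quoted earlier in this article


def keyspace(length: int) -> int:
    """Number of distinct strings of exactly this length."""
    return ALPHABET_SIZE ** length


def seconds_to_exhaust(length: int) -> float:
    """Time to hash every string of this length at the assumed rate."""
    return keyspace(length) / GUESSES_PER_SECOND


print(keyspace(5))  # 7737809375
print(seconds_to_exhaust(5) / 3600, "hours for every 5-character string")

seconds_per_year = 3600 * 24 * 365
print(seconds_to_exhaust(20) / seconds_per_year, "years for every 20-character string")
```

Covering every five-character string takes an afternoon at that rate, while covering every twenty-character string is far beyond any table a hacker could build.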

That being said, most sites don’t devote many resources to heavy salting, because long salts take space to store. (To avoid storing a salt at all, it is sometimes derived from other data, like your username or the time at which you signed up.) It’s therefore better to follow good password guidelines for security’s sake.
