Hash Functions Explained: MD5, SHA-256, and When to Use Each
Hash functions show up everywhere in software engineering: password storage, file integrity checks, Git commits, blockchain, digital signatures, caching, data deduplication. If you write code, you use hashes. Knowing which algorithm to reach for and why will save you from security mistakes that range from embarrassing to career-ending.
What Is a Hash Function?
A hash function takes input data of any size and produces a fixed-size output, usually called a hash, digest, or checksum. Same input, same output, every time. Change a single bit and the output looks completely different. You'll typically see it rendered as a hex string.
Input: "Hello"
MD5: 8b1a9953c4611296a827abf8c47804d7
SHA-256: 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969
Input: "Hello." (added a period)
MD5: 2946545e2e0a5f58f92738ac3ed3e02f
SHA-256: 2d8bd7d9bb5f85ba643f0110d50cb506a1fe439e769a22503193ea6046bb87f7
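One added period changes both hashes beyond recognition. That's the avalanche effect: flipping a single input bit should flip, on average, about half of the output bits. A good hash function gives you no predictable relationship between similar inputs and their outputs.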
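To see the avalanche effect as a number rather than by eye, here is a minimal Python sketch that counts how many of the 256 output bits differ between the SHA-256 digests of the two inputs above. The helper name `bit_difference` is made up for illustration. For a well-behaved hash the count should land somewhere around half the bits, though the exact figure varies by input.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    # XOR leaves a 1 in every position where the two digests disagree.
    return bin(da ^ db).count("1")

changed = bit_difference(b"Hello", b"Hello.")
print(f"{changed} of 256 bits differ")
```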
Properties of Cryptographic Hash Functions
A hash function counts as "cryptographic" when it meets three properties. In each case "can't" means computationally infeasible, not mathematically impossible. First, pre-image resistance: given a hash output, you can't find any input that produces it. No reversing. Second, second pre-image resistance: given one input and its hash, you can't find a different input with the same hash. Third, collision resistance: you can't find any two distinct inputs that share a hash.
Non-cryptographic hash functions like CRC32, MurmurHash, and xxHash are fast but don't guarantee any of these. They work for hash tables and checksums when nobody is trying to attack you, but never use them where security matters.
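One way to see why a checksum like CRC32 offers no security is that its output has exploitable algebraic structure. The sketch below, using Python's standard `zlib.crc32`, relies on the fact that CRC32 is affine over XOR for equal-length messages: the checksum of `a XOR b XOR c` can be predicted from the three individual checksums without ever hashing the combined message. The sample messages and the helper `xor_bytes` are illustrative only.

```python
import zlib

def xor_bytes(*chunks: bytes) -> bytes:
    """XOR several equal-length byte strings together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)

# Three arbitrary messages of the same length (8 bytes each).
a, b, c = b"pay=0100", b"pay=0900", b"pay=5000"

predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
actual = zlib.crc32(xor_bytes(a, b, c))

# The prediction matches: related inputs give related checksums.
print(predicted == actual)
```

A cryptographic hash such as SHA-256 is designed so that no shortcut of this kind is known.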
The Algorithms Compared
| Algorithm | Output Size | Status | Use Case |
|---|---|---|---|
| MD5 | 128 bits (32 hex chars) | Broken (collisions practical since 2004) | Legacy file checksums only. Never for security. |
| SHA-1 | 160 bits (40 hex chars) | Broken (SHAttered attack, 2017) | Git still uses it internally, but SHA-256 migration underway. Do not use for new projects. |
| SHA-256 | 256 bits (64 hex chars) | Secure. Industry standard. | File integrity, digital signatures, blockchain, TLS certificates, code signing. |
| SHA-512 | 512 bits (128 hex chars) | Secure. Often faster than SHA-256 on 64-bit CPUs that lack SHA-256 hardware acceleration. | Same as SHA-256 when you need a larger hash or are on 64-bit hardware. |
| SHA-3 | 224/256/384/512 bits | Secure. Different internal design (Keccak). | Defense-in-depth: if SHA-2 is ever broken, SHA-3 is the fallback. |
| BLAKE2/BLAKE3 | Variable (up to 512 bits) | Secure. Very fast. | High-performance checksums, file hashing, key derivation. |
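
The output sizes in the table can be checked from Python's standard `hashlib`, which exposes a `digest_size` (in bytes) on every hash object. This is a small sketch, not a benchmark. BLAKE3 is left out because it is not part of the standard library, and `sha3_256` and `blake2b` stand in for their respective families.

```python
import hashlib

# Algorithm names as hashlib spells them.
ALGORITHMS = ["md5", "sha1", "sha256", "sha512", "sha3_256", "blake2b"]

def digest_bits(name: str) -> int:
    """Return the default output size of a hashlib algorithm, in bits."""
    return hashlib.new(name).digest_size * 8

for name in ALGORITHMS:
    bits = digest_bits(name)
    # Each byte of output is rendered as two hex characters.
    print(f"{name:10s} {bits:4d} bits  {bits // 4:3d} hex chars")
```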
When to Use What
File Integrity and Checksums
Use SHA-256. You've probably seen this already: download a Linux ISO, check the published SHA-256 hash against what you compute locally. If they match, the file hasn't been corrupted or tampered with. MD5 still shows up for this, but because collisions are practical, an attacker who can influence the published file can prepare a harmless version and a malicious version that share the same MD5 hash, then swap one for the other. Don't rely on it.
# Compute SHA-256 of a file
sha256sum ubuntu-24.04-amd64.iso # Linux
shasum -a 256 ubuntu-24.04-amd64.iso # macOS
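If you need the same check inside a program, a minimal Python sketch might look like the following. It reads the file in chunks so a multi-gigabyte ISO never has to fit in memory. The function names and the chunk size are arbitrary choices for illustration.

```python
import hashlib
import hmac
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path: Path, published_hex: str) -> bool:
    """Compare the local digest with the value published by the vendor."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_file(path), expected)

# Example call (the file name and hash are placeholders):
# matches_published(Path("ubuntu-24.04-amd64.iso"), "<published sha256>")
```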
Password Storage
Do not use MD5, SHA-1, or SHA-256 for passwords. General-purpose hash functions are fast, and that's exactly the problem. An attacker with a GPU can try billions of SHA-256 hashes per second.
Use a dedicated password hashing function instead. Argon2 won the Password Hashing Competition in 2015, and its Argon2id variant is the right pick for new projects: memory-hard, configurable, well-studied. If Argon2 isn't available in your stack, bcrypt has been solid since 1999 and is supported practically everywhere. scrypt is another option, memory-hard and CPU-hard, though I've mostly seen it in cryptocurrency contexts.
All three are deliberately slow, and Argon2id and scrypt are memory-intensive as well, which is the whole point. Most library implementations also generate and store a unique salt for each password, so rainbow tables are off the table.
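As a rough illustration of what salted, deliberately expensive hashing looks like, here is a sketch using `hashlib.scrypt` from Python's standard library (it requires a Python build linked against an OpenSSL that supports scrypt). The cost parameters are illustrative assumptions, not a tuned recommendation, and the function names are made up for this example. In a real project you would more likely use a maintained Argon2id or bcrypt library, which packages the salt and parameters into a single encoded string.

```python
import hashlib
import hmac
import secrets

# Illustrative cost settings (assumption): N=2**14, r=8, p=1 needs about 16 MiB.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); store both alongside the user record."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected)
```

Because the salt is random, hashing the same password twice gives two different stored values, which is what defeats precomputed tables.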
Digital Signatures and Certificates
TLS certificates typically use SHA-256 for the signature hash. The CA hashes the certificate data with SHA-256, then signs that hash with its private key. Browsers verify the signature using the CA's public key. Public CAs stopped issuing SHA-1 certificates in 2016 and major browsers dropped support for them in early 2017, the same year the SHAttered attack demonstrated a practical SHA-1 collision.
Git and Version Control
Git identifies every object (commit, tree, blob) by its SHA-1 hash. You'll see this when you run git log: each commit hash is a SHA-1 digest of the commit object, meaning a short header followed by the commit content. Git is moving to SHA-256, but it's a slow migration. The SHAttered attack requires a specially crafted pair of colliding inputs, so existing repos aren't in immediate danger.
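You can reproduce a Git object ID by hand. The sketch below covers the simplest case, a blob, assuming the classic SHA-1 object format: Git hashes the header `blob <size>` plus a NUL byte, followed by the raw file content. The function name `git_blob_id` is made up for this example. The result should match what `git hash-object` reports for the same bytes.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    # Header format: object type, a space, the size in bytes, then a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b"hello world\n"))
```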
HMACs and API Authentication
HMAC (Hash-based Message Authentication Code) combines a hash function with a secret key to authenticate messages. If you've integrated Stripe, GitHub, or Shopify webhooks, you've seen this: they sign payloads with HMAC-SHA256. Your server recomputes the HMAC using the shared secret and compares it to the signature in the request header. Match means the payload is legit and untampered. You can test these signatures with the HMAC generator.
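The verification step can be sketched in a few lines of Python with the standard `hmac` module. This is a generic outline, not any particular provider's scheme: real services differ in which header carries the signature, whether a timestamp is mixed into the signed data, and how the value is encoded, so treat the function names and the hex encoding here as assumptions.

```python
import hashlib
import hmac

def sign_payload(secret: bytes, payload: bytes) -> str:
    """Compute the hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify_webhook(secret: bytes, payload: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC and compare it to the received signature."""
    expected = sign_payload(secret, payload)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

Always compute the HMAC over the exact bytes received, before any JSON parsing or re-serialization, since even a whitespace change alters the signature.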
Common Mistakes
I've seen these repeatedly in code reviews and production incidents, so they're worth calling out explicitly.
Using MD5 or SHA-1 for anything security-sensitive. Both have practical collision attacks. Reach for SHA-256 or stronger.
Hashing passwords with SHA-256. It's too fast. An attacker with a commodity GPU can test billions of guesses per second, so weak and reused passwords fall quickly. Use Argon2id, bcrypt, or scrypt.
Storing unsalted hashes. Without a unique random salt per entry, identical inputs produce identical hashes, and rainbow table attacks become trivial. Most password hashing libraries handle salting for you.
Confusing hashing with encryption. Hashing is one-way. Encryption is reversible with a key. One is for verification, the other for confidentiality. They solve different problems.
Using == to compare hashes. Use a constant-time comparison function to prevent timing attacks. Most languages have one: hmac.compare_digest() in Python, crypto.timingSafeEqual() in Node.js.
Worth noting: Base64 is encoding, not hashing. It's fully reversible and provides zero security. Don't mix the two up.
Quick Reference: Hashing in Code
# Python
import hashlib
hashlib.sha256(b"Hello").hexdigest()
# "185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969"
// JavaScript (browser)
const hash = await crypto.subtle.digest("SHA-256",
new TextEncoder().encode("Hello"));
[...new Uint8Array(hash)].map(b => b.toString(16).padStart(2,"0")).join("")
// Node.js
require("crypto").createHash("sha256").update("Hello").digest("hex")
# Bash
echo -n "Hello" | sha256sum
Frequently Asked Questions
Can you reverse a hash back to the original data?
No. Cryptographic hash functions are one-way by design. Given a hash output, there is no practical way to compute the original input. Attackers instead guess: they use brute-force or rainbow table attacks to find inputs that produce a given hash, but they cannot reverse the function itself. This is why hashing is used for passwords: even if the hash database leaks, the original passwords are not directly exposed.
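To make the difference between reversing and guessing concrete, here is a small Python sketch of a dictionary attack against an unsalted SHA-256 hash. It never inverts the function. It simply hashes candidates until one matches. The candidate list and function name are illustrative, and the target is the SHA-256 of "Hello" shown earlier.

```python
import hashlib
from typing import Iterable, Optional

def dictionary_attack(target_hex: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose SHA-256 matches target_hex, if any."""
    for candidate in candidates:
        if hashlib.sha256(candidate.encode("utf-8")).hexdigest() == target_hex:
            return candidate
    return None

leaked = "185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969"
print(dictionary_attack(leaked, ["hello", "HELLO", "Hello", "hello1"]))  # prints "Hello"
```

The attack only succeeds when the original input is in the guess list, which is why short or common passwords are the ones at risk.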
Why is MD5 considered insecure?
MD5 is vulnerable to collision attacks: an attacker can deliberately construct two different inputs that produce the same hash. Researchers demonstrated practical collisions in 2004, and by 2008 had created a rogue CA certificate using an MD5 collision. MD5 is also too fast for password hashing, making brute-force attacks feasible. Use SHA-256 for checksums and Argon2id or bcrypt for passwords.
Which hash function should I use for passwords?
Do not use MD5, SHA-1, or SHA-256 for passwords. These are too fast, which makes brute-force attacks cheap. Use a dedicated password hashing function: Argon2id (current best practice), bcrypt (battle-tested), or scrypt (memory-hard). These are deliberately slow, and Argon2id and scrypt are memory-intensive as well, making large-scale cracking impractical.