Debug Hash Collision Prevention
Prevent and debug hash collisions in MD5, SHA-1, and SHA-256. Detect collision attacks, choose safe algorithms, and implement collision-resistant patterns.
Hash collisions occur when two different inputs produce the same hash output. For strong algorithms like SHA-256 they remain theoretical, but practical collision attacks exist against MD5 and SHA-1. Even with a strong hash, birthday-paradox collisions can break systems that truncate hashes or use small hash spaces. This guide covers detection, prevention, and migration to safe algorithms.
Common errors covered
MD5 hash collision causes data integrity failure
Integrity check passed but file is corrupted
Two different files have the same MD5 checksum
Warning: MD5 collision detected in content-addressable storage
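MD5 is cryptographically broken. Identical-prefix collisions can be generated in seconds, and chosen-prefix collisions, which produce two files with different attacker-chosen beginnings and the same MD5 hash, cost more but are still practical on commodity hardware. If your system relies on MD5 for integrity, an attacker can substitute malicious content that passes verification.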
Step-by-step fix
- 1 Generate hashes for your files using the Hash Generator - compare MD5 vs SHA-256.
- 2 If two different files show the same MD5, confirm it is a collision (compare SHA-256 - they will differ).
- 3 Migrate from MD5 to SHA-256 for all integrity checks.
- 4 Audit your codebase for any remaining MD5 usage in security contexts.
# INSECURE: MD5 for integrity
import hashlib

def verify_download(filepath, expected_hash):
    # Read the file and close the handle when done
    with open(filepath, 'rb') as f:
        h = hashlib.md5(f.read()).hexdigest()
    return h == expected_hash  # Vulnerable to collision attack

# SECURE: SHA-256 for integrity
import hashlib

def verify_download(filepath, expected_hash):
    # Read the file and close the handle when done
    with open(filepath, 'rb') as f:
        h = hashlib.sha256(f.read()).hexdigest()
    return h == expected_hash  # Collision-resistant
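Steps 1 and 2 above can also be scripted. The following is a minimal sketch rather than a definitive tool: the helper names `file_digest` and `is_md5_collision` are hypothetical, and files are read in chunks on the assumption that downloads may be too large to load into memory at once.

```python
import hashlib

def file_digest(path, algorithm, chunk_size=65536):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

def is_md5_collision(path_a, path_b):
    """True when two files share an MD5 digest but differ under SHA-256."""
    same_md5 = file_digest(path_a, 'md5') == file_digest(path_b, 'md5')
    same_sha256 = file_digest(path_a, 'sha256') == file_digest(path_b, 'sha256')
    return same_md5 and not same_sha256
```

Two byte-identical files return False here, because they match under both algorithms and are duplicates rather than a collision.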
Birthday paradox causes unexpected collisions in short hashes
DuplicateKeyError: hash 'a1b2c3d4' already exists
Collision rate 0.1% at 77000 entries (expected 0% for unique IDs)
When truncated hashes (e.g., the first 8 hex chars = 32 bits) serve as unique identifiers, the birthday paradox makes a collision likely after roughly 2^(n/2) entries for an n-bit hash. For 32 bits that is 2^16 ≈ 65,500 entries, where the chance of at least one collision is already about 39%; it passes 50% near 77,000 entries.
Step-by-step fix
- 1 Calculate your collision probability: for an n-bit hash and k items, the probability is roughly k^2 / 2^(n+1). The approximation holds while the result is well below 1.
- 2 Use the Hash Generator to compare output lengths of different algorithms.
- 3 Increase hash length: use at least 128 bits (32 hex chars) for unique IDs.
- 4 Add collision handling: detect and regenerate with a salt if needed.
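The calculation in step 1 can be sketched as follows. This uses the standard birthday approximation 1 - e^(-k(k-1)/2^(n+1)), which reduces to k^2 / 2^(n+1) when the probability is small. The function names are hypothetical, and the model assumes hash outputs are uniformly distributed.

```python
import math

def collision_probability(bits, items):
    """Approximate chance of at least one collision among `items` random hashes."""
    exponent = items * (items - 1) / 2 ** (bits + 1)
    return -math.expm1(-exponent)  # Equals 1 - e^(-exponent), stable for tiny values

def items_for_probability(bits, probability):
    """Approximate item count at which the collision chance reaches `probability`."""
    return math.sqrt(2 ** (bits + 1) * math.log(1 / (1 - probability)))

if __name__ == '__main__':
    # Print the approximate 50% point for a few hash lengths
    for bits in (32, 64, 128):
        print(bits, round(items_for_probability(bits, 0.5)))
```

Running it for your own hash length and expected item count gives a quick check on whether a truncated ID is wide enough.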
# Truncated hash as unique ID - collision-prone
import hashlib

def generate_id(data):
    full_hash = hashlib.sha256(data.encode()).hexdigest()
    return full_hash[:8]  # Only 32 bits - collisions likely by ~65K items!

# Full-length hash or UUID for unique IDs
import hashlib
import uuid

def generate_id(data):
    return hashlib.sha256(data.encode()).hexdigest()  # 256 bits
    # Or use UUID4 for random IDs:
    # return str(uuid.uuid4())  # 122 bits of randomness
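Step 4 (detect a collision and regenerate with a salt) might look like the sketch below. It is one possible approach under stated assumptions: the function name `assign_id`, the `taken` mapping of issued IDs to their source data, and the counter-based salt are all hypothetical choices, not part of any library.

```python
import hashlib

def assign_id(data, taken, hex_chars=32, max_attempts=10):
    """Return an unused ID for `data`, re-hashing with a counter salt on collision.

    `taken` maps each ID already issued to the data it was issued for.
    """
    for salt in range(max_attempts):
        # First attempt hashes the data as-is; retries append the salt
        material = data if salt == 0 else f'{data}#{salt}'
        candidate = hashlib.sha256(material.encode()).hexdigest()[:hex_chars]
        owner = taken.get(candidate)
        if owner is None:
            taken[candidate] = data
            return candidate
        if owner == data:
            return candidate  # Same input seen before: reuse its ID
    raise RuntimeError(f'no free ID after {max_attempts} attempts')
```

Because the salt sequence is deterministic, asking for the same data again walks the same candidates and returns the ID issued earlier. In a real system the lookup and insert would need to be atomic, for example via a unique constraint in the database.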
Hash table DoS via deliberate collisions (HashDoS)
Server response time degraded from 50ms to 30s
CPU at 100% processing single JSON request
Hash table operations O(n) instead of O(1)
Attackers craft JSON keys or form parameters that all hash to the same bucket in your language's hash table implementation. Each lookup or insert then degrades from O(1) to O(n), so processing n colliding keys costs O(n^2) overall. Known to affect PHP, Python, Java, and Node.js.
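The degradation can be observed without crafting real colliding strings. The sketch below is an illustration under a simplifying assumption: collisions are simulated with a hypothetical `Key` class whose `__hash__` returns a constant, whereas a real attack finds strings that collide under the runtime's own hash function. It counts equality checks instead of measuring time so the result does not depend on the machine.

```python
class Key:
    """Dictionary key that counts how often it is compared for equality."""
    comparisons = 0

    def __init__(self, value, collide):
        self.value = value
        self.collide = collide

    def __hash__(self):
        # Colliding keys all report the same hash, like attacker-chosen inputs
        return 0 if self.collide else hash(self.value)

    def __eq__(self, other):
        Key.comparisons += 1
        return self.value == other.value

def count_comparisons(n, collide):
    """Insert n distinct keys into a dict and return the equality checks performed."""
    Key.comparisons = 0
    table = {}
    for i in range(n):
        table[Key(i, collide)] = True
    return Key.comparisons
```

Comparing `count_comparisons(n, True)` with `count_comparisons(n, False)` for growing n shows the colliding case growing roughly with n^2 while well-distributed keys need few comparisons.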
Step-by-step fix
- 1 Check if your framework/language uses randomized hash seeds (Python 3.3+, Ruby 1.9+, Node.js do this by default).
- 2 Limit the number of parameters/keys accepted in a single request.
- 3 Set PYTHONHASHSEED=random (default in Python 3.3+) to prevent predictable hashing.
- 4 Use the Hash Generator to understand how different algorithms distribute values.
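For steps 1 and 3, a deployment check along these lines can flag a pinned seed. This is a small sketch: `hash_seed_status` is a hypothetical helper that only inspects the `PYTHONHASHSEED` environment variable, and it assumes CPython's documented behavior (unset or `random` means a per-process random seed, `0` disables randomization, any other integer is a fixed seed).

```python
import os

def hash_seed_status(environ=None):
    """Classify the PYTHONHASHSEED setting as 'random', 'disabled', or 'fixed'."""
    environ = os.environ if environ is None else environ
    seed = environ.get('PYTHONHASHSEED', '')
    if seed in ('', 'random'):
        return 'random'    # Per-process random seed (default since Python 3.3)
    if seed == '0':
        return 'disabled'  # Hash randomization switched off
    return 'fixed'         # Reproducible seed, predictable across restarts
```

A startup script could refuse to launch a public-facing service unless this returns `'random'`.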
# No parameter limit - vulnerable to HashDoS
from flask import Flask, request

app = Flask(__name__)

@app.route('/submit', methods=['POST'])
def submit():
    data = request.form  # Accepts unlimited parameters
    return f'{len(data)} fields received'

# Limit parameters and add request size limit
from flask import Flask, request, abort

app = Flask(__name__)

@app.route('/submit', methods=['POST'])
def submit():
    # content_length is None when the header is absent, so fall back to 0
    if (request.content_length or 0) > 1_000_000:  # 1MB limit
        abort(413)
    data = dict(list(request.form.items())[:100])  # Max 100 params
    return f'{len(data)} fields received'
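Outside a web framework, the same cap can be enforced while parsing, before any large dictionary is built. The sketch below uses only the standard library; `MAX_FIELDS`, `parse_form`, and `parse_json` are hypothetical names, and the limit of 100 simply mirrors the parameter cap used above.

```python
import json
from urllib.parse import parse_qsl

MAX_FIELDS = 100  # Assumed limit, mirroring the 100-parameter cap above

def parse_form(body):
    """Parse a urlencoded body, raising ValueError beyond MAX_FIELDS fields."""
    return parse_qsl(body, keep_blank_values=True, max_num_fields=MAX_FIELDS)

def _bounded_object(pairs):
    # The JSON decoder calls this with the key/value pairs of each object
    if len(pairs) > MAX_FIELDS:
        raise ValueError(f'JSON object has more than {MAX_FIELDS} keys')
    return dict(pairs)

def parse_json(body):
    """Parse JSON, rejecting any object with more than MAX_FIELDS keys."""
    return json.loads(body, object_pairs_hook=_bounded_object)
```

Both helpers raise `ValueError` on oversized input, which a request handler can translate into a 400 response. A body size limit is still needed, since the JSON text itself is read fully before the hook runs.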
Prevention Tips
- Never use MD5 or SHA-1 for security purposes (integrity, authentication, signatures). Use SHA-256 or SHA-3.
- Use at least 128 bits of hash output when using hashes as unique identifiers to avoid birthday paradox collisions.
- For password hashing, use dedicated algorithms: bcrypt, scrypt, or Argon2 - never raw SHA-256.
- Monitor for unusual collision rates in production - a sudden spike may indicate an attack.
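As an illustration of the password-hashing tip, here is a minimal sketch using `hashlib.scrypt` from the standard library (available when Python is built against OpenSSL 1.1 or later). The cost parameters and the helper names `hash_password` and `verify_password` are assumptions for illustration, not recommended production values.

```python
import hashlib
import hmac
import os

# Assumed cost parameters for illustration; tune them for your hardware
SCRYPT_PARAMS = {'n': 2 ** 14, 'r': 8, 'p': 1}

def hash_password(password):
    """Return (salt, derived_key) to store alongside the user record."""
    salt = os.urandom(16)  # Fresh random salt per password
    key = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password, salt, expected_key):
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)
```

Unlike a single SHA-256 call, the derivation is deliberately slow and memory-hungry, which raises the cost of offline guessing.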
Frequently Asked Questions
Is MD5 still safe for non-security uses like checksums?
MD5 is fast and fine for non-adversarial checksums (e.g., detecting accidental corruption, cache keys, deduplication in trusted environments). But if an attacker could substitute content, use SHA-256.
How many items before a collision becomes likely?
For an n-bit hash, roughly 2^(n/2) items bring the collision chance close to 50% (about 39% at exactly 2^(n/2), 50% at about 1.18 × 2^(n/2)). MD5 (128-bit): ~2^64 ≈ 18 quintillion. SHA-256 (256-bit): ~2^128. Truncated to 32 bits: only ~65,000 to 77,000. Use the full hash length for safety.
What is the difference between collision resistance and preimage resistance?
Collision resistance: hard to find ANY two inputs with the same hash. Preimage resistance: given a hash, hard to find an input that produces it. MD5 has broken collision resistance but still has (weakened) preimage resistance.
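The gap between the two properties can be made concrete with a deliberately weakened hash. The sketch below truncates SHA-256 to 16 bits purely so both searches finish quickly; that truncation, the helper names, and the `input-N` / `guess-N` candidate strings are all assumptions for illustration. It does not attack MD5 or any full-length hash.

```python
import hashlib
from itertools import count

def short_hash(text, hex_chars=4):
    """SHA-256 truncated to hex_chars hex digits (16 bits by default)."""
    return hashlib.sha256(text.encode()).hexdigest()[:hex_chars]

def find_collision(hex_chars=4):
    """Find ANY two inputs with the same truncated hash; return them and the tries."""
    seen = {}
    for tries in count(1):
        text = f'input-{tries}'
        digest = short_hash(text, hex_chars)
        if digest in seen:
            return seen[digest], text, tries
        seen[digest] = text

def find_preimage(target, hex_chars=4):
    """Find an input whose truncated hash equals `target`; return it and the tries."""
    for tries in count(1):
        text = f'guess-{tries}'
        if short_hash(text, hex_chars) == target:
            return text, tries
```

For a 16-bit hash, the collision search typically succeeds after a few hundred tries (on the order of 2^8), while the preimage search typically needs tens of thousands (on the order of 2^16). The same square-root gap is why collision resistance fails first as a hash weakens.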