Fix Encoding Format Mismatch Errors
Fix encoding format mismatch errors between UTF-8, ASCII, Latin-1, and other character sets. Diagnose mojibake, decode failures, and data corruption.
Encoding format mismatch errors happen when data is written in one character encoding (e.g., UTF-8) but read as another (e.g., Latin-1). The result is mojibake (garbled text), decode errors, or silent data corruption. This guide helps you diagnose which encoding was used and convert correctly.
Common errors covered
UTF-8 text displayed as garbage characters (mojibake)
Expected: café → Displayed: cafÃ©
Expected: naïve → Displayed: naÃ¯ve
Expected: résumé → Displayed: rÃ©sumÃ©
UTF-8 encoded text is being read as Latin-1 (ISO-8859-1). Each multi-byte UTF-8 character is split into two or more separate Latin-1 characters. The pattern Ã followed by another character is a telltale sign of UTF-8-as-Latin-1 misinterpretation.
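The byte-level mechanics can be seen in a short stdlib-only sketch:

```python
# 'é' is two bytes in UTF-8: 0xC3 0xA9
raw = 'café'.encode('utf-8')  # b'caf\xc3\xa9'

# Latin-1 maps each byte to exactly one character, so the pair
# 0xC3 0xA9 becomes the two characters 'Ã' and '©'
garbled = raw.decode('latin-1')
print(garbled)  # cafÃ©
```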
Step-by-step fix
- 1 Look for the Ã pattern in the garbled text - this confirms UTF-8 read as Latin-1.
- 2 Re-decode the garbled text as Latin-1 to get bytes, then re-interpret as UTF-8.
- 3 Fix the source system to declare the correct encoding (add charset=utf-8 headers).
- 4 Use the URL Encoder to inspect how special characters are encoded.
# Reading UTF-8 file as Latin-1
with open('data.txt', encoding='latin-1') as f:
    text = f.read()  # Produces mojibake

# Correct encoding declaration
with open('data.txt', encoding='utf-8') as f:
    text = f.read()  # Correct output

# Fix already-garbled text
def fix_mojibake(garbled):
    return garbled.encode('latin-1').decode('utf-8')
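fix_mojibake assumes the whole string is garbled and will raise on text that is already clean. A slightly more defensive variant (a sketch; the fall-through behavior is an assumption, not part of the original guide) returns the input unchanged when the round trip fails:

```python
def fix_mojibake_safe(text):
    """Undo UTF-8-read-as-Latin-1 mojibake; return input unchanged on failure."""
    try:
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # Already-clean or mixed text: leave as-is

print(fix_mojibake_safe('cafÃ©'))  # café
print(fix_mojibake_safe('café'))   # café (unchanged)
```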
UnicodeDecodeError when processing non-ASCII content
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 42
Your code or library defaults to ASCII encoding, which only supports characters 0-127. Any byte above 127 (accented characters, emoji, CJK) triggers this error. This is especially common in Python 2 legacy code or systems with LANG=C.
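The failure mode is easy to reproduce in a stdlib-only sketch:

```python
data = 'café'.encode('utf-8')  # b'caf\xc3\xa9'

try:
    data.decode('ascii')
except UnicodeDecodeError as e:
    print(e)  # 'ascii' codec can't decode byte 0xc3 in position 3: ...

# Decoding with the actual encoding succeeds
print(data.decode('utf-8'))  # café
```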
Step-by-step fix
- 1 Identify the actual encoding of your data (check HTTP headers, file BOM, or database charset).
- 2 Explicitly specify encoding='utf-8' in all file operations.
- 3 Set PYTHONIOENCODING=utf-8 and LANG=en_US.UTF-8 environment variables.
- 4 Use the Base64 tool to safely transport binary data through ASCII-only channels.
# Python defaults to platform encoding
with open('data.txt') as f:  # May default to ASCII
    text = f.read()

# Encoding non-ASCII for URL
url = 'https://example.com/' + city_name  # Crashes on 'München'

# Always specify UTF-8
with open('data.txt', encoding='utf-8') as f:
    text = f.read()

# URL-encode non-ASCII characters
from urllib.parse import quote
url = 'https://example.com/' + quote(city_name)  # Works with any charset
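Step 4 above mentions Base64 for moving binary or non-ASCII data through ASCII-only channels; a minimal stdlib sketch of the round trip:

```python
import base64

payload = 'München ☂'.encode('utf-8')             # arbitrary non-ASCII bytes
wire = base64.b64encode(payload).decode('ascii')  # safe in any ASCII-only channel
print(wire)

restored = base64.b64decode(wire).decode('utf-8')
print(restored)  # München ☂
```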
Database returns wrong characters after migration
Data looks correct in database but garbled in application
MySQL Warning: Incorrect string value '\xC3\xA9' for column 'name'
The database connection charset does not match the table or column charset. Data is stored correctly but converted incorrectly during read/write. Common after migrations between MySQL versions or cloud providers.
Step-by-step fix
- 1 Check database charset: SHOW VARIABLES LIKE 'character_set%';
- 2 Check table charset: SHOW CREATE TABLE tablename;
- 3 Set connection charset explicitly: charset=utf8mb4 in connection string.
- 4 For already-corrupted data, use a double-decode fix in your application layer.
# No charset in connection string
import mysql.connector
conn = mysql.connector.connect(host='db', user='app', database='mydb')
# Connection may use latin1 while table is utf8mb4
# Explicit charset in connection
import mysql.connector
conn = mysql.connector.connect(
    host='db', user='app', database='mydb',
    charset='utf8mb4', collation='utf8mb4_unicode_ci'
)
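Step 4 refers to a double-decode fix for rows that were already written through a latin1 connection. A hypothetical application-layer sketch (the table and column names in the comment are made up for illustration):

```python
def repair_row_value(value):
    """Repair a string stored as UTF-8 bytes but round-tripped through latin1."""
    try:
        return value.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return value  # Value was not double-encoded; keep it as-is

# Applied to each affected column before rewriting the row, e.g.:
# for (pk, name) in cursor.fetchall():
#     cursor.execute('UPDATE users SET name = %s WHERE id = %s',
#                    (repair_row_value(name), pk))
```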
Prevention Tips
- Default to UTF-8 everywhere: files, databases, HTTP headers, environment variables.
- Always declare encoding explicitly - never rely on platform defaults.
- Add <meta charset='utf-8'> to HTML and Content-Type: application/json; charset=utf-8 to APIs.
- Test with non-ASCII characters (accented letters, emoji, CJK) during development, not just ASCII.
Frequently Asked Questions
How do I detect which encoding a file uses?
Use Python's chardet library: chardet.detect(raw_bytes) returns the detected encoding with confidence score. For web content, check the Content-Type header or <meta charset> tag.
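When chardet is not installed, a BOM check covers the common cases. A stdlib-only sketch (it cannot distinguish BOM-less encodings; note UTF-32 must be checked before UTF-16, since the UTF-32-LE BOM starts with the UTF-16-LE BOM bytes):

```python
import codecs

def sniff_bom(raw: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in [(codecs.BOM_UTF8, 'utf-8-sig'),
                      (codecs.BOM_UTF32_LE, 'utf-32-le'),
                      (codecs.BOM_UTF32_BE, 'utf-32-be'),
                      (codecs.BOM_UTF16_LE, 'utf-16-le'),
                      (codecs.BOM_UTF16_BE, 'utf-16-be')]:
        if raw.startswith(bom):
            return name
    return None

print(sniff_bom(b'\xef\xbb\xbfhello'))  # utf-8-sig
```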
What is the difference between UTF-8 and UTF-8 BOM?
UTF-8 BOM (Byte Order Mark) adds 3 bytes (EF BB BF) at the start of a file. Some Windows programs require it, but most Unix tools and web servers do not expect it and may display it as garbage. Prefer UTF-8 without BOM.
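Python's utf-8-sig codec handles the BOM transparently in either direction (stdlib sketch):

```python
# utf-8-sig strips a leading BOM if present; plain utf-8 keeps it as a character
data = b'\xef\xbb\xbfhello'
print(repr(data.decode('utf-8-sig')))  # 'hello'
print(repr(data.decode('utf-8')))      # '\ufeffhello'

# For files of unknown provenance:
# with open('data.txt', encoding='utf-8-sig') as f: ...
```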
Why does MySQL have utf8 and utf8mb4?
MySQL's utf8 (an alias for utf8mb3) only supports characters up to 3 bytes, so emoji and some CJK characters cannot be stored. utf8mb4 is true UTF-8, supporting all Unicode characters including emoji. Always use utf8mb4.
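The 3-byte limit is visible in the encoded lengths (plain Python; the MySQL storage behavior itself is as described above):

```python
print(len('é'.encode('utf-8')))   # 2 bytes: fits in MySQL utf8
print(len('中'.encode('utf-8')))  # 3 bytes: fits in MySQL utf8
print(len('😀'.encode('utf-8')))  # 4 bytes: requires utf8mb4
```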