Encoding 2026-04-24

Fix Encoding Format Mismatch Errors

Fix encoding format mismatch errors between UTF-8, ASCII, Latin-1, and other character sets. Diagnose mojibake, decode failures, and data corruption.


Encoding format mismatch errors happen when data is written in one character encoding (e.g., UTF-8) but read as another (e.g., Latin-1). The result is mojibake (garbled text), decode errors, or silent data corruption. This guide helps you diagnose which encoding was used and convert correctly.

Common errors covered

  1. UTF-8 text displayed as garbage characters (mojibake)
  2. UnicodeDecodeError when processing non-ASCII content
  3. Database returns wrong characters after migration
1

UTF-8 text displayed as garbage characters (mojibake)

Error message
Expected: café → Displayed: cafÃ©
Expected: naïve → Displayed: naÃ¯ve
Expected: résumé → Displayed: rÃ©sumÃ©
Root cause

UTF-8 encoded text is being read as Latin-1 (ISO-8859-1). Each multi-byte UTF-8 character is split into separate Latin-1 characters. The pattern Ã followed by another character is a telltale sign of UTF-8-as-Latin-1 misinterpretation.
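
At the byte level the mismatch is easy to see; a minimal sketch:

```python
# 'é' is two bytes in UTF-8: 0xC3 0xA9
utf8_bytes = 'café'.encode('utf-8')
print(utf8_bytes)   # b'caf\xc3\xa9'

# Latin-1 maps each byte to its own character: 0xC3 → Ã, 0xA9 → ©
garbled = utf8_bytes.decode('latin-1')
print(garbled)      # cafÃ©
```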

Step-by-step fix

  1. Look for the Ã pattern in the garbled text - this confirms UTF-8 read as Latin-1.
  2. Re-decode the garbled text as Latin-1 to get bytes, then re-interpret as UTF-8.
  3. Fix the source system to declare the correct encoding (add charset=utf-8 headers).
  4. Use the URL Encoder to inspect how special characters are encoded.
Wrong
# Reading UTF-8 file as Latin-1
with open('data.txt', encoding='latin-1') as f:
    text = f.read()  # Produces mojibake
Correct
# Correct encoding declaration
with open('data.txt', encoding='utf-8') as f:
    text = f.read()  # Correct output

# Fix already-garbled text
def fix_mojibake(garbled):
    return garbled.encode('latin-1').decode('utf-8')
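
When the input may or may not be garbled, a guarded variant of fix_mojibake avoids corrupting clean text (the name fix_mojibake_safe is illustrative, not from any library):

```python
def fix_mojibake_safe(text):
    """Like fix_mojibake above, but returns clean text unchanged instead of raising."""
    try:
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Text is not UTF-8-as-Latin-1 mojibake; leave it alone.
        return text

print(fix_mojibake_safe('cafÃ©'))  # café
print(fix_mojibake_safe('café'))   # café (already clean, left alone)
```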

2

UnicodeDecodeError when processing non-ASCII content

Error message
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 42
Root cause

Your code or library defaults to ASCII encoding, which only supports characters 0-127. Any byte above 127 (accented characters, emoji, CJK) triggers this error. This is especially common in Python 2 legacy code or systems with LANG=C.
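
The error is easy to reproduce in isolation; a minimal sketch:

```python
data = 'café'.encode('utf-8')   # b'caf\xc3\xa9'

try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    # 0xC3 is the first byte of the UTF-8 sequence for 'é'
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
```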

Step-by-step fix

  1. Identify the actual encoding of your data (check HTTP headers, file BOM, or database charset).
  2. Explicitly specify encoding='utf-8' in all file operations.
  3. Set PYTHONIOENCODING=utf-8 and LANG=en_US.UTF-8 environment variables.
  4. Use the Base64 tool to safely transport binary data through ASCII-only channels.
Wrong
# Python falls back to the platform/locale encoding
with open('data.txt') as f:  # May be ASCII (LANG=C) or cp1252 (Windows)
    text = f.read()

# Building a URL from raw non-ASCII characters
url = 'https://example.com/' + city_name  # Invalid URL for 'München'
Correct
# Always specify UTF-8
with open('data.txt', encoding='utf-8') as f:
    text = f.read()

# URL-encode non-ASCII characters
from urllib.parse import quote
url = 'https://example.com/' + quote(city_name)  # Works with any charset
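
For step 4's Base64 transport, a minimal sketch using only the standard library: encode the UTF-8 bytes once, and the payload is plain ASCII that survives any channel.

```python
import base64

city = 'München'
payload = base64.b64encode(city.encode('utf-8')).decode('ascii')
print(payload)   # TcO8bmNoZW4=

restored = base64.b64decode(payload).decode('utf-8')
print(restored)  # München
```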

3

Database returns wrong characters after migration

Error message
Data looks correct in database but garbled in application
MySQL Warning: Incorrect string value '\xC3\xA9' for column 'name'
Root cause

The database connection charset does not match the table or column charset. Data is stored correctly but converted incorrectly during read/write. Common after migrations between MySQL versions or cloud providers.

Step-by-step fix

  1. Check database charset: SHOW VARIABLES LIKE 'character_set%';
  2. Check table charset: SHOW CREATE TABLE tablename;
  3. Set connection charset explicitly: charset=utf8mb4 in connection string.
  4. For already-corrupted data, use a double-decode fix in your application layer.
Wrong
# No charset in connection string
import mysql.connector
conn = mysql.connector.connect(host='db', user='app', database='mydb')
# Connection may use latin1 while table is utf8mb4
Correct
# Explicit charset in connection
import mysql.connector
conn = mysql.connector.connect(
    host='db', user='app', database='mydb',
    charset='utf8mb4', collation='utf8mb4_unicode_ci'
)
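
The application-layer double-decode repair from step 4 can be simulated without a server; this sketch shows how a latin1 connection mislabels utf8mb4 data, and how to reverse it:

```python
original = 'résumé'

# A latin1 connection mislabels the UTF-8 bytes, so the server stores
# the eight characters 'rÃ©sumÃ©' instead of 'résumé'.
stored = original.encode('utf-8').decode('latin-1')
print(stored)    # rÃ©sumÃ©

# The application-layer repair reverses the mislabelling:
repaired = stored.encode('latin-1').decode('utf-8')
print(repaired)  # résumé
```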

Prevention Tips

  • Default to UTF-8 everywhere: files, databases, HTTP headers, environment variables.
  • Always declare encoding explicitly - never rely on platform defaults.
  • Add <meta charset='utf-8'> to HTML and Content-Type: application/json; charset=utf-8 to APIs.
  • Test with non-ASCII characters (accented letters, emoji, CJK) during development, not just ASCII.

Frequently Asked Questions

How do I detect which encoding a file uses?

Use Python's chardet library: chardet.detect(raw_bytes) returns a dict with the detected encoding and a confidence score. For web content, check the Content-Type header or <meta charset> tag.
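
If chardet is unavailable, a crude stdlib fallback is to try candidate encodings in order (guess_encoding is a hypothetical helper, not a library function):

```python
def guess_encoding(raw, candidates=('utf-8', 'latin-1')):
    """Return the first candidate that decodes raw without error.
    Latin-1 accepts every byte, so keep it last as a catch-all."""
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

print(guess_encoding('café'.encode('utf-8')))  # utf-8
print(guess_encoding(b'caf\xe9'))              # latin-1
```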

What is the difference between UTF-8 and UTF-8 BOM?

UTF-8 BOM (Byte Order Mark) adds 3 bytes (EF BB BF) at the start of a file. Some Windows programs require it, but most Unix tools and web servers do not expect it and may display it as garbage. Prefer UTF-8 without BOM.
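
Python's built-in utf-8-sig codec handles the BOM transparently; a minimal sketch:

```python
# EF BB BF is the UTF-8 encoding of the BOM character U+FEFF
data = b'\xef\xbb\xbfhello'

print(repr(data.decode('utf-8')))      # '\ufeffhello' (BOM leaks into the text)
print(repr(data.decode('utf-8-sig')))  # 'hello' (BOM stripped)
```

The same codec works for files: open(path, encoding='utf-8-sig') reads BOM-prefixed and BOM-free UTF-8 files alike.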

Why does MySQL have utf8 and utf8mb4?

MySQL's utf8 (an alias for utf8mb3) only stores characters up to 3 bytes, so emoji and some CJK characters cannot be stored. utf8mb4 is true UTF-8 supporting all Unicode characters, including emoji. Always use utf8mb4.
