Batch HTML Entity Encode/Decode
Encode or decode HTML entities in bulk using CLI tools, Python, and Node.js. Essential for sanitizing user input, processing CMS exports, and preparing content for web display.
The Problem
You're migrating content between CMS platforms, processing user-submitted data, or preparing translated strings for HTML templates. Thousands of strings contain special characters (&, <, >, quotes) that need proper entity encoding to display correctly and prevent XSS.
Why Batch Processing Matters
Batch HTML entity encoding is critical for CMS migrations (WordPress to static site), securing user-generated content before database storage, processing translation files (PO/XLIFF) with special characters, and generating HTML email templates from CSV contact lists.
Common Use Cases
- Sanitize user-submitted form data before storing in a database
- Migrate CMS content between platforms with different escaping rules
- Encode translated strings (i18n) that contain special characters
- Process CSV exports for HTML table display
- Decode double-encoded entities from legacy database dumps
Step-by-Step Instructions
Identify strings that need encoding
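Search for unescaped special characters: grep -P '[<>&"\x27]|[^\x00-\x7F]' content.csv (\x27 is the apostrophe, written as an escape so the pattern can stay inside single quotes). Add -c to count matching lines and estimate the scope of the job.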
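For CSV files with quoted fields, a cell-level count can be more informative than a line count. The following is a minimal sketch, assuming the data parses with Python's csv module; the helper name count_cells_needing_encoding and the in-memory sample rows are hypothetical stand-ins for content.csv.

```python
import csv
import io
import re

# The five critical characters, plus anything outside ASCII.
NEEDS_ENCODING = re.compile(r"[<>&\"']|[^\x00-\x7f]")

def count_cells_needing_encoding(rows):
    """Return (total_cells, flagged_cells) for an iterable of CSV rows."""
    total = 0
    flagged = 0
    for row in rows:
        for cell in row:
            total += 1
            if NEEDS_ENCODING.search(cell):
                flagged += 1
    return total, flagged

# Illustrative sample standing in for content.csv (hypothetical data).
sample = io.StringIO('id,title\n1,Tom & Jerry\n2,"5 < 10, really"\n3,plain text\n')
total, flagged = count_cells_needing_encoding(csv.reader(sample))
print(f"{flagged} of {total} cells need encoding")
```

For a real file, pass csv.reader(open("content.csv", newline="", encoding="utf-8")) instead of the sample.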
Choose encode or decode direction
Encoding converts & to &amp; (for safe HTML display). Decoding converts &amp; back to & (for editing or plain text). Test one string first with the HTML Entity Encoder.
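Both directions can be tried on a single string with Python's standard html module before committing to a batch run. This is a small sketch; the sample string is illustrative.

```python
import html

raw = "Price: $5 < $10 & tax"

encoded = html.escape(raw)        # characters -> entities
decoded = html.unescape(encoded)  # entities -> characters

print(encoded)
print(decoded == raw)  # a clean round trip should give back the input
```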
Run batch encoding
Use the scripts below. Python's html module and Node's he library handle the full HTML5 entity set when decoding, including named entities like &mdash;. On the encoding side, Python's html.escape converts only & < > " and '.
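One possible shape for a batch encoder is sketched below. It is not the original encode_entities.py; the function name encode_csv, the columns parameter, and the assumption that the first row is a header are all hypothetical choices for illustration.

```python
import csv
import html
import io

def encode_csv(src, dst, columns=None):
    """Entity-encode cells read from src and write the rows to dst.

    columns: optional set of zero-based column indexes; None means every column.
    Assumes the first row is a header and copies it through unchanged.
    Returns the number of cells that were modified.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    changed = 0
    for row_number, row in enumerate(reader):
        if row_number == 0:
            writer.writerow(row)
            continue
        out = []
        for index, cell in enumerate(row):
            if columns is None or index in columns:
                new_cell = html.escape(cell, quote=True)
            else:
                new_cell = cell
            if new_cell != cell:
                changed += 1
            out.append(new_cell)
        writer.writerow(out)
    return changed

# In-memory demo; the rows are illustrative.
src = io.StringIO(
    "id,title,description\n"
    "1,Tom & Jerry,Classic cartoon\n"
    "2,5 < 10,Math comparison example\n"
)
dst = io.StringIO()
changed = encode_csv(src, dst)
print(f"Encoded {changed} cells with special characters")
print(dst.getvalue(), end="")
```

With real files, open both handles with newline="" and encoding="utf-8" so the csv module controls line endings. Parsing with csv rather than splitting on commas keeps quoted fields intact.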
Verify output integrity
Check that encoded strings render correctly in a browser. Look for double-encoding issues (&amp;amp; instead of &amp;). Validate HTML output with the W3C validator.
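Part of this check can be automated. The sketch below flags sequences that look like an entity whose ampersand was encoded a second time; the regular expression and the helper name find_double_encoded are assumptions, and text that deliberately shows entity source (for example an HTML tutorial) will produce false positives.

```python
import re

# "&amp;" followed by what looks like the rest of an entity: name; or #123; or #x1F;
DOUBLE_ENCODED = re.compile(
    r"&amp;(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);"
)

def find_double_encoded(lines):
    """Return (line_number, matched_text) pairs for suspicious sequences."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in DOUBLE_ENCODED.finditer(line):
            hits.append((number, match.group(0)))
    return hits

# Illustrative lines; only the last three are double-encoded.
sample_lines = [
    "Tom &amp; Jerry",
    "Tom &amp;amp; Jerry",
    "5 &amp;lt; 10",
    "It&amp;#39;s fine",
]
hits = find_double_encoded(sample_lines)
for number, text in hits:
    print(f"line {number}: {text}")
```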
Code Examples
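# Encode HTML entities in a text file (basic: & < > " ')
# The & rule must run first so the entities added by later rules are not re-encoded
sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' \
    -e 's/"/\&quot;/g' -e "s/'/\&#39;/g" input.txt > output.txt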
# Decode HTML entities using perl (handles named + numeric)
perl -MHTML::Entities -pe 'decode_entities($_)' input.html > output.txt
# Encode a specific CSV column (column 3)
# Splits on every comma, so it is only safe when no field contains a quoted comma
awk -F',' -v OFS=',' '{
  gsub(/&/, "\\&amp;", $3)
  gsub(/</, "\\&lt;", $3)
  gsub(/>/, "\\&gt;", $3)
  print
}' data.csv > data_encoded.csv
# Count lines that contain characters needing encoding
grep -cP '[<>&]' content.csv
Single vs Batch Comparison
Paste 'Price: $5 < $10 & tax' → get 'Price: $5 &lt; $10 &amp; tax'
$ wc -l content.csv
4521 content.csv
$ python encode_entities.py
Encoded 1,847 cells with special characters
$ head -3 content_encoded.csv
id,title,description
1,Tom &amp; Jerry,Classic cartoon — cat &amp; mouse
2,5 &lt; 10,Math comparison example
Frequently Asked Questions
What's the difference between HTML encoding and URL encoding?
HTML encoding converts & to &amp; for safe display inside HTML documents. URL encoding converts spaces to %20 for safe use in URLs. They serve different purposes: use HTML encoding for page content, and URL encoding for query parameters and paths.
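The difference is easy to see by running the same string through both encoders. This is a small sketch using only the Python standard library; the sample text is illustrative.

```python
import html
from urllib.parse import quote

text = "Tom & Jerry <live>"

html_encoded = html.escape(text)  # for placing text inside an HTML document
url_encoded = quote(text)         # for placing text inside a URL path or query value

print(html_encoded)
print(url_encoded)
```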
Should I encode all characters or just the dangerous five?
For most cases, encoding the five critical characters (& < > " ') is sufficient. Use full encoding (all non-ASCII) only if your HTML lacks a proper <meta charset="utf-8"> declaration or you're targeting legacy systems.
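The two strategies can be compared side by side. In this sketch the helper names encode_minimal and encode_ascii_safe are hypothetical; the second one relies on Python's xmlcharrefreplace error handler to turn every non-ASCII character into a numeric character reference.

```python
import html

def encode_minimal(text):
    """Encode only & < > " ' and leave other characters untouched."""
    return html.escape(text, quote=True)

def encode_ascii_safe(text):
    """Encode the five critical characters, then replace every
    non-ASCII character with a numeric reference such as &#233;."""
    escaped = html.escape(text, quote=True)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

sample = 'Café "Zoë" & co'
print(encode_minimal(sample))
print(encode_ascii_safe(sample))
```

The ASCII-safe form survives a missing or wrong charset declaration, at the cost of longer and less readable markup.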
How do I detect and fix double-encoded entities?
Look for patterns like &amp;amp; or &amp;lt;. In Python: while '&amp;' in text: text = html.unescape(text). Always decode first, then encode once; never encode already-encoded content.
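The decode-then-encode-once rule can be wrapped in a helper. This is a minimal sketch; the name normalize_entities and the max_passes safety limit are assumptions. It will also rewrite text that intentionally displays entity source, so it suits data known to be over-encoded rather than arbitrary HTML.

```python
import html

def normalize_entities(text, max_passes=10):
    """Decode repeatedly until the text stops changing, then encode once."""
    for _ in range(max_passes):
        decoded = html.unescape(text)
        if decoded == text:
            break  # nothing left to decode
        text = decoded
    return html.escape(text, quote=True)

# Illustrative inputs: double-encoded, double-encoded, and never encoded.
for value in ["Tom &amp;amp; Jerry", "5 &amp;lt; 10", "Tom & Jerry"]:
    print(normalize_entities(value))
```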