Encoding Batch Processing 2026-04-10

Batch HTML Entity Encode/Decode

Encode or decode HTML entities in bulk using CLI tools, Python, and Node.js. Essential for sanitizing user input, processing CMS exports, and preparing content for web display.

Tools used: HTML Entity Encoder

The Problem

You're migrating content between CMS platforms, processing user-submitted data, or preparing translated strings for HTML templates. Thousands of strings contain special characters (&, <, >, quotes) that need proper entity encoding to display correctly and prevent XSS.

Why Batch Processing Matters

Batch HTML entity encoding is critical for CMS migrations (WordPress to static site), securing user-generated content before database storage, processing translation files (PO/XLIFF) with special characters, and generating HTML email templates from CSV contact lists.

Common Use Cases

Sanitize user-submitted form data before storing in a database
Migrate CMS content between platforms with different escaping rules
Encode translated strings (i18n) that contain special characters
Process CSV exports for HTML table display
Decode double-encoded entities from legacy database dumps

Step-by-Step Instructions

Identify strings that need encoding

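Search for unescaped special characters: grep -P '[<>&"\x27]|[^\x00-\x7F]' content.csv (\x27 is the single quote, which cannot appear literally inside a single-quoted shell string). Add -c to count matching lines and estimate the scope of the job.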
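The same scope check can be sketched in Python with the standard csv and re modules, which also copes with quoted CSV fields that a line-based grep cannot split. This is a minimal sketch assuming the CSV is UTF-8 text already loaded as a string; the function name count_cells_needing_encoding and the sample data are hypothetical. It counts cells rather than lines, which matches how the Python encoding script in Code Examples reports its results.

```python
import csv
import io
import re

# The five HTML-special characters, plus any non-ASCII character
NEEDS_ENCODING = re.compile(r"""[<>&"']|[^\x00-\x7F]""")

def count_cells_needing_encoding(csv_text: str, skip_header: bool = True) -> int:
    """Count CSV cells containing at least one character that may need encoding."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    if skip_header:
        next(reader, None)  # ignore the header row
    return sum(
        1
        for row in reader
        for cell in row
        if NEEDS_ENCODING.search(cell)
    )

sample = 'id,title\n1,Tom & Jerry\n2,Plain text\n3,"Say ""hi"""\n'
print(count_cells_needing_encoding(sample))  # 2
```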

Choose encode or decode direction

Encoding converts & to &amp; (for safe HTML display). Decoding converts &amp; back to & (for editing or plain text). Test one string first with the HTML Entity Encoder.
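A quick way to try both directions on one string is Python's standard html module. html.escape replaces & < > and, with the default quote=True, both quote characters (the single quote becomes &#x27;); html.unescape reverses the process. The sample string is illustrative.

```python
import html

original = 'Tom & Jerry say "<hi>"'

encoded = html.escape(original)   # quote=True (default) also escapes " and '
decoded = html.unescape(encoded)  # back to the original text

print(encoded)              # Tom &amp; Jerry say &quot;&lt;hi&gt;&quot;
print(decoded == original)  # True

# quote=False escapes only &, < and >: fine for text nodes,
# not safe inside attribute values
print(html.escape(original, quote=False))  # Tom &amp; Jerry say "&lt;hi&gt;"
```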

Run batch encoding

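Use the scripts below. Python's html.unescape and the he library's he.decode both recognize the full HTML5 named-reference set, including entities like &mdash;. For encoding, html.escape only escapes & < > " ', whereas he.encode also encodes non-ASCII characters and can emit named references (useNamedReferences: true).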
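The decode/encode asymmetry is easy to check in Python: html.unescape accepts named references from the HTML5 table (exposed as html.entities.html5) as well as decimal and hex numeric references, while html.escape passes characters like the em dash through unchanged. The strings below are illustrative.

```python
import html
from html.entities import html5

# Named, decimal and hex references all decode to U+2014 (em dash)
print(html.unescape("&mdash; &#8212; &#x2014;"))

# html.escape only touches & < > " ' and leaves non-ASCII as-is
print(html.escape("Tom & Jerry \u2014 cat"))  # em dash survives unchanged

# The HTML5 named-reference table that unescape draws on
print(html5["mdash;"] == "\u2014")  # True
```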

Verify output integrity

Check that encoded strings render correctly in a browser. Look for double-encoding issues (&amp;amp; where &amp; was intended, which renders as a visible "&amp;"). Validate HTML output with the W3C validator.
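One rough way to flag double-encoding is to search for &amp; immediately followed by something shaped like an entity. This regex check is a heuristic, assuming legitimate content rarely contains literal text such as "&amp;lt;"; the function name find_double_encoded and the sample rows are hypothetical.

```python
import re

# "&amp;" directly followed by a named, decimal, or hex character reference
DOUBLE_ENCODED = re.compile(
    r"&amp;(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);"
)

def find_double_encoded(lines):
    """Yield (line_number, matched_text) for each suspected double-encoded entity."""
    for lineno, line in enumerate(lines, start=1):
        for m in DOUBLE_ENCODED.finditer(line):
            yield lineno, m.group(0)

sample = [
    "1,Tom &amp; Jerry,ok",
    "2,5 &amp;lt; 10,double-encoded",
    "3,Caf&amp;#233;,double-encoded numeric",
]
for hit in find_double_encoded(sample):
    print(hit)
# (2, '&amp;lt;')
# (3, '&amp;#233;')
```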

Code Examples

# Encode HTML entities in a text file (basic: & < > " ')
sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' \
    -e 's/"/\&quot;/g' -e "s/'/\&#39;/g" input.txt > output.txt

# Decode HTML entities using perl (handles named + numeric);
# -CSD reads and writes UTF-8 so decoded characters like an em dash are written correctly
perl -CSD -MHTML::Entities -pe 'decode_entities($_)' input.html > output.txt

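# Encode a specific CSV column (column 3). Naive comma split: quoted fields
# containing commas are not supported. "\\&" yields a literal & in gsub().
awk -F',' -v OFS=',' '{
  gsub(/&/, "\\&amp;", $3)
  gsub(/</, "\\&lt;", $3)
  gsub(/>/, "\\&gt;", $3)
  print
}' data.csv > data_encoded.csv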

# Count lines containing characters that need encoding
grep -cP '[<>&]' content.csv

import csv
import html

input_file = "content.csv"
output_file = "content_encoded.csv"
columns_to_encode = [1, 2]  # 0-indexed columns to encode

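# newline="" lets the csv module handle quoted fields that contain line breaks
with open(input_file, encoding="utf-8", newline="") as fin, \
     open(output_file, "w", encoding="utf-8", newline="") as fout: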
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    header = next(reader)
    writer.writerow(header)

    encoded_count = 0
    for row in reader:
        for col in columns_to_encode:
            original = row[col]
            row[col] = html.escape(original)
            if row[col] != original:
                encoded_count += 1
        writer.writerow(row)

print(f"Encoded {encoded_count} cells with special characters")

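# Decode HTML entities back to plain text (reads the encoded file;
# "content_decoded.csv" is an example output name)
with open(output_file, encoding="utf-8", newline="") as fin, \
     open("content_decoded.csv", "w", encoding="utf-8", newline="") as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    writer.writerow(next(reader))  # copy header unchanged
    for row in reader:
        for col in columns_to_encode:
            row[col] = html.unescape(row[col])
        writer.writerow(row)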

const fs = require('fs');
const he = require('he');  // npm install he

const input = fs.readFileSync('content.csv', 'utf8');
const lines = input.trim().split(/\r?\n/);  // handles LF and CRLF line endings
const COLS = [1, 2];  // columns to encode

const output = [lines[0]];  // keep header
let count = 0;

for (let i = 1; i < lines.length; i++) {
  const cols = lines[i].split(',');  // naive split: breaks on quoted fields containing commas
  COLS.forEach(c => {
    const original = cols[c];
    cols[c] = he.encode(original, { useNamedReferences: true });
    if (cols[c] !== original) count++;
  });
  output.push(cols.join(','));
}

fs.writeFileSync('content_encoded.csv', output.join('\n'));
console.log(`Encoded ${count} cells with special characters`);

// Decode: he.decode('&mdash; &amp; &lt;') → '— & <'
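Python's html.escape has no counterpart to he's useNamedReferences option. If named references are needed from Python, one possible sketch uses html.entities.codepoint2name, which only covers the HTML 4 entity names (fewer than the HTML5 names he uses); characters without a name fall back to hex numeric references. The function name encode_named is hypothetical.

```python
import html
from html.entities import codepoint2name

def encode_named(text: str) -> str:
    """Escape & < > " ' then replace remaining non-ASCII characters with references."""
    out = []
    for ch in html.escape(text):  # handles the five special characters first
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # HTML 4 named reference, e.g. &mdash;
        else:
            out.append(f"&#x{cp:x};")  # numeric fallback for unnamed characters
    return "".join(out)

print(encode_named("Tom & Jerry \u2014 caf\u00e9 \u2603"))
# Tom &amp; Jerry &mdash; caf&eacute; &#x2603;
```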

Single vs Batch Comparison

Single string (web tool)

Paste 'Price: $5 < $10 & tax' → get 'Price: $5 &lt; $10 &amp; tax'

Batch output (CLI)

$ wc -l content.csv
4521 content.csv

$ python encode_entities.py
Encoded 1847 cells with special characters

$ head -3 content_encoded.csv
id,title,description
1,Tom &amp; Jerry,Classic cartoon — cat &amp; mouse
2,5 &lt; 10,Math comparison example

Frequently Asked Questions

What's the difference between HTML encoding and URL encoding?

HTML encoding converts & to &amp; for safe display inside HTML documents. URL encoding (percent-encoding) converts spaces to %20 and & to %26 for safe use in URLs. They serve different purposes: use HTML encoding for page content, URL encoding for query parameters and paths.
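The difference shows up clearly when the same string goes through both encoders from Python's standard library. Note that urlencode (used for query strings) turns spaces into + rather than %20. The sample text is illustrative.

```python
import html
from urllib.parse import quote, urlencode

text = "Tom & Jerry <3"

print(html.escape(text))       # Tom &amp; Jerry &lt;3      (for HTML content)
print(quote(text))             # Tom%20%26%20Jerry%20%3C3   (for URL paths)
print(urlencode({"q": text}))  # q=Tom+%26+Jerry+%3C3       (for query strings)
```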

Should I encode all characters or just the dangerous five?

For most cases, encoding the five critical characters (& < > " ') is sufficient. Use full encoding (all non-ASCII) only if your HTML lacks a proper <meta charset="utf-8"> declaration or you're targeting legacy systems.
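Both options can be sketched in Python: html.escape covers the five critical characters, and encoding the result to ASCII with the "xmlcharrefreplace" error handler turns every remaining non-ASCII character into a decimal numeric reference. The sample text is illustrative.

```python
import html

text = 'Caf\u00e9 "5 < 10" \u2014 ok'

# Minimal: only & < > " ' are escaped; non-ASCII stays as UTF-8 text
minimal = html.escape(text)

# Full: additionally replace every non-ASCII character with &#NNNN;
full = html.escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")

print(full)  # Caf&#233; &quot;5 &lt; 10&quot; &#8212; ok
```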

How do I detect and fix double-encoded entities?

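Look for patterns like &amp;amp; or &amp;lt;. In Python, decode until the text stops changing: initialize prev = None, then run while text != prev: prev, text = text, html.unescape(text). Avoid looping on while '&' in text, which never terminates when the text contains a bare & that is not part of an entity. Always decode first, then encode once; never encode already-encoded content.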
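A slightly more defensive version of the decode-then-encode advice, as a sketch: the function name normalize_entities and the max_rounds bound of 5 are arbitrary choices, not a standard API. One caveat: repeated decoding also collapses text that was deliberately stored as an escaped entity (for example, documentation meant to display the literal string &lt;), so review flagged rows before running it over a whole dump.

```python
import html

def normalize_entities(text: str, max_rounds: int = 5) -> str:
    """Decode until the text stops changing (bounded), then encode exactly once."""
    for _ in range(max_rounds):
        decoded = html.unescape(text)
        if decoded == text:
            break  # fully decoded
        text = decoded
    return html.escape(text)

print(normalize_entities("Tom &amp;amp; Jerry"))  # Tom &amp; Jerry
print(normalize_entities("5 &amp;lt; 10"))        # 5 &lt; 10
print(normalize_entities("Tom & Jerry"))          # Tom &amp; Jerry
```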
