What to redact: the checklist professionals use
Redaction failures are usually omissions — the third mention of a name on page 14. Before starting, list every category present, then sweep for each one across the whole document:
- Direct identifiers — names, signatures, photos, employee/customer IDs
- Contact data — addresses, phone numbers, personal emails
- Government and financial numbers — national ID/TC kimlik, SSN, IBAN, card numbers, tax IDs
- Account and case references — claim numbers, patient/case IDs, internal reference codes that allow re-identification via another system
- Third parties — names of people who aren’t the subject but appear incidentally; their privacy rights apply too
- Indirect identifiers in combination — birth date + postcode + employer can identify a person even with the name removed; consider context, not just patterns
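The last item on the list is the one a pattern search cannot catch. The sketch below uses invented records, and the helpers `group_sizes` and `unique_rows` are hypothetical. It counts how many records share each combination of fields. No single field isolates anyone, yet all three together isolate every record.

```python
from collections import Counter

# Hypothetical records with names already removed: (birth_date, postcode, employer).
records = [
    ("1984-03-12", "34710", "Acme Logistics"),
    ("1984-03-12", "34710", "Northwind Bank"),
    ("1991-07-30", "34710", "Acme Logistics"),
    ("1984-03-12", "06420", "Acme Logistics"),
    ("1991-07-30", "06420", "Northwind Bank"),
]

def group_sizes(rows, fields):
    """Count how many rows share each combination of the chosen field positions."""
    return Counter(tuple(row[i] for i in fields) for row in rows)

def unique_rows(rows, fields):
    """Rows whose combination of the chosen fields occurs exactly once: re-identifiable."""
    sizes = group_sizes(rows, fields)
    return [row for row in rows if sizes[tuple(row[i] for i in fields)] == 1]

for label, fields in [("birth date", (0,)), ("postcode", (1,)),
                      ("employer", (2,)), ("all three", (0, 1, 2))]:
    print(f"{label}: {len(unique_rows(records, fields))} of {len(records)} unique")
```

Each field on its own is shared by at least two people. The combination is unique for all five, which is why removing the name alone is not enough.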
Search the document for each category systematically (runs of digits for phone and ID numbers, the @ for emails, each known name) rather than scanning visually; eyes miss what sits in footnotes and tables.
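Here is a minimal sketch of such a sweep. It assumes the document's text has already been extracted into a list of lines. The patterns, the 9-digit threshold and the `sweep` helper are hypothetical choices to adapt, not a standard. They are broad on purpose, because a false positive costs a second look while a miss costs a leak. Names are matched case-insensitively, which also catches a name embedded in an email address.

```python
import re

# Hypothetical, deliberately broad patterns.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# A digit run with common separators. Requiring 9+ digits covers most phone,
# SSN, national-ID, card and IBAN numbers while skipping dates like 2024-03-12.
DIGIT_RUN = re.compile(r"\+?\d[\d ()./-]*\d")
MIN_DIGITS = 9

def sweep(lines, names=()):
    """Return {category: [(line_number, matched_text), ...]}, 1-based line numbers."""
    hits = {"email": [], "number": [], "name": []}
    for no, line in enumerate(lines, start=1):
        hits["email"] += [(no, m.group()) for m in EMAIL.finditer(line)]
        for m in DIGIT_RUN.finditer(line):
            if sum(ch.isdigit() for ch in m.group()) >= MIN_DIGITS:
                hits["number"].append((no, m.group()))
        for name in names:
            # Case-insensitive: catches "DEMIR" in a table and "demir" in an email.
            for m in re.finditer(re.escape(name), line, flags=re.IGNORECASE):
                hits["name"].append((no, m.group()))
    return hits

# Invented sample text; the last line stands in for a footnote.
doc = [
    "Claimant: Ayşe Demir (ayse.demir@example.com)",
    "Refund to IBAN TR33 0006 1005 1978 6457 8413 26.",
    "Hearing set for 2024-03-12.",
    "¹ Witness DEMIR can be reached on +90 212 555 01 23.",
]

for category, found in sweep(doc, names=("Demir",)).items():
    for line_no, text in found:
        print(f"{category}: line {line_no}: {text}")
```

Each hit is a line to inspect, not a verdict. The IBAN, for example, is reported under "number" without its country prefix, which is still enough to send you to line 2.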
Redaction vs anonymization vs pseudonymization
Redaction removes information outright. Anonymization replaces it with generic placeholders (“[NAME]”, “[HOSPITAL]”) so the document stays readable, but separate mentions can no longer be linked to one person. Pseudonymization replaces each identifier consistently (“Person A”, “Hospital 1”) so relationships survive: “A complained to B” remains traceable as a structure. Legal disclosure usually wants redaction; research and case-study publication usually want pseudonymization. Choosing the wrong one wastes the work. A redacted document where every actor is a black box can be useless for the recipient’s purpose and force a do-over. Decide what the recipient must still be able to understand before choosing what to remove.
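The sketch below applies the three operations to one invented sentence. It uses a generic placeholder for anonymization and consistent lettered labels for pseudonymization. The text, the name list and the three functions are hypothetical, and the name list stands in for whatever the earlier sweep found.

```python
import re

# Hypothetical sample text and identifier list.
TEXT = "Kaya referred Yilmaz to Aksoy. Yilmaz later complained about Kaya."
PEOPLE = ["Kaya", "Yilmaz", "Aksoy"]

def _matcher(names):
    # Longest names first, so "Ayşe Demir" is replaced whole rather than
    # as "Ayşe" plus a leftover "Demir".
    alternatives = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    return re.compile(rf"\b(?:{alternatives})\b")

def redact(text, names):
    """Remove: every identifier becomes the same black bar."""
    return _matcher(names).sub("█████", text)

def anonymize(text, names):
    """Generic placeholder: readable, but mentions cannot be linked to each other."""
    return _matcher(names).sub("[PERSON]", text)

def pseudonymize(text, names):
    """Consistent labels: the same person always gets the same letter.

    Returns the new text and the label mapping. The mapping makes the result
    reversible, so it must be stored apart from the released document.
    """
    labels = {}

    def label(match):
        # The next letter is computed before insertion, so order of first
        # appearance decides A, B, C... (this sketch assumes fewer than 26 people).
        return labels.setdefault(match.group(), f"Person {chr(ord('A') + len(labels))}")

    return _matcher(names).sub(label, text), labels

print(redact(TEXT, PEOPLE))
print(anonymize(TEXT, PEOPLE))
print(pseudonymize(TEXT, PEOPLE)[0])
```

Only the pseudonymized version still shows that the person who was referred is the one who complained, and that the complaint was about the person who made the referral.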
Why “drawing a black box” became a scandal genre
Court filings, government releases, and corporate disclosures have repeatedly leaked because someone drew an opaque rectangle over text in an editor: the text remained in the file, one copy-paste away. The pattern keeps recurring, in courts, ministries, and Fortune 500 legal teams alike, and the lesson is structural. Visual concealment and data removal are different operations, and only inspecting the file’s actual content (the search test above) proves which one happened. True redaction rewrites the page content so the sensitive bytes no longer exist. That is also why it is irreversible, and why you should always keep your own unredacted original archived separately.
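The toy model below makes the structural point concrete. It is not any real PDF library's API: `TextRun`, `Page`, `draw_box`, `redact` and `leaks` are all hypothetical names. A page holds a text layer and a list of painted boxes. `draw_box` only paints, while `redact` deletes the text under the box, and the search test shows the difference.

```python
from dataclasses import dataclass, field

# A deliberately simplified, hypothetical page model. Real PDF content streams
# are far more complex, but painting and deleting differ in the same way.
@dataclass
class TextRun:
    text: str
    x: float
    y: float
    width: float
    height: float

@dataclass
class Page:
    runs: list = field(default_factory=list)   # the text layer
    boxes: list = field(default_factory=list)  # filled rectangles (x, y, w, h)

    def extract_text(self):
        # What copy-paste, search and indexing see: the text layer only.
        # Boxes are graphics, so they hide nothing from this view.
        return " ".join(run.text for run in self.runs)

def overlaps(run, box):
    x, y, w, h = box
    return (run.x < x + w and x < run.x + run.width
            and run.y < y + h and y < run.y + run.height)

def draw_box(page, box):
    """Visual concealment: paint over the text, which stays in the page."""
    page.boxes.append(box)

def redact(page, box):
    """Data removal: delete every text run under the box, then paint it.

    This toy version drops whole runs; real tools typically work per character.
    """
    page.runs = [run for run in page.runs if not overlaps(run, box)]
    page.boxes.append(box)

def leaks(page, terms):
    """The search test: which sensitive terms can still be found in the text?"""
    text = page.extract_text().casefold()
    return [t for t in terms if t.casefold() in text]

def sample_page():
    # A fresh page each time: redact() is destructive, so the unredacted
    # original has to come from a separate copy.
    return Page(runs=[
        TextRun("Patient:", 10, 10, 40, 10),
        TextRun("Ayşe Demir", 55, 10, 60, 10),
        TextRun("Diagnosis: withheld", 10, 30, 90, 10),
    ])

BOX = (50, 8, 70, 14)
covered = sample_page()
draw_box(covered, BOX)
removed = sample_page()
redact(removed, BOX)
print(leaks(covered, ["Demir"]))  # ['Demir']
print(leaks(removed, ["Demir"]))  # []
```

Both pages would look identical on screen. Only the extracted text reveals that the first one still carries the name.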