I’ve rewritten the Word HTML Cleaner from the ground up. It should run faster, accept larger files, and retain a wider range of structure tags.
I’m at something of a disadvantage as far as testing goes, though: the French version of Office X I’m running will only produce proper Unicode entities for dodgy characters when generating UTF-8 web pages. That would be fine if all the other code in those pages didn’t fail in browsers produced by, uh, Microsoft. If you get strange results, please pipe up.
* * *