@aral Ever seen the HTML produced by an MS Word export? Blood fountains from your eyes. I once cleaned up such an HTML file, and the clean result was less than 20% of the original's size. They even used CSS classes, but then explicitly repeated their definitions every.single.time (like "<p class="foo"><font size…><font family…><b><i>blabber</i></b></font></font></p>", with "foo" of course already defining font size, family, italics, and bold).
But right, worse examples are no excuse…