Microsoft Naff Characters

I should have done this ages ago. The daft non-standard Microsoft characters have bitten me in the bum for years, and I always have to strip them out. Here is a quick and dirty function for it:

    public String removeMSRubbish(String body) {
        // smart single quotes and the low-9 single quote
        body = body.replaceAll("[\u2018\u2019\u201A]", "'");
        // smart double quotes and the low-9 double quote
        body = body.replaceAll("[\u201C\u201D\u201E]", "\"");
        // ellipsis
        body = body.replaceAll("\u2026", "...");
        // en and em dashes
        body = body.replaceAll("[\u2013\u2014]", "-");
        // circumflex
        body = body.replaceAll("\u02C6", "^");
        // open angle bracket
        body = body.replaceAll("\u2039", "<");
        // close angle bracket
        body = body.replaceAll("\u203A", ">");
        // small tilde and non-breaking space
        body = body.replaceAll("[\u02DC\u00A0]", " ");
        return body;
    }
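
If the regex calls ever feel heavy, the same clean-up can be done in one pass over the string. This is only a rough sketch and not a replacement for the function above: the class name `MsCharCleaner` and the method name `clean` are made up for the example, and the mappings are simply the ones the function above is meant to apply.

```java
final class MsCharCleaner {

    private MsCharCleaner() {}

    // Hypothetical helper: walks the string once and swaps each character
    // for its plain ASCII stand-in, using the same mappings as the post.
    static String clean(String body) {
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            switch (c) {
                case '\u2018': case '\u2019': case '\u201A':
                    out.append('\'');   // single quotes
                    break;
                case '\u201C': case '\u201D': case '\u201E':
                    out.append('"');    // double quotes
                    break;
                case '\u2026':
                    out.append("...");  // ellipsis
                    break;
                case '\u2013': case '\u2014':
                    out.append('-');    // en and em dashes
                    break;
                case '\u02C6':
                    out.append('^');    // circumflex
                    break;
                case '\u2039':
                    out.append('<');    // open angle bracket
                    break;
                case '\u203A':
                    out.append('>');    // close angle bracket
                    break;
                case '\u02DC': case '\u00A0':
                    out.append(' ');    // small tilde and non-breaking space
                    break;
                default:
                    out.append(c);      // everything else passes through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "\u201CIt\u2019s done\u201D \u2013 she said\u2026";
        System.out.println(clean(input)); // prints "It's done" - she said...
    }
}
```

Adding another character is then just one more `case` line, and nothing gets compiled as a regex along the way.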
