I should have done this ages ago. The daft non-standard Microsoft characters have bitten me in the bum for years, and I always have to strip them out. Here is a quick and dirty function for it:
public String removeMSRubbish(String body) {
    // smart single quotes
    body = body.replaceAll("[\u2018\u2019\u201A]", "'");
    // smart double quotes
    body = body.replaceAll("[\u201C\u201D\u201E]", "\"");
    // ellipsis
    body = body.replaceAll("\u2026", "...");
    // dashes
    body = body.replaceAll("[\u2013\u2014]", "-");
    // circumflex
    body = body.replaceAll("\u02C6", "^");
    // open angle bracket
    body = body.replaceAll("\u2039", "<");
    // close angle bracket
    body = body.replaceAll("\u203A", ">");
    // spaces (small tilde and non-breaking space)
    body = body.replaceAll("[\u02DC\u00A0]", " ");
    return body;
}
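If the chain of replaceAll calls ever feels heavy, the same mapping can be done in one pass over the string. What follows is only a rough sketch of that idea, not a drop-in replacement: the class name MsCharCleaner and the method name clean are made up for illustration, and the character mapping is assumed to match the function above.

```java
public class MsCharCleaner {

    // Hypothetical single-pass variant: walks the string once and swaps
    // each Microsoft "smart" character for a plain ASCII equivalent.
    public static String clean(String body) {
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            switch (c) {
                case '\u2018': case '\u2019': case '\u201A': // smart single quotes
                    out.append('\'');
                    break;
                case '\u201C': case '\u201D': case '\u201E': // smart double quotes
                    out.append('"');
                    break;
                case '\u2026': // ellipsis
                    out.append("...");
                    break;
                case '\u2013': case '\u2014': // en and em dashes
                    out.append('-');
                    break;
                case '\u02C6': // circumflex
                    out.append('^');
                    break;
                case '\u2039': // open angle bracket
                    out.append('<');
                    break;
                case '\u203A': // close angle bracket
                    out.append('>');
                    break;
                case '\u02DC': case '\u00A0': // small tilde, non-breaking space
                    out.append(' ');
                    break;
                default: // anything else passes through untouched
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "It's done..." wrapped in plain double quotes
        System.out.println(clean("\u201CIt\u2019s done\u2026\u201D"));
    }
}
```

The switch keeps every mapping in one place and avoids compiling a regular expression for each call, though for short strings the difference is unlikely to matter.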