How can I replace non-printable Unicode characters in Java?

dagnelies · Jun 1, 2011 · Viewed 133.2k times

The following will replace ASCII control characters (shorthand for [\x00-\x1F\x7F]):

my_string.replaceAll("\\p{Cntrl}", "?");

The following will replace every character outside the ASCII printable set (\p{Print} is shorthand for [\p{Graph}\x20]), so accented characters are replaced as well:

my_string.replaceAll("[^\\p{Print}]", "?");

However, neither works for Unicode strings. Does anyone have a good way to remove non-printable characters from a Unicode string?
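
To make the problem concrete, here is a minimal sketch of how the two patterns above behave on non-ASCII input. The class name AsciiClassesDemo and the two helper methods are made up for illustration. The sketch assumes the default behaviour of java.util.regex, where \p{Cntrl} and \p{Print} are ASCII-only POSIX classes.

```java
public class AsciiClassesDemo {
    // Replace ASCII control characters, as in the first snippet.
    static String replaceCntrl(String s) {
        return s.replaceAll("\\p{Cntrl}", "?");
    }

    // Replace everything outside the ASCII printable set, as in the second snippet.
    static String replaceNonPrint(String s) {
        return s.replaceAll("[^\\p{Print}]", "?");
    }

    public static void main(String[] args) {
        // U+200B ZERO WIDTH SPACE is invisible, but it is not an ASCII control character.
        String zeroWidth = "a\u200Bb";
        // U+00E9 is a printable accented letter that lies outside ASCII.
        String accented = "caf\u00E9";

        System.out.println(replaceCntrl("a\tb"));                       // a?b
        System.out.println(replaceCntrl(zeroWidth).equals(zeroWidth));  // true (left untouched)
        System.out.println(replaceNonPrint(accented));                  // caf?
    }
}
```

The first pattern misses the invisible Unicode character, and the second one destroys a perfectly printable letter.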

Answer

Op De Cirkel · Jun 1, 2011
my_string.replaceAll("\\p{C}", "?");

See more about Unicode regex. Both java.util.regex.Pattern and String.replaceAll support Unicode categories such as \p{C}.
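
A minimal sketch of what this does in practice. \p{C} is the Unicode general category "Other": control, format, surrogate, private-use and unassigned code points. The class name UnicodeOtherDemo and both helper methods are hypothetical. The second helper is not part of the answer above, just one possible variant in case tabs and line breaks should survive, since \p{C} matches them too.

```java
public class UnicodeOtherDemo {
    // Replace every code point in Unicode general category C ("Other").
    static String replaceOther(String s) {
        return s.replaceAll("\\p{C}", "?");
    }

    // Hypothetical variant: same as above, but keep tab, line feed and carriage return.
    // Uses character-class intersection (&&) to subtract them from \p{C}.
    static String replaceOtherKeepWhitespace(String s) {
        return s.replaceAll("[\\p{C}&&[^\\t\\n\\r]]", "?");
    }

    public static void main(String[] args) {
        System.out.println(replaceOther("caf\u00E9"));      // accented letter is kept
        System.out.println(replaceOther("a\u200Bb"));       // a?b (zero width space is a format character)
        System.out.println(replaceOther("a\tb"));           // a?b (tab is a control character)
        System.out.println(replaceOtherKeepWhitespace("a\tb\u200Bc").equals("a\tb?c")); // true
    }
}
```

Note that \p{C} does not cover whitespace separators such as the no-break space (U+00A0), which belong to category Z, so those stay in the string.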