I have this input String (containing tabs, spaces and line breaks):
That is a test.
seems to work pretty good? working.
Another test again.
[Edit]: I should have provided the String for better testing, since Stack Overflow strips special characters such as tabs:
String testContent = "\n\t\n\t\t\t\n\t\t\tDas ist ein Test.\t\t\t \n\tsoweit scheint das \t\tganze zu? funktionieren.\n\n\n\n\t\t\n\t\t\n\t\t\t \n\t\t\t \n \t\t\t\n \tNoch ein Test.\n \t\n \t\n \t";
And I want to get this result:
That is a test.
seems to work pretty good? working.
Another test again.
String expectedOutput = "Das ist ein Test.\nsoweit scheint das ganze zu? funktionieren.\nNoch ein Test.\n";
Any ideas? Can this be achieved using regexes?
replaceAll("\\s+", " ") is NOT what I'm looking for. If this regex preserved exactly one newline from each run of existing ones, it would be perfect.
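For comparison, here is a minimal sketch (the class and method names are hypothetical) of what replaceAll("\\s+", " ") does to testContent: every whitespace run, newlines included, becomes a single space, so all three lines end up on one line with a leading and a trailing space.

```java
// Hypothetical demo class: collapsing every whitespace run to one space
// also removes the line breaks the question wants to keep.
class FlattenDemo {
    static final String TEST_CONTENT =
        "\n\t\n\t\t\t\n\t\t\tDas ist ein Test.\t\t\t \n\tsoweit scheint das \t\tganze zu? funktionieren.\n\n\n\n\t\t\n\t\t\n\t\t\t \n\t\t\t \n \t\t\t\n \tNoch ein Test.\n \t\n \t\n \t";

    static String flatten(String input) {
        // In Java regexes, \s matches [ \t\n\x0B\f\r], so newlines are collapsed too
        return input.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Brackets make the leading and trailing spaces visible
        System.out.println("[" + flatten(TEST_CONTENT) + "]");
    }
}
```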
I have tried the following, but it seems suboptimal to me:
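import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// readLine() throws IOException, so the enclosing method must declare or catch it
BufferedReader bufReader = new BufferedReader(new StringReader(testContent));
String line = null;
StringBuilder newString = new StringBuilder();
while ((line = bufReader.readLine()) != null) {
    // collapse every whitespace run in the line to one space
    String temp = line.replaceAll("\\s+", " ");
    // keep only lines that are not whitespace-only, trimmed, one per output line
    if (!temp.trim().equals("")) {
        newString.append(temp.trim());
        newString.append("\n");
    }
}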
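If it is mainly the verbosity of the loop that feels suboptimal, the same line-by-line idea can be written with BufferedReader.lines() (Java 8+), which wraps I/O errors in an unchecked exception instead of a checked IOException. This is a minimal sketch with hypothetical class and method names; it still runs one regex per line, so it is shorter rather than faster. On testContent it should produce exactly expectedOutput, trailing "\n" included.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.stream.Collectors;

// Sketch: the loop-based cleanup rewritten as a stream pipeline.
class LineBasedCleanup {
    static final String TEST_CONTENT =
        "\n\t\n\t\t\t\n\t\t\tDas ist ein Test.\t\t\t \n\tsoweit scheint das \t\tganze zu? funktionieren.\n\n\n\n\t\t\n\t\t\n\t\t\t \n\t\t\t \n \t\t\t\n \tNoch ein Test.\n \t\n \t\n \t";

    static String clean(String input) {
        // The reader is backed by a String, so there is no external resource to close
        return new BufferedReader(new StringReader(input)).lines()
                // collapse inner whitespace runs to one space, then trim the ends
                .map(line -> line.replaceAll("\\s+", " ").trim())
                // drop lines that contained only whitespace
                .filter(line -> !line.isEmpty())
                // re-append one newline per kept line, as the loop does
                .map(line -> line + "\n")
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.print(clean(TEST_CONTENT));
    }
}
```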
In a single regex (plus a small patch for tabs):
input.replaceAll("^\\s+|\\s+$|\\s*(\n)\\s*|(\\s)\\s*", "$1$2")
.replace("\t"," ");
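To try the one-liner end to end, here is a minimal runnable wrapper (the class and method names are hypothetical). Run on testContent from the question, it should return "Das ist ein Test.\nsoweit scheint das ganze zu? funktionieren.\nNoch ein Test.". That differs from expectedOutput only in the final "\n": the \s+$ branch deletes all trailing whitespace, newline included, so append "\n" afterwards if you need it.

```java
// Hypothetical wrapper around the answer's one-liner, run on the question's testContent.
class RegexCleanup {
    static final String TEST_CONTENT =
        "\n\t\n\t\t\t\n\t\t\tDas ist ein Test.\t\t\t \n\tsoweit scheint das \t\tganze zu? funktionieren.\n\n\n\n\t\t\n\t\t\n\t\t\t \n\t\t\t \n \t\t\t\n \tNoch ein Test.\n \t\n \t\n \t";

    static String clean(String input) {
        // Alternatives, in order: leading whitespace, trailing whitespace,
        // a run containing a newline (group 1), any other run (group 2 = its first char)
        return input.replaceAll("^\\s+|\\s+$|\\s*(\n)\\s*|(\\s)\\s*", "$1$2")
                    .replace("\t", " "); // a run starting with a tab leaves a tab behind
    }

    public static void main(String[] args) {
        System.out.println(clean(TEST_CONTENT));
    }
}
```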
The regex looks daunting, but it decomposes nicely into these parts, OR-ed together:
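^\s+ – match whitespace at the beginning;
\s+$ – match whitespace at the end;
\s*(\n)\s* – match whitespace containing a newline, and capture that newline;
(\s)\s* – match whitespace, capturing the first whitespace character.
Java tries the alternatives left to right, so a whitespace run that contains a newline is taken by the third part before the fourth can match it. The result is a match with two capture groups, of which at most one is non-empty at a time (neither is when the leading or trailing part matched, so that whitespace is simply deleted). This allows me to replace the match with "$1$2", which means "concatenate the two capture groups."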
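To see the "at most one group per match" property directly, this hypothetical helper (not part of the answer) walks the matches with Matcher.find() and reports which group, if any, participated in each one. For the input "  a \t b\n\t c  " it should report: no group for the leading run, group 2 (a space) for " \t ", group 1 for "\n\t ", and no group for the trailing run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: for every match of the answer's pattern, report which
// capture group participated, showing that at most one does.
class GroupInspector {
    static final Pattern WS = Pattern.compile("^\\s+|\\s+$|\\s*(\n)\\s*|(\\s)\\s*");

    static List<String> describe(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = WS.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add("group 1 (newline)");
            } else if (m.group(2) != null) {
                // Show the captured first character as a code point, e.g. U+0020 or U+0009
                out.add("group 2 (U+" + String.format("%04X", (int) m.group(2).charAt(0)) + ")");
            } else {
                out.add("no group (leading/trailing, deleted)");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("  a \t b\n\t c  "));
    }
}
```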
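The only remaining problem is that when a whitespace run without a newline starts with a tab, group 2 captures that tab, so "$1$2" writes a tab back instead of a space, and a plain replacement string cannot turn the captured character into something else. I fix that up afterwards with a simple non-regex character replacement, .replace("\t", " ").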
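As an alternative to the tab patch, and assuming Java 9 or later is available, Matcher.replaceAll(Function<MatchResult, String>) lets a function choose the replacement for each match, so inner runs can become a space directly. This is a swapped-in technique rather than the answer's method, and the class name is hypothetical. The returned strings contain no "$" or "\", so the replacement-string escaping rules do not get in the way.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch (Java 9+): same pattern as the answer, but a replacer function picks
// the output for each match, so no separate tab-to-space pass is needed.
class FunctionReplaceCleanup {
    private static final Pattern WS =
            Pattern.compile("^\\s+|\\s+$|\\s*(\n)\\s*|(\\s)\\s*");

    static String clean(String input) {
        Matcher m = WS.matcher(input);
        return m.replaceAll(r -> {
            if (r.group(1) != null) {
                return "\n"; // run containing a newline -> exactly one newline
            }
            if (r.group(2) != null) {
                return " ";  // inner run -> one space, even if it starts with a tab
            }
            return "";       // leading or trailing whitespace -> removed
        });
    }

    public static void main(String[] args) {
        System.out.println(clean("a\t\tb\n\t c"));
    }
}
```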