I have the content below in Java, and I want to strip only the HTML tags while keeping the newline characters:
<p>test1 <b>test2</b> test 3 </p> //line 1
<p>test4 </p> //line 2
If I open the above content in a rich text editor, line 1 and line 2 are displayed on separate lines (without showing the </p> tag). But in Notepad the content is shown along with the </p> tags. To remove all HTML tags I used
Jsoup.parse(aboveContent).text()
It removes all the HTML tags, but line 1 and line 2 then appear on the same line in Notepad. Somehow Jsoup also removes the newline character.
What I tried:
I also tried replacing </p> with \r\n and then removing the HTML tags with
Jsoup.parse(contentWith\r\n-Insteadof-</p>Tag ).text()
but Jsoup still removes the end-of-line characters (in the debugger I can see both line 1 and line 2 on the same line).
How can I make Jsoup strip only the HTML tags but not the newline characters?
You can also do this:
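import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Whitelist;

public static String cleanNoMarkup(String input) {
    // Disable pretty-printing so the serializer leaves existing whitespace alone
    final Document.OutputSettings outputSettings = new Document.OutputSettings().prettyPrint(false);
    // Whitelist.none() allows no tags at all; the empty string is the base URI
    return Jsoup.clean(input, "", Whitelist.none(), outputSettings);
}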
The important things here are:
1. Whitelist.none() - so no markup is allowed
2. .prettyPrint(false) - so line breaks are not removed
Note that newer jsoup releases rename the Whitelist class to Safelist.
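One caveat: prettyPrint(false) can only keep line breaks that are already in the input. If the paragraphs arrive on a single line with nothing between </p> and the next <p>, a pre-processing step along the lines of what the question attempted may help. The following is a minimal sketch, assuming only </p> and <br> should produce a line break. The class name BlockBreaks and the method markBlockEnds are hypothetical, and it uses only java.util.regex, not jsoup.

```java
import java.util.regex.Pattern;

final class BlockBreaks {
    // Matches closing </p> tags and <br>, <br/>, <br /> tags, case-insensitively.
    // Assumption: these are the only tags that should end a line.
    private static final Pattern BLOCK_END =
            Pattern.compile("</p\\s*>|<br\\s*/?>", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: keeps every tag in place and appends a
    // newline after each match ($0 is the matched tag itself).
    static String markBlockEnds(String html) {
        return BLOCK_END.matcher(html).replaceAll("$0\n");
    }

    public static void main(String[] args) {
        String html = "<p>test1 <b>test2</b> test 3 </p><p>test4 </p>";
        // The result still contains markup; it is meant to be passed on
        // to a tag-stripping step such as cleanNoMarkup.
        System.out.println(markBlockEnds(html));
    }
}
```

The output of markBlockEnds would then go through cleanNoMarkup, which strips the tags and leaves the inserted newlines in place. Doing the replacement before .text() does not help, which matches what the question observed.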