How to replace one or more \ in a string with just one \?

Destructor · May 26, 2014 · Viewed 13.5k times

Consider the string,

this\is\\a\new\\string

The output should be:

this\is\a\new\string

So basically, each run of one or more \ characters should be replaced with a single \. I tried the following:

str = str.replace("[\\]+","\")

but it didn't work. I used two \ in [\\]+ because internally \ is stored as \\. I know this may be a basic regex question; I can replace one or more ordinary letters, but not the \ character. Any help is really appreciated.

Answer

Pshemo · May 26, 2014

str.replace("[\\]+", "\") has a few problems:

  • replace doesn't use regex (replaceAll does), so "[\\]" represents the literal text [\], not \ or \\ (whichever you expected it to represent),
  • even if it did accept regex, "[\\]" would not be a valid regex: the regex it represents, [\], contains \], which escapes the ] and leaves the character class [ unclosed,
  • it will not compile (your replacement String is never closed).
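
The first two points can be checked directly. The following is a minimal sketch (the class and method names are made up for illustration): it shows that replace only matches the exact text [\]+, and that the same text is rejected when compiled as a regex.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LiteralReplaceDemo {
    // replace() treats its first argument as plain text, not as a regex.
    static String literalReplace(String input) {
        return input.replace("[\\]+", "-");
    }

    // Reports whether the given text compiles as a regular expression.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Backslashes in the input are left alone: nothing equals the text [\]+
        System.out.println(literalReplace("a\\\\b"));
        // Only the exact character sequence [\]+ is replaced
        System.out.println(literalReplace("x[\\]+y"));
        // As a regex, \] escapes ], so the character class is never closed
        System.out.println(isValidRegex("[\\]+"));
    }
}
```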

It will not compile because \ starts an escape sequence \X, where X is a character that is either

  • changed from a String special character into a plain literal; in your case \" escapes " so that it is a literal character (which you could print, for instance) instead of the start/end of the String,
  • changed from a normal character into a special one, as with the line separators \n and \r or the tab \t.

Now we know that \ is special and is used to escape other characters. So what do we need to do to make \ represent a literal (when we want to print \)? If you guessed that it needs to be escaped with another \, you are right. To create a \ literal we need to write it in a String as "\\".
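
As a quick check of that rule, here is a small sketch (the class and method names are illustrative only) showing that the two characters written in source produce a single backslash at run time.

```java
class BackslashLiteralDemo {
    // Returns a String holding exactly one backslash character.
    static String singleBackslash() {
        return "\\"; // two characters in source, one character at run time
    }

    public static void main(String[] args) {
        String s = singleBackslash();
        System.out.println(s);          // the backslash itself
        System.out.println(s.length()); // the number of characters in the String
    }
}
```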

Now that you know how to create a String containing a \ literal (an escaped \), you can start thinking about how to write your replacement.

A regex that matches one or more \ looks like

\\+

But that is its native form, and we need to create it using a String. I used \\ here because in regex \ is also a special character (for instance, \d represents a digit, not a \ literal followed by d), so it must be escaped to represent a \ literal. Just as in a String, we escape it with another \.

So the String representing this regex needs to be written as

"\\\\+" (we escaped \ twice, once in regex \\+ and once in string)

You can use it as the first argument of replaceAll (because replace, as mentioned earlier, doesn't accept regex).
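
To see the regex half working on its own, here is a sketch that uses a replacement containing no special characters. The forward slash is an arbitrary choice for the demonstration and not part of the question; the class and method names are hypothetical.

```java
class BackslashRegexDemo {
    // The regex \\+ written as a Java string literal: three characters at run time.
    static final String ONE_OR_MORE_BACKSLASHES = "\\\\+";

    // Replaces every run of backslashes with a single forward slash.
    static String toForwardSlashes(String input) {
        return input.replaceAll(ONE_OR_MORE_BACKSLASHES, "/");
    }

    public static void main(String[] args) {
        // The input literal below represents this\is\\a\new\\string
        System.out.println(toForwardSlashes("this\\is\\\\a\\new\\\\string"));
    }
}
```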

The last problem you will face is the second argument of the replaceAll method. If you write

replaceAll("\\\\+", "\\")

and a match for the regex is found, you will see the exception

java.lang.IllegalArgumentException: character to be escaped is missing
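
The behaviour can be reproduced with a sketch like the one below (names are illustrative). It assumes a reasonably recent Java version; the exception only appears when the regex actually matches, because the replacement is not processed otherwise.

```java
class ReplacementEscapeDemo {
    // Returns the exception message, or null if replaceAll completed normally.
    static String tryLoneBackslashReplacement(String input) {
        try {
            input.replaceAll("\\\\+", "\\");
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Input contains backslashes, so the replacement string gets parsed
        System.out.println(tryLoneBackslashReplacement("a\\\\b"));
        // No backslash in the input: no match, so no exception
        System.out.println(tryLoneBackslashReplacement("ab"));
    }
}
```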

This is because the replacement part (the second argument of replaceAll) can also contain the special form $x, which stands for the text matched by the group with index x. To be able to turn $ into a literal, an escape mechanism is needed, and again \ is used for that purpose. So \ is also special in the replacement part.
So, once more, to create a \ literal we need to escape it with another \, and the string literal representing the expression \\ is "\\\\".

But let's get back to the earlier exception: the message "character to be escaped is missing" refers to the X part of the \X form (X is the character we want to escape). The problem is that your replacement "\\" represented only the \ part, so the method expected a character to follow it, such as $ to create \$ or another \ to create a \ literal. So valid replacements would be "\\$" or "\\\\".
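
For illustration, here is a sketch of both replacement features: $x group references and \$ as a literal dollar sign. The inputs and method names are invented for the example and do not come from the question.

```java
class GroupReferenceDemo {
    // $2 and $1 in the replacement refer to the second and first captured groups.
    static String swapYearMonth(String input) {
        return input.replaceAll("(\\d+)-(\\d+)", "$2/$1");
    }

    // \$ in the replacement (written "\\$" in source) is a literal dollar sign,
    // followed here by $1, the text of group 1.
    static String prefixDollar(String input) {
        return input.replaceAll("(\\d+)", "\\$$1");
    }

    public static void main(String[] args) {
        System.out.println(swapYearMonth("2014-05"));
        System.out.println(prefixDollar("cost: 5"));
    }
}
```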


To make things work, you need to write your replacement call as

str = str.replaceAll("\\\\+", "\\\\")
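
Put together as a runnable sketch (the class and method names are hypothetical), using the string from the question. The second method is an alternative that lets Matcher.quoteReplacement do the replacement-side escaping, which some may find easier to read than counting backslashes.

```java
import java.util.regex.Matcher;

class CollapseBackslashes {
    // Regex \\+ matches one or more backslashes; replacement \\ is one literal backslash.
    static String collapse(String input) {
        return input.replaceAll("\\\\+", "\\\\");
    }

    // Same result, with quoteReplacement escaping the single backslash for us.
    static String collapseQuoted(String input) {
        return input.replaceAll("\\\\+", Matcher.quoteReplacement("\\"));
    }

    public static void main(String[] args) {
        // This literal represents this\is\\a\new\\string
        String str = "this\\is\\\\a\\new\\\\string";
        System.out.println(collapse(str));
        System.out.println(collapseQuoted(str));
    }
}
```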