How can I remove all leading and trailing punctuation?

user1618820 · Sep 20, 2012 · Viewed 11k times

I want to remove all the leading and trailing punctuation in a string. How can I do this?

Basically, I want to preserve punctuation in between words, and I need to remove all leading and trailing punctuation.

  1. ., @, _, &, /, - are allowed if surrounded by letters or digits
  2. ' is allowed if preceded by a letter or digit

I tried

Pattern p = Pattern.compile("(^\\p{Punct})|(\\p{Punct}$)");
Matcher m = p.matcher(term);
boolean a = m.find();
if(a)
    term=term.replaceAll("(^\\p{Punct})", "");

but it didn't work!!

Answer

K.L. · Sep 20, 2012

OK. So basically you want to find some pattern in your string and act if the pattern is matched.

Doing this the naive way would be tedious. The naive solution could involve something like

while (myString.startsWith(".") || myString.startsWith(",") || myString.startsWith(";") /* || ... */)
  myString = myString.substring(1);

If you wanted to do a slightly more complex task, it might even be impossible to do it the way I mentioned.

That's why we use regular expressions. A regular expression is a "language" with which you can define a pattern, and the computer can then tell whether a string matches that pattern. To learn about regular expressions, just search for them on Google. One of the first links: http://www.codeproject.com/Articles/9099/The-30-Minute-Regex-Tutorial

As for your problem, you could try this:

myString = myString.replaceFirst("^[^a-zA-Z]+", "");

The meaning of the regex:

  • The first ^ means that what comes next in the pattern has to be at the start of the string.

  • The [] define a set of characters. In this case, those are characters that are NOT (the second ^) letters (a-zA-Z).

  • The + sign means that the thing before it must occur at least once and may be repeated any number of times, so the whole run of non-letters is matched.
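To see the three parts of the pattern working together, here is a minimal sketch. The class name LeadingTrimDemo and the method name stripLeadingNonLetters are made up for illustration. Note that Java strings are immutable, so the result of replaceFirst has to be returned or assigned.

```java
class LeadingTrimDemo {
    // Hypothetical helper: removes everything before the first ASCII letter.
    static String stripLeadingNonLetters(String s) {
        // ^          anchors the match at the start of the string
        // [^a-zA-Z]  matches any single character that is not an ASCII letter
        // +          requires one or more such characters
        return s.replaceFirst("^[^a-zA-Z]+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingNonLetters("...hello, world!")); // hello, world!
        System.out.println(stripLeadingNonLetters("hello"));            // hello
        System.out.println(stripLeadingNonLetters("42 apples"));        // apples
    }
}
```

The last call shows a side effect worth knowing about: because digits and spaces are not in a-zA-Z, they are stripped as well. If leading digits should be kept, the character class could be widened to [^a-zA-Z0-9].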

You can use a similar regex to remove trailing chars.

myString = myString.replaceAll("[^a-zA-Z]+$", "");

The $ means "at the end of the string".
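The leading and trailing cases can also be combined into one pattern. The sketch below is one possible way to do it, not the only one: it uses \p{Punct} from the question instead of [^a-zA-Z], so that digits and whitespace are left alone. The names PunctuationTrimmer, EDGE_PUNCT and trimPunctuation are assumptions made for this example.

```java
import java.util.regex.Pattern;

class PunctuationTrimmer {
    // One or more punctuation characters at the start of the input,
    // or one or more punctuation characters at the end of it.
    private static final Pattern EDGE_PUNCT =
            Pattern.compile("^\\p{Punct}+|\\p{Punct}+$");

    // Hypothetical helper: strips leading and trailing ASCII punctuation,
    // leaving punctuation between other characters untouched.
    static String trimPunctuation(String term) {
        return EDGE_PUNCT.matcher(term).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(trimPunctuation("...user@example.com!!")); // user@example.com
        System.out.println(trimPunctuation("'don't'"));               // don't
        System.out.println(trimPunctuation("--a_b--"));               // a_b
        System.out.println(trimPunctuation("!!!").isEmpty());         // true
    }
}
```

Punctuation inside the term (the @ and . in the address, the apostrophe in don't, the underscore in a_b) survives because neither alternative can match away from the edges. A trailing apostrophe, as in dogs', is removed by this sketch; whether that is wanted depends on how rule 2 in the question is read.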