How to find a whole word in a String in Java

Nikola Yovchev · Feb 23, 2011 · Viewed 215.6k times

I have a String that I have to parse for different keywords. For example, I have the String:

"I will come and meet you at the 123woods"

And my keywords are

'123woods' 'woods'

I need to report every match and its position, including multiple occurrences of the same keyword. For the example above, however, I should get a match only on '123woods', not on 'woods', which rules out String.contains(). I should also be able to pass a list or set of keywords and check for all of them at once: with '123woods' and 'come', I should get two matches. The method should be reasonably fast on large texts.

My idea is to use StringTokenizer but I am unsure if it will perform well. Any suggestions?

Answer

Chris · Feb 23, 2011

The example below is based on your comments. It searches a given String for a List of keywords, using word boundaries (\b) so that only whole words match. StringUtils from Apache Commons Lang is used only to join the keywords into the regular expression; each match is then printed with matcher.group(1).

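import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringUtils; // Commons Lang 3; in Lang 2.x the package is org.apache.commons.lang

String text = "I will come and meet you at the woods 123woods and all the woods";

List<String> tokens = new ArrayList<String>();
tokens.add("123woods");
tokens.add("woods");

// \b on both sides of the alternation means "woods" cannot match inside "123woods".
String patternString = "\\b(" + StringUtils.join(tokens, "|") + ")\\b";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);

while (matcher.find()) {
    // group(1) is the keyword that matched at the current position
    System.out.println(matcher.group(1));
}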
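
The question also asks where each match occurs, and a keyword could contain regex metacharacters. Here is a minimal JDK-only sketch of the same word-boundary approach that covers both points: it escapes each keyword with Pattern.quote and records the start index from Matcher.start(). The class name WholeWordFinder, the method findWholeWords and the "keyword@index" output format are made up for this illustration. Keep in mind that \b treats letters, digits and the underscore as word characters, so keywords that begin or end with punctuation would need a different boundary rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
public class WholeWordFinder {

    // Returns one "keyword@startIndex" entry per whole-word occurrence, in text order.
    public static List<String> findWholeWords(String text, List<String> keywords) {
        List<String> hits = new ArrayList<>();
        if (keywords.isEmpty()) {
            return hits; // an empty alternation would match empty strings at boundaries
        }
        StringBuilder alternation = new StringBuilder();
        for (String keyword : keywords) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            // Pattern.quote makes the keyword a literal, so "." or "+" are not interpreted
            alternation.append(Pattern.quote(keyword));
        }
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b");
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            hits.add(matcher.group() + "@" + matcher.start());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "I will come and meet you at the 123woods";
        // prints [come@7, 123woods@32]; "woods" alone is not reported
        System.out.println(findWholeWords(text, Arrays.asList("123woods", "woods", "come")));
    }
}
```

Compiling the pattern once and reusing it across texts avoids rebuilding the alternation on every call.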

If you are looking for more performance, you could have a look at StringSearch: high-performance pattern matching algorithms in Java.
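
If the regex alternation turns out to be the bottleneck for a large keyword set, one alternative is a single pass over the text that looks each word up in a HashSet. This is not how StringSearch works and does not use its API; it is only a sketch. It assumes that a "word" is a maximal run of letters or digits and that every keyword is a single word. Unlike \b, it treats the underscore as a separator. The names WordScanner and scan are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the original answer or of StringSearch.
public class WordScanner {

    // Scans the text once; each maximal run of letters/digits is checked against the set.
    public static List<String> scan(String text, Set<String> keywords) {
        List<String> hits = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            // Skip separators (anything that is not a letter or digit)
            while (i < n && !Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume one whole word
            while (i < n && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                String word = text.substring(start, i);
                if (keywords.contains(word)) {
                    hits.add(word + "@" + start);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> keywords = new HashSet<>(Arrays.asList("123woods", "woods", "come"));
        // prints [come@7, 123woods@32]
        System.out.println(scan("I will come and meet you at the 123woods", keywords));
    }
}
```

The cost per word is one hash lookup regardless of how many keywords there are, at the price of a fixed definition of what counts as a word.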