What is the regex to extract all the emojis from a string?

Question 1

What is the regex to extract all the emojis from a string?

java regex utf-8 emoji

vishalaksh · Jul 19, 2014 · Viewed 84.5k times · Source

Answer


Using emoji-java, I wrote a simple method that removes all emojis, including Fitzpatrick modifiers. It requires an external library, but it is easier to maintain than those monster regexes.

Use:

import com.vdurmont.emoji.EmojiParser;

String input = "A string 😄with a \uD83D\uDC66\uD83C\uDFFFfew 😉emojis!";
String result = EmojiParser.removeAllEmojis(input);

emoji-java maven installation:

<dependency>
  <groupId>com.vdurmont</groupId>
  <artifactId>emoji-java</artifactId>
  <version>3.1.3</version>
</dependency>

gradle:

implementation 'com.vdurmont:emoji-java:3.1.3'

EDIT: previously submitted answer was pulled into emoji-java source code.

Question 2

I have a String encoded in UTF-8. For example:

Thats a nice joke 😆😆😆 😛

I have to extract all the emojis present in the sentence. And the emoji could be any

When this sentence is viewed in a terminal with the command less text.txt, it is displayed as:

Thats a nice joke <U+1F606><U+1F606><U+1F606> <U+1F61B>

These are the corresponding Unicode code points for the emojis. All the codes for emojis can be found at emojitracker.
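For illustration, the relationship between the emojis and the `<U+1F606>` notation can be sketched in Java. This is a minimal sketch, not anything `less` itself runs: the class name `CodePointDump` and the method `escapeNonAscii` are hypothetical names chosen here. It shows that the angle-bracket text is only a rendering of the code points, and that the string itself contains no literal `<`, `U`, or `+` characters.

```java
public class CodePointDump {
    // Renders every non-ASCII code point as <U+XXXX>, similar to the
    // notation shown by less above. ASCII characters are copied unchanged.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append(String.format("<U+%04X>", cp));
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // The emojis are written as surrogate-pair escapes so that the
        // source-file encoding does not matter: U+1F606 and U+1F61B.
        String s = "Thats a nice joke \uD83D\uDE06\uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B";
        System.out.println(escapeNonAscii(s));
    }
}
```

Each emoji here is a single code point above U+FFFF, which Java stores as two `char` values (a surrogate pair); iterating with `codePoints()` rather than `chars()` keeps each pair together.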

To find all the occurrences, I used the regular expression pattern (<U\+\w+?>), but it didn't work for the UTF-8 encoded string.

Following is my code:

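    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "Thats a nice joke 😆😆😆 😛";
    // Looks for the literal text "<U+...>", which the string does not contain
    Pattern pattern = Pattern.compile("(<U\\+\\w+?>)");
    Matcher matcher = pattern.matcher(s);
    List<String> matchList = new ArrayList<String>();

    while (matcher.find()) {
        matchList.add(matcher.group());
    }

    for (int i = 0; i < matchList.size(); i++) {
        System.out.println(matchList.get(i));
    }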

This PDF gives the range 1F300–1F5FF for Miscellaneous Symbols and Pictographs, so I want to capture any character lying within this range.

What is the regex to extract all the emojis from a string?

Answer
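
One possible approach, offered as a sketch rather than a complete emoji matcher: Java's `Pattern` accepts `\x{h...h}` for a code point given in hexadecimal, so a character class can match by code point range directly. The class name `EmojiExtractor` and the method `extractEmojis` are hypothetical names used for illustration. Note that the two emojis in the example sentence, U+1F606 and U+1F61B, fall in the Emoticons block (U+1F600–U+1F64F), not in 1F300–1F5FF, so the sketch includes both ranges. Many other emojis live outside these two blocks and would need further ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmojiExtractor {
    // \x{...} denotes a code point by its hexadecimal value.
    // U+1F300..U+1F5FF: Miscellaneous Symbols and Pictographs (range from the question).
    // U+1F600..U+1F64F: Emoticons (assumed addition; covers the example sentence).
    private static final Pattern EMOJI =
            Pattern.compile("[\\x{1F300}-\\x{1F5FF}\\x{1F600}-\\x{1F64F}]");

    // Returns every match in order of appearance, duplicates included.
    static List<String> extractEmojis(String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = EMOJI.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Same sentence as in the question, with the emojis written as
        // surrogate-pair escapes so the source-file encoding does not matter.
        String s = "Thats a nice joke \uD83D\uDE06\uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B";
        for (String emoji : extractEmojis(s)) {
            System.out.println(emoji);
        }
    }
}
```

For the example sentence this yields four matches, one per emoji. Each returned string has length 2 in `char` units because these code points are stored as surrogate pairs.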
