Regex matching emoticons

CBarr picture CBarr · Jan 21, 2015 · Viewed 10k times · Source

We are working on a project where we want users to be able to use both emoji syntax (like :smile:, :heart:, :confused:,:stuck_out_tongue:) as well as normal emoticons (like :), <3, :/, :p)

I'm having trouble with the emoticon syntax because sometimes those character sequences will occur in:

  • normal strings or URL's - http://example.com
  • within the emoji syntax - :pencil:

How can I find these emoticon character sequences but not when other characters are near them?

The entire regex I'm using for all the emoticons is huge, so here's a trimed down version:

(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)

You can play with a demo of it in action here: http://regexr.com/3a8o5

Answer

Jesse Sweetland picture Jesse Sweetland · Jan 21, 2015

Match emoji first (to take care of the :pencil: example) and then check for a terminating whitespace or newline:

(\:\w+\:|\<[\/\\]?3|[\(\)\\\D|\*\$][\-\^]?[\:\;\=]|[\:\;\=B8][\-\^]?[3DOPp\@\$\*\\\)\(\/\|])(?=\s|[\!\.\?]|$)

This regex matches the following (preferring emoji) returning the match in matching group 1:

:( :) :P :p :O :3 :| :/ :\ :$ :* :@
:-( :-) :-P :-p :-O :-3 :-| :-/ :-\ :-$ :-* :-@
:^( :^) :^P :^p :^O :^3 :^| :^/ :^\ :^$ :^* :^@
): (: $: *:
)-: (-: $-: *-:
)^: (^: $^: *^:
<3 </3 <\3
:smile: :hug: :pencil:

It also supports terminal punctuation as a delimiter in addition to white space.

You can see more details and test it here: https://regex101.com/r/aM3cU7/4