I am writing a hashtag scraper for Facebook, and every regex I come across to get hashtags seems to include punctuation as well as alphanumeric characters. Here's an example of what I would like:
Hello #world! I am #m4king a #fac_book scraper and would like a nice regular #expression.
I would like it to match world, m4king, fac and expression (note that I would like it to cut off if it reaches punctuation, including spaces). It would be nice if it didn't include the hash symbol, but it's not super important.
Just in case it's important, I will be using Ruby's String#scan method to grab possibly more than one tag.
Thanks heaps in advance!
A regex such as this: #([A-Za-z0-9]+) should match what you need and place it in a capture group. You can then access this group later. Maybe this will help shed some light on regular expressions (from a Ruby context).
The regex above starts matching when it finds a # character and collects any letters or digits that follow into a capture group. As soon as it reaches a character that is not a letter or a digit, the match stops. You end up with a group containing just the tag text, without the hash symbol.
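As a minimal sketch of how this could look with String#scan, using the sample sentence from the question (the variable names text and tags are only illustrative):

```ruby
# Sample text taken from the question.
text = "Hello #world! I am #m4king a #fac_book scraper and would like a nice regular #expression."

# When the pattern contains a capture group, String#scan returns an array
# of arrays (one inner array per match), so flatten it into a plain list.
tags = text.scan(/#([A-Za-z0-9]+)/).flatten

puts tags.inspect
# prints ["world", "m4king", "fac", "expression"]
```

Because the hash symbol sits outside the parentheses, it is not part of the captured text. The explicit character class is used rather than \w because \w also matches the underscore, which would turn fac into fac_book.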