An elegant way to get hashtags out of a string in Python?

Dan Abramov · Jun 13, 2011 · Viewed 26.4k times

I'm looking for a clean way to get a set (list, array, whatever) of words starting with # inside a given string.

In C#, I would write

var hashtags = input
    .Split (' ')
    .Where (s => s[0] == '#')
    .Select (s => s.Substring (1))
    .Distinct ();

What would comparably elegant code look like in Python?

EDIT

Sample input: "Hey guys! #stackoverflow really #rocks #rocks #announcement"
Expected output: ["stackoverflow", "rocks", "announcement"]

Answer

utdemir · Jun 13, 2011

Building on @inspectorG4dget's answer: if you don't want duplicates, you can use a set comprehension instead of a list comprehension.

>>> tags="Hey guys! #stackoverflow really #rocks #rocks #announcement"
>>> {tag.strip("#") for tag in tags.split() if tag.startswith("#")}
set(['announcement', 'rocks', 'stackoverflow'])

Note that the { } syntax for set comprehensions is only available starting with Python 2.7.
If you're working with an older version, pass the output of a list comprehension ([ ]) to the set function, as suggested by @Bertrand.
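
For reference, here is a rough sketch of both variants as functions. The names hashtags_set and hashtags_ordered are made up for illustration and don't come from any of the answers. The first wraps a list comprehension in set(), which doesn't need the { } syntax. The second keeps the order of first appearance, which a set does not guarantee, in case you need a list exactly like the expected output in the question. Both use word[1:] rather than strip("#"), so only the single leading # is dropped, like Substring (1) in the C# version; strip("#") would also remove any trailing or repeated # characters.

```python
def hashtags_set(text):
    """Unique hashtags; ordering is not guaranteed."""
    # set() around a list comprehension works without set-comprehension syntax.
    return set([word[1:] for word in text.split() if word.startswith("#")])


def hashtags_ordered(text):
    """Unique hashtags in order of first appearance."""
    seen = set()
    result = []
    for word in text.split():
        if word.startswith("#"):
            tag = word[1:]  # drop only the leading '#'
            if tag not in seen:
                seen.add(tag)
                result.append(tag)
    return result


sample = "Hey guys! #stackoverflow really #rocks #rocks #announcement"
print(sorted(hashtags_set(sample)))  # ['announcement', 'rocks', 'stackoverflow']
print(hashtags_ordered(sample))      # ['stackoverflow', 'rocks', 'announcement']
```

Like the original one-liner, both split on whitespace only, so a tag followed directly by punctuation (for example "#rocks!") keeps that punctuation.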