I am looking for a regex to extract the words that contain ONLY alphanumeric characters:
import re
string = 'This is a $dollar sign !!'
matches = re.findall(regex, string)
# desired result: matches == ['This', 'is', 'a', 'sign']
This can be done by tokenizing the string and evaluating each token individually with the following regex:
^[a-zA-Z0-9]+$
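For reference, the tokenize-then-check approach might look like the sketch below. It assumes "tokenizing" means splitting on whitespace, and `alnum_tokens` is a hypothetical helper name. `re.fullmatch` requires the whole token to match, which is what the `^...$` anchors express.

```python
import re

# Compiled once and reused for every token.
# fullmatch() succeeds only if the entire token matches the class.
ALNUM_TOKEN = re.compile(r'[a-zA-Z0-9]+')

def alnum_tokens(text):
    """Hypothetical helper: split on whitespace, keep tokens made only of ASCII letters/digits."""
    return [tok for tok in text.split() if ALNUM_TOKEN.fullmatch(tok)]

print(alnum_tokens('This is a $dollar sign !!'))  # ['This', 'is', 'a', 'sign']
```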
Due to performance issues, I want to be able to extract the alphanumeric tokens without tokenizing the whole string. The closest I got was
regex = r'\b[a-zA-Z0-9]+\b'
but it still extracts alphanumeric substrings of longer tokens, such as 'dollar' from '$dollar':
string = 'This is a $dollar sign !!'
matches = re.findall(regex, string)
# actual result: matches == ['This', 'is', 'a', 'dollar', 'sign']
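The reason 'dollar' slips through is that `\b` is a zero-width assertion matching wherever a word character (`\w`) meets a non-word character or the edge of the string. Because '$' is not a word character, there is a boundary between '$' and 'd'. A small illustration (the `pairs` variable is just for demonstration) that records the character preceding each match:

```python
import re

string = 'This is a $dollar sign !!'

# For each \b-delimited match, record the character just before it
# ('' when the match starts at the beginning of the string).
pairs = [
    (m.group(), string[m.start() - 1] if m.start() > 0 else '')
    for m in re.finditer(r'\b[a-zA-Z0-9]+\b', string)
]
print(pairs)
# [('This', ''), ('is', ' '), ('a', ' '), ('dollar', '$'), ('sign', ' ')]
```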
Is there a regex able to pull this off? I've tried different things but can't come up with a solution.
There is no need to use a regex for this; Python strings have a built-in isalnum() method. See below:
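string = 'This is a $dollar sign !!'
# split() with no argument splits on any run of whitespace (spaces, tabs, newlines)
matches = [word for word in string.split() if word.isalnum()]
# matches == ['This', 'is', 'a', 'sign']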
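If a single regex pass is still wanted, one option is to replace the `\b` anchors with lookarounds that require whitespace (or the string edge) on both sides: `(?<!\S)` means "not preceded by a non-whitespace character" and `(?!\S)` means "not followed by one". This is a sketch under the assumption that tokens are whitespace-delimited, as with `split()`; `WHOLE_ALNUM` is a hypothetical name. Note also that `str.isalnum()` is Unicode-aware, so it accepts words like 'café' or 'x²' that `[a-zA-Z0-9]` rejects; combining it with `str.isascii()` (Python 3.7+) makes the split-based version agree with the ASCII-only character class.

```python
import re

# (?<!\S): start of string or preceded by whitespace
# (?!\S):  end of string or followed by whitespace
# Unlike \b, this rejects '$dollar' because '$' is not whitespace.
WHOLE_ALNUM = re.compile(r'(?<!\S)[a-zA-Z0-9]+(?!\S)')

string = 'This is a $dollar sign !!'
print(WHOLE_ALNUM.findall(string))  # ['This', 'is', 'a', 'sign']

# Unicode vs. ASCII: isalnum() alone accepts non-ASCII letters and digits.
text = 'café 42 x² ok'
unicode_alnum = [w for w in text.split() if w.isalnum()]
ascii_alnum = [w for w in text.split() if w.isascii() and w.isalnum()]
print(unicode_alnum)               # ['café', '42', 'x²', 'ok']
print(ascii_alnum)                 # ['42', 'ok']
print(WHOLE_ALNUM.findall(text))   # ['42', 'ok']
```

Whether the regex actually beats `split()` plus `isalnum()` depends on the input; measuring both with `timeit` on representative data is the safest way to decide.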