Hello, I am new to regex and am starting out with Python. I'm stuck at extracting all the words from an English sentence. So far I have:
import re
shop = "hello seattle what have you got"
regex = r'(\w*) '
list1 = re.findall(regex, shop)
print(list1)
This gives the output:
['hello', 'seattle', 'what', 'have', 'you']
If I replace the regex with
regex = r'(\w*)\W*'
then the output is:
['hello', 'seattle', 'what', 'have', 'you', 'got', '']
whereas I want this output:
['hello', 'seattle', 'what', 'have', 'you', 'got']
Please point out where I am going wrong.
Use a word boundary, \b, with \w+ (one or more word characters) instead of \w*:
import re
shop = "hello seattle what have you got"
regex = r'\b\w+\b'
list1 = re.findall(regex, shop)
print(list1)
Output: ['hello', 'seattle', 'what', 'have', 'you', 'got']
Or simply \w+ on its own is enough:
import re
shop = "hello seattle what have you got"
regex = r'\w+'
list1 = re.findall(regex, shop)
print(list1)
Output: ['hello', 'seattle', 'what', 'have', 'you', 'got']
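As for where the original patterns go wrong, here is a small side-by-side sketch. The first pattern requires a literal space after each word, so the last word (which has no trailing space) is never matched. The second uses \w* and \W*, both of which can match zero characters, so there is one extra empty match at the very end of the string. The variable names below (with_space, star, bounded, plus) are only illustrative and are not from the question.

```python
import re

shop = "hello seattle what have you got"

# Requires a space after each word, so the final word "got" is missed.
with_space = re.findall(r'(\w*) ', shop)

# \w* and \W* may both match nothing, so an empty match
# is found at the end of the string after "got".
star = re.findall(r'(\w*)\W*', shop)

# \w+ needs at least one word character, so empty matches cannot occur.
bounded = re.findall(r'\b\w+\b', shop)
plus = re.findall(r'\w+', shop)

print(with_space)  # ['hello', 'seattle', 'what', 'have', 'you']
print(star)        # ['hello', 'seattle', 'what', 'have', 'you', 'got', '']
print(bounded)     # ['hello', 'seattle', 'what', 'have', 'you', 'got']
print(plus)        # ['hello', 'seattle', 'what', 'have', 'you', 'got']
```

Because \w+ is greedy, it always consumes a whole run of word characters, which is why adding \b around it does not change the result here.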