Say I have a string s containing letters and two delimiters, 1 and 2. I want to split the string in the following way: if a substring t falls between 1 and 2, return t; otherwise, return each letter on its own.
So if s = 'ab1cd2efg1hij2k', the expected output is ['a', 'b', 'cd', 'e', 'f', 'g', 'hij', 'k'].
I tried to use regular expressions:
import re
s = 'ab1cd2efg1hij2k'
re.findall( r'(1([a-z]+)2|[a-z])', s )
[('a', ''),
('b', ''),
('1cd2', 'cd'),
('e', ''),
('f', ''),
('g', ''),
('1hij2', 'hij'),
('k', '')]
From there I can do [ x[x[-1]!=''] for x in re.findall( r'(1([a-z]+)2|[a-z])', s ) ] to get my answer, but I still don't understand the output. The documentation says that findall returns a list of tuples if the pattern has more than one group. However, my pattern only contains one group. Any explanation is welcome.
Your pattern has two groups. Every opening parenthesis starts a capturing group, so there is the bigger, outer group:
(1([a-z]+)2|[a-z])
and the second, smaller group, which is nested inside your first group:
([a-z]+)
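To see the group count for yourself, a quick check along these lines may help. It is only a sketch: the variable names (n_groups, two_groups, one_group, no_group) are made up for illustration, and the two shorter patterns are variations I wrote to show how the return type of findall changes with the number of groups.

```python
import re

s = 'ab1cd2efg1hij2k'

# The pattern from the question: the outer parentheses and the inner
# ([a-z]+) each count as one capturing group.
pattern = re.compile(r'(1([a-z]+)2|[a-z])')
n_groups = pattern.groups  # number of capturing groups in the pattern

# Two groups: findall returns a list of tuples, one slot per group.
two_groups = pattern.findall(s)[:3]

# One group: findall returns a list of strings (the group's text).
one_group = re.findall(r'1([a-z]+)2', s)

# No groups: findall returns a list of the whole matches.
no_group = re.findall(r'1[a-z]+2|[a-z]', s)

print(n_groups)
print(two_groups)
print(one_group)
print(no_group)
```

The empty strings in the tuples come from the inner group not taking part in the match whenever the single-letter branch is the one that matched.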
Here is a solution that gives you the expected result. Mind you, it is really ugly and there is probably a better way; I just can't figure it out:
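import re
s = 'ab1cd2efg1hij2k'
# Three groups per match: the outer group, the delimited run, the single letter.
a = re.findall(r'((?:1)([a-z]+)(?:2)|([a-z]))', s)
# Drop the empty strings from each tuple and keep the last remaining item.
a = [tuple(j for j in i if j)[-1] for i in a]
print(a)
# prints ['a', 'b', 'cd', 'e', 'f', 'g', 'hij', 'k']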
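One possibly tidier variant, offered as a sketch rather than the definitive way: drop the outer group so that the pattern has just two sibling groups, then use re.finditer and take whichever group took part in each match. The names pattern and result are only illustrative.

```python
import re

s = 'ab1cd2efg1hij2k'

# Two sibling groups and no outer group:
#   group 1 captures the letters between the delimiters 1 and 2,
#   group 2 captures a single letter outside the delimiters.
pattern = re.compile(r'1([a-z]+)2|([a-z])')

# For each match exactly one of the two groups took part; the other is None,
# so "or" picks the one that matched.
result = [m.group(1) or m.group(2) for m in pattern.finditer(s)]

print(result)
# prints ['a', 'b', 'cd', 'e', 'f', 'g', 'hij', 'k']
```

This relies on both groups requiring at least one character, so a group that matched is never an empty string and the "or" fallback is safe.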