re.sub('a(b)','d','abc') yields dc, not adc.
Why does re.sub replace the entire match, instead of just the capturing group (b)?
Because it's supposed to replace the whole occurrence of the pattern:
Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl.
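A quick way to see this, as a minimal sketch using only the standard re module (the variable names are just for illustration): the "occurrence of the pattern" is group 0 of the match, and the capturing group is only a part of it.

```python
import re

# Inspect what the pattern actually matches in 'abc'.
m = re.search('a(b)', 'abc')

whole_match = m.group(0)  # the full occurrence of the pattern: 'ab'
group_one = m.group(1)    # only the capturing group: 'b'

# re.sub replaces the full occurrence (group 0), so 'ab' becomes 'd'.
result = re.sub('a(b)', 'd', 'abc')
print(whole_match, group_one, result)
```

Since re.sub works on group 0, the a is consumed together with the b.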
If it were to replace only some subgroup, then complex regexes with several groups wouldn't work. There are several possible solutions:
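1. Spell out the unchanged part in both the pattern and the replacement: re.sub('ab', 'ad', 'abc') - my favorite, as it's very readable and explicit.
2. Put the unchanged part in its own capturing group and refer back to it in the replacement: re.sub('(a)b', r'\1d', 'abc')
3. Pass a function as the repl argument and make it process the Match object and return the required result.
4. Use a lookaround, which constrains the match without becoming part of it: re.sub('(?<=a)b', r'd', 'abxb') yields adxb. The ?<= at the beginning of the group says "it's a lookbehind".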
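The options above can be compared side by side. This is a rough sketch rather than a definitive recipe; the helper name replace_group_one is made up for illustration, and it assumes group 1 is the part to be replaced.

```python
import re

# Spell out the unchanged 'a' in both pattern and replacement.
explicit = re.sub('ab', 'ad', 'abc')

# Capture the unchanged part and refer back to it with \1.
backref = re.sub('(a)b', r'\1d', 'abc')

# Pass a function as repl (hypothetical helper, assumes group 1 is the target).
def replace_group_one(match):
    # Rebuild the matched text, swapping only group 1 for 'd'.
    text = match.group(0)
    offset = match.start()       # where the whole match begins in the string
    start, end = match.span(1)   # where group 1 sits in the string
    return text[:start - offset] + 'd' + text[end - offset:]

via_function = re.sub('a(b)', replace_group_one, 'abc')

# Lookbehind: 'b' is matched only when preceded by 'a'; 'a' itself is not consumed.
lookbehind = re.sub('(?<=a)b', 'd', 'abxb')

print(explicit, backref, via_function, lookbehind)
```

The function form is the most verbose, but it is the only one of these that leaves the original pattern 'a(b)' untouched.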