My program looks something like this:
import re
# Escape the string, in case it happens to have re metacharacters
my_str = "The quick brown fox jumped"
escaped_str = re.escape(my_str)
# "The\\ quick\\ brown\\ fox\\ jumped"
# Replace escaped space patterns with a generic white space pattern
spaced_pattern = re.sub(r"\\\s+", r"\s+", escaped_str)
# Raises error
The error is this:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/home/swfarnsworth/programs/pycharm-2019.2/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/home/swfarnsworth/programs/pycharm-2019.2/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/swfarnsworth/projects/medaCy/medacy/tools/converters/con_to_brat.py", line 255, in <module>
content = convert_con_to_brat(full_file_path)
File "/home/swfarnsworth/projects/my_file.py", line 191, in convert_con_to_brat
start_ind = get_absolute_index(text_lines, d["start_ind"], d["data_item"])
File "/home/swfarnsworth/projects/my_file.py", line 122, in get_absolute_index
entity_pattern_spaced = re.sub(r"\\\s+", r"\s+", entity_pattern_escaped)
File "/usr/local/lib/python3.7/re.py", line 192, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "/usr/local/lib/python3.7/re.py", line 309, in _subx
template = _compile_repl(template, pattern)
File "/usr/local/lib/python3.7/re.py", line 300, in _compile_repl
return sre_parse.parse_template(repl, pattern)
File "/usr/local/lib/python3.7/sre_parse.py", line 1024, in parse_template
raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \s at position 0
I get this error even if I remove the two backslashes before the '\s+' or if I make the raw string (r"\\\s+") into a regular string. I checked the Python 3.7 documentation, and it appears that \s is still the escape sequence for white space.
Escape the backslash in the replacement string so that re.sub does not try to interpret \s as an escape sequence:
spaced_pattern = re.sub(r"\\\s+", r"\\s+", escaped_str)
Now:
>>> spaced_pattern
'The\\s+quick\\s+brown\\s+fox\\s+jumped'
>>> print(spaced_pattern)
The\s+quick\s+brown\s+fox\s+jumped
It seems that re.sub interprets \s in the replacement the same way it would interpret r"\n", instead of leaving it alone as Python normally does with unknown escapes. For example:
re.sub(r"\\\s+", r"\n+", escaped_str)
yields:
The
+quick
+brown
+fox
+jumped
even though \n was written in a raw string.
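Since it is the replacement template that gets interpreted, another way around the problem is to pass a function as the replacement: re.sub inserts a function's return value as-is, without processing backslash escapes. A minimal sketch, reusing the variable names from the question:

```python
import re

my_str = "The quick brown fox jumped"
escaped_str = re.escape(my_str)

# A callable replacement is not parsed as a template, so its return value
# r"\s+" is inserted literally and needs no extra backslash.
spaced_pattern = re.sub(r"\\\s+", lambda match: r"\s+", escaped_str)

print(spaced_pattern)

# The resulting pattern accepts any run of whitespace between the words.
matches = bool(re.fullmatch(spaced_pattern, "The  quick\tbrown\nfox jumped"))
print(matches)
```

This avoids having to think about two layers of escaping (the string literal and the replacement template) at the same time.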
The change was introduced in Issue #27030: Unknown escapes consisting of '\' and ASCII letter in regular expressions now are errors.
The code that does the replacement is in sre_parse.py (Python 3.7):
else:
    try:
        this = chr(ESCAPES[this][1])
    except KeyError:
        if c in ASCIILETTERS:
            raise s.error('bad escape %s' % this, len(this))
This code looks at what follows a literal \ in the replacement and tries to replace the pair with the character it stands for (for example, \n becomes a newline). \s is not in the ESCAPES dictionary, so a KeyError is raised, and because s is an ASCII letter you get the error message you're seeing.
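The behaviour can be checked in isolation with a small sketch like the following, assuming Python 3.7 or later. The helper name try_template is hypothetical and only there for illustration:

```python
import re

def try_template(repl):
    """Return the substitution result, or the re.error message if repl is rejected."""
    try:
        return re.sub("-", repl, "a-b")
    except re.error as exc:
        return "error: %s" % exc

# Known escape: the template turns \n into a real newline.
print(repr(try_template(r"\n")))

# Unknown escape made of a backslash and an ASCII letter: rejected.
print(repr(try_template(r"\s")))

# Escaped backslash: the template yields a literal backslash followed by s.
print(repr(try_template(r"\\s")))
```

On Python 3.7 and later the second call should report the same "bad escape" message as in the traceback above, while the other two succeed.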
On previous versions it just issued a warning:
import warnings
warnings.warn('bad escape %s' % this,
              DeprecationWarning, stacklevel=4)
It looks like we're not the only ones bitten by the 3.6 to 3.7 upgrade: https://github.com/gi0baro/weppy/issues/227