Escaping strings for gsub

theta · Mar 20, 2012 · Viewed 19.4k times

I read a file:

local logfile = assert(io.open("log.txt", "r"))
local data = logfile:read("*a")
logfile:close()
print(data)

output:

...
"(\.)\n(\w)", r"\1 \2"
"\n[^\t]", "", x, re.S
...

Yes, the logfile looks awful, as it's full of various commands.

How can I call gsub to remove a line such as "(\.)\n(\w)", r"\1 \2" from the data variable?

The snippet below does not work:

s='"(\.)\n(\w)", r"\1 \2"'
data=data:gsub(s, '')

I guess some escaping needs to be done. Any easy solution?
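Part of the trouble may lie in the string literal itself, before gsub is even involved: in a quoted Lua string, sequences such as \n are translated by the parser, whereas a long-bracket string keeps backslashes exactly as written. A minimal sketch of the difference (the variable names are illustrative only):

```lua
-- In a quoted literal, escape sequences are processed by the Lua parser.
local quoted = "a\nb"      -- 3 bytes: 'a', newline, 'b'
-- In a long-bracket literal, backslashes are kept verbatim.
local raw = [[a\nb]]       -- 4 bytes: 'a', backslash, 'n', 'b'

print(#quoted, #raw)
print(raw:byte(2) == ("\\"):byte())
```

So to match text that contains literal backslashes, the search string probably wants long brackets (or doubled backslashes), in addition to escaping the pattern's magic characters.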


Update:

local data = [["(\.)\n(\w)", r"\1 \2"
"\n[^\t]", "", x, re.S]]

local s = [["(\.)\n(\w)", r"\1 \2"]]

local function esc(x)
   return (x:gsub('%%', '%%%%')
            :gsub('^%^', '%%^')
            :gsub('%$$', '%%$')
            :gsub('%(', '%%(')
            :gsub('%)', '%%)')
            :gsub('%.', '%%.')
            :gsub('%[', '%%[')
            :gsub('%]', '%%]')
            :gsub('%*', '%%*')
            :gsub('%+', '%%+')
            :gsub('%-', '%%-')
            :gsub('%?', '%%?'))
end

print(data:gsub(esc(s), ''))

This seems to work fine, except that I also need to escape the escape character % itself, since it won't work if the matched string contains %. I tried :gsub('%%', '%%%%') and :gsub('\%', '\%\%'), but neither worked.


Update 2:

OK, % can be escaped this way, as long as it is handled first in the chain above, which I have just corrected.
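The reason the order matters can be sketched with two cut-down helpers (hypothetical names, handling only % and . for brevity). If % is escaped last, the % signs inserted by the earlier steps get doubled as well, and the resulting pattern no longer matches the original text:

```lua
-- Hypothetical helpers, for illustration only: both escape "%" and ".",
-- but in a different order.
local function percent_first(x)
   return (x:gsub('%%', '%%%%'):gsub('%.', '%%.'))
end

local function percent_last(x)
   return (x:gsub('%.', '%%.'):gsub('%%', '%%%%'))
end

local text = '50%.'
print(percent_first(text))             -- 50%%%.
print(percent_last(text))              -- 50%%%%.
print(text:find(percent_first(text)))  -- matches the whole string
print(text:find(percent_last(text)))   -- nil: the pattern is now wrong
```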

:terrible experience:

Update 3:

Escaping of ^ and $

As stated in the Lua manual (5.1, 5.2, 5.3):

A caret ^ at the beginning of a pattern anchors the match at the beginning of the subject string. A $ at the end of a pattern anchors the match at the end of the subject string. At other positions, ^ and $ have no special meaning and represent themselves.

So a better idea would be to escape ^ and $ only when they are found at the beginning or at the end of the string, respectively.
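The anchoring behaviour quoted from the manual can be checked with a few string.find calls. This is only a small sketch, with the subject strings chosen for illustration:

```lua
-- "^" in the middle of a pattern is an ordinary character.
print(("a^b"):find("a^b"))   -- 1  3
-- "^" at the start is an anchor, so this looks for "b" at position 1.
print(("a^b"):find("^b"))    -- nil
-- Escaped, it matches the literal caret.
print(("a^b"):find("%^b"))   -- 2  3

-- "$" in the middle is literal; at the end it anchors the match.
print(("a$b"):find("a$b"))   -- 1  3
print(("a$b"):find("a$"))    -- nil
print(("a$b"):find("a%$"))   -- 1  2
```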

Lua 5.1 - 5.2+ incompatibilities

string.gsub now raises an error if the replacement string contains a % followed by a character other than the permitted % or digit.

There is no need to double every % in the replacement string. See lua-users.
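The version difference can be observed with pcall. A small sketch, assuming a stock interpreter: on Lua 5.1 the call is expected to succeed, while on 5.2 and later it is expected to fail with an error about the replacement string:

```lua
-- "%x" is not a valid escape in a replacement string.
local ok, result = pcall(string.gsub, "abc", "b", "%x")
print(_VERSION, ok, result)

-- "%%" (a literal percent sign) and "%1" (the first capture)
-- are accepted by every version.
print((("abc"):gsub("(b)", "%%%1")))   -- a%bc
```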

Answer

FSMaxB · Jan 22, 2016

According to Programming in Lua:

The character `%´ works as an escape for those magic characters. So, '%.' matches a dot; '%%' matches the character `%´ itself. You can use the escape `%´ not only for the magic characters, but also for all other non-alphanumeric characters. When in doubt, play safe and put an escape.

Doesn't this mean that you can simply put % in front of every non-alphanumeric character and be fine? This would also be future-proof (in case new special characters are introduced). Like this:

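function escape_pattern(text)
    -- Prefix every non-alphanumeric character with "%".
    -- The outer parentheses drop gsub's second return value (the match count).
    return (text:gsub("([^%w])", "%%%1"))
end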

It worked for me on Lua 5.3.2 (only rudimentary testing was performed). Not sure if it will work with older versions.
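As a usage sketch, here is that function applied to the data from the question. The extra parentheses around the gsub call are an addition made here so that only the escaped string is returned; the variable names cleaned and count are illustrative:

```lua
local function escape_pattern(text)
    -- Extra parentheses keep only the string and drop gsub's match count.
    return (text:gsub("([^%w])", "%%%1"))
end

local data = [["(\.)\n(\w)", r"\1 \2"
"\n[^\t]", "", x, re.S]]
local s = [["(\.)\n(\w)", r"\1 \2"]]

-- Remove the first line's text; the newline that followed it is left in place.
local cleaned, count = data:gsub(escape_pattern(s), "")
print(count)
print(cleaned)
```

Note that this removes only the matched text, so the line's trailing newline stays in data unless it is appended to the pattern as well.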