Remove escape sequence characters like newline, tab and carriage return from JSON file

user3792699 · Oct 29, 2016

I have a JSON file with 80+ fields. When I extract the message field from the JSON file shown below using jq, the output contains newline and tab characters. I want to remove these escape sequence characters. I tried using sed, but it did not work.

Sample JSON file:

{
"HOSTNAME":"server1.example",
"level":"WARN",
"level_value":30000,
"logger_name":"server1.example.adapter",
"content":{"message":"ERROR LALALLA\nERROR INFO NANANAN\tSOME MORE ERROR INFO\nBABABABABABBA\n BABABABA\t ABABBABAA\n\n BABABABAB\n\n"}
}

Can anyone help me with this?

Answer

mklement0 · Oct 29, 2016

A pure jq solution:

$ jq -r '.content.message | gsub("[\\n\\t]"; "")' file.json
ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB

If you want to keep the enclosing " characters, omit -r.

Note: peak's helpful answer contains a generalized regular expression that matches all control characters in the ASCII and Latin-1 Unicode range by way of a Unicode category specifier, \p{Cc}. jq uses the Oniguruma regex engine.
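
For reference, a minimal sketch of that generalized approach, assuming a jq build with Oniguruma regex support. The name strip_cc is a hypothetical helper, and the input is a made-up sample that also contains a carriage return:

```shell
# Sketch: remove every control character (Unicode category Cc) from the field.
# strip_cc is a hypothetical helper name; it reads a JSON document on stdin.
strip_cc() {
  # "\\p{Cc}" inside the jq string literal becomes the regex \p{Cc}.
  jq -r '.content.message | gsub("\\p{Cc}"; "")'
}

# Made-up sample input; printf '%s' passes the \n, \t and \r escapes through
# untouched, so jq sees them as JSON escape sequences.
printf '%s\n' '{"content":{"message":"A\nB\tC\r\nD"}}' | strip_cc
```

With a regex-enabled jq, this should print ABCD, since \r is removed along with \n and \t.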


Other solutions use an additional utility, such as sed or tr.

Using sed to unconditionally remove the escape sequences \n and \t:

$ jq '.content.message' file.json | sed 's/\\[tn]//g'
"ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB"

Note that the enclosing " are still there, however. To remove them, add another substitution to the sed command:

$ jq '.content.message' file.json | sed 's/\\[tn]//g; s/"\(.*\)"/\1/'
ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB
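
One caveat with "unconditionally", sketched here with a hypothetical message that holds a Windows-style path: sed works on the JSON-encoded text, so it cannot tell an escaped backslash followed by n from a genuine \n escape sequence.

```shell
# Hypothetical JSON-encoded string, as jq would print it without -r.
# The decoded value is the path C:\new (the backslash is escaped as \\).
encoded='"C:\\new"'

# The same substitution as above also matches the second backslash plus "n",
# so it eats part of the path instead of leaving it alone.
printf '%s\n' "$encoded" | sed 's/\\[tn]//g'   # prints "C:\ew"
```

If the messages may contain literal backslashes, the pure jq solution or the tr variant is the safer choice, because both operate on the decoded string.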

A simpler option that also removes the enclosing " (note: output has no trailing \n):

$ jq -r '.content.message' file.json | tr -d '\n\t'
ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB

Note how -r is used to make jq output the raw string (decoding the \n and \t escape sequences into actual newline and tab characters), which tr then removes as literal characters.
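
The difference between the decoded and the JSON-encoded form can be reproduced without jq. The following is only a sketch, with printf standing in for jq's output and made-up sample text:

```shell
# Decoded text, standing in for what jq -r emits: real newline and tab characters.
# tr deletes them all, including the trailing newline.
printf 'A\nB\tC\n' | tr -d '\n\t'          # prints ABC (no trailing newline)
echo                                        # supply a trailing newline for readability

# Encoded text, standing in for what jq emits without -r: backslash + n stays
# as two ordinary characters, so tr only deletes the trailing newline.
printf '%s\n' '"A\nB\tC"' | tr -d '\n\t'   # prints "A\nB\tC" (no trailing newline)
echo
```

This is why the tr variant needs -r, while the sed variant needs the encoded output.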