I have a JSON with 80+ fields. While extracting the message field in the below mentioned JSON file using jq, I'm getting newline characters and tab spaces. I want to remove the escape sequence characters and I have tried it using sed, but it did not work.
Sample JSON file:
{
"HOSTNAME":"server1.example",
"level":"WARN",
"level_value":30000,
"logger_name":"server1.example.adapter",
"content":{"message":"ERROR LALALLA\nERROR INFO NANANAN\tSOME MORE ERROR INFO\nBABABABABABBA\n BABABABA\t ABABBABAA\n\n BABABABAB\n\n"}
}
Can anyone help me on this?
A pure jq
solution:
$ jq -r '.content.message | gsub("[\\n\\t]"; "")' file.json
ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB
If you want to keep the enlosing "
characters, omit -r
.
Note: peak's helpful answer contains a generalized regular expression that matches all control characters in the ASCII and Latin-1 Unicode range by way of a Unicode category specifier, \p{Cc}
. jq
uses the Oniguruma regex engine.
Other solutions, using an additional utility, such as sed
and tr
.
Using sed
to unconditionally remove escape sequences \n
and t
:
$ jq '.content.message' file.json | sed 's/\\[tn]//g'
"ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB"
Note that the enclosing "
are still there, however.
To remove them, add another substitution to the sed
command:
$ jq '.content.message' file.json | sed 's/\\[tn]//g; s/"\(.*\)"/\1/'
ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB
A simpler option that also removes the enclosing "
(note: output has no trailing \n
):
$ jq -r '.content.message' file.json | tr -d '\n\t'
ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB
Note how -r
is used to make jq
interpolate the string (expanding the \n
and \t
sequences), which are then removed - as literals - by tr
.