Create JSON using jq from pipe-separated keys and values in bash

michael_65 · Aug 9, 2016 · Viewed 38.5k times

I am trying to create a JSON object from a string in bash. The string is as follows:

CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0

The output is from the docker stats command, and my end goal is to publish custom metrics to AWS CloudWatch. I would like to format this string as JSON:

{
    "CONTAINER":"nginx_container",
    "CPU%":"0.02%", 
    ....
}

I have used the jq command before, and it seems like it should work well in this case, but I have not been able to come up with a good solution yet, other than hardcoding variable names, indexing with sed or awk, and then building the JSON from scratch. Any suggestions would be appreciated. Thanks.

Answer

Charles Duffy · Aug 10, 2016

Prerequisite

For all of the below, it's assumed that your content is in a shell variable named s:

s='CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0'

What (modern jq)

# thanks to @JeffMercado and @chepner for refinements, see comments
jq -Rn '
( input  | split("|") ) as $keys |
( inputs | split("|") ) as $vals |
[[$keys, $vals] | transpose[] | {key:.[0],value:.[1]}] | from_entries
' <<<"$s"

How (modern jq)

This requires a recent jq (1.5 or newer, the release that added input and inputs) and is a dense chunk of code. To break it down:

  • Using -n stops jq from reading stdin on its own (it starts from a single null input instead), leaving the entire input stream available to input and inputs: the former reads a single line, and the latter reads all remaining lines. (-R, for raw input, causes textual lines rather than JSON values to be read.)
  • With [$keys, $vals] | transpose[], we're generating [key, value] pairs (in Python terms, zipping the two lists).
  • With {key:.[0],value:.[1]}, we're turning each [key, value] pair into an object of the form {"key": key, "value": value}.
  • With from_entries, we're merging that list of {key, value} objects into a single object containing those keys and values.
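To see what each stage produces, here is a minimal sketch (assuming bash and jq 1.5 or newer) that runs the stages separately on the sample input, using -c for compact one-line output. For simplicity it reads the data row with a second input rather than inputs, so unlike the full filter it only handles a single data row.

```shell
#!/usr/bin/env bash
# Same sample input as in the Prerequisite section
s='CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0'

# Stage 1: -R makes each line a raw string; split("|") turns it into an array
jq -Rc 'split("|")' <<<"$s"

# Stage 2: read header and data row, then zip them into [key, value] pairs
pairs=$(jq -Rnc '[(input | split("|")), (input | split("|"))] | transpose' <<<"$s")
echo "$pairs"

# Stage 3: turn each pair into {key, value}, then merge them with from_entries
final=$(jq -c 'map({key: .[0], value: .[1]}) | from_entries' <<<"$pairs")
echo "$final"
```

The last line prints the same object as the full filter, just compacted onto one line.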

What (shell-assisted)

This will work with a significantly older jq than the above, and is an easily adapted approach for scenarios where a native-jq solution is harder to wrangle:

{
  IFS='|' read -r -a keys # read first line into an array of strings

  ## read each subsequent line into an array named "values"
  while IFS='|' read -r -a values; do

    # setup: --arg options pass keys/values in as literal variables; the query code references them
    jq_args=( )
    jq_query='.'

    # copy values into the arguments, reference them from the generated code
    for idx in "${!values[@]}"; do
      [[ ${keys[$idx]} ]] || continue # skip values with no corresponding key
      jq_args+=( --arg "key$idx"   "${keys[$idx]}"   )
      jq_args+=( --arg "value$idx" "${values[$idx]}" )
      jq_query+=" | .[\$key${idx}]=\$value${idx}"
    done

    # run the generated command
    jq "${jq_args[@]}" "$jq_query" <<<'{}'
  done
} <<<"$s"

How (shell-assisted)

The invoked jq command from the above is similar to:

jq --arg key0   'CONTAINER' \
   --arg value0 'nginx_container' \
   --arg key1   'CPU%' \
   --arg value1 '0.02%' \
   --arg key2   'MEMUSAGE/LIMIT' \
   --arg value2 '25.09MiB/15.26GiB' \
   '. | .[$key0]=$value0 | .[$key1]=$value1 | .[$key2]=$value2' \
   <<<'{}'

...passing each key and value out-of-band (such that it's treated as a literal string rather than parsed as JSON), then referring to them individually.
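A quick way to confirm that --arg values are never parsed as JSON is to pass something that merely looks like JSON and watch it come back as an escaped string. This is a small sketch assuming bash and jq; the key and value are made-up test data.

```shell
#!/usr/bin/env bash
# A value that looks like a JSON object (made-up test data)
raw='{"not":"parsed"}'

# --arg binds $v to the literal string, so jq escapes it rather than parsing it
out=$(jq -nc --arg k 'CPU%' --arg v "$raw" '{} | .[$k] = $v')
echo "$out"   # {"CPU%":"{\"not\":\"parsed\"}"}
```

Reading the field back with jq -r returns the original text unchanged, which is exactly what the generated command relies on for each keyN/valueN pair.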


Result

Either of the above will emit:

{
  "CONTAINER": "nginx_container",
  "CPU%": "0.02%",
  "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
  "MEM%": "0.16%",
  "NETI/O": "0B/0B",
  "BLOCKI/O": "22.09MB/4.096kB",
  "PIDS": "0"
}
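Because inputs yields every remaining line (and the shell-assisted loop likewise runs once per line), each data row produces its own object, which matters when docker stats reports several containers. A sketch assuming jq 1.5 or newer; the redis_container row and the trimmed column set are made up for illustration.

```shell
#!/usr/bin/env bash
# Header plus two data rows; the redis_container row is hypothetical
s='CONTAINER|CPU%|PIDS
nginx_container|0.02%|0
redis_container|1.50%|4'

# Same filter as the modern-jq answer; -c puts each object on its own line
out=$(jq -Rnc '
  ( input  | split("|") ) as $keys |
  ( inputs | split("|") ) as $vals |
  [[$keys, $vals] | transpose[] | {key:.[0],value:.[1]}] | from_entries
' <<<"$s")
echo "$out"
```

The header is read once into $keys and reused for every row, so each line of output is a complete object for one container.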

Why

In short: because jq is guaranteed to generate valid JSON as output.

Consider the following as an example that would break more naive approaches:

s='key ending in a backslash\
value "with quotes"'

Sure, these are unexpected scenarios, but jq knows how to deal with them:

{
  "key ending in a backslash\\": "value \"with quotes\""
}

...whereas an implementation that didn't understand JSON strings could easily end up emitting:

{
  "key ending in a backslash\": "value "with quotes""
}
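The difference is easy to check. This sketch (assuming bash and jq) builds the same key/value pair both ways: once with a naive printf template that does no escaping, and once with --arg, then asks jq -e . whether each result parses.

```shell
#!/usr/bin/env bash
# The tricky input from above: a key ending in a backslash, a value with quotes
key='key ending in a backslash\'
value='value "with quotes"'

# Naive: paste the strings into a JSON template with printf (no escaping at all)
naive=$(printf '{"%s": "%s"}' "$key" "$value")

# jq: pass the strings with --arg so jq does the escaping
safe=$(jq -nc --arg k "$key" --arg v "$value" '{($k): $v}')

# jq -e . exits non-zero when its input fails to parse
jq -e . <<<"$naive" >/dev/null 2>&1 && echo "naive: valid" || echo "naive: invalid"
jq -e . <<<"$safe"  >/dev/null 2>&1 && echo "safe: valid"  || echo "safe: invalid"
```

The naive string is exactly the broken output shown above, and jq rejects it; the --arg version round-trips both strings intact.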