Go JSON decoding is very slow. What would be a better way to do it?

shiro_goto picture shiro_goto · Mar 26, 2015 · Viewed 8.7k times · Source

I am using Go, Revel WAF and Redis.

I have to store large json data in Redis (maybe 20MB).

json.Unmarshal() takes about roughly 5 seconds. What would be a better way to do it?

I tried JsonLib, encode/json, ffjson, megajson, but none of them were fast enough.

I thought about using groupcache, but Json is updated in real time.

This is the sample code:

package main

import (
 "github.com/garyburd/redigo/redis"
  json "github.com/pquerna/ffjson/ffjson"
)

func main() {
  c, err := redis.Dial("tcp", ":6379")
  defer c.Close()
  pointTable, err := redis.String(c.Do("GET", "data"))
  var hashPoint map[string][]float64
  json.Unmarshal([]byte(pointTable), &hashPoint) //Problem!!!
}

Answer

Tobia picture Tobia · Mar 29, 2015

Parsing large JSON data does seem to be slower than it should be. It would be worthwhile to pinpoint the cause and submit a patch to the Go authors.

In the meantime, if you can avoid JSON and use a binary format, you will not only avoid this issue; you will also gain the time your code is now spending parsing ASCII decimal representations of numbers into their binary IEEE 754 equivalents (and possibly introducing rounding errors while doing so.)

If both your sender and receiver are written in Go, I suggest using Go's binary format: gob.

Doing a quick test, generating a map with 2000 entries, each a slice with 1050 simple floats, gives me 20 MB of JSON, which takes 1.16 sec to parse on my machine.

For these quick benchmarks, I take the best of three runs, but I make sure to only measure the actual parsing time, with t0 := time.Now() before the Unmarshal call and printing time.Now().Sub(t0) after it.

Using GOB, the same map results in 18 MB of data, which takes 115 ms to parse:
one tenth the time.

Your results will vary depending on how many actual floats you have there. If yours floats have a lot of significant digits, deserving their float64 representation, then 20 MB of JSON will contain much fewer than my two million floats. In that case the difference between JSON and GOB will be ever starker.

BTW, this proves that the problem lies indeed in the JSON parser, not in the amount of data to parse, nor in the memory structures to create (because both tests are parsing ~ 20 MB of data and recreating the same slices of floats.) Replacing all the floats with strings in the JSON gives me a parsing time of 1.02 sec, confirming that the conversion from string representation to binary floats does takes a certain time (compared to just moving bytes around) but is not the main culprit.

If the sender and the parser are not both Go, or if you want to squeeze the performance even further than GOB, you should use your own customised binary format, either using Protocol Buffers or manually with "encoding/binary" and friends.