Jackson vs json-simple for stream parsing

Jilles van Gurp · Apr 20, 2013 · Viewed 10.9k times

I have a JSON library on GitHub: https://github.com/jillesvangurp/jsonj

This library has a parser based on json-simple, which uses a handler class to do all the work of creating instances of the JsonObject, JsonArray, and JsonPrimitive types that my library provides.

I've seen people post various benchmarks suggesting that the Jackson parser is about as good as it gets in terms of performance and that json-simple is one of the slower options. So, to see if I could boost performance, I created an alternative parser that uses the Jackson streaming API and calls the same handler that I used for the original parser. This works fine from a functional perspective and was pretty straightforward.
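The basic shape of the Jackson-based variant is roughly as follows -- a simplified sketch using Jackson 2.x method names and a made-up handler interface, not the actual JsonHandler API from my library:

    import com.fasterxml.jackson.core.JsonFactory;
    import com.fasterxml.jackson.core.JsonParser; // Jackson's parser, not the jsonj one
    import com.fasterxml.jackson.core.JsonToken;
    import java.io.IOException;
    import java.io.InputStream;

    // Illustrative callback interface in the spirit of json-simple's ContentHandler;
    // the method names are made up and do not match the real JsonHandler class.
    interface StreamHandler {
        void startObject();
        void endObject();
        void startArray();
        void endArray();
        void key(String name);
        void primitive(Object value);
    }

    public class JacksonStreamingDriver {
        private static final JsonFactory FACTORY = new JsonFactory(); // reused across calls

        public static void parse(InputStream in, StreamHandler handler) throws IOException {
            // try-with-resources closes the parser so its buffers can be recycled
            try (JsonParser p = FACTORY.createParser(in)) {
                JsonToken t;
                while ((t = p.nextToken()) != null) {
                    switch (t) {
                        case START_OBJECT: handler.startObject(); break;
                        case END_OBJECT: handler.endObject(); break;
                        case START_ARRAY: handler.startArray(); break;
                        case END_ARRAY: handler.endArray(); break;
                        case FIELD_NAME: handler.key(p.getCurrentName()); break;
                        case VALUE_STRING: handler.primitive(p.getText()); break;
                        case VALUE_NUMBER_INT: handler.primitive(p.getLongValue()); break;
                        case VALUE_NUMBER_FLOAT: handler.primitive(p.getDoubleValue()); break;
                        case VALUE_TRUE: handler.primitive(Boolean.TRUE); break;
                        case VALUE_FALSE: handler.primitive(Boolean.FALSE); break;
                        case VALUE_NULL: handler.primitive(null); break;
                        default: break;
                    }
                }
            }
        }
    }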

You can find the relevant classes here (JsonHandler, JsonParser and JsonParserNg): https://github.com/jillesvangurp/jsonj/tree/master/src/main/java/com/github/jsonj/tools

However, I'm not seeing any improvement in the various tests I ran.

So, my question: should I be seeing any improvement at all, and if so, why? It seems to me that, in streaming API mode at least, both libraries have similar performance.

I'd be very interested in other people's experience with this.

Answer

StaxMan · Apr 21, 2013

I wrote "On proper performance testing of Java JSON processing" a while ago, to enumerate common problems I have seen with performance benchmarking. There are lots of relatively simple ways to mess up comparison. I am assuming you are not making any of mistakes mentioned, but it is worth mentioning. Especially part about using raw input: there are very few cases where real JSON data comes as String -- so make sure to use InputStream / OutputStream (or byte arrays).

The second thing to note is that if you use a tree model (like JsonObject) you are already adding a lot of potentially avoidable overhead: you are building Map/List structures that use about 3x the memory that equivalent POJOs would, and that are slower to operate on. In that case, the actual parsing/generation overhead is typically a minority component anyway. Sometimes tree-style processing makes sense, and then this overhead is acceptable.

So if performance matters a lot, one typically either:

  1. Uses the streaming API to build your own objects (not an in-memory tree), or
  2. Uses data binding to/from POJOs; this can be close to the speed of (1),

both of which will be faster than building trees (and, to some degree, faster at serializing). For some reason many developers assume that dealing with tree representations is as efficient a way to deal with data as any other; this is not the case, as can be seen in benchmarks like https://github.com/eishay/jvm-serializers (both options are sketched below).
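Using a made-up Point POJO (the class and field names are purely illustrative), the two options look roughly like this:

    import com.fasterxml.jackson.core.JsonFactory;
    import com.fasterxml.jackson.core.JsonParser;
    import com.fasterxml.jackson.core.JsonToken;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import java.io.IOException;
    import java.io.InputStream;

    public class StreamingVsDataBinding {
        // Illustrative POJO; a real application would use its own domain classes.
        public static class Point {
            public int x;
            public int y;
        }

        private static final JsonFactory FACTORY = new JsonFactory();
        private static final ObjectMapper MAPPER = new ObjectMapper(FACTORY);

        // Option 1: streaming API, building the target object by hand.
        static Point readWithStreaming(InputStream in) throws IOException {
            Point p = new Point();
            try (JsonParser jp = FACTORY.createParser(in)) {
                if (jp.nextToken() != JsonToken.START_OBJECT) {
                    throw new IOException("expected a JSON object");
                }
                while (jp.nextToken() != JsonToken.END_OBJECT) {
                    String field = jp.getCurrentName();
                    jp.nextToken(); // advance to the value
                    if ("x".equals(field)) {
                        p.x = jp.getIntValue();
                    } else if ("y".equals(field)) {
                        p.y = jp.getIntValue();
                    } else {
                        jp.skipChildren(); // ignore unknown fields
                    }
                }
            }
            return p;
        }

        // Option 2: data binding straight to the POJO.
        static Point readWithDataBinding(InputStream in) throws IOException {
            return MAPPER.readValue(in, Point.class);
        }
    }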

I did not see the Jackson-related code via the link, so I am assuming it works as expected. The main things to look for (with respect to performance problems) really are to:

  1. Always close JsonParser and JsonGenerator (this is needed for some of the buffer recycling), and
  2. Reuse JsonFactory and/or ObjectMapper instances: they are thread-safe, and reuse of some components (symbol tables, serializers) happens through these objects.
  3. As mentioned earlier, always use the most raw input sources and output destinations possible (InputStream, OutputStream). A sketch combining all three points follows.
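Combining those three points, the general usage pattern looks something like this -- a sketch of the pattern, not your actual code, using the Jackson 2.x createGenerator name:

    import com.fasterxml.jackson.core.JsonFactory;
    import com.fasterxml.jackson.core.JsonGenerator;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import java.io.IOException;
    import java.io.OutputStream;

    public class JacksonUsagePatterns {
        // Point 2: create these once and reuse them; both are thread-safe, and
        // symbol tables and (de)serializers are recycled through these instances.
        private static final JsonFactory FACTORY = new JsonFactory();
        private static final ObjectMapper MAPPER = new ObjectMapper(FACTORY);

        // Point 3: write to the rawest destination available (OutputStream), not to a String.
        static void write(OutputStream out, Object value) throws IOException {
            // Point 1: try-with-resources closes the generator, flushing output
            // and returning its internal buffers for recycling.
            try (JsonGenerator gen = FACTORY.createGenerator(out)) {
                MAPPER.writeValue(gen, value);
            }
        }
    }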