Comparing Scala JSON Libraries

Evan Chan
  • Software Development

We were working on a Spark job to read JSON files out of HDFS, and it seemed to be running way too slowly. The culprit turned out to be the JSON parsing library. So here are some notes, covering both performance and correctness, to help others navigate the Scala JSON parsing landscape, where there are at least 6 different libraries.

Our use case

We needed to de/serialize:

  • Files with nested JSON objects, one per line, with both string and numeric values -- so basically a Map[String, Any]. We wanted all of the nested maps to be Scala Maps as well.
  • A case class with optional fields, some of which are Lists that should ideally be initialized to Nil by default.
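
As a rough illustration of the two shapes described above, here is a minimal sketch. The field names and values are made up for the example and are not taken from our actual data.

```scala
object UseCase {
  // Shape 1: one input line decoded into nested Scala Maps
  // with a mix of string and numeric values (hypothetical record).
  val sampleRecord: Map[String, Any] = Map(
    "user"  -> "alice",
    "count" -> 3,
    "geo"   -> Map("lat" -> 37.77, "lon" -> -122.42)
  )

  // Shape 2: a case class whose List fields may be absent from the JSON
  // and should then fall back to Nil (hypothetical class).
  case class Event(id: String, tags: List[String] = Nil, links: List[String] = Nil)
}
```

Constructing UseCase.Event("e1") directly yields empty lists for tags and links; the question for each library is whether its deserializer honors those defaults when the fields are missing from the input.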

The test data is a 34.5MB file of 91,582 lines, with one JSON blob per line. The baseline time for simply reading this file, using something like scala.io.Source.fromFile(logfile).getLines.length, is ~130ms.
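
A baseline like this can be measured with a small helper along the following lines. This is only a sketch using the standard library; the names Baseline, timed and countLines are invented for the example, and a single wall-clock run like this is a rough measurement rather than a proper benchmark.

```scala
import scala.io.Source

object Baseline {
  // Evaluates `body` once and returns its result together with
  // the elapsed wall-clock time in milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Counts the lines of a file without parsing them, which gives
  // the I/O floor that any JSON parser has to sit on top of.
  def countLines(path: String): Int = {
    val src = Source.fromFile(path, "UTF-8")
    try src.getLines().size finally src.close()
  }
}

// Usage (the path is a placeholder):
//   val (lines, ms) = Baseline.timed(Baseline.countLines("logfile.json"))
//   println("%d lines in %d ms".format(lines, ms))
```

The same timed helper can wrap a per-line parse call to compare a library against the baseline.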

All benchmarking was conducted on a late model MacBook Pro, using Scala 2.9.3.

The contenders:

JacksMapper 2.1.4

A thin Scala wrapper around Jackson, JacksMapper has a super simple and easy-to-use interface.

  • 80 seconds -- by far the worst time -- for deserializing the 34.5MB file
  • Natively handles deserializing to Map[String, Any], including nested maps
  • Case class deserialization: missing optional fields are constructed correctly, even Lists, which default to Nil. This library seems to have the best default value initialization of the ones we tested.

Spray-json 1.2.3

Spray-json is based on the parboiled parsing library.

  • 8 seconds to deserialize the file - 10x faster than JacksMapper. Woohoo!
  • Does not natively unpack to Map[String, Any] -- we needed to supply a new type class to handle this
  • Only treats case class fields of Option[_] type as optional - any other fields that are missing from the JSON will cause an exception to be thrown. We did not test this out as our case class did not have Options.
  • One benefit is that it has an easy API to generate pretty-printed JSON. Oh, and of course it natively integrates with spray, soon to be akka-http.

NOTE: A major new version of spray-json's backend Parboiled parser has been made available, which should result in order-of-magnitude improvements in parsing times. Unfortunately it's not been incorporated into spray-json yet as of the time of this testing.

Jerkson 0.5.0

Jerkson is an abandoned project written by Coda Hale when he was still hacking on Scala.

  • Incredibly fast - averaged 650ms for deserializing the whole file!
  • Deserializes to Map[String, Any] but nested maps are java.util.Maps -- which doesn't meet our original criteria.
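
If the nested java.util.Maps are the only obstacle, one possible workaround is to convert the parsed result recursively after the fact. The sketch below is not something we benchmarked, and the helper name deepToScala is made up; it uses only the standard library's scala.collection.JavaConverters (newer Scala versions offer scala.jdk.CollectionConverters instead). Note that the extra pass adds its own cost on top of the parse time.

```scala
import scala.collection.JavaConverters._

object JavaToScala {
  // Recursively replaces java.util.Map and java.util.List values with
  // immutable Scala Maps and Lists; all other values pass through unchanged.
  def deepToScala(value: Any): Any = value match {
    case m: java.util.Map[_, _] =>
      m.asScala.map { case (k, v) => k.toString -> deepToScala(v) }.toMap
    case l: java.util.List[_] =>
      l.asScala.map(deepToScala).toList
    case m: scala.collection.Map[_, _] =>
      // The top-level result may already be a Scala Map holding Java values.
      m.map { case (k, v) => k.toString -> deepToScala(v) }.toMap
    case other => other
  }
}
```

The result is typed as Any, so callers would still need a cast such as asInstanceOf[Map[String, Any]] on the top-level value.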

Jackson-scala-module 2.1.2

This is the official Scala support module for Jackson. It's just as fast as Jerkson, and may have inherited Jerkson's work.

  • It's also around 650ms
  • Serialization doesn't work, at least in the REPL: it throws java.lang.AbstractMethodError. This is rather disappointing.
  • Missing case class fields all get initialized with nulls. This is not bad, I suppose, but I wish proper default values such as Nil were used instead.

Json4s 3.2.5

A very promising project started by the guys from Wordnik (of Swagger fame), it aims to unify Scala JSON ASTs, sports multiple backends (including Jackson), and has native support from both Scalatra and Spray.

  • Native deserialization - 940 ms (based on the Lift web framework JSON parser)
  • Jackson deserialization - 670 ms
  • Can deserialize to Map[String, Any], including nested ones, but only through a clumsy workaround rather than the native read method
  • Missing case class fields throw an exception. :( However, you can define alternative constructors to get around part of the issue, and writing a custom type class for deserialization is pretty easy.
  • Easy pretty printing (Serialization.writePretty)

One thing that json4s has that the others don't is an extremely rich functional API for transforming the AST. It can also work with XML, apparently.

Summary Table

Library                  Time (sec)
io.Source.getLines       0.130
json4s (Jackson)         0.670
json4s (native/Lift)     0.940
jackson-scala-module     0.650
jerkson                  0.650
spray-json               8
JacksMapper              80
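
To put these timings in perspective, they can be converted into throughput for the 34.5MB test file. For example, jerkson's 0.650 s works out to roughly 53 MB/s, while JacksMapper's 80 s is about 0.43 MB/s. The sketch below just does that arithmetic on the numbers from the table; the object and method names are invented for the example.

```scala
object Throughput {
  // Size of the test file in megabytes, as stated in the article.
  val fileSizeMB: Double = 34.5

  // (library, seconds) pairs copied from the summary table.
  val timingsSec: List[(String, Double)] = List(
    "io.Source.getLines"   -> 0.130,
    "json4s (Jackson)"     -> 0.670,
    "json4s (native/Lift)" -> 0.940,
    "jackson-scala-module" -> 0.650,
    "jerkson"              -> 0.650,
    "spray-json"           -> 8.0,
    "JacksMapper"          -> 80.0
  )

  // Megabytes processed per second for a given wall-clock time.
  def mbPerSec(seconds: Double): Double = fileSizeMB / seconds

  // How many times slower a run is than the raw line-reading baseline.
  def timesBaseline(seconds: Double): Double = seconds / 0.130

  // One formatted line per library.
  def report: List[String] = timingsSec.map { case (name, secs) =>
    "%-22s %7.2f MB/s  %6.1fx baseline".format(name, mbPerSec(secs), timesBaseline(secs))
  }
}

// Usage:
//   Throughput.report.foreach(println)
```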

Conclusion

None of the tested frameworks is perfect. If I had to pick one, I would go with json4s -- it has the most support and features, and with the Jackson backend it is about as fast as jackson-scala-module and jerkson (670 ms versus 650 ms).

All of the frameworks offer rich ASTs for transformation of JSON entities before finally converting back into actual Scala objects. In theory you can build an even faster

I know this post will attract lots of comments from folks saying "But what about XXX?" I apologize in advance; we only had time to test a few that we were considering to improve our correctness and performance, but suggestions are welcome.
