Ultrafast JSON Parsing

Fork on GitHub
Download the NuGet package

In a previous post about parsing JSON with JSON.NET’s JsonTextReader, I touched on a key point: the Large Object Heap, and why to avoid it.
We’ve all been asked at some point to rescue a buckled project with minimal development effort. This generally happens when a project grinds to a halt because bad design has led to poor performance. Nobody wants to acknowledge that they’ve commissioned one of these, so you’re called in and expected to deliver a fix within tight constraints.

I’ve had the privilege of taking part in such projects several times, and one in particular comes to mind. The project marshalled objects from one tier to another in JSON format. These objects were unusually large, over 1MB in most cases, and overall performance was poor. The interesting thing was that the various HTTP endpoints only needed certain segments of the JSON file in order to do their work. Pruning the JSON structure wasn’t an option, so I started to look at optimisation.

The first thing that came to mind was the Large Object Heap (LOH) in .NET. Any .NET object of 85,000 bytes or more is considered a large object and is allocated on the LOH, which can cause performance issues. Before a Garbage Collection runs, all threads apart from the one that triggered it are suspended so that the various Generations can be released. Releasing LOH objects can introduce bottlenecks: in a nutshell, it takes time to release such objects (170,000 cycles, at least), and it can result in excessive Garbage Collection. You may have experienced this before when, for example, IIS inexplicably hangs, though that’s not always a result of Garbage Collection.
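
The threshold is easy to observe. What follows is a minimal sketch, assuming a standard .NET runtime with default GC settings, where LOH allocations are reported as belonging to the highest generation. The LohDemo class and its method names are illustrative only and are not part of JSON# or JSON.NET.

```csharp
using System;

static class LohDemo
{
    // The runtime reports LOH objects as belonging to the highest generation,
    // so a freshly allocated object that is already in GC.MaxGeneration was
    // placed on the LOH rather than in generation 0.
    public static bool IsOnLargeObjectHeap(object freshlyAllocated)
    {
        return GC.GetGeneration(freshlyAllocated) == GC.MaxGeneration;
    }

    // Prints one line per allocation so the difference is visible.
    public static void Report()
    {
        var smallBuffer = new byte[1000];          // well under the threshold
        var largeBuffer = new byte[100000];        // over 85,000 bytes
        var largeString = new string('x', 50000);  // 2 bytes per char, roughly 100,000 bytes

        Console.WriteLine("small byte[]: " + IsOnLargeObjectHeap(smallBuffer));
        Console.WriteLine("large byte[]: " + IsOnLargeObjectHeap(largeBuffer));
        Console.WriteLine("large string: " + IsOnLargeObjectHeap(largeString));
    }
}
```

Calling LohDemo.Report() from any entry point should show that the large byte array and the large string both start life on the LOH. The string case is the one that matters here: a 1MB JSON payload held in a string variable is far past the threshold.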

In any case, it occurred to me that every time a large JSON object was returned during an HTTP request, it was held in a string variable, and so went straight onto the LOH. I mentioned that a great deal of this JSON was superfluous, so I thought about a way to parse the large JSON object and extract only the necessary embedded objects. The key is to avoid strings and instead deal with raw bytes through streaming. Streamed objects are processed in small chunks, where each chunk occupies a small section of memory. Processing the large JSON structure chunk by chunk until I find what I’m looking for sounds like a good way to avoid the LOH.
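
To illustrate the memory pattern only, here is a minimal sketch that walks a stream through one small, reusable buffer. It is not JSON-aware and is not how JSON# is implemented; the ChunkedReader class, the CountByte method and the 4,096-byte buffer size are all assumptions made for the example.

```csharp
using System;
using System.IO;

static class ChunkedReader
{
    // Illustrative buffer size, far below the 85,000-byte LOH threshold.
    private const int BufferSize = 4096;

    // Counts occurrences of a single byte in the stream without ever
    // materialising the whole payload as one string or one large array.
    public static long CountByte(Stream stream, byte target)
    {
        var buffer = new byte[BufferSize];
        long count = 0;
        int bytesRead;

        // Stream.Read returns 0 once the end of the stream is reached.
        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < bytesRead; i++)
            {
                if (buffer[i] == target)
                {
                    count++;
                }
            }
        }

        return count;
    }
}
```

However large the stream is, the only allocation made by the loop is the single small buffer, which is the property a streaming JSON reader relies on.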

My initial post on this provided a nice and simple class to achieve this. Since then, given the potential complexity of large JSON objects, I’ve put together a library to cover more complex scenarios. The library provides the following capabilities:

  • Parse JSON 7+ times faster than JSON.NET
  • Serialise JSON several hundred times faster than JSON.NET
  • Remove specific section(s) from large JSON structures while avoiding the LOH
  • Provide a simple interface to serialise and deserialise while avoiding reflection

Here are some example scenarios:

Retrieving an Embedded object, or Series of Objects from a Large JSON Structure

Let’s say you have a large JSON object that contains, among other things, one or more embedded objects of the following structure:

"simpleObject": {
  "name": "Simple Object",
  "count": 1
}

You can retrieve all instances of these objects with the following code:

    byte[] largeJson = GetLargeJson();
    var parser = new JsonObjectParser();

    Json.Parse(parser, new MemoryStream(largeJson), "simpleObject");
    var embeddedObjects = parser.Objects;

We’ve provided a large JSON structure in byte-format, and instructed the API to retrieve all embedded objects called “simpleObject”. The Parse method contains a reference to JsonObjectParser. This is one of two types of parser. Its counterpart is JsonArrayParser. You can alternate between the two depending on the nature of the JSON you’re parsing, whether it be object or array.

Serialising an Object without Reflection

Traditional serialisation techniques that leverage custom Attributes involve Reflection. This comes with a performance overhead. While this may be nominal, we can avoid it altogether when serialising our objects. The API provides a way to achieve this through the IHaveSerialisableProperties interface. The following class implements this interface and explicitly exposes serialisation metadata:

class ComplexArrayObject : IHaveSerialisableProperties {
  public string Name { get; set; }
  public string Description { get; set; }

  public SerialisableProperties GetSerializableProperties() {
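    // The list's element type is assumed to be JsonProperty; the original listing shows only "List".
    return new SerialisableProperties("complexArrayObject", new List<JsonProperty> {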
      new StringJsonProperty {
        Key = "name",
        Value = Name
      },
      new StringJsonProperty {
        Key = "description",
        Value = Description
      }
    });
  }
}

Now that the class exposes its serialisation metadata, we can serialise it as follows:

var writer = new BinaryWriter(new MemoryStream(), new UTF8Encoding(false));
var serialisableProperties = myObject.GetSerializableProperties();

using (var serialisor = new StandardJsonSerialisationStrategy(writer)) {
  Json.Serialise(serialisor, new JsonPropertiesSerialisor(serialisableProperties));
  return serialisor.SerialisedObject;
}

We use a BinaryWriter to serialise the object, and provide its serialisable properties. The StandardJsonSerialisationStrategy class performs the serialisation. This is a Builder class; its abstraction allows us to modify the serialisation process if necessary. The object is returned as a byte array, serialised without reflection. A full suite of BDD specifications is included with the API, including some speed tests. Incidentally, the included serialisation speed test indicates processing that is anywhere from 20 to several hundred times faster than JSON.NET.

Deserialising an Object without Reflection

The API provides a means to traverse large JSON structures very quickly in order to extract embedded objects. Once our objects are extracted, we likely want to deserialise them. From a practical perspective, I’m assuming that the extracted objects are of a reasonable size. If not, they likely need to be segmented further by extracting their contents. Assuming that the resulting objects are ready for deserialisation, let’s again avoid reflection in order to optimise performance. JSON.NET offers an efficient way to do this using the JsonTextReader class. Rather than reinvent the wheel, I’ve wrapped this class in a manner that reads serialised properties into a NameValueCollection, and allows you to extract them to a POCO.

Let’s say we have the following POCO:

class SimpleObject {
  public string Name { get; set; }
  public int Count { get; set; }
}

We create an associated class to facilitate deserialisation:

// Generic argument inferred from the return type of Deserialise() below.
class SimpleObjectDeserialiser : Deserialiser<SimpleObject> {
  public SimpleObjectDeserialiser(SimpleJSONParser parser) : base(parser) {}

  public override SimpleObject Deserialise() {
    var properties = parser.Parse();

    return new SimpleObject {
      Name = properties.Get("simpleObject.name"),
      Count = Convert.ToInt32(properties.Get("simpleObject.count"))
    };
  }
}

Here, we inherit from an abstraction that parses our JSON object. The parsed results are loaded into a NameValueCollection, in dot-notation format to signify hierarchy. E.g.:

  • simpleObject.name
  • simpleObject.count

We override the default Deserialise method and map the parsed properties to our POCO.
Here is an example using the above classes:

    var simpleObjectDeserialiser = new SimpleObjectDeserialiser(new SimpleJSONParser(myJson));
    var simpleObject = Json.Deserialise(simpleObjectDeserialiser);

The end result is a POCO instantiated from our JSON object.
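
The mapping step can be tried in isolation, without the library. This is a minimal sketch that fills a NameValueCollection by hand with the dot-notation keys shown above and maps it to a POCO. SimpleObjectSketch and DotNotationMapper are hypothetical names used only for this example; the defensive parsing is an addition of mine, not something the library is known to do.

```csharp
using System;
using System.Collections.Specialized;
using System.Globalization;

// Stand-in for the SimpleObject POCO, renamed to avoid clashing with it.
class SimpleObjectSketch
{
    public string Name { get; set; }
    public int Count { get; set; }
}

static class DotNotationMapper
{
    // Maps dot-notation keys to a POCO without reflection.
    public static SimpleObjectSketch Map(NameValueCollection properties)
    {
        // Get returns null when a key is absent. TryParse then fails and
        // leaves count at 0 instead of throwing.
        int count;
        int.TryParse(
            properties.Get("simpleObject.count"),
            NumberStyles.Integer,
            CultureInfo.InvariantCulture,
            out count);

        return new SimpleObjectSketch
        {
            Name = properties.Get("simpleObject.name"),
            Count = count
        };
    }
}
```

Populating a collection with "simpleObject.name" and "simpleObject.count" entries and passing it to DotNotationMapper.Map yields the same kind of POCO as the deserialiser above, which makes it clear that the mapping itself is plain property assignment.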

Summary

Sometimes you can’t avoid dealing with large JSON objects – but you can avoid falling prey to memory-related performance issues.

The purpose of this API is to provide a means of segmenting JSON files, and to serialise and deserialise JSON objects in a performance-optimised manner. Leveraging this API, you can:

  • Avoid the Large Object Heap
  • Avoid reflection
  • Segment unmanageable JSON into manageable chunks
  • Do all of the above at exceptionally fast speeds

“Should I only use the library if I deal with big JSON objects?”

“No! Even on smaller JSON objects, this library is faster than JSON.NET – there’s a performance-win either way. This API deals with raw bytes, and leverages a concept called deferred-execution. I’m happy to talk about these, and other elements of the design, in more detail if you contact me directly.”

For a step-by-step tutorial, check out the next post.


10 thoughts on “Ultrafast JSON Parsing”

    1. Paul Mooney Post author

      The benchmarks that I ran were executed on a 64-bit machine, so the timings should actually improve if you run them on a 32-bit machine.
      Apart from the memory-related optimisations pertaining to 64-bit systems, there are other considerations. For example, IIS will suspend worker threads in order to handle the LOH during Garbage Collection. Certainly on a 64-bit system, this will have less of an impact in terms of performance, but our threads will still be suspended regardless of the underlying OS-architecture. I’m sure you’d agree that it would be nice to avoid this, which is essentially what JSON# does.

  1. azsoftwaredeveloper

    How does the performance compare with a custom JsonConverter that uses JsonTextWriter to manually serialize the object and JsonTextReader to manually deserialize it? In that case we are not relying on reflection. We could still run into the LOH issue if the JSON payload contains a large array of objects that we are deserializing.

    1. Paul Mooney Post author

      Thanks for your comment. I’ll be uploading a suite of benchmarks that will compare the several mechanisms of serialisation more thoroughly shortly. In theory though, they should be on par from a performance perspective. However, doing so would require that you write a lot of code, customised for each POCO. JSON# saves you the hassle; its serialisation mechanism is completely bespoke, but it leverages the JsonTextReader during deserialisation, wrapping it in order to provide a reusable feature to reduce code volume while retaining optimal performance.
