Tag Archives: optimisation

JSON# – Tutorial #3: Serialising Complex Objects

Fork on Github
Download the Nuget package

The last tutorial focused on serialising simple JSON objects. This tutorial contains a more complex example.

Real-world objects are generally more complex than typical “Hello, World” examples. Let’s build such an object: one that contains complex properties, such as other objects and collections. We’ll start by defining a sub-object:

class SimpleSubObject: IHaveSerialisableProperties {
    public string Name { get; set; }
    public string Description { get; set; }

    public SerialisableProperties GetSerializableProperties() {
        return new SerialisableProperties("simpleSubObject", new List<JsonProperty> {
            new StringJsonProperty {
                Key = "name",
                Value = Name
            },
            new StringJsonProperty {
                Key = "description",
                Value = Description
            }
        });
    }
}

This object contains two simple properties: Name and Description. As before, we implement the IHaveSerialisableProperties interface to allow JSON# to serialise the object. Now let’s define an object with a property that is a collection of SimpleSubObjects:

class ComplexObject: IHaveSerialisableProperties {
    public string Name { get; set; }
    public string Description { get; set; }

    public List<SimpleSubObject> SimpleSubObjects { get; set; }
    public List<double> Doubles { get; set; }

    public SerialisableProperties GetSerializableProperties() {
        return new SerialisableProperties("complexObject", new List<JsonProperty> {
            new StringJsonProperty {
                Key = "name",
                Value = Name
            },
            new StringJsonProperty {
                Key = "description",
                Value = Description
            }
        }, 
        new List<JsonSerialisor> {
            new ComplexJsonArraySerialisor("simpleSubObjects",
                SimpleSubObjects.Select(c => c.GetSerializableProperties())),
            new JsonArraySerialisor("doubles",
                Doubles.Select(d => d.ToString(CultureInfo.InvariantCulture)), JsonPropertyType.Numeric)
        });
    }
}

This object contains some simple properties, as well as two collections: the first is a collection of SimpleSubObject, the second a collection of Double.

Note the GetSerializableProperties method in ComplexObject. The SerialisableProperties constructor that it calls accepts a collection of JsonSerialisor, which represents the highest level of abstraction in terms of the core serialisation components in JSON#. In order to serialise our collection of SimpleSubObjects, we leverage an implementation of JsonSerialisor called ComplexJsonArraySerialisor, designed specifically to serialise collections of objects, as opposed to primitive types. Given that each SimpleSubObject in our collection contains an implementation of GetSerializableProperties, we simply pass the result of each call to the ComplexJsonArraySerialisor constructor. It will handle the rest.

We follow a similar process to serialise the collection of Double, in this case leveraging JsonArraySerialisor, another implementation of JsonSerialisor, specifically designed to manage collections of primitive types. We convert each Double to a string using CultureInfo.InvariantCulture, and pass JsonPropertyType.Numeric so that the serialisor treats the values as numbers rather than strings.
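Why CultureInfo.InvariantCulture? JSON numbers must use a full stop as the decimal separator, but the default Double.ToString() follows the current culture, and many cultures use a comma. Here is a minimal sketch of the difference. The NumberFormatting class and the comma-based format are my own illustration, not part of JSON#:

```csharp
using System.Globalization;

// Hypothetical helper for illustration only; not a JSON# type.
static class NumberFormatting {
    // Formats a double the way JSON expects: full stop as decimal separator.
    public static string ForJson(double value) {
        return value.ToString(CultureInfo.InvariantCulture);
    }

    // Mimics a culture whose decimal separator is a comma (German, for example).
    public static string WithCommaSeparator(double value) {
        var commaFormat = new NumberFormatInfo { NumberDecimalSeparator = "," };
        return value.ToString(commaFormat);
    }
}
```

NumberFormatting.ForJson(2.5d) yields 2.5, which is a valid JSON number. NumberFormatting.WithCommaSeparator(2.5d) yields 2,5, which would corrupt a JSON array of numbers.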

Let’s create an instance of ComplexObject:

var complexObject = new ComplexObject {
    Name = "Complex Object",
    Description = "A complex object",

    SimpleSubObjects = new List<SimpleSubObject> {
        new SimpleSubObject {
            Name = "Sub Object #1",
            Description = "The 1st sub object"
        },
        new SimpleSubObject {
            Name = "Sub Object #2",
            Description = "The 2nd sub object"
        }
    },
    Doubles = new List<double> {
        1d, 2.5d, 10.8d
    }
};

As per the previous tutorial, we serialise as follows:

var writer = new BinaryWriter(new MemoryStream(), new UTF8Encoding(false));
var serialisableProperties = complexObject.GetSerializableProperties();

using (var serialisor = new StandardJsonSerialisationStrategy(writer))
    Json.Serialise(serialisor, new JsonPropertiesSerialisor(serialisableProperties));

Note the use of StandardJsonSerialisationStrategy here. This is the only implementation of JsonSerialisationStrategy, one of the core serialisation components in JSON#. The abstraction exists to provide extensibility, so that different strategies might be applied at runtime, should specific serialisation rules vary across requirements.

In the next tutorial I’ll discuss deserialising objects using JSON#.

Connect with me:

RSSGitHubTwitter
LinkedInYouTubeGoogle+

JSON# – Tutorial #2: Serialising Simple Objects

The last tutorial focused on parsing embedded JSON objects. This time, we’ll focus on serialising simple objects in C#.

Object serialisation using JSON# is 25 times to several hundred times faster than serialisation using JSON.NET, on a quad-core CPU with 16GB RAM. The source code is written in a BDD-manner, and the associated BDD features contain performance tests that back up these figures.

Let’s start with a basic class in C#:

class SimpleObject {
    public string Name { get; set; }
    public int Count { get; set; }
}

Our first step is to provide serialisation metadata to JSON#. Traditionally, most frameworks use Reflection to achieve this. While this works very well, it requires the component to know specific assembly metadata that describes your object. This comes with a slight performance penalty.

Ideally, when leveraging Reflection, the optimal design is a solution that reads an object’s assembly metadata once, and caches the result for the duration of the application’s run-time. This is generally not achievable with stateless HTTP calls. Using Reflection, we will likely query the object’s assembly during each HTTP request when serialising or de-serialising an object, suffering the associated performance-overhead for each request.
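For reference, the read-once-and-cache design described above might look something like the following. This is a minimal sketch, assuming the hosting process stays alive between calls so that a static field can hold the cache. PropertyCache and its ReflectionQueries counter are hypothetical names of my own, not part of JSON# or JSON.NET:

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;
using System.Threading;

// Hypothetical helper for illustration only.
static class PropertyCache {
    static readonly ConcurrentDictionary<Type, PropertyInfo[]> Cache =
        new ConcurrentDictionary<Type, PropertyInfo[]>();

    // Counts how often Reflection is actually queried.
    public static int ReflectionQueries;

    public static PropertyInfo[] For(Type type) {
        // The factory runs only when the type is not yet cached. Under heavy
        // contention it may run more than once, but only one result is stored.
        return Cache.GetOrAdd(type, t => {
            Interlocked.Increment(ref ReflectionQueries);
            return t.GetProperties(BindingFlags.Public | BindingFlags.Instance);
        });
    }
}

class CachedSample {
    public string Name { get; set; }
    public int Count { get; set; }
}
```

Calling PropertyCache.For(typeof(CachedSample)) twice on a single thread queries Reflection once; the second call is a dictionary lookup. The cache lives only as long as the process does.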

JSON# allows us to avoid that overhead by exposing serialisation metadata in the class itself:

class SimpleObject : IHaveSerialisableProperties {
    public string Name { get; set; }
    public int Count { get; set; }

    public virtual SerialisableProperties GetSerializableProperties() {
        return new SerialisableProperties("simpleObject", 
        new List<JsonProperty> {
            new StringJsonProperty {
                Key = "name",
                Value = Name
            },
            new NumericJsonProperty {
                Key = "count",
                Value = Count
            }
        });
    }
}

First, we need to implement the IHaveSerialisableProperties interface, allowing JSON# to serialise our object. Notice the new method, GetSerializableProperties, that returns a SerialisableProperties object, which looks like this:

public class SerialisableProperties {
    public string ObjectName { get; set; }
    public IEnumerable<JsonProperty> Properties { get; private set; }
    public IEnumerable<JsonSerialisor> Serialisors { get; set; }

    public SerialisableProperties(IEnumerable<JsonProperty> properties) {
        Properties = properties;
    }

    public SerialisableProperties(IEnumerable<JsonSerialisor> serialisors) {
        Serialisors = serialisors;
    }

    public SerialisableProperties(string objectName,
        IEnumerable<JsonProperty> properties) : this(properties) {
        ObjectName = objectName;
    }

    public SerialisableProperties(string objectName,
        IEnumerable<JsonSerialisor> serialisors) : this(serialisors) {
        ObjectName = objectName;
    }

    public SerialisableProperties(IEnumerable<JsonProperty> properties,
        IEnumerable<JsonSerialisor> serialisors) : this(properties) {
        Serialisors = serialisors;
    }

    public SerialisableProperties(string objectName,
        IEnumerable<JsonProperty> properties, IEnumerable<JsonSerialisor> serialisors)
        : this(properties, serialisors) {
        ObjectName = objectName;
    }
}

This object is essentially a mapper that outlines how an object should be serialised. Simple types are stored in the Properties property, while more complex types are retrieved through custom JsonSerialisor objects, which I will discuss in the next tutorial. The following code outlines the process involved in serialising a SimpleObject instance:

First, we initialise our object:

    var simpleObject = new SimpleObject {Name = "Simple Object", Count = 10};

Now initialise a BinaryWriter, setting the appropriate Encoding. This will be used to build the object’s JSON-representation, under-the-hood.

var writer = new BinaryWriter(new MemoryStream(), new UTF8Encoding(false));

Now we use our Json library to serialise the object:

var serialisableProperties = simpleObject.GetSerializableProperties();
byte[] serialisedObject;

using (var serialisor = new StandardJsonSerialisationStrategy(writer)) {
    Json.Serialise(serialisor, new JsonPropertiesSerialisor(serialisableProperties));
    serialisedObject = serialisor.SerialisedObject;
}

Below is the complete code-listing:

var simpleObject = new SimpleObject {Name = "Simple Object", Count = 10};

var writer = new BinaryWriter(new MemoryStream(), new UTF8Encoding(false));
var serialisableProperties = simpleObject.GetSerializableProperties();

byte[] serialisedObject;

using (var serialisor = new StandardJsonSerialisationStrategy(writer)) {
    Json.Serialise(serialisor, new JsonPropertiesSerialisor(serialisableProperties));
    serialisedObject = serialisor.SerialisedObject;
}

Our serialisedObject variable now contains a JSON-serialised representation of our SimpleObject instance, as an array of raw bytes. We’ve achieved this without Reflection, by implementing a simple interface, IHaveSerialisableProperties, in our SimpleObject class, and have avoided potentially significant performance overhead. While a single call involving Reflection might cost very little, consider a web application under heavy load that leverages Reflection on every request. Avoiding Reflection should let each application tier support more concurrent users. JSON# allows us to do just that.

In the next tutorial, I’ll discuss serialising complex objects.

JSON# – Tutorial #1: Returning Embedded-Objects

I’ve previously blogged about the premise behind JSON#. For a full explanation of the theory behind the code, check out this post.

Now, let’s dive into an example…

Let’s say that we have a JSON object that represents a real-world object, like a classroom with its teachers and a blackboard. It might look something like this:

{
    "classroom": {
        "teachers": [
            {
                "name": "Pablo",
                "age": 33
            },
            {
                "name": "John",
                "age": 28
            }
        ],
        "blackboard": {
            "madeOf": "wood",
            "height": "100",
            "width": "500"
        }
    }
}

This doesn’t look like it presents any great challenge to parse on any platform. But what if we expand it further to describe a school:

{
    "school": {
        "classrooms": [
            {
                "name": "Room #1",
                "teachers": [
                    {
                        "name": "Pablo",
                        "age": 33
                    },
                    {
                        "name": "John",
                        "age": 28
                    }
                ],
                "blackboard": {
                    "madeOf": "wood",
                    "height": "100",
                    "width": "500"
                }
            },
            {
                "name": "Room #2",
                "teachers": [
                    {
                        "name": "David",
                        "age": 33
                    },
                    {
                        "name": "Mary",
                        "age": 28
                    }
                ],
                "blackboard": {
                    "madeOf": "metal",
                    "height": "200",
                    "width": "600"
                }
            }
        ]
    }
}

Notice that our school object contains 2 classrooms, each of which contains similar objects, such as “blackboard”. Imagine that our school object needs to represent every school in the country. For argument’s sake, let’s say that we need to retrieve details about the blackboard in every classroom of every school. How would we go about that?

Well, we could refer to one of numerous JSON-parsing tools available. But how do these tools actually operate? Firstly, our massive JSON object will likely end up as a large object on the .NET Large Object Heap. I’ve mentioned in previous blogs that objects of 85,000 bytes or more can significantly impact performance. So, immediately we’re potentially in trouble.

We can always cache the JSON object as a Stream, and read from it byte-by-byte. Tools like JSON.net offer capabilities like this, using components such as the JsonTextReader. So we’ve overcome the performance overhead associated with storing big strings in memory. But now we have another problem – we’re drilling into a massive JSON file, and searching for metadata that’s spread widely. We’re going to have to implement a lot of logic in order to draw the “blackboard” objects out.

What if requirements change, and we no longer need the “blackboard” objects? Now we just need the height of each blackboard. Well, we’ll have to throw out a lot of code, which is essentially wasted effort. Requirements change again, and we no longer need “blackboard” objects at all – now we need “teacher” objects. We need to rewrite all of our logic. Not the most flexible solution.

Let’s do this instead:

First, download the JSON# library from Github (MIT license).

Now let’s get those “blackboard” objects:

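    // Inside a verbatim string (@"..."), a quote character is written as two quotes.
    const string schoolMetadata = @"{ ""school"": {...";
    var jsonParser = new JsonObjectParser();

    using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(schoolMetadata))) {
        Json.Parse(jsonParser, stream, "blackboard");
    }

    // The extracted objects are exposed by the parser.
    var blackboards = jsonParser.Objects;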

This will return all “blackboard” objects from the JSON file. Let’s say that our requirements change. Now we need all “teacher” objects instead. Simply change the code as follows:

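    // Inside a verbatim string (@"..."), a quote character is written as two quotes.
    const string schoolMetadata = @"{ ""school"": {...";
    var jsonParser = new JsonObjectParser();

    using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(schoolMetadata))) {
        Json.Parse(jsonParser, stream, "teachers");
    }

    // The extracted objects are exposed by the parser.
    var teachers = jsonParser.Objects;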

Such a change would have required significant effort, had we implemented our own custom logic using JSON.net’s JsonTextReader, or a similar component. Using JSON#, we achieve this by changing a single word. Now we’ve:

  • Optimised performance
  • Reduced our application’s memory-footprint
  • Avoided the Large Object Heap
  • Reduced development-time

The next tutorial outlines how to serialise objects using JSON#.

Ultrafast JSON Parsing

I previously blogged about parsing JSON using JSON.NET’s JsonTextReader, during which I touched on a key point; the Large Object Heap, and why to avoid it.
We’ve all been asked at some point or another to fix a buckled project while consuming minimal cost in terms of development effort. This generally happens when a project grinds to a halt due to poor performance as a result of bad design. Nobody wants to acknowledge that they’ve commissioned one of these, so at that point, you’re called in, and expected to deliver this, within the constraints of this.

I’ve had the privilege of taking part in such projects several times, and one in particular comes to mind. The project marshalled objects from one tier to another in JSON-format. These objects were unusually large, over 1MB in most cases, and overall performance was poor. The interesting thing was that only certain segments of the JSON file were necessary for the various HTTP endpoints to process. Pruning the JSON structure wasn’t an option, so I started to look at optimisation.

The first thing to come to mind was the Large Object Heap in .NET. Any .NET object of 85,000 bytes or more is considered a large object, and can cause performance issues. Prior to Garbage Collection, all threads apart from the thread that triggered Garbage Collection are suspended to allow the various Generations to be released. Releasing LOH objects can introduce performance bottlenecks. In a nutshell, it takes time to release such objects (170,000 cycles, at least), and can result in excessive Garbage Collection. You may have experienced this before, when for example, IIS inexplicably hangs, though that’s not always as a result of Garbage Collection.
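If you want to see the threshold for yourself, here is a minimal sketch. It relies on the fact that the runtime reports Large Object Heap allocations as belonging to the highest generation. LohProbe is a hypothetical helper of my own, not part of any library:

```csharp
using System;

// Hypothetical helper for illustration only.
static class LohProbe {
    // Allocates a byte array of the given length and reports which GC
    // generation the runtime places it in immediately after allocation.
    public static int GenerationOfNewArray(int length) {
        var buffer = new byte[length];
        return GC.GetGeneration(buffer);
    }
}
```

LohProbe.GenerationOfNewArray(100000) reports GC.MaxGeneration, because the array goes straight onto the LOH. A small array, such as LohProbe.GenerationOfNewArray(1000), normally starts in generation 0. Bear in mind that a .NET string stores 2 bytes per character, so a JSON string crosses the threshold at roughly 42,500 characters.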

In any case, it occurred to me that every time a large JSON object was returned during an HTTP request, it was cached in a string variable, and from there went straight onto the LOH. I mentioned that a great deal of this JSON was superfluous, so I thought about a way to parse the large JSON object and extract the necessary embedded objects. The key to this is to avoid strings, and instead deal with raw bytes through streaming. Streamed objects are processed in small chunks, where each chunk will occupy a small section of memory. Processing the large JSON structure chunk by chunk until I find what I’m looking for sounds like a good way to avoid the LOH.
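To make the chunk-by-chunk idea concrete, here is a minimal sketch that scans a stream for a byte sequence using a small, fixed-size buffer. It is not the JSON# implementation and it doesn’t parse JSON; it only shows that a large payload can be searched without ever holding it in one large object. ChunkedSearch and its parameters are hypothetical names of my own:

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not a JSON# type.
static class ChunkedSearch {
    // Counts how many times `pattern` occurs in `stream`, reading at most
    // `chunkSize` bytes per call so that no large buffer is ever allocated.
    public static int CountOccurrences(Stream stream, byte[] pattern, int chunkSize = 4096) {
        if (pattern.Length == 0) throw new ArgumentException("Pattern must not be empty.", nameof(pattern));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        // Room for one chunk plus the tail carried over from the previous chunk.
        var buffer = new byte[chunkSize + pattern.Length - 1];
        int carried = 0;
        int count = 0;
        int read;

        while ((read = stream.Read(buffer, carried, chunkSize)) > 0) {
            int available = carried + read;

            for (int start = 0; start + pattern.Length <= available; start++) {
                if (MatchesAt(buffer, start, pattern)) count++;
            }

            // Keep the last (pattern.Length - 1) bytes so that a match which
            // straddles two chunks is found on the next pass, and found once.
            carried = Math.Min(pattern.Length - 1, available);
            Array.Copy(buffer, available - carried, buffer, 0, carried);
        }

        return count;
    }

    static bool MatchesAt(byte[] buffer, int start, byte[] pattern) {
        for (int i = 0; i < pattern.Length; i++) {
            if (buffer[start + i] != pattern[i]) return false;
        }
        return true;
    }
}
```

With the default chunk size, the working buffer is a little over 4KB, far below the LOH threshold, regardless of how large the underlying stream is.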

My initial post on this provided a nice and simple class to achieve this. Since then, given the potential complexity of large JSON objects, I’ve put together a library to cover more complex scenarios. The library provides the following capabilities:

  • Parse JSON 7+ times faster than JSON.NET
  • Serialise JSON several hundred times faster than JSON.NET
  • Remove specific section(s) from large JSON structures while avoiding the LOH
  • Provide a simple interface to serialise and deserialise while avoiding reflection

Here are some example scenarios:

Retrieving an Embedded object, or Series of Objects from a Large JSON Structure

Let’s say you have a large JSON object that contains among other things, 1 or more embedded objects of the following structure:

"simpleObject": {
  "name": "Simple Object",
  "count": 1
}

You can retrieve all instances of these objects with the following code:

    byte[] largeJson = GetLargeJson();
    var parser = new JsonObjectParser();

    Json.Parse(parser, new MemoryStream(largeJson), "simpleObject");
    var embeddedObjects = parser.Objects;

We’ve provided a large JSON structure in byte-format, and instructed the API to retrieve all embedded objects called “simpleObject”. The Parse method accepts a JsonObjectParser, which is one of two types of parser; its counterpart is JsonArrayParser. You can alternate between the two depending on whether the JSON you’re parsing is an object or an array.

Serialising an Object without Reflection

Traditional serialisation techniques that leverage custom Attributes involve Reflection. This comes with a performance overhead. While this may be nominal, we can avoid it altogether to serialise our objects. The API provides a way to achieve this through the IHaveSerialisableProperties interface. The following object implements this interface and explicitly exposes serialisation metadata:

class ComplexArrayObject : IHaveSerialisableProperties {
  public string Name { get; set; }
  public string Description { get; set; }

  public SerialisableProperties GetSerializableProperties() {
    return new SerialisableProperties("complexArrayObject", new List<JsonProperty> {
      new StringJsonProperty {
        Key = "name",
        Value = Name
      },
      new StringJsonProperty {
        Key = "description",
        Value = Description
      }
    });
  }
}

Now that the class exposes its serialisation metadata, we can serialise it as follows:

var writer = new BinaryWriter(new MemoryStream(), new UTF8Encoding(false));
var serialisableProperties = myObject.GetSerializableProperties();

using (var serialisor = new StandardJsonSerialisationStrategy(writer)) {
  Json.Serialise(serialisor, new JsonPropertiesSerialisor(serialisableProperties));
  return serialisor.SerialisedObject;
}

We use a BinaryWriter to serialise the object, and provide its serialisable properties. We use the StandardJsonSerialisationStrategy class to effect the serialisation. This is a Builder class. Its abstraction allows us to modify the serialisation process if necessary. The returned object is serialised to a byte array. The end result is a serialised object, processed without reflection. There is a full suite of BDD specifications included with the API, including some speed tests. Incidentally, the included serialisation speed-test indicates processing that is at least 20 times, and up to several hundred times, faster than JSON.NET.

Deserialising an Object without Reflection

The API provides a means to traverse large JSON structures in order to extract embedded objects, very quickly. Once our object(s) are extracted, we likely want to deserialise them. From a practical perspective, I’m assuming that the extracted objects are of a reasonable size. If not, they likely need to be segmented further by extracting their contents. Assuming that the resulting object(s) are ready for deserialisation, we are ready to do so. Again, let’s avoid reflection in order to optimise performance. JSON.NET offers an efficient way to do this using the JsonTextReader class. Rather than reinvent the wheel, I’ve wrapped this class in a manner that reads serialised properties into a NameValueCollection, and allows you to extract them to a POCO as follows:

Let’s say we have the following POCO:

class SimpleObject {
  public string Name { get; set; }
  public int Count { get; set; }
}

We create an associated class to facilitate deserialisation:

class SimpleObjectDeserialiser : Deserialiser<SimpleObject> {
  public SimpleObjectDeserialiser(SimpleJSONParser parser) : base(parser) {}

  public override SimpleObject Deserialise() {
    var properties = parser.Parse();

    return new SimpleObject {
      Name = properties.Get("simpleObject.name"),
      Count = Convert.ToInt32(properties.Get("simpleObject.count"))
    };
  }
}

Here, we inherit from an abstraction that parses our JSON object. The parsed results are loaded into a NameValueCollection, in dot-notation format to signify hierarchy. E.g.:

  • simpleObject.name
  • simpleObject.count
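To show what the dot-notation keys look like in practice, here is a minimal sketch that flattens a small JSON object into a NameValueCollection. It is not how JSON# does it internally: for the sake of a self-contained example I’ve used the built-in System.Text.Json reader rather than JSON.NET’s JsonTextReader, and DotNotation is a hypothetical class of my own:

```csharp
using System.Collections.Specialized;
using System.Text.Json;

// Hypothetical helper for illustration only; not a JSON# type.
static class DotNotation {
    // Flattens nested JSON objects into keys such as "simpleObject.name".
    public static NameValueCollection Flatten(string json) {
        var result = new NameValueCollection();
        using (var document = JsonDocument.Parse(json)) {
            Walk(document.RootElement, "", result);
        }
        return result;
    }

    static void Walk(JsonElement element, string prefix, NameValueCollection result) {
        if (element.ValueKind == JsonValueKind.Object) {
            foreach (var property in element.EnumerateObject()) {
                var key = prefix.Length == 0 ? property.Name : prefix + "." + property.Name;
                Walk(property.Value, key, result);
            }
        } else {
            // Strings are stored unquoted; numbers keep their raw text.
            // Arrays are stored as raw JSON text in this sketch.
            result.Add(prefix, element.ToString());
        }
    }
}
```

Flattening { "simpleObject": { "name": "Simple Object", "count": 1 } } gives two entries: simpleObject.name with the value Simple Object, and simpleObject.count with the value 1. Every value is a string, which is why the Count property is converted with Convert.ToInt32 in the deserialiser above.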

We override the default Deserialise method and map the parsed properties to our POCO.
Here is an example using the above classes:

    var simpleObjectDeserialiser = new SimpleObjectDeserialiser(new SimpleJSONParser(myJson));
    var simpleObject = Json.Deserialise(simpleObjectDeserialiser);

The end result is a POCO instantiated from our JSON object.

Summary

Sometimes you can’t avoid dealing with large JSON objects – but you can avoid falling prey to memory-related performance issues.

The purpose of this API is to provide a means of segmenting JSON files, and to serialise and deserialise JSON objects in a performance-optimised manner. Leveraging this API, you can

  • Avoid the Large Object Heap
  • Avoid reflection
  • Segment unmanageable JSON into manageable chunks
  • All at exceptionally fast speeds

“Should I only use the library if I deal with big JSON objects?”

“No! Even on smaller JSON objects, this library is faster than JSON.NET – there’s a performance-win either way. This API deals with raw bytes, and leverages a concept called deferred-execution. I’m happy to talk about these, and other elements of the design, in more detail if you contact me directly.”

For a step-by-step tutorial, check out the next post.
