Category Archives: Microservice Architecture

Microservices with C# and RabbitMQ

Fork me on GitHub

Microservices in Action

Overview

Microservices are groupings of lightweight services that are interconnected, yet independent of one another, with no direct coupling or dependency.

Microservices allow flexibility in terms of infrastructure; application traffic is routed to collections of services that may be distributed across CPU, disk, machine and network as opposed to a single monolithic platform designed to manage all traffic.

Anatomy of a Microservice

In its simplest form, a Microservice consists of an event-listener and a message-dispatcher. The event-listener polls a service-bus – generally a durable message-queue – and handles incoming messages. Messages consist of instructions bound in metadata and encoded in a data-interchange format such as JSON or Protobuf.

Anatomy of a Microservice
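As a rough illustration of that anatomy (this is a sketch, not code from the Daishi.AMQP library; the IQueueClient, IMessageDispatcher and Message types are hypothetical placeholders), the skeleton of such a service might look like this:

    // A minimal, hypothetical skeleton: an event-listener that polls a service-bus
    // and a message-dispatcher that acts on each decoded instruction.
    public interface IQueueClient { Message Poll(int timeoutMilliseconds); }
    public interface IMessageDispatcher { void Dispatch(Message message); }
    public sealed class Message { public string Body { get; set; } }

    public sealed class MicroserviceHost {
        private readonly IQueueClient queueClient;
        private readonly IMessageDispatcher dispatcher;
        private volatile bool running;

        public MicroserviceHost(IQueueClient queueClient, IMessageDispatcher dispatcher) {
            this.queueClient = queueClient;
            this.dispatcher = dispatcher;
        }

        public void Run() {
            running = true;
            while (running) {
                // Poll the service-bus; null indicates no message within the timeout.
                var message = queueClient.Poll(50);
                if (message == null) continue;

                // Hand the instruction (e.g., a JSON payload) to the dispatcher.
                dispatcher.Dispatch(message);
            }
        }

        public void Stop() {
            running = false;
        }
    }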

Daishi.AMQP Key Components

Daishi.AMQP Class Architecture

Connecting to RabbitMQ

Everything starts with an abstraction. The Daishi.AMQP library abstracts all AMQP components, and provides RabbitMQ implementations. First, we need to connect to RabbitMQ. Let’s look at the Connect method in the RabbitMQAdapter class:

        public override void Connect() {
            var connectionFactory = new ConnectionFactory {
                HostName = hostName,
                Port = port,
                UserName = userName,
                Password = password,
                RequestedHeartbeat = heartbeat
            };

            if (!string.IsNullOrEmpty(virtualHost)) connectionFactory.VirtualHost = virtualHost;
            _connection = connectionFactory.CreateConnection();
        }

A connection is established on application start-up, and is ideally maintained for the duration of the application’s lifetime.

Consuming Messages

A single running instance (generally an *.exe, or daemon) can connect to RabbitMQ and consume messages in a single-threaded, blocking manner. However, this is not the most scalable solution. Processes that read messages from RabbitMQ must subscribe to a Queue, or Exchange. Once subscribed, RabbitMQ manages message delivery, in terms of even distribution through round-robin, or biased distribution, depending on your Quality of Service (QOS) configuration. Please refer to this post for a more detailed explanation of how this works.

For now, consider that our Microservice executable can spawn multiple consumer processes, each running on a dedicated thread, to consume messages from RabbitMQ in parallel. The AMQPAdapter class contains a method designed to invoke such processes:

        public void ConsumeAsync(AMQPConsumer consumer) {
            if (!IsConnected) Connect();

            var thread = new Thread(o => consumer.Start(this));
            thread.Start();

            while (!thread.IsAlive)
                Thread.Sleep(1);
        }

Notice the input variable of type “AMQPConsumer”. Let’s take a look at that class in more detail:

        public event EventHandler<MessageReceivedEventArgs> MessageReceived;

        public virtual void Start(AMQPAdapter amqpAdapter) {
            stopConsuming = false;
        }

        public void Stop() {
            stopConsuming = true;
        }

        protected void OnMessageReceived(MessageReceivedEventArgs e) {
            var handler = MessageReceived;
            if (handler != null) handler(this, e);
        }

Essentially, the class contains Start and Stop methods, and an event-handler to handle message-delivery. Like most classes in this project, this is an AMQP abstraction. Here is the RabbitMQ implementation:

        protected void Start(AMQPAdapter amqpAdapter, bool catchAllExceptions) {
            base.Start(amqpAdapter);
            try {
                var connection = (IConnection) amqpAdapter.GetConnection();

                using (var channel = connection.CreateModel()) {
                    if (createQueue) channel.QueueDeclare(queueName, true, false, false, queueArgs);
                    channel.BasicQos(0, prefetchCount, false);

                    var consumer = new QueueingBasicConsumer(channel);
                    channel.BasicConsume(queueName, noAck, consumer);

                    while (!stopConsuming) {
                        try {
                            BasicDeliverEventArgs basicDeliverEventArgs;
                            var messageIsAvailable = consumer.Queue.Dequeue(timeout, out basicDeliverEventArgs);

                            if (!messageIsAvailable) continue;
                            var payload = basicDeliverEventArgs.Body;

                            var message = Encoding.UTF8.GetString(payload);
                            OnMessageReceived(new MessageReceivedEventArgs {
                                Message = message,
                                EventArgs = basicDeliverEventArgs
                            });

                            if (implicitAck && !noAck) channel.BasicAck(basicDeliverEventArgs.DeliveryTag, false);
                        }
                        catch (Exception exception) {
                            OnMessageReceived(new MessageReceivedEventArgs {
                                Exception = new AMQPConsumerProcessingException(exception)
                            });
                            if (!catchAllExceptions) Stop();
                        }
                    }
                }
            }
            catch (Exception exception) {
                OnMessageReceived(new MessageReceivedEventArgs {
                    Exception = new AMQPConsumerInitialisationException(exception)
                });
            }
        }

Let’s Start from the Start

Connect to a RabbitMQ instance as follows:

            var adapter = RabbitMQAdapter.Instance;

            adapter.Init("hostName", 1234, "userName", "password", 50);
            adapter.Connect();

Notice that the RabbitMQAdapter is accessed through its static Instance property; the adapter is a singleton. RabbitMQ connections in this library are thread-safe, so a single connection will facilitate all requests to RabbitMQ.

RabbitMQ implements the concept of Channels, which are logical segments of the underlying physical Connection. Once a connection is established, Channels can be opened to interface with RabbitMQ. A single RabbitMQ connection can support up to 65,535 Channels, although I would personally scale out client instances rather than establish such a high number of Channels. Let’s look at publishing a message to RabbitMQ:

        public override void Publish(string message, string exchangeName, string routingKey,
            IBasicProperties messageProperties = null) {
            if (!IsConnected) Connect();
            using (var channel = _connection.CreateModel()) {
                var payload = Encoding.UTF8.GetBytes(message);

                channel.BasicPublish(exchangeName, routingKey,
                    messageProperties ?? RabbitMQProperties.CreateDefaultProperties(channel), payload);
            }
        }

Notice the _connection.CreateModel() method call. This establishes a Channel to interface with RabbitMQ. The Channel is encapsulated within a using block; once we’ve completed our operation, the Channel may be disposed. Channels are relatively cheap to create, in terms of resources, and may be created and dropped liberally.

Messages are sent in UTF-8, byte-format. Here is how to publish a message to RabbitMQ:

            var message = "Hello, World!";
            adapter.Publish(message, "queueName");

This method also exposes exchangeName and routingKey parameters through its overloads. These are used to control the flow of messages through RabbitMQ resources. This concept is well documented here.
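For example, assuming an exchange and binding already exist on the broker (the names below are placeholders), a message can be routed through an exchange and routing key rather than published directly to a queue:

            // Hypothetical exchange and routing key; both must be configured on the broker.
            adapter.Publish("Hello, World!", "my.exchange", "my.routing.key");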

Now let’s attempt to read our message back from RabbitMQ:

            string output;
            BasicDeliverEventArgs eventArgs;
            adapter.TryGetNextMessage("queueName", out output, out eventArgs, 50);

The TryGetNextMessage method reads the next message from the specified Queue, when available. It returns false if the Queue is still empty once the specified timeout has elapsed.
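A short sketch of acting on that return value:

            string output;
            BasicDeliverEventArgs eventArgs;

            // Returns false if no message arrives within the 50ms timeout.
            if (adapter.TryGetNextMessage("queueName", out output, out eventArgs, 50))
                Console.WriteLine("Received: {0}", output);
            else
                Console.WriteLine("The Queue was empty when the timeout elapsed.");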

Complete code listing below:

        private static void Main(string[] args) {
            var adapter = RabbitMQAdapter.Instance;

            adapter.Init("hostName", 1234, "userName", "password", 50);
            adapter.Connect();

            var message = "Hello, World!";
            adapter.Publish(message, "queueName");

            string output;
            BasicDeliverEventArgs eventArgs;
            adapter.TryGetNextMessage("queueName", out output, out eventArgs, 50);
        }

Consistent Message Polling

Reading 1 message at a time may not be the most efficient means of consuming messages. I mentioned the AMQPConsumer class at the beginning of this post. The following code outlines a means to continuously read messages from a RabbitMQ Queue:

            var consumer = new RabbitMQConsumerCatchAll("queueName", 10);
            adapter.ConsumeAsync(consumer);

            Console.ReadLine();
            adapter.StopConsumingAsync(consumer);

Note the RabbitMQConsumerCatchAll class instantiation. This class is an implementation of RabbitMQConsumer. Any exception that occurs during consumption is handled by this consumer and surfaced back to the client along the same Channel as valid messages. Alternatively, the RabbitMQConsumerCatchOne implementation can be leveraged instead. Both classes serve the same purpose and differ only in their error-handling logic: RabbitMQConsumerCatchOne will disconnect from RabbitMQ should an exception occur.
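Both valid messages and wrapped exceptions are surfaced through the MessageReceived event shown earlier, so a handler can be attached before consumption begins; for example:

            var consumer = new RabbitMQConsumerCatchAll("queueName", 10);

            // Successful deliveries populate Message; failures populate Exception.
            consumer.MessageReceived += (sender, e) => {
                if (e.Exception != null)
                    Console.WriteLine("Consumer error: {0}", e.Exception.Message);
                else
                    Console.WriteLine("Received: {0}", e.Message);
            };

            adapter.ConsumeAsync(consumer);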

Summary

The Daishi.AMQP library provides a means of easily interfacing with AMQP-driven queuing mechanisms, with built-in support for RabbitMQ, allowing .NET developers to easily integrate solutions with RabbitMQ. Click here for Part 1 in a tutorial series outlining the means to leverage Daishi.AMQP in your .NET application.


Building a Highly Available, Durable in-memory Cache

Overview

Caching strategies have become an integral component in today’s software applications. Distributed computing has resulted in caching strategies that have grown quite complex. Coupled with Cloud computing, caching has become something of a dark art. Let’s walk through the rationale behind a cache, the mechanisms that drive it, and how to achieve a highly available, durable cache, without persisting to disk.

Why We Need a Cache

Providing fast data-access

Data stores are growing larger and more distributed. Caches provide fast read capability and enhanced performance vs. reading from disk. Data distributed across multiple hardware stacks, in multiple geographic locations, can be centralised at locations geographically close to application users.

Absorbing traffic surges

Sudden bursts in traffic can cause contention in terms of data-persistence. Storing data in memory removes the overhead involved in disk I/O operations, easing the burden on network resources and application threads.

Augmenting NoSQL

NoSQL has gained traction to the extent that it is now pervasive. Many NoSQL offerings, such as Couchbase, implement an eventual-consistency model; essentially, data will eventually persist to disk at some point after a write operation is invoked. This is an effective big-data management strategy; however, it introduces potential pitfalls on the consuming application’s side. Consider an operation originating from an application that expects data to be written immediately. The application may not have the luxury of waiting until the data eventually persists. Caching the data ensures almost immediate availability.
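As a rough sketch of that idea (the IDataStore interface below is a hypothetical stand-in for a NoSQL client such as Couchbase or Aerospike), a write can be acknowledged from the cache immediately while persistence completes in the background:

    // Requires System.Collections.Concurrent and System.Threading.Tasks.
    public interface IDataStore {
        Task PersistAsync(string key, string value);
    }

    public sealed class CachedWriter {
        // In-memory cache: reads are served from here immediately.
        private readonly ConcurrentDictionary<string, string> cache =
            new ConcurrentDictionary<string, string>();
        private readonly IDataStore dataStore;

        public CachedWriter(IDataStore dataStore) {
            this.dataStore = dataStore;
        }

        public Task WriteAsync(string key, string value) {
            cache[key] = value;                          // available to readers at once
            return dataStore.PersistAsync(key, value);   // eventual persistence to disk
        }

        public bool TryRead(string key, out string value) {
            return cache.TryGetValue(key, out value);
        }
    }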

Another common design in NoSQL technology is to direct both reads and writes that are associated with the same data segment to the node on which the data segment resides. This minimises node-hopping and ensures efficient data-flow. Caching can further augment this process: a layer of cached metadata in front of the data-store reduces the amount of traffic the NoSQL store has to manage, minimising resource consumption. The following design illustrates the basic structure of a managed cache in a hosted environment using Aerospike – a flash-optimised, in-memory database:

Distributed Cache


High Availability and the Cloud

High availability is a principle applied to hosted solutions, ensuring that the system will remain online, even if only partly, regardless of failure. Failure encompasses not just hardware or software failure, such as disk failure or out-of-memory exceptions, but also controlled failure, such as machine maintenance.

How Super Data Centers Manage Infrastructure

Data Centers, such as those managed by Amazon Web Services and Microsoft Azure, distribute infrastructure across regions – physical locations separated geographically. Infrastructure contained within each region is further segmented into Availability Zones, or Availability Sets. These are physical groupings of hosted services within hardware stacks – e.g., server racks. Hardware is routinely patched, maintained, and upgraded within Data Centers. This is applied in a controlled manner, such that resources contained within Availability Zone/Set X will not be taken offline at the same time as resources contained within Availability Zone/Set Z.

Durability and the Cloud

To achieve high availability in hosted applications, the applications should be distributed across Availability Zones/Sets, at least. To further enhance the degree of availability, applications can be distributed across separate regions. Consider the following design:

Highly available, durable, cloud-based cache

When Things Fall Over

Notice that the design provides 8 Cache servers, distributed evenly across both region and availability zone. Thus, should any given Availability Zone fail, 3 Availability Zones will remain online. In the unlikely event that a Data Center fails, and all Availability Zones fail, the second region will remain online – our application can be said to be highly available.

Note that the design includes AWS Simple Queue Service (SQS) to achieve Cross Data Center Data Replication (XDR). The actual implementation, which I will address in an upcoming post, is slightly more complex, and is simplified here for clarity. Enterprise solutions, such as Aerospike and Couchbase offer XDR as a function.

Traffic is load balanced evenly (or in a more suitable manner) across Availability Zones. A Global DNS service, such as AWS Route 53, directs traffic to each region. In situations where all regions and Availability Zones are available, we might consider distributing traffic based on geographic location. Users based in Ireland can be routed to AWS-Dublin, while German users might be routed to AWS-Frankfurt, for example. Route 53 can be configured to distribute all traffic to live regions, should any given region fail entirely.

Taking Things a Step Further by Minimising PCI DSS Exposure

Applications that handle financial data, such as Merchants, must comply with the requirements outlined by the PCI Data Security Standard. These requirements apply based on your application configuration. For example, storing payment card details on disk requires a higher level of adherence to PCI DSS than offloading the storage effort to a 3rd party.

Requirements for Handling Financial Data

The PCI DSS defines data as 2 logical entities: data-in-transit and data-at-rest. Data-at-rest is essentially data that has been persisted to a data-store. Data-in-transit applies to data stored in RAM, although the requirements do not specify that this data must be transient – that it must have a point of origin and a destination. Therefore, storing data in RAM would, at least from a legal perspective, result in a reduced level of PCI DSS exposure, in that requirements pertaining to storing data on disk, such as encryption, do not apply.

Of course, this raises the question; should sensitive data always be persisted to hard-storage? Or, is storing data in a highly available and durable cache sufficient? I suspect at this point that you might feel compelled to post a strongly-worded comment outlining that this idea is ludicrous – but is it really? Can an in-memory cache, once distributed and durable enough to withstand multiple degrees of failure, operate with the same degree of reliability as a hard data-store? I’d certainly like to prove the concept.

Summary

Caching data allows for increased throughput and optimised application performance. Enhancing this concept further, by distributing your cache across physical machine-boundaries, and further still across multiple geographical locations, results in a highly available, durable in-memory storage mechanism.

Hosting cache servers within close proximity to your customers allows for reduced latency and an enhanced user-experience, as well as providing for several degrees of failure; from component, to software, to Availability Zone/Set, to entire region failure.


JSON# – Tutorial #5: Deserialising Complex Objects

Fork on Github
Download the Nuget package

The previous tutorial focused on deserialising simple JSON objects. This tutorial describes the process of deserialising a more complex object using JSON#.

Let’s use the ComplexObject class that we’ve leveraged in earlier tutorials:

class ComplexObject : IHaveSerialisableProperties {
    public string Name { get; set; }
    public string Description { get; set; }
    public List<ComplexArrayObject> ComplexArrayObjects { get; set; }
    public List<double> Doubles { get; set; }

    public SerialisableProperties GetSerializableProperties() {
        return new SerialisableProperties("complexObject", new List<JsonProperty> {
            new StringJsonProperty {
                Key = "name",
                Value = Name
            },
            new StringJsonProperty {
                Key = "description",
                Value = Description
            }
            }, new List<JsonSerialisor> {
                    new ComplexJsonArraySerialisor("complexArrayObjects",
                        ComplexArrayObjects.Select(c => c.GetSerializableProperties())),
                    new JsonArraySerialisor("doubles",
                        Doubles.Select(d => d.ToString(CultureInfo.InvariantCulture)), JsonPropertyType.Numeric)
            });
    }
}

Let’s instantiate this with some values, and serialise to JSON. I won’t bloat this post with details on how to serialise; that was covered in previous posts. Here is our serialised ComplexObject instance:

{"complexObject":{"name":"Complex Object","description":"A complex object","complexArrayObjects":[{"name":"Array Object #1","description":"The 1st array object"},{"name":"Array Object #2","description":"The 2nd array object"}],"doubles":[1,2.5,10.8]}}

Notice that we have 2 collections: a simple collection of Doubles, and a more complex collection of ComplexArrayObjects. Let’s start with those.

First, create a new class, ComplexObjectDeserialiser, and implement the required constructor and Deserialise method.

Remember this method from the previous tutorial?

    var properties = jsonNameValueCollection.Parse(mergeArrayValues);

This effectively parses the JSON and loads each element into a NameValueCollection. This is fine for simple properties; however, collection-based properties would cause the deserialiser to load each collection-element as a separate item in the returned NameValueCollection, which may be somewhat cumbersome to manage:

Flattened JSON collection

This is where the Boolean parameter mergeArrayValues comes in. It concatenates all collection-based values into a comma-delimited string, and loads this value into the returned NameValueCollection. This is much more intuitive, and allows consuming code to simply split the comma-delimited values and iterate as required.

Compressed JSON Collection
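For example, the merged entry for the doubles collection arrives as a single comma-delimited value, which can be split and converted directly (this sketch assumes the path-based keys used by the deserialiser below, plus System.Linq and System.Globalization):

            // e.g. properties["complexObject.doubles"] == "1,2.5,10.8"
            var doubles = properties["complexObject.doubles"]
                .Split(',')
                .Select(d => Convert.ToDouble(d, CultureInfo.InvariantCulture))
                .ToList();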

Here is the complete code-listing:

class ComplexObjectDeserialiser : Deserialiser<ComplexObject> {
        public ComplexObjectDeserialiser(JsonNameValueCollection jsonNameValueCollection) : base(jsonNameValueCollection) {}

        public override ComplexObject Deserialise(bool mergeArrayValues = false) {
            var properties = jsonNameValueCollection.Parse(mergeArrayValues);

            var complexObject = new ComplexObject {
                Name = properties["complexObject.name"],
                Description = properties["complexObject.description"],
                ComplexArrayObjects = new List<ComplexArrayObject>(),
                Doubles = new List<double>()
            };

            var complexArrayObjectNames = properties["complexObject.complexArrayObjects.name"].Split(',');
            var complexArrayObjectDescriptions = properties["complexObject.complexArrayObjects.description"].Split(',');

            for (var i = 0; i < complexArrayObjectNames.Length; i++) {
                var complexArrayObjectName = complexArrayObjectNames[i];

                complexObject.ComplexArrayObjects.Add(new ComplexArrayObject {
                    Name = complexArrayObjectName,
                    Description = complexArrayObjectDescriptions[i]
                });
            }

            var complexArrayObjectDoubles = properties["complexObject.doubles"].Split(',');

            foreach (var @double in complexArrayObjectDoubles)
                complexObject.Doubles.Add(Convert.ToDouble(@double));

            return complexObject;
        }
    }

As before, we deserialise as follows:

    Json.Deserialise(new ComplexObjectDeserialiser(new StandardJsonNameValueCollection("<JSON string...>")), true);

JSON# does most of the work, loading each serialised element into a NameValueCollection. We simply read from that collection, picking and choosing each element to map to an associated POCO property.
For collection-based properties, we simply retrieve the value associated with the collection’s key, split that value into an array, and loop through the array, loading a new object for each item in the collection, building our ComplexObject step-by-step.

This is the final JSON# post covering tutorial-based material. I’m working on a more thorough suite of performance benchmarks, and will publish the results, as well as offering a more in-depth technical analysis of JSON# in future posts. Please contact me if you would like to cover a specific topic.

The next posts will feature a tutorial series outlining Object Oriented, Test Driven Development in C# and Java. Please follow this blog to receive updates.


JSON# – Tutorial #4: Deserialising Simple Objects

Fork on Github
Download the Nuget package

The previous tutorial focused on serialising complex JSON objects. This tutorial describes the process of deserialisation using JSON#.

The purpose of JSON# is to allow memory-efficient JSON processing. At the root of this is the ability to dissect large JSON files and extract smaller structures from within, as discussed in Tutorial #1.

One thing, among many, that JSON.NET achieves is memory-efficient JSON parsing, using the JsonTextReader component. The last thing that I want to do is reinvent the wheel; to that end, JSON# also provides a simple means of deserialisation, which wraps JsonTextReader. Using JsonTextReader alone requires quite a lot of custom code, so I’ve provided a wrapper to make things simpler and allow reusability.

At the root of the deserialisation process lies the StandardJsonNameValueCollection class. This is an implementation of JsonNameValueCollection, from which custom implementations can be derived, if necessary, in a classic bridge design, leveraged from the Deserialiser component. Very simply, it reads JSON node values and stores them as key-value pairs; the key is the node’s path from the root, and the value is the node’s value. This allows us to cache the JSON object and store it in a manner that provides easy access to its values, without traversing the tree:

class StandardJsonNameValueCollection : JsonNameValueCollection {
    public StandardJsonNameValueCollection(string json) : base(json) {}

    public override NameValueCollection Parse() {
        var parsed = new NameValueCollection();

        using (var reader = new JsonTextReader(new StringReader(json))) {
            while (reader.Read()) {
                if (reader.Value != null && !reader.TokenType.Equals(JsonToken.PropertyName))
                    parsed.Add(reader.Path, reader.Value.ToString());
            }
            return parsed;
        }
    }
}

Let’s work through an example using our SimpleObject class from previous tutorials:

class SimpleObject : IHaveSerialisableProperties {
    public string Name { get; set; }
    public int Count { get; set; }

    public virtual SerialisableProperties GetSerializableProperties() {
        return new SerialisableProperties("simpleObject", new List<JsonProperty> {
            new StringJsonProperty {
                Key = "name",
                Value = Name
            },
            new NumericJsonProperty {
                Key = "count",
                Value = Count
            }
        });
    }
}

Consider this class represented as a JSON object:

{
    "simpleObject": {
        "name": "SimpleObject",
        "count": 1
    }
}

The JSON# JsonNameValueCollection implementation will read each node in this JSON object and return the values in a NameValueCollection. Once stored, we need only provide a mechanism to instantiate a new SimpleObject POCO with the key-value pairs. JSON# provides the Deserialiser class as an abstraction to provide this functionality. Let’s create a class that accepts the JsonNameValueCollection and uses it to populate an associated POCO:

class SimpleObjectDeserialiser : Deserialiser<SimpleObject> {
    public SimpleObjectDeserialiser(JsonNameValueCollection parser) : base(parser) {}

    public override SimpleObject Deserialise() {
        var properties = jsonNameValueCollection.Parse();
        return new SimpleObject {
            Name = properties.Get("simpleObject.name"),
            Count = Convert.ToInt32(properties.Get("simpleObject.count"))
        };
    }
}

This class contains a single method designed to map properties. As you can see from the code snippet above, we read each key as a representation of the corresponding node’s path, and then bind the associated value to a POCO property.
JSON# leverages the SimpleObjectDeserialiser as a bridge to deserialise the JSON string:

const string json = "{\"simpleObject\":{\"name\":\"SimpleObject\",\"count\":1}}";
var simpleObjectDeserialiser = new SimpleObjectDeserialiser(new StandardJsonNameValueCollection(json));
var simpleObject = Json.Deserialise(simpleObjectDeserialiser);

So, why do this when I can just bind my objects dynamically using JSON.NET:

JsonConvert.DeserializeObject(json);

There is nothing wrong with deserialising in the above manner. But let’s look at what happens. First, it will use Reflection to look up CLR metadata pertaining to your object, in order to construct an instance. Again, this is ok, but bear in mind the performance overhead involved, and consider the consequences in a web application under heavy load.

Second, it requires that you decorate your object with serialisation attributes, which lack flexibility in my opinion.
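For reference, that attribute-driven mapping typically looks something like this with JSON.NET (a minimal sketch of SimpleObject decorated for attribute-based deserialisation):

    // using Newtonsoft.Json;
    // JSON.NET attribute decoration: the mapping between JSON keys and CLR
    // properties is fixed by the attributes.
    class SimpleObject {
        [JsonProperty("name")]
        public string Name { get; set; }

        [JsonProperty("count")]
        public int Count { get; set; }
    }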

Third, if your object is quite large – specifically, over 85K in size – you may run into memory issues, as the object will be allocated on the Large Object Heap.

Deserialisation using JSON#, on the other hand, reads the JSON in stream format, byte-by-byte, guaranteeing a small memory footprint, and it does not require Reflection. Instead, your custom implementation of Deserialiser allows a greater degree of flexibility when populating your POCO than simply decorating properties with attributes.

I’ll cover deserialising more complex objects in the next tutorial, specifically, objects that contain complex properties such as arrays.


JSON# – Tutorial #3: Serialising Complex Objects

Fork on Github
Download the Nuget package

The last tutorial focused on serialising simple JSON objects. This tutorial contains a more complex example.

Real-world objects are generally more complex than typical “Hello, World” examples. Let’s build such an object; an object that contains complex properties, such as other objects and collections. We’ll start by defining a sub-object:

class SimpleSubObject : IHaveSerialisableProperties {
    public string Name { get; set; }
    public string Description { get; set; }

    public SerialisableProperties GetSerializableProperties() {
        return new SerialisableProperties("simpleSubObject", new List<JsonProperty> {
            new StringJsonProperty {
                Key = "name",
                Value = Name
            },
            new StringJsonProperty {
                Key = "description",
                Value = Description
            }
        });
    }
}

This object contains 2 simple properties: Name and Description. As before, we implement the IHaveSerialisableProperties interface to allow JSON# to serialise the object. Now let’s define an object with a property that is a collection of SimpleSubObjects:

class ComplexObject : IHaveSerialisableProperties {
    public string Name { get; set; }
    public string Description { get; set; }

    public List<SimpleSubObject> SimpleSubObjects { get; set; }
    public List<double> Doubles { get; set; }

    public SerialisableProperties GetSerializableProperties() {
        return new SerialisableProperties("complexObject", new List<JsonProperty> {
            new StringJsonProperty {
                Key = "name",
                Value = Name
            },
            new StringJsonProperty {
                Key = "description",
                Value = Description
            }
        },
        new List<JsonSerialisor> {
            new ComplexJsonArraySerialisor("simpleSubObjects",
                SimpleSubObjects.Select(c => c.GetSerializableProperties())),
            new JsonArraySerialisor("doubles",
                Doubles.Select(d => d.ToString(CultureInfo.InvariantCulture)), JsonPropertyType.Numeric)
        });
    }
}

This object contains some simple properties, as well as 2 collections: the first, a collection of Double; the second, a collection of SimpleSubObject.

Note the GetSerializableProperties method in ComplexObject. It accepts a collection parameter of type JsonSerialisor, which represents the highest level of abstraction among the core serialisation components in JSON#. In order to serialise our collection of SimpleSubObjects, we leverage an implementation of JsonSerialisor called ComplexJsonArraySerialisor, designed specifically to serialise collections of objects, as opposed to primitive types. Given that each SimpleSubObject in our collection contains an implementation of GetSerializableProperties, we simply pass the result of each method to the ComplexJsonArraySerialisor constructor. It will handle the rest.

We follow a similar process to serialise the collection of Double, in this case leveraging JsonArraySerialisor, another implementation of JsonSerialisor, specifically designed to manage collections of primitive types. We simply provide the collection of Double in their raw format to the serialisor.

Let’s instantiate a new instance of ComplexObject:

var complexObject = new ComplexObject {
    Name = "Complex Object",
    Description = "A complex object",

    SimpleSubObjects = new List<SimpleSubObject> {
        new SimpleSubObject {
            Name = "Sub Object #1",
            Description = "The 1st sub object"
        },
        new SimpleSubObject {
            Name = "Sub Object #2",
            Description = "The 2nd sub object"
        }
    },
    Doubles = new List<double> {
        1d, 2.5d, 10.8d
    }
};

As per the previous tutorial, we serialise as follows:

var writer = new BinaryWriter(new MemoryStream(), new UTF8Encoding(false));
var serialisableProperties = complexObject.GetSerializableProperties();

using (var serialisor = new StandardJsonSerialisationStrategy(writer))
    Json.Serialise(serialisor, new JsonPropertiesSerialisor(serialisableProperties));

Note the use of StandardJsonSerialisationStrategy here. This is the only implementation of JsonSerialisationStrategy, one of the core serialisation components in JSON#. The abstraction exists to provide extensibility, so that different strategies might be applied at runtime, should specific serialisation rules vary across requirements.

In the next tutorial I’ll discuss deserialising objects using JSON#.


Load Balancing a RabbitMQ Cluster

The Problem

Given a cluster of RabbitMQ nodes, we want to achieve effective load-balancing.

The Solution

First, let’s look at the feature at the core of RabbitMQ – the Queue itself.
RabbitMQ Queues are singular structures that do not exist on more than one Node. Let’s say that you have a load-balanced, HA (Highly Available) RabbitMQ cluster as follows:

RabbitMQ Cluster with Load Balancer

Nodes 1 – 3 are replicated across each other, so that snapshots of all HA-compliant Queues are synchronised across each node. Suppose that we log onto the RabbitMQ Admin Console and create a new HA-configured Queue. Our Load Balancer is configured in a Round Robin manner, and in this instance, we are directed to Node #2, for argument’s sake. Our new Queue is created on Node #2. Note: it is possible to explicitly choose the node that you would like your Queue to reside on, but let’s ignore that for the purpose of this example.

Now our new Queue, “NewQueue”, exists on Node #2. Our HA policy kicks in, and the Queue is replicated across all nodes. We start adding messages to the Queue, and those messages are also replicated across each node. Essentially, a snapshot of the Queue is taken, and this snapshot is replicated across each node after a non-determinable period of time has elapsed (it actually occurs as an asynchronous background task, when the Queue’s state changes).

We may look at this from an intuitive perspective and conclude that each node now contains an instance of our new Queue. Thus, when we connect to RabbitMQ, targeting “NewQueue”, and are directed to an appropriate node by our Load Balancer, we might assume that we are connected to the instance of “NewQueue” residing on that node. This is not the case.

I mentioned that a RabbitMQ Queue is a singular structure. It exists only on the node on which it was created, regardless of HA policy. A Queue is always its own master, and consists of 0…N slaves. Based on the above example, “NewQueue” on Node #2 is the Master-Queue, because this is the node on which the Queue was created. It has 2 Slave-Queues – its counterparts on nodes #1 and #3. Let’s assume that Node #2 dies, for whatever reason; let’s say that the entire server is down. Here’s what will happen to “NewQueue”.

  1. Node #2 does not return a heartbeat, and is considered de-clustered
  2. The “NewQueue” master Queue is no longer available (it died with Node #2)
  3. RabbitMQ promotes the “NewQueue” slave instance on either Node #1 or #3 to master

This is standard HA behaviour in RabbitMQ. Let’s look at the default scenario now, where all 3 nodes are alive and well, and the “NewQueue” instance on Node #2 is still master.

  1. We connect to RabbitMQ, targeting “NewQueue”
  2. Our Load Balancer determines an appropriate Node, based on round robin
  3. We are directed to an appropriate node (let’s say, Node #3)
  4. RabbitMQ determines that the “NewQueue” master node is on Node #2
  5. RabbitMQ redirects us to Node #2
  6. We are successfully connected to the master instance of “NewQueue”

Despite the fact that our Queues are replicated across each HA node, there is only one available instance of each Queue, and it resides on the node on which it was created, or in the case of failure, the instance that is promoted to master. RabbitMQ is conveniently routing us to that node in this case:

RabbitMQ Cluster Exhibiting Extra Network-hop

Unfortunately for us, this means that we suffer an extra, unnecessary network hop in order to reach our intended Queue. This may not seem a major issue, but consider that in the above example, with 3 nodes and an evenly-balanced Load Balancer, we are going to incur that extra network hop on approximately 66% of requests. Only one in every three requests (assuming that in any grouping of three unique requests we are directed to a different node) will result in our request being directed to the correct node.

“Does this mean that we can’t implement round robin load-balancing?”
“No, but if we do, it will result in a less than optimal solution.”
“So, what’s the alternative?”

Well, in order to ensure that each request is routed to the correct node, we have two choices:

  • Explicitly connect to the node on which the target Queue resides
  • Spread Queues as evenly as possible across nodes

Both solutions immediately introduce problems. In the first instance, our client application must be aware of all nodes in our RabbitMQ cluster, and must also know where each master Queue resides. If a Node goes down, how will our application know? Not to mention that this design breaks the Single Responsibility principle, and increases the level of coupling in the application.

The second solution offers a design in which our Queues are not linked to single Nodes. Based on our “NewQueue” example, we would not simply instantiate a new Queue on a single Node. Instead, in a 3-node scenario, we might instantiate 3 Queues; “NewQueue1”, “NewQueue2” and “NewQueue3”, where each Queue is instantiated on a separate Node:

RabbitMQ Cluster with Separate Queues


Our client application can now implement, for example, a simple randomising function that selects one of the above Queues and explicitly connects to it. In a web-based application, given 3 separate HTTP requests, each request would target one of the above Queues, and no Queue would feature more than once across all 3 requests. Now we’ve achieved a reasonable balance of load across our cluster, without the use of a traditional Load Balancer.
RabbitMQ Cluster with Randomiser
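A sketch of such a randomising selector, using the Queue names from the example above:

    // Picks one of the per-node Queues at random for each request,
    // spreading load without a traditional Load Balancer.
    public static class QueueSelector {
        private static readonly string[] queueNames = { "NewQueue1", "NewQueue2", "NewQueue3" };
        private static readonly Random random = new Random();
        private static readonly object padlock = new object();

        public static string Next() {
            lock (padlock) {   // Random is not thread-safe
                return queueNames[random.Next(queueNames.Length)];
            }
        }
    }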


But we’re still faced with the same problem; our client application needs to know where our Queues reside. So let’s look at advancing the solution further, so that we can avoid this shortcoming.

Firstly, we need to provide mapping metadata that describes our RabbitMQ infrastructure. Specifically, where Queues reside. This should be a resilient data-source such as a database or cache, as opposed to something like a flat file, because multiple sources (2, at least) may access this data concurrently.

Now introduce an always-on service that polls RabbitMQ, determining whether or not nodes are alive. New Queues should also be registered with this service, and it should keep an up-to-date registry, providing metadata about Nodes and their Queues:

RabbitMQ Cluster with Monitor Service


Our client application, on initial load, should poll this service and retrieve RabbitMQ metadata, which should then be retained for incoming requests. Should a request fail because a Node has become compromised, the client application can poll the Queue Metadata Store, retrieve up-to-date RabbitMQ metadata, and re-route the message to a working Node.
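A rough sketch of that client-side flow (the IQueueMetadataStore and QueueMetadata types below are hypothetical stand-ins for the metadata store and the monitor service that feeds it):

    public interface IQueueMetadataStore { QueueMetadata GetLatest(); }
    public sealed class QueueMetadata { public string PreferredNode { get; set; } }

    public sealed class ResilientPublisher {
        private readonly IQueueMetadataStore metadataStore;
        private QueueMetadata metadata;

        public ResilientPublisher(IQueueMetadataStore metadataStore) {
            this.metadataStore = metadataStore;
            metadata = metadataStore.GetLatest();           // initial load at start-up
        }

        public void Publish(string message) {
            try {
                PublishTo(metadata.PreferredNode, message);
            }
            catch {
                metadata = metadataStore.GetLatest();       // Node compromised: refresh metadata
                PublishTo(metadata.PreferredNode, message); // re-route to a working Node
            }
        }

        private static void PublishTo(string node, string message) {
            // Connect to the given node and publish; omitted for brevity.
        }
    }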

This approach is a design that I’ve had some success with. From a conceptual point-of-view, it forms a small section of an overall Microservice Architecture, which I’ll talk about in a future post.


RabbitMQ QOS vs. Competing Consumers

I recently answered a question on stackoverflow that revolved around this argument:
“Should I scale out my AMQP consumers, tweak the QOS values, or both?”
This is a broad question. The first thing to consider is the actual business process that is facilitated by AMQP. Is your business process time-sensitive? For example, let’s say that you have a process which persists data to a database. How important, from a business perspective, is it that this data is saved immediately?

Quality of Service (QOS)

AMQP Channels contain a property called QOS. The value of this property determines the manner in which AMQP Consumers read messages from Queues. A QOS value of “1” ensures that only a single message at a time will be de-queued. The next message will not be processed until the current message has been handled. Consider the implications of this. Given a Queue that contains 5 messages, and a Consumer, with QOS set to “1”, that reads from that Queue, message #5 will not be processed until messages 1 – 4 have been de-queued and processed:

AMQP Queue with 5 Messages

If it takes 200ms to process each message, then M5 will not be completely processed for 1 second. This figure rises linearly as the Queue grows in length. If a Publisher, or set of Publishers, suddenly loads 1,000 messages onto the Queue, our Consumer’s processing time becomes a matter of minutes rather than seconds.

Single AMQP Consumer
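With the RabbitMQ .NET client, this prefetch value is set per Channel (an open IModel, referred to as channel below) via BasicQos; a QOS value of “1” corresponds to:

            // BasicQos(prefetchSize, prefetchCount, global):
            // prefetchSize = 0 means no byte limit; prefetchCount = 1 means the broker
            // delivers only one unacknowledged message at a time to this Consumer.
            channel.BasicQos(0, 1, false);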

So let’s increase our QOS value to “5”. Now our Consumer reads from the Queue at a rate of 5 messages at a time. Our Queue is now empty. But what’s happening at the Consumer? The Consumer manages its own Shared Queue in memory. This Shared Queue is a local representation of an AMQP Queue. When a Consumer de-queues a message, that message is cached in the Consumer’s Shared Queue, and processed accordingly.

Based on the above example, our Consumer’s Shared Queue now contains messages 1 – 5, and our actual AMQP Queue is empty. This offers a win; remember that our goal with RabbitMQ, and AMQP in general, is to keep our Queues empty, or near as possible, at any given time to reduce resource consumption. In this case, our Queue is empty.

Well, we’ve reduced RabbitMQ’s resource-footprint, but what’s the catch? We’re still faced with the same problem: our messages are processed sequentially, so we haven’t reduced the overall time that it takes to process the messages – we’ve just moved the problem from one context to another.

Competing Consumers

If we could process each message in parallel, then we could reduce our processing time. We can achieve this by adding multiple Consumers. Each message takes 200ms for a single Consumer to process, so 5 Consumers can process 5 messages in 200ms.

Multiple AMQP Consumers

Actually, this won’t happen, or at least is not likely to happen, because we’ve left our QOS value at “5”. Let’s follow the process flow, assuming that we’ve started each Consumer in order of 1 through 5. This is important because, in situations where multiple Consumers are listening to a Queue, RabbitMQ will prioritise delivery of messages in the same order that the Consumers started listening. If Consumer #1 was the first Consumer to start, it will be the first to receive messages.

This is the problem in our case. Remember, a QOS value of “5” means that the Consumer will read 5 messages at a time. So in this case, assuming that our Queue contains 5 messages, Consumer #1 will read all 5 messages and process them sequentially. Consumers 2 – 5 will remain idle and wasted.

Our problem is now that our Consumers are not adequately balanced. To facilitate accurate round-robin style balancing among Consumers, we simply set the QOS value of each Consumer to “1”. This results in the following behaviour:

  • Each Consumer reads 1 message at a time
  • Older Consumers will receive messages first, assuming that they are not busy

Summary

We’ve reduced our overall processing time to 200ms. We can now process 5 messages in 200ms. So why not do this all the time? There are 2 answers to this. The first concerns the technical aspect of the design. Consider that running multiple Consumers costs resources, especially memory. Just remember this before you think about creating thousands of Consumers. The above problem is simplistic, but this design may not suit your infrastructure for real-world problems.

The second answer concerns the business needs of the actual process that we are managing. Let’s say that our process is a data-logging mechanism, and that our messages contain logging metadata. In this case, we may not care whether or not it takes several minutes to save this data. Logging data is rarely referenced until something goes wrong, so our initial single Consumer solution may be adequate. Now consider a business process that persists time-sensitive metadata to a database. In this case, our multiple Consumer solution may better suit our needs.

What if 1 second is an acceptable processing time? Then we can use a combination of both solutions. We can introduce multiple Consumers, and set their QOS values to “5”. The result is that no Consumer will ever process more than 5 messages at a time, and therefore won’t take more than 1 second to process. Assuming that we introduce 5 Consumers, we can process 25 messages per second. Why not just use 5 Consumers with a QOS value of “1”? Won’t this achieve the same result – 25 messages per second? Yes, or at least close enough, but likely not exactly, because there is more network traffic involved, as we’re now reading 25 messages individually as opposed to 25 messages in batches of 5.
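In terms of the Daishi.AMQP consumer covered earlier in this archive (assuming, as in that example, that adapter is the RabbitMQAdapter instance and that the second constructor argument is the prefetch/QOS value), the combined approach might look like this:

            // Five competing Consumers, each with a QOS (prefetch) value of 5:
            // up to 25 messages can be in flight at once, at roughly 200ms per message.
            for (var i = 0; i < 5; i++) {
                var consumer = new RabbitMQConsumerCatchAll("queueName", 5);
                adapter.ConsumeAsync(consumer);
            }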

It is critical to take into account the underlying business process when considering an AMQP solution. Generally, a one-size-fits-all approach for all business processes is too broad an approach.
