Author Archives: Paul Mooney

Object Oriented, Test Driven Design in C# and Java: A Practical Example Part #1

Download the code in C#
Download the code in Java

Check out my interview on .NET Rocks! – TDD on .NET and Java with Paul Mooney

For a brief overview, please refer to this post.

At this point, many tutorials start by launching into a “Hello, World” style tutorial, with very little practical basis.

This isn’t the most exciting concept, so let’s try a more practical example. Instead of churning out boring pleasantries, our application is going to do something a bit more interesting…build robots.

Specifically, our application is going to build awesome robots with big guns.

Ok, let’s get started with a narrative description of what we’re going to do.

“Mechs with Big Guns” is a factory that produces large, robotic vehicles designed to shoot other large, robotic vehicles. Robots are composed of several robotic parts, delivered by suppliers. Parts are loaded into a delivery bay, and are transported by worker drones to various rooms; functional parts such as arms, legs, etc., are dispatched to an assembly room. Guns and explosives are dispatched to an armoury.
The factory hosts many worker drones to assemble the robots. These drones will specialise in the construction of 1 specific robot, and will require all parts that make up that robot in order to build it. Once the drone has acquired all parts, it will enter the assembly room and build the robot. Newly built robots are transported to the armoury where waiting drones outfit them with guns. From time to time, two robots will randomly be selected from the armoury, and will engage one another in the arena, an advanced testing-ground in the factory. The winning robot will be repaired in the arena by repair drones. Its design will be promoted on a leader board, tracking each design and their associated victories.

The first thing we’ll do is look at the narrative again, this time highlighting each noun.

“Mechs with Big Guns” is a factory that produces large, robotic vehicles designed to shoot other large, robotic vehicles. Robots are composed of several robotic parts, delivered by suppliers. Parts are loaded into a delivery bay, and are transported by worker drones to various rooms; functional parts such as arms, legs, etc., are dispatched to an assembly room. Guns and explosives are dispatched to the armoury.
The factory hosts many worker drones to assemble the robots. These drones will specialise in the construction of 1 specific robot, and will require all parts that make up that robot in order to build it. Once the drone has acquired all parts, it will enter the assembly room and build the robot. Newly built robots are transported to the armoury where waiting drones outfit them with weapon assemblies. From time to time, two robots will randomly be selected from the armoury, and will engage one another in the arena, an advanced testing-ground in the factory. The winning robot will be repaired in the arena by repair drones. Its design will be promoted on a leader board, tracking each design and their associated victories.

Each highlight represents an actual object that will exist in our simulation.

“But where do I start with all of this?”

I recommend starting with a low-level component; that is, a component with little or no dependency on other objects. Supplier looks like a good place to start. It doesn’t do much, other than deliver RobotParts to a DeliveryBay, so let’s start with that.

I’m assuming at this point, if you’re working with .NET, that you have a new Visual Studio solution and have created a Class Library project including NUnit as an included package. NUnit is the testing framework that we will use to validate our core components.

Or,if you’re working with Java, that you’ve set up a new Java 8 project in your IDE of choice, and have configured JUnit, the testing framework that we will use to validate our core components.

Add a class called RobotPartSupplierTests and add the following method:

        [Test]
        public void RobotPartSupplierDeliversRobotParts() {
            var mechSupplier = new RobotPartSupplier {
                RobotParts = new List<RobotPart> {
                    new MockedRobotPart(),
                    new MockedRobotPart()
                }
            };

            var deliveryBay = new MockedDeliveryBay();
            mechSupplier.DeliverRobotParts(deliveryBay);

            Assert.AreEqual(2, deliveryBay.RobotParts.Count);
            Assert.AreEqual(0, mechSupplier.RobotParts.Count);
        }

Java

    @Test
    public void supplierDeliversRobotParts() {
        robotPartSupplier robotPartSupplier = new robotPartSupplier();
        List<robotPart> robotParts = new ArrayList<robotPart>();

        robotParts.add(new mockedRobotPart());
        robotParts.add(new mockedRobotPart());

        robotPartSupplier.setRobotPart(robotParts);

        deliveryBay mockedDeliveryBay = new mockedDeliveryBay();
        robotPartSupplier.deliverRobotParts(mockedDeliveryBay);

        assertEquals(2, mockedDeliveryBay.getRobotParts().size());
        assertEquals(0, robotPartSupplier.getRobotParts().size());
    }

Here, we declare an instance of RobotPartSupplier, an implementation of the Supplier abstract class:

 public abstract class Supplier {
        public List<RobotPart> RobotParts { get; set; }
    }

Java

public abstract class supplier {
    private List<robotPart> _robotParts;

    public List<robotPart> getRobotParts() {
        return _robotParts;
    }

    public void setRobotPart(List<robotPart> value) {
        _robotParts = value;
    }
}

“Why create an abstraction? Can’t we just deal directly with the concrete implementation?”

Yes, we can. We’ve abstracted to the Supplier class because at some point, we’re going to have to test a component that has a dependency on a RobotPartSupplier. The point of each test in a Test Driven project is that it tests exactly 1 unit of work, and no more. In our case, we’re testing to ensure that a RobotPartSupplier can deliver RobotParts to a DeliveryBay. But if you follow the logic through, we’re also testing concrete implementations of DeliveryBay and RobotPart. This can have negative consequences later on.

What would happen, for example, if we were to change the underlying behaviour of DeliveryBay? For starters, it would break associated DeliveryBay tests.

“OK, so what?”

That’s fine – that’s what the tests are there for. But, it will also break our RobotPartSupplier tests. The point is to isolate each problem domain into a set of tests, and to apply boundaries to those tests such that changes in other problem domains do not impact.
This might sound a bit pedantic. If so, bear with me. It actually provides a level of neatness to our code, which is particularly helpful as codebases grow larger.

Essentially, the rule of thumb is to abstract everything.

Notice that our RobotPartSupplier contains a List of RobotPart. Based on the above concept, The actual RobotPart instances that are loaded in this test are mocked instance. These are dummy implementations of the core abstraction, RobotPart, that will never make it to our actual core components.

“Why bother with them then?”

They provide simple implementations of dependent classes in order to protect our RobotSupplier tests from changes in concrete implementations of those dependent classes.

As you can see, we’ve mocked DeliveryBay and RobotPart abstractions.
Let’s carry on. The first thing that we want to prove here is that our RobotPartSupplier can deliver RobotParts to a DeliveryBay, so our test requires us to implement the DeliverRobotParts method:

    public class RobotPartSupplier : Supplier {
        public void DeliverRobotParts(DeliveryBay deliveryBay) {
            deliveryBay.RobotParts = new List<RobotPart>(RobotParts);
            RobotParts.Clear();
        }
    }

Java

    public void deliverRobotParts(deliveryBay deliveryBay) {
        deliveryBay.setRobotPart(new ArrayList<robotPart>(this.getRobotParts()));
        getRobotParts().clear();
    }

Notice that two things happen here:

The RobotPartSupplier’s RobotParts are copied to the DeliveryBay
The RobotPartSupplier’s RobotParts are emptied

As per the brief, this concludes our delivery-related functionality. Our test’s assertions determine that the RobotParts have indeed been copied from our RobotPartSupplier to our mocked DeliveryBay.

In next week’s tutorial, we’ll continue our implementation, focusing on WorkerDrones, and their functionality.

Connect with me:

Object Oriented, Test Driven Design in C# and Java

3 Replies

Check out my interview on .NET Rocks! – TDD on .NET and Java with Paul Mooney

Overview

Providing performance-optimised frameworks is both a practical and theoretical compulsion. Thus far, my posts have covered my own bespoke frameworks designed to optimise performance or enhance security. I’ve outlined those frameworks’ design, and provided tutorials describing several implementation examples.

It occurred to me that providing such frameworks is not just about the practical – designing and distributing code libraries – but also about the theoretical – how to go about designing solutions from the ground up, with performance optimisation in mind.

With that in mind, this post will mark the first in a series of posts aimed at offering step-by-step tutorials outlining the fundamentals of Object Oriented and Test Driven design in C# and Java.

“But what do Object Oriented and Test Driven Design have in common with performance optimisation? Surely components implemented in a functional, or other capacity will yield similar results in terms of performance?”

Well, that’s a subjective opinion, regardless of which is beside the point. Let’s start with TDD. In essence, when designing software, always subscribe to the principal that less is more, and strive to deliver solutions of minimal size. You enjoy the following when applying this methodology:

Your code is more streamlined, and easier to navigate
Less code, less components, less working parts, less friction, potentially less bugs
Less working parts mean less interactions, and potentially faster throughput of data

Friction in software systems occurs when components interact with one another. The more components you have, the more friction occurs, and the greater the likelihood that friction will result in bugs, or performance-related issues.
This is where Test Driven Design comes in. Essentially, you start with a test.

“OK, but what exactly is a test?!?”

Let me first offer a disclaimer: I won’t quote scripture on this blog, nor offer a copy-and-paste explanation of technical terms. Instead, I’ll attempt to offer explanations and opinions in as practical a manner as possible. In other words, plain English:

A TEST IS A SOFTWARE FUNCTION THAT PROVES THE COMPONENT YOU’RE BUILDING DOES WHAT IT’S SUPPOSED TO DO.

That’s it. I can expand on this to a great degree, but in essence, that’s all you need to know.

“Great. But how do tests help?”

Tests focus on one thing only – ensuring that the tested component achieves its purpose, and nothing more. In other words, when our component is finished, it should consist of exactly the amount of code necessary to fulfil its purpose, and no more.

“That makes sense. What about Object Oriented Design? I don’t see how that helps. Will systems designed in an object-oriented manner run more efficiently than others?”

No, not necessarily. However, object-oriented systems can potentially offer a great degree of flexibility and reusability. Let’s assume that we have a working system. Step back and consider that system in terms of its core components.

In an object-oriented design, the system will consist of a series of objects, interacting with one and other in a loosely-coupled fashion, so that each object is not (or at least should not be) dependent on the other. Theoretically, we achieve two things from this:

We can identify and extract application logic replacing it with new objects, should requirements change
Objects can be reused across the application, where logic overlaps

These are generally harder to achieve in unstructured systems. Using a combination of Object Oriented and Test Driven Design, we can achieve a design that:

is flexible
lends itself well to change
is protected by working tests
does not contain superfluous code
adheres to design patterns

Let’s explore some of these concepts that haven’t been covered so far:

Think of your tests like a contract. They define how your components behave. Significant changes to a component should cause associated tests to fail, thus protecting your application from breaking changes.

There are numerous articles online that argue the merits, or lack thereof, of design patterns. Some argue that all code should be structured based on design pattern, others that they add unnecessary complexity.

My own opinion is that over time, as software evolved, the same design problems occurred across systems as they were developed. Solutions to those problems eventually formed, until the most optimal solutions matured as established design patterns.

Every software problem you will ever face has been solved before. A certain pattern, or combination of patterns exists that offer a solution to your problem.

Let’s explore these concepts further by applying them to a practical example in next week’s follow-up post.

Connect with me:

JSON# – Tutorial #5: Deserialising Complex Objects

JSON# – Tutorial #4: Deserialising Simple Objects

4 Replies

Fork on Github
Download the Nuget package

The previous tutorial focused on serialising complex JSON objects. This tutorial describes the process of deserialisation using JSON#.

The purpose of JSON# is to allow memory-efficient JSON processing. At the root of this is the ability to dissect large JSON files and extract smaller structures from within, as discussed in Tutorial #1.

One thing, among many, that JSON.NET achieves, is allow memory-efficient JSON-parsing using the JsonTextReader component. The last thing that I want to do is reinvent the wheel; to that end, JSON# also provides a simple means of deserialisation, which wraps JsonTextReader. Using JsonTextReader alone, requires quite a lot of custom code, so I’ve provided a wrapper to make things simpler and allow reusability.

At the root of the deserialization process lies the StandardJsonNameValueCollection class. This is an implementation of the JsonNameValueCollection, from which custom implementations can be derived, if necessary, in a classic bridge design, leveraged from the Deserialiser component. Very simply, it reads JSON node values and stores them as key-value pairs; they key is the node’s path from root, and the value is the node’s value. This allows us to cache the JSON object and store it in a manner that provides easy access to its values, without traversing the tree:

class StandardJsonNameValueCollection : JsonNameValueCollection {
    public StandardJsonNameValueCollection(string json) : base(json) {}

    public override NameValueCollection Parse() {
    var parsed = new NameValueCollection();

    using (var reader = new JsonTextReader(new StringReader(json))) {
        while (reader.Read()) {
            if (reader.Value != null &amp;&amp; !reader.TokenType.Equals(JsonToken.PropertyName))
                parsed.Add(reader.Path, reader.Value.ToString());
            }
            return parsed;
        }
    }
}

Let’s work through an example using our SimpleObject class from previous tutorials:

class SimpleObject : IHaveSerialisableProperties {
    public string Name { get; set; }
    public int Count { get; set; }

    public virtual SerialisableProperties GetSerializableProperties() {
        return new SerialisableProperties("simpleObject", new List {
            new StringJsonProperty {
                Key = "name",
                Value = Name
            },
            new NumericJsonProperty {
                Key = "count",
                Value = Count
            }
        });
    }
}

Consider this class represented as a JSON object:

{
    "simpleObject": {
        "name": "SimpleObject",
        "count": 1
    }
}

The JSON# JsonNameValueCollection implementation will read each node in this JSON object and return the values in a NameValueCollection. Once stored, we need only provide a mechanism to instantiate a new SimpleObject POCO with the key-value pairs. JSON# provides the Deserialiser class as an abstraction to provide this functionality. Let’s create a class that accepts the JsonNameValueCollection and uses it to populate an associated POCO:

class SimpleObjectDeserialiser : Deserialiser {
    public SimpleObjectDeserialiser(JsonNameValueCollection parser) : base(parser) {}

    public override SimpleObject Deserialise() {
        var properties = jsonNameValueCollection.Parse();
        return new SimpleObject {
            Name = properties.Get("simpleObject.name"),
            Count = Convert.ToInt32(properties.Get("simpleObject.count"))
        };
    }
}

This class contains a single method designed to map properties. As you can see from the code snippet above, we read each key as a representation of corresponding node’s path, and then bind the associated value to a POCO property.
JSON# leverages the SimpleObjectDeserialiser as a bridge to deserialise the JSON string:

const string json = "{\"simpleObject\":{\"name\":\"SimpleObject\",\"count\":1}}";
var simpleObjectDeserialiser = new SimpleObjectDeserialiser(new StandardJsonNameValueCollection(json));
var simpleObject = Json.Deserialise(_simpleObjectDeserialiser);

So, why do this when I can just bind my objects dynamically using JSON.NET:

JsonConvert.DeserializeObject(json);

There is nothing wrong with deserialising in the above manner. But let’s look at what happens. First, it will use Reflection to look up CLR metadata pertaining to your object, in order to construct an instance. Again, this is ok, but bear in mind the performance overhead involved, and consider the consequences in a web application under heavy load.

Second, it requires that you decorate your object with serialisation attributes, which lack flexibility in my opinion.

Thirdly, if your object is quite large, specifically over 85K in size, you may run into memory issues if your object is bound to the Large Object Heap.

Deserialisation using JSON#, on the other hand, reads the JSON in stream-format, byte-by-byte, guaranteeing a small memory footprint, nor does it require Reflection. Instead, your custom implementation of Deserialiser allows a greater degree of flexibility when populating your POCO, than simply decorating properties with attributes.

I’ll cover deserialising more complex objects in the next tutorial, specifically, objects that contain complex properties such as arrays.

Connect with me:

JSON# – Tutorial #3: Serialising Complex Objects

JSON# – Tutorial #2: Serialising Simple Objects

2 Replies

Fork on Github
Download the Nuget package

The last tutorial focused on parsing embedded JSON objects. This time, we’ll focus on serialising simple objects in C#.

Object serialisation using JSON# is 25 times to several hundred times faster than serialisation using JSON.NET, on a quad-core CPU with 16GB RAM. The source code is written in a BDD-manner, and the associated BDD features contain performance tests that back up these figures.

Let’s start with a basic class in C#:

class SimpleObject {
    public string Name { get; set; }
    public int Count { get; set; }
}

Our first step is to provide serialisation metadata to JSON#. Traditionally, most frameworks use Reflection to achieve this. While this works very well, it requires the component to know specific assembly metadata that describes your object. This comes with a slight performance penalty.

Ideally, when leveraging Reflection, the optimal design is a solution that reads an object’s assembly metadata once, and caches the result for the duration of the application’s run-time. This is generally not achievable with stateless HTTP calls. Using Reflection, we will likely query the object’s assembly during each HTTP request when serialising or de-serialising an object, suffering the associated performance-overhead for each request.

JSON# allows us to avoid that overhead by exposing serialisation metadata in the class itself:

class SimpleObject : IHaveSerialisableProperties {
    public string Name { get; set; }
    public int Count { get; set; }

    public virtual SerialisableProperties GetSerializableProperties() {
        return new SerialisableProperties(&quot;simpleObject&quot;, 
        new List&lt;JsonProperty&gt; {
            new StringJsonProperty {
                Key = &quot;name&quot;,
                Value = Name
            },
            new NumericJsonProperty {
                Key = &quot;count&quot;,
                Value = Count
            }
        });
    }
}

First, we need to implement the IHaveSerialisableProperties interface, allowing JSON# to serialise our object. Notice the new method, GetSerializableProperties, that returns a SerialisableProperties object, which looks like this:

public class SerialisableProperties {
   public string ObjectName { get; set; }
       public IEnumerable&lt;JsonProperty&gt; Properties { get; private set; }
       public IEnumerable&lt;JsonSerialisor&gt; Serialisors { get; set; }

       public SerialisableProperties(IEnumerable&lt;JsonProperty&gt; properties) {
           Properties = properties;
       }

       public SerialisableProperties(IEnumerable&lt;JsonSerialisor&gt; serialisors) {
           Serialisors = serialisors;
       }

       public SerialisableProperties(string objectName,
           IEnumerable&lt;JsonProperty&gt; properties) : this(properties) {
           ObjectName = objectName;
       }

       public SerialisableProperties(string objectName,
           IEnumerable&lt;JsonSerialisor&gt; serialisors) : this(serialisors) {
           ObjectName = objectName;
       }

       public SerialisableProperties(IEnumerable&lt;JsonProperty&gt; properties,
            IEnumerable&lt;JsonSerialisor&gt; serialisors) : this(properties) {
            Serialisors = serialisors;
        }

        public SerialisableProperties(string objectName,
            IEnumerable&lt;JsonProperty&gt; properties, IEnumerable&lt;JsonSerialisor&gt; serialisors)
            : this(properties, serialisors) {
            ObjectName = objectName;
        }
    }
}

This object is essentially a mapper that outlines how an object should be serialised. Simple types are stored in the Properties property, while more complex types are retrieved through custom JsonSerialisor objects, which I will discuss in the next tutorial. The following code outlines the process involved in serialising a SimpleObject instance:

First, we initialise our object

    var simpleObject = new SimpleObject {Name = &quot;Simple Object&quot;, Count = 10};

Now initialise a BinaryWriter, setting the appropriate Encoding. This will be used to build the object’s JSON-representation, under-the-hood.

var writer = new BinaryWriter(new MemoryStream(), new UTF8Encoding(false));

Now we use our Json library to serialise the object

var serialisableProperties = simpleObject.GetSerializableProperties();
byte[] serialisedObject;

using (var serialisor = new StandardJsonSerialisationStrategy(writer)) {
    Json.Serialise(serialisor, new JsonPropertiesSerialisor(serialisableProperties));
    serialisedObject = serialisor.SerialisedObject;
}

Below is the complete code-listing:

var simpleObject = new SimpleObject {Name = &quot;Simple Object&quot;, Count = 10};

var writer = new BinaryWriter(new MemoryStream(), new UTF8Encoding(false));
var serialisableProperties = simpleObject.GetSerializableProperties();

byte[] serialisedObject;

using (var serialisor = new StandardJsonSerialisationStrategy(writer)) {
    Json.Serialise(serialisor, new JsonPropertiesSerialisor(serialisableProperties));
    serialisedObject = serialisor.SerialisedObject;
}

Now our serialisedObject variable contains a JSON-serialised representation of our SimpleObject instance, as an array of raw bytes. We’ve achieved this without Reflection, by implementing a simple interface, IHaveSerialisableProperties in our SimpleObject class, and have avoided potentially significant performance-overhead; while a single scenario involving reflection might involve very little performance-overhead, consider a web application under heavy load, leveraging Reflection. We can undoubtedly support more concurrent users per application tier if we avoid Reflection. JSON# allows us to do just that.

In the next tutorial, I’ll discuss serialising complex objects.

Connect with me:

JSON# – Tutorial #1: Returning Embedded-Objects

1 Reply

Fork on Github
Download the Nuget package

I’ve previously blogged about the premise behind JSON#. For a full explanation of the theory behind the code, check out this post.

Now, let’s dive into an example…

Let’s say that we have a JSON object that represents a real-world object, like a classroom full of students. It might look something like this:

{
    "classroom": {
        "teachers": [
            {
                "name": "Pablo",
                "age": 33
            },
            {
                "name": "John",
                "age": 28
            }
        ],
        "blackboard": {
            "madeOf": "wood",
            "height": "100",
            "width": "500"
        }
    }
}

This doesn’t look like it presents any great challenge to parse on any platform. But what if we expand it further to describe a school:

{
    "school": {
        "classrooms": [
            {
                "name": "Room #1",
                "teachers": [
                    {
                        "name": "Pablo",
                        "age": 33
                    },
                    {
                        "name": "John",
                        "age": 28
                    }
                ],
                "blackboard": {
                    "madeOf": "wood",
                    "height": "100",
                    "width": "500"
                }
            },
            {
                "name": "Room #2",
                "teachers": [
                    {
                        "name": "David",
                        "age": 33
                    },
                    {
                        "name": "Mary",
                        "age": 28
                    }
                ],
                "blackboard": {
                    "madeOf": "metal",
                    "height": "200",
                    "width": "600"
                }
            }
        ]
    }
}

Notice that our school object contains 2 classrooms, each of which contain similar objects, such as “blackboard”. Imagine that our school object needs to represent every school in the country. For argument’s sake, let’s say that we need to retrieve details about the blackboard in every classroom of every school. How would we go about that?

Well, we could refer to one of numerous JSON-parsing tools available. But how do these tools actually operate? Firstly, our massive JSON object will likely end up a large object. I’ve mentioned in previous blogs that objects greater than 85KB can significantly impact performance. So, immediately we’re potentially in trouble.

We can always cache the JSON object as a Stream, and read from it byte-by-byte. Tools like JSON.net offer capabilities like this, using components such as the JsonTextReader. So we’ve overcome the performance overhead associated with storing big strings in memory. But now we have another problem – we’re drilling into a massive JSON file, and searching for metadata that’s spread widely. We’re going to have to implement a lot of logic in order to draw the “blackboard” objects out.

What if requirements change, and we no longer need the “blackboard” objects? Now we just need the height of each blackboard. Well, we’ll have to throw out a lot of code, which is essentially wasted effort. Requirements change again, and we no longer need “blackboard” objects at all – now we need “teacher” objects. We need to rewrite all of our logic. Not the most flexible solution.

Let’s do this instead:

First, download the JSON# library from Github (MIT license).

Now let’s get those “blackboard” objects:

    const string schoolMetadata = @"{ "school": {...";
    var jsonParser = new JsonObjectParser();

    using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(schoolMetadata))) {
        Json.Parse(jsonParser, stream, "blackboard");
    }

This will return all “blackboard” objects from the JSON file. Let’s say that our requirements change. Now we need all “teacher” objects instead. Simply change the code as follows:

    const string schoolMetadata = @"{ "school": {...";
    var jsonParser = new JsonObjectParser();

    using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(schoolMetadata))) {
        Json.Parse(jsonParser, stream, "teachers");
    }

Such a change would have required significant effort, had we implemented our own custom logic using JSON.net’s JsonTextReader, or a similar component. Using JSON#, we achieve this by changing a single word. Now we’ve:

Optimised performance
Reduced our application’s memory-footprint
Avoided the Large Object Heap
Reduced development-time

The next tutorial outlines how to serialise objects using JSON#.

Connect with me:

Load Balancing a RabbitMQ Cluster

82 Replies

The Problem

Given a cluster of RabbitMQ nodes, we want to achieve effective load-balancing.

The Solution

First, let’s look at the feature at the core of RabbitMQ – the Queue itself.
RabbitMQ Queues are singular structures that do not exist on more than one Node. Let’s say that you have a load-balanced, HA (Highly Available) RabbitMQ cluster as follows:

RabbitMQ Cluster with Load Balancer

Nodes 1 – 3 are replicated across each other, so that snapshots of all HA-compliant Queues are synchronised across each node. Suppose that we log onto the RabbitMQ Admin Console and create a new HA-configured Queue. Our Load Balancer is configured in a Round Robin manner, and in this instance, we are directed to Node #2, for argument’s sake. Our new Queue is created on Node #2. Note: it is possible to explicitly choose the node that you would like your Queue to reside on, but let’s ignore that for the purpose of this example.

Now our new Queue, “NewQueue”, exists on Node #2. Our HA policy kicks in, and the Queue is replicated across all nodes. We start adding messages to the Queue, and those messages are also replicated across each node. Essentially, a snapshot of the Queue is taken, and this snapshot is replicated across each node after a non-determinable period of time has elapsed (it actually occurs as an asynchronous background task, when the Queue’s state changes).

We may look at this from an intuitive perspective and conclude that each node now contains an instance of our new Queue. Thus, when we connect to RabbitMQ, targeting “NewQueue”, and are directed to an appropriate node by our Load Balancer, we might assume that we are connected to the instance of “NewQueue” residing on that node. This is not the case.

I mentioned that a RabbitMQ Queue is a singular structure. It exists only on the node that it was created, regardless of HA policy. A Queue is always its own master, and consists of 0…N slaves. Based on the above example, “NewQueue” on Node #2 is the Master-Queue, because this is the node on which the Queue was created. It contains 2 Slave-Queues – it’s counterparts on nodes #1 and #3. Let’s assume that Node #2 dies, for whatever reason; let’s say that the entire server is down. Here’s what will happen to “NewQueue”.

Node #2 does not return a heartbeat, and is considered de-clustered
The “NewQueue” master Queue is no longer available (it died with Node #2)
RabbitMQ promotes the “NewQueue” slave instance on either Node #1 or #3 to master

This is standard HA behaviour in RabbitMQ. Let’s look at the default scenario now, where all 3 nodes are alive and well, and the “NewQueue” instance on Node #2 is still master.

We connect to RabbitMQ, targeting “NewQueue”
Our Load Balancer determines an appropriate Node, based on round robin
We are directed to an appropriate node (let’s say, Node #3)
RabbitMQ determines that the “NewQueue” master node is on Node #2
RabbitMQ redirects us to Node #2
We are successfully connected to the master instance of “NewQueue”

Despite the fact that our Queues are replicated across each HA node, there is only one available instance of each Queue, and it resides on the node on which it was created, or in the case of failure, the instance that is promoted to master. RabbitMQ is conveniently routing us to that node in this case:

RabbitMQ Cluster Exhibiting Extra Network-hop

Unfortunately for us, this means that we suffer an extra, unnecessary network hop in order to reach our intended Queue. This may not seem a major issue, but consider that in the above example, with 3 nodes and an evenly-balanced Load Balancer, we are going to incur that extra network hop on approximately 66% of requests. Only one in every three requests (assuming that in any grouping of three unique requests we are directed to a different node) will result in our request being directed to the correct node.

“Does this mean that we can’t implement round robin load-balancing?”
“No, but if we do, it will result in a less than optimal solution.”
“So, what’s the alternative?”

Well, in order to ensure that each request is routed to the correct node, we have two choices:

Explicitly connect to the node on which the target Queue resides
Spread Queues evenly as possible across nodes

Both solutions immediately introduce problems. In the first instance, our client application must be aware of all nodes in our RabbitMQ cluster, and must also know where each master Queue resides. If a Node goes down, how will our application know? Not to mention that this design breaks the Single Responsibility principle, and increases the level of coupling in the application.

The second solution offers a design in which our Queues are not linked to single nodes. Based on our “NewQueue” example, we would not simply instantiate a new Queue on a single node. Instead, in a 3-node scenario, we might instantiate 3 Queues; “NewQueue1”, “NewQueue2” and “NewQueue3”, where each Queue is instantiated on a separate node.

The second solution offers a design in which our Queues are not linked to single Nodes. Based on our “NewQueue” example, we would not simply instantiate a new Queue on a single Node. Instead, in a 3-node scenario, we might instantiate 3 Queues; “NewQueue1”, “NewQueue2” and “NewQueue3”, where each Queue is instantiated on a separate Node:

RabbitMQ Cluster with Separate Queues

Our client application can now implement, for example, a simple randomising function that selects one of the above Queues and explicitly connects to it. In a web-based application, given 3 separate HTTP requests, each request would target one of the above Queues, and no Queue would feature more than once across all 3 requests. Now we’ve achieved a reasonable balance of load across our cluster, without the use of a traditional Load Balancer.

RabbitMQ Cluster with Randomiser

But we’re still faced with the same problem; our client application needs to know where our Queues reside. So let’s look at advancing the solution further, so that we can avoid this shortcoming.

Firstly, we need to provide mapping metadata that describes our RabbitMQ infrastructure. Specifically, where Queues reside. This should be a resilient data-source such as a database or cache, as opposed to something like a flat file, because multiple sources (2, at least) may access this data concurrently.

Now introduce an always-on service that polls RabbitMQ, determining whether or not nodes are alive. New Queues should also be registered with this service, and it should keep an up-to-date registry, providing metadata about Nodes and their Queues:

RabbitMQ Cluster with Monitor Service

Our client application, on initial load, should poll this service and retrieve RabbitMQ metadata, which should then be retained for incoming requests. Should a request fail due to the fact that a Node becomes compromised, the client application can poll the Queue Metadata Store, return up-to-date RabbitMQ metadata, and re-route the message to a working Node.

This approach is a design that I’ve had some success with. From a conceptual point-of-view, it forms a small section of an overall Microservice Architecture, which I’ll talk about in a future post.

Connect with me:

RabbitMQ QOS vs. Competing Consumers

2 Replies

I recently answered a question on stackoverflow that revolved around this argument:
“Should I scale out my AMQP consumers, tweak the QOS values, or both?”
This is a broad question. The first thing to consider is the actual business process that facilitated by AMQP. Is your business process time-sensitive? For example, let’s say that you have a process which persist data to a database. How important from a business perspective is it that this data is saved immediately?

Quality of Service (QOS)

AMQP Channels contain a property called QOS. The value of this property determines the manner in which AMQP Consumers read messages from Queues. A QOS value of “1” ensures that only a single message at a time will be de-queued. The next message will not be processed until the current message has been handled. Consider the implications of this. Given a Queue that contains 5 messages, and a Consumer, with QOS set to “1”, that reads from that Queue, message #5 will not be processed until messages 1 – 4 have been de-queued and processed:

AMQP Queue with 5 Messages

If it takes 200ms to process each message, then M5 will not be completely processed for 1 second. This figure will rise exponentially as the Queue grows in length. If a Publisher, or set of Publishers suddenly load 1,000 messages onto the Queue, our Consumer’s processing time now becomes minutes, let alone seconds.

Single AMQP Consumer

So let’s increase our QOS value to “5”. Now our Consumer reads from the Queue at a rate of 5 messages at a time. Our Queue is now empty. But what’s happening at the Consumer? The Consumer manages its own Shared Queue in memory. This Shared Queue is a local representation of an AMQP Queue. When a Consumer de-queues a message, that message is cached in the Consumer’s Shared Queue, and processed accordingly.

Based on the above example, our Consumer’s Shared Queue now contains messages 1 – 5, and our actual AMQP Queue is empty. This offers a win; remember that our goal with RabbitMQ, and AMQP in general, is to keep our Queues empty, or near as possible, at any given time to reduce resource consumption. In this case, our Queue is empty.

Well, we’ve reduced RabbitMQ’s resource-footprint, but what’s the catch? Well, we’re still faced with the same problem, which is that our messages are processed sequentially, so we haven’t reduced the overall time that it takes to process the messages – we’ve just moved the problem from one context to another.

Competing Consumers

If we could process each message in parallel, then we could reduce or processing time. We can achieve this by adding multiple Consumers. Each message takes 200ms for a single Consumer to process, so 5 Consumers can process 5 messages in 200ms.

Multiple AMQP Consumers

Actually, this won’t happen, or at least is not likely to happen, because we’ve left our QOS-value at “5”, so let’s follow the process flow, assuming that we’ve started each Consumer in order of 1 through 5. This is important because in situations where multiple Consumers are listening to a Queue, RabbitMQ will prioritise delivery of messages in the same order that the Consumers started listening. If Consumer #1 was the first Consumer to start, it will be the first to receive messages.

This is the problem, in our case. Remember, a QOS value of “5” means that the Consumer will read 5 messages at a time. So in this case, assuming that our Queue contains 5 messages, Consumer #1 will read all 5 messages, and process them sequentially. Consumer’s 2 – 5 will remain idle and wasted.

Our problem is now that our Consumers are not adequately balanced. To facilitate accurate round-robin style balancing among Consumers, we simply set the QOS value of each Consumer to “1”. This results in the following behaviour:

Each Consumer reads 1 message at a time
Older Consumers will receive messages first, assuming that they are not busy

Summary

We’ve reduced our overall processing time to 200MS. We can now process 5 messages in 200MS. So why not do this all the time? There are 2 answers to this. The first concerns the technical aspect of the design. Consider that running multiple Consumers costs resources, especially memory. Just remember this before you think about creating thousands of Consumers. The above problem is simplistic, but this design may not suit your infrastructure for real-world problems.

The second answer concerns the business needs of the actual process that we are managing. Let’s say that our process is a data-logging mechanism, and that our messages contain logging metadata. In this case, we may not care whether or not it takes several minutes to save this data. Logging data is rarely referenced until something goes wrong, so our initial single Consumer solution may be adequate. Now consider a business process that persists time-sensitive metadata to a database. In this case, our multiple Consumer solution may better suit our needs.

What if 1 second is an acceptable processing time? Then we can use a combination of both solutions. We can introduce multiple Consumers, and set their QOS values to “5”. The result is that no Consumer will ever process more than 5 messages at a time, and therefore won’t take more than 1 second to process. Assuming that we introduce 5 Consumers, we can process 25 messages per second. Why not just use 5 Consumers with a QOS value of “1”? Won’t this achieve the same result – 25 messages per second? Yes, or at least close enough, but likely not exactly, because there is more network traffic involved, as we’re now reading 25 messages individually as opposed to 25 messages in batches of 5.

It is critical to take into account the underlying business process when considering an AMQP solution. Generally, a one-size-fits-all approach for all business processes is too broad an approach.

Connect with me:

Ultrafast JSON Parsing

10 Replies

Fork on Github
Download the Nuget package

I previously blogged about parsing JSON using JSON.NET’s JsonTextReader, during which I touched on a key point; the Large Object Heap, and why to avoid it.
We’ve all been asked at some point or another to fix a buckled project while consuming minimal cost in terms of development effort. This generally happens when a project grinds to a halt due to poor performance as a result of bad design. Nobody wants to acknowledge that they’ve commissioned one of these, so at that point, you’re called in, and expected to deliver this, within the constraints of this.

I’ve had to privilege of taking part in such projects several times, and one in particular comes to mind. The project marshalled objects from one tier to another in JSON-format. These objects were unusually large, over 1MB in most cases and overall performance was poor. The interesting thing was that only certain segments of the JSON file were necessary for the various HTTP endpoints to process. Pruning the JSON structure wasn’t an option, so I started to look at optimisation.

The first thing to come to mind was the Large Object Heap in .NET. Any .NET object over 85K in size is considered a large object, and can cause performance issues. Prior to Garbage Collection, all threads apart from the thread that triggered Garbage Collection are suspended to allow the various Generations to be released. Releasing LOH objects can introduce performance bottlenecks. In a nutshell, it takes time to release such objects (170,000 cycles, at least), and can result in excessive Garbage Collection. You may have experienced this before, when for example, IIS inexplicably hangs, though that’s not always as a result of Garbage Collection.

In any case, it occurred to me that every time a large JSON object was returned during a HTTP request, it was cached in a string variable, and from there straight onto the LOH. I mentioned that a great deal of this JSON was superfluous, so I thought about a way to parse the large JSON object and extract the necessary embedded objects. The key to this is to avoid strings, and instead deal with raw bytes through streaming. Streamed objects are processed in small chunks, where each chunk will occupy a small section of memory. Processing the large JSON structure chunk by chunk until I find what I’m looking for, sounds like a good way to avoid the LOH.

My initial post on this provided a nice and simple class to achieve this. Since then, given the potential complexity of large JSON objects, I’ve put together a library to cover more complex scenarios. The library provides the following capabilities:

Parse JSON 7+ times faster than JSON.NET
Serialise JSON several hundred times faster than JSON.NET
Remove specific section(s) from large JSON structures while avoiding the LOH
Provide a simple interface to serialise and deserialise while avoiding reflection

Here are some example scenarios:

Retrieving an Embedded object, or Series of Objects from a Large JSON Structure

Let’s say you have a large JSON object that contains among other things, 1 or more embedded objects of the following structure:

"simpleObject": {
  "name": "Simple Object",
  "count": 1
}

You can retrieve all instances of these objects with the following code:

    byte[] largeJson = GetLargeJson();
    var parser = new JsonObjectParser();

    Json.Parse(parser, new MemoryStream(largeJson), "simpleObject");
    var embeddedObjects = parser.Objects;

We’ve provided a large JSON structure in byte-format, and instructed the API to retrieve all embedded objects called “simpleObject”. The Parse method contains a reference to JsonObjectParser. This is one of two types of parser. Its counterpart is JsonArrayParser. You can alternate between the two depending on the nature of the JSON you’re parsing, whether it be object or array.

Serialising an Object without Reflection

Traditional serialisation techniques that leverage custom Attributes involve Reflection. This comes with a performance overhead. While this may be nominal, we can avoid it altogether to serialise our objects. The API provides a way to achieve this through the IHaveSerialisableProperties interface. The following object extends this interface and explicitly exposes serialisation metadata:

class ComplexArrayObject : IHaveSerialisableProperties {
  public string Name { get; set; }
  public string Description { get; set; }

  public SerialisableProperties GetSerializableProperties() {
    return new SerialisableProperties("complexArrayObject", new List {
      new StringJsonProperty {
        Key = "name",
        Value = Name
      },
      new StringJsonProperty {
        Key = "description",
        Value = Description
      }
    });
  }
}

Now that the class exposes its serialisation metadata, we can serialise it as follows:

var writer = new BinaryWriter(new MemoryStream(), new UTF8Encoding(false));
var serialisableProperties = myObject.GetSerializableProperties();

using (var serialisor = new StandardJsonSerialisationStrategy(writer)) {
  Json.Serialise(serialisor, new JsonPropertiesSerialisor(serialisableProperties));
  return serialisor.SerialisedObject;
}

We use a BinaryWriter to serialise the object, and provide its serialiable properties. We use the StandardJsonSerialisationStrategy class to effect the serialisation. This is a Builder class. Its abstraction allows us to modify the serialisation process if necessary. The returned object is serialised to a byte array. The end result is a serialised object, processed without reflection. There is a full suite of BDD specifications included with the API, including some speed tests. Incidentally, the included serialisation speed-test indicates an overall processing time of at least 20, to several hundred times faster than JSON.NET.

Deserialising an Object without Reflection

The API provides a means to traverse large JSON structures in order to extract embedded objects, very quickly. Once our object(s) are extracted, we likely want to deserialise them. From a practical perspective, I’m assuming that the extracted objects are of a reasonable size. If not, they likely need to be segmented further by extracting their contents. Assuming that the resulting object(s) are ready for serialisation, we are ready to do so. Again, let’s avoid reflection in order to optimise performance. JSON.NET offers an efficient way to do this using the JsonTextReader class. Rather than reinvent the wheel, I’ve wrapped this class in a manner that reads serialised properties into a NameValueCollection, and allows you to extract them to a POCO as follows:

Let’s say we have the following POCO:

class SimpleObject {
  public string Name { get; set; }
  public int Count { get; set; }
}

We create an associated class to facilitate deserialization:

class SimpleObjectDeserialiser : Deserialiser {
  public SimpleObjectDeserialiser(SimpleJSONParser parser) : base(parser) {}

  public override SimpleObject Deserialise() {
    var properties = parser.Parse();

    return new SimpleObject {
      Name = properties.Get("simpleObject.name"),
      Count = Convert.ToInt32(properties.Get("simpleObject.count"))
    };
  }
}

Here, we inherit from an abstraction that parses our JSON object. The parsed results are loaded into a NameValueCollection, in dot-notation format to signify hierarchy. E.g.:

simpleObject.name
simpleObject.count

We override the default Deserialise method and map the parsed properties to our POCO.
Here is an example using the above classes:

    var simpleObjectDeserialiser = new SimpleObjectDeserialiser(new SimpleJSONParser(myJson));
    var simpleObject = Json.Deserialise(simpleObjectDeserialiser);

The end result is a POCO instantiated from our JSON object.

Summary

Sometimes you can’t avoid dealing with large JSON objects – but you can avoid falling prey to memory-related performance issues.

The purpose of this API is to provide a means of segmenting JSON files, and to serialise and deserialise JSON objects in a performance-optimised manner. Leveraging this API, you can

Avoid the Large Object Heap
Avoid reflection
Segment unmanageable JSON into manageable chunks
All at exceptionally fast speeds

“Should I only use the library if I deal with big JSON objects?”

“No! Even on smaller JSON objects, this library is faster than JSON.NET – there’s a performance-win either way. This API deals with raw bytes, and leverages a concept called deferred-execution. I’m happy to talk about these, and other elements of the design, in more detail if you contact me directly.”

For a step-by-step tutorial, check out the next post.

Connect with me:

	Paul Mooney on Load Balancing a RabbitMQ…
	Sachin Parate on Load Balancing a RabbitMQ…
	Paul Mooney on JSON# – Tutorial #4: Deseriali…
	Paul Mooney on Microservices in C# Part 1: Bu…
	archana sharma on Microservices in C# Part 1: Bu…

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

The Problem

The Solution

Share this:

Quality of Service (QOS)

Competing Consumers

Summary

Share this:

Retrieving an Embedded object, or Series of Objects from a Large JSON Structure

Serialising an Object without Reflection

Deserialising an Object without Reflection

Summary

Share this: