Tag Archives: qos

Microservices in C# Part 4: Scaling Out

Fork me on GitHub

Scaling Out

Scaling out our Microservices

So far, we have

  • established a simple Microservice
  • abstracted and sufficiently covered the Microservice core logic in terms of tests
  • created a reusable Microservice template
  • implemented the queue-pooling concept to ensure reliable message delivery
  • run simple load tests to adequately size Queue resources

Now it’s time to scale out. Here’s how our design currently looks:

Our current design

Our current design

This design is fine for demonstration purposes, but requires augmentation to facilitate production release. Consider that the current design will only service a single request at any given time, and will service requests in a FIFO manner, assuming that no hardware failure, or otherwise, occurs.

Even under ideal conditions, assuming that each request takes exactly 1 second to complete, given 100 inbound HTTP requests, the 1st request will complete in 1 second. The final, 100th request, will complete in 100 seconds.

Clearly, this is less than ideal. Intuitively, we might consider optimising the processing speed of our Microservice. Certainly this will help, but does little to solve the problem. Let’s say that our engineers work tirelessly to cut response times in half:

Working tirelessly to shatter response-times!

Working tirelessly to shatter response-times!

Even if they achieve this, in a batch of 100 requests, the 100th request will still take 50 seconds to complete. Instead, let’s focus on serving multiple requests in a concurrent, and potentially parallel manner. Our augmented design will be as follows:

Augmented design

Augmented design

Notice that instead of a single instance of SimpleMathMicroservice, there are now multiple instances running. How many instances do we need? That depends on 2 factors – response times and something called Quality-of-Service (QOS).

Quality of Service

Quality of Service is a feature of AMQP that defines the level of service exhibited by AMQP Channels at any given time. QOS is expressed as a percentage; 100% suggests that any given channel is utilised to maximum effect. Essentially, we need to avoid downtime in terms of channel-usage. Downtime can be described as the period of time that a Microservice is idle, or not doing work.

Typically, such scenarios occur when a Microservice is waiting on messages in transit, or is itself transmitting message-receipt acknowledgements to the Message Bus. For more information on QOS, please refer to this post. For the moment, we’re going to begin with the most intuitive design possible, without delving deeply into the complexities of QOS, and related concepts such as prefetch-count.

To that end, we are going to deploy multiple instances of our SimpleMathMicroservice (10, to be exact), and retain the default message-delivery mechanism – to read each message from a Queue one-at-a-time. In order to achieve this, we must modify our application slightly, specifically, the Global.asax.cs file. First, add a simple collection to house multiple running SimpleMathMicroservice instances:

private readonly List<SimpleMathMicroservice> _simpleMathMicroservices = new List<SimpleMathMicroservice>();

Now, instantiate 10 unique instances of SimpleMathMicroservice, initialise each instance, and add it to the collection:

            for (var i = 0; i < 10; i++) {
                var simpleMathMicroservice = new SimpleMathMicroservice();


Finally, modify the Application_End function such that it gracefully shuts down each SimpleMathMicroservice instance:

            foreach (var simpleMathMicroservice in _simpleMathMicroservices) {

Now, on startup, 10 instances of SimpleMathMicroservice will be invoked, and will each actively listen to the Math Queue.

Message Distribution

SimpleMathMicroservice leverages a component called AMQPConsumer within the Daishi.AMQP library that defines the manner in which SimpleMathMicroservice will read messages from any given Queue. AMQPConsumer exposes a constructor that accepts a value called prefetchCount:

        protected AMQPConsumer(string queueName, int timeout, ushort prefetchCount = 1, bool noAck = false,
            bool createQueue = true, bool implicitAck = true, IDictionary<string, object> queueArgs = null) {
            this.queueName = queueName;
            this.prefetchCount = prefetchCount;
            this.noAck = noAck;
            this.createQueue = createQueue;
            this.timeout = timeout;
            this.implicitAck = implicitAck;
            this.queueArgs = queueArgs;

Notice the default prefetchCount value of 1. This default setting results behaviour that allows the component to read messages one-at-a-time. It also ensures that RabbitMQ will distribute messages evenly, in a round-robin manner, among consumers. Now our application is configured to process multiple requests in a concurrent manner.

Concurrency and Parallelism

Can our application now be described a parallel? That depends. Concurrency is essentially the act of performing multiple tasks on a single CPU, or core. Parallelism on the other hand, can be described as the act of performing multiple tasks, or multiple stages of a single task, across multiple cores.

By this definition, or application certainly operates in a concurrent manner. But does it also operate in a parallel manner? That depends. Running the application on a single core machine obviously prohibits parallelism. Running on multiple cores will very likely result in parallel processing. Under the hood, the Daishi.AMQP library invokes a new thread for each Microservice operation that consumes messages from a Queue:

        public void ConsumeAsync(AMQPConsumer consumer) {
            if (!IsConnected) Connect();

            var thread = new Thread(o => consumer.Start(this));

            while (!thread.IsAlive)

“Wait, you shouldn’t invoke threads manually! That’s what ThreadPool.QueueUserWorkItem() is for!”

ThreadPool.QueueUserWorkItem() invokes threads as background operations. We require foreground threads, to ensure that the OS provides enough resources to run sufficiently, and also to prevent the OS from pre-empting the thread altogether, in cases when heavy load reduces resource availability.

Assuming that batches of newly created threads run (or are context-switched) across multiple cores, one could argue that our application exhibits parallel processing behaviour.

Run an ApacheBench load test against the running application:

ab -n 10000 -c 10 http://localhost:46653/api/math/1500

While the test is running, refer to the Math Queue in the RabbitMQ Administrator interface:


Notice the number of Consumers (10) and the Consumer Utilisation figure. This figure represents the QOS value associated with the Queue. It should settle at the 100% mark for the duration of the test, indicating that each of all 10 SimpleMathMicroservice instances are constantly busy, and not idle:

Quality of Service

Quality of Service

Next Steps

Modify the number of running SimpleMathMicroservice instances, and apply load tests to each setting. Ideally, push the number of running instances upwards in reasonable increments (batches of 5-10) and observe the response times, comparing each run against the last.

Response times should improve incrementally, then plateau, and ultimately decrease as you increase the number of running instances. This is an indication that your application has reach critical mass, based on the law of diminishing returns. Doing this will yield the number of SimpleMathMicroservice instances that you should deploy in order to achieve optimal throughput.

Connect with me:


Microservices in C# Part 1: Building and Testing

Fork me on GitHub

Microservice Architecture

Microservice Architecture


I’m often asked how to design and build a Microservice framework. It’s a tricky concept, considering loose level of coupling between Microservices. Consider the following scenario outlining a simple business process that consists of 3 sub-processes, each managed by a separate Microservice:

Microservice Architecture core concept

Microservice Architecture core concept

In order to test this process, it would seem that we need a Service Bus, or at the very least, a Service Bus mock in order to link the Microservices. How else will the Microservices communicate? Without a Service Bus, each Microservice is effectively offline, and cannot communicate with any component outside its own context. Let’s examine that concept a little bit further…maybe there is a way that we can establish at least some level of testing without a Service Bus.

A Practical Example

First, let’s expand on the previous tutorial and actually implement a simple Microservice-based application. The application will have 2 primary features:

  • Take an integer-based input and double its value
  • Take a text-based input and reverse it

Both features will be exposed through Microservices. Our first task is to define the Microservice itself. At a fundamental level, Microservices consist of a simple set of functionality:

Anatomy of a Microservice

Anatomy of a Microservice

Consider each Microservice daemon in the above diagram. Essentially, each they consist of

  • a Message Dispatcher (publishes messages to a service bus
  • an Event Listener (receives messages from a service bus
  • Proprietary business logic (to handle inbound and outbound messages)

Let’s design a simple contract that encapsulates this:

    internal interface Microservice {
        void Init();
        void OnMessageReceived(object sender, MessageReceivedEventArgs e);
        void Shutdown();

Each Microservice implementation should expose this functionality. Let’s start with a simple math-based Microservice:

    public class SimpleMathMicroservice : Microservice {
        private RabbitMQAdapter _adapter;
        private RabbitMQConsumerCatchAll _rabbitMQConsumerCatchAll;

        public void Init() {
            _adapter = RabbitMQAdapter.Instance;
            _adapter.Init("localhost", 5672, "guest", "guest", 50);

            _rabbitMQConsumerCatchAll = new RabbitMQConsumerCatchAll("Math", 10);
            _rabbitMQConsumerCatchAll.MessageReceived += OnMessageReceived;


        public void OnMessageReceived(object sender, MessageReceivedEventArgs e) {
            var input = Convert.ToInt32(e.Message);
            var result = Functions.Double(input);

            _adapter.Publish(result.ToString(), "MathResponse");

        public void Shutdown() {
            if (_adapter == null) return;

            if (_rabbitMQConsumerCatchAll != null) {


Functionality in Detail


Establishes a connection to RabbitMQ. Remember, as per the previous tutorial, we need only a single connection. This connection is a TCP pipeline designed to funnel all communications to RabbitMQ from the Microservice, and back. Notice the RabbitMQConsumerCatchAll implementation. Here we’ve decided that in the event of an exception occurring, our Microservice will catch each exception and deal with it accordingly. Alternatively, we could have implemented RabbitMQConsumerCatchOne, which would cause the Microservice to disengage from the RabbitMQ Queue that it is listening to (essentially a Circuit Breaker, which I’ll talk about in a future post). In this instance, the Microservice is listening to a Queue called “Math”, to which messages will be published from external sources.


Our core business logic, in this case, multiplying an integer by 2, is implemented here. Once the calculation is complete, the result is dispatched to a Queue called “MathResponse”.


Gracefully closes the underlying connection to RabbitMQ.

Unit Testing

There are several moving parts here. How do we test this? Let’s extract the business logic from the Microservice. Surely testing this separately from the application will result in a degree of confidence in the inner workings of our Microservice. It’s a good place to start.
Here is the core functionality in our SimpleMathMicroservice (Functions class in Daishi.Math):

        public static int Double(int input) {
            return input * 2;

Introducing a Unit Test as follows ensures that our logic behaves as designed:

    internal class MathTests {
        public void InputIsDoubled() {
            const int input = 5;
            var output = Functions.Double(input);

            Assert.AreEqual(10, output);

Now our underlying application logic is sufficiently covered from a Unit Testing perspective. Let’s focus on the application once again.


How will users leverage our application? Do they interface with the Microservice framework through a UI? No, like all web applications, we must provide an API layer, exposing a HTTP channel through which users interact with our application. Let’s create a new ASP.NET application and edit Global.asax as follows:

            # region Microservice Init
            _simpleMathMicroservice = new SimpleMathMicroservice();


            #region RabbitMQAdapter Init

            RabbitMQAdapter.Instance.Init("localhost", 5672, "guest", "guest", 100);


We’re going to run an instance of our SimpleMathMicroservice alongside our ASP.NET application. This is fine for the purpose of demonstration, however each Microservice should run in its own context, as a daemon (*.exe in Windows) in a production equivalent. In the above code snippet, we initialise our SimpleMathMicroservice and also establish a separate connection to RabbitMQ to allow the ASP.NET application to publish and receive messages. Essentially, our SimpleMathService instance will run silently, listening for incoming messages on the “Math” Queue. Our adjacent ASP.NET application will publish messages to the “Math” Queue, and attempt to retrieve responses from SimpleMathService by listening to the “MathResponse” Queue. Let’s implement a new ASP.NET Controller class to achieve this:

    public class MathController : ApiController {
        public string Get(int id) {
            RabbitMQAdapter.Instance.Publish(id.ToString(), "Math");

            string message;
            BasicDeliverEventArgs args;
            var responded = RabbitMQAdapter.Instance.TryGetNextMessage("MathResponse", out message, out args, 5000);

            if (responded) {
                return message;
            throw new HttpResponseException(HttpStatusCode.BadGateway);

Navigating to http://localhost:{port}/api/math/100 will initiate the following process flow:

  • ASP.NET application publishes the integer value 100 to the “Math” Queue
  • ASP.NET application immediately polls the “MathResponse” Queue, awaiting a response from SimpleMathMicroservice
  • SimpleMathMicroservice receives the message, and invokes Math.Functions.Double on the integer value 100
  • SimpleMathService publishes the result to “MathResponse”
  • ASP.NET application receives and returns the response to the browser.


In this example, we have provided a HTTP endpoint to access our SimpleMathMicroservice, and have abstracted SimpleMathMicroservice’ core logic, and applied Unit Testing to achieve sufficient coverage. This is an entry-level requirement in terms of building Microservices. The next step, which I will cover in Part 2, focuses on ensuring reliable message delivery.

Connect with me:


RabbitMQ QOS vs. Competing Consumers

I recently answered a question on stackoverflow that revolved around this argument:
“Should I scale out my AMQP consumers, tweak the QOS values, or both?”
This is a broad question. The first thing to consider is the actual business process that facilitated by AMQP. Is your business process time-sensitive? For example, let’s say that you have a process which persist data to a database. How important from a business perspective is it that this data is saved immediately?

Quality of Service (QOS)

AMQP Channels contain a property called QOS. The value of this property determines the manner in which AMQP Consumers read messages from Queues. A QOS value of “1” ensures that only a single message at a time will be de-queued. The next message will not be processed until the current message has been handled. Consider the implications of this. Given a Queue that contains 5 messages, and a Consumer, with QOS set to “1”, that reads from that Queue, message #5 will not be processed until messages 1 – 4 have been de-queued and processed:

AMQP Queue with 5 Messages

AMQP Queue with 5 Messages

If it takes 200ms to process each message, then M5 will not be completely processed for 1 second. This figure will rise exponentially as the Queue grows in length. If a Publisher, or set of Publishers suddenly load 1,000 messages onto the Queue, our Consumer’s processing time now becomes minutes, let alone seconds.

Single AMQP Consumer

Single AMQP Consumer

So let’s increase our QOS value to “5”. Now our Consumer reads from the Queue at a rate of 5 messages at a time. Our Queue is now empty. But what’s happening at the Consumer? The Consumer manages its own Shared Queue in memory. This Shared Queue is a local representation of an AMQP Queue. When a Consumer de-queues a message, that message is cached in the Consumer’s Shared Queue, and processed accordingly.

Based on the above example, our Consumer’s Shared Queue now contains messages 1 – 5, and our actual AMQP Queue is empty. This offers a win; remember that our goal with RabbitMQ, and AMQP in general, is to keep our Queues empty, or near as possible, at any given time to reduce resource consumption. In this case, our Queue is empty.

Well, we’ve reduced RabbitMQ’s resource-footprint, but what’s the catch? Well, we’re still faced with the same problem, which is that our messages are processed sequentially, so we haven’t reduced the overall time that it takes to process the messages – we’ve just moved the problem from one context to another.

Competing Consumers

If we could process each message in parallel, then we could reduce or processing time. We can achieve this by adding multiple Consumers. Each message takes 200ms for a single Consumer to process, so 5 Consumers can process 5 messages in 200ms.

Multiple AMQP Consumers

Multiple AMQP Consumers

Actually, this won’t happen, or at least is not likely to happen, because we’ve left our QOS-value at “5”, so let’s follow the process flow, assuming that we’ve started each Consumer in order of 1 through 5. This is important because in situations where multiple Consumers are listening to a Queue, RabbitMQ will prioritise delivery of messages in the same order that the Consumers started listening. If Consumer #1 was the first Consumer to start, it will be the first to receive messages.

This is the problem, in our case. Remember, a QOS value of “5” means that the Consumer will read 5 messages at a time. So in this case, assuming that our Queue contains 5 messages, Consumer #1 will read all 5 messages, and process them sequentially. Consumer’s 2 – 5 will remain idle and wasted.

Our problem is now that our Consumers are not adequately balanced. To facilitate accurate round-robin style balancing among Consumers, we simply set the QOS value of each Consumer to “1”. This results in the following behaviour:

  • Each Consumer reads 1 message at a time
  • Older Consumers will receive messages first, assuming that they are not busy


We’ve reduced our overall processing time to 200MS. We can now process 5 messages in 200MS. So why not do this all the time? There are 2 answers to this. The first concerns the technical aspect of the design. Consider that running multiple Consumers costs resources, especially memory. Just remember this before you think about creating thousands of Consumers. The above problem is simplistic, but this design may not suit your infrastructure for real-world problems.

The second answer concerns the business needs of the actual process that we are managing. Let’s say that our process is a data-logging mechanism, and that our messages contain logging metadata. In this case, we may not care whether or not it takes several minutes to save this data. Logging data is rarely referenced until something goes wrong, so our initial single Consumer solution may be adequate. Now consider a business process that persists time-sensitive metadata to a database. In this case, our multiple Consumer solution may better suit our needs.

What if 1 second is an acceptable processing time? Then we can use a combination of both solutions. We can introduce multiple Consumers, and set their QOS values to “5”. The result is that no Consumer will ever process more than 5 messages at a time, and therefore won’t take more than 1 second to process. Assuming that we introduce 5 Consumers, we can process 25 messages per second. Why not just use 5 Consumers with a QOS value of “1”? Won’t this achieve the same result – 25 messages per second? Yes, or at least close enough, but likely not exactly, because there is more network traffic involved, as we’re now reading 25 messages individually as opposed to 25 messages in batches of 5.

It is critical to take into account the underlying business process when considering an AMQP solution. Generally, a one-size-fits-all approach for all business processes is too broad an approach.

Connect with me: