Monthly Archives: August 2015

Microservices in C# Part 4: Scaling Out

Quality of Service

Quality of Service is a feature of AMQP that defines the level of service exhibited by AMQP Channels at any given time. QOS is expressed as a percentage; 100% suggests that any given channel is utilised to maximum effect. Essentially, we need to avoid downtime in terms of channel-usage. Downtime can be described as the period of time that a Microservice is idle, or not doing work.

Typically, such scenarios occur when a Microservice is waiting on messages in transit, or is itself transmitting message-receipt acknowledgements to the Message Bus. For more information on QOS, please refer to this post. For the moment, we’re going to begin with the most intuitive design possible, without delving deeply into the complexities of QOS, and related concepts such as prefetch-count.

To that end, we are going to deploy multiple instances of our SimpleMathMicroservice (10, to be exact), and retain the default message-delivery mechanism – to read each message from a Queue one-at-a-time. In order to achieve this, we must modify our application slightly, specifically, the Global.asax.cs file. First, add a simple collection to house multiple running SimpleMathMicroservice instances:

private readonly List<SimpleMathMicroservice> _simpleMathMicroservices = new List<SimpleMathMicroservice>();

Now, instantiate 10 unique instances of SimpleMathMicroservice, initialise each instance, and add it to the collection:

            for (var i = 0; i < 10; i++) {
                var simpleMathMicroservice = new SimpleMathMicroservice();
                _simpleMathMicroservices.Add(simpleMathMicroservice);

                simpleMathMicroservice.Init();
            }

Finally, modify the Application_End function such that it gracefully shuts down each SimpleMathMicroservice instance:

            foreach (var simpleMathMicroservice in _simpleMathMicroservices) {
                simpleMathMicroservice.Shutdown();
            }

Now, on startup, 10 instances of SimpleMathMicroservice will be invoked, and will each actively listen to the Math Queue.

Message Distribution

SimpleMathMicroservice leverages a component called AMQPConsumer within the Daishi.AMQP library that defines the manner in which SimpleMathMicroservice will read messages from any given Queue. AMQPConsumer exposes a constructor that accepts a value called prefetchCount:

        protected AMQPConsumer(string queueName, int timeout, ushort prefetchCount = 1, bool noAck = false,
            bool createQueue = true, bool implicitAck = true, IDictionary<string, object> queueArgs = null) {
            this.queueName = queueName;
            this.prefetchCount = prefetchCount;
            this.noAck = noAck;
            this.createQueue = createQueue;
            this.timeout = timeout;
            this.implicitAck = implicitAck;
            this.queueArgs = queueArgs;
        }

Notice the default prefetchCount value of 1. This default setting results behaviour that allows the component to read messages one-at-a-time. It also ensures that RabbitMQ will distribute messages evenly, in a round-robin manner, among consumers. Now our application is configured to process multiple requests in a concurrent manner.

Concurrency and Parallelism

Can our application now be described a parallel? That depends. Concurrency is essentially the act of performing multiple tasks on a single CPU, or core. Parallelism on the other hand, can be described as the act of performing multiple tasks, or multiple stages of a single task, across multiple cores.

By this definition, or application certainly operates in a concurrent manner. But does it also operate in a parallel manner? That depends. Running the application on a single core machine obviously prohibits parallelism. Running on multiple cores will very likely result in parallel processing. Under the hood, the Daishi.AMQP library invokes a new thread for each Microservice operation that consumes messages from a Queue:

        public void ConsumeAsync(AMQPConsumer consumer) {
            if (!IsConnected) Connect();

            var thread = new Thread(o => consumer.Start(this));
            thread.Start();

            while (!thread.IsAlive)
                Thread.Sleep(1);
        }

“Wait, you shouldn’t invoke threads manually! That’s what ThreadPool.QueueUserWorkItem() is for!”

ThreadPool.QueueUserWorkItem() invokes threads as background operations. We require foreground threads, to ensure that the OS provides enough resources to run sufficiently, and also to prevent the OS from pre-empting the thread altogether, in cases when heavy load reduces resource availability.

Assuming that batches of newly created threads run (or are context-switched) across multiple cores, one could argue that our application exhibits parallel processing behaviour.

Run an ApacheBench load test against the running application:

ab -n 10000 -c 10 http://localhost:46653/api/math/1500

While the test is running, refer to the Math Queue in the RabbitMQ Administrator interface:

http://localhost:15672/#/queues/%2F/Math

Notice the number of Consumers (10) and the Consumer Utilisation figure. This figure represents the QOS value associated with the Queue. It should settle at the 100% mark for the duration of the test, indicating that each of all 10 SimpleMathMicroservice instances are constantly busy, and not idle:

Quality of Service

Next Steps

Modify the number of running SimpleMathMicroservice instances, and apply load tests to each setting. Ideally, push the number of running instances upwards in reasonable increments (batches of 5-10) and observe the response times, comparing each run against the last.

Response times should improve incrementally, then plateau, and ultimately decrease as you increase the number of running instances. This is an indication that your application has reach critical mass, based on the law of diminishing returns. Doing this will yield the number of SimpleMathMicroservice instances that you should deploy in order to achieve optimal throughput.

Connect with me:

Microservices in C# Part 3: Queue Pool Sizing

Exhaustion is Expensive

QueuePool exhaustion is something that we need to avoid. If our application frequently consumes all available Queues then the QueuePool will become ineffective. Let’s look at how we go about avoiding this scenario.

First, we need some targets. We need to know how much traffic our application will absorb in order to adequately size our resources. For argument’s sake, let’s assume that our MathController will be subjected to 100,000 inbound HTTP requests, delivered in batches of 10. In other words, at any given time, MathController will service 10 simultaneous requests, and will continue doing so until 100,000 requests have been served.

Stress Testing Using Apache Bench

Apache Bench is a very simple, lightweight tool designed to test web-based applications, and is bundled as part of the Apache Framework. Click here for simple download instructions. Assuming that our application runs on port 46653, here is the appropriate Apache Bench command to invoke 100 MathController HTTP requests in batches of 10:

-ab -n 100 -c 10 http://localhost:46653/api/math/150

Notice the “n” and “c” paramters; “n” refers to “number”, as in the number of requests, and “c” refers to “concurrency”, or the amount of requests to run in simultanously. Running this command will yield something along the lines of the following:
Benchmarking localhost (be patient).....done

Server Software: Microsoft-IIS/10.0
Server Hostname: localhost
Server Port: 46653

Document Path: /api/math/150
Document Length: 5 bytes

Concurrency Level: 10
Time taken for tests: 7.537 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 39500 bytes
HTML transferred: 500 bytes
Requests per second: 13.27 [#/sec] (mean)
Time per request: 753.675 [ms] (mean)
Time per request: 75.368 [ms] (mean, across all concurrent requests)
Transfer rate: 5.12 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.4 0 1
Processing: 41 751 992.5 67 3063
Waiting: 41 751 992.5 67 3063
Total: 42 752 992.4 67 3063

Percentage of the requests served within a certain time (ms)
50% 67
66% 1024
75% 1091
80% 1992
90% 2140
95% 3058
98% 3061
99% 3063
100% 3063 (longest request)

Adjusting QueuePool for Optimal Results

Those results don’t look great. Incidentally, if you would like more information as regards how to interpret Apache Bench results, click here. Let’s focus on the final section, “Percentage of the requests served within a certain time (ms)”. Here we see that 75% of all requests took just over 1 second (1091 ms) to complete. 10% took over 2 seconds, and 5% took over 3 seconds to complete. That’s quite a long time for such a simple operation running on a local server. Let’s run the same command again:
Benchmarking localhost (be patient).....done

Server Software: Microsoft-IIS/10.0
Server Hostname: localhost
Server Port: 46653

Document Path: /api/math/100
Document Length: 5 bytes

Concurrency Level: 10
Time taken for tests: 0.562 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 39500 bytes
HTML transferred: 500 bytes
Requests per second: 177.94 [#/sec] (mean)
Time per request: 56.200 [ms] (mean)
Time per request: 5.620 [ms] (mean, across all concurrent requests)
Transfer rate: 68.64 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.4 0 1
Processing: 29 54 11.9 49 101
Waiting: 29 53 11.9 49 101
Total: 29 54 11.9 49 101

Percentage of the requests served within a certain time (ms)
50% 49
66% 54
75% 57
80% 60
90% 73
95% 80
98% 94
99% 101
100% 101 (longest request)

OK. Those results look a lot better. Even the longest request took 101 ms, and 80% of all requests completed in <= 60 ms.

But where does this discrepancy come from? Remember, that on start-up there are no QueuePool Queues. The QueuePool is empty and does not have any resources to distribute. Therefore, inbound requests force QueuePool to create a new Queue in order to facilitate the request, and then reclaim that Queue when the request has completed.

Does this mean that when I deploy my application, the first batch of requests are going to run much more slowly than subsequent requests?

No, that’s where sizing comes in. As with all performance testing, the objective is to set a benchmark in terms of the expected volume that an application will absorb, and to determine that maximum impact that it can withstand, in terms of traffic. In order to sufficiently bootstrap QueuePool, so that it contains an adequate number of dispensable Queues, we can simply include ASP.NET controllers that leverage QueuePool in our performance run.

Suppose that we expect to handle 100 concurrent users over extended periods of time. Let’s run an Apache Bench command again, setting the level of concurrency to 100, with a suitably high number of requests in order to sustain that volume over a reasonably long period of time:

ab -n 1000 -c 100 http://localhost:46653/api/math/100

Percentage of the requests served within a certain time (ms) 50% 861 66% 938 75% 9560 80% 20802 90% 32949 95% 34748 98% 39756 99% 41071 100% 42163 (longest request)
Again, very poor, but expected results. More interesting is the number of Queues now active in RabbitMQ:

New QueuePool Queues

In my own environment, QueuePool created 100 Queues in order to facilitate all inbound requests. Let’s run the test again, and consider the results:
Percentage of the requests served within a certain time (ms) 50% 497 66% 540 75% 575 80% 591 90% 663 95% 689 98% 767 99% 816 100% 894 (longest request)
These results are much more respectable. Again, the discrepancy between performance runs is due to the fact that QueuePool was not adequately initialised during the first run. However, QueuePool was initialised with 100 Queues, a volume sufficient to facilitate the volume of request that the application is expected to serve. This is simple an example as possible.

Real world performance testing entails a lot more than simply executing isolated commands against single endpoints, however the principal remains the same. We have effectively determined the optimal size necessary for QueuePool to operate efficiently, and can now size it accordingly on application start-up, ensuring that all inbound requests are served quickly and without bias.

Those already versed in the area of Microservices might object at this point. There is only a single instance of our Microservice, SimpleMathMicroservice, running. One of the fundamental concepts behind Microservice design is scalability. In my next article, I’ll cover scaling, and we’ll drive those performance response times into the floor.

Connect with me:

	Paul Mooney on Load Balancing a RabbitMQ…
	Sachin Parate on Load Balancing a RabbitMQ…
	Paul Mooney on JSON# – Tutorial #4: Deseriali…
	Paul Mooney on Microservices in C# Part 1: Bu…
	archana sharma on Microservices in C# Part 1: Bu…

Quality of Service

Message Distribution

Concurrency and Parallelism

Next Steps

Share this:

Exhaustion is Expensive

Stress Testing Using Apache Bench

Adjusting QueuePool for Optimal Results

Share this: