Load Balancing 101
Load Balancing is an important part of web architecture: it eases the load on the servers doing the actual work. Here, I'll briefly talk about Load Balancing and the mechanism underneath it.
Intuition Behind Requests and Servers
Suppose you build an app on your local machine that suddenly catches the eye of a friend. They approach you to use the app on their own device. You agree, and soon after setting up an API gateway to your local machine, you give them your API spec. The friend then starts sending requests to your API, which interfaces with the app running on top of the machine (a.k.a. the server).
Very soon, on recommendation, friends of the friend start using the app via API requests. As your app's popularity increases, the server receives hundreds of thousands of requests in a given window of time, which is a very unhealthy amount for a single machine.
You can think of each request to your server as a weight, and the server as a weightlifter 🏋🏻♀️ with a certain limit they can handle. For each request served, another weight gets added to the barbell the weightlifter is lifting.
Naturally, as each new request comes in, another weight lands on the barbell, edging dangerously close to the weightlifter's limit. Once this limit has been breached, the weightlifter (the server) has no choice but to crash. It no longer has the capacity to take in any more requests/weights.
A Quick Fix
What do we do when overwhelmed with requests? Simple, really: we add more servers to distribute the load evenly. Suppose users pay us (the genius app creators) for using our app. With enough cash in our hands, we can afford to buy a few more server machines and deploy our program files to run on them.
Hol Up
Now that we have more than one server, how do we decide which server gets to process an incoming request?
Enter Consistent Hashing
The smart thing to do is to generate a uniformly random ID that's tagged to each incoming request. Let's say a function H(i) maps an incoming ID i to an index between 0 and n-1, where n is the number of servers under your control. When a certain index is picked, the request is rerouted from this magic hashing box to the server at that index.
Let's say we have 4 servers initially. The load balancer uses this number 4 and a modulus-related hash function to generate an index, then relays the incoming request to the server at that index.
When you add another server to the mix, the balancer plugs the number 5 into the same modulus hash to route your app's traffic across all five machines, as in the sketch below.
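To make this concrete, here's a minimal Python sketch of that modulus-based routing (the function and variable names are my own, purely for illustration):

```python
def route(request_id: int, num_servers: int) -> int:
    """Map a request ID to a server index with a simple modulus hash, H(i) = i % n."""
    return request_id % num_servers

# With 4 servers, request 10 is relayed to server index 2...
print(route(10, 4))  # 2

# ...and once a 5th server joins, the same hash sends it to server index 0 instead.
print(route(10, 5))  # 0
```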
UNDER THE HOOD
Say we have 12 incoming requests, with IDs ranging from 0 to 11, spread over 4 servers.
Let's assume the hashing function uses a simple modulus (%) operator to get the routing index.
For the uninitiated, the modulus operation returns the remainder when one number is divided by another, e.g. 5 % 2 = 1.
Let's look at which machine each request is sent to:
Request ID → Machine Index
0 → 0
1 → 1
2 → 2
3 → 3
4 → 0
5 → 1
6 → 2
7 → 3
8 → 0
9 → 1
10 → 2
11 → 3
Each server has 3 requests being sent to it.
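If you'd like to verify the table yourself, a few lines of Python (again, just a sketch with made-up names) reproduce both the routing and the per-server counts:

```python
from collections import Counter

NUM_SERVERS = 4
request_ids = range(12)  # IDs 0 through 11

# Route each request ID to a machine index using the modulus hash.
routing = {rid: rid % NUM_SERVERS for rid in request_ids}
print(routing)
# {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 1, 6: 2, 7: 3, 8: 0, 9: 1, 10: 2, 11: 3}

# Count how many requests land on each machine.
load = Counter(routing.values())
print(load)  # Counter({0: 3, 1: 3, 2: 3, 3: 3}), i.e. 3 requests per server
```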
And just like that, out of a sample of 12 requests, we can see that the load across 4 machines is uniformly (evenly) distributed! When we scale this up to hundreds of requests with a bunch of servers concurrently serving the app, no single machine is carrying anywhere near enough load to crash.
Note: Of course, in real life, consistent hashing is more involved than a plain modulus. A true consistent-hashing scheme places servers on a hash ring, so that adding or removing a server only remaps the requests nearest to it, instead of reshuffling nearly every request the way a modulus does. I've used the simple modulus example above purely for demo purposes.
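For the curious, here's a rough sketch of what a consistent-hashing ring could look like. This is a toy version under simplifying assumptions (MD5 as the hash, no virtual nodes, made-up server names), not a production implementation; the takeaway is that when a new server joins, only the requests closest to it on the ring get remapped:

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    """Hash a key to a point on the ring (MD5 chosen arbitrarily for the demo)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers):
        # Each server sits at a point on the ring determined by its hash.
        self.ring = sorted((ring_hash(s), s) for s in servers)

    def add(self, server):
        bisect.insort(self.ring, (ring_hash(server), server))

    def route(self, request_id) -> str:
        # Walk clockwise from the request's point to the first server point.
        point = ring_hash(str(request_id))
        index = bisect.bisect(self.ring, (point,)) % len(self.ring)
        return self.ring[index][1]

ring = ConsistentHashRing(["server-0", "server-1", "server-2", "server-3"])
before = {rid: ring.route(rid) for rid in range(12)}

ring.add("server-4")
after = {rid: ring.route(rid) for rid in range(12)}

moved = [rid for rid in range(12) if before[rid] != after[rid]]
print("Requests remapped after adding a server:", moved)
```

In practice, real implementations also place each server at many points on the ring (so-called virtual nodes) so the load spreads more evenly, but the core idea is the same.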
In a Nutshell
Load Balancing uniformly relays each incoming request to a server via Consistent Hashing so as to evenly distribute the load across all servers.
If each server has a uniformly random chance of being picked by the hash, then statistically we can expect each server to carry roughly the same load. This way, your servers won't suddenly crash just because you went viral (congrats if you have, enjoy it!).
So, why do we want to add load balancers to our infrastructure? For a number of reasons really:
Servers can move data more efficiently → as an analogy, try opening Photoshop, Minecraft, and Google Chrome all on one laptop. Now try opening each app on 3 separate devices. Notice the difference?
Optimises the use of app delivery resources → important for widely-used CDNs
Load Balancers conduct continuous health checks on your system → they raise the alarm if servers are failing
Generally improves app reliability and architecture availability → may heavily improve UX
Thanks for reading!