We often come across solutions that are not scalable by nature. For example, processing might depend on third-party tools installed on a server that require licensing or some manual intervention to set up.
With limitations like these, auto-scaling becomes extremely difficult (and in most cases the effort required to make the solution auto-scalable is simply not worth it).
So how do you scale your infrastructure in such cases?
There are multiple scenarios that need to be taken care of when designing an efficient solution for this problem:
1. Office hours: During office hours you need one or more basic instances in place to cover the usual load.
a. Start Primary Machine – For our example, and to keep things simple, we refer to our servers as primary and secondary (of course there can be multiple secondary servers). At a predefined time you switch the primary server on, usually when your office, or your client’s office, starts. You might also keep this instance running 24×7, depending on how your clients use your product.
b. Stop Primary Machine – As soon as office hours end, we check whether any tasks are still waiting to be processed; if there are, we leave the server running, otherwise we switch it off. Again, the primary server can be kept running 24×7, depending on how you and your customers use your product.
2. Extra load during office hours: An API constantly monitors the number of tasks that are waiting to be processed or are currently being processed. You set a threshold based on the type of processing, its frequency, and the time it takes. Let us assume our threshold is 5 tasks: as soon as we reach 5 tasks (waiting + processing), we start the secondary server. As soon as the load drops below the threshold, we switch the secondary server off.
3. Extra load during non-office hours: In our case, neither the primary nor the secondary server runs during non-office hours if there is no load. If the listener service detects waiting tasks, it switches the primary on first; when the number of tasks crosses the threshold, the secondary server is started as well.
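The three scenarios above boil down to one decision rule. Here is a minimal sketch of that rule as a pure function; the function and field names are illustrative, not our production code:

```javascript
// Decide which scaling action to take, given the current queue and
// server state. Mirrors the rules above: secondary on/off around the
// threshold, primary off only outside office hours with an empty queue.
function decideAction({ waiting, processing, threshold, officeHours, primaryOn, secondaryOn }) {
  const load = waiting + processing;
  // Extra load: bring the primary up first, then the secondary.
  if (load >= threshold) {
    if (!primaryOn) return 'START_PRIMARY';
    if (!secondaryOn) return 'START_SECONDARY';
    return 'NONE';
  }
  // Below the threshold the secondary is never needed.
  if (secondaryOn) return 'STOP_SECONDARY';
  // Outside office hours with nothing queued, the primary can stop too.
  if (!officeHours && load === 0 && primaryOn) return 'STOP_PRIMARY';
  // During office hours the primary covers the usual load.
  if (officeHours && !primaryOn) return 'START_PRIMARY';
  return 'NONE';
}
```

In the real application this function would be called by the listener service, with the resulting action dispatched to the corresponding Lambda.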
We used Node.js with AWS SAM to build a serverless application for this purpose.
We defined four Lambda functions:
- Start Primary Server
- Stop Primary Server
- Scale Up
- Scale Down
We used instance tags to pick the primary and secondary instances dynamically rather than hardcoding instance IDs in the code (a practice that should be avoided anyway).
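As an illustration of the tag-based lookup, here is a sketch that selects an instance ID by tag from an EC2 `DescribeInstances`-shaped response; the `Role` tag name is an assumption for this example:

```javascript
// Find the first instance whose "Role" tag matches the requested role
// (e.g. "primary" or "secondary") in a DescribeInstances-style result.
function findInstanceByRole(describeResult, role) {
  for (const reservation of describeResult.Reservations) {
    for (const instance of reservation.Instances) {
      const tags = instance.Tags || [];
      if (tags.some((t) => t.Key === 'Role' && t.Value === role)) {
        return instance.InstanceId;
      }
    }
  }
  return null;
}
```

In the real Lambda you could skip the client-side loop entirely by passing `Filters: [{ Name: 'tag:Role', Values: [role] }]` to `describeInstances` and letting EC2 filter server-side.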
The Lambda functions were connected to EventBridge to schedule these services. Note that we did not use any API Gateway; the Lambdas are invoked directly by EventBridge.
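The wiring looks roughly like this in the SAM template; the function name, handler, and cron expression (09:00 UTC on weekdays) are placeholders, not our actual schedule:

```yaml
# Illustrative SAM snippet: the function is triggered directly by an
# EventBridge schedule -- no API Gateway in between.
StartPrimaryFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: startPrimary.handler
    Runtime: nodejs18.x
    Events:
      OfficeHoursStart:
        Type: Schedule
        Properties:
          Schedule: cron(0 9 ? * MON-FRI *)
```

Each of the four functions gets its own event: the start/stop functions on fixed cron schedules, and the scale up/down functions on a short rate-based schedule that checks the queue.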
Let’s do a cost analysis of how much money can be saved with this. Assume both the primary and the secondary use the same machine type, costing $x per hour.
Earlier costs on a monthly basis:
Assuming a t3.xlarge instance costing $0.24 per hour, we are looking at:
0.24 × 24 × 30 × 2 ≈ $346 per month. We are assuming only 2 machines for now; at real scale and load you will easily be looking at 5–10× this number.
Case where the primary is kept on 24×7 and the secondary runs on load:
0.24 × 24 × 30 × 1 + 0.24 × 24 × 30 × 0.25 = 172.8 + 43.2 ≈ $216
Case where the primary is kept on only during office hours (roughly 29% of the month) and the secondary runs on load:
0.24 × 24 × 30 × 0.29 + 0.24 × 24 × 30 × 0.25 = 50.1 + 43.2 ≈ $93
We have been conservative with the secondary’s calculation, assuming it runs for almost all of the working hours.
The amount saved is close to 73%, and the more instances you have, the greater your savings.
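The arithmetic above can be packed into a small helper; the rate and utilisation fractions mirror the example ($0.24/h, secondary busy ~25% of the month, primary up ~29% of the month when restricted to office hours):

```javascript
// Monthly cost of one instance at a given hourly rate and utilisation.
const HOURS_PER_MONTH = 24 * 30;

function monthlyCost(ratePerHour, utilisation) {
  return ratePerHour * HOURS_PER_MONTH * utilisation;
}

// The three scenarios from the cost analysis above.
const alwaysOn   = monthlyCost(0.24, 1) * 2;                        // both machines 24x7
const primary247 = monthlyCost(0.24, 1) + monthlyCost(0.24, 0.25);  // primary 24x7, secondary on load
const officeOnly = monthlyCost(0.24, 0.29) + monthlyCost(0.24, 0.25); // primary office hours only

const savings = 1 - officeOnly / alwaysOn; // fraction saved vs. always-on
```

Swapping in your own hourly rate and utilisation fractions gives the equivalent numbers for your fleet.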
This is a very common scenario: companies end up paying huge amounts to their cloud service providers, or keep wasting time on vertical scaling based on load.
If you have a similar problem, get in touch with us.