999 What’s Your Emergency?
In today’s post we are going to look at hardware and software uptime, the guarantees that companies make about uptime and how you should choose what is right for your business.
What is in a %?
You may be wondering what our post title is about. Well, a lot of systems quote uptime targets as 99.9%. Over the last 10 years expectations have risen massively, from 90%, to 99%, to the widely used current expectation of 99.9%. What people rarely stop to look at is what that means in terms of time, probably because outside of tech 90% would be great for most things.
|Uptime |Downtime / Year (hours)
|99.99% |0.876 (Approx. 53 mins)
|99.999% |0.0876 (Approx. 5 mins)
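As a rough sketch of where these figures come from, the permitted downtime is simply the unavailable fraction of a year. The function name `downtime_hours_per_year` is our own for illustration, and the calculation assumes a 365-day year (8,760 hours).

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (an assumption)


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Return the hours per year a service may be down at a given uptime target."""
    return HOURS_PER_YEAR * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # Print the downtime allowance for each of the common targets.
    for target in (90.0, 99.0, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(target)
        print(f"{target}% uptime allows {hours:.4f} hours ({hours * 60:.1f} minutes) of downtime a year")
```

For the widely used 99.9% target this works out at about 8.76 hours a year, and each extra nine divides the allowance by ten.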
Apart from the massive difference in potential downtime that the different percentages represent, you might be thinking one of two things. Either "surely no one offers 99.999% target uptimes" or, conversely, "we have a 100% uptime guarantee on our software, so why even consider less?"
The Five Nines of Uptime
99.999%, often called five nines, is considered to be the gold standard of uptime targets. It is very achievable, to the extent that people are starting to talk about six nines targets. To achieve this, you need:
- The best equipment that is designed specifically to be robust and have the ability to quickly swap out replacement components.
- Multiple servers in separate locations (often different countries) with load balancing, failover routines and redundancy.
- Backup power supplies and connections to the internet.
- Replacement parts at every site.
- Automatic monitoring.
- 24/7 tech support. At this level it needs to be better than on call; they need to be on site.
- Maintenance programs for both the hardware and software.
- Robust development pipelines, including testing servers to make sure updates will not cause issues.
- Tried and tested disaster recovery and backup plans.
The massive downside to five nines uptime targets is obviously cost. We will discuss how you should set your target below.
Targets vs Guarantees
Some companies talk about 100% uptime, so why is that not in our table above? Well, what they are usually talking about is an uptime guarantee. It is not that the service will be available 100% of the time, it is that there will be some form of redress when the service is not available. Often this is just a refund of that proportion of your licence fee.
You will find the details of this in the Service Level Agreement part of the terms and conditions. These will often define how uptime is monitored, what percentage uptime you should expect over what timeframe and the compensation if that target is not met.
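To make the idea of a proportional refund concrete, here is a minimal sketch. It does not reflect the terms of any particular Service Level Agreement: the function names, the 30-day billing period and the £500 fee are all assumptions for illustration, and real agreements often use tiered credits instead.

```python
def measured_uptime_percent(period_hours: float, downtime_hours: float) -> float:
    """Uptime actually achieved over a monitoring period, as a percentage."""
    return 100.0 * (period_hours - downtime_hours) / period_hours


def pro_rata_refund(licence_fee: float, period_hours: float, downtime_hours: float) -> float:
    """Refund the share of the fee that matches the share of time the service was down."""
    return licence_fee * downtime_hours / period_hours


if __name__ == "__main__":
    period = 30 * 24   # hypothetical 30-day billing period, in hours
    downtime = 7.2     # hypothetical hours of downtime in that period
    fee = 500.0        # hypothetical licence fee for the period, in pounds
    print(f"Measured uptime: {measured_uptime_percent(period, downtime):.2f}%")
    print(f"Pro rata refund: £{pro_rata_refund(fee, period, downtime):.2f}")
```

A refund worked out this way is usually small compared with the cost of the outage to your business, which is why the target matters more than the guarantee.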
When we talk to clients about their choice of hardware to run their bespoke software on, we prefer to talk about targets that they should aim for. Even five nines setups do not guarantee 99.999% uptime. They are just designed with that target in mind.
It Gets More Complicated
Feel free to skip this section, but in the modern world a lot of bespoke software runs on cloud services such as AWS and Microsoft Azure. In these situations, you often spin up separate services for different parts of your software. In even the simplest applications the software will be running on one service and the database on another. So now you have two uptime targets of 99.9%, so what is the overall uptime target? Because the application needs both services to be up at the same time, you multiply the two targets together: 99.9% × 99.9% is approximately 99.8%. Every service that the system depends on lowers the overall target a little further.
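The multiplication can be sketched in a few lines. This assumes that the services fail independently of each other and that the application needs all of them at once. The function name `combined_uptime` is ours, not part of any cloud provider's tooling.

```python
from math import prod


def combined_uptime(*uptime_percents: float) -> float:
    """Overall uptime target (in %) when every listed service must be up at once.

    Assumes the services fail independently of each other.
    """
    return 100.0 * prod(p / 100.0 for p in uptime_percents)


if __name__ == "__main__":
    # An application server and a database, each with a 99.9% target.
    print(f"Two services:   {combined_uptime(99.9, 99.9):.4f}%")
    # Adding a third 99.9% service lowers the overall target again.
    print(f"Three services: {combined_uptime(99.9, 99.9, 99.9):.4f}%")
```

Two 99.9% services give 99.8001%, which is where the rounded figure of 99.8% comes from.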
There are a number of reasons that this is acceptable, and in fact often preferable. First of all, you are getting hardware specifically designed for the job it is being asked to do. Secondly, it allows a more tailored approach to uptime. For example, you could choose to have a backup server that only has a 99% uptime target but is cheaper.
Choosing the Target that is Right for You
Assuming no one has an unlimited budget, there has to be a compromise made on uptime versus cost. If you do have an unlimited budget, set the target at 100%. You probably can’t achieve it, but aim for the stars and you might hit the ceiling.
So, what do you need? If you want to spend no time thinking about it at all then for most situations, 99.9% is going to be absolutely fine. It is a very high level of uptime and there is a good chance that most of your users will not even notice that it is down for a few hours a year.
If you want to be more scientific about it then you need to start working out the cost of downtime on a particular service and compare that to the cost of improving the uptime. For example, say 1 hour of downtime costs the business £1 000. Moving from 99.9% to 99.99% buys approximately 8 hours of extra uptime a year, worth roughly £8 000 to the business. If the cost of that move is £7 000 a year or less then it might be worth it. Much over this amount it is probably not.
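The comparison can be sketched as below. It assumes the worst case, where the service uses its full downtime allowance every year, and a 365-day year. The function names and the figures passed in are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (an assumption)


def extra_uptime_hours(current_percent: float, improved_percent: float) -> float:
    """Extra hours of uptime per year gained by moving to a higher target."""
    return HOURS_PER_YEAR * (improved_percent - current_percent) / 100.0


def worth_upgrading(current_percent: float, improved_percent: float,
                    downtime_cost_per_hour: float, upgrade_cost_per_year: float) -> bool:
    """True when the downtime avoided is worth at least as much as the upgrade costs."""
    saving = extra_uptime_hours(current_percent, improved_percent) * downtime_cost_per_hour
    return saving >= upgrade_cost_per_year


if __name__ == "__main__":
    gained = extra_uptime_hours(99.9, 99.99)
    print(f"Extra uptime: {gained:.3f} hours a year")
    print("Worth it at £7 000 a year:", worth_upgrading(99.9, 99.99, 1000, 7000))
    print("Worth it at £9 000 a year:", worth_upgrading(99.9, 99.99, 1000, 9000))
```

In practice the cost of an hour of downtime varies with when it happens, so treat the result as a starting point for the conversation, not a final answer.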
Standard uptime targets are fine for most companies. The important takeaway is that there are things you can do if you have critical systems that need higher availability. It is also important to look at both the hardware and software, the full stack. If you need to discuss what can be done to meet your requirements then contact us for a quick chat.