At the core of every backup and recovery strategy is a balance between the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO). Together they describe how quickly you can get your data back if something goes wrong, and how much recent data you can afford to lose. In a disaster, the recovery time objective becomes one of the most important parameters in business planning.
What is RTO
RTO is typically defined as the target time allowed for completing all recovery tasks before an application or service can handle requests normally again. From the IT infrastructure perspective, this period has two ends:
- Nothing works at all: the service is down, the server itself may be burnt to the ground, and everything is very bad.
- You have recovered all the data you need, restored services, and your replicated servers are up and running.
The time between the "everything is bad" and "up and running" states is called unplanned downtime, and the RTO is the longest unplanned downtime the business is prepared to accept.
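As a minimal sketch of the definition above, assuming you record two timestamps for an incident (when the outage began and when service was fully restored), unplanned downtime is simply their difference, and the RTO is met if that difference does not exceed the target. The function names and incident times below are hypothetical.

```python
from datetime import datetime, timedelta


def unplanned_downtime(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Time between "everything is bad" and "up and running"."""
    if service_restored < outage_start:
        raise ValueError("service_restored must not precede outage_start")
    return service_restored - outage_start


def meets_rto(downtime: timedelta, rto: timedelta) -> bool:
    """True if the measured downtime fits inside the RTO window."""
    return downtime <= rto


# Hypothetical incident timeline
start = datetime(2024, 3, 1, 9, 0)
restored = datetime(2024, 3, 1, 12, 30)
downtime = unplanned_downtime(start, restored)
print(downtime)                                 # 3:30:00
print(meets_rto(downtime, timedelta(hours=4)))  # True
```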
For IT departments, the RTO is shaped by the technologies in use, but for the business as a whole it matters far more. If your company relies on IT services for its financial operations or back-office support, every minute of downtime costs money, and a long enough outage can cost you the business itself. The larger the company, the more money it loses during unplanned downtime. That is why business owners pay the closest attention to the recovery time objective when backup and recovery plans are drawn up.
Disaster recovery strategies incorporate an RTO as well. For disaster scenarios, however, you will most likely rely on more complex technologies, such as clustered systems, storage replication, a secondary data center, or cloud services.
Recovery Time Objective Done Right
Downtime costs include both long-term effects, such as damage to reputation or a disrupted production plan at a facility, and immediate consequences, such as missed daily sales targets or the inability to restock warehouses promptly. Although the recovery time objective is crucial for the business, it is often undervalued or set only approximately. One reason is the complexity of the changes that must be made whenever the RTO changes.
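The immediate part of these costs can be estimated with simple arithmetic. The sketch below assumes, purely for illustration, a constant revenue loss per minute plus the cost of staff working on the recovery; the function name and all figures are hypothetical. Long-term effects such as reputational damage are deliberately left out, because they rarely reduce to a per-minute rate.

```python
def immediate_downtime_cost(minutes_down: float, revenue_per_minute: float,
                            staff_cost_per_minute: float = 0.0) -> float:
    """Rough immediate cost of an outage: lost revenue plus recovery staff time.

    Long-term effects (reputation, disrupted production plans) are NOT included.
    """
    if minutes_down < 0:
        raise ValueError("minutes_down must be non-negative")
    return minutes_down * (revenue_per_minute + staff_cost_per_minute)


# Hypothetical figures: worst-case immediate loss under a 4-hour vs a 1-hour RTO
for rto_minutes in (240, 60):
    cost = immediate_downtime_cost(rto_minutes, revenue_per_minute=500,
                                   staff_cost_per_minute=20)
    print(f"RTO {rto_minutes} min -> up to {cost:,.0f} in immediate losses")
```

The gap between the two results gives a rough sense of what a shorter RTO could save in a worst-case incident, to be weighed against the cost of the software and hardware needed to achieve it.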
In conventional organizations, the RTO value is tied to the service level agreement (SLA): it is set by top management and adopted by IT staff. Keep in mind that a shorter RTO requires specialized software and hardware to meet the agreed objective.
Simply installing faster backup storage and a faster LAN is not enough. Meeting an RTO is a complex task that involves creating a business continuity plan with the following activities:
- Set up a proper monitoring system that notifies you of any service failure in real time. Many IT departments lose a lot of time simply because they do not know what has happened.
- Plan and deploy an effective service and application configuration so that you can restore all data even in a worst-case scenario and still stay strictly within the RTO. This configuration can include both hardware and software solutions.
- Write down the restore procedure so that every staff member knows what to do during downtime. Even the best-architected solution can miss its objective because of human error.
- Test the overall recovery plan on real hardware, using a data volume equivalent to the production load. Testing must be a regular process, because backup volumes tend to grow, sometimes very quickly, and the system must be adjusted accordingly.
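A regular recovery test ultimately comes down to timing the whole restore procedure and comparing the result with the agreed RTO. Below is a minimal sketch of such a drill harness; `run_restore_drill` and `fake_restore` are hypothetical names, and `fake_restore` is only a placeholder for your real restore procedure, run against production-sized data on real hardware.

```python
import time
from typing import Callable, Tuple


def run_restore_drill(restore: Callable[[], None], rto_seconds: float) -> Tuple[float, bool]:
    """Time a restore procedure end to end and check it against the RTO.

    Returns (elapsed_seconds, within_rto).
    """
    started = time.perf_counter()
    restore()  # the real procedure: fetch backups, restore data, restart services
    elapsed = time.perf_counter() - started
    return elapsed, elapsed <= rto_seconds


def fake_restore() -> None:
    # Placeholder for a real restore; sleeps briefly so the example runs quickly
    time.sleep(0.1)


elapsed, ok = run_restore_drill(fake_restore, rto_seconds=1.0)
print(f"restore took {elapsed:.2f}s, within RTO: {ok}")
```

Keeping the elapsed time from every drill makes it easy to see how recovery time trends as data volumes grow.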
Testing also helps you scale the system effectively, because it shows whether your RTO will still be met. With this information, you can assess the need for additional resources and justify the extra funding to top management. This is a good pattern for scaling IT systems, since it is driven directly by actual business needs.
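Test results can also feed a simple capacity projection. The sketch below assumes, purely for illustration, that restore time is a fixed overhead plus data volume divided by the measured restore throughput, and that the backup volume grows by a constant percentage each month; the function name, parameters, and figures are hypothetical. It returns the first month in which the estimated restore time would exceed the RTO, which is roughly when additional resources need to be in place.

```python
from typing import Optional


def months_until_rto_breach(volume_gb: float, monthly_growth: float,
                            throughput_gb_per_hour: float,
                            fixed_overhead_hours: float, rto_hours: float,
                            horizon_months: int = 60) -> Optional[int]:
    """Return the first month whose estimated restore time exceeds the RTO,
    or None if the estimate stays within the RTO for the whole horizon.

    Assumed model: restore_hours = overhead + volume / throughput,
    with volume growing by `monthly_growth` (e.g. 0.05 = 5%) each month.
    """
    for month in range(horizon_months + 1):
        projected_gb = volume_gb * (1 + monthly_growth) ** month
        restore_hours = fixed_overhead_hours + projected_gb / throughput_gb_per_hour
        if restore_hours > rto_hours:
            return month
    return None


# Hypothetical: 2 TB today, 5% monthly growth, 1 TB/h measured restore rate,
# 30 minutes of fixed overhead, 4-hour RTO
print(months_until_rto_breach(2000, 0.05, 1000, 0.5, 4.0))  # 12
```

With these hypothetical inputs, restore throughput would need to improve, or the RTO be renegotiated, within about a year.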
By carefully devising backup and disaster recovery plans (DRPs), you help your company avoid excessive financial losses while meeting your service levels. When choosing the right solution, take into account the balance between RTO and RPO (Recovery Point Objective). Most important of all, run real tests of the deployed solutions to confirm that the measured recovery time fits within the agreed RTO.