Tips from Steve: Real Life Disaster Recovery Scenarios

What would you do if disaster struck? A flood, a fire, a tornado, a devastating ransomware attack: these are all real-life threats that any business can face. We asked Steve Putnam to walk us through a real-life disaster recovery plan that his team has in place in case disaster strikes one of their clients.

As an MSP supporting over 150 businesses, we have had to deal with several disaster situations in which our clients lost access to the facility that houses their server and communications equipment. The following details our plan to recover from one such disaster:

The Scenario

  • An accounting firm located 30 miles away from us, with one server and six PCs, has a fire in the building that renders it uninhabitable for at least a month. We cannot access the server or the local backup drive, so all recovery has to occur elsewhere.
  • The server is set up so that the file server data is contained in one Hyper-V instance (FS-1) and the domain controller in another Hyper-V instance (DC-1).
  • We had been backing up the files nightly to Google, and every month we did full backups of the C: drive VHDx files of FS-1 and DC-1 to Google Nearline storage. We keep only one month of these in the cloud, and we do not back up the data VHDx files, as they tend to be huge and the files they contain are already covered by the nightly backups.
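The schedule in the last bullet means that a restored system image can lag behind the nightly file backups by up to about a month. As a minimal sketch of that gap (the function name and the dates are hypothetical and used only for illustration):

```python
from datetime import date


def image_lag_days(last_image_backup: date, last_file_backup: date) -> int:
    """Days between the monthly system-image backup and the latest nightly file backup."""
    if last_file_backup < last_image_backup:
        raise ValueError("file backup is older than the image backup")
    return (last_file_backup - last_image_backup).days


if __name__ == "__main__":
    # Hypothetical dates: image taken on the 1st, disaster after the backup on the 28th.
    lag = image_lag_days(date(2024, 3, 1), date(2024, 3, 28))
    print(f"System image is {lag} days behind the latest file backup")
```

The larger this number, the more the restored instances depend on the nightly file backups to bring the data current.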

Disaster Recovery Plan

Our general recovery plan to handle disasters such as this is to bring up the client’s server instances on our Spare/Test Server and provide virtual workstation Hyper-V instances to accommodate the users.

The staff of the affected business would use their home computers, laptops (anything they can get their hands on) to connect via a VPN client to one of the virtual workstations.

Our spare server has enough disk, memory and CPU to handle the critical business functions for our clients (up to 150 users). We also have a fast internet connection (50 Mbps up, 250 Mbps down).

Here is an overview of what we do to get the client up and running again after a disaster.

  1. We start the download of the most recent VHDx/Hyper-V files to our spare server. Since these are full backups, we typically use Cloudberry Explorer to retrieve the DC-1 and FS-1 System VHDx files.
  2. Because our spare server already has Server 2012 R2 installed, it is simply a matter of configuring the settings on the virtual instances and bringing them up. (We keep the spare server at Server 2012 R2 as it supports Hyper-V instances running 2008 through 2016. Server 2016 does not support Server 2008, which several of our clients are still running.)
  3. Once the file server instance is operational, we can bring up the Cloudberry Backup console that is already installed and begin a repository sync to the cloud storage, since the VHDx file (and therefore the repository) could be up to 30 days old.
  4. Once the repository sync is complete, we would begin the restore of the files, SQL databases, etc. that are required to bring the system up to the point of the last good backup.
  5. Because we have standardized on Watchguard routers, all of our clients use the same VPN client. Many of our clients already use the VPN client to work from home, so in a disaster situation, they would simply change the IP address in the VPN client to our server and remotely connect.  Also, it is fairly easy to install the VPN client on machines that do not already have it.
  6. We would continue to do nightly backups of the data.
  7. The entire process will take 8-72 hours to complete (depending on the data volume that needs to be restored), but core functionality (accounting software) could be recovered on the first day. Given the severity of the disaster, that is a fairly fast recovery.
  8. When the original facility hit by the disaster is repaired and new equipment is installed and tested, we would plan a weekend migration of the client's Hyper-V instances to their new server.
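The spread in step 7 comes largely from how much data has to come down from the cloud. A rough way to reason about it is the sketch below, which estimates download time from data volume and link speed. The function name, the 80% efficiency factor and the sample data sizes are assumptions for illustration only; real restores also depend on the cloud provider's throughput and on restore processing time.

```python
def estimate_transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Rough hours needed to download size_gb over a link_mbps connection.

    efficiency is an assumed fraction of the nominal link speed that is
    actually achieved (protocol overhead, shared use of the line).
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link speed > 0, efficiency in (0, 1]")
    megabits = size_gb * 8 * 1000  # decimal gigabytes converted to megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical restore sizes over the 250 Mbps downlink mentioned earlier.
    for size_gb in (100, 500, 2000):
        hours = estimate_transfer_hours(size_gb, link_mbps=250)
        print(f"{size_gb:>5} GB at 250 Mbps: about {hours:.1f} hours")
```

An estimate like this covers only the download; configuring the instances, syncing the repository and restoring databases all add to the total.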

In the next iteration of our Disaster Recovery solution (currently in testing) we will utilize Amazon EC2 with a Virtual Private Cloud (VPC) for our larger clients. We will need to establish a VPN connection from our clients to EC2. However, there are recurring costs associated with simply having these Amazon EC2 and VPC components available, which will require us to charge clients for having a Disaster Recovery solution in place. Our larger clients with multiple offices and/or larger footprints will start to use EC2 VPC as the primary recovery location, whereas our smaller clients will continue to use our in-house server as their Disaster Recovery solution.

In summary, it is extremely important for an MSP to plan ahead for disasters such as floods, fires, tornadoes, etc. that might befall their clients:

  • Utilize virtualization for domain controllers and app/file servers.
  • Back up these virtual instances to the cloud at least once per month.
  • Maintain a spare/test server capable of handling the OS version/workload of your largest client.
  • Leverage a standard router/VPN solution.
  • Be sure to have a high-bandwidth connection to the internet from the MSP facility.

About Steve Putnam:

Steve retired in 2013 from a Fortune 500 firm as Director of Storage and Backup, after spending 40 years in IT. He now manages the Cloud Solutions Services for The PC Wizard, a small Managed Service Provider (MSP) owned by his son, who started the company in 1995 when he was only 15 years old. In this new role, Steve has more time to devote to exploring new technologies, as well as to one of his passions – writing technical articles to assist others in maximizing their use of Cloud-based solutions.