The terms backup and archive are often used interchangeably, but they are not the same thing. Data backups and data archives differ in important ways.
Understanding these differences is crucial to ensuring that your data processes meet your needs. Running backups when you actually need an archive, or vice versa, can cause serious problems when it comes time to retrieve data.
This article defines data backups and data archives and explains the differences between them.
What is the purpose of a backup? Generally, its main goal is to ensure that you can recover in the event that something unexpected happens to your data -- such as a disk drive failing, files accidentally being deleted or a data center going offline during a catastrophic event.
In other words, data backups provide protection against the unexpected. You hope you never need them, but if you don’t have them, you won’t be prepared for critical disruptions to your infrastructure.
Data archiving is a solution for storing data for long periods of time. Archives ensure that important records remain available years after they were created.
In most cases, the purpose of data archiving is to meet legal compliance requirements. These requirements vary depending on the industry you operate in, as well as the jurisdictions that govern you, but in many cases businesses are subject to data retention regulations. A doctor’s office might be required to keep patient records for a certain period of time, for example, or a bank may need to retain transaction records.
Backup vs Archive
Now that we’ve discussed the different purposes of data backups and data archives, let’s look at the key differences regarding how each is created and used.
Data backups do not need to be immediately accessible. Although your disaster recovery plan should include an efficient process for restoring data, it may take some time to recover data from backups because copying large amounts of data from a backup location is a time-consuming process.
In contrast, archives should always be readily accessible. If you need to access a stored record for compliance purposes, you don’t want to have to wait hours or days to retrieve it.
Data Storage Method
Data backups are typically made by copying data from production systems to a secondary storage location. The original data remains in place, while a backup copy exists elsewhere and can be used to restore the data in the event of a failure in your main systems.
Archived data is instead moved from its original location to an archive storage location. By moving the data rather than copying it, organizations can usually achieve lower data storage costs.
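The copy-versus-move distinction can be illustrated with a minimal Python sketch. It assumes both locations are ordinary directories on a local filesystem; the function names `back_up` and `archive` are hypothetical, and a real backup or archiving tool would do considerably more (verification, scheduling, remote transfer).

```python
import shutil
from pathlib import Path


def back_up(source: Path, backup_dir: Path) -> Path:
    """Copy a file to the backup location; the original stays in place."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target


def archive(source: Path, archive_dir: Path) -> Path:
    """Move a file to the archive location; the original is removed."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    shutil.move(str(source), str(target))
    return target
```

After `back_up` runs, the data exists in two places. After `archive` runs, it exists only in the archive location, which is where the storage savings come from.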
Backups are usually stored in “hot” storage locations that support rapid changes to data -- such as an S3 bucket on AWS, Google Cloud Storage or the Hot tier of Azure Blob Storage. Backups can also exist on easily accessible local storage locations, such as a NAS device.
Archives, on the other hand, are typically stored either on tape or on a “cold” storage solution in the cloud. Examples of cloud-based cold storage services include Amazon S3 Glacier, the Archive tier of Azure Blob Storage and Coldline Storage on Google Cloud. It typically takes longer to move data into and out of cold storage services than it does with hot storage, but cold storage is less expensive.
Backed-up data is constantly changing. Whenever you take a new backup, you modify your backup data.
Data archives are the opposite. They are static. Once you create an archive, you typically do not modify it.
Data Retention Policy
Backed-up data is not stored permanently. You periodically delete or overwrite data backups that are too old to be useful. If you didn’t, you’d end up storing a large amount of outdated backup data, which would be very cost-inefficient.
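As a rough sketch of this kind of cleanup, the Python function below deletes backup files older than a given number of days. It assumes backups are plain files in a single directory and that a file's modification time reflects when the backup was taken. The name `prune_old_backups` and its parameters are hypothetical, and real retention schemes are often more nuanced (for example, keeping a fixed number of recent copies).

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def prune_old_backups(
    backup_dir: Path, max_age_days: int, now: Optional[float] = None
) -> List[Path]:
    """Delete backup files older than max_age_days; return the removed paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    for path in sorted(backup_dir.iterdir()):
        # Assumption: modification time stands in for the backup date.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Calling `prune_old_backups(Path("/backups"), max_age_days=30)` would remove anything older than 30 days; the directory path here is only a placeholder.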
Data archives are designed for long-term storage. You keep them for however many years your compliance policies or other needs require.
When you back up data, you generally back up all of your data, with the exception of unimportant information like temporary files. If you had only part of your data backed up, you wouldn’t be able to restore your systems to a working state in the event of a failure.
Because data archives are retained for long periods of time, archiving all of your data is not usually feasible. Instead, you archive only the specific files that you must retain for compliance purposes. These might include patient records, for example, but not application logs or configuration files.
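The difference in scope can be sketched as two selection rules: a backup takes everything except temporary files, while an archive takes only the files that match a compliance rule. The patterns and function names below are hypothetical placeholders, not a recommended policy; which files actually count as compliance records depends on your own requirements.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumed example patterns; adjust to your own environment.
TEMP_PATTERNS = ["*.tmp", "*.swp", "*~"]
ARCHIVE_PATTERNS = ["patient_records/*"]


def select_for_backup(paths: Iterable[str]) -> List[str]:
    """Keep every path except those whose file name looks temporary."""
    return [
        p for p in paths
        if not any(fnmatchcase(PurePosixPath(p).name, pat) for pat in TEMP_PATTERNS)
    ]


def select_for_archive(paths: Iterable[str]) -> List[str]:
    """Keep only the paths that match a compliance pattern."""
    return [
        p for p in paths
        if any(fnmatchcase(p, pat) for pat in ARCHIVE_PATTERNS)
    ]
```

Given a list containing a patient record, an application log, a configuration file and a temporary file, `select_for_backup` would return the first three, while `select_for_archive` would return only the patient record.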
Conclusion: Using Backups and Archives Together
Data backups and data archives serve different purposes. You can’t use one as a substitute for the other.
You should instead maintain both backups and archives. Backups won’t help you satisfy compliance requirements, and archives won’t allow you to restore all of your data following a major failure. If you have both backups and archives, however, you’re prepared for both challenges.
To manage backups and archives cost-effectively, you should distinguish data that needs to be part of a backup from data that should instead be archived. The latter type of data is known as “cold” data because it can be placed in “cold” storage -- in other words, storage locations that do not have to be accessed frequently, and are therefore less expensive to maintain.