Amazon Simple Storage Service, or S3 for short, is a common place to store backup data thanks to its safety and reliability. Users of the platform, however, might not always agree that it is “simple”, given S3's complex pricing terms and a large set of features.
With that challenge in mind, this article provides an overview of how Amazon S3 works. We'll explain S3 storage classes, pricing, features and security from the perspective of data backup and recovery. The goal is to make S3 somewhat simpler than it might appear to users trying to wrap their heads around it for the first time.
Amazon S3 is divided into several storage classes and subclasses. They are:
- S3 Standard - for files you may need to use on a daily basis. Day-to-day documents, daily backups of production databases and so on are good candidates for this storage class.
- S3 Standard Infrequent Access - for files that are used somewhat less frequently. Think of this as a first archival tier. You might use it to back up data that is two weeks old and is not synced with more recent versions of the data, even though the older data might still be needed in case of a failure.
Standard Infrequent Access storage is less expensive for storing data, although it is more expensive to recover from. Also, this class imposes an extra deletion fee that Amazon charges if you delete data or move it to a different storage class within the first 30 days of moving it to this storage class. (If you do not touch your data for at least 30 days, you avoid this fee.)
- S3 One Zone-Infrequent Access - this is a subclass of Standard Infrequent Access storage. This subclass (abbreviated as S3 One Zone-IA, or Z-IA) is the same as its parent in most ways but is less resilient, because its data is stored in a single Availability Zone instead of the three or more that are used for S3 Standard IA or S3 Standard.
- S3 Intelligent Tiering - this storage class automatically moves your data between the Amazon S3 Standard and Amazon S3 IA classes, depending on how often the data is accessed. Learn more about the S3 Intelligent Tiering class in our guide.
- Glacier - for long-term archives. When you need to preserve data for audits that happen once a year, you can create a long-term archive using this storage class. In effect, Amazon Glacier is an archive in the cloud. It's extremely cheap to store data in this class but expensive to recover data from it. Plus, if you delete or move data from this class within the first 90 days, you have to pay an early deletion fee.
- Glacier Deep Archive - this is a subclass of Glacier storage that offers a few differences in pricing. The data storage is cheaper, while recovery is more expensive, and the early deletion fee applies if you delete or move data in the first 180 days.
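The minimum storage periods in the list above can be turned into a rough fee estimate. The sketch below assumes the early deletion charge is pro-rated, meaning you pay storage for the days remaining in the minimum period, and it approximates a month as 30 days. The function name and the per-gigabyte rate passed in are hypothetical placeholders, not quoted AWS prices, and the One Zone-IA entry assumes it shares its parent's 30-day minimum.

```python
# Minimum storage durations in days, taken from the list above.
# ONEZONE_IA is assumed to share the 30-day minimum of Standard IA.
MIN_DAYS = {
    "STANDARD": 0,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER": 90,
    "DEEP_ARCHIVE": 180,
}


def early_deletion_fee(storage_class, size_gb, days_stored, price_per_gb_month):
    """Estimate the charge for deleting or moving data too early.

    Assumption: the fee equals the storage cost for the days that remain
    in the minimum period, with one month treated as 30 days.
    """
    remaining_days = max(MIN_DAYS[storage_class] - days_stored, 0)
    return size_gb * price_per_gb_month * remaining_days / 30


# 100 GB deleted from Glacier after 30 days, at a placeholder rate.
fee = early_deletion_fee("GLACIER", 100, 30, price_per_gb_month=0.004)
print(f"Estimated early deletion fee: ${fee:.2f}")
```

Data kept past the minimum period yields a fee of zero, which matches the note that leaving data untouched long enough avoids the charge.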
Prices vary greatly between storage classes. For example, you will pay roughly $2.30/month to store 100 gigabytes in Amazon S3 Standard, as compared to roughly $0.40 to store the same 100 gigabytes in Amazon Glacier. The pricing for data recovery also differs between S3 Standard and Glacier.
Further reading S3 Storage Classes Explained
In Amazon S3, AWS bills you for multiple types of activity or service:
- Data storage. You pay a per-gigabyte cost for the data you store in S3.
- Storage requests. Requests are actions that you perform inside the storage, such as moving, deleting or listing data. S3 charges fees when you perform these actions.
- Data retrieval. This charge applies mainly to Amazon Glacier and, in a smaller per-gigabyte form, to the Infrequent Access classes. You need to retrieve data from Glacier before you can download it, and the retrieval process incurs a fee.
- Data egress. Egress refers to data that you have moved from one S3 storage location to a different AWS region, or to somewhere else on the Internet. These data transfers have a cost.
Further reading Amazon S3 Pricing Explained
Further reading Amazon Glacier Pricing Explained
The exact price for the activities listed above on Amazon S3 storage varies depending on the cloud region that you use. AWS has data centers spread across dozens of different world regions, and each has somewhat different pricing. You can estimate total costs using the official AWS cost-calculator.
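To see how the four billing components add up, here is a minimal sketch of a monthly estimate. Every rate in it is a hypothetical placeholder chosen only to make the arithmetic visible; real rates depend on the region and storage class, and the `Rates` and `monthly_cost` names are illustrative rather than part of any AWS tool.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical per-unit prices; substitute the rates for your region."""
    storage_per_gb_month: float
    per_1000_requests: float
    retrieval_per_gb: float
    egress_per_gb: float


def monthly_cost(rates, stored_gb, requests, retrieved_gb, egress_gb):
    """Break a monthly bill down into the four components listed above."""
    return {
        "storage": stored_gb * rates.storage_per_gb_month,
        "requests": requests / 1000 * rates.per_1000_requests,
        "retrieval": retrieved_gb * rates.retrieval_per_gb,
        "egress": egress_gb * rates.egress_per_gb,
    }


# Placeholder rates and usage figures, for illustration only.
example_rates = Rates(0.02, 0.005, 0.01, 0.09)
bill = monthly_cost(example_rates, stored_gb=500, requests=20000,
                    retrieved_gb=50, egress_gb=50)
for component, amount in bill.items():
    print(f"{component:>10}: ${amount:.2f}")
print(f"{'total':>10}: ${sum(bill.values()):.2f}")
```

A breakdown like this makes it easier to spot which component dominates, for example egress during a large restore, before checking exact figures in the AWS calculator.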
S3 storage was not built solely for backup. It’s a wide-purpose object storage solution with a huge feature set. However, you do not need to know or use every S3 feature if your primary activity is data backup.
Following are the main features that most users will want to know about if they are using S3 for backup purposes:
By default, Amazon S3 stores each file redundantly in at least 3 physically separate data center locations inside one region. In AWS terms, each of these locations is called an Availability Zone.
If you need extra storage safety, you can set up a second region in which your data will be replicated. Keep in mind that doing so doubles the price of data storage.
For better redundancy and lower costs, you can keep the primary copy in Amazon S3 Standard and store the secondary, replicated copy in Amazon Glacier.
If you have Amazon EC2 machines and you perform file backups to a different region, you will have to pay only for cross-region data transfer. This is less than the cost of transferring data out of AWS to the Internet.
Versioning and Retention
Versioning allows you to keep several versions of the same file. That is helpful when you want to maintain several recovery points over a period of time so that you have the option of recovering from an older version of your data if necessary. For example, if your latest backup was hit by ransomware, you can recover to the version before that. Amazon S3 natively supports versioning.
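The ransomware example comes down to picking the newest version that predates the incident. The sketch below shows that selection in plain Python. The record shape (`VersionId`, `LastModified`) loosely mirrors what an S3 version listing returns, but the version IDs, dates, and the `latest_version_before` helper are all made up for illustration.

```python
from datetime import datetime, timezone


def latest_version_before(versions, cutoff):
    """Return the newest version modified strictly before `cutoff`, or None."""
    candidates = [v for v in versions if v["LastModified"] < cutoff]
    return max(candidates, key=lambda v: v["LastModified"], default=None)


# Hypothetical version history of one backup file.
versions = [
    {"VersionId": "v1", "LastModified": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"VersionId": "v2", "LastModified": datetime(2024, 1, 8, tzinfo=timezone.utc)},
    {"VersionId": "v3", "LastModified": datetime(2024, 1, 15, tzinfo=timezone.utc)},
]

# Suppose the ransomware hit on January 10; v3 is then suspect.
incident = datetime(2024, 1, 10, tzinfo=timezone.utc)
clean = latest_version_before(versions, incident)
print(clean["VersionId"])  # prints v2
```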
As for retention policies, several Amazon S3 classes allow flexible retention settings. You can store files for 30 days in S3, then transfer them to Glacier for long-term storage.
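A retention policy of this kind is typically expressed as a bucket lifecycle configuration. The following sketch builds one as a plain dictionary: objects move to Glacier after 30 days, and older versions expire after a set number of days. The rule ID, prefix, and day counts are illustrative assumptions, and the helper name is hypothetical. With boto3, a dictionary of this shape could be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`, though that call is not made here.

```python
import json


def backup_lifecycle(prefix, days_to_glacier, noncurrent_days_to_keep):
    """Build a lifecycle configuration for a versioned backup bucket."""
    return {
        "Rules": [
            {
                "ID": "archive-backups",  # illustrative rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Move current objects to Glacier after the given age.
                "Transitions": [
                    {"Days": days_to_glacier, "StorageClass": "GLACIER"}
                ],
                # Drop superseded versions after the retention window.
                "NoncurrentVersionExpiration": {
                    "NoncurrentDays": noncurrent_days_to_keep
                },
            }
        ]
    }


config = backup_lifecycle("backups/", days_to_glacier=30,
                          noncurrent_days_to_keep=365)
print(json.dumps(config, indent=2))
```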
Further reading Backup Retention Policies in CloudBerry Backup
Data Transfer Acceleration
You can speed up data uploads to Amazon S3 with the use of Amazon S3 Transfer Acceleration. After acceleration is enabled, AWS will optimize and re-route data going from your location to the needed data-centers, using AWS's own transfer channels.
AWS claims that upload speeds can improve by between 50% and 500% depending on conditions.
It is recommended to perform a speed test prior to enabling Amazon S3 Transfer Acceleration.
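Once you have timed the same upload with and without acceleration, the comparison is simple arithmetic. The sketch below assumes you already have the two timings in seconds. The 10% threshold for deciding whether acceleration is worthwhile is an arbitrary placeholder, and both function names are hypothetical.

```python
def improvement_percent(direct_seconds, accelerated_seconds):
    """Percent speed gain of the accelerated upload over the direct one."""
    return (direct_seconds / accelerated_seconds - 1) * 100


def worth_enabling(direct_seconds, accelerated_seconds, min_gain_percent=10.0):
    """Decide whether the measured gain clears an assumed threshold."""
    gain = improvement_percent(direct_seconds, accelerated_seconds)
    return gain >= min_gain_percent


# Example: the same file took 120 s directly and 60 s with acceleration.
print(improvement_percent(120, 60))  # prints 100.0
print(worth_enabling(100, 98))       # prints False
```

Because acceleration carries its own per-gigabyte charge, a small measured gain may not justify turning it on.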
Further reading Amazon S3 Transfer Acceleration Explained
Initial Data Import via Amazon Snowball
At 100 megabits per second, it will take you roughly 25 hours to upload just 1 terabyte to the internet. What if you have a very large amount of data to back up?
In that case, you can order a suitcase full of hard drives from AWS, upload your data to them locally (where data transfer speeds will be faster) and then send them back for direct upload to S3 storage infrastructure. This solution is called the Amazon Snowball appliance. Each Snowball has 80TB of free space.
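The upload-time estimate above is easy to reproduce. This sketch treats 1 terabyte as 10^12 bytes and adds an `efficiency` factor for protocol overhead. Both choices are assumptions, as is the helper name. At perfect efficiency 1 TB takes a little over 22 hours, and an assumed 90% efficiency brings the figure close to 25 hours.

```python
def upload_hours(size_tb, link_mbps, efficiency=1.0):
    """Hours needed to upload `size_tb` terabytes over a `link_mbps` link.

    Assumes 1 TB = 10**12 bytes; `efficiency` is the usable fraction of
    the nominal link speed (1.0 means no overhead at all).
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


print(f"1 TB at 100 Mbps:  {upload_hours(1, 100):.1f} hours")
print(f"1 TB at 90% usage: {upload_hours(1, 100, efficiency=0.9):.1f} hours")
print(f"80 TB at 100 Mbps: {upload_hours(80, 100) / 24:.0f} days")
```

The last line shows why a shipped appliance makes sense: filling one 80TB device over the same link would take more than two months.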
Further reading Working with AWS Snowball in CloudBerry Backup
If your storage needs are truly extreme and stretch into the petabytes, you can order an Amazon Snowmobile. This is a truck designed to transport up to 100 petabytes to Amazon S3.
Amazon S3 Security
Identity and Access Management
AWS has developed a robust identity and access management service. You can use it to create and control different identities and rules. You don't need to master all of the rules if you only use S3 as backup storage, but you should familiarize yourself with the basic ones.
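As an illustration of a basic rule, the sketch below builds an IAM policy document that limits a backup identity to a single bucket. The bucket name is a placeholder and the set of actions is one plausible minimum for backup and restore, not a recommendation from AWS or from the backup software. Adjust it to what your tool actually needs.

```python
import json


def backup_policy(bucket):
    """Build an IAM policy that restricts access to one backup bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading, writing, and deleting apply to objects in it.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# "example-backup-bucket" is a hypothetical bucket name.
print(json.dumps(backup_policy("example-backup-bucket"), indent=2))
```

Scoping the policy to one bucket means that leaked backup credentials cannot be used to reach the rest of the account's storage.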
Further reading Amazon S3 Backup Security Guide
AWS encrypts all data on S3 storage by default. But what if the credentials for your management console are stolen and someone gets access to the contents of your storage buckets? To guard against that, Amazon S3 has an option to encrypt your data with an encryption key of your choice.
Pay extra attention to your IAM and encryption settings in order to avoid exposing your data on the web.
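One way to encrypt with a key of your choice is server-side encryption with a key managed in AWS KMS. Other options exist, including encrypting on the client before upload. The sketch below only assembles the upload parameters. The bucket, object key, and KMS key ID are placeholders, and the helper name is hypothetical. With boto3, a dictionary like this could be unpacked into `put_object`, which is not called here.

```python
def encrypted_put_args(bucket, key, body, kms_key_id):
    """Assemble upload parameters that request encryption with a KMS key."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # Ask S3 to encrypt the object with the named KMS key.
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": kms_key_id,
    }


# All values below are placeholders for illustration.
args = encrypted_put_args(
    bucket="example-backup-bucket",
    key="backups/db-dump.bak",
    body=b"backup bytes",
    kms_key_id="example-kms-key-id",
)
print({k: v for k, v in args.items() if k != "Body"})
```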
Using Amazon S3 for Backup
AWS has not developed a service or software to back up data from physical or virtual machines to Amazon S3. You, therefore, need a dedicated third-party solution to perform this task.
CloudBerry Lab, which has been an Amazon Web Services technology partner for years, has developed such a backup solution: CloudBerry Backup.
Backup to any Amazon S3 Class
Cut costs by backing up to the lower-cost Amazon S3 IA and Amazon Glacier storage classes with CloudBerry Backup.
Amazon S3 Intelligent Tiering Support
CloudBerry Backup provides the ability to back up data directly to the Intelligent-Tiering storage class.
Lifecycle and Retention Management
CloudBerry Backup fully supports data versioning in Amazon S3. You can create a flexible and automated retention policy while creating a backup plan.
AWS IAM Support
CloudBerry Backup securely works with your access and secret keys. Our SaaS solution - CloudBerry Managed Backup - works directly with IAM users to ease management and deployment for multiple users and organizations.