Amazon S3 is one of the core services offered by AWS. It is rightly marketed as “storage for the internet”. It has a wide variety of use cases, from serving static websites to hosting images, managing data lakes, and much more.
In this post, we will review the ins and outs of S3 Glacier. S3 Glacier is a special storage class of Amazon S3 that provides extremely cheap storage.
In return for the low cost, you agree to slower access times, as reading data from S3 Glacier can take minutes or days. It is a cost-effective tool for low-access, long-term storage such as archives required for compliance.
AWS announced some great new features for S3 Glacier at AWS re:Invent 2018. This post will review both the existing features and the newer ones to give you a full understanding of the landscape for S3 Glacier. This post covers:
- The two S3 Glacier storage classes;
- Methods for getting data into S3 Glacier;
- Tips for restoring data from S3 Glacier.
Glacier Storage Classes
The first storage class for S3 Glacier is the classic S3 Glacier class. It was announced in August 2012 and is intended for long-term storage that you don’t need to access quickly.
The second storage class for Glacier is S3 Glacier Deep Archive. This storage class was announced at AWS re:Invent 2018 and is intended for extremely long-term archival with low access needs. This fits well in regulated areas, such as healthcare or financial services, where there are compliance requirements around data retention.
Below are a few axes on which to compare S3 Glacier against S3 Glacier Deep Archive.
Amazon S3 Glacier vs Amazon S3 Glacier Deep Archive
Cost is a key motivator for using S3 Glacier Deep Archive over the classic S3 Glacier. A GB of data in S3 Glacier Deep Archive costs only $0.00099 per month, meaning you can store a terabyte of data in Deep Archive for only $1.01 per month.
While Deep Archive is very cheap, S3 Glacier is pretty cheap as well. A GB of data in S3 Glacier is charged at $0.004 per month. A terabyte in S3 Glacier will set you back about $4.10 per month. This is four times the cost of S3 Glacier Deep Archive, but it’s still only one-sixth the price of storage in S3 Standard. Further, there are additional benefits as discussed below.
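The per-terabyte figures above are simple multiplication, and a short Python sketch can make the comparison easy to repeat for other sizes. The prices are the per-GB figures quoted in this post, a terabyte is assumed to be 1,024 GB, and the function and constant names are made up for illustration.

```python
# Per-GB monthly storage prices quoted in this post (US East 1).
# Treat them as a snapshot; check the current AWS pricing page before relying on them.
PRICE_PER_GB_MONTH = {
    "GLACIER": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}

GB_PER_TB = 1024  # assumption: the post's terabyte figures use 1 TB = 1,024 GB


def monthly_storage_cost(size_gb, storage_class):
    """Return the monthly storage cost in USD for size_gb gigabytes."""
    return size_gb * PRICE_PER_GB_MONTH[storage_class]


if __name__ == "__main__":
    for storage_class in PRICE_PER_GB_MONTH:
        cost = monthly_storage_cost(GB_PER_TB, storage_class)
        print(f"1 TB in {storage_class}: ${cost:.2f} per month")
```

This only models the storage charge. Retrieval, request, and early deletion charges come on top of it.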
One of the reasons the S3 Glacier storage classes are cheap compared to the other S3 storage classes is that you are giving up instant access to your data. Rather than retrieving your S3 objects with sub-second latency, retrieving data from Glacier will take minutes or hours.
The classic S3 Glacier storage class has three options for your retrieval time:
- Expedited: With Expedited retrieval, you can access your data within a few minutes. Expedited retrievals are priced at $0.03 per GB and $0.01 per request. This is the fastest and most expensive option.
- Standard: Standard retrieval makes your data accessible within 3-5 hours. Standard retrievals are priced at $0.01 per GB and $0.05 per 1,000 requests.
- Bulk: Bulk retrieval requests are the slowest option and usually take 5-12 hours before your data is accessible. Bulk retrievals are priced at $0.0025 per GB and $0.025 per 1,000 requests. This is the cheapest option and is great for restoring huge amounts of data that you don’t need immediately.
All prices above are for the US East 1 region in Northern Virginia. Prices in other regions will vary.
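To compare the three retrieval options for a concrete restore, you can combine the per-GB and per-request prices listed above. The following is a minimal sketch in Python using only those quoted prices; the function name and the example workload (500 GB across 1,000 requests) are assumptions for illustration.

```python
# Retrieval prices for classic S3 Glacier as quoted in this post (US East 1).
RETRIEVAL_PRICING = {
    "Expedited": {"per_gb": 0.03, "per_request": 0.01},
    "Standard": {"per_gb": 0.01, "per_request": 0.05 / 1000},
    "Bulk": {"per_gb": 0.0025, "per_request": 0.025 / 1000},
}


def retrieval_cost(size_gb, requests, tier):
    """Estimate the retrieval charge in USD for one restore job."""
    price = RETRIEVAL_PRICING[tier]
    return size_gb * price["per_gb"] + requests * price["per_request"]


if __name__ == "__main__":
    # Hypothetical workload: 500 GB restored with 1,000 requests.
    for tier in RETRIEVAL_PRICING:
        print(f"{tier}: ${retrieval_cost(500, 1000, tier):.2f}")
```

For many small objects the per-request charge matters most on the Expedited option, since it is billed per request rather than per 1,000 requests.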
S3 Glacier Deep Archive is slower to read from. Retrieval takes around 12-48 hours, depending on whether you use the standard or the bulk speed listed in the comparison table below. S3 Glacier Deep Archive is still in preview, so the cost for retrieval has not been announced.
Minimum Storage Duration
The final axis on which to compare S3 Glacier vs. S3 Glacier Deep Archive is on minimum storage duration. Because S3 Glacier is designed for long-term storage, AWS will charge you if you delete your data too quickly after storing it in Glacier.
For classic S3 Glacier, the minimum storage duration is 90 days. If you delete an object sooner than 90 days after placing it into S3 Glacier, you will be charged an early deletion fee of up to $0.012 per GB, prorated over the remaining days. For example, if you delete your object 45 days after placing it into S3 Glacier, you would pay half of the fee ($0.006 per GB).
For S3 Glacier Deep Archive, the minimum storage duration is 180 days. The pricing structure works similarly to S3 Glacier: you are charged a prorated fee if you delete your object before the minimum storage duration has passed.
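The proration can be written down as a small formula. The sketch below assumes the fee shrinks linearly by day, which matches the 45-day example above, and uses the classic S3 Glacier figures as defaults. The function name and parameters are illustrative, and no Deep Archive fee is included because this post does not quote one.

```python
def early_deletion_fee(size_gb, days_stored, full_fee_per_gb=0.012, minimum_days=90):
    """Estimate the early deletion fee in USD.

    Assumes linear, per-day proration: deleting on day 0 costs the full fee,
    and deleting on or after minimum_days costs nothing.
    """
    remaining_days = max(minimum_days - days_stored, 0)
    return size_gb * full_fee_per_gb * remaining_days / minimum_days


if __name__ == "__main__":
    # 1 GB deleted after 45 days: half of the $0.012 fee.
    print(f"${early_deletion_fee(1, 45):.4f}")
```

To model a different storage class, pass its own `full_fee_per_gb` and `minimum_days` once you have confirmed them from AWS pricing.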
S3 Glacier vs S3 Glacier Deep Archive Comparison Table
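| Storage class | Storage price (per month) | Retrieval speeds | Early deletion fee |
| Amazon S3 Glacier | $0.004/GB | Expedited: minutes; Standard: 3-5 hours; Bulk: 5-12 hours | Prorated if deleted within 90 days |
| Amazon S3 Glacier Deep Archive | $0.00099/GB | Standard: 12 hours; Bulk: 48 hours | Prorated if deleted within 180 days |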
Now that you understand the basic storage classes with S3 Glacier, let’s review how to get data into S3 Glacier.
Storing Data in S3 Glacier
There are two ways to get data into S3 Glacier. The first and most common way is to transition existing data from the standard S3 storage classes into an S3 Glacier storage class. The key here is that the data already exists first in S3 before being sent to Glacier.
At AWS re:Invent 2018, AWS announced support for S3 PUT to Glacier. This means you don’t need to place your archival data into S3 first before transitioning to an S3 Glacier storage class. Rather, you can insert your data directly into S3 Glacier.
Let’s review how to use each of these patterns to get data into S3 Glacier.
Transitioning Data from S3 into S3 Glacier
Many companies use S3 Glacier to store formerly ‘hot’ data that has gone ‘cold’. Hot data is data that is accessed frequently and/or needs to be available quickly. Cold data is data that is unlikely to be accessed often. One example of hot data that has gone cold is a month’s worth of weekly database backups. Within the first 30 days of the files being created, you may want them easily accessible for disaster recovery scenarios. After 30 days, they are unlikely to be used but are kept available for extreme circumstances.
The standard S3 storage classes are a good fit for hot data, as access times for these storage classes are sub-second. The S3 Glacier storage classes are a good fit for cold data, hence the name Glacier.
AWS has simplified the transition of hot data to cold data through the use of object lifecycle policies. Object lifecycle policies allow you to specify rules where your data will be transitioned from one storage class to another, or even deleted altogether, after a specified time period.
In our example, you could store the initial backups in S3 Standard and set an object lifecycle policy on them. The policy would then transition each backup from S3 Standard to S3 Glacier 30 days after it had been uploaded.
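If you prefer to set such a policy from code, the rule for the backup example might look like the following Python sketch using boto3. The bucket name, the `backups/` prefix, the rule ID, and the helper names are all hypothetical. The `boto3` import is deferred so that the rule builder can be inspected without AWS credentials.

```python
def glacier_transition_rule(prefix, days=30):
    """Build a lifecycle configuration that moves objects under prefix to Glacier."""
    return {
        "Rules": [
            {
                "ID": "backups-to-glacier",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_lifecycle(bucket, prefix, days=30):
    """Apply the rule to a bucket. Requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above works without boto3 installed

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=glacier_transition_rule(prefix, days),
    )


if __name__ == "__main__":
    # Prints the rule only; call apply_lifecycle("my-bucket", "backups/") to apply it.
    print(glacier_transition_rule("backups/"))
```

Note that `put_bucket_lifecycle_configuration` replaces the bucket's whole lifecycle configuration, so any existing rules need to be included in the same call.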
Configuring an Object Lifecycle Policy with CloudBerry
You can configure an object lifecycle policy to move objects from S3 Standard to S3 Glacier with CloudBerry Backup. To set up an object lifecycle policy, follow the instructions in this article on Lifecycle Policies in CloudBerry Backup.
Uploading Files Directly to S3 Glacier
Sometimes you may want to load data directly into cold storage like S3 Glacier without needing any time for fast, regular access to the data in a standard S3 storage class. An example here is data that is stored solely for compliance purposes and never needs to be displayed quickly to end users.
For the first few years of S3 Glacier’s existence, it was difficult to upload files directly to S3 Glacier. Because of this, the recommended way to quickly move files to S3 Glacier was to set an object lifecycle policy of 0 days for a particular bucket or prefix. Any data that was loaded into that bucket or prefix would immediately be transitioned to S3 Glacier according to your policy.
Starting from version 6.0, CloudBerry Backup supports direct upload to Glacier as well as S3 Intelligent-Tiering. For more information, please refer to the corresponding blog post.
At AWS re:Invent 2018, AWS announced S3 PUT to Glacier. This unifies the S3 experience such that you can upload files directly to S3 Glacier similar to how you upload files to standard S3 storage classes.
In the sections below, you can see how to upload an object directly to Glacier using:
- CloudBerry Explorer
- AWS Tools for PowerShell
- AWS CLI
Uploading Files Directly to S3 Glacier with CloudBerry Explorer
Uploading Files Directly to S3 Glacier with AWS Tools for PowerShell
You can also use the AWS Tools for PowerShell to upload directly to S3 Glacier from your command line.
To do so, use the following steps:
Be sure to set your own BucketName and File parameter to match the desired bucket and file you want.
Uploading Files Directly to S3 Glacier with the AWS CLI
Finally, you can use the AWS CLI to upload objects directly to S3 Glacier.
To use the AWS CLI to upload to S3 Glacier, use the following steps:
1. Make sure you have installed the AWS CLI.
2. Use the following command to upload a file to S3 Glacier:
aws s3 cp myfile.jpg s3://my-bucket --storage-class GLACIER
3. Change “myfile.jpg” to match the file you want to upload, and “my-bucket” to the name of the bucket you want to use.
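The same direct upload can be done from a script. Below is a minimal Python sketch that assumes boto3 is installed and credentials are configured; the helper name, file name, and bucket name are placeholders. The S3 client is passed in as an argument so the helper can be exercised with a stand-in object.

```python
def upload_to_glacier(s3_client, filename, bucket, key):
    """Upload a local file straight into the GLACIER storage class."""
    s3_client.upload_file(
        filename,
        bucket,
        key,
        ExtraArgs={"StorageClass": "GLACIER"},
    )


if __name__ == "__main__":
    import boto3  # requires boto3 and valid AWS credentials

    upload_to_glacier(boto3.client("s3"), "myfile.jpg", "my-bucket", "myfile.jpg")
```

As with the CLI command, the object lands in Glacier immediately, so it has to be restored before it can be read again.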
Now that we know how to place objects into Glacier, let’s review how to pull objects back out of Glacier.
Restoring Objects from S3 Glacier
When putting data in S3 Glacier, your hope is that you’ll never need to view that data again. However, sometimes you do need to use that data. This section covers how to get data back out of S3 Glacier.
To get objects out of S3 Glacier, you need to make a request to restore the object. Restoring an object makes a temporary copy of it available in S3, where it can be accessed immediately once the restore has finished.
When making a restore request from Glacier, there are a few things to consider:
- Restore speed: As mentioned in the Glacier Storage Classes section above, AWS provides three options on restore speed -- expedited, standard, and bulk. The right choice for you depends on your budget and how quickly you need the data.
- How long to keep the data: When restoring data from Glacier, you specify an amount of time to keep the restored data in S3. This helps you to save on cost if you only need the restored data for a short period. If you need the restored data for longer, copy the restored object to a permanent location in S3.
Further reading: Temporary Restore from Glacier with CloudBerry
- Cost and AWS Free Tier: The costs for restoring data will be affected by a few factors: the size of the data you’re restoring, the restore speed you use, and the retention time you specify. AWS does provide a Free Tier for S3 Glacier.
Further reading: Amazon Glacier Pricing Explained.
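The first two considerations above, restore speed and retention time, map directly onto the parameters of a restore request. The following Python sketch shows one way to build and send such a request with boto3's `restore_object` call. The helper names, bucket, and key are hypothetical, and the input checks are an illustrative addition rather than anything AWS requires of you.

```python
VALID_TIERS = ("Expedited", "Standard", "Bulk")


def build_restore_request(days, tier="Standard"):
    """Build the RestoreRequest payload: how long to keep the copy and how fast to restore."""
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {VALID_TIERS}")
    if days < 1:
        raise ValueError("days must be at least 1")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


def restore_from_glacier(s3_client, bucket, key, days, tier="Standard"):
    """Ask S3 to restore one archived object for the given number of days."""
    return s3_client.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest=build_restore_request(days, tier),
    )


if __name__ == "__main__":
    import boto3  # requires boto3 and valid AWS credentials

    restore_from_glacier(boto3.client("s3"), "my-bucket", "myfile.jpg", days=7, tier="Bulk")
```

The call returns as soon as the request is accepted. The object itself only becomes readable once the restore has finished.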
While the basic restore functionality has been a core of S3 Glacier since the beginning, AWS has been steadily adding features over time. Let’s check out two new restore features that were announced at AWS re:Invent 2018.
Upgrading S3 Glacier Restore Speed
As of November 2018, AWS allows you to upgrade the speed of a Glacier restore after you have started. This can be useful when your restore job is taking too long, and you want to speed up the process.
To upgrade the speed of an existing S3 Glacier restore request, you make a new restore request on the same object. The new restore request must use a faster restore speed than the existing restore -- you cannot downgrade your restore speed.
You may not change other details about your restore after it has been started, such as the number of days to retain your objects after they are restored.
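In code, an upgrade is just a second restore request with a faster speed and otherwise unchanged details. The Python sketch below guards against accidental downgrades before sending the request with boto3. It assumes your own code remembers the speed and retention days of the original request; the helper names are hypothetical.

```python
# Restore speeds ordered from slowest to fastest.
TIER_SPEED = {"Bulk": 0, "Standard": 1, "Expedited": 2}


def is_speed_upgrade(current_tier, new_tier):
    """Return True only if new_tier is strictly faster than current_tier."""
    return TIER_SPEED[new_tier] > TIER_SPEED[current_tier]


def upgrade_restore(s3_client, bucket, key, days, current_tier, new_tier):
    """Re-issue a restore request at a faster speed, keeping the same retention days."""
    if not is_speed_upgrade(current_tier, new_tier):
        raise ValueError(f"{new_tier} is not faster than {current_tier}")
    return s3_client.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={"Days": days, "GlacierJobParameters": {"Tier": new_tier}},
    )
```

A typical use would be calling `upgrade_restore(client, "my-bucket", "myfile.jpg", 7, "Bulk", "Expedited")` when a bulk restore turns out to be needed sooner than planned.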
Receiving S3 Glacier Restore Notifications
One of the downsides of using S3 Glacier is that restoring an object takes a variable amount of time. When you kick off a restore, it could be hours before your object is finally restored and ready to use. This can mean checking back regularly in S3 to see if your object exists.
In November 2018, AWS added the ability to receive S3 event notifications when your restore has finished. Rather than polling for the object to exist, you can receive a notification in an SQS queue, an SNS topic, or a Lambda function once your restore has completed. This lets you notify yourself when the restore has completed, or handle the event programmatically without any manual step.
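As a rough illustration, the notification can be wired to an SQS queue with a bucket notification configuration that subscribes to the `s3:ObjectRestore:Completed` event. The Python sketch below uses boto3. The queue ARN, bucket name, and helper names are placeholders, and it assumes the queue's access policy already allows S3 to send messages to it.

```python
def restore_completed_notification(queue_arn):
    """Build a notification configuration that fires when a restore completes."""
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": ["s3:ObjectRestore:Completed"],
            }
        ]
    }


def enable_restore_notifications(bucket, queue_arn):
    """Attach the configuration to a bucket. Requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above works without boto3 installed

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=restore_completed_notification(queue_arn),
    )


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    print(restore_completed_notification("arn:aws:sqs:us-east-1:123456789012:restore-events"))
```

Like lifecycle rules, this call replaces the bucket's existing notification configuration, so include any notifications you already rely on.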
In this post, we covered the powerful S3 Glacier service provided by Amazon. S3 Glacier is a reliable, cost-effective way to store low-usage data for long periods of time. We covered the storage options in S3 Glacier, how to add data to S3 Glacier, and how to restore data from S3 Glacier.
Like most AWS services, S3 Glacier is continually adding new features over time. In November 2018, we saw some powerful additions to S3 Glacier, including:
- An additional, cheaper storage class called S3 Glacier Deep Archive;
- The ability to easily send objects directly to Glacier without putting them in S3 first;
- The ability to upgrade the restore speed on an S3 Glacier restore request; and
- Functionality to receive S3 event notifications when your restore is complete.
These new additions make S3 Glacier an even stronger offering for long-term archival.