Amazon Glacier is a cloud service for storing archived data that is unlikely to be retrieved often; in other words, it is designed for infrequently accessed data. Glacier has high retrieval latency but offers low prices and a high level of safety for stored archives. In this article we explain the nuances of uploading data to Glacier.
Working with Glacier
Glacier is a cost-effective solution for long-term storage of important data that is rarely used. It is a good choice for a company that holds a lot of outdated electronic documentation and wants cheap but safe storage. Amazon does not push customers to store more or less there, though Glacier's optimal usage model assumes that archives are kept for a long time.
Glacier storage ensures high redundancy, as an archive is stored within multiple facilities at once. The archived data is secured with AES-256 encryption on the server side. Additional safety is ensured by Vault Lock policies.
The monthly storage price is fixed and ranges from $0.007 to $0.013 per GB, depending on the region. Retrieval is free for up to 5% of the average monthly storage volume; beyond this limit the user is charged a retrieval fee. Deleting data is free if it has been stored for more than 3 months; otherwise an early deletion fee applies. More details and a Glacier retrieval pricing calculator can be found here.
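To make these figures concrete, here is a minimal Python sketch of the arithmetic. It uses only the per-GB price range and the 5% free retrieval share quoted above; the function names and the 1,000 GB archive size are made up for illustration, and current prices may differ.

```python
# Figures quoted in this article; real prices depend on region and date.
PRICE_RANGE_PER_GB = (0.007, 0.013)  # USD per GB per month
FREE_RETRIEVAL_SHARE = 0.05          # 5% of average monthly storage


def monthly_storage_cost(stored_gb: float, price_per_gb: float) -> float:
    """Flat monthly storage charge for the given volume."""
    return stored_gb * price_per_gb


def free_retrieval_allowance(avg_stored_gb: float) -> float:
    """Volume in GB that can be retrieved per month without a retrieval fee."""
    return avg_stored_gb * FREE_RETRIEVAL_SHARE


if __name__ == "__main__":
    stored_gb = 1000  # hypothetical archive size
    low, high = (monthly_storage_cost(stored_gb, p) for p in PRICE_RANGE_PER_GB)
    print(f"Storage: ${low:.2f} to ${high:.2f} per month")
    print(f"Free retrieval: {free_retrieval_allowance(stored_gb):.0f} GB per month")
```

The sketch deliberately leaves out retrieval and early deletion fees, since their rates are not given here.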
Users have to set up jobs in order to download archives or vault inventories (lists of the archives in a vault). These jobs run in the background and usually take several hours to complete. For uploading, there are two options:
- Direct upload from the user's instance to Glacier.
- Archiving data stored in the user's S3 bucket into Glacier storage via lifecycle policies.
Let's explore both of them in detail.
Direct Upload to Glacier
There is no wizard in the AWS console for uploading archives to Glacier vaults. Users have to send requests through the Glacier REST API or use the AWS Software Development Kits (SDKs) in their own applications. Either way requires some coding; AWS provides SDKs with Glacier support for a number of programming languages.
This way of uploading is, therefore, most convenient for users with programming skills or for third-party providers who offer their own tools for Glacier storage management.
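As an illustration of how little code a single-operation upload needs, here is a hedged sketch using boto3, the AWS SDK for Python. The helper name, the vault name "my-vault" and the file name are hypothetical. The client is passed in as a parameter so that the helper can be tried out without AWS credentials.

```python
def upload_archive_single(client, vault_name: str, data: bytes, description: str = "") -> str:
    """Upload `data` as one archive and return the archive ID assigned by Glacier."""
    kwargs = {"vaultName": vault_name, "body": data}
    if description:
        kwargs["archiveDescription"] = description
    response = client.upload_archive(**kwargs)
    return response["archiveId"]


def main() -> None:
    # Imported here so the helper above can be used without boto3 installed.
    import boto3

    client = boto3.client("glacier")
    with open("backup.tar.gz", "rb") as f:  # hypothetical local file
        archive_id = upload_archive_single(client, "my-vault", f.read(), "yearly backup")
    # Keep this ID: it is needed later to retrieve or delete the archive.
    print(archive_id)
```

Calling `main()` requires configured AWS credentials and an existing vault.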
Amazon provides two alternative schemes of direct upload to Glacier:
- Upload in a single operation
- Upload in parts
The single-operation option is available for up to 4GB of data. Uploading in parts is recommended for archives larger than 100MB: the archive is split into parts of a user-specified size, and the parts can be transferred in parallel sessions. If a session fails, only that part is missing, so the user has to resend just that one part. No additional fees are charged for multipart upload.
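The choice between the two schemes, and the way an archive is cut into parts, can be sketched in Python as follows. The 100MB threshold comes from this article. The sketch also assumes the Glacier rules that a part size must be 1 MB multiplied by a power of two (up to 4 GB) and that one upload may contain at most 10,000 parts. All function names and the 8 MB default are illustrative.

```python
MB = 1024 * 1024
GB = 1024 * MB
MULTIPART_THRESHOLD = 100 * MB  # recommendation quoted in the article
MAX_PART_SIZE = 4 * GB          # assumed Glacier limit per part
MAX_PARTS = 10_000              # assumed Glacier limit on parts per upload


def plan_upload(archive_size: int) -> str:
    """Return 'multipart' above the recommended threshold, else 'single'."""
    return "multipart" if archive_size > MULTIPART_THRESHOLD else "single"


def choose_part_size(archive_size: int, preferred: int = 8 * MB) -> int:
    """Smallest power-of-two multiple of 1 MB that is at least `preferred`
    and keeps the number of parts within MAX_PARTS."""
    part_size = MB
    while part_size < preferred or part_size * MAX_PARTS < archive_size:
        part_size *= 2
    if part_size > MAX_PART_SIZE:
        raise ValueError("archive is too large for one multipart upload")
    return part_size


def part_ranges(archive_size: int, part_size: int):
    """Yield inclusive (first_byte, last_byte) ranges; only the last part may be shorter."""
    for start in range(0, archive_size, part_size):
        yield start, min(start + part_size, archive_size) - 1


if __name__ == "__main__":
    size = 250 * MB  # hypothetical archive
    print(plan_upload(size), choose_part_size(size))
    for first, last in part_ranges(size, 64 * MB):
        print(f"bytes {first}-{last}/*")
```

Each range corresponds to one part that can be sent, and resent on failure, independently of the others.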
Scheduled Upload to Glacier from S3
Data that is already in the AWS cloud can be moved to Glacier storage with the help of the lifecycle policy feature. If you do not urgently need some of the files stored in an S3 bucket, you can schedule their transfer to a less costly place; that is what these policies are for.
You can create a policy in the AWS console, on the Properties page of your S3 bucket. Just make sure that the Archive to the Glacier Storage Class checkbox is selected. After a new policy is created, your data will be transferred from S3 to Glacier after the specified time. It will not show up in your Glacier vaults, however; you can still see it in the S3 bucket. You have to restore the archive from Glacier before any other operations become available.
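The same kind of policy can also be created programmatically. The sketch below builds a lifecycle configuration in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The rule ID, the "docs/" prefix, the 30-day delay and the bucket name are hypothetical values chosen for illustration.

```python
def glacier_transition_rule(rule_id: str, prefix: str, days: int) -> dict:
    """Build a lifecycle configuration that moves objects under `prefix`
    to the Glacier storage class `days` days after creation."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_rule(bucket: str, config: dict) -> None:
    # Imported here so the builder above can be used without boto3 installed.
    import boto3

    s3 = boto3.client("s3")
    # Note: this call replaces the bucket's existing lifecycle configuration.
    s3.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=config)


if __name__ == "__main__":
    config = glacier_transition_rule("archive-old-docs", "docs/", 30)
    print(config)
    # apply_rule("my-bucket", config)  # requires AWS credentials and an existing bucket
```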
Scheduled upload is the best option when the user's data is already in S3. It is also more convenient for companies with a large flow of electronic documentation, because it lets an administrator automate the archiving of a large number of items. On the downside, this additional tier of storage results in extra storage fees plus a request fee for archiving to Glacier.
Both ways of transferring data to Glacier storage have certain pros and cons. Let us summarize their differences to make the comparison easier.
| | Direct Upload | Archiving from S3 |
|---|---|---|
| Time consumption | Multipart upload allows faster archiving | Scheduled archiving jobs automate the process and save time |
| Fees that apply | No additional fees for multipart upload | Extra S3 storage fees plus a request fee for archiving to Glacier |
| Preconditions | An interface must be set up programmatically in order to send upload requests to AWS | Data must be stored in S3 in order to be transferred to Glacier |
| Visibility | Archives are visible in the Glacier control panel | Archives are not visible on the Glacier side and must be managed via the S3 control panel |
CloudBerry Backup supports Amazon Glacier, and you can use it to upload data directly to your Glacier storage. It is also possible to create and manage lifecycle policies and to transfer archives to Glacier directly from the CloudBerry Backup user interface.