NASA PDS SBN’s AWS Glacier Storage Costs Analysis • P. Lawton (1), V. Kannan (1), J. Bauer (1) and J. Padams (2) • (1) University of Maryland • (2) Jet Propulsion Laboratory • 27 March 2019
What We Looked At • Costs to upload SBN holdings directly to AWS Glacier • Costs to store SBN holdings in AWS Glacier • Costs to obtain SBN holdings directly from AWS Glacier
Why look at this? • Holdings are getting larger. • Required to have 3 copies. • Required for one copy to be out of area. • Would this be better than mirroring?
Cost Estimates • Prepared a spreadsheet to estimate costs – and determine if there are any that will surprise us • Works for • A sample set of files (requires size and quantity) • A specific list of files (size required) • Have done several iterations, but the estimates are not yet in total agreement with AWS Cost Explorer • WHY NOT? Agreement has improved, but still looking into it • Note: Based on US East 2 (Ohio) rates; taxes are not included
Upload Costs - 1 • From the AWS Glacier Pricing web site • “All data transfer into $0.00 per GB” • The requests (at least one per file) are not free • Files larger than 4 GB must be uploaded in multiple parts • Tried to upload an empty file – failed with error message • Usage Report has an entry for an UploadArchive (the request) with Value 0 • Not showing up in ArchiveCount • However, Cost Explorer shows non-zero amount for UploadArchive.
Costs - Upload - Sample • The corrected estimate no longer counts Initiate and Complete as requests • User errors for counts and GB? • Discovered 6723 Bytes-Out related to uploads in the Usage Report • Not enough to explain the $0.00000056 difference
Be Aware - Uploads • The upload of the bytes is free BUT the request(s) to do the upload are not. • If a file is larger than 4 GB, it must be uploaded as multiple parts => more requests => more cost. • AWS recommends uploading any file larger than 100 MB as multiple parts. • For our tests, 4 GB worked fine. • Part size must be 1 MB times a power of 2 (up to 4 GB), except for the last part, which is whatever bytes are remaining. • A SHA-256 tree hash is used. • Multipart uploads must be ‘initiated’ and ‘completed’. • Uploads are ‘Standard Tier’ only. • Found: There are Bytes-Out (cost).
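The part and hash rules above can be sketched in a few lines of Python. This is an illustration only, not the spreadsheet or scripts used for the analysis: the function names `plan_parts` and `tree_hash` are hypothetical, the part-size check follows the AWS documentation (1 MB times a power of 2, up to 4 GB), and the tree hash follows the published scheme of hashing 1 MB chunks with SHA-256 and combining the digests pairwise.

```python
import hashlib

MIB = 1024 * 1024  # 1 MB in the 2^20 sense used by Glacier


def plan_parts(file_size, part_size=4 * 1024 * MIB):
    """Return the list of part sizes (bytes) for a multipart upload.

    Hypothetical helper: one upload request is assumed per part.
    """
    units = part_size // MIB
    if part_size % MIB or not 1 <= units <= 4096 or units & (units - 1):
        raise ValueError("part size must be 1 MB times a power of 2, up to 4 GB")
    if file_size <= 0:
        raise ValueError("empty files cannot be uploaded")
    full, rest = divmod(file_size, part_size)
    # Every part is full-sized except possibly the last one.
    return [part_size] * full + ([rest] if rest else [])


def tree_hash(data):
    """SHA-256 tree hash of a bytes object, returned as a hex string."""
    # Leaf level: one digest per 1 MB chunk (a single digest for empty input).
    level = [hashlib.sha256(data[i:i + MIB]).digest()
             for i in range(0, len(data), MIB)] or [hashlib.sha256(b"").digest()]
    # Combine digests pairwise; an unpaired last digest moves up unchanged.
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 2:
                nxt.append(hashlib.sha256(pair[0] + pair[1]).digest())
            else:
                nxt.append(pair[0])
        level = nxt
    return level[0].hex()
```

For example, `len(plan_parts(10 * 1024 * MIB))` gives 3 parts for a 10 GB file at the default 4 GB part size, so at least three upload requests. Whether the initiate and complete calls are billed as requests is a separate question that the sketch does not settle.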
Costs – Storage - Sample • 1636.63 GB • 3037 files • 1636.51 GB plus overhead • Files smaller than 40 KB, treated as 40 KB • 664 files • 32 KB per file • Cost Explorer calculation dependent on number of days in current month
Be Aware - Storage • Files smaller than 40 KB are treated as 40 KB • Recommendation (from AWS) – collect files into larger files • for example: zip, tar • Every file has an overhead of 32 KB • Charged for minimum of 90 days of storage (regardless of when deleted) • Files are grouped in units referred to as ‘vaults’ • You create them • You control which vault a file is loaded to • Maximum size of 40 TB per archive (file) • Suggested - Keep your own inventory also
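A short Python sketch shows why bundling small files matters under these rules. It assumes that the 32 KB overhead is added on top of the 40 KB minimum; the slides do not say how the two interact, so treat that as an assumption. The names `billable_bytes`, `MIN_BILLABLE` and `OVERHEAD` are hypothetical.

```python
KB = 1024
GB = 2 ** 30

MIN_BILLABLE = 40 * KB  # files smaller than this are treated as 40 KB
OVERHEAD = 32 * KB      # per-file overhead


def billable_bytes(sizes):
    """Estimated billable bytes for a list of file sizes (in bytes).

    Assumption: overhead is charged in addition to the padded size.
    """
    return sum(max(size, MIN_BILLABLE) + OVERHEAD for size in sizes)


# 1000 files of 1 KB each, stored separately versus as one bundle
# (the bundle's own container overhead, e.g. tar headers, is ignored).
separate = billable_bytes([1 * KB] * 1000)
bundled = billable_bytes([1000 * KB])
print(separate // KB, "KB billed separately;", bundled // KB, "KB billed bundled")
```

Under the same assumption, the per-file overhead for the 3037-file sample is `3037 * OVERHEAD / GB`, or about 0.09 GB, which is in line with the gap between 1636.51 GB and 1636.63 GB once the 664 small files are padded.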
Different Tiers • Glacier Tiers only appear to matter for obtaining the data • all other interactions appear to be Standard Tier • Standard (also known as Tier 1) • Retrieval in 3-5 hours • $0.01/GB + $0.05/1000 requests* • Bulk (also known as Tier 2) • Retrieval in 5-12 hours • $0.0025/GB + $0.025/1000 requests* • Expedited (also known as Tier 3) • Retrieval in 1-5 minutes • $0.03/GB + $10.00/1000 requests* * US East (Ohio) pricing; different regions have different pricing
Costs – Retrieval & Download - Estimates • If retrieved and downloaded all those files • Notification cost difficult to separate out – JPL IT Security activities
Costs – Retrieval & Download – Small Sample • Obtained 4 files • 1 via Standard Retrieval • 3 via Bulk retrieval • ~0.87 GB * differences probably related to JPL IT Security activities
Sample case of 100 TB • For 100 TB assuming 25,000 files that are 4 GB each * 1 GB/month free ** 10 GB/month free
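The retrieval portion of this sample case can be reproduced from the per-tier rates on the Different Tiers slide. The sketch below is a rough estimate only: it assumes one retrieval request per file, covers retrieval fees but not download (data transfer out) charges, and does not model the free monthly allowances in the footnotes. The names `RATES` and `retrieval_cost` are hypothetical.

```python
# (USD per GB retrieved, USD per 1000 requests), US East (Ohio) rates
RATES = {
    "standard": (0.01, 0.05),
    "bulk": (0.0025, 0.025),
    "expedited": (0.03, 10.00),
}


def retrieval_cost(tier, total_gb, n_requests):
    """Estimated retrieval fee in USD for one tier (no download charges)."""
    per_gb, per_1000_requests = RATES[tier]
    return total_gb * per_gb + n_requests / 1000 * per_1000_requests


# Sample case: 25,000 files of 4 GB each, one request per file (assumed).
files, gb_per_file = 25_000, 4
for tier in RATES:
    cost = retrieval_cost(tier, files * gb_per_file, files)
    print(f"{tier:>9}: ${cost:,.2f}")
```

With these inputs the retrieval fee comes to roughly $1,001 for Standard, $251 for Bulk and $3,250 for Expedited. The per-GB charge dominates except in the Expedited tier, where the request fee alone adds $250.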
Be Aware - Obtaining • Obtaining the files is a multi-step process • If wanted, configure notification (only need to do once) • Submit retrieve request • Wait (how long depends on tier selected) • where the notification is helpful • have an automated process set up to check, or just wait until the end of the expected time period • Copy file • File is only available for a time period (~24 hours) after it is ready. • If not obtained by then, re-request. • Different tiers • Expedited 1-5 minutes $$$$ • Standard 3-5 hours $$ • Bulk 5-12 hours $
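The submit, wait and copy steps can be sketched as a polling loop in Python. This is not the process used for the tests described here; it is a minimal illustration that polls instead of relying on a notification. The function name `retrieve_archive` and its parameters are hypothetical. The `client` argument is assumed to behave like the boto3 Glacier client (`boto3.client("glacier")`), whose `initiate_job`, `describe_job` and `get_job_output` calls are used with their documented keyword arguments.

```python
import time


def retrieve_archive(client, vault, archive_id, tier="Bulk",
                     poll_seconds=900, max_polls=100, sleep=time.sleep):
    """Request an archive, wait until it is ready, and return its bytes."""
    # Step 1: submit the retrieve request for the chosen tier.
    job = client.initiate_job(
        vaultName=vault,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": tier,  # "Expedited", "Standard" or "Bulk"
        },
    )
    job_id = job["jobId"]

    # Step 2: wait; how long depends on the tier selected.
    for _ in range(max_polls):
        status = client.describe_job(vaultName=vault, jobId=job_id)
        if status["Completed"]:
            if status.get("StatusCode") == "Failed":
                raise RuntimeError(f"retrieval job {job_id} failed")
            break
        sleep(poll_seconds)
    else:
        raise TimeoutError(f"retrieval job {job_id} not ready after {max_polls} polls")

    # Step 3: copy the file while it is still available (about 24 hours).
    return client.get_job_output(vaultName=vault, jobId=job_id)["body"].read()
```

Reading the whole body into memory is only reasonable for small files; for multi-gigabyte archives the body would need to be streamed to disk in chunks. Each status check is itself an API call, so a long poll interval is the safer default.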
Be Aware – Other • Can run an inventory • Standard tier only • Pay for a standard request and the bytes in the inventory file • Delay • Can be at least 24 hours between when a file is stored to Glacier and when it will show up on the inventory • Likewise for the cost information • IT Security activities • AWS recommending S3 Glacier • More transfers – and fees – involved • There are other services available – for fees
Integrity Checks • High level summary • There is not a full checksum verification every year. • Details are governed by Non-Disclosure Agreement.
Pluses, Neutrals, and Minuses • Uploading and storing data are cheaper than retrieving data • Storage – cheaper than new server and disk space (plus) • Do not have to order more disk space every N years (plus) • If have to do full recovery – will not be cheap (neutral) • Depending on issue, may need to stand up entirely new server – delay (minus) • Could stand up mirror on AWS, but cost control is an issue (vault owner responsible) (minus) • Limiting access is possible, but why punish a graduate student for the clueless user’s request gone wrong?
Tidbits • AWS has multiple ‘data regions’ available • Use one at least 50 miles from your location • Use one not on the same weather path (i.e., hurricane track) or fault line • Different regions have different costs!!! • https://aws.amazon.com/glacier/pricing/ • AWS term ‘archive’ – our term ‘file’ • 1 GB = 2^30 bytes • In Cost Explorer, the storage costs factor in the number of days in the current month • Archive_Ids and multipart Upload_ids can have ‘-’ as a character. • If it is the first character, need different syntax than that described on AWS CLI pages
Acronyms & Acknowledgements • AWS – Amazon Web Services • CLI – Command Line Interface • GB – gigabyte • IT – Information Technology • JPL – Jet Propulsion Laboratory • KB – kilobyte • MB – megabyte • NASA – National Aeronautics and Space Administration • PDS – Planetary Data System • SBN – Small Bodies Node • TB – terabyte • Thank you to the PDS Engineering Node for use of their AWS account.