
NASA PDS SBN’s AWS Glacier Storage Costs Analysis

Explore in-depth cost analysis for uploading, storing, and retrieving data on AWS Glacier. Learn cost estimates, best practices, and case studies for optimizing storage costs effectively.


Presentation Transcript


  1. NASA PDS SBN’s AWS Glacier Storage Costs Analysis • P. Lawton (1), V. Kannan (1), J. Bauer (1), and J. Padams (2) • (1) University of Maryland, (2) Jet Propulsion Laboratory • 27 March 2019

  2. What We Looked At • Costs to upload SBN holdings directly to AWS Glacier • Costs to store SBN holdings in AWS Glacier • Costs to obtain SBN holdings directly from AWS Glacier

  3. Why look at this? • Holdings are getting larger. • Required to have 3 copies. • Required for one copy to be out of area. • Would this be better than mirroring?

  4. Cost Estimates • Prepared a spreadsheet to estimate costs and to determine whether any of them will surprise us • Works for: • A sample set of files (requires size and quantity) • A specific list of files (size required) • Have done several interactions, but the estimates are not in total agreement with AWS Cost Explorer • Why not? Agreement has improved, but we are still looking into it • Note: Based on US East 2 (Ohio) rates; taxes are not included

  5. Upload Costs - 1 • From the AWS Glacier Pricing web site • “All data transfer into $0.00 per GB” • The requests (at least one per file) are not free • Files larger than 4 GB must be uploaded in multiple parts • Tried to upload an empty file – failed with error message • Usage Report has an entry for an UploadArchive (the request) with Value 0 • Not showing up in ArchiveCount • However, Cost Explorer shows non-zero amount for UploadArchive.

  6. Costs - Upload - Sample • Corrected estimate no longer counts Initiate and Complete as requests • User errors for counts and GB? • Discovered 6723 Bytes-Out related to uploads in the Usage Report • Not enough to explain the $0.00000056 difference

  7. Be Aware - Uploads • The upload of the bytes is free BUT the request(s) to do the upload are not. • A file larger than 4 GB must be uploaded as multiple parts => more requests => more cost. • AWS recommends uploading any file larger than 100 MB as multiple parts. • For our tests, 4 GB worked fine. • Part size must be a multiple of 1 MB, except for the last part, which is whatever bytes remain. • SHA-256 tree hash used. • Multipart uploads must be ‘initiated’ and ‘completed’. • Uploads are ‘Standard Tier’ only. • Found: There are Bytes-Out (cost).
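The request counting and the tree hash mentioned on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the spreadsheet used in the talk: the function names `upload_request_count` and `tree_hash` are hypothetical, sizes are taken as binary units (1 MB = 2^20 bytes), and the sketch assumes a file that fits in one part goes up as a single request. The tree hash follows the scheme AWS documents for Glacier: SHA-256 over each 1 MB chunk, then pairwise hashing of the digests until one remains.

```python
import hashlib
import math

MIB = 2**20
GIB = 2**30
MAX_PART_SIZE = 4 * GIB  # largest part (and largest single-request upload)


def upload_request_count(file_size, part_size=MAX_PART_SIZE,
                         count_initiate_and_complete=False):
    """Estimate the number of billable upload requests for one file.

    Assumption: a file no larger than part_size is sent as one request.
    The slides' corrected estimate does not count Initiate and Complete,
    so that is the default here.
    """
    if file_size <= 0:
        raise ValueError("empty files could not be uploaded in the tests")
    if part_size <= 0 or part_size > MAX_PART_SIZE or part_size % MIB:
        raise ValueError("part size must be a multiple of 1 MB, at most 4 GB")
    if file_size <= part_size:
        return 1
    parts = math.ceil(file_size / part_size)  # last part holds the remainder
    return parts + 2 if count_initiate_and_complete else parts


def tree_hash(data: bytes) -> str:
    """SHA-256 tree hash over 1 MB chunks, returned as a hex string."""
    chunks = [data[i:i + MIB] for i in range(0, len(data), MIB)] or [b""]
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 2:
                next_level.append(hashlib.sha256(pair[0] + pair[1]).digest())
            else:
                next_level.append(pair[0])  # odd digest is carried up unchanged
        level = next_level
    return level[0].hex()
```

For example, `upload_request_count(10 * GIB)` gives 3 parts (4 GB + 4 GB + 2 GB). Multiplying the count by the per-request upload price for the chosen region gives the upload cost; that price is not quoted in these slides, so it is left out of the sketch.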

  8. Costs – Storage - Sample • 1636.63 GB • 3037 files • 1636.51 GB plus overhead • Files smaller than 40 KB, treated as 40 KB • 664 files • 32 KB per file • Cost Explorer calculation dependent on number of days in current month

  9. Be Aware - Storage • Files smaller than 40 KB are treated as 40 KB • Recommendation (from AWS) – collect files into larger files • for example: zip, tar • Every file has an overhead of 32 KB • Charged for minimum of 90 days of storage (regardless of when deleted) • Groups in units referred to as ‘vaults’ • You create • You control which vault a file is loaded to • Maximum limit of 40 TB per vault • Suggested - Keep your own inventory also
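The storage rules above (40 KB minimum, 32 KB overhead per file, 90-day minimum, and Cost Explorer's dependence on the number of days in the month) can be combined into a small estimator. This is a sketch under stated assumptions, not the authors' spreadsheet: the function names are hypothetical, the overhead is assumed to be added on top of the 40 KB minimum, 1 GB is taken as 2^30 bytes, and usage is assumed to be prorated by days stored over days in the month. Multiply the result by the region's storage rate per GB-month, which is not quoted in these slides.

```python
import calendar

KIB = 1024
GIB = 2**30
MIN_BILLED_BYTES = 40 * KIB   # files smaller than 40 KB are treated as 40 KB
OVERHEAD_BYTES = 32 * KIB     # every file carries 32 KB of overhead
MIN_BILLED_DAYS = 90          # charged for at least 90 days of storage


def billed_bytes(file_sizes):
    """Total billable bytes for a list of file sizes (in bytes).

    Assumption: the 32 KB overhead is added after applying the 40 KB minimum.
    """
    return sum(max(size, MIN_BILLED_BYTES) + OVERHEAD_BYTES for size in file_sizes)


def gb_months(file_sizes, days_stored, year, month):
    """Prorated GB-months for one calendar month (assumed proration rule)."""
    days_in_month = calendar.monthrange(year, month)[1]
    return billed_bytes(file_sizes) / GIB * days_stored / days_in_month


def billed_days(days_kept):
    """Days charged for a file deleted after days_kept days."""
    return max(days_kept, MIN_BILLED_DAYS)
```

Because every file carries a fixed overhead and a minimum size, the same bytes cost less when packed into fewer, larger files, which is the reason behind the AWS recommendation to zip or tar small files.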

  10. Different Tiers • Glacier Tiers only appear to matter for obtaining the data • all other interactions appear to be Standard Tier • Standard (also known as Tier 1) • Retrieval in 3-5 hours • $0.01/GB + $0.05/1000 requests* • Bulk (also known as Tier 2) • Retrieval in 5-12 hours • $0.0025/GB + $0.025/1000 requests* • Expedited (also known as Tier 3) • Retrieval in 1-5 minutes • $0.03/GB + $10.00/1000 requests* • * US East (Ohio) pricing; different regions have different pricing
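The tier rates above translate directly into a retrieval cost estimate. The sketch below uses only the US East (Ohio) figures quoted on this slide; the function name `retrieval_cost` is hypothetical, one retrieval request per file is an assumption, and data transfer out of AWS, free monthly allowances, and taxes are deliberately left out.

```python
# (price per GB retrieved, price per 1000 requests), US East (Ohio), from the slide
RETRIEVAL_RATES = {
    "standard": (0.01, 0.05),
    "bulk": (0.0025, 0.025),
    "expedited": (0.03, 10.00),
}


def retrieval_cost(tier, total_gb, request_count):
    """Estimated retrieval charge in USD for one tier.

    Covers only the per-GB and per-request retrieval fees; download
    (data transfer out) charges are not included.
    """
    per_gb, per_1000_requests = RETRIEVAL_RATES[tier]
    return total_gb * per_gb + request_count / 1000 * per_1000_requests


if __name__ == "__main__":
    # 25,000 files of 4 GB each, assuming one request per file
    for tier in RETRIEVAL_RATES:
        print(f"{tier:9s} ${retrieval_cost(tier, 25_000 * 4, 25_000):,.2f}")
```

For large files the per-GB fee dominates in the Standard and Bulk tiers, while in the Expedited tier the per-request fee also becomes noticeable ($10.00 per 1000 requests).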

  11. Costs – Retrieval & Download - Estimates • If retrieved and downloaded all those files • Notification cost difficult to separate out – JPL IT Security activities

  12. Costs – Retrieval & Download – Small Sample • Obtained 4 files • 1 via Standard Retrieval • 3 via Bulk retrieval • ~0.87 GB * differences probably related to JPL IT Security activities

  13. Sample case of 100 TB • For 100 TB assuming 25,000 files that are 4 GB each * 1 GB/month free ** 10 GB/month free

  14. Be Aware - Obtaining • Obtaining the files is a multi-step process • If wanted, configure notification (only need to do once) • Submit retrieve request • Wait (how long depends on tier selected) • this is where the notification is helpful • have an automated process set up to check, or just wait until the end of the expected time period • Copy file • File is only available for a time period (~24 hours) after it is ready. • If not obtained by then, re-request. • Different tiers • Expedited 1-5 minutes $$$$ • Standard 3-5 hours $$ • Bulk 5-12 hours $
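The request, wait, and copy steps above can be sketched as a polling loop. The talk does not say which tooling was used for retrievals, so this is an assumption-laden illustration: it expects a boto3 Glacier client (or any object exposing the same `initiate_job`, `describe_job`, and `get_job_output` calls), and the function name `fetch_archive` and its parameters are hypothetical. Notification setup is omitted; the loop simply checks the job status at intervals.

```python
import time

CHUNK_BYTES = 2**20  # copy the retrieved file in 1 MB pieces


def fetch_archive(client, vault_name, archive_id, dest_path,
                  tier="Bulk", poll_seconds=900, sleep=time.sleep):
    """Submit a retrieval request, wait until it is ready, then copy the file.

    client is assumed to behave like boto3.client("glacier").
    tier is one of "Expedited", "Standard", or "Bulk".
    """
    # Step 1: submit the retrieve request
    job = client.initiate_job(
        vaultName=vault_name,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": tier,
        },
    )
    job_id = job["jobId"]

    # Step 2: wait; how long depends on the tier selected
    while True:
        status = client.describe_job(vaultName=vault_name, jobId=job_id)
        if status["Completed"]:
            break
        sleep(poll_seconds)
    if status.get("StatusCode") == "Failed":
        raise RuntimeError(f"retrieval job {job_id} failed")

    # Step 3: copy the file while it is still available (~24 hours)
    body = client.get_job_output(vaultName=vault_name, jobId=job_id)["body"]
    with open(dest_path, "wb") as out:
        for chunk in iter(lambda: body.read(CHUNK_BYTES), b""):
            out.write(chunk)
    return job_id
```

With boto3 installed and credentials configured, a call might look like `fetch_archive(boto3.client("glacier"), "my-vault", archive_id, "file.dat", tier="Standard")`, where the vault name and output path are placeholders. Each status check is itself a request, so a long polling interval (or a notification) keeps the request count down.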

  15. Be Aware – Other • Can run an inventory • Standard tier only • Pay for a standard request and the bytes in the inventory file • Delay • Can be at least 24 hours between when a file is stored to Glacier and when it will show up on the inventory • Likewise for the cost information • IT Security activities • AWS recommending S3 Glacier • More transfers – and fees – involved • There are other services available – for fees

  16. Integrity Checks • High level summary • There is not a full checksum verification every year. • Details are governed by Non-Disclosure Agreement.

  17. Pluses, Neutrals, and Minuses • Uploading and storing data are cheaper than retrieving data • Storage – cheaper than new server and disk space (plus) • Do not have to order more disk space every N years (plus) • If have to do full recovery – will not be cheap (neutral) • Depending on issue, may need to stand up entirely new server – delay (minus) • Could stand up mirror on AWS, but cost control an issue (vault owner responsible) (minus) • Limiting access is possible, but why punish a graduate student for the clueless user’s request gone wrong?

  18. Tidbits • AWS has multiple ‘data regions’ available • Use one at least 50 miles from your location • Use one not on the same weather path (i.e., hurricane track) or fault line • Different regions have different costs!!! • https://aws.amazon.com/glacier/pricing/ • AWS term ‘archive’ – our term ‘file’ • 1 GB = 2^30 bytes • In Cost Explorer, the storage costs factor in the number of days in the current month • Archive_Ids and multipart Upload_ids can have ‘-’ as a character. • If it is the first character, a different syntax is needed than the one described on the AWS CLI pages

  19. Acronyms & Acknowledgements • AWS – Amazon Web Services • CLI – Command Line Interface • GB – gigabyte • IT – Information Technology • KB – kilobyte • JPL – Jet Propulsion Laboratory • MB – megabyte • NASA – National Aeronautics and Space Administration • PDS – Planetary Data System • SBN – Small Bodies Node • TB – terabyte • Thank you to the PDS Engineering Node for use of their AWS account.
