
Cloud Computing Storage Architecture and Costs Comparison for PDS

Cloud Computing Storage Architecture and Costs Comparison for PDS. November 08, 2018 MC F2F in Houston. Action Item. From August MC, use the storage and distribution metrics to evaluate potential cloud architecture tradeoffs and storage costs for PDS. Approach.

gallegosk




Presentation Transcript


  1. Cloud Computing Storage Architecture and Costs Comparison for PDS November 08, 2018 MC F2F in Houston

  2. Action Item • From August MC, use the storage and distribution metrics to evaluate potential cloud architecture tradeoffs and storage costs for PDS.

  3. Approach • Focused only on storage of archival data for this exercise • Many options exist for computing close to the data, with scalability benefits that can be explored • Use the PDS data sizing exercise and unfiltered web metrics to determine costs • Data sizing estimates come from the August query to nodes and are used to cost storage infrastructure • Web metrics come from August web logs and are used to cost download/egress charges • Use the Amazon Web Services (AWS) cost model as the cloud hosting organization, given NASA’s AWS investment across the agency as well as that of other organizations • Use AWS published monthly rates for S3 (storage service) and Glacier • Note: Different cost alternatives exist for paying AWS costs down (e.g., prepaying) • Cost one copy of data in AWS with a secondary copy on local hardware • Either copy could be primary or backup, based on requirements and cost tradeoffs

  4. Relevant PDS Level 1/2/3 Requirements • 2.7 PDS will provide appropriate storage for its archive. • 3.2.1 PDS will provide online mechanisms allowing users to download portions of the archive • 4.1.4 PDS will develop and implement a disaster recovery plan for the archive • 4.1.5 PDS will meet U.S. federal regulations for preservation and management of the data through its Memorandum of Understanding (MOU) with the National Space Science Data Center (NSSDC)

  5. PDS-related Policies • PDS Policy on System Availability and Recovery (2008-08-29) • Recover from data loss from the secondary repository within one week • Recover from a catastrophic event within one month • PDS Policy on Online Data Repositories (2008-08-29) • All data will be held online in a primary repository • PDS Policy on Data Delivery and Backup (2005-10-07) • Three copies of the “volume” are preserved; two copies within PDS and one at NSSDC

  6. Three copy rule • 1. Operational Copies • Primary Storage – Online, accessible for data distribution • Secondary Storage – Accessible to rebuild the primary repository and/or switch over as a mirror • 2. Deep Archive • PDS will meet U.S. federal regulations for preservation and management of the data through its Memorandum of Understanding (MOU) with the National Space Science Data Center (NSSDC) • PDS assumes that it can recover its entire holdings from one of the operational copies

  7. Architectural Considerations The PDS4 architecture decouples a “storage service” from a registry service to allow storage to be independent. This gives PDS tremendous flexibility to meet its requirements and policies using different storage architectures (local, hybrid, commercial cloud, PDS cloud, etc)…

  8. Amazon Web Services • Enormous ecosystem of capabilities • Storage – Simple Storage Service (S3), Glacier, etc. • EC2 – Elastic Compute • Significant support for running databases and ML applications • Ability to spin up virtual machines • This cost model looked at storage models and costs for using S3, where PDS data would be in the cloud and applications would be hosted locally.

  9. Architectural Models

  10. Benefits of AWS S3 • Built-in REST access to any file for download • Can link in authentication • Versioning of files • 99.99% uptime • Encrypted transfers • Management of security buckets • Link to compute services either co-located (EC2/AWS Workspaces, etc.) or remote • Co-location can decrease egress and increase scalability • Custom PDS tools and services for operating on the data

  11. AWS Rates • Storage costs are generally broken into amount of storage plus egress (out of Amazon) • Writing to AWS does not carry a cost • Glacier is lower cost but is slower to access and has high costs for retrieval
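The rate structure on this slide (a storage charge plus an egress charge, with no charge for writes) can be sketched as a small cost function. This is a minimal illustration, not the cost model used in the study: the function name is hypothetical, and the two rates are placeholders chosen only to make the arithmetic visible, not AWS published prices. The volumes are the August figures reported on slides 12 and 13.

```python
GB_PER_TB = 1000  # decimal units, as storage is typically billed per GB


def monthly_cost(storage_tb, egress_tb, storage_rate_per_gb, egress_rate_per_gb):
    """Return (storage_cost, egress_cost, total) in dollars for one month.

    Writes into the cloud are assumed to be free, per the slide.
    """
    storage_cost = storage_tb * GB_PER_TB * storage_rate_per_gb
    egress_cost = egress_tb * GB_PER_TB * egress_rate_per_gb
    return storage_cost, egress_cost, storage_cost + egress_cost


# Placeholder rates for illustration only; substitute the current published rates.
EXAMPLE_STORAGE_RATE = 0.02  # dollars per GB-month (assumption)
EXAMPLE_EGRESS_RATE = 0.05   # dollars per GB transferred out (assumption)

# About 1.58 PB stored and about 79 TB distributed in August (slides 12 and 13).
storage, egress, total = monthly_cost(
    1580, 79, EXAMPLE_STORAGE_RATE, EXAMPLE_EGRESS_RATE
)
print(f"storage ${storage:,.0f}  egress ${egress:,.0f}  total ${total:,.0f}")
```

Keeping the rates as parameters makes it straightforward to rerun the estimate per node or with prepaid pricing.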

  12. Node Data Volume • In August, total PDS storage across the nodes was about 1.58 PB.
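The recovery policy on slide 5 (one week from the secondary repository, one month after a catastrophic event) implies a sustained transfer rate for a full restore of this volume. The sketch below does that arithmetic under simplifying assumptions: decimal units (1 TB = 10^12 bytes), a continuous transfer with no protocol overhead, and a 30-day month. The function name is hypothetical.

```python
def required_gbps(volume_tb, days):
    """Sustained rate in gigabits per second needed to move volume_tb in `days`."""
    bits = volume_tb * 1e12 * 8   # decimal terabytes to bits
    seconds = days * 86400        # days to seconds
    return bits / seconds / 1e9   # bits per second to Gbit/s


for label, days in (("one week", 7), ("one month", 30)):
    print(f"Restore 1.58 PB in {label}: {required_gbps(1580, days):.1f} Gbit/s")
```

Real restores would be slower than this idealized figure, so the result is best read as a lower bound on the bandwidth a recovery plan needs.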

  13. Node Monthly Data Distribution • In August, PDS data distribution (egress) was approximately 79 TB.

  14. Projected Monthly S3 Costs for ~1.5 PB of PDS data

  15. Calculated Storage Costs

  16. Findings • Imaging and then GEO drive the PDS storage costs • Both storage and egress • Most other nodes do not have the same cost drivers • Some savings can be achieved through Glacier if data sets that are rarely (if ever) accessed can be identified • Different architectural models could be adopted that would affect costs (storage and egress) • Primary vs. secondary costs • Putting a subset of PDS data in the cloud
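The Glacier finding can be explored by splitting the holdings into a frequently accessed portion and a rarely accessed portion and pricing each at its own rate. The sketch below is an assumption-laden illustration: the function names are hypothetical, the rates are placeholders rather than AWS published prices, and retrieval charges for the cold tier are deliberately ignored even though the slides note they are high.

```python
GB_PER_TB = 1000


def tiered_storage_cost(total_tb, cold_fraction, hot_rate_per_gb, cold_rate_per_gb):
    """Monthly storage cost when `cold_fraction` of the data sits in a cold tier."""
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    total_gb = total_tb * GB_PER_TB
    cold_gb = total_gb * cold_fraction
    hot_gb = total_gb - cold_gb
    return hot_gb * hot_rate_per_gb + cold_gb * cold_rate_per_gb


def monthly_savings(total_tb, cold_fraction, hot_rate_per_gb, cold_rate_per_gb):
    """Savings relative to keeping everything in the hot tier."""
    all_hot = tiered_storage_cost(total_tb, 0.0, hot_rate_per_gb, cold_rate_per_gb)
    tiered = tiered_storage_cost(
        total_tb, cold_fraction, hot_rate_per_gb, cold_rate_per_gb
    )
    return all_hot - tiered


# Placeholder rates (assumptions), with 30% of 1.58 PB classified as rarely accessed.
print(monthly_savings(1580, 0.30, hot_rate_per_gb=0.02, cold_rate_per_gb=0.005))
```

Because retrieval is excluded, the savings shown are an upper bound; any data set that is later pulled back from the cold tier would reduce them.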

  17. Other Considerations • PDS storage growth seems to be averaging approximately 200 TB/year • Assume that adds about 15% to the cost per year • AWS does not eliminate the need for system administration • Need a trained “cloud system administrator” • Compute can be brought to the cloud through EC2 • This could reduce egress charges • Increases the need for a cloud administrator • Opens up opportunities for more novel integration of computational capabilities • Other models for cloud and a “PDS Storage Service” are possible • Other commercial vendors; hosted locally
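The growth assumption can be checked with a short projection. The sketch below assumes linear growth of 200 TB per year from the August total of about 1.58 PB and assumes storage cost scales in proportion to volume; the function names are hypothetical. Under those assumptions the first-year increase is roughly 13%, in the neighborhood of the 15% planning figure, and the percentage shrinks in later years because the base grows.

```python
def project_storage(start_tb, growth_tb_per_year, years):
    """Return projected volume in TB for year 0 through `years` (linear growth)."""
    return [start_tb + growth_tb_per_year * year for year in range(years + 1)]


def relative_growth(volumes):
    """Year-over-year fractional increase between consecutive volumes."""
    return [(after - before) / before for before, after in zip(volumes, volumes[1:])]


volumes = project_storage(1580, 200, 5)
for year, (volume, growth) in enumerate(zip(volumes[1:], relative_growth(volumes)), 1):
    print(f"year {year}: {volume} TB (+{growth:.1%} over prior year)")
```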

  18. Recommendation • Given the increasing importance of cloud computing, recommend that PDS perform a cloud pilot for storage • EN would chair and provide a cloud instance for evaluation • Identify one or more nodes that would demonstrate a PDS4 implementation with data stored in the cloud • Present an experience report to the MC in August 2019 for discussion • Feed into a longer-term PDS compute and storage roadmap • Rename the PMWG and task it to lead a cloud pilot study • PDS has used the PMWG (Physical Media WG) in the past; the name seems to be well past its prime and should be changed • Possible names: Storage Management WG, Infrastructure Management WG, etc.

  19. Backup

  20. Study • Set up AWS/S3 instances and host a few PDS4 bundles • Assign different buckets to different nodes • Work with nodes to transfer bundles to AWS • Configure test instances to show product access and distribution via the S3 REST API • Link from EN and node-hosted web applications • Demonstrate the ability to easily share an API for data access across nodes
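One way to picture the bucket-per-node layout and REST access described in the study is a helper that builds the download URL for a product. This is a minimal sketch: the bucket names, node keys, region, and object key are hypothetical, and it assumes the objects are publicly readable so that a plain HTTPS GET on the virtual-hosted-style S3 URL works without signing.

```python
from urllib.parse import quote

# Hypothetical mapping of PDS nodes to pilot buckets (names are assumptions).
NODE_BUCKETS = {
    "img": "pds-img-pilot",
    "geo": "pds-geo-pilot",
}


def s3_object_url(bucket, key, region="us-east-1"):
    """Build a virtual-hosted-style S3 URL for an object key.

    The key is percent-encoded, with "/" left intact so the bundle
    directory structure stays readable in the URL.
    """
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def product_url(node, key, region="us-east-1"):
    """Resolve a node to its bucket and return the product's download URL."""
    return s3_object_url(NODE_BUCKETS[node], key, region)


# Hypothetical bundle path, for illustration only.
print(product_url("geo", "example_bundle/data/product_0001.xml"))
```

EN and node web applications could link to URLs built this way, which is what would let a single access pattern be shared across nodes.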
