Data Grids • Jon Ludwig, Leor Dilmanian, Braden Allchin, Andrew Brown
Outline • What is a Data Grid • Components of a Data Grid • Data Grids of Today • Amazon S3 Web Service
What is a Data Grid? • Distributed storage mechanism providing resources to computational grids • Cheap, effective, and scalable means of recording information across multiple grid sites • The resources, tools, and information products that can be used for data discovery and delivery from a variety of sources, typically used for the production of valuable information
Components of a Data Grid • Case study: NERC • CSML: the Climate Science Modelling Language information model • The CSML Toolbox: create and manipulate documents that conform to the CSML schema • The CSML Data Services: expose documents and the data they point to • The NDG Data Graphical User Interface: uses a web service to manipulate data • MOLES schema, XQuery definitions, related software, front-end browser • Discovery gateways and infrastructure • Vocabulary server
Storage Resource Broker • Virtual data storage using namespaces • Maintains metadata on files, users, groups • Stored in relational DBMS • Queries supported • Has an API for other applications (e.g. Globus) • Sharing, transfer, backup
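The idea of a virtual namespace backed by a relational metadata catalog can be sketched in a few lines. This is only an illustration of the concept on the slide, not SRB's actual schema or API: the table layout, the column names, and the helpers `build_catalog`, `register`, `resolve`, and `files_owned_by` are all hypothetical, and SQLite stands in for whatever DBMS a real deployment would use.

```python
import sqlite3


def build_catalog():
    """Create an in-memory metadata catalog (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE files (
            logical_name  TEXT PRIMARY KEY,  -- name users see in the virtual namespace
            site          TEXT NOT NULL,     -- grid site that physically holds the data
            physical_path TEXT NOT NULL,     -- location of the file at that site
            owner         TEXT NOT NULL,
            size_bytes    INTEGER NOT NULL
        )
        """
    )
    return conn


def register(conn, logical_name, site, physical_path, owner, size_bytes):
    """Record where a logical file physically lives, plus its metadata."""
    conn.execute(
        "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
        (logical_name, site, physical_path, owner, size_bytes),
    )


def resolve(conn, logical_name):
    """Map a logical name to (site, physical_path), or None if unknown."""
    return conn.execute(
        "SELECT site, physical_path FROM files WHERE logical_name = ?",
        (logical_name,),
    ).fetchone()


def files_owned_by(conn, owner):
    """Metadata query: list the logical names owned by one user."""
    rows = conn.execute(
        "SELECT logical_name FROM files WHERE owner = ? ORDER BY logical_name",
        (owner,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    catalog = build_catalog()
    register(catalog, "/home/alice/run1.dat", "site-a", "/disk3/0001.dat", "alice", 2048)
    print(resolve(catalog, "/home/alice/run1.dat"))
    print(files_owned_by(catalog, "alice"))
```

Clients only ever see the logical name; the broker consults the catalog to find the site and path, which is what lets data move between sites without breaking user-visible names.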
Data Grids of Today • Biomedical Informatics Research Network (BIRN) • HP's Global File systems (SFS) collaboration • NSF's iVDGL (International Virtual Data Grid Laboratory) • Now part of OSG • European Union's DataGrid Project • Now part of Enabling Grids for E-sciencE (EGEE) • Natural Environment Research Council (NERC) • Amazon Simple Storage Service (S3)
Amazon S3 • Amazon Simple Storage Service • Web Service - REST / SOAP / BitTorrent • Offload storage requirements to Amazon • Cost • Security • Scalable - Storage, availability, speed • Reliable - Fault tolerance, redundancy • Fast • Inexpensive - Commodity hardware • Simple - Data grid is abstracted • Flexible - Constraints
Amazon S3 - Design Principles • Decentralization - Avoid SPoF • Asynchrony - Avoid waiting on communications • Autonomy - • Local Responsibility - Nodes take care of themselves • Controlled Concurrency - Exposed operations require little or no concurrency • Failure Tolerance - Automatic recovery, minimal interruption • Controlled Parallelism - Recover quickly • Small Building Blocks • Symmetry - Nodes are identical in functionality, minimal configuration • Simplicity
Amazon S3 - Functionality • Objects - Fundamental storage unit • 1B to 5GB • Metadata • Keys uniquely identify Objects • Buckets - Namespace for managing objects • Users own Buckets • Buckets contain Objects • Unlimited Objects per Bucket • Operations • Create, Read, Write, List, Delete • Replication
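The bucket/object/key model above can be pictured with a small in-memory sketch. This is a toy model for illustration, not the S3 API: the `Bucket` class and its method names are hypothetical, the 5 GB ceiling is taken from the slide and assumed here to mean 5 × 1024³ bytes, and replication is not modelled.

```python
MAX_OBJECT_BYTES = 5 * 1024**3  # "1B to 5GB" from the slide; binary GB is an assumption


class Bucket:
    """Toy namespace that maps unique keys to (data, metadata) objects."""

    def __init__(self, name, owner):
        self.name = name      # bucket name
        self.owner = owner    # users own buckets
        self._objects = {}    # key -> (data, metadata); no limit on object count

    def put(self, key, data, metadata=None):
        """Create or overwrite the object stored under `key`."""
        if not 1 <= len(data) <= MAX_OBJECT_BYTES:
            raise ValueError("object size must be between 1 byte and 5 GB")
        self._objects[key] = (bytes(data), dict(metadata or {}))

    def get(self, key):
        """Read an object; raises KeyError if the key does not exist."""
        return self._objects[key]

    def list(self, prefix=""):
        """List keys in sorted order, optionally filtered by prefix."""
        return sorted(k for k in self._objects if k.startswith(prefix))

    def delete(self, key):
        """Delete an object; deleting a missing key is a no-op here."""
        self._objects.pop(key, None)


if __name__ == "__main__":
    photos = Bucket("photos", owner="alice")
    photos.put("2008/trip/001.jpg", b"\xff\xd8", {"content-type": "image/jpeg"})
    print(photos.list("2008/"))
```

Keys are flat strings; the slash-separated prefix in the example only looks like a directory, which is why listing is done by prefix rather than by folder.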
Amazon S3 - Security • Public key authentication + HMAC • Access Control Lists for Buckets • Logging for Buckets • May use SSL • Integrity - MD5 • No data encryption
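Two of the mechanisms listed above, HMAC request signing and MD5 integrity checking, can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: it shows only a shared-secret HMAC-SHA1 over an arbitrary string, the helper names `content_md5`, `sign`, and `verify` are hypothetical, and the example string being signed is made up rather than S3's exact canonical request format.

```python
import base64
import hashlib
import hmac


def content_md5(body: bytes) -> str:
    """Base64-encoded MD5 digest of a payload, used as an integrity check."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def sign(secret_key: str, string_to_sign: str) -> str:
    """Base64-encoded HMAC-SHA1 of a request description under a secret key."""
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(secret_key: str, string_to_sign: str, signature: str) -> bool:
    """Server-side check: recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(secret_key, string_to_sign), signature)


if __name__ == "__main__":
    body = b"hello grid"
    # Hypothetical request description: verb, body digest, date, resource.
    request = "PUT\n" + content_md5(body) + "\nTue, 01 Apr 2008 12:00:00 GMT\n/bucket/key"
    signature = sign("example-secret", request)
    print(verify("example-secret", request, signature))
```

Because the body digest is part of the signed string, tampering with either the payload or the request description invalidates the signature. The payload itself is still sent unencrypted unless SSL is used, matching the "No data encryption" point.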
Amazon S3 - Disadvantages • No renaming or moving of Buckets • No content-based search • No capping capabilities • Cost
Amazon S3 - Costs • Storage • $0.15 per GB-Month of storage used • Data Transfer • $0.10 per GB - all data transfer in • $0.18 per GB - first 10 TB / month data transfer out • $0.16 per GB - next 40 TB / month data transfer out • $0.13 per GB - data transfer out / month over 50 TB • Requests • $0.01 per 1,000 PUT or LIST requests • $0.01 per 10,000 GET and all other requests
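The price list above can be turned into a small monthly-bill estimate. The rates are the ones on the slide; the function names are hypothetical, and treating 1 TB as 1024 GB is an assumption, since the slide does not say which convention the tiers use.

```python
GB_PER_TB = 1024  # assumption: the slide does not state decimal vs binary TB

STORAGE_PER_GB_MONTH = 0.15
TRANSFER_IN_PER_GB = 0.10
# (tier size in GB, price per GB) for data transfer out, applied in order.
TRANSFER_OUT_TIERS = [
    (10 * GB_PER_TB, 0.18),   # first 10 TB / month
    (40 * GB_PER_TB, 0.16),   # next 40 TB / month
    (float("inf"), 0.13),     # everything over 50 TB / month
]
PUT_LIST_PER_1000 = 0.01
OTHER_PER_10000 = 0.01


def transfer_out_cost(gb):
    """Charge each tier's rate only for the portion of traffic inside that tier."""
    cost = 0.0
    remaining = gb
    for tier_size, rate in TRANSFER_OUT_TIERS:
        used = min(remaining, tier_size)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return cost


def monthly_cost(stored_gb, in_gb, out_gb, put_list_requests, other_requests):
    """Estimated monthly bill in US dollars."""
    return (
        stored_gb * STORAGE_PER_GB_MONTH
        + in_gb * TRANSFER_IN_PER_GB
        + transfer_out_cost(out_gb)
        + put_list_requests / 1000 * PUT_LIST_PER_1000
        + other_requests / 10000 * OTHER_PER_10000
    )


if __name__ == "__main__":
    # 100 GB stored, 20 GB in, 50 GB out, 10,000 PUTs, 1,000,000 GETs
    print(f"${monthly_cost(100, 20, 50, 10_000, 1_000_000):.2f}")
```

For the example workload the terms are $15.00 storage, $2.00 transfer in, $9.00 transfer out, $0.10 for PUT/LIST requests, and $1.00 for GET requests, for a total of $27.10.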
References • [2] Baru, C.; Moore, R.; Rajasekar, A. & Wan, M. (1998), "The SDSC storage resource broker", in CASCON '98: Proceedings of the 1998 conference of the Centre for Advanced Studies on Collaborative research, IBM Press, p. 5. • [3] Amazon S3: http://www.amazon.com/s3 • [4] Aktas, M. S.; Fox, G. C. & Pierce, M. "Distributed High Performance Grid Information Service", Indiana University, 2007. • [5] Garfinkel, I.; Palankar & Ripeanu. "Amazon S3 for Science Grids: a Viable Solution?", International Workshop on Data-Aware Distributed Computing, 2008.
http://eu-datagrid.web.cern.ch/eu-datagrid/ • http://www.ppdg.net/ • http://ndg.nerc.ac.uk/ • S3 - http://www.amazon.com/gp/browse.html?node=16427261 • http://www.ivdgl.org/ • http://www.hp.com/techservers/hpccn/linux_gfs/index.html • http://en.wikipedia.org/wiki/Data_grid