Data Grid Services for PNC Wei-Long, Ueng Academia Sinica Grid Computing Center wlueng@twgrid.org
Outline • Introduction • Data Grid Architecture • Common Data Grid Services • Application Integration • Summary
Information Management Technologies • Data collecting • Sensor systems, object ring buffers and portals • Data organization • Collections, manage data context • Data sharing • Data grids, manage heterogeneity • Data publication • Digital libraries, support discovery • Data preservation • Persistent archives, manage technology evolution • Data analysis • Processing pipelines, manage knowledge extraction
Managing Data • Historically, data has been STORED rather than MANAGED • Problems arising from this include: • Scaling • Distribution • Access control, authentication, and security • Data migration • Data creation
Data Management Concepts • Collection • The organization of digital entities to simplify management and access. • Context • The information that describes the data objects in a collection. • Content • The data objects in a collection
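The three concepts can be illustrated with a small data model. This is a minimal sketch, not the SRB data model: the class names `DataObject` and `Collection`, the sample file, and its attributes are all hypothetical. It shows the key separation: content (the bytes) is kept apart from context (descriptive metadata), and both are organized under one collection.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Content: the digital entity itself (here, just its bytes)."""
    name: str
    content: bytes


@dataclass
class Collection:
    """Organizes data objects and keeps their context (metadata) separately."""
    name: str
    objects: dict = field(default_factory=dict)  # name -> DataObject (content)
    context: dict = field(default_factory=dict)  # name -> descriptive metadata (context)

    def add(self, obj: DataObject, **metadata) -> None:
        self.objects[obj.name] = obj
        # Context is stored apart from content, so it can be searched
        # without reading the data itself.
        self.context[obj.name] = dict(metadata, size=len(obj.content))

    def describe(self, name: str) -> dict:
        return self.context[name]


# Hypothetical example collection and object.
survey = Collection("rainfall-2005")
survey.add(DataObject("station42.csv", b"date,mm\n2005-10-01,3.2\n"),
           instrument="rain gauge", region="Taipei")
print(survey.describe("station42.csv"))
```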
Data Management Challenges • Distributed data sources • Management across administrative domains • Heterogeneity • Multiple types of storage repositories • Scalability • Support for billions of digital entities and petabytes of data • Preservation • Management of technology evolution
Data Grids • Distributed data sources • Inter-realm authentication and authorization • Heterogeneity • Storage repository abstraction • Scalability • Differentiation between context and content management • Preservation • Support for automated processing (migration, archival processes)
Data Grid Transparencies • Find data without knowing the identifier • Descriptive attributes • Access data without knowing the location • Logical name space • Access data without knowing the type of storage • Storage repository abstraction • Retrieve data using your preferred API • Access abstraction • Provide transformations for any data collection • Data behavior abstraction
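Two of these transparencies, the logical name space and the storage repository abstraction, can be sketched together. The following is a minimal illustration under stated assumptions, not SRB code: `StorageDriver`, `MemoryDriver`, `LogicalNameSpace`, the resource names `vaultB`/`vaultD`, and all paths are hypothetical. Callers read by logical name only; the name space picks any available replica, and every back end is hidden behind one driver interface.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Uniform interface that hides the type of storage behind read/write."""

    @abstractmethod
    def read(self, physical_path: str) -> bytes: ...

    @abstractmethod
    def write(self, physical_path: str, data: bytes) -> None: ...


class MemoryDriver(StorageDriver):
    """Stand-in for any concrete back end (file system, tape archive, database)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def read(self, physical_path: str) -> bytes:
        return self._blobs[physical_path]

    def write(self, physical_path: str, data: bytes) -> None:
        self._blobs[physical_path] = data

    def delete(self, physical_path: str) -> None:
        del self._blobs[physical_path]


class LogicalNameSpace:
    """Maps logical names to replicas stored as (resource, physical path) pairs."""

    def __init__(self, resources: dict[str, StorageDriver]) -> None:
        self.resources = resources
        self.replicas: dict[str, list[tuple[str, str]]] = {}

    def put(self, logical_name: str, data: bytes, resource: str, physical_path: str) -> None:
        self.resources[resource].write(physical_path, data)
        self.replicas.setdefault(logical_name, []).append((resource, physical_path))

    def replicate(self, logical_name: str, resource: str, physical_path: str) -> None:
        # Copy from any readable replica and record the new location.
        data = self.get(logical_name)
        self.resources[resource].write(physical_path, data)
        self.replicas[logical_name].append((resource, physical_path))

    def get(self, logical_name: str) -> bytes:
        # The caller never learns where, or on what kind of storage, the bytes live.
        for resource, physical_path in self.replicas.get(logical_name, []):
            try:
                return self.resources[resource].read(physical_path)
            except KeyError:
                continue  # replica or resource unavailable; try the next one
        raise FileNotFoundError(logical_name)


ns = LogicalNameSpace({"vaultB": MemoryDriver(), "vaultD": MemoryDriver()})
ns.put("/PNC/maps/taipei.tif", b"tiff-bytes", "vaultB", "/srbvault/0001")
ns.replicate("/PNC/maps/taipei.tif", "vaultD", "/srbvault/0917")
ns.resources["vaultB"].delete("/srbvault/0001")  # simulate losing one replica
print(ns.get("/PNC/maps/taipei.tif"))  # still served from vaultD
```

Because location lives only in the replica table, data can move between resources without any change visible to users of the logical name.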
Data Grid Goals • Automate all aspects of data analysis • Data discovery • Data access • Data transport • Data manipulation • Automate all aspects of data collections • Metadata generation • Metadata organization • Metadata management • Preservation
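Automated metadata generation can start with attributes that are derivable from the data itself at ingest time. This sketch is an assumption about what such a step might record; `generate_metadata`, `verify_fixity`, and the example path are hypothetical. It also stores a checksum, so a later preservation step can confirm that a migrated or archived copy is unchanged.

```python
import hashlib
import os
from datetime import datetime, timezone


def generate_metadata(logical_name: str, data: bytes) -> dict:
    """Derive descriptive metadata automatically when data is ingested."""
    _, ext = os.path.splitext(logical_name)
    return {
        "logical_name": logical_name,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # fixity value for later checks
        "extension": ext.lstrip(".").lower(),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def verify_fixity(meta: dict, data: bytes) -> bool:
    """Recompute the checksum to detect corruption after migration or archiving."""
    return hashlib.sha256(data).hexdigest() == meta["sha256"]


meta = generate_metadata("/PNC/archive/map01.TIF", b"raster")
print(meta["extension"], meta["size_bytes"])
```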
Data Grid Components • Federated client-server architecture • Servers can talk to each other independently of the client • Infrastructure-independent naming • Logical names for users, resources, files, and applications • Collective ownership of data • Collection-owned data, with infrastructure-independent access control lists • Context management • Record state information from data grid services, such as replication, in a metadata catalog • Abstractions for dealing with heterogeneity
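Collection-owned data with infrastructure-independent access control can be sketched as an ACL keyed by logical user names such as `user@zone`, so permissions do not depend on accounts on any physical server. This is an illustrative assumption, not the SRB permission model; `CollectionACL`, the permission levels, and the user names are hypothetical.

```python
from enum import IntEnum


class Permission(IntEnum):
    NONE = 0
    READ = 1
    WRITE = 2
    OWN = 3


class CollectionACL:
    """ACL attached to a collection and keyed by logical user names ("name@zone")."""

    def __init__(self, collection: str, owner: str) -> None:
        self.collection = collection
        # The collection owner account, not a storage-system account, owns the data.
        self.entries = {owner: Permission.OWN}

    def grant(self, granter: str, user: str, perm: Permission) -> None:
        if self.entries.get(granter, Permission.NONE) < Permission.OWN:
            raise PermissionError(f"{granter} cannot grant access on {self.collection}")
        self.entries[user] = perm

    def allows(self, user: str, needed: Permission) -> bool:
        return self.entries.get(user, Permission.NONE) >= needed


acl = CollectionACL("/PNC/home/archive", owner="archivist@PNC")
acl.grant("archivist@PNC", "researcher@TWGrid", Permission.READ)
print(acl.allows("researcher@TWGrid", Permission.READ))
```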
Storage Resource Broker • Developed at the San Diego Supercomputer Center (SDSC) • A distributed file management system (data grid) based on a client-server architecture. • Allows users to access files seamlessly across a distributed environment based on their attributes rather than only their names or physical locations. • Replicates, synchronizes, archives, and connects heterogeneous resources in a logical, abstracted manner.
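Attribute-based access means a user asks the metadata catalog for data by what it describes rather than where it is stored. The toy catalog below is a hypothetical sketch of that idea, not the MCAT query interface; `MetadataCatalog`, the attribute names, and the logical paths are assumptions. Each condition is either a value to match exactly or a predicate.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    logical_name: str
    attributes: dict = field(default_factory=dict)


class MetadataCatalog:
    """Finds data objects by descriptive attributes, not by name or location."""

    def __init__(self) -> None:
        self.entries: list[CatalogEntry] = []

    def register(self, logical_name: str, **attributes) -> None:
        self.entries.append(CatalogEntry(logical_name, attributes))

    @staticmethod
    def _matches(value, expected) -> bool:
        # A condition is either an exact value or a callable predicate.
        return expected(value) if callable(expected) else value == expected

    def query(self, **conditions) -> list[str]:
        """Return logical names whose attributes satisfy every condition."""
        return [
            e.logical_name
            for e in self.entries
            if all(k in e.attributes and self._matches(e.attributes[k], v)
                   for k, v in conditions.items())
        ]


cat = MetadataCatalog()
cat.register("/PNC/atmos/typhoon_001.nc", domain="atmosphere", year=2004)
cat.register("/PNC/atmos/typhoon_002.nc", domain="atmosphere", year=2005)
cat.register("/PNC/gis/roads.shp", domain="GIS", year=2005)
print(cat.query(domain="atmosphere", year=lambda y: y >= 2005))
```

The query returns logical names; resolving them to a physical replica is a separate step, which keeps discovery independent of storage location.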
[Figure: SRB Physical Structure. Components: a user at location X; SRB servers; an Oracle client and the Oracle RDBMS; storage drivers and storage space; SRB vaults at locations B and D.]
[Figure: PNC Data Grid Service Framework. Components: applications and a web server; PNC-SRB multiple servers (SRB servers and SRB storage servers); MES; an MCAT server with an Oracle client; an Oracle RAC database server (DB-Instance-1, DB-Instance-2) holding the MCAT database (Schema1 to Schema4).]
[Figure: SRB zones for inter-organizational collaboration. The TWGrid Zone, PNC Zone, and ECAI Zone each run their own SRB server and database and are linked by trust relations.]
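Cross-zone access can be pictured as a check against configured trust relations before a request from one zone is served by another. The sketch below is a simplification under stated assumptions, not SRB zone federation: `Federation` is hypothetical, trust is modeled as symmetric and non-transitive, logical paths are assumed to begin with the zone name, and the choice of which zone pairs trust each other is invented for illustration.

```python
from __future__ import annotations


class Federation:
    """Independent zones, each with its own store; access follows trust relations."""

    def __init__(self) -> None:
        self.zones: dict[str, dict[str, bytes]] = {}
        self.trust: set[frozenset[str]] = set()

    def add_zone(self, zone: str) -> None:
        self.zones[zone] = {}

    def add_trust(self, zone_a: str, zone_b: str) -> None:
        # Assumption: trust is symmetric and does NOT chain through other zones.
        self.trust.add(frozenset((zone_a, zone_b)))

    @staticmethod
    def _zone_of(path: str) -> str:
        # Assumption: logical paths start with the zone name, e.g. "/PNC/home/a.tif".
        return path.strip("/").split("/", 1)[0]

    def put(self, path: str, data: bytes) -> None:
        self.zones[self._zone_of(path)][path] = data

    def read(self, user: str, path: str) -> bytes:
        user_zone = user.split("@", 1)[1]
        target_zone = self._zone_of(path)
        if user_zone != target_zone and frozenset((user_zone, target_zone)) not in self.trust:
            raise PermissionError(f"no trust relation between {user_zone} and {target_zone}")
        return self.zones[target_zone][path]


fed = Federation()
for zone in ("TWGrid", "PNC", "ECAI"):
    fed.add_zone(zone)
fed.add_trust("TWGrid", "PNC")  # hypothetical pairing
fed.add_trust("PNC", "ECAI")    # hypothetical pairing
fed.put("/ECAI/home/report.pdf", b"report")
print(fed.read("alice@PNC", "/ECAI/home/report.pdf"))
```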
Common Services • Data Services • Data Object Service • Profile Service • Web Grid Service • Data I/O Service • Application Service • Security Services • VO Service • CA Service
Common Services (Cont.) • Catalog and Archive Services • Catalog and Archive Web Application • Metadata Services • Metadata Service • Metadata Web Application • Data Identifier Services • Query Service • Query Expression
Common Services (Cont.) • Server Management Services • Server/Network Controller • Server/Network Monitor • Server Manager Application • Resources Monitoring
Applications Integration • Digital Archives/Digital Library • Bioinformatics • Atmosphere Science • GIS • Earth Observation Science • Biodiversity • …
Statistics of DADG (as of 26 Oct. 2005)
[Figure: Atmosphere Science Integration. Users reach the data grid through applications, a portal/web client, or command lines on Linux and Windows. MCAT / srb001; storage resources: NTU/monsoon (TB), NTNU/dms, NCU/databank, ASCC/srb002, ASCC/lcg00104 (TB), ASCC/lcg00105 (TB), ASCC/gis212 (TB), ASCC/gis252 (TB), NTU/dbar_rs1, dbar_rs_2.]
Summary • By integrating data grids, digital libraries, and persistent archives we will be able to maintain the consistency of federated data collections while flowing information and data from digital entities through grid services into preservation environments. • Imagine what we can do for your project.