
WP2: Data Management


Presentation Transcript


  1. WP2: Data Management. Gavin McCance, University of Glasgow

  2. WP2: Data Management • Key areas covered by WP2 • Current Status GDMP • Services to be Delivered GridPP • CPU and Bandwidth Investigation • Summary Gavin McCance - University of Glasgow

  3. WP2: Data Management • Goal: develop middleware infrastructure to manage petabyte-scale data • GridPP: identify key areas within the software structure • Service levels are reasonably well defined [Figure: WP2 software structure, layered into high-level services, medium-level services and core services, plus a secure region. Components shown: Query Optimisation, Replica Manager, Access Pattern Management, Data Mover, Data Accessor, Data Locator, Storage Manager, Meta Data Manager, and the Castor, HPSS and local filesystem storage back ends.]

  4. Key Areas and Services • Concentrate mostly on M9 deliverables and where GridPP fits in • Replication: GDMP integration with the Globus Replica Catalogue • Query/replica optimisation (not for M9!): investigate genetic algorithms for efficient optimisation of cost functions • SQL Database Service: complements the LDAP directory service approach • Service Index: an efficient and scalable discovery mechanism

  5. GDMP Replication • CERN’s GDMP (Grid Data Mirroring Package): Asad Samar / Heinz Stockinger • Allows world-wide replication of large object-oriented (OO) databases • Modules for Objectivity, ROOT and FZ files to be available soon (M9) • WP2: numerous replication strategies are possible, e.g. (fully) consistent synchronous replication or lazier asynchronous replication • Reviews... • Much current discussion in WP2 and beyond… workshops? [Distributed Database Management Systems and the Data Grid, Heinz Stockinger]

  6. GDMP Replica Catalogue • M9… GDMP is now interfaced to the Globus Replica Catalogue • File registration, searching and deletion implemented [Figure: Site1 (publisher) publishes files from its export catalogue to a logical collection in the replica catalogue and notifies subscribers of new files; subscriber sites (e.g. Site2, Site3) get the import file list into their import catalogues; logical files in the catalogue map to physical files at the sites.] [GDMP Integration with Globus’ Replica Catalogue, Asad Samar]
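The publish/subscribe flow in the figure can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not GDMP's implementation or the Globus Replica Catalogue API: the classes `ReplicaCatalogue` and `Site` and their methods are hypothetical, and "replication" here just marks a file as present locally instead of transferring data.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalogue:
    """Hypothetical stand-in for a replica catalogue:
    maps each logical file name to the set of sites holding a physical copy."""
    replicas: dict = field(default_factory=dict)

    def register(self, lfn: str, site: str) -> None:
        self.replicas.setdefault(lfn, set()).add(site)

    def search(self, lfn: str) -> set:
        return set(self.replicas.get(lfn, set()))

    def delete(self, lfn: str, site: str) -> None:
        sites = self.replicas.get(lfn)
        if sites is not None:
            sites.discard(site)
            if not sites:  # drop the logical file once no replicas remain
                del self.replicas[lfn]


@dataclass
class Site:
    """Hypothetical grid site acting as publisher and/or subscriber."""
    name: str
    catalogue: ReplicaCatalogue
    export_catalogue: list = field(default_factory=list)
    import_catalogue: list = field(default_factory=list)
    subscribers: list = field(default_factory=list)
    local_files: set = field(default_factory=set)

    def subscribe(self, other: "Site") -> None:
        self.subscribers.append(other)

    def publish(self, lfns: list) -> None:
        # Publisher: register new files, list them in the export catalogue, notify subscribers.
        for lfn in lfns:
            self.local_files.add(lfn)
            self.catalogue.register(lfn, self.name)
            self.export_catalogue.append(lfn)
        for sub in self.subscribers:
            sub.notify(self)

    def notify(self, publisher: "Site") -> None:
        # Subscriber: get the import file list (published files not yet held locally).
        for lfn in publisher.export_catalogue:
            if lfn not in self.local_files and lfn not in self.import_catalogue:
                self.import_catalogue.append(lfn)

    def replicate(self) -> None:
        # "Copy" each pending file (here: just mark it present) and register the new replica.
        while self.import_catalogue:
            lfn = self.import_catalogue.pop(0)
            self.local_files.add(lfn)
            self.catalogue.register(lfn, self.name)
```

For example, if `Site2` subscribes to `Site1` and `Site1` publishes `lfn:run42.db`, a call to `Site2.replicate()` leaves the catalogue listing both sites for that logical file. Separating notification from the actual copy mirrors the lazy, asynchronous style of replication mentioned on the previous slide.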

  7. Query / Replica Optimisation • Should the replica manager make a new replica? Can a query/job be split into sub-queries? Which replica should be used? • A higher-level service! It uses a cost model to make the decision... • Minimise the cost over all subsets of data accessed in sub-queries and over all physical file replicas • Preliminary work has been done on developing cost models… more remains to be studied... • GridPP can contribute to WP2! [Towards a Cost Model for Distributed and Replicated Data Stores, Heinz & Kurt Stockinger, CERN]
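To make the replica-selection question concrete, here is a deliberately simple sketch. The cost model (per-access latency plus size divided by bandwidth) and the names `Replica`, `access_cost`, `choose_replicas` and `worth_replicating` are hypothetical illustrations, not the cost model from the cited paper. Query splitting is ignored. Because each file's cost is independent and additive here, picking the cheapest replica per file gives the minimum over all replica assignments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str
    latency_s: float       # assumed per-access latency from the client to this site
    bandwidth_mb_s: float  # assumed available bandwidth in MB/s


def access_cost(size_mb: float, r: Replica) -> float:
    """Hypothetical cost model: estimated seconds to read a file from replica r."""
    return r.latency_s + size_mb / r.bandwidth_mb_s


def choose_replicas(files: dict, replicas: dict) -> tuple:
    """files: logical file -> size in MB; replicas: logical file -> list of Replica.
    Picks the cheapest replica per file and returns (site per file, total cost)."""
    plan, total = {}, 0.0
    for lfn, size in files.items():
        best = min(replicas[lfn], key=lambda r: access_cost(size, r))
        plan[lfn] = best.site
        total += access_cost(size, best)
    return plan, total


def worth_replicating(size_mb: float, expected_reads: int, remote: Replica,
                      local: Replica, copy_cost_s: float) -> bool:
    """Make a new replica only if the expected saving over future reads
    exceeds the (assumed known) one-off cost of copying the file."""
    saving = expected_reads * (access_cost(size_mb, remote) - access_cost(size_mb, local))
    return saving > copy_cost_s
```

A GA or other optimiser becomes relevant when the cost terms interact (shared bandwidth, query splitting), so the per-file minimum above no longer suffices.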

  8. GA Approach • GridPP work will investigate the use of genetic algorithms (GAs) for optimising complex multi-dimensional cost functions • Candidate solutions are ‘bred’ in parallel, ranked according to the cost function, and re-bred from the best candidates using crossover and mutation operators • Multiple points are evolved simultaneously, making the search more robust against local minima • Optimisation is generally faster for complex functions, particularly in less predictable situations, e.g. networks!
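The breed/rank/re-breed loop described on this slide can be sketched as a minimal real-valued genetic algorithm. The parameter choices (population size, uniform crossover, Gaussian mutation with sigma 0.3, keeping the best half) and the function name `genetic_minimise` are illustrative assumptions, not the operators GridPP settled on.

```python
import random
from typing import Callable, List


def genetic_minimise(cost: Callable[[List[float]], float], dim: int, pop_size: int = 40,
                     generations: int = 200, mutation_rate: float = 0.1,
                     bounds: tuple = (-5.0, 5.0), seed: int = 0) -> List[float]:
    """Minimise `cost` over [lo, hi]^dim with a simple genetic algorithm."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Many candidate points are evolved simultaneously.
    population = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank candidates by the cost function (lower is better).
        population.sort(key=cost)
        parents = population[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Gaussian mutation on some genes, clipped to the search bounds.
            child = [min(hi, max(lo, g + rng.gauss(0.0, 0.3))) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return min(population, key=cost)
```

For instance, `genetic_minimise(lambda v: sum(x * x for x in v), dim=2)` should converge near the origin. In a replica-optimisation setting, the cost function would instead score a candidate assignment of sub-queries to replicas.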

  9. M9: SQL Database Service • LDAP? Its hierarchical model assumes you know the queries before designing the database! • Arbitrary or computed queries can be expensive or even impossible! • The RDBMS model is better for these queries • Investigating SQL databases, e.g. PostgreSQL… • Issues with transactions still to be investigated • M9 should see basic SQL insert, delete, update and select operations • Standard protocols should be used, e.g. generic SQL wrapped in XML over HTTPS...

  10. M9: SQL Database Service • Producer/consumer model • A producer adds metadata and registers its table format (is dynamic registration of new tables outside M9?) • A consumer uses a known or registered schema (tbd!) to construct a query, which the server translates to SQL, executes, and returns to the client as XML/HTML • APIs to be implemented: Java, Web, command line
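The producer/consumer flow can be sketched in a few lines. This assumes Python's built-in `sqlite3` as a stand-in for the real RDBMS (the slides mention PostgreSQL). The class `MetadataServer`, its methods and the XML layout are hypothetical, and the HTTPS transport is omitted. Consumer filters are checked against the registered schema and passed as SQL parameters, so only registered table and column names ever reach the SQL text.

```python
import sqlite3
import xml.etree.ElementTree as ET


class MetadataServer:
    """Hypothetical sketch; sqlite3 stands in for the real RDBMS."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.schemas = {}  # registered table name -> list of column names

    def register_table(self, table: str, columns: list) -> None:
        # Producer registers a table format. Names are interpolated into SQL,
        # so only plain identifiers are accepted.
        for name in [table, *columns]:
            if not name.isidentifier():
                raise ValueError(f"bad identifier: {name}")
        self.db.execute(f"CREATE TABLE {table} ({', '.join(c + ' TEXT' for c in columns)})")
        self.schemas[table] = list(columns)

    def insert(self, table: str, row: dict) -> None:
        # Producer adds metadata; values are bound as parameters.
        cols = self.schemas[table]
        placeholders = ", ".join("?" for _ in cols)
        self.db.execute(f"INSERT INTO {table} VALUES ({placeholders})", [row.get(c) for c in cols])
        self.db.commit()

    def query_xml(self, table: str, where: dict) -> str:
        # Consumer query against the registered schema -> SQL -> XML result.
        cols = self.schemas[table]
        unknown = set(where) - set(cols)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        sql = f"SELECT {', '.join(cols)} FROM {table}"
        if where:
            sql += " WHERE " + " AND ".join(f"{c} = ?" for c in where)
        root = ET.Element("result", table=table)
        for values in self.db.execute(sql, list(where.values())):
            row_el = ET.SubElement(root, "row")
            for c, v in zip(cols, values):
                ET.SubElement(row_el, c).text = v
        return ET.tostring(root, encoding="unicode")
```

Usage: after `register_table("files", ["lfn", "site"])` and `insert("files", {"lfn": "run42.db", "site": "CERN"})`, the call `query_xml("files", {"site": "CERN"})` returns `<result table="files"><row><lfn>run42.db</lfn><site>CERN</site></row></result>`.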

  11. M9: Service Index • Grid services must be able to discover each other! • Neither the ‘everyone knows...’ approach nor the hierarchical approach is scalable • Construct a ‘web’ of service indices [Figure: a web of service indices (sds.padova-infn.it, sds.anl.gov, sds.trieste-infn.it, sds.infn.it, sds.cern.ch, sds.ral.uk, sds.bologna-infn.it), with labels ‘Allowed’ and ‘Hierarchical Model’]
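The following sketch shows how a query might propagate through a web of service indices. The breadth-first strategy, the hop limit, the function `propagate_query` and the example links between the hosts are all assumptions for illustration; the slides do not specify a propagation algorithm. Each index is visited at most once, so cycles in the web do not cause queries to loop.

```python
from collections import deque


def propagate_query(web: dict, entries: dict, start: str, matches, max_hops: int = 3) -> list:
    """Breadth-first propagation of a query through a web of service indices.
    web:     index -> neighbouring indices (hypothetical topology)
    entries: index -> list of service descriptions registered there
    matches: predicate applied to each description
    max_hops: how far the query may travel from the starting index"""
    seen = {start}
    queue = deque([(start, 0)])
    results = []
    while queue:
        index, hops = queue.popleft()
        results.extend(d for d in entries.get(index, []) if matches(d))
        if hops == max_hops:
            continue  # do not forward the query any further
        for neighbour in web.get(index, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return results
```

With a hypothetical web in which `sds.cern.ch` links to `sds.infn.it`, which in turn links to `sds.padova-infn.it`, a query starting at CERN finds services registered at Padova only if `max_hops` is at least 2. A hop limit is one simple way to bound the traffic a single query generates.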

  12. M9: Service Index • Services publish an XML-based description, e.g. name, contact protocols/details, type, and who may know about the service • Jini-style ‘leases’: services must report periodically or be dropped from the list • Clients query service indices using an XML-based query with a standard schema (tbd!)… • M9 will see basic propagation of queries • Security: services must be able to limit who can access their description! (coarse-grained) • Other than this, the service index will not provide any access policy control!
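The lease mechanism can be sketched as a registry whose entries expire unless they are renewed. The class `LeasedServiceIndex`, the fixed lease length and the injectable clock (used here so that expiry can be tested without sleeping) are hypothetical. The XML descriptions are treated as opaque strings.

```python
import time
from typing import Callable, Dict, Optional


class LeasedServiceIndex:
    """Hypothetical sketch of Jini-style leases: each registration expires unless renewed."""

    def __init__(self, lease_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self._entries: Dict[str, tuple] = {}  # name -> (XML description, expiry time)

    def register(self, name: str, description: str) -> None:
        # Registering again before expiry simply renews the lease.
        self._entries[name] = (description, self.clock() + self.lease_seconds)

    def _expire(self) -> None:
        # Drop every service that has not reported within its lease period.
        now = self.clock()
        for name in [n for n, (_, expiry) in self._entries.items() if expiry <= now]:
            del self._entries[name]

    def lookup(self, name: str) -> Optional[str]:
        self._expire()
        entry = self._entries.get(name)
        return entry[0] if entry else None

    def services(self) -> list:
        self._expire()
        return sorted(self._entries)
```

Expiring entries lazily on each read keeps the sketch short. A real index would more likely purge entries in the background so that stale services do not linger in memory.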

  13. M9: Service Index • Service descriptions should be small (<1k)! • User-defined (e.g. experiment-specific) schemas should generally be discouraged • After M9, more intelligent web-traversing tools can be developed, e.g. agent technology? • How to find a service index? Hard-wired ‘root’ service indices? Limited-scope multicast advertising?

  14. CPU and Bandwidth Monitoring • A scalable CPU monitoring system with a JAS GUI is being developed for the ScotGRID cluster [Figure: screenshots of the general cluster overview and of more detailed individual node information]

  15. CPU and Bandwidth Monitoring • Network measurement tools are being evaluated and developed: MonitorX, Pipechar, IPERF • Bandwidth measurement from UDP packet dispersion [Figure: packet dispersion diagram labelled b, b and Δt]
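The dispersion technique rests on the standard packet-pair relation. When two packets of size S leave back to back, the bottleneck link spaces them Δt = S / C apart, so C ≈ S / Δt. The following sketch applies that relation to receive timestamps. The function name `packet_pair_bandwidth` and the use of the median to damp pairs disturbed by cross traffic are illustrative choices, not the method of the tools listed on the slide.

```python
from statistics import median


def packet_pair_bandwidth(packet_size_bytes: int, arrival_times_s: list) -> float:
    """Estimate bottleneck bandwidth in bits/s from back-to-back UDP packet pairs.
    arrival_times_s: one (t_first, t_second) receive-timestamp tuple per pair."""
    estimates = []
    for t1, t2 in arrival_times_s:
        dt = t2 - t1  # dispersion introduced by the bottleneck link
        if dt > 0:    # ignore pairs with no measurable dispersion
            estimates.append(packet_size_bytes * 8 / dt)
    if not estimates:
        raise ValueError("no usable packet pairs")
    # The median is a simple way to damp pairs distorted by cross traffic.
    return median(estimates)
```

For example, 1500-byte packets arriving 1.2 ms apart give 1500 × 8 / 0.0012 = 10 Mbit/s.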

  16. CPU and Bandwidth Monitoring • Other methods/tools being investigated and developed: mptraceu, pathchar • Bandwidth measurement from round-trip time (RTT) using UDP, TCP/IP and ICMP: the RTT through routers is measured as a function of packet size to obtain the bandwidth
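The RTT-versus-packet-size idea can be written out as a pathchar-style calculation. For each hop, keep the minimum RTT per packet size (the sample with the least queueing) and fit a straight line RTT = a + slope × size. The increase in slope from hop k−1 to hop k is then attributed to link k, giving a bandwidth of 1 / Δslope. This assumes the reply packet from each router is small and of fixed size, so only the outbound probe's size affects the slope. The function names `min_rtt_slope` and `link_bandwidths` are hypothetical.

```python
from collections import defaultdict


def min_rtt_slope(samples: list) -> float:
    """samples: (packet_size_bytes, rtt_s) measurements to one hop.
    Keeps the minimum RTT per size, then least-squares fits RTT = a + slope * size."""
    best = defaultdict(lambda: float("inf"))
    for size, rtt in samples:
        best[size] = min(best[size], rtt)
    xs = list(best)
    ys = [best[s] for s in xs]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct packet sizes")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx  # seconds per byte


def link_bandwidths(per_hop_samples: list) -> list:
    """Bandwidth in bits/s of each link along the path, hop by hop:
    the extra seconds-per-byte added at hop k is attributed to link k."""
    bandwidths, previous_slope = [], 0.0
    for samples in per_hop_samples:
        slope = min_rtt_slope(samples)
        delta = slope - previous_slope
        bandwidths.append(8 / delta if delta > 0 else float("inf"))
        previous_slope = slope
    return bandwidths
```

Using minimum RTTs filters out queueing delay, which only ever adds to the RTT. That is why many samples per packet size are normally taken.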

  17. Summary • GDMP Replication Manager completed • Active discussion in WP2 and beyond about replication strategies • Cost models… GA approach? • SQL Database Service being investigated for M9 • Service Index being investigated for M9 • CPU and Network Monitoring work is underway in ScotGRID...
