Metadata: Plans and Progress of the Metadata Working Group Rick St. Denis Glasgow University May 13, 2004
Cast • Alessandra Forti Manchester University BaBar • Carmine Cioffi Oxford University LHCb • Gavin McCance CERN EGEE • Solveig Albrand Grenoble ATLAS • Stefan Stonjek Fermilab/Oxford CDF • Tim Barrass Bristol CMS • Wyatt Merritt Fermilab CDF/D0 • Adam Lyon Fermilab CDF/D0 • Morag Burgon-Lyon Glasgow CDF • Rick St. Denis Glasgow CDF • Julie Trumbo Fermilab CDF • Paul Millar Glasgow General • Steven Hanlon Glasgow Metadata • Caitriana Nicholson
Format • Launched at a workshop held April 26-28, 2004 in Glasgow • Goal: answer the question “What is metadata?” in our document • Method: provocateurs • Topics list: augmented at the workshop • Get acquainted, divide up and study topics, present together, agree on a course of action • Output of workshop: revamped deliverables • Output of group: package services for release on SourceForge
Rough Agenda • Mon: • 2:00-3:00: 5 min on who we are • 3:00-3:30: Decide on topics • 3:30-5:00: Get to Stepps, hotel • 5:00: Meet at 2 West Ave • Tues: Provocateur sessions and research • Wed: Final document with deliverables; plans for the future: MO, CHEP abstract
Topics • Metadata architecture and components • Replica catalogs, file catalogs, physics catalogs • Use cases • Query languages • Implementations and performance: technology considerations, performance requirements • Service architectures, deployment architectures • Database implementations: text/MySQL/PostgreSQL/Oracle/enth
Informing ourselves • SAM services (Julie) • ARDA/OGSA-DAI (Gavin will outline) • AMI (Solveig) • POOL and graphical visualization (Carmine) • Spitfire (Paul) • SAM TV (Adam) • PNPA-GGF (Rick) • Project management (Tony)
Next Steps • Design for keyword-value pairs • Schema evolution and self-describing schema • Use the previous two items to automate the transition from keyword-value pairs to a query-efficient schema, and to determine which queries need to be satisfied • Unique dataset tool
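Python's built-in `sqlite3` module can illustrate the keyword-value design and the automated move to a query-efficient schema. This is a minimal sketch and not the group's design. It assumes a single SQLite database. The `kv` and `dataset` tables, the `promote()` helper and the sample keywords are all hypothetical. Keywords are first stored as free-form rows. The ones chosen for fast querying are then pivoted into real, indexed columns.

```python
import sqlite3

# Hypothetical keyword-value store: one row per (dataset, keyword, value).
# Adding a new keyword needs no schema change.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (dataset TEXT, keyword TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO kv VALUES (?, ?, ?)",
    [
        ("ds1", "run", "1001"),
        ("ds1", "trigger", "muon"),
        ("ds2", "run", "1002"),
        ("ds2", "trigger", "electron"),
        ("ds2", "generator", "pythia"),
    ],
)


def promote(conn, keywords):
    """Pivot the chosen keywords into columns of a query-efficient table.

    Keywords not listed stay only in `kv`; missing values become NULL.
    Keyword names are used as column identifiers, so they must be trusted.
    """
    cols = ", ".join(f'"{k}" TEXT' for k in keywords)
    conn.execute(f"CREATE TABLE dataset (name TEXT PRIMARY KEY, {cols})")
    # One MAX(CASE ...) per keyword turns rows into columns (a classic pivot).
    picks = ", ".join("MAX(CASE WHEN keyword = ? THEN value END)" for _ in keywords)
    conn.execute(
        f"INSERT INTO dataset SELECT dataset, {picks} FROM kv GROUP BY dataset",
        keywords,
    )
    # Index the promoted columns so frequent queries avoid a full table scan.
    for k in keywords:
        conn.execute(f'CREATE INDEX "idx_dataset_{k}" ON dataset ("{k}")')


promote(conn, ["run", "trigger"])
print(conn.execute('SELECT name FROM dataset WHERE "trigger" = ?', ("muon",)).fetchall())
# [('ds1',)]
```

The slide leaves open which keywords to promote. Query statistics, such as those listed under Monitoring, are one plausible input to that choice.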
Deliverables • Documents from next steps • Use cases filtered for our group (draft) • Services: decomposition of the ER diagram into a collaboration diagram • Deployment architecture: enumerate problems • Monitoring: statistics on queries (accumulate/document) • Query languages/interfaces: survey of QLs (POOL, C&L) • Tools: wrap CORBA with XML • Longer-term deliverables
Schedules • Monthly meeting: last Tuesday of the month at 8:30/14:30/15:30; first meeting May 25. H.323: 8272634 • Mailing list (Paul)
Metadata for the Common Physicist
A working group on metadata with representatives from ATLAS, BaBar, CDF, CMS, D0, and LHCb, in cooperation with EGEE, has identified overlapping user requirements that may be supported by common service implementations. Classes of metadata specific to each service, and their relations, are described. These include a set of use cases based on a compilation of various HEP documents. These documents are used to inform the interfaces of existing and planned services, as described in the metadata schema. Emphasis is placed on the evolution of schema using keyword-value pairs that are then transformed into a normalised, performant database schema. We report on self-description mechanisms which, coupled with updating processes, allow the APIs to remain static as the schema evolves. We show how use cases drive performance. Requirements are presented for the physical and logical arrangement of service implementations, dictating the degree to which the databases containing the metadata may be distributed or centralised. A set of existing monitoring tools exposes the validity and completeness of the use cases for experiments in various stages of maturity. A survey of the query languages, web service interfaces and tools in use across the experiments is presented.
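The self-description point in this abstract can be read as follows. If the API discovers the schema from the database at run time instead of hard-coding column names, a column added during schema evolution needs no API change. This sketch assumes SQLite, where `PRAGMA table_info` returns one row per column. The `dataset` table and the `describe()` and `get_metadata()` functions are hypothetical. It illustrates the idea and is not the mechanism the group reports on.

```python
import sqlite3


def describe(conn, table):
    """Ask the database itself for the column names of `table`."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) rows.
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


def get_metadata(conn, table, key_column, key):
    """Static API (hypothetical): return every column of the matching row as a dict."""
    cols = describe(conn, table)
    row = conn.execute(
        f'SELECT * FROM "{table}" WHERE "{key_column}" = ?', (key,)
    ).fetchone()
    return dict(zip(cols, row)) if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (name TEXT PRIMARY KEY, run INTEGER)")
conn.execute("INSERT INTO dataset VALUES ('ds1', 1001)")
before = get_metadata(conn, "dataset", "name", "ds1")

# Schema evolution: a new column appears; callers of get_metadata are unchanged.
conn.execute("ALTER TABLE dataset ADD COLUMN luminosity REAL")
conn.execute("UPDATE dataset SET luminosity = 12.5 WHERE name = 'ds1'")
after = get_metadata(conn, "dataset", "name", "ds1")
print(before, after)
```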
Future • Work to deliverables • Meet according to deadlines • Workshops according to major deadlines
Use Cases • CDF5858: physicist use case (Rick) • HEPCAL II (Solveig, Tony) • Production • Analysis • ADA: ATLAS catalogs – David Adams (Steve) • D0: Wyatt • Schema update document: use cases? (Adam)
Services • Compare ARDA and SAM approaches: ARDA architecture (Gavin) • Given the use cases, define services • List services from SAM: services to services • Interfaces: the SAM service uses one schema; the Grid services are implemented in several schemas • Interfaces: physics catalog impact from failure of lower-level services; “file content status” • Action: outline models of access: physical/logical • Discrete or related bits of functionality: dependencies between services; performance implications for interfaces • Wyatt, Gavin, Rick, Julie
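The dependency questions above ask which services depend on which, and what a lower-level failure does to the physics catalog. Both can be modelled as a directed graph. This is a minimal sketch with hypothetical service names. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9 or later) to get a start-up order. A breadth-first walk over the reversed edges then lists every service affected when one fails.

```python
from collections import deque
from graphlib import TopologicalSorter

# Hypothetical service graph: each service maps to the lower-level services it calls.
DEPENDS_ON = {
    "physics_catalog": {"file_catalog", "replica_catalog"},
    "file_catalog": {"metadata_db"},
    "replica_catalog": {"metadata_db"},
    "metadata_db": set(),
    "monitoring": set(),
}


def start_order(graph):
    """Order in which services can be started: dependencies come first."""
    return list(TopologicalSorter(graph).static_order())


def impacted_by(graph, failed):
    """All services that directly or transitively depend on `failed`."""
    # Invert the edges so we can walk upward from the failed service.
    users = {name: set() for name in graph}
    for svc, deps in graph.items():
        for dep in deps:
            users.setdefault(dep, set()).add(svc)
    seen, queue = set(), deque([failed])
    while queue:
        for user in users.get(queue.popleft(), ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen


print(start_order(DEPENDS_ON))
print(sorted(impacted_by(DEPENDS_ON, "file_catalog")))  # ['physics_catalog']
```

The same inverted graph indicates where a "file content status" flag would have to be raised when a lower-level service goes down.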
Deployment Architectures • Where do the services run? Application servers? Tiers of applications and databases • Replication for high availability (HA): at what tier, application or DB? Oracle? Is it replication or mirroring? • What is the time constant for replication? • When do metadata become stale? Freshness date: status bits • Centralized catalogs as a single point of failure: what are the single points of failure? • HA strategies • Federation of metadata • Julie, Gavin, Paul, Solveig
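Freshness dates and status bits for replicated metadata can be sketched as a timestamp on each copy, compared against a maximum age. The `ReplicaEntry` structure and the 15-minute replication time constant are both assumptions made for illustration. The slide leaves the actual time constant as an open question.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

FRESH, STALE = "fresh", "stale"


@dataclass
class ReplicaEntry:
    """One metadata record as held by a replica (hypothetical structure)."""
    key: str
    value: str
    synced_at: datetime  # when this replica last copied the record from its source


def status(entry, now, max_age):
    """Derive the status bit: stale once the copy is older than max_age."""
    return STALE if now - entry.synced_at > max_age else FRESH


def freshest(entries):
    """Pick the most recently synchronised copy among several replicas."""
    return max(entries, key=lambda e: e.synced_at)


# Illustrative values only: the 15-minute time constant is an assumption.
now = datetime(2004, 5, 13, 12, 0, tzinfo=timezone.utc)
max_age = timedelta(minutes=15)
copies = [
    ReplicaEntry("ds1/run", "1001", now - timedelta(minutes=40)),
    ReplicaEntry("ds1/run", "1001", now - timedelta(minutes=5)),
]
print([status(c, now, max_age) for c in copies])  # ['stale', 'fresh']
```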
Tools • DB: JDBC, phpi, text, MySQL, mSQL, Oracle, XML, SOAP, Python • Dbserver • Tools on top of *sql • Relation to deployment architectures: direct DB access or via an application server • Replication • Data virtualization • Rick, Gavin, Solveig, Adam, Julie
Query Languages and Interfaces • SQL • Chains and Links (Rick) • General Dimensions (Wyatt) • Queries against multiple databases; related to deployment architecture (Dimensions, C&L, SBIR II/enth) • POOL (Carmine)
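SQLite's `ATTACH DATABASE` is one concrete way to run queries against multiple databases. It lets a single SQL statement join tables held in separate databases. This is only an illustration under that assumption. It does not show how Chains and Links, Dimensions or SBIR II work. The `files` schema, the table names, and the sites and paths are hypothetical sample data.

```python
import sqlite3

# Two independent databases: a physics (dataset) catalog and a file/replica catalog.
conn = sqlite3.connect(":memory:")                   # "main": physics metadata
conn.execute("ATTACH DATABASE ':memory:' AS files")  # a second, separate database

conn.execute("CREATE TABLE dataset (name TEXT PRIMARY KEY, trigger_name TEXT)")
conn.execute("CREATE TABLE files.replica (dataset TEXT, site TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?)", [("ds1", "muon"), ("ds2", "electron")]
)
# Illustrative sample rows only.
conn.executemany(
    "INSERT INTO files.replica VALUES (?, ?, ?)",
    [
        ("ds1", "FNAL", "/data/ds1.root"),
        ("ds1", "Glasgow", "/scratch/ds1.root"),
        ("ds2", "CERN", "/castor/ds2.root"),
    ],
)


def replicas_for_trigger(conn, trigger):
    """One SQL query joining tables that live in two different databases."""
    return conn.execute(
        """SELECT d.name, r.site, r.path
           FROM main.dataset AS d
           JOIN files.replica AS r ON r.dataset = d.name
           WHERE d.trigger_name = ?
           ORDER BY r.site""",
        (trigger,),
    ).fetchall()


print(replicas_for_trigger(conn, "muon"))
```

When the databases sit on different servers or use different engines, this single-process join is no longer available. That ties the query-language question back to the deployment architecture.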
Monitoring • SAM TV (Adam) • Mining and instrumenting (Caitriana) • MonALISA • File access patterns • Statistics
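The query statistics listed here could start as simple counts of which metadata keywords users actually query. Those counts also bear on which keywords to promote to a query-efficient schema, as raised in Next Steps. This sketch assumes a hypothetical log format, with one dict of keyword to value per query, and an arbitrary threshold. Real input would come from a monitoring tool's own logs.

```python
from collections import Counter


def keyword_usage(query_log):
    """Count how often each metadata keyword appears in logged queries.

    Each log entry is assumed (hypothetically) to be a dict of keyword -> value.
    """
    counts = Counter()
    for query in query_log:
        counts.update(query.keys())
    return counts


def hot_keywords(query_log, min_share):
    """Keywords used by at least `min_share` (0..1) of all logged queries."""
    total = len(query_log)
    if total == 0:
        return []
    usage = keyword_usage(query_log)
    return sorted(k for k, n in usage.items() if n / total >= min_share)


# Illustrative log only.
example_log = [
    {"run": "1001"},
    {"run": "1002", "trigger": "muon"},
    {"generator": "pythia"},
    {"run": "1003"},
]
print(hot_keywords(example_log, 0.5))  # ['run']
```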
Security • Table access in a distributed architecture • Server-to-server security • User access to the server • A standard certification protocol • VOMS • Spitfire security