ATLAS MetaData

ATLAS MetaData
AMI and Spitfire: Starting Point
Tony Doyle & Gavin McCance - University of Glasgow

AMI addresses…
• Use case 1: Search for SM Higgs samples (e.g., needed for b-tagging studies)
• Use case 2: Search for dijet samples with pile-up
• Use case 3: Search for pile-up production done at a particular site
• Doesn’t yet address file association/navigation issues on the Grid.
• Two Hierarchies:
  • Grid
  • Data

Events.. to Files.. to Events
[Diagram: Event 1, Event 2 and Event 3 mapped to data files at four tiers: RAW at Tier-0 (International), ESD at Tier-1 (National), AOD at Tier-2 (Regional), TAG at Tier-3 (Local).]
Not all pre-filtered events are interesting… Non pre-filtered events may be… File Replication Overhead. “Interesting Events List”

Spitfire
• Spitfire’s purpose is to allow secure access to a relational database using grid credentials. You can therefore use it wherever you might otherwise use JDBC (or ODBC) directly.
• It exposes all the basic operations you would expect of an RDBMS:
  • select, insert, update, bulk upload, delete
• The AMI client can use Spitfire as a plug-in to access its databases in a ‘grid-enabled’ way, i.e. using grid credentials.
• This relationship is discussed in the AMI document:
  • http://isnpx1158.in2p3.fr:8180/AMI/AMI/doc/pdf/WebServices_AMI.pdf

What Spitfire offers: security
• Grid authentication to the DB.
• Medium-grained authorisation on the DB.
  • At the method level, e.g. insert, update, select
  • Authz can be based upon standard grid-mapfiles, a list of certificates, or regular expressions matching certificates.
• Administration webpage (servlet-based) to remotely administer DB authorisation. There is also a client API for authorisation administration, if AMI prefers to do it via its own web-page-based admin tool.
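The method-level authorisation described above can be sketched as follows. This is a minimal illustration in Python, not Spitfire's actual implementation or API: the `MethodAcl` class, the `POLICY` table, and every certificate subject shown are hypothetical. It assumes each DB method carries a list of exact certificate subjects plus a list of regular expressions, and that anything not explicitly permitted is refused.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MethodAcl:
    """Hypothetical access list for one DB method (not Spitfire's real data model)."""
    subjects: set = field(default_factory=set)    # exact certificate subjects
    patterns: list = field(default_factory=list)  # regular expressions over subjects

    def permits(self, subject: str) -> bool:
        # An exact match on the certificate list wins first.
        if subject in self.subjects:
            return True
        # Otherwise the whole subject must match one of the patterns.
        return any(re.fullmatch(p, subject) for p in self.patterns)


# Illustrative policy only: the subjects and patterns are made up.
POLICY = {
    "select": MethodAcl(patterns=[r"/O=Grid/OU=example\.org/CN=.+"]),
    "insert": MethodAcl(subjects={"/O=Grid/OU=example.org/CN=Production Manager"}),
    "update": MethodAcl(subjects={"/O=Grid/OU=example.org/CN=Production Manager"}),
}


def is_authorised(subject: str, method: str, policy=POLICY) -> bool:
    """Deny by default: a method with no ACL entry is refused for everyone."""
    acl = policy.get(method)
    return acl is not None and acl.permits(subject)
```

With this policy, any subject under the example organisation may `select`, only the named production manager may `insert` or `update`, and `delete` is refused for everyone because no entry exists for it.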

What Spitfire offers: performance
• Server-side DB connection pooling to increase the robustness and performance of the service.
• The framework from which Spitfire has been built can run against an Oracle DB and inside Oracle 9iAS for better performance and service robustness (e.g. service clustering and failover).
  • This matters more the harder you hit the service.
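As a rough illustration of what server-side connection pooling means, the sketch below keeps a fixed number of open connections and lends them to requests. It is an assumption-laden stand-in: Spitfire itself sits where JDBC would, whereas this uses Python's standard `sqlite3` module, and the `ConnectionPool` class is hypothetical.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical fixed-size pool; sqlite3 stands in for the real RDBMS."""

    def __init__(self, database: str, size: int = 4):
        self._idle = queue.Queue(maxsize=size)
        # Open every connection once, up front, instead of once per request.
        for _ in range(size):
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout: float = 5.0):
        # Block until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the request failed.
            self._idle.put(conn)

    def idle_count(self) -> int:
        return self._idle.qsize()
```

A request would wrap its work in `with pool.connection() as conn:`. Bounding the pool size also caps the load placed on the database when the service is hit hard.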

VOMS authorisation
• Spitfire is VOMS ‘ready’ (VOMS: Virtual Organisation Membership Service).
• VOMS allows VO-wide authorisation control.
• VOMS certs are like grid certs with extra ‘tokens’ that Spitfire can inspect to grant extra privileges (or roles) on the service (e.g. a DB update role).
• When a user runs ‘voms-proxy-init’ they can request that these extra roles be added to their cert, provided they are permitted that role.

Development Work
• AMI wants to move to web services (and from there on to OGSI).
• The proposal is to leverage the experience of WP2 people in collaboration with the LPSC group (who already have a working relationship with the Spitfire team).
  • Focus would be on ATLAS end-user physics analysis in a Grid environment
  • Ease the move to web services and hence to OGSI
  • Experience in the team of deploying web services

Benefits I
• With the same team, AMI itself (the portal servlet and the client) gets the experience of the WP2 security modules:
  • Grid authentication, medium-grained authorisation, hooks for finer-grained authorisation if desired, standard VOMS integration, easy ATLAS authorisation administration.
• Integration of these into AMI itself (with or without Spitfire).

Benefits II
• Continued collaboration with Spitfire work.
  • In the move to the OGSA-based DAIS standard
  • OGSA-based DB replication
• Future work in Spitfire, allowing (power-)user-defined ‘canned’ metadata queries on the server side for more efficient, faster execution.
  • Allows AMI to maintain its interface to users, while optionally delegating some of its functionality to the DB server side for better performance.

First ideas..
• Define a ‘data bunch’ to be a number of events in a data file.
• A bunch will have a catalog-specific unique identifier (e.g. primary key)
• It may also have a user-friendly name (i.e. LFN)
  • Namespace issues (global namespaces, per-user namespaces?)
• Different event classes (RAW, ESD, AOD, TAG) will have different-sized bunches, i.e. differing numbers of events per file
  • The file size itself may or may not be different for different data sets.
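One way to picture the ‘data bunch’ defined above is as a small record. The sketch below is only a reading of these first ideas, not an agreed schema: the `Bunch` class and its field names are hypothetical, and it assumes a bunch covers a contiguous range of event numbers.

```python
from dataclasses import dataclass
from typing import Optional

EVENT_CLASSES = ("RAW", "ESD", "AOD", "TAG")


@dataclass(frozen=True)
class Bunch:
    """Hypothetical record for a data bunch: a number of events in one data file."""
    bunch_id: int               # catalog-specific unique identifier (e.g. primary key)
    event_class: str            # one of RAW, ESD, AOD, TAG
    first_event: int            # assumption: events in a bunch are contiguous
    n_events: int               # bunch granularity (# events per file)
    lfn: Optional[str] = None   # optional user-friendly name

    def __post_init__(self):
        if self.event_class not in EVENT_CLASSES:
            raise ValueError(f"unknown event class: {self.event_class}")
        if self.n_events <= 0:
            raise ValueError("a bunch must hold at least one event")

    @property
    def last_event(self) -> int:
        return self.first_event + self.n_events - 1

    def contains(self, event: int) -> bool:
        return self.first_event <= event <= self.last_event
```

For example, `Bunch(1, "TAG", 1, 10000, "LFN:atlas/higgs/gavtag01")` would describe a 10k-event TAG bunch whose `contains(561)` is true.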

Activities
• Bunch navigation using the MetaData Catalog (in ARDA speak)
  • This will solve the navigational issue.
  • How do I get from TAG → AOD to have a closer look?
• Bunch optimisation
  • Given the data type, what is the most efficient bunch granularity (# events per file) for different analysis use-cases?

1. Bunch Navigation
• Define a metadata entry per bunch (either in a MetaData catalog, or ‘on-board’ in the bunch itself).
  • Usual stuff: author, description.
  • Input tuple: event range, bunch
  • Processing information:
    • “How I was made from my input files”
    • Experiment and Analysis group specific; this should be fairly flexible.

Example
• 10k events of TAG data in a bunch with unique identifier “LFN:atlas/higgs/gavtag01”. Event 561 looks interesting.
• Go to the MetaData Catalog or look within the tag itself: look at the bunch metadata for “LFN:atlas/higgs/gavtag01”. The “files used to generate me” tuple:

Example
• It resolves the relevant AOD bunch to fetch. Then use its identifier (e.g. LFN) to access the file however you like
  • e.g. ask the ARDA File Catalog
• Location of the navigational metadata?
  • on board the tag data files or in the Metadata Catalog?
  • use-case dependent: there’s a difference between searching for 1 event in 100 and 1 in a million
• The metadata also contains enough info to regenerate the bunch from its input files
  • cf. data virtualisation
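The TAG-to-AOD lookup in this example can be sketched as a search over input tuples. Everything here is illustrative: the `InputTuple` and `BunchMetadata` classes, the event ranges, and the AOD identifiers under `LFN:hypothetical/` are invented for the sketch. Only the TAG identifier and event 561 come from the example itself.

```python
from dataclasses import dataclass, field


@dataclass
class InputTuple:
    """One "files used to generate me" entry: an event range and its input bunch."""
    first_event: int
    last_event: int
    source: str  # identifier (e.g. LFN) of the input bunch


@dataclass
class BunchMetadata:
    author: str
    description: str
    inputs: list = field(default_factory=list)


# Hypothetical catalog contents; the AOD names and ranges are made up.
CATALOG = {
    "LFN:atlas/higgs/gavtag01": BunchMetadata(
        author="(unspecified)",
        description="TAG data, 10k events",
        inputs=[
            InputTuple(1, 500, "LFN:hypothetical/aod/bunch01"),
            InputTuple(501, 1000, "LFN:hypothetical/aod/bunch02"),
        ],
    ),
}


def resolve_input(catalog: dict, bunch_id: str, event: int) -> str:
    """Return the identifier of the input bunch that produced `event`."""
    for entry in catalog[bunch_id].inputs:
        if entry.first_event <= event <= entry.last_event:
            return entry.source
    raise LookupError(f"event {event} has no recorded input in {bunch_id}")
```

Calling `resolve_input(CATALOG, "LFN:atlas/higgs/gavtag01", 561)` returns the second hypothetical AOD bunch, whose identifier would then be handed to a file catalog. The same lookup works whether the metadata lives in a catalog or on board the TAG file, since only the source of the `inputs` list changes.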

2. Bunch Optimisation
• RAW and ESD bunch granularity (# events per file) must be decided upon, based upon typical production use-cases.
• AOD and TAG creation is more analysis-use-case specific: the objective is to optimise the bunch file size for each type, depending upon the analysis use-case.
  • Based upon storage and networking constraints and file access patterns
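The trade-off behind bunch granularity can be made concrete with some back-of-envelope arithmetic. The sketch below rests on stated assumptions rather than ATLAS figures: the function names are hypothetical, event and file sizes are whatever the caller supplies, and interesting events are assumed to be scattered uniformly at random across bunches (sampling with replacement, which is a good approximation when the selection is sparse).

```python
def events_per_file(target_file_bytes: int, event_bytes: int) -> int:
    """Bunch granularity that fills a file of the target size (at least 1 event)."""
    return max(1, target_file_bytes // event_bytes)


def expected_bunches_hit(n_selected: int, n_bunches: int) -> float:
    """Expected number of distinct bunches holding at least one selected event.

    Assumes each selected event lands in a bunch uniformly at random.
    """
    return n_bunches * (1.0 - (1.0 - 1.0 / n_bunches) ** n_selected)


def transfer_overhead(n_selected: int, n_bunches: int, events_per_bunch: int) -> float:
    """Events that must be moved per event actually wanted, if whole files are fetched."""
    moved = expected_bunches_hit(n_selected, n_bunches) * events_per_bunch
    return moved / n_selected
```

Under these assumptions, fetching a single interesting event from bunches of 1000 events moves about 1000 events for every one wanted, while a dense selection approaches an overhead of 1. Smaller bunches lower the replication overhead for sparse selections, at the cost of more files to catalogue and open.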

Summary
• AMI developers have identified a need for web services
• Glasgow developers can provide these
• Need to amalgamate/test these ideas
  • This requires additional effort
  • Would place the UK in a good position to exploit ATLAS datasets via the Grid
• Feedback required from this meeting on these first ideas
  • If positive, we will define the deliverables with the AMI and Spitfire project developers
