ATLAS MetaData AMI and Spitfire: Starting Point Tony Doyle & Gavin McCance - University of Glasgow
AMI addresses…
• Use case 1: Search for SM Higgs samples (e.g., needed for b-tagging studies)
• Use case 2: Search for dijet samples with pile-up
• Use case 3: Search for pile-up production done at a particular site
• Doesn’t yet address file association/navigation issues on the Grid.
• Two Hierarchies:
  • Grid
  • Data
Events.. to Files.. to Events
[Figure: Event 1, Event 2 and Event 3 mapped onto data files at four levels: RAW at Tier-0 (International), ESD at Tier-1 (National), AOD at Tier-2 (Regional), TAG at Tier-3 (Local).]
• Not all pre-filtered events are interesting…
• Non pre-filtered events may be…
• File Replication Overhead.
• “Interesting Events List”
Spitfire
• Spitfire’s purpose is to allow secure access to a relational database using grid credentials. You can therefore use it wherever you might otherwise use JDBC (or ODBC) directly.
• It exposes all the basic operations that you would want to perform on an RDBMS:
  • select, insert, update, bulk upload, delete
• The AMI client can use Spitfire as a plug-in to access its databases in a ‘grid-enabled’ way, i.e. using grid credentials.
• This relationship is discussed in the AMI document:
  • http://isnpx1158.in2p3.fr:8180/AMI/AMI/doc/pdf/WebServices_AMI.pdf
What Spitfire offers: security
• Grid authentication to the DB.
• Medium-grained authorisation on the DB.
  • At the method level, e.g. insert, update, select.
  • Authorisation can be based upon standard grid-mapfiles, a list of certificates, or regular expressions matching certificates.
• Administration webpage (servlet based) to remotely administer DB authorisation. There is also a client API for authorisation administration, if AMI prefers to do it via its own web-page based admin tool.
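The method-level authorisation described above can be sketched as follows. This is a minimal illustration only: the rule table, the function name `is_authorised` and the certificate subjects are hypothetical, and none of it is Spitfire's actual configuration format or API.

```python
import re

# Hypothetical rule table: operation -> regular expressions that a
# certificate subject (DN) must fully match to be allowed that operation.
RULES = {
    "select": [r"/O=Grid/O=UKHEP/OU=.*/CN=.*"],
    "insert": [r"/O=Grid/O=UKHEP/OU=gla\.ac\.uk/CN=.*"],
    "update": [r"/O=Grid/O=UKHEP/OU=gla\.ac\.uk/CN=admin .*"],
}


def is_authorised(subject_dn: str, operation: str) -> bool:
    """Return True if any pattern registered for the operation matches the whole DN."""
    patterns = RULES.get(operation, [])  # operations with no rule are denied
    return any(re.fullmatch(p, subject_dn) for p in patterns)
```

Denying any operation that has no rule keeps the default closed, so a newly exposed method is not accidentally open to every certificate holder.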
What Spitfire offers: performance
• Server-side DB connection pooling to increase the robustness and performance of the service.
• The framework from which Spitfire has been built can run against an Oracle DB and inside Oracle 9iAS for better performance and service robustness (e.g. service clustering and failover).
• This matters more the harder you hit the service.
VOMS authorisation
• Spitfire is VOMS ‘ready’ (VOMS: Virtual Organisation Membership Service).
• VOMS allows VO-wide authorisation control.
• VOMS certs are like grid certs with extra ‘tokens’ that Spitfire can inspect to grant extra privileges (or roles) on the service (e.g. a DB update role).
• When a user runs ‘voms-proxy-init’ they can request that these extra roles be added to their cert, provided they are permitted that role.
Development Work
• AMI has a desire to move to web services (and from there on to OGSI).
• The proposal would be to leverage the experience of WP2 people in collaboration with the LPSC group (who already have a working relationship with the Spitfire team).
• Focus would be on ATLAS end-user physics analysis in a Grid environment.
• Ease the move to web services and hence to OGSI.
• Experience in the team of deploying web services.
Benefits I
• With the same team, AMI itself (the portal servlet and the client) gets the experience of the WP2 security modules:
  • Grid authentication, medium-grained authorisation, hooks for finer-grained authorisation if desired, standard VOMS integration, easy ATLAS authorisation administration.
• Integration of these into AMI itself (with or without Spitfire).
Benefits II
• Continued collaboration with Spitfire work:
  • In the move to the OGSA-based DAIS standard
  • OGSA-based DB replication
• Future work in Spitfire, allowing (power-)user-defined ‘canned’ metadata queries on the server side for more efficient and faster execution.
  • Allows AMI to maintain its interface to users, while optionally delegating some of its functionality to the DB server side for better performance.
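One way to picture a server-side ‘canned’ query is a registry of named, parameterised SQL statements that clients invoke by name. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table layout, the query name `datasets_by_site` and the sample rows are all hypothetical and do not describe AMI's schema or any planned Spitfire interface.

```python
import sqlite3

# Hypothetical registry of named, pre-defined queries held on the server.
CANNED_QUERIES = {
    "datasets_by_site": "SELECT name FROM dataset WHERE site = ? ORDER BY name",
}


def run_canned(conn, query_name, params=()):
    """Run a registered query by name; unknown names raise KeyError,
    so clients cannot submit arbitrary SQL."""
    sql = CANNED_QUERIES[query_name]
    return conn.execute(sql, params).fetchall()


def demo_connection():
    """Build an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (name TEXT, site TEXT)")
    conn.executemany(
        "INSERT INTO dataset VALUES (?, ?)",
        [("dijet_pileup", "Glasgow"), ("higgs_sm", "Glasgow"), ("pileup_prod", "CERN")],
    )
    return conn
```

Calling `run_canned(demo_connection(), "datasets_by_site", ("Glasgow",))` would return the rows registered for that site; the client supplies only parameters, while the query text stays on the server.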
First ideas..
• Define a ‘data bunch’ to be a number of events in a data file.
• A bunch will have a catalog-specific unique identifier (e.g. primary key).
• It may also have a user-friendly name (i.e. LFN).
  • Namespace issues (global namespaces, per-user namespaces?)
• Different event classes (RAW, ESD, AOD, TAG) will have different sized bunches, i.e. differing numbers of events per file.
• The file size itself may or may not be different for different data sets.
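As a rough illustration of the ‘data bunch’ idea, the record below holds the fields listed above. The class and field names are hypothetical, and the sketch assumes a bunch covers one contiguous, inclusive event range, which the slides do not state.

```python
from dataclasses import dataclass
from typing import Optional

EVENT_CLASSES = ("RAW", "ESD", "AOD", "TAG")


@dataclass(frozen=True)
class Bunch:
    bunch_id: int               # catalog-specific unique identifier (e.g. primary key)
    event_class: str            # one of RAW, ESD, AOD, TAG
    first_event: int            # assumption: events form a contiguous range
    last_event: int             # inclusive upper bound of that range
    lfn: Optional[str] = None   # optional user-friendly name

    def __post_init__(self):
        if self.event_class not in EVENT_CLASSES:
            raise ValueError(f"unknown event class: {self.event_class}")
        if self.last_event < self.first_event:
            raise ValueError("last_event must not precede first_event")

    @property
    def n_events(self) -> int:
        """Number of events in the bunch (bunch granularity)."""
        return self.last_event - self.first_event + 1

    def contains(self, event: int) -> bool:
        """True if the event number falls inside this bunch."""
        return self.first_event <= event <= self.last_event
```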
Activities
• Bunch navigation using the MetaData Catalog (in ARDA speak)
  • This will solve the navigational issue.
  • How do I get from TAG to AOD to have a closer look?
• Bunch optimisation
  • Given the data type, what is the most efficient bunch granularity (# events per file) for different analysis use-cases?
1. Bunch Navigation
• Define a metadata entry per bunch (either in a MetaData catalog, or ‘on-board’ in the bunch itself).
• Usual stuff: author, description.
• Input tuple: event range, bunch
• Processing information:
  • “How I was made from my input files”
  • Experiment and analysis-group specific; this should be fairly flexible.
Example
• 10k events of TAG data in a bunch with unique identifier “LFN:atlas/higgs/gavtag01”. Event 561 looks interesting.
• Go to the MetaData Catalog or look within the tag itself: look at the bunch metadata for “LFN:atlas/higgs/gavtag01”. The “files used to generate me” tuple:
Example
• The tuple resolves the relevant AOD bunch to fetch. Its identifier (e.g. LFN) can then be used to access the file however you like
  • e.g. ask the ARDA File Catalog
• Location of the navigational metadata?
  • on board the tag data files or in the Metadata Catalog?
  • use-case dependent: there’s a difference between searching for 1 event in 100 and 1 in a million
• The metadata also contains enough info to regenerate the bunch from its input files
  • cf. data virtualisation
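The lookup in this example can be sketched as a search over (event range, parent bunch) tuples. The tag LFN is the one quoted in the example; the parent AOD names, the event ranges and the function name `resolve_parent` are made up for illustration, since the actual tuple is not reproduced here.

```python
from bisect import bisect_right

# Hypothetical "files used to generate me" metadata for one TAG bunch:
# (first_event, last_event, parent AOD identifier), sorted by first_event.
TAG_METADATA = {
    "LFN:atlas/higgs/gavtag01": [
        (1, 500, "LFN:atlas/higgs/example-aod01"),
        (501, 1000, "LFN:atlas/higgs/example-aod02"),
        (1001, 10000, "LFN:atlas/higgs/example-aod03"),
    ],
}


def resolve_parent(tag_lfn: str, event: int) -> str:
    """Return the identifier of the AOD bunch that the given event came from."""
    inputs = TAG_METADATA[tag_lfn]
    starts = [first for first, _, _ in inputs]
    i = bisect_right(starts, event) - 1   # last range starting at or before event
    if i < 0:
        raise LookupError(f"event {event} precedes all input ranges")
    _, last, parent = inputs[i]
    if event > last:
        raise LookupError(f"event {event} is not covered by any input range")
    return parent


if __name__ == "__main__":
    print(resolve_parent("LFN:atlas/higgs/gavtag01", 561))
```

The returned identifier would then be handed to a file catalog to locate a physical replica. A binary search over sorted ranges is one option when the tuples live on board the tag file; a catalog-based variant would more likely issue an indexed database query instead.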
2. Bunch Optimisation
• RAW and ESD bunch granularity (# events per file) must be decided upon, based on typical production use-cases.
• AOD and TAG creation is more analysis-use-case specific: the objective is to optimise the bunch file size for each type depending on the analysis use-case.
• Based upon storage and networking constraints and file access patterns.
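The granularity trade-off can be made concrete with some simple arithmetic. In the sketch below the per-event sizes and the target file size are illustrative assumptions, not figures from these slides, and the helper names are hypothetical.

```python
# Illustrative per-event sizes in bytes (assumed values, not from the slides).
EVENT_SIZE_BYTES = {
    "RAW": 1_600_000,
    "ESD": 500_000,
    "AOD": 100_000,
    "TAG": 1_000,
}


def events_per_file(event_class: str, target_file_bytes: int) -> int:
    """Largest whole number of events that fits in the target file size (at least 1)."""
    return max(1, target_file_bytes // EVENT_SIZE_BYTES[event_class])


def files_needed(n_events: int, per_file: int) -> int:
    """Number of files required to hold n_events, using ceiling division."""
    return -(-n_events // per_file)
```

With a 2 GB target under these assumed sizes, `events_per_file("RAW", 2_000_000_000)` gives 1250 events per file, so fetching a single interesting RAW event means moving a file that holds 1249 others. This is the kind of replication overhead that the choice of bunch size has to balance against catalog size and access patterns.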
Summary
• AMI developers have identified a need for web services
  • Glasgow developers can provide these
• Need to amalgamate/test these ideas
  • This requires additional effort
  • Would place the UK in a good position to exploit ATLAS datasets via the Grid
• Feedback required from this meeting on these first ideas
  • If positive, we will define the deliverables with the AMI and Spitfire project developers