skimData and Replica Catalogue Alessandra Forti BaBar Collaboration Meeting November 13th 2002

skimData and Replica Catalogue, Alessandra Forti, BaBar Collaboration Meeting, November 13th 2002 • skimData-based replica catalogue • RLS (Replica Location Service) • SkimTools development • Conclusions

Replica Catalogue Intro • The central metadata catalogue is replicated at each site • All the entries are copied; updates are incremental • Local metadata catalogues are used: • For local data management • Each entry (collection) has a status flag: selected for import, imported, backed up, deleted, … • For user queries through skimData • Users can query their local catalogue • Users can query other metadata catalogues to find out what is at other sites. User queries might take some time (minutes), and at present the same query has to be repeated at each site. Hence the need for a replica system.

Replica Catalogue (1) • The replica catalogue consists of two additional tables • They contain the indices of what is at other sites • The data query is now in two steps: • Users make their query once, against their local database, with skimData. If skimData detects the presence of the replica tables, it returns all the entries corresponding to the selected stream, not only the ones on local disk. • The resulting tcl file is then fed to another script (bbmkidx) that checks the selected collections against the replica tables and creates a set of tcl files. This operation takes only a few seconds. The new tcl files are sequentially numbered and can contain collections that are at more than one site.
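As an illustration of the second step, the check against the replica tables can be thought of as grouping the selected collections by the set of sites that hold them. The sketch below is not the bbmkidx code: the function name `group_by_sites`, the in-memory `replica_index` dictionary standing in for the replica tables, and the collection names are all hypothetical.

```python
from collections import defaultdict


def group_by_sites(collections, replica_index):
    """Group collection names by the (sorted) tuple of sites holding a replica.

    replica_index is a hypothetical stand-in for the replica tables:
    it maps a collection name to the set of sites where it is available.
    """
    groups = defaultdict(list)
    for name in collections:
        # Sort the sites so that the same set always yields the same key.
        sites = tuple(sorted(replica_index.get(name, ())))
        groups[sites].append(name)
    return dict(groups)


# Hypothetical replica information for three collections.
replica_index = {
    "coll_a": {"BABAR_RAL", "BABAR_RHUL"},
    "coll_b": {"BABAR_RAL"},
    "coll_c": {"BABAR_RHUL", "BABAR_RAL"},
}

groups = group_by_sites(["coll_a", "coll_b", "coll_c"], replica_index)
for sites, names in groups.items():
    print(",".join(sites), names)
```

Each group could then be written out as one or more sequentially numbered tcl files; how the real script splits and numbers them is not described on the slide.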

Replica Catalogue (2) bbmkidx also creates an index file with the following format:
BABAR_RAL,BABAR_RHUL,BABAR_MANCHESTER 0003,0005,0006
BABAR_BRISTOL,BABAR_QMW 0001,0002
BABAR_RAL 0004
where 000N is the sequence number of a tcl file and BABAR_SITE is the name of the site as understood by the GIIS, known as the resource specifier. The index file is then taken by a job submission script, which transforms it into an appropriate jdl file used to submit jobs to the EDG resource broker (see Janusz's talk). The code is in cvs.
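A job submission script would first have to read the index file back. The following is a minimal sketch of such a parser, assuming one entry per line with the comma-separated site list and the comma-separated tcl sequence numbers separated by whitespace; that layout is inferred from the example on the slide, and `parse_index` is a hypothetical name, not part of the BaBar tools.

```python
def parse_index(text):
    """Parse index-file text into a list of (sites, tcl_numbers) pairs.

    Assumed layout (inferred from the slide, not confirmed):
        SITE_A,SITE_B 0001,0002
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sites_field, files_field = line.split()
        entries.append((sites_field.split(","), files_field.split(",")))
    return entries


INDEX_TEXT = """\
BABAR_RAL,BABAR_RHUL,BABAR_MANCHESTER 0003,0005,0006
BABAR_BRISTOL,BABAR_QMW 0001,0002
BABAR_RAL 0004
"""

entries = parse_index(INDEX_TEXT)
for sites, tcl_numbers in entries:
    print(f"tcl files {tcl_numbers} can run at any of {sites}")
```

The sequence numbers are kept as strings so that the leading zeros, which presumably appear in the tcl file names, are preserved.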

RLS Replica Location Service (1) The next step will be the integration with RLS • RLS is the EDG/Globus replica location service and is going to replace the LDAP-based replica catalog • RLS is a system based on relational databases (MySQL) • It is distributed rather than centralized • It consists of two levels of information: • LRC (Local Replica Catalog), holding local replica or PFN (Physical File Name) information • RLI (Replica Location Index), containing pointers to the different LRCs for each LFN (Logical File Name) • It doesn't contain any metadata.
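The two levels of information can be pictured with a small in-memory model. This is only a conceptual sketch of the LRC/RLI split described above, not the RLS API: the class and method names, the site names used as catalogue labels, and the file names are all hypothetical, and the real service stores this information in relational databases.

```python
class LocalReplicaCatalog:
    """Hypothetical LRC: maps an LFN to the PFNs held at one site."""

    def __init__(self, site):
        self.site = site
        self._pfns = {}

    def add(self, lfn, pfn):
        self._pfns.setdefault(lfn, []).append(pfn)

    def lookup(self, lfn):
        return list(self._pfns.get(lfn, []))


class ReplicaLocationIndex:
    """Hypothetical RLI: maps an LFN to the LRCs that know about it."""

    def __init__(self):
        self._lrcs = {}

    def register(self, lrc, lfn):
        self._lrcs.setdefault(lfn, []).append(lrc)

    def locate(self, lfn):
        # Step 1: find the LRCs pointed to by the index.
        # Step 2: ask each of those LRCs for its physical file names.
        return {lrc.site: lrc.lookup(lfn) for lrc in self._lrcs.get(lfn, [])}


ral = LocalReplicaCatalog("BABAR_RAL")
rhul = LocalReplicaCatalog("BABAR_RHUL")
rli = ReplicaLocationIndex()

# Hypothetical logical and physical file names.
ral.add("lfn:coll_a", "/ral/disk1/coll_a.root")
rhul.add("lfn:coll_a", "/rhul/data/coll_a.root")
rli.register(ral, "lfn:coll_a")
rli.register(rhul, "lfn:coll_a")

locations = rli.locate("lfn:coll_a")
print(locations)
```

Note that neither level carries metadata about the contents of a file, which is why a metadata catalogue such as the one behind skimData is still needed alongside it.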

SkimTools development (1) • User requirements such as: • Select the type of data, i.e. run1, run2… or SP3, SP4, SP5… • Find a way to deal with transition phases, e.g. when there is a change of tag • Produce tcl output with the exact number of events requested, even if that is fewer than the number of events in the file • Set the kanga condition files automatically in the tcl file without the release selection • Organize collections according to their condalias • Data quality information • Optimization of the queries • Import and data management requirements • Which file system a file is on (Kanga) • Possible association of each collection with the backed-up tcl file (under discussion) • Distinction between types of collections: micro, mini, pointer, non-pointer

SkimTools development (2) • General problems to be solved (with the production people) • release precedences • block constants • the input release is different from the one in the input_name in skim_requests • jobs are deleted and reinserted with duplicated keys, a problem for the mirror • number of input/output events • Integration with other catalogs • RLS • Newcolldb • Objectivity requirements • Check the existence of the collections • Which sub-federation a collection belongs to

Conclusion • There is a BaBar-specific replica catalogue, integrated with the EDG-type job submission, that needs volunteers to try it. • The integration with the EDG RLS is under study. • If you have any requirements for skimData improvements, please let me know.
