  1. skimData and Replica Catalogue
  Alessandra Forti
  BaBar Collaboration Meeting, November 13th 2002
  • skimData based replica catalogue
  • RLS (Replica Location Services)
  • SkimTools development
  • Conclusions

  2. Replica Catalogue Intro
  • Central metadata catalogue is replicated at each site
  • All the entries are copied; updates are incremental
  • Local metadata catalogues are used:
    • For local data management. Each entry (collection) has a status flag: selected for import, imported, backed up, deleted, ... (a schematic sketch follows this slide)
    • For user queries through skimData. Users can query their local catalogue, and they can query other sites' metadata catalogues to find out what is held elsewhere. Such queries might take some time (minutes), and as it is now the same query has to be repeated at each site; hence the need for a replica system.
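  A schematic sketch of the status-flag bookkeeping described above. Table, column, and flag names are illustrative assumptions, not skimData's actual schema, and SQLite here merely stands in for whatever database backend a site actually runs:

    # Hypothetical sketch: a local catalogue entry carrying a status flag.
    # Names are invented for illustration; this is not skimData's schema.
    import sqlite3  # stand-in for the site's real database backend

    STATUS_FLAGS = ("selected_for_import", "imported", "backed_up", "deleted")

    conn = sqlite3.connect("local_catalogue.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS collections (
            name   TEXT PRIMARY KEY,   -- collection (logical) name
            stream TEXT,               -- physics stream it belongs to
            status TEXT                -- one of STATUS_FLAGS above
        )
    """)

    # A user-style query: everything in a given stream already on local disk.
    rows = conn.execute(
        "SELECT name FROM collections WHERE stream = ? AND status = 'imported'",
        ("stream3",),  # hypothetical stream name
    ).fetchall()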

  3. Replica Catalogue (1)
  • The replica catalogue consists of two additional tables
  • They contain the indices of what is held at other sites
  • The data query is now a two-step process:
    • Users make their query once, against their local database with skimData. If skimData detects the presence of the replica tables, it returns all the entries corresponding to the selected stream, not only the ones on local disk.
    • The resulting tcl file is then fed to another script (bbmkidx) that checks the selected collections against the replica tables and creates a set of tcl files. This operation takes only a few seconds. The new tcl files are sequentially numbered and can contain collections that may be held at more than one site (see the grouping sketch below).
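  A minimal sketch of the grouping step bbmkidx performs, under the assumption that the replica tables can be viewed as a mapping from collection name to the set of sites holding it. Function and variable names are invented for illustration, not taken from the real bbmkidx code:

    # Partition the selected collections by the set of sites holding them,
    # then number each group 0001, 0002, ... as the tcl files are numbered.
    from collections import defaultdict

    def group_by_sites(selected, replica_table):
        """selected: collection names from the skimData tcl output.
        replica_table: dict mapping collection name -> set of site names."""
        groups = defaultdict(list)
        for coll in selected:
            sites = frozenset(replica_table.get(coll, ()))
            if sites:  # skip collections not known at any site
                groups[sites].append(coll)
        numbered = {}
        for i, (sites, colls) in enumerate(
                sorted(groups.items(), key=lambda kv: sorted(kv[0])), start=1):
            numbered[f"{i:04d}"] = (sorted(sites), colls)
        return numbered
        # e.g. {"0001": (["BABAR_BRISTOL", "BABAR_QMW"], [...]), ...}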

  4. Replica Catalogue (2)
  • bbmkidx also creates an index file with the following format:

      BABAR_RAL,BABAR_RHUL,BABAR_MANCHESTER 0003,0005,0006
      BABAR_BRISTOL,BABAR_QMW 0001,0002
      BABAR_RAL 0004

    where 000N is the sequence number of a tcl file and BABAR_SITE is the name of the site as understood by the GIIS, known as the resource specifier.
  • The index file is then taken by a job submission script that transforms it into an appropriate jdl file, used to submit jobs to the EDG resource broker (see Janusz's talk). A parsing sketch follows this slide.
  • The code is in cvs.
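  The format above is regular enough that a sketch of the parsing step is easy to give. This illustrates reading the index file only; it is not the real submission script, and generating the jdl itself is left out:

    # Read the bbmkidx index file: each line holds a comma-separated site
    # list, whitespace, then the sequence numbers of tcl files held at
    # exactly those sites. Illustrative code, not the actual script.
    def parse_index(path):
        mapping = {}  # tcl sequence number -> list of candidate sites
        with open(path) as fh:
            for line in fh:
                if not line.strip():
                    continue
                sites_field, seqs_field = line.split()
                sites = sites_field.split(",")
                for seq in seqs_field.split(","):
                    mapping[seq] = sites
        return mapping

    # e.g. parse_index("index")["0003"]
    #   == ["BABAR_RAL", "BABAR_RHUL", "BABAR_MANCHESTER"]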

  5. RLS Replica Location Service (1)
  The next step will be the integration with RLS:
  • RLS is the EDG/Globus replica location service; it is going to replace the LDAP-based replica catalog
  • RLS is a system based on relational databases (MySQL)
  • It is distributed rather than centralized
  • It consists of two levels of information:
    • LRC (Local Replica Catalog), holding local replica, i.e. PFN (Physical File Name), information
    • RLI (Replica Location Index), holding pointers to the different LRCs for each LFN (Logical File Name)
  • It doesn't contain any metadata. (A toy model of the two-level lookup follows this slide.)
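  A toy model of the two-level lookup, to make the LRC/RLI division concrete. Class and method names here are invented; the real RLS exposes this structure through its own interfaces:

    # Two-level resolution: the RLI maps an LFN to the LRCs that know it,
    # and each LRC maps the LFN to concrete physical replicas at one site.
    class LRC:
        """Local Replica Catalog: LFN -> physical file names at one site."""
        def __init__(self, site, entries):
            self.site = site
            self.entries = entries  # dict: LFN -> list of PFNs

        def pfns(self, lfn):
            return self.entries.get(lfn, [])

    class RLI:
        """Replica Location Index: LFN -> the LRCs holding replicas."""
        def __init__(self, lrcs):
            self.index = {}
            for lrc in lrcs:
                for lfn in lrc.entries:
                    self.index.setdefault(lfn, []).append(lrc)

        def locate(self, lfn):
            # Step 1: index lookup; step 2: per-site PFN lookup.
            return {lrc.site: lrc.pfns(lfn) for lrc in self.index.get(lfn, [])}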

  6. SkimTools development (1)
  • User requirements, such as:
    • Select the type of data, i.e. run1, run2, ... or SP3, SP4, SP5, ...
    • Find a way to deal with transition phases, i.e. when there is a change of tag
    • Have the tcl output contain the exact number of events required, even when that is fewer than the number of events in the file (see the sketch after this list)
    • Set the kanga conditions files automatically in the tcl file, without the release selection
    • Organize collections depending on their condalias
    • Data quality information
    • Optimization of the queries
  • Import and data management requirements:
    • Which file system a file is on (Kanga)
    • Possible association of each collection with the backed-up tcl file (under discussion)
    • Distinction between types of collections: micro, mini, pointer, non-pointer
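  As an illustration of the "exact number of events" requirement above, a sketch of one way the selection could behave. This is a hypothetical function showing the requested behaviour, not existing skimData code:

    # Accumulate whole collections until the request is met, then record how
    # many events of the last collection are actually needed.
    def select_events(collections, n_requested):
        """collections: list of (name, n_events) pairs in catalogue order."""
        chosen, remaining = [], n_requested
        for name, n_events in collections:
            if remaining <= 0:
                break
            take = min(n_events, remaining)  # may be less than the whole file
            chosen.append((name, take))
            remaining -= take
        return chosen  # e.g. [("coll_a", 50000), ("coll_b", 12345)]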

  7. SkimTools development (2)
  • General problems to be solved (with the production people):
    • release precedences
    • block constants
    • the input release differs from the one in the input_name in skim_requests
    • jobs are deleted and reinserted with duplicated keys: a problem for the mirror
    • number of input/output events
  • Integration with other catalogs:
    • RLS
    • Newcolldb
  • Objectivity requirements:
    • Check the existence of the collections
    • Which sub-federation the collection belongs to

  8. Conclusion
  • There is a BaBar-specific replica catalogue, integrated with the EDG-style job submission, that needs volunteers to try it.
  • The integration with the EDG RLS is under study.
  • If you have any requirements for skimData improvements, please let me know.
