ATLAS Distributed Data Management
Simone Campana
ATLAS DDM (DQ2)
• Moves from a file-based system to one based on datasets
  • Hides file-level granularity from users
  • A hierarchical structure makes cataloging more manageable
  • However, file-level access is still possible
• Scalable global data discovery and access via a catalog hierarchy
• No global physical file replica catalog
  • but a global dataset replica catalog and a global dataset location catalog
[Figure: sites hold datasets, and each dataset groups a set of files]
‘Global’ catalogs
• Dataset Repository: holds all dataset names and unique IDs (+ system metadata)
• Dataset Hierarchy: maintains versioning information and information on ‘container datasets’, i.e. datasets consisting of other datasets
• Dataset Content Catalog: maps each dataset to its constituent files
  • This one holds information on every logical file, so it must be highly scalable; however, it can be highly partitioned using metadata etc.
• Dataset Location Catalog: stores the locations of each dataset
• All are logically global but may be physically distributed
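To make the division of labour between the four global catalogs concrete, here is a minimal in-memory sketch. It is not the DQ2 implementation or API: the class `GlobalCatalogs`, its attributes and its methods are hypothetical, versioning is left out, and a dataset's files are keyed by GUID with the LFN as value.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical in-memory model of the four logically global catalogs.
# Class, attribute and method names are illustrative, not the DQ2 API.
# Versioning (also kept by the Dataset Hierarchy) is omitted for brevity.


@dataclass
class GlobalCatalogs:
    # Dataset Repository: dataset name -> unique ID
    repository: dict = field(default_factory=dict)
    # Dataset Hierarchy: container dataset name -> member dataset names
    hierarchy: dict = field(default_factory=dict)
    # Dataset Content Catalog: dataset ID -> {guid: lfn}
    content: dict = field(default_factory=dict)
    # Dataset Location Catalog: dataset ID -> set of site names
    locations: dict = field(default_factory=dict)

    def register_dataset(self, name):
        duid = str(uuid.uuid4())
        self.repository[name] = duid
        self.content[duid] = {}
        self.locations[duid] = set()
        return duid

    def add_files(self, name, files):
        self.content[self.repository[name]].update(files)

    def add_location(self, name, site):
        self.locations[self.repository[name]].add(site)

    def register_container(self, container, members):
        self.hierarchy[container] = list(members)

    def files_in(self, name):
        """Resolve a dataset, or a container of datasets, to {guid: lfn}."""
        if name in self.hierarchy:
            merged = {}
            for member in self.hierarchy[name]:
                merged.update(self.files_in(member))
            return merged
        return dict(self.content[self.repository[name]])

    def sites_of(self, name):
        return set(self.locations[self.repository[name]])


# Usage with made-up dataset, file and site names
cat = GlobalCatalogs()
cat.register_dataset("dsA")
cat.add_files("dsA", {"guid1": "lfn1", "guid2": "lfn2"})
cat.register_dataset("dsB")
cat.add_files("dsB", {"guid3": "lfn3"})
cat.add_location("dsA", "CNAF")
cat.register_container("cont", ["dsA", "dsB"])
print(sorted(cat.files_in("cont")))  # ['guid1', 'guid2', 'guid3']
print(cat.sites_of("dsA"))           # {'CNAF'}
```

Only the content catalog grows with the number of files, which is why the slide singles it out as the one that must scale and be partitioned.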
‘Local’ Catalogs
• Local Replica Catalog: one per grid/site/tier, providing the logical-to-physical file name mapping
  • In LCG, the local replica catalog is the LCG File Catalog (LFC)
• Currently, all ‘Local’ catalogs are deployed at the ATLAS T1s, one at each T1
[Figure: example lookup across CENTRAL CATALOGS and LOCAL CATALOGS.
CENTRAL CATALOGS: the Dataset Content Catalog maps Dataset Name <mydataset> to Content <guid1> <lfn1>, <guid2> <lfn2>, <…>; the Dataset Location Catalog maps Dataset Name <mydataset> to Dataset Location: CNAF, LYON.
LOCAL CATALOGS: the CNAF LFC holds the Entries <guid1> </…/lfn1>, <guid2> </…/lfn2>, <…>]
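The lookup on this slide can be sketched as a two-step walk: the central catalogs say which files a dataset has and which sites hold it, then each site's local catalog says where each file physically lives. This is a hedged illustration, not DQ2 client code: the dictionaries, the `resolve` function and the `srm://….example` physical names are all made up, and LYON is assumed (for illustration only) to hold just one of the two files.

```python
# Hypothetical catalog contents mirroring the placeholders in the diagram.
# Dataset Content Catalog: dataset name -> {guid: lfn}
CONTENT = {"mydataset": {"guid1": "lfn1", "guid2": "lfn2"}}
# Dataset Location Catalog: dataset name -> sites holding a replica
LOCATION = {"mydataset": ["CNAF", "LYON"]}
# Per-site Local Replica Catalogs (e.g. an LFC): guid -> physical file name.
# The PFN strings are invented for illustration.
LOCAL = {
    "CNAF": {"guid1": "srm://cnaf.example/atlas/lfn1",
             "guid2": "srm://cnaf.example/atlas/lfn2"},
    "LYON": {"guid1": "srm://lyon.example/atlas/lfn1"},
}


def resolve(dataset):
    """Map each LFN of `dataset` to the physical replicas found at its sites."""
    files = CONTENT[dataset]              # central: which files?
    sites = LOCATION.get(dataset, [])     # central: which sites?
    replicas = {lfn: [] for lfn in files.values()}
    for site in sites:
        lfc = LOCAL.get(site, {})         # local: where exactly at this site?
        for guid, lfn in files.items():
            if guid in lfc:
                replicas[lfn].append(lfc[guid])
    return replicas


for lfn, pfns in resolve("mydataset").items():
    print(lfn, pfns)
```

Because there is no global file replica catalog, the per-file physical lookup only ever touches the local catalogs of the sites named in the location catalog.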
About LFNs
• ATLAS has a naming convention for MC production (MCProd) Logical File Names (LFNs), for example:
  • <lfn> = csc11.007060.singlepart_e_E50.evgen.EVNT.v11004201_tid002912._00027.pool.root.1
  • <project> = csc11
  • <datasetname> = csc11.007060.singlepart_e_E50.evgen.EVNT.v11004201_tid002912
• In the central content catalog, the LFN appears as it is
• In the Local Replica Catalog (LFC), the LFN is namespaced, i.e. placed under a directory structure:
  • /grid/atlas/dq2/<project>/<dataset>/<lfn>
  • /grid/atlas/dq2/csc11/csc11.007060.singlepart_e_E50.evgen.EVNT.v11004201_tid002912/csc11.007060.singlepart_e_E50.evgen.EVNT.v11004201_tid002912._00027.pool.root.1
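The mapping from an LFN to its namespaced LFC path can be sketched with a regular expression. The parsing rule is an assumption inferred from the single example above (project = first dot-separated field; dataset name = everything before the `._<digits>.` file-index field), and `lfc_path` is a hypothetical helper, not part of the dq2 tools.

```python
import re

# Assumed rule, inferred from the single example on the slide: the project is
# the first dot-separated field, and the dataset name is the LFN up to the
# "._<digits>." file-index field. Other LFN forms may not follow it.
_LFN_RE = re.compile(r"^(?P<dataset>(?P<project>[^.]+)\..+?)\._\d+\.")


def lfc_path(lfn: str) -> str:
    """Build the namespaced LFC path /grid/atlas/dq2/<project>/<dataset>/<lfn>."""
    m = _LFN_RE.match(lfn)
    if m is None:
        raise ValueError(f"LFN does not follow the assumed convention: {lfn}")
    return f"/grid/atlas/dq2/{m['project']}/{m['dataset']}/{lfn}"


lfn = ("csc11.007060.singlepart_e_E50.evgen.EVNT."
       "v11004201_tid002912._00027.pool.root.1")
print(lfc_path(lfn))
```

Run on the example LFN, this reproduces the full path shown on the slide; an LFN without a file-index field raises `ValueError` rather than producing a guessed path.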
Subscriptions
• A site can subscribe to data
• Example: Dataset A is present at site Y but not at site X
  • X subscribes to Dataset A
  • A is transferred to site X and registered properly in the catalogs
• A kind of magic…
[Figure: Site ‘Y’ contains Dataset ‘A’ (File 1, File 2); Site ‘X’ does not contain Dataset A; Subscriptions: Dataset ‘A’ | Site ‘X’]
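The "magic" behind a subscription amounts to two steps: copy whatever files of the dataset the site lacks, then record the site in the dataset location catalog. The sketch below is hypothetical (the function `fulfil_subscription` and its arguments are illustrative, not the dq2 API); the transfer is only a set update, and choosing a source replica such as site Y is left out.

```python
# Hypothetical model of fulfilling one subscription; function and argument
# names are illustrative and the "transfer" is just a set update.


def fulfil_subscription(dataset, site, content, site_files, location):
    """Bring every file of `dataset` to `site` and register the new replica.

    content:    dataset -> set of file GUIDs      (Dataset Content Catalog)
    site_files: site -> set of GUIDs held there   (Local Replica Catalog view)
    location:   dataset -> set of sites           (Dataset Location Catalog)
    Returns the set of GUIDs that had to be copied.
    """
    held = site_files.setdefault(site, set())
    missing = content[dataset] - held
    held |= missing                      # stand-in for the actual transfers
    location.setdefault(dataset, set()).add(site)
    return missing


# The slide's example: Dataset A (File 1, File 2) is at Y; X subscribes to it.
content = {"A": {"file1", "file2"}}
site_files = {"Y": {"file1", "file2"}}
location = {"A": {"Y"}}
copied = fulfil_subscription("A", "X", content, site_files, location)
print(sorted(copied), sorted(location["A"]))  # ['file1', 'file2'] ['X', 'Y']
```

Computing the missing set first makes the operation idempotent: running it again for the same site copies nothing.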
Complications…
• A dataset can be either closed or open
• If you subscribe a closed dataset A to site X:
  • The files will be transferred to the site
  • The subscription will be honored and then discarded
• If you subscribe an open dataset A to site X:
  • The files will be transferred
  • The subscription will remain active
  • If new files are added to the dataset and stored at Y, they will be streamed to X
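The open/closed rules can be summarised as a small state machine: each pass copies whatever is new, and the subscription stays active only while the dataset is open. This is a hedged sketch under simplifying assumptions: `Dataset`, `Subscription` and `poll` are hypothetical names, and transfers complete instantly, whereas a real agent would wait for them to finish before discarding a subscription.

```python
from dataclasses import dataclass, field


# Hypothetical sketch of the open/closed subscription rules on this slide.
# Transfers complete instantly here; a real agent would wait for them.
@dataclass
class Dataset:
    files: set = field(default_factory=set)
    closed: bool = False


@dataclass
class Subscription:
    dataset: Dataset
    copied: set = field(default_factory=set)   # files already at the subscribing site
    active: bool = True

    def poll(self):
        """One pass of a hypothetical subscription agent; returns files copied."""
        if not self.active:
            return set()
        new = self.dataset.files - self.copied
        self.copied |= new                     # stand-in for the actual transfers
        if self.dataset.closed:
            self.active = False                # closed: honoured, then discarded
        return new


ds = Dataset(files={"f1"})
sub = Subscription(ds)
print(sub.poll(), sub.active)   # {'f1'} True   (open: subscription stays)
ds.files.add("f2")              # new file added at the source site
print(sub.poll(), sub.active)   # {'f2'} True   (streamed to the subscriber)
ds.closed = True
print(sub.poll(), sub.active)   # set() False   (closed: subscription dropped)
```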
The dq2 API
• Instructions on how to install it can be found at:
  • https://uimon.cern.ch/twiki/bin/view/Atlas/DDMOperations
  • https://uimon.cern.ch/twiki/bin/view/Atlas/DDM
  • https://uimon.cern.ch/twiki/bin/view/Atlas/ExecutorsCommon (VERY PRAGMATIC)
• Once the API is installed, running the “dq2” command shows the help page
• The API is also available in AFS at CERN: /afs/cern.ch/atlas/offline/external/GRID/ddm/pro02
Monitoring Subscriptions
• Subscriptions can be monitored at http://atlas-ddm-monitoring.web.cern.ch/atlas-ddm-monitoring/
• To report problems with subscriptions, you can:
  • Use Savannah: https://savannah.cern.ch/projects/dq2-ddm-ops/
  • Report to atlas-t1-ddm-oper@cern.ch
  • Use GGUS: www.ggus.org