Update on FAX Redirection Topology Detection & Verification
Wei Yang
Redirector hardware at CERN
• Redundant redirectors for EU, UK, DE, FR
• Redundant VMs (marked with the “+” sign below)
• More to come
• atlas-xrd-eu.cern.ch +
  • xrootd port 1094, cmsd port 1098
• atlas-xrd-uk.cern.ch +
  • Reports to the EU redirector
  • xrootd port 1094, cmsd port 1098?
• Same for DE and FR redirectors
Redirector topology
[Diagram: redirector hierarchy. Node labels: EU rdr, US rdr, DE rdr, UK rdr, SLAC test machine, Glasgow rdr, Edinburgh (ECDF) rdr, Middle West rdr, Site 1 rdr, Site 2 rdr, Site A rdr, Site B rdr. Legend: "cmsd & xrootd redirection", "Mature xrootd redirection".]
• cmsd-based redirection searches the branch under the redirector
• xrootd-based redirection is used to jump to the upper level
  • if the cmsd search returns nothing
• US can either report to the EU redirector or act as a peer of EU
  • depending on needs, latency, or performance
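The two redirection rules above can be sketched as a toy model: search the branch under a redirector first, and jump one level up only when that search finds nothing. This is a minimal illustration, not the cmsd or xrootd implementation. The `Redirector` class, `search_down`, `locate`, and the file names are hypothetical, and placing the US redirector under EU is just one of the two options mentioned above. The returned list is the path through the tree, which may differ from the exact redirect sequence a client sees.

```python
class Redirector:
    """Hypothetical node in a toy redirector tree (not a real xrootd object)."""

    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)   # files visible at this node
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def search_down(node, path, skip=None):
    """cmsd-like step: search the branch under `node`; return the hops or None."""
    if path in node.files:
        return [node.name]
    for child in node.children:
        if child is skip:          # branch already tried, exclude it
            continue
        hops = search_down(child, path)
        if hops:
            return [node.name] + hops
    return None


def locate(start, path):
    """Search under `start`; if nothing is found, jump to the upper level and retry."""
    hops_up = []
    node, skip = start, None
    while node is not None:
        found = search_down(node, path, skip)
        if found:
            return hops_up + found
        hops_up.append(node.name)  # xrootd-like step: go one level up
        node, skip = node.parent, node
    return None


# Assumed layout: US reports to EU (it could equally be a peer of EU).
eu = Redirector("EU rdr")
uk = eu.add(Redirector("UK rdr"))
us = eu.add(Redirector("US rdr"))
glasgow = uk.add(Redirector("Glasgow rdr", files={"glasgow-file"}))
slac = us.add(Redirector("SLAC", files={"slac-file"}))

top_down = locate(eu, "glasgow-file")
bottom_up = locate(glasgow, "slac-file")
```

Starting at the EU redirector, the search only goes down; starting at Glasgow with a file that lives at SLAC, it climbs through UK and EU before descending.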
Topology verification
Goal: diagnose broken nodes in the topology
• Deploy a site-specific (small) file with a known checksum
  • Access it from the global redirector
  • Tests the full redirection chain + N2N
• For every lower-level redirector
  • Test a file that does not exist in its domain
How to find out what the actual topology is?
• We know how to do this manually
• We need to do it automatically and produce a graph
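One possible way to "produce a graph" automatically is to collect the redirection chain observed in each test and merge the chains into a Graphviz DOT description. The sketch below assumes each chain is already available as a list of host:port strings. The function name `chains_to_dot` and the graph name are hypothetical, and the example chains reuse hosts from the xrdcp tests in this talk.

```python
def chains_to_dot(chains):
    """Merge observed redirection chains into a DOT digraph (one edge per hop)."""
    edges = []
    for chain in chains:
        for src, dst in zip(chain, chain[1:]):
            if (src, dst) not in edges:   # keep each hop only once
                edges.append((src, dst))
    lines = ["digraph fax_topology {"]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


# Example chains, each one the sequence of hosts seen in a single test.
chains = [
    ["atlas-xrd-eu.cern.ch:1094", "xrdfed01.cern.ch:1094",
     "svr025.gla.scotgrid.ac.uk:11000", "disk034.gla.scotgrid.ac.uk:1095"],
    ["svr025.gla.scotgrid.ac.uk:11000", "atlas-xrd-uk.cern.ch:1094",
     "atlas-xrd-eu.cern.ch:1094", "atl-prod08.slac.stanford.edu:1094"],
    # The same test repeated: adds no new edges.
    ["atlas-xrd-eu.cern.ch:1094", "xrdfed01.cern.ch:1094",
     "svr025.gla.scotgrid.ac.uk:11000", "disk034.gla.scotgrid.ac.uk:1095"],
]
dot = chains_to_dot(chains)
print(dot)
```

The printed text can be fed to any DOT renderer. Host aliases (for example a redirector reachable under two names) are not merged here and would show up as separate nodes.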
Testing Topology with xrdcp
$ xrdcp -f -d 1 root://atlas-xrd-eu.cern.ch//atlas/dq2/user/wbhimji/HCtest/user.ilijav.HCtest.1/group.test.hc.NTUP_SMWZ.root /dev/null
(This is a unique file at Glasgow)
• Received redirection to [xrdfed01.cern.ch:1094]. (a.k.a. atlas-xrd-uk.cern.ch, the UK redirector at CERN)
• Received redirection to [svr025.gla.scotgrid.ac.uk:11000].
• Received redirection to [disk034.gla.scotgrid.ac.uk:1095].
Redirection sequence: EU rdr -> UK rdr -> Glasgow rdr -> Glasgow data server
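The redirection sequence can be read out of the xrdcp debug output mechanically. The sketch below assumes the output contains lines of the form "Received redirection to [host:port]." as shown above. The function name `redirection_chain` is hypothetical, and the sample text is copied from the messages in this test rather than captured from a live run.

```python
import re

# Matches the host:port inside "Received redirection to [host:port]".
REDIRECT_RE = re.compile(r"Received redirection to \[([^\]]+)\]")


def redirection_chain(start, debug_output):
    """Return the starting host followed by every redirection target, in order."""
    return [start] + REDIRECT_RE.findall(debug_output)


sample_output = """\
Received redirection to [xrdfed01.cern.ch:1094].
Received redirection to [svr025.gla.scotgrid.ac.uk:11000].
Received redirection to [disk034.gla.scotgrid.ac.uk:1095].
"""
chain = redirection_chain("atlas-xrd-eu.cern.ch:1094", sample_output)
print(" -> ".join(chain))
```

Lines that carry extra fields after the bracket (such as Token or Opaque) still match, because the pattern only captures the first bracketed host.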
Testing Bottom Up Redirection
Topology: EU rdr -> UK rdr -> Glasgow rdr -> Glasgow data server
$ xrdcp -f -d 1 root://svr025.gla.scotgrid.ac.uk:11000//atlas/dq2/user/HironoriIto/user.HironoriIto.xrootd.wt2/user.HironoriIto.xrootd.wt2-1M /dev/null
(A unique file at SLAC)
• Received redirection to [atlas-xrd-uk.cern.ch:1094]. Token=[]]. Opaque=[tried=+fedredir_atlas@svr025.gla.scotgrid.ac.uk]. (tried: "I have already tried myself, please exclude me")
• Received redirection to [atlas-xrd-eu.cern.ch:1094]. Token=[]]. Opaque=[tried=+1098localhost]. (a small misconfiguration)
• Received redirection to [atl-prod08.slac.stanford.edu:1094]. Token=[]]. Opaque=[].
RdrSeq: Glasgow rdr -> UK rdr -> EU rdr -> SLAC data server
Dealing with Abnormal Situations
$ xrdcp -f -d 1 root://atlas-xrd-eu.cern.ch//atlas/dq2/user/ilijav/HCtest/user.ilijav.HCtest.1/group.test.hc.NTUP_SMWZ.root /dev/null
• Received redirection to [xrdfed02.cern.ch:1094]. (UK redirector)
• Received redirection to [srm.glite.ecdf.ed.ac.uk:11000]. (ECDF @ Edinburgh)
• GoToAnotherServer: Error connecting to [srm.glite.ecdf.ed.ac.uk:11000].
• Received redirection to [xrdfed02.cern.ch:1094].
• Received redirection to [srm.glite.ecdf.ed.ac.uk:11000].
• …
• Set a timeout to detect abnormal situations like this
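A timeout around the test command, plus a check for repeated hops, is one way to catch a loop like the one above. The sketch below is an assumption about how such a wrapper might look, not part of the FAX tooling. `run_with_timeout` and `first_repeated_hop` are hypothetical names, and the timeout value in the usage comment is arbitrary.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run `cmd`; return its combined stdout+stderr text, or None if it timed out."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # keep debug messages together with stdout
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout


def first_repeated_hop(hops):
    """Return the first host that appears twice in a redirection chain, else None."""
    seen = set()
    for hop in hops:
        if hop in seen:
            return hop
        seen.add(hop)
    return None


# Hops taken from the abnormal run above.
looping = [
    "xrdfed02.cern.ch:1094",
    "srm.glite.ecdf.ed.ac.uk:11000",
    "xrdfed02.cern.ch:1094",
    "srm.glite.ecdf.ed.ac.uk:11000",
]
repeated = first_repeated_hop(looping)

# Possible usage (not executed here; `url` is a placeholder for the test file URL):
#   out = run_with_timeout(["xrdcp", "-f", "-d", "1", url, "/dev/null"], 60)
#   if out is None: report the redirector chain as stuck
```

A timeout alone says only that something is wrong. The repeated-hop check additionally names the redirector where the loop starts.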