xrootd Demonstrator Infrastructure
OSG All Hands Meeting, Harvard University, March 7-11, 2011
Andrew Hanushevsky, SLAC
http://xrootd.org
Goals
• Describe xrootd architecture configurations
• Show how these can be used by demos
  • Alice (in production), Atlas, and CMS
• Overview of the File Residency Manager
  • How it addresses file placement
• Cover recent and future developments
• Conclusion
The Motivation
• Can we access HEP data as a single repository?
  • Treat it like a Virtual Mass Storage System
• Is cache-driven grid data distribution feasible?
  • The last missing file issue (Alice production)
  • Adaptive file placement at Tier 3s (Atlas demo)
  • Analysis at storage-starved sites (CMS demo)
• Does xrootd provide the needed infrastructure?
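A Simple xrootd Cluster
[Diagram: a client, a manager (a.k.a. redirector), and data servers A, B, and C, each running an xrootd-cmsd pair. /my/file resides on A and C.]
1: The client sends open("/my/file") to the manager.
2: The manager asks the data servers: who has "/my/file"?
3: Servers A and C answer "I DO!"
4: The manager tells the client to try open() at A.
5: The client sends open("/my/file") to server A.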
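The five steps above can be sketched as a toy model. This is only an illustration of the redirection idea, not xrootd code: the class names `DataServer` and `Manager` are hypothetical, and picking the first holder is an assumed selection policy.

```python
from dataclasses import dataclass, field


@dataclass
class DataServer:
    """Stands in for one xrootd-cmsd pair acting as a data server."""
    name: str
    files: set = field(default_factory=set)

    def has(self, path: str) -> bool:
        return path in self.files


@dataclass
class Manager:
    """Stands in for the cluster manager (redirector)."""
    servers: list

    def locate(self, path: str) -> list:
        # Steps 2-3: ask every server and collect those answering "I DO!"
        return [s.name for s in self.servers if s.has(path)]

    def open(self, path: str) -> str:
        # Step 4: redirect the client to one holder (here simply the first).
        holders = self.locate(path)
        if not holders:
            raise FileNotFoundError(path)
        return holders[0]


cluster = Manager([
    DataServer("A", {"/my/file"}),
    DataServer("B"),
    DataServer("C", {"/my/file"}),
])
target = cluster.open("/my/file")  # step 5: the client would now open at `target`
```

The manager never touches file data in this model; it only answers the question of where to go, which is the property the slide illustrates.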
The Fundamentals
• An xrootd-cmsd pair is the building block
  • xrootd provides the client interface
    • Handles data and redirections
  • cmsd manages xrootd's (i.e. forms clusters)
    • Monitors activity and handles file discovery
• The building block is uniformly stackable
  • Can build a wide variety of configurations
    • Much like you would do with Lego® blocks
• Extensive plug-ins provide adaptability
Federating xrootd Clusters
[Diagram: a client, a meta-manager (a.k.a. global redirector), and three distributed clusters at ANL, SLAC, and UTA. Each cluster has a manager (a.k.a. local redirector) and data servers A, B, and C running xrootd-cmsd pairs; copies of /my/file reside on several of them.]
1: The client sends open("/my/file") to the meta-manager.
2: The meta-manager asks the site managers: who has "/my/file"?
3: Each manager asks its own data servers: who has "/my/file"?
4: The managers whose sites hold the file answer "I DO!"
5: The meta-manager tells the client to try open() at ANL.
6: The client sends open("/my/file") to the ANL manager.
7: The ANL manager tells the client to try open() at A.
8: The client sends open("/my/file") to server A at ANL.
• Data is uniformly available from three distinct sites
• An exponentially parallel search! (i.e. O(2^n))
• But I'm behind a firewall! Can I still play?
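A minimal sketch of the two-level lookup, assuming a made-up placement of files across the three sites. The `SITES` table and the function names are hypothetical, and real managers exchange cmsd messages rather than function calls; the point is only that the site queries fan out concurrently and the client follows two redirections.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical contents: site -> data server -> set of files held.
SITES = {
    "ANL":  {"A": {"/my/file"}, "B": set(), "C": {"/my/file"}},
    "SLAC": {"A": {"/my/file"}, "B": set(), "C": {"/my/file"}},
    "UTA":  {"A": set(), "B": set(), "C": set()},
}


def manager_locate(site: str, path: str) -> list:
    """Local redirector: ask the site's own data servers (step 3)."""
    return [srv for srv, files in sorted(SITES[site].items()) if path in files]


def meta_locate(path: str) -> dict:
    """Global redirector: query all site managers concurrently (step 2)."""
    sites = sorted(SITES)
    with ThreadPoolExecutor() as pool:
        answers = pool.map(lambda s: manager_locate(s, path), sites)
    # Keep only the sites that answered "I DO!" (step 4).
    return {s: servers for s, servers in zip(sites, answers) if servers}


def resolve(path: str) -> tuple:
    """Follow both redirections: meta-manager, then site manager (steps 5-8)."""
    holders = meta_locate(path)
    if not holders:
        raise FileNotFoundError(path)
    site = next(iter(holders))  # assumed policy: first site that answered
    return site, holders[site][0]
```

With the table above, `resolve("/my/file")` selects server A at ANL, matching the path drawn in the diagram.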
Firewalls & xrootd
• xrootd is a very versatile system
  • It can be a server, manager, or supervisor
  • The desired roles are all specified in a single configuration file
• The libXrdPss.so plug-in creates an xrootd chameleon
  • Allows xrootd to be a client to another xrootd
• So, all the basic roles can run as proxies
  • Transparently getting around firewalls
  • Assuming you run the proxy role on a border machine
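A Simple xrootd Proxy Cluster
[Diagram: a client outside the firewall; a proxy manager (a.k.a. proxy redirector) and proxy servers X and Y on border machines; behind the firewall, a manager (a.k.a. redirector) and data servers A, B, and C running xrootd-cmsd pairs. /my/file resides on A and C.]
1: The client sends open("/my/file") to the proxy manager.
2: The proxy manager tells the client to try open() at X.
3: The client sends open("/my/file") to proxy server X.
4: Proxy server X sends open("/my/file") through the firewall to the manager.
5: The manager asks the data servers: who has "/my/file"?
6: Servers A and C answer "I DO!"
7: The manager tells proxy server X to try open() at A.
8: Proxy server X sends open("/my/file") to server A.
• How does this help in a federated cluster?
• Proxy managers can federate with a meta-manager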
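The proxy idea, a process that is a server to outside clients and a client to the inner cluster, can be sketched as below. All class names are hypothetical and the "always pick the first proxy" rule is an assumption; the real role is provided by the libXrdPss.so plug-in, not by code like this.

```python
class InnerCluster:
    """Cluster behind the firewall: a manager plus its data servers."""

    def __init__(self, servers):
        self.servers = servers  # server name -> {path: bytes}

    def locate(self, path):
        for name, files in self.servers.items():
            if path in files:
                return name
        raise FileNotFoundError(path)

    def read(self, server, path):
        return self.servers[server][path]


class ProxyServer:
    """Border-machine proxy: serves outside clients, acts as a client inside."""

    def __init__(self, name, inner):
        self.name, self.inner = name, inner

    def read(self, path):
        server = self.inner.locate(path)      # steps 4-7: ask the inner manager
        return self.inner.read(server, path)  # step 8: fetch from the data server


class ProxyManager:
    """Proxy redirector: only decides which proxy server the client uses."""

    def __init__(self, proxies):
        self.proxies = proxies

    def redirect(self, path):
        return self.proxies[0]  # steps 1-2; the selection policy is an assumption


inner = InnerCluster({"A": {"/my/file": b"data"}, "B": {}, "C": {"/my/file": b"data"}})
front = ProxyManager([ProxyServer("X", inner), ProxyServer("Y", inner)])
proxy = front.redirect("/my/file")
payload = proxy.read("/my/file")  # step 3: the client opens the file at proxy X
```

The client only ever talks to machines on the border, which is why the firewall does not need to admit it to the inner cluster.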
Demonstrator Specific Features
• A uniform file access infrastructure
  • Usable even in the presence of firewalls
• Access to files across administrative domains
  • Each site can enforce its own rules
• Site participation proportional to scalability
  • Essentially the BitTorrent social model
• Increased opportunities for HEP analysis
  • A foundation for novel approaches to efficiency
Alice & Atlas Approach
• Real-time placing of files at a site
  • Built on top of the File Residency Manager (FRM)
• FRM: an xrootd service that controls file residency
• Locally configured to handle events such as
  • A requested file is missing
  • A file is created or an existing file is modified
  • Disk space is getting full
• Alice uses an “only when necessary” model
• Atlas will use a “when analysis demands” model
Using FRM For File Placement
[Diagram: a client, an xrootd data server, a transfer queue, the frm_xfrd daemon, a transfer agent, and remote storage, connected by eight numbered steps.]
Flow: open(missing_file) → Insert xfr request → Tell client wait → Read xfr request → Launch xfr agent → Copy in file → Notify xrootd OK → Wakeup client
Possible transfer agents: dq2get, globus-url-copy, gridFTP, scp, wget, xrdcp, etc.
Configuration file:
    all.export /atlas/atlasproddisk stage
    frm.xfr.copycmd in /opt/xrootd/bin/xrdcp \
        -f -np root://globalredirector/$SRC $DST
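The stage-in flow can be modeled roughly as follows. This is a sketch under assumptions, not how frm_xfrd is implemented: the `StageIn` class and its method names are hypothetical, `$SRC` is assumed to expand to the requested path and `$DST` to a local path, and the copy itself is injected as a callable so that nothing is actually executed.

```python
import shlex
from collections import deque
from string import Template

# Copy command taken from the frm.xfr.copycmd line on the slide.
COPYCMD = Template("/opt/xrootd/bin/xrdcp -f -np root://globalredirector/$SRC $DST")


def build_copy_argv(src: str, dst: str) -> list:
    """Expand $SRC and $DST and split the result into an argument vector."""
    return shlex.split(COPYCMD.substitute(SRC=src, DST=dst))


class StageIn:
    def __init__(self, run_copy):
        self.queue = deque()      # the transfer queue
        self.waiting = {}         # path -> clients that were told to wait
        self.run_copy = run_copy  # callable(argv) -> bool, injected by the caller

    def open_missing(self, client: str, path: str) -> str:
        # Data server side: queue the request once, then tell the client to wait.
        if path not in self.waiting:
            self.queue.append(path)
        self.waiting.setdefault(path, []).append(client)
        return "wait"

    def service_one(self, local_root: str) -> list:
        # Transfer daemon side: read a request, launch the agent, and
        # return the clients to wake up if the copy succeeded.
        path = self.queue.popleft()
        ok = self.run_copy(build_copy_argv(path, local_root + path))
        return self.waiting.pop(path) if ok else []
```

Queuing each missing file only once while remembering every waiting client keeps several simultaneous opens of the same file from triggering duplicate transfers in this model.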
FRM Even Works With Firewalls
[Diagram: an xrootd data server with a transfer queue and frm_xfrd behind a firewall, a border machine running xrdcp, and the "Big Bad Internet" beyond it, connected by five numbered steps.]
Flow: Write xfr request → Read xfr request → ssh xfr agent → Copy in file → Notify xrootd to run client
• The FRM needs one or more border machines
• The server transfer agent simply launches the real agent across the border
• How it's done:
    frm.xfr.copycmd in noalloc ssh bordermachine /opt/xrootd/bin/xrdcp -f \
        root://globalredirector/$LFN root://mynode/$LFN?ofs.posc=1
• Need to set up ssh identity keys
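If the same command were assembled by a script instead of the copycmd line, it might look like the sketch below. The function name and its default host names are placeholders taken from the slide, and `$LFN` is assumed to be the logical file name starting with a slash. One detail worth showing: ssh passes its arguments to a remote shell as a single string, so the destination URL, which contains a `?`, is quoted here.

```python
import shlex


def border_copy_argv(lfn, border="bordermachine",
                     redirector="globalredirector", node="mynode"):
    """Build an ssh command that runs xrdcp on the border machine."""
    remote = [
        "/opt/xrootd/bin/xrdcp", "-f",
        f"root://{redirector}/{lfn}",
        f"root://{node}/{lfn}?ofs.posc=1",
    ]
    # Quote each argument for the remote shell, then pass the command
    # to ssh as one string.
    return ["ssh", border, " ".join(shlex.quote(arg) for arg in remote)]
```

The returned list is suitable for `subprocess.run`, provided ssh identity keys are already set up as the slide notes.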
Storage-Starved Sites (CMS)
• Provide direct access to missing files
  • This is basically a freebie of the system
• However, latency issues exist
  • Naively, as much as 3x increase in wall-clock time
  • Can be as low as 5% depending on job's CPU/IO ratio
  • The ROOT team is aggressively working to reduce it
• On the other hand...
  • May be better than not doing analysis at such sites
  • No analysis is essentially infinite latency
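One simple way to see how the 3x and 5% figures can both hold is a back-of-the-envelope model. It assumes, purely for illustration, that CPU and I/O time do not overlap and that remote access multiplies I/O time by a fixed penalty; neither assumption comes from the slides.

```python
def slowdown(cpu_seconds: float, io_seconds: float, io_penalty: float) -> float:
    """Ratio of remote to local wall-clock time under the assumed model."""
    local = cpu_seconds + io_seconds
    remote = cpu_seconds + io_penalty * io_seconds
    return remote / local


# A purely I/O-bound job feels the whole penalty.
io_bound = slowdown(cpu_seconds=0.0, io_seconds=100.0, io_penalty=3.0)

# A CPU-dominated job (2.5% of its time in I/O) barely notices it.
cpu_bound = slowdown(cpu_seconds=97.5, io_seconds=2.5, io_penalty=3.0)
```

Under these assumptions the relative increase is (penalty - 1) times the fraction of local run time spent in I/O, so the job's CPU/IO ratio determines where it falls between the two extremes.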
Security
• xrootd supports needed security models
  • Most notably grid certificates (GSI)
• Human cost needs to be considered
  • Does read-only access require this level of security?
    • Considering that the data is unusable without a framework
• Each deployment faces different issues
  • Alice uses light-weight internal security
  • Atlas will use server-to-server certificates
  • CMS will need to deploy the full grid infrastructure
Recent Developments
• FS-Independent Extended Attribute Framework
  • Used to save file-specific information
    • Migration time, residency requirements, checksums
• Shared-Everything File System Support
  • Optimize file discovery in distributed file systems
    • dCache, DPM, GPFS, HDFS, Lustre, proxy xrootd
• Meta-Manager throttling
  • Configurable per-site query limits
Future Major Developments
• Integrated checksums
  • Inboard computation, storage, and reporting
    • Outboard computation already supported
• Specialized Meta-Manager
  • Allows many more subscriptions than today
• Internal DNS caching and full IPv6 support
• Automatic alerts
  • Part of message and logging restructuring
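As an illustration of what computing a checksum over streamed data involves, here is a sketch that folds chunks into a running Adler-32 value. The choice of Adler-32 and the function name are assumptions for the example; the slides do not say which algorithms are used or how the result is stored.

```python
import zlib


def adler32_stream(chunks) -> str:
    """Fold an iterable of byte chunks into one Adler-32 checksum."""
    value = 1  # Adler-32 starting value
    for chunk in chunks:
        value = zlib.adler32(chunk, value)
    # Report as eight lowercase hex digits.
    return f"{value & 0xFFFFFFFF:08x}"
```

Because the checksum is updated chunk by chunk, it can be computed while data passes through a server without reading the file a second time, which is one reading of the inboard versus outboard distinction.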
Conclusion
• xrootd mates well with demo requirements
  • Can federate almost any file system
  • Gives a uniform view of massive amounts of data
    • Assuming per-experiment common logical namespace
  • Secure and firewall friendly
  • Ideal platform for adaptive caching systems
• Completely open source under a BSD license
• See more at http://xrootd.org/
Acknowledgements
• Current Software Contributors
  • ATLAS: Doug Benjamin
  • CERN: Fabrizio Furano, Lukasz Janyst, Andreas Peters, David Smith
  • Fermi/GLAST: Tony Johnson
  • FZK: Artem Trunov
  • LBNL: Alex Sim, Junmin Gu, Vijaya Natarajan (BeStMan team)
  • Root: Gerri Ganis, Bertrand Bellenot, Fons Rademakers
  • OSG: Tim Cartwright, Tanya Levshina
  • SLAC: Andrew Hanushevsky, Wilko Kroeger, Daniel Wang, Wei Yang
  • UNL: Brian Bockelman
  • UoC: Charles Waldman
• Operational Collaborators
  • ANL, BNL, CERN, FZK, IN2P3, SLAC, UTA, UoC, UNL, UVIC, UWisc
• US Department of Energy
  • Contract DE-AC02-76SF00515 with Stanford University