170 likes | 181 Views
Learn about solving the challenge of synchronizing catalogues and propagating ACLs between storage elements. Efficient message exchange and robust plugin-based architecture enable seamless communication.
E N D
Catalogue synchronization&ACL propagation • Fabrizio Furano (CERN IT-GT-DMS) EMI INFSO-RI-261611 EMI INFSO-RI-261611
The problem • Various catalogues keep information that is related • E.g. LFC keeps info about the content of remote Storage Elements, each one with its own catalogue • A change in the permissions of a file in LFC is not automatically reflected by the peripheric catalogue • If a SE looses a file, the LFC does not know • If a new file is not correctly registered -> dark data • Keeping them in sync is a very hard problem • Namespace scanning for diffs is an expensive workaround EMI INFSO-RI-261611 EMI INFSO-RI-261611 11 Apr 2011 F.Furano - Catalogue Synchronization & ACL propagation
The idea • Make the various catalogues/SE able to talk to each other • In order to exchange messages that keep them synchronized • 2 directions: • Central Catalogue->SE (downstream) • e.g. to propagate changes in the permissions • SE->Central Catalogue (upstream) • e.g. to propagate info about lost and missing files EMI INFSO-RI-261611 EMI INFSO-RI-261611 11 Apr 2011 F.Furano - Catalogue Synchronization & ACL propagation
Communication Central Catalogue Other catalogue/SE e.g. ATLAS SE sends to the appropriate topics (e.g. “Lost”) SE Sends to the appropriate topics (e.g. “Changes”) SE2 SE1 SEn SE or exp. catalogue subscribes to the relevant topics (e.g. “Changes”) Subscribes to the relevant topics (e.g. “Lost”) Broker(s) Looking for good ways to reliably communicate and cooperate EMI INFSO-RI-261611 EMI INFSO-RI-261611 11 Apr 2011 F.Furano - Catalogue Synchronization & ACL propagation
Types of interactions • What can we do with this? • Fix inconsistencies as they are found • “SE1 apparently lost file X” • Prevent inconsistencies by sending messages when something happens • “File X has new access permissions” • “SE1 has a new file Y” • Ev. allow a central system to query the others to synchronize itself • “Who has file Z”? • “Do you still have file W?” EMI INFSO-RI-261611 EMI INFSO-RI-261611 11 Apr 2011 F.Furano - Catalogue Synchronization & ACL propagation
The architecture Uses a virtual destination, e.g. : Consumer.LFC1.SEMsg_upstream the broker queues messages for this endpoint if it disconnects momentarily. Uses a virtual destination, e.g. : Consumer.DPNS1.SEMsg_downstream the broker queues messages for this endpoint if it disconnects momentarily. Fix info! Central Catalogue Msg brokers Adapter Downstream topic “SEMsg_downstream” Chmod(sfn) Upstream topic “SEMsg_upstream” NotAvailable(sfn) [ FileCreated(sfn) ] Fix info! Adapter EMI INFSO-RI-261611 SE or other Catalogue A file can be N/A if: - it was requested to a DB that does not know it - OR if it was requested to a GridFTP that does not find it (trickier) EMI INFSO-RI-261611 11 Apr 2011 F.Furano - Catalogue Synchronization & ACL propagation
SEMsg • The current implementation is a component called SEMsg • Built to be robust, efficient and easy to integrate • Plugin-based (ev. with “null” plugins), loaded at runtime (dlopen) • A plugin that performs actions (in the catalogue) when a message comes • A plugin that performs SE(Catalogue)-specific actions when a message has to be sent through the API • 5 plugins available by now: LFC producer+consumer, DPM producer+consumer, Python consumer • Provides a configurable daemon that consumes and dispatches the notifications • Provides commands to manually send notifications • As well as a simple C/C++ API to send them • hides message composition and the security implementation • crafted to avoid bringing in complex build dependencies • The same tools are used for the LFC and DPM prototype, loading different sets of plugins • Hence, more sets of plugins can be added, to talk to other systems EMI INFSO-RI-261611 EMI INFSO-RI-261611 11 Apr 2011 F.Furano - Catalogue Synchronization & ACL propagation
Detail - SEMsg plugins Producer API, e.g. NotifyChmod(sfn) NotifyNotAvailable(sfn) Producer API, e.g. NotifyChmod(sfn) NotifyNotAvailable(sfn) Security Security Default producer plugin LFC producer plugin LFC consumer plugin DPM consumer plugin These implement some kind of realtime storage aggregation Must be robust, fast and low latency To be able to trust them and build an evolving thing Msg brokers LFC DPM EMI INFSO-RI-261611 EMI INFSO-RI-261611 11 Apr 2011 F.Furano - Catalogue Synchronization & ACL propagation
SEMsg • Encapsulates the message-related aspects into a clear subsystem • Simple to use and integrate in other systems • Completely asynchronous, multithreaded design • Does everything (in background) to keep the consumer connection UP • The API never hiccups in case of conn troubles/broker restart • The consumer may live as a daemon or be started within another daemon • LFC/DPM use the external daemon • Also provides commands to send the notifications • E.g. to manually notify that a set of files is not available • Natively supports the bulk operations defined in the protocol • Automatic creation of bulk notifications from the internal queues. • No need for weird APIs or complex implementations. EMI INFSO-RI-261611 EMI INFSO-RI-261611 11 Apr 2011 F.Furano - Catalogue Synchronization & ACL propagation
Security (1/2) • We can: • Guarantee the identity of the senders • Make sure that the content has not changed • Support SSL connections to the brokers (=comm encryption) • Everything is X509-based, described in detail in the docs • Performance is good (hundreds/s), space for improvement, probably not needed now • The bulk messages help a lot here • The messages are signed using SHA1 • We treat messages as we do with PGP for e-mails • Future: implement some form of whitelisting EMI INFSO-RI-261611 EMI INFSO-RI-261611 11 Apr 2011 F.Furano - Catalogue Synchronization & ACL propagation
Security (2/2) • The communicating entities are in general SEs, hence machines running with: • a host certificate • a service certificate • Now we have a way to be sure about the sender’s identity • The notifications are never destructive by construction • We might like to have a way to filter senders • Whitelisting seems a way to go • We need anyway some real-world experience EMI INFSO-RI-261611 EMI INFSO-RI-261611 11 Apr 2011 F.Furano - Catalogue Synchronization & ACL propagation
New entry: the Python plugin • One more consumer plugin, in the SEMsg distribution, with the DPM and LFC ones • Associates python funcs to SEMsg notifications • Fully configurable in the SEMsg config file and generic • e.g. The notification FileNotAvailable invokes the function ‘func1’ from the module ‘module1’ passing its content as parameters • Fast: invokes natively the Python C API • Benefits from the SEMsg structure, e.g. the security • The Python script only deals with the action to be performed • Tested with Python 2.4, 2.5, 2.6 EMI INFSO-RI-261611 EMI INFSO-RI-261611 11 Apr 2011 F.Furano - Catalogue Synchronization & ACL propagation
Upcoming: ATLAS DDM • The first usage of the Python plugin is going to be to invoke the ATLAS DDM tools • The “DDM problematic file catalogue” can be directly fed with notifications coming from SEMsg, i.e. coming from: • SEs, when they detect a missing file • Manual invokation of the CLI with lists of files • In practice, an instance of the SEMsgdaemon will sniff the notifications and forward them through the Python plugin EMI INFSO-RI-261611 EMI INFSO-RI-261611 11 Apr 2011 F.Furano - Catalogue Synchronization & ACL propagation
v1.1.0beta is available • The Dec demonstrator was OK • Now the libs/tools are available in the glite and EMI builds as a pre-release • All interested catalogues can use it • Documentation available in the TWiki, including the plugins • https://twiki.cern.ch/twiki/bin/view/EMI/EmiJra1Syncat • The instrumentation of LFC/DPM is finding its way to the SVN :-) • Installed in the DPM nightly testbed • Just a matter of installing a couple of RPMs from ETICS and setting a few options EMI INFSO-RI-261611 EMI INFSO-RI-261611 11 Apr 2011 F.Furano - Catalogue Synchronization & ACL propagation
What’s next • Sync with the other SE developers (STORM, dCache) • Sync with LCGDM and provide the LFC/DPM instrumentation • Start deploying the ATLAS instance and some real world test instances of LFC/DPM • Robustness tests + ev. fixes/additions in SEMsg/DPM/LFC • At each step, keep an eye on the applicability to the computing models • + ev. fixes/additions • Would be really useful to understand if the protocol is descriptive enough • Then, we’ll drop the ‘beta’ EMI INFSO-RI-261611 EMI INFSO-RI-261611 11 Apr 2011 F.Furano - Catalogue Synchronization & ACL propagation
Conclusions • Making catalogues and SEs interact seems a good way to attack the consistency problem • It’s a form of realtime interaction between SEs and catalogues • By definition, it won’t mathematically kill the inconsistencies, but will help making a much better system • Will be interesting to explore the possibilities of the technology and of the implementation • SEMsg is available as a pre-release until EMI-2 • Protocol and SEMsg documentation in the Wiki • https://twiki.cern.ch/twiki/bin/view/EMI/EmiJra1Syncat • Feedbacks are welcome • The messaging (test) infrastructure and the tools seem really OK EMI INFSO-RI-261611 EMI INFSO-RI-261611 11 Apr 2011 F.Furano - Catalogue Synchronization & ACL propagation
Thank you EMI INFSO-RI-261611 EMI is partially funded by the European Commission under Grant Agreement INFSO-RI-261611 EMI INFSO-RI-261611 11 Apr 2011 F.Furano - Catalogue Synchronization & ACL propagation