Lavoisier
Motivations
• data sources provided by many partners
• heterogeneity of the technologies used
• objectives
  • reduce complexity / increase maintainability
  • factorize development efforts
  • enable access to data
    • efficiently
    • reliably
What is Lavoisier?
• an extensible service providing a unified view of data collected from multiple heterogeneous sources
• data is represented in XML
• the query language is XSLT
• cross-data-source queries are easy to write
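To illustrate what a cross-data-source XSLT query can look like, here is a minimal sketch using the standard javax.xml.transform API. It assumes, hypothetically, that two data views (sites from an RDBMS, tickets from a web service) are available together under one `<views>` root; the element names and the `CrossViewQuery` class are invented for the example and are not Lavoisier's actual view format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CrossViewQuery {

    // Hypothetical input: two data views wrapped in a single <views> document.
    static final String VIEWS =
        "<views>"
      + "<sites><site id='s1' name='CERN'/><site id='s2' name='IN2P3'/></sites>"
      + "<tickets>"
      + "<ticket site='s2' status='open'/><ticket site='s2' status='open'/>"
      + "<ticket site='s2' status='closed'/><ticket site='s1' status='open'/>"
      + "</tickets>"
      + "</views>";

    // XSLT 1.0 query joining the two views: open tickets per site name.
    // xsl:key indexes open tickets by site id; key() performs the join.
    static final String QUERY =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:key name='open' match='ticket[@status=\"open\"]' use='@site'/>"
      + "<xsl:template match='/'>"
      + "<report><xsl:for-each select='views/sites/site'>"
      + "<site name='{@name}' open='{count(key(\"open\", @id))}'/>"
      + "</xsl:for-each></report>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies an XSLT stylesheet (as a string) to an XML document (as a string). */
    static String transform(String xml, String xsl) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Expected: CERN has 1 open ticket, IN2P3 has 2.
        System.out.println(transform(VIEWS, QUERY));
    }
}
```

The join is expressed entirely in the stylesheet, so adding a third source would only mean another view under the root and another key, not new Java code.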
Overview
[Figure: data view manager service. Heterogeneous data sources (flat file, RDBMS, WS) are read by plug-ins (flat file plugin, SQL plugin, WS plugin, XSLT plugin) driven by the engine and its config, producing XML data views. Triggers: startup, notified, about to expire, refreshed. Operations: GetRP, GetMultipleRP, QueryRP, SetRP, getDataView, processXSL. Roles: user (e.g. CIC-portal), developer (plug-ins), administrator (config).]
Role: plug-in developer
• reusable plug-ins
  • RDBMS
  • LDAP
  • Web Services (WSRF)
  • Run command line
  • Local XML file
  • Remote XML file (http, https)
  • HTML file (in progress)
  • Flat file (in progress)
• specific plug-ins
  • GGUS
  • get server public cert.
  • any Java code that builds an XML document
• other plug-ins
  • index of data views
  • status of data views
  • XSL transform
  • XML filter (SAX-based)
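As an illustration of a SAX-based XML filter, the sketch below builds on the standard `org.xml.sax.helpers.XMLFilterImpl` class and drops every element with a given local name, together with its subtree. The `DropElementFilter` class and its behavior are hypothetical; the slide does not say what Lavoisier's own filter plug-in removes or how it is configured.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter plug-in: removes every element named `dropped` and its content. */
public class DropElementFilter extends XMLFilterImpl {
    private final String dropped;
    private int depth = 0; // > 0 while inside a dropped subtree

    public DropElementFilter(XMLReader parent, String dropped) {
        super(parent);
        this.dropped = dropped;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (depth > 0 || local.equals(dropped)) { depth++; return; } // swallow event
        super.startElement(uri, local, qName, atts);                 // pass through
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (depth > 0) { depth--; return; }
        super.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) super.characters(ch, start, length);
    }

    /** Streams `xml` through the filter and serializes the result with an identity transform. */
    static String filter(String xml, String dropped) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // so that localName is reported
        XMLReader parser = spf.newSAXParser().getXMLReader();
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer().transform(
            new SAXSource(new DropElementFilter(parser, dropped),
                          new InputSource(new StringReader(xml))),
            new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(filter("<hosts><host name='a'><cert>xyz</cert></host></hosts>", "cert"));
    }
}
```

Because the filter only forwards or swallows SAX events, it never holds the whole document in memory, which is the usual reason to choose SAX over DOM for this kind of plug-in.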
Role: administrator (1/2)
• configure plug-ins
  • validation of the data view
  • retry rules for each exception type (java.lang.Exception is the catch-all)
  • argument values
    • static values
    • values extracted from another data view (XPath + regex)
  • plug-in-specific configuration
• configure data views
  • cache management, depending on the characteristics and usage profile of
    • the data source: total amount of data, update frequency, effective latency
    • the generated view: amount of generated data, time-to-live, tolerable latency
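Two of the plug-in settings above can be sketched in plain Java: per-exception retry rules where `java.lang.Exception` acts as the catch-all, and an argument value extracted from another data view with an XPath expression followed by a regular expression. Everything here is an assumption for illustration: the `PluginConfigSketch` class, the rule values, the lookup order (most specific superclass wins) and the "first capture group" convention are not taken from Lavoisier's configuration format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.SocketTimeoutException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class PluginConfigSketch {

    // Hypothetical retry rules: exception type -> number of retries.
    static final Map<Class<? extends Exception>, Integer> RETRIES = new LinkedHashMap<>();
    static {
        RETRIES.put(SocketTimeoutException.class, 3);
        RETRIES.put(IOException.class, 1);
        RETRIES.put(Exception.class, 0); // catch-all rule
    }

    /** Returns the rule of the most specific registered superclass of e. */
    static int retriesFor(Exception e) {
        for (Class<?> c = e.getClass(); c != null; c = c.getSuperclass()) {
            Integer n = RETRIES.get(c);
            if (n != null) return n;
        }
        return 0; // fallback only; Exception.class always matches an Exception
    }

    /** Calls the task, retrying as long as the rule for the latest failure allows. */
    static <T> T callWithRetries(Callable<T> task) throws Exception {
        int attempt = 0;
        while (true) {
            try {
                return task.call();
            } catch (Exception e) {
                if (attempt++ >= retriesFor(e)) throw e;
            }
        }
    }

    /** Selects a string from another view with XPath, then keeps regex group 1. */
    static String argumentFrom(String viewXml, String xpath, String regex) throws Exception {
        String selected = XPathFactory.newInstance().newXPath()
            .evaluate(xpath, new InputSource(new StringReader(viewXml)));
        Matcher m = Pattern.compile(regex).matcher(selected);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String v = callWithRetries(() -> {
            if (++calls[0] < 3) throw new SocketTimeoutException("slow source");
            return "ok";
        });
        System.out.println(v + " after " + calls[0] + " calls"); // ok after 3 calls
        String host = argumentFrom(
            "<service><endpoint>https://db.example.org:8443/query</endpoint></service>",
            "/service/endpoint", "https://([^:/]+)");
        System.out.println(host); // db.example.org
    }
}
```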
Role: administrator (2/2)
• configure data views
  • cache management
    • cache type: in-memory, on-disk, or no cache
    • cache validity period in case of data source unavailability
    • set of rules triggering cache update (any combination of):
      • startup
      • time-based
      • notification
      • view access (write, read)
      • cache expiration
      • cache dependencies, with or without enforcement of consistency between data views
• reload configuration on the fly (restart only plug-ins whose configuration was modified) => minimal service interruption
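The interplay between a time-to-live and the validity period used when the source is down can be sketched as a small in-memory cache. This is a hypothetical `CachedView` class, not Lavoisier's cache: it assumes access-triggered refresh once the TTL has elapsed, an explicit `refresh()` for triggers such as startup or notification, and a validity period counted from the last successful build. Time is injected as a `LongSupplier` so the behavior can be tested deterministically.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical in-memory data-view cache with TTL and stale-on-failure fallback. */
public class CachedView {
    private final Supplier<String> source; // builds the view; throws on failure
    private final long ttlMillis;          // time-based refresh trigger
    private final long validityMillis;     // how long a stale view may be served
    private final LongSupplier clock;      // current time in ms (injectable)
    private String view;                   // last successfully built view
    private long builtAt;                  // time of that build

    public CachedView(Supplier<String> source, long ttlMillis, long validityMillis,
                      LongSupplier clock) {
        this.source = source;
        this.ttlMillis = ttlMillis;
        this.validityMillis = validityMillis;
        this.clock = clock;
    }

    /** Access trigger: rebuilds only when there is no view or the TTL has elapsed. */
    public synchronized String get() {
        if (view == null || clock.getAsLong() - builtAt >= ttlMillis) refresh();
        return view;
    }

    /** Explicit trigger (e.g. startup, notification): rebuild now. */
    public synchronized void refresh() {
        long now = clock.getAsLong();
        try {
            view = source.get();
            builtAt = now;
        } catch (RuntimeException e) {
            // Source unavailable: keep the previous view while it is still valid.
            if (view == null || now - builtAt >= validityMillis) throw e;
        }
    }

    public static void main(String[] args) {
        long[] now = {0};
        boolean[] up = {true};
        int[] builds = {0};
        CachedView cv = new CachedView(() -> {
            if (!up[0]) throw new IllegalStateException("source down");
            return "<view build='" + (++builds[0]) + "'/>";
        }, 1_000, 5_000, () -> now[0]);
        cv.refresh();                                 // startup trigger: build 1
        now[0] = 500;   System.out.println(cv.get()); // still fresh: build 1
        now[0] = 1_500; System.out.println(cv.get()); // TTL elapsed: build 2
        up[0] = false;
        now[0] = 3_000; System.out.println(cv.get()); // source down: stale build 2
    }
}
```

Once the validity period has also elapsed, `get()` rethrows the source's exception instead of serving an outdated view indefinitely.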
Role: user
• query data views
  • through WSRF standard commands
    • with any WSRF-compliant client (e.g. Globus 4)
  • through server-side XSLT processing
    • only the result of the processing is transferred
    • the result can be XML, HTML, or text
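Server-side XSLT processing means the client sends a stylesheet and receives only the transformed output. The sketch below shows the idea with a text-output stylesheet that lists sites marked as down; the `processXsl` helper, the view content and the `status` attribute are hypothetical and do not reflect Lavoisier's actual operation signature.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ServerSideXsl {

    // Hypothetical data view held by the service.
    static final String VIEW = "<sites>"
        + "<site name='CERN' status='ok'/><site name='IN2P3' status='down'/>"
        + "<site name='RAL' status='ok'/></sites>";

    // Client-supplied stylesheet: method='text' yields plain text, one name per line.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='sites/site[@status=\"down\"]'>"
      + "<xsl:value-of select='@name'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template></xsl:stylesheet>";

    /** Runs on the server; only the returned string would be sent to the client. */
    static String processXsl(String view, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(view)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String result = processXsl(VIEW, XSL);
        System.out.print(result); // IN2P3
        System.out.println("view: " + VIEW.length() + " chars, transferred: "
            + result.length() + " chars");
    }
}
```

Switching `method='text'` to `html` or `xml` in `xsl:output` is all it takes to return the other result formats listed on the slide.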
Conclusion
• Maintainability
  • thanks to the unified view of data
• Factorization of efforts
  • thanks to the separation of roles: plug-in developer, administrator, user
• Data access
  • efficiency, thanks to caching of data views
  • robustness, by keeping the previous data view if the data source is unavailable
(Some of the) perspectives
• Move to Apache Maven (improve the build process)
• Schedule plug-in execution according to memory consumption
• Add new configuration features
  • rules to ignore some partial failures, new triggers…
• Develop new plug-ins
  • XQuery, rewrite of the remote XML file plug-in (with JSAGA)
  • …