Understand Event-Condition-Action (ECA) rules for XML and RDF systems, explore design issues, syntax, and system architecture for efficient performance analysis.
Event-Condition-Action Rule Languages over Semistructured Data George Papamarkos
Outline • What Event-Condition-Action (ECA) rules are and what we can do with them • ECA Rules for XML • ECA Language • System Architecture • Performance • ECA Rules for RDF • ECA Language • System Architecture • Performance
What is an ECA Rule? • An Event-Condition-Action rule performs actions in response to events, given that a stated condition holds • An event in a database system can be the insertion of a new tuple • The condition can be a query • The action may be a relational table update • This behaviour is called reactive functionality
What is an ECA Rule? • An ECA rule has the general syntax: on event if condition do action • The event part specifies when the rule is triggered • The condition part determines if the data are in a particular state, in which case the rule fires • The action part describes the actions to be performed if the rule fires.
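The on/if/do structure can be illustrated with a minimal sketch. This is not the system described in these slides: the `Rule` class, the `dispatch` function and the stock example are hypothetical names chosen only to show the triggered-versus-fired distinction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    event: str                         # "on": the event the rule listens for
    condition: Callable[[Dict], bool]  # "if": a query over the current data state
    action: Callable[[Dict], None]     # "do": executed only when the rule fires


def dispatch(rules: List[Rule], event: str, state: Dict) -> int:
    """Trigger every rule registered for `event`; fire those whose condition holds."""
    fired = 0
    for rule in rules:
        if rule.event != event:        # rule is not triggered by this event
            continue
        if rule.condition(state):      # triggered and condition holds: the rule fires
            rule.action(state)
            fired += 1
    return fired


# Hypothetical rule: on insert, if stock is below 5, record a reorder.
state = {"stock": 3, "reorders": 0}
rules = [
    Rule(
        event="insert",
        condition=lambda s: s["stock"] < 5,
        action=lambda s: s.update(reorders=s["reorders"] + 1),
    )
]
fired = dispatch(rules, "insert", state)
```

A rule whose event does not match is never evaluated further, which is why the event part is described as deciding when the rule is triggered and the condition part as deciding whether it fires.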
Advantages of using ECA Rules • They allow an application's reactive functionality to be defined and managed within a single rule base rather than being encoded in diverse programs • They use a high-level declarative syntax and are thus amenable to analysis and optimisation techniques that cannot be applied if the functionality is encoded in program code
Outline • What Event-Condition-Action (ECA) rules are and what we can do with them • ECA Rules for XML • ECA Language • System Architecture • Performance • ECA Rules for RDF • ECA Language • System Architecture • Performance
ECA Rules for XML - Outline • Design issues of an ECA language for XML • The XTL Language • Implementing an XTL rules processing system • Performance Study
Design issues of an ECA language for XML • Compared with relational triggers, the following are the most important XML-specific issues in designing an ECA language for XML • Event Granularity: Specifying where data has been modified is more complex and requires path expressions • Action Granularity: An action may affect an entire sub-document, meaning that: • An action can trigger a different set of events • The analysis of which events are triggered by an action cannot be based on syntax alone
The XTL Language • The general syntax of XTL rules is: on event if condition do action • Fragments of XPath and XQuery are used to specify the event, condition and action parts of XTL rules. • XPath is used for selecting and matching fragments of XML • XQuery is used within actions where a new XML fragment needs to be constructed
The XTL Language • Event Part • Syntax: (INSERT | DELETE) e where e is an XPath expression evaluating to a set of nodes. • A rule is triggered if this set of nodes includes any node in the XML fragment inserted or deleted • The system-defined variable $delta contains this set of nodes and is available for use in the condition and action parts of the rule
The XTL Language • Condition Part • The condition part is either the constant TRUE or one or more XPath expressions connected by the boolean connectives and, or, not. • Each of these expressions is evaluated on the data to tell whether the condition is TRUE or FALSE
The XTL Language • Action Part: • The action part is a sequence of one or more actions • Syntax: • INSERT r BELOW e (BEFORE | AFTER) q, where r is an XQuery expression specifying the XML fragment to be inserted, e is an XPath expression specifying the set of nodes under which the new fragment will be inserted, and q is either a constant or an XPath qualifier specifying the set of nodes BEFORE or AFTER which the new nodes will be placed. • DELETE e, where e is an XPath expression specifying the set of nodes to be deleted.
XTL Language • Example rule: ON INSERT doc('s.xml')/shares/share/day-info/prices/price IF $delta > $delta/../../high DO DELETE $delta/../../high; INSERT <high>$delta/text()</high> BELOW $delta/../.. AFTER prices
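The effect of the example rule can be imitated by hand in Python with the standard `xml.etree.ElementTree` module. This is only a sketch of what the rule is meant to achieve, not of how an XTL engine evaluates it: the sample document and the `insert_price` function are assumptions made for illustration, and the `high` element is taken to be a child of `day-info`, as the condition and the INSERT action imply.

```python
import xml.etree.ElementTree as ET

# Assumed sample document; the slides do not show the contents of s.xml.
DOC = """<shares><share><day-info>
  <prices><price>10</price></prices>
  <high>10</high>
</day-info></share></shares>"""


def insert_price(root: ET.Element, value: float) -> None:
    """Insert a new price, then apply the example rule's condition and actions."""
    day_info = root.find("./share/day-info")
    prices = day_info.find("prices")
    delta = ET.SubElement(prices, "price")   # the inserted node, i.e. $delta
    delta.text = str(value)

    high = day_info.find("high")             # the current daily high
    # Condition: the new price exceeds the recorded high.
    if high is not None and float(delta.text) > float(high.text):
        day_info.remove(high)                # DELETE the old high
        new_high = ET.Element("high")        # INSERT <high>...</high>
        new_high.text = delta.text
        # BELOW day-info, AFTER prices: place it right after the prices element.
        position = list(day_info).index(prices) + 1
        day_info.insert(position, new_high)


root = ET.fromstring(DOC)
insert_price(root, 12.5)   # exceeds the high of 10, so the rule fires
insert_price(root, 11.0)   # below the new high, so the condition fails
```

After both insertions the document holds three prices and a single `high` element with the value 12.5.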
XTL rule processing system - Architecture • ECA Rules Management: Validates and registers a rule in the Rule Base • ECA Rule Processing Engine: • Evaluates the Event and Condition Parts of the rules and schedules their actions for execution in the Action Schedule
System Performance • The system performance was studied by: • Developing an analytical model of the system • Performing experiments on the actual system • We have studied the effects of rule base indexes on system performance • Performance criterion: • Update response time: The mean time taken to complete all rule execution resulting from a single update submitted by a top-level update transaction
System Performance • Varying quantities: • Number of rules in the rule base • Experiments on the actual system were performed with three (3) different rule sets • XML data set: a fragment of the DBLP database
System Performance - Analytical Model • The analytical model is a mathematical description of the system behaviour • Uses queueing theory to model the transaction queues and database processing • Uses a set of simplifying assumptions to emulate the behaviour of some system parameters (e.g. triggering probability, transaction arrival rate etc.)
System Performance - Analytical Model • Response time increases non-linearly for as long as the system is stable (i.e. the arrival rate in the transaction queue is less than the service rate) • Beyond the stability point the transaction queue grows uncontrollably large, flooding the memory and slowing the system down • Reasons: • Everything is served by a single queue • High number of event query evaluations to find what is triggered
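The stability behaviour described above can be illustrated with the textbook M/M/1 queue, where the mean response time is 1 / (service rate - arrival rate) while the arrival rate stays below the service rate. The slides do not state which queueing model was actually used, so M/M/1 and the rates below are assumptions chosen only to show the shape of the curve.

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in an M/M/1 system (waiting plus service): 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        # Unstable regime: the queue grows without bound.
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical rates, in transactions per second.
service_rate = 10.0
arrival_rates = (2.0, 5.0, 8.0, 9.5, 10.0)
times = [mm1_response_time(lam, service_rate) for lam in arrival_rates]
```

Equal steps in arrival rate produce ever larger steps in response time as the arrival rate approaches the service rate, and no finite mean exists once the two are equal.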
System Performance - Experimental Results • Differences from the Analytical Model are due to: • implementation choices (use of DOM etc.) and • the simplifying assumptions made in the analytical model
System Performance - Indexing Rule Base • Better overall behaviour and scalability characteristics due to smaller number of rules that need to be checked for triggering • Smaller number of rules checked --> smaller number of queries need to be evaluated
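One way such an index could work is sketched below. The slides do not describe the actual index structure, so this is an assumption: rules are keyed by event type and by the last element name of their event path, and event paths are assumed to end in a plain element name without wildcards or predicates. `RuleIndex` and its methods are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class RuleIndex:
    """Maps (event type, leaf element name) to the rules that may be triggered."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    def register(self, rule_id: str, event_type: str, event_path: str) -> None:
        # Assumption: the path ends in an element name, e.g. ".../prices/price".
        leaf = event_path.rstrip("/").split("/")[-1]
        self._by_key[(event_type, leaf)].append(rule_id)

    def candidates(self, event_type: str, element_names: Iterable[str]) -> List[str]:
        """Rules worth checking for an update whose fragment contains these elements."""
        found: List[str] = []
        for name in sorted(set(element_names)):
            found.extend(self._by_key.get((event_type, name), []))
        return found


index = RuleIndex()
index.register("r1", "INSERT", "/shares/share/day-info/prices/price")
index.register("r2", "DELETE", "/shares/share/day-info/prices/price")
index.register("r3", "INSERT", "/shares/share")

to_check = index.candidates("INSERT", ["price"])
```

Only the candidate rules then have their full event queries evaluated, so the number of queries grows with the number of matching rules rather than with the size of the rule base.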
Outline • What Event-Condition-Action (ECA) rules are and what we can do with them • ECA Rules for XML • ECA Language • System Architecture • Performance • ECA Rules for RDF • ECA Language • System Architecture • Performance
ECA Rules for RDF • The RDFTL ECA Language • Implementing an RDFTL processing system in P2P environments • System performance
The RDFTL Language • We have designed the language from scratch specifically for RDF • General Syntax: • ON event IF condition DO action
The RDFTL Language • Event Part: • May contain let expressions of the form: LET $var := e • (INSERT | DELETE) e, where e is a path expression that evaluates to a set of RDF nodes. Catches the insertion or deletion of a node • (INSERT | DELETE) triple, where triple is an expression of the form (source, arc, target) specifying an RDF triple. Catches the insertion or deletion of a property in an RDF triple. • UPDATE upd_triple, where upd_triple is an expression of the form (source, arc, old_target->new_target). Catches the update of a property from one RDF node to another.
The RDFTL Language • Condition Part: • It is a boolean-valued expression • May consist of conjunctions, disjunctions and negations • May also contain let expressions • The $delta variable is bound to the set of nodes or arcs modified and caught by the event part • Action Part: • A sequence of actions • Each action has a syntax similar to that of the event part
RDFTL Rules in P2P Environments • Each peer (P) is supervised by a superpeer (SP) • The set of Ps supervised by an SP form a peergroup • At each SP there is an RDFTL processing engine installed • Each P or SP hosts a fragment of the RDF schema that may change due to updates • Hybrid fragmentation with possible replication
RDFTL Rules in P2P Environments • Ps notify their SPs of any updates to their local data • An ECA rule generated at one P or SP may be replicated, triggered, evaluated or executed at different sites in the network.
Distributed Rule Registration • A newly generated rule is sent from P to its SP for validation and storage • From there it is sent to all other SPs • A replica of it is also stored at those SPs that are e-relevant to the rule, i.e. those at which the event part queries of the rule can be evaluated • At each SP each rule is annotated with the IDs of local peers that are e-, c- and a-relevant to the rule • c- and a-relevance have a meaning similar to e-relevance, for the condition and action parts
Distributed Rule Execution • Each SP manages its own rule execution schedule • Each execution schedule is a sequence of updates to be executed on the local peergroup • Once an update u occurs at P, its SP is notified • The SP determines whether u may trigger any rule r whose event part is annotated with P's ID • If so, the event query is sent to P for evaluation • If the rule is triggered, its condition is evaluated • If the condition is true, the SP sends each instance of r's action part to the local peers that are a-relevant to it
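The steps above can be sketched as a single-process simulation. This is a simplification under stated assumptions, not the RDFTL engine: `AnnotatedRule`, `SuperPeer` and `on_update` are hypothetical names, the event query and condition are plain Python callables, and calling them locally stands in for sending them to peers for evaluation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class AnnotatedRule:
    name: str
    event_query: Callable[[Dict], bool]  # stands in for the query sent to peer P
    condition: Callable[[Dict], bool]    # stands in for the condition evaluation
    action: str                          # placeholder for the action's update
    e_relevant: Set[str]                 # IDs of local peers e-relevant to the rule
    a_relevant: Set[str]                 # IDs of local peers a-relevant to the rule


@dataclass
class SuperPeer:
    rules: List[AnnotatedRule]
    # Execution schedule: (target peer, action) pairs for the local peergroup.
    schedule: List[Tuple[str, str]] = field(default_factory=list)

    def on_update(self, peer_id: str, update: Dict) -> None:
        """Handle a notification that `update` occurred at local peer `peer_id`."""
        for rule in self.rules:
            if peer_id not in rule.e_relevant:
                continue                  # the update cannot trigger this rule
            if not rule.event_query(update):
                continue                  # event query evaluated: not triggered
            if rule.condition(update):
                for target in sorted(rule.a_relevant):
                    self.schedule.append((target, rule.action))


# Hypothetical rule: react to inserted triples whose arc is "price".
rule = AnnotatedRule(
    name="r",
    event_query=lambda u: u["kind"] == "INSERT" and u["arc"] == "price",
    condition=lambda u: u["target"] > 100,
    action="update-high",
    e_relevant={"P1"},
    a_relevant={"P2", "P3"},
)
sp = SuperPeer(rules=[rule])
sp.on_update("P1", {"kind": "INSERT", "arc": "price", "target": 120})
sp.on_update("P4", {"kind": "INSERT", "arc": "price", "target": 500})
```

The second update is ignored because P4 is not annotated as e-relevant, so no event query is evaluated for it at all.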
System Performance • The system performance was studied by: • Developing an analytical model of the system • Developing a system simulator and performing experiments with it • Performance criterion: • Update response time: The mean time taken to complete all rule execution resulting from a single update submitted by a top-level update transaction
System Performance • Cases studied with both the Analytical Model and the Simulator: • Random network topology between SPs, with varying degrees of data replication • HyperCup network topology between SPs, with varying degrees of data replication • Varying quantities: • Number of peergroups • Number of rules
System Performance • Random topology - Replication 10% • [Figure panels: Analytical Model; Simulation]
System Performance • With a random topology the system does not scale well, even with low replication and small numbers of rules and peergroups • Update response time grows exponentially • The system becomes unusable due to high load
System Performance • HyperCup organises the SPs into hypercubes • HyperCup topology guarantees that: • Each peer receives a message only once • A total of N-1 messages is needed to broadcast a message to N peers • The most distant peers are reached after log2 N hops
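These guarantees can be checked with a small simulation of broadcast in a complete hypercube. The sketch assumes N is a power of two and uses the standard dimension-ordered forwarding scheme, in which a node that receives a message over dimension i forwards it only over higher dimensions; the real protocol also handles incomplete hypercubes, which is not modelled here. `hypercube_broadcast` is a hypothetical name.

```python
from collections import deque
from typing import Dict, Tuple


def hypercube_broadcast(dimensions: int, source: int = 0) -> Tuple[int, int, int]:
    """Simulate a broadcast; return (messages sent, maximum hops, peers reached)."""
    hops: Dict[int, int] = {source: 0}
    messages = 0
    # Each queue entry is (node, dimension the message arrived on); -1 for the source.
    queue = deque([(source, -1)])
    while queue:
        node, arrived_on = queue.popleft()
        # Forward only over dimensions higher than the one the message came in on.
        for dim in range(arrived_on + 1, dimensions):
            neighbour = node ^ (1 << dim)   # flip one bit to cross that dimension
            hops[neighbour] = hops[node] + 1
            messages += 1
            queue.append((neighbour, dim))
    return messages, max(hops.values()), len(hops)


# 3 dimensions, i.e. N = 8 superpeers.
messages, max_hops, reached = hypercube_broadcast(3)
```

With 8 peers the simulation sends 7 messages, reaches all 8 peers, and needs at most 3 hops, which matches N-1 messages and log2 N hops.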
System Performance • HyperCup - Replication 10% • [Figure panels: Analytical Model; Simulation]
System Performance • HyperCup - Replication 90% • [Figure panels: Analytical Model; Simulation]
System Performance • With HyperCup we achieve higher performance for various replication levels and numbers of peergroups • The system scales better • The system remains stable and the update response time stays within acceptable values • The analytical and simulation approaches show good agreement
Conclusions • We have described two ECA languages, for XML and for RDF • We have studied and defined the architectural characteristics of an ECA rule processing system in centralised and distributed environments • We have conducted a study to determine the system performance in both the centralised and the distributed case
Conclusions • The study as a whole shows that ECA rules are a usable technology for a variety of application environments over semi-structured data