350 likes | 463 Views
Scalable Trigger Processing. Discussion of publication by Eric N. Hanson et al Int Conf Data Engineering 1999 CS561. Motivation. Triggers popular for: Integrity constraint checking Alerting, logging, etc. Commercial database systems Limited triggering capabilities
E N D
Scalable Trigger Processing Discussion of publication by Eric N. Hanson et al Int Conf Data Engineering 1999 CS561
Motivation • Triggers popular for: • Integrity constraint checking • Alerting, logging, etc. • Commercial database systems • Limited triggering capabilities • 1 trigger/update-type on table; or at best 100. • But : Current technology doesn’t scale well • And, internet and web-based applications may need millions of triggers.
An Example Trigger • Example “stock ticker notification”: • Stock holding: 100*IBM • Query: Inform an agent whenever the price of the stock holding crosses $10,000 Create Trigger stock-watch from quotes q on update(q.price) when q.name=‘IBM’ and 100*q.price > 10,000 do raise event ThresholdCrossed(100*q.price). • Note: We may need 1,000 or millions of such triggers • Web interface may allow users to create such triggers
What Next? • Problem description • TriggerMan system architecture • Predicate index • Trigger processing
Problem Definition • Given: Relational DB, Trigger statements, Data Stream • Find: Triggers corresponding to each stream item • Objective: Scalable trigger processing system • Assumptions: • Number of distinct structures of trigger expressions is relatively small • All trigger expression structures small enough to fit in main memory
The Problem, once more. • Requires millions of triggers (on huge data). • Steps for trigger processing • Event monitoring • Condition evaluation • Executing triggered action • Response time for database operations critical !
Related Work ECA Model (not scalable) Indexing Parallel Processing [Gupt89,Hell98] AI [Forg82,Mira87] (smaller rule set) Range Predicates, Marking-Based [Hans96b, Ston90] (large memory, complicated storage)
Overall Driving Idea • If large number of triggers are created, then many have the same format. • Triggers share same expression signature except that parameters substituted. • Group predicates from trigger conditions based on expression signatures into equivalence classes • Store them in efficient main memory data structures
Components • TriggerMan Datablade (lives inside Informix) • Data Sources • Local/remote tables/streams; must capture updates and transmit to TriggerMan (place in a queue) • TriggerMan Client applications • Create /drop triggers, etc. • TriggerMan Driver • Periodically involve TmanTest() fn to perform condition testing and action execution. • TriggerMan console • Direct user interaction interface for trigger creation, system shutdown, etc.
TriggerMan Syntax • Trigger syntax create trigger <triggerName> [in setName] [optionalFlags] from fromList [on eventSpec] [when condition] [group by attributeList] [having groupCondition] do action
Example : Salary Increases Update Fred’s salary when Bob’s salary is updated create trigger updateFred from emp on update (emp.salary) when emp.name = ’Bob’ do execSQL ’update emp set salary=:NEW.emp.salary where emp.name=’’Fred’’’
Example : Real Estate Database “If new house added which is in neighborhood that salesperson Iris reprensents then notify her” House (hno,address,price,nno,spno) Salesperson (spno,name,phone) Represents (spno,nno) Neighborhood (nno,name,location) create trigger IrisHouseAlert on insert to house from salesperson s, house h, represents r when s.name = ‘Iris’ and s.spno=r.spno and r.nno=h.nno do raise event NewHouseInIrisNeighborhood(h.hno, h.address)
Trigger Condition Structure • Expression signature • Expression signature consists of • Data source ID • Operation code, e.g. insert, delete, etc. • Generalized Expression (parameterized) FROM: Data src: emp ON: Event : update WHEN: boolean exp. = Emp.name CONSTANT
Condition structure (contd) • Steps to obtain canonical representation of WHEN clause • Translate expression to CNF • Group each conjunct by data source they refer to • Selection Predicate will be of form : (C11 OR C12 OR ..) AND ... AND (Ck1 OR …), where each Cij refers to same tuple variable. • Each conjunct refers to zero, one, or more data sources • Group conjuncts by set of sources they refer to • If one data source, then selection predicate • If two data sources, then JOIN predicate
Triggers for stock ticker notification • Create triggerT1fromstock whenstock.ticker = ‘GOOG’ and stock.value < 500 donotify_person(P1) • Create triggerT2fromstock whenstock.ticker = ‘MSFT’ and stock.value < 30 donotify_person(P2) • Create triggerT3fromstock whenstock.ticker = ‘ORCL’ and stock.value < 20 donotify_person(P3) • Create triggerT4 from stock whenstock.ticker = ‘GOOG’ donotify_person(P4)
Expression Signature • Idea: Common structures in condition of triggers Expression Signature: • E1: stock.ticker = const1 and stock.value < const2 • Expression Signature: • E2: stock.ticker = const3 • Expression signature defines equivalence class of all instantiations of expression with different constants T1: stock.ticker = ‘GOOG’ and stock.value < 500 T2: stock.ticker = ‘MSFT’ and stock.value < 30 T3: stock.ticker = ‘ORCL’ and stock.value < 20 T4: stock.ticker = ‘GOOG’
What to do now • Only a few distinct expression signatures, build data structures to represent them explicitly (in memory) • Create constant tables that store all different constants, and link them to their expression signature
Main Structures • A-treat Network • Network for trigger condition testing • For a trigger to fire, all conditions must be true • Expression Signature • Common structure in a trigger • E1: stock.ticker = const1 and stock.value < const2 • Constant Tables • Constants for each expression signature
Root stock.value < const2 stock.ticker = const1 predicates Node 1 Node 2 alpha-node alpha-node A-Treat Network to represent a trigger • For each trigger condition • stock.ticker = const1 and stock.value < const2
Condition Testing • A-Treat network is a discrimination network for trigger condition testing. • For a predicate to be satisfied, all its conjuncts should be true. • This is checked using A-Treat network.
A-Treat network (Hanson 1992) Define rule SalesClerk If emp.sal>30,000 And emp.dno=dept.dno And dept.name=“sales” And emp.jno=job.jno And job.title=“clerk” Then Action
Expression Signature Table E1: stock.ticker = const1 and stock.value < const2 E2: stock.ticker = const3
Constant Tables • Tables of constants in trigger conditions Const_e1 Const_e2 T1: stock.ticker = ‘GOOG’ and stock.value < 500 T2: stock.ticker = ‘MSFT’ and stock.value < 30 T3: stock.ticker = ‘ORCL’ and stock.value < 20 T4: stock.ticker = ‘GOOG’
Tables • Primary tables • trigger_set (tsID, name, comments, creation_date, isEnabled) • Trigger (triggerID, tsID, name, comments, trigger_text, creation_date, isEnabled, …) • Trigger cache in main memory for recently accessed triggers.
Predicate Index • Tables • expression_signature(sigID, dataSrcID, signatureDesc, constTableName, constantSetSize, constantSetOrganization) • const_tableN(exprID, triggerID, nextNetworkNode, const1, … constK, restOfPredicate) • Root of predicate index linked to data source predicate indices • Each data source contains an expression signature list • Each expression signature links to its constant table. • Index expressions on most selective conjunct (rest on fly).
Predicate Index hash(src-ID) Goal: Given an update, identify all predicates that match it.
Processing Trigger Definition • Parse the trigger and validate it • Convert the when clause to conjunctive normal form • Group the conjuncts by the distinct sets of tuple variables they refer to • Form a trigger condition graph, that is, undirected graph with node for each tuple variable and edge for join predicates. • Build A-Treat network
Processing trigger definition (2) • For each selection predicate • If predicate with same signature not seen before • Add signature of predicate to list • And, add signature to expression_signature table • If signature has a constant placeholder in it, create a constant table for the signature. • Add constants • Else if predicate has constants, add a row to the constant table for the expression
Alternate Organizations • Storage for the expression signature’s equivalence class: • Main memory lists • Main memory index • Non-indexed database table • Indexed database table • For each expression signature, choose a structure depending on number of triggers. efficiency Scalability
Processing update descriptors • On getting an update descriptor (token) (data src ID, operator code, old/new tuple) • Locate data source predicate index from root of predicate index. • For each expression signature, find constant matching the token using index. • Check additional predicate clauses against the token. • When all predicate clauses of a trigger have matched, pin the trigger in main memory • Bring in A-treat network representing that trigger to process aremaining part of trigger, like join, etc. • If trigger condition is satisfied, execute action.
Processing an Update Update Stock (ticker=GOOG, value=495) Root Index of stock.ticker=const1 Other source Predicate index… E1 E2 E1: stock.ticker = const1 and stock.value < const2 const_e1 const_e2 const_e1
Concurrency • Better scalability even on single processor
Concurrency • Identified elements that can be parallelized • Token-level • Multiple tokens processed in parallel • Condition-level • Multiple selection conditions tested concurrently • Rule-action-level • Multiple rule actions fired at the same time • Data-level • Set of data values in the network processed in parallel
Conclusion : Overall Key Points • If a large number of triggers are created, many of them have almost the same format • Group triggers with same structure together into expression signature equivalence classes • Number of distinct signatures is small enough to fit into main memory (index) • Develop a selection predicate index structures • Architecture to build a scalable trigger system.