290 likes | 404 Views
Blocking, Monotonicity, and Turing Completeness in a Database Language for Sequences and Streams. Yan-Nei Law, Haixun Wang, Carlo Zaniolo 12/06/2002. Outline. Database Application in ATLaS Turing Completeness Data Streams Blocking and Monotonicity. User Defined Aggregates (UDAs).
E N D
Blocking, Monotonicity, and Turing Completeness in a Database Language for Sequences and Streams Yan-Nei Law, Haixun Wang, Carlo Zaniolo 12/06/2002
Outline • Database Application in ATLaS • Turing Completeness • Data Streams • Blocking and Monotonicity
User Defined Aggregates (UDAs) • Important for decision support, stream queries and other advanced database applications. • UDA consists of 3 parts: • INITIALIZE • ITERATE • TERMINATE
Standard aggregate average AGGREGATE myavg(Next Int) : Real { TABLE state(tsum Int, cnt Int); INITIALIZE : { INSERT INTO state VALUES (Next, 1); } ITERATE : { UPDATE state SET tsum=tsum+Next, cnt=cnt+1; } TERMINATE : { INSERT INTO RETURN SELECT tsum/cnt FROM state; } }
Online aggregation • Standard average aggregate returns results at TERMINATE state. • Online aggregate: Get results before input all the tuples • e.g. return the average for every 200 input tuples. • RETURN statements appear in ITERATE instead of TERMINATE.
Online averages AGGREGATE online_avg(Next Int) : Real { TABLE state(tsum Int, cnt Int); INITIALIZE : { INSERT INTO state VALUES (Next, 1); } ITERATE: { UPDATE state SET tsum=tsum+Next, cnt=cnt+1; INSERT INTO RETURN SELECT sum/cnt FROM state WHERE cnt % 200 = 0; } TERMINATE : { } }
Table from external source Access tables stored in external source 'C:\mydb\employees'. TABLE employee(Eno Int, Name Char(18), Sal Real, Dept Char(6)) SOURCE 'C:\mydb\employees'; SELECT Sex, online_avg(Sal) FROM employee WHERE Dept=1024 GROUP BY Sex;
Calling other UDAs • UDAs can call other UDAs, including recursively calling themselves. • Example: Computation of Transitive Closure (the graph contains no directed cycle). • Find all the nodes that is reachable from ‘000’.
Transitive Closure TABLE dgraph(start Char(10), end Char(10)) SOURCE mydb; AGGREGATE reachable(Rnode Char(10)) : Char(10) { INITIALIZE: ITERATE: { INSERT INTO RETURN VALUES (Rnode) INSERT INTO RETURN SELECT reachable(end) FROM dgraph WHERE start=Rnode; } } SELECT reachable(dgraph.end) FROM dgraph WHERE dgraph.start='000';
Turing Completeness • Turing Completeness is not a trivial property for a language. • SQL is not Turing Complete. • A Turing Machine is a tuple M=(Q,,,,q0,!,F). • TM is determined by 4 elements • Transition map ; • Accepting states F; • Input tape ; • Initial state q0.
Turing Machine for L3eq = {anbncn|n≥1} • Transition map: • Input tape: aaabbbccc • Accepting state: {z} • Initial state: p
Turing Machine for L3eq INSERT INTO transition VALUES ('p','a',1,'q','x'), ('q','a',1,'q','a'), ('t','a',-1,'u','a'), ('u','a',-1,'u','a'), ('q','b',1,'r','x'), ('r','b',1,'r','b'), ('t','b',-1,'t','b'), ('r','c',1,'s','x'), ('s','c',1,'s','c'), ('t','c',-1,'t','c'), ('q','x',1,'q','x'), ('r','x',1,'r','x'), ('t','x',-1,'t','x'), ('u','x',1,'p','x'), ('v','x',1,'v','x'), ('s','!',-1,'t','!'), ('t','!',1,'v','x'), ('v','!',0,'z','x'); INSERT INTO accept VALUES ('z'); INSERT INTO tape VALUES ('a',0),('a',1), ('a',2), ('b',3), ('b',4), ('b',5), ('c',6), ('c',7),('c',8);
Turing Machine in ATLaS(1) • For each iteration: • pass the current state, current symbol, position on the tape to UDA called turing. • If is defined, obtain the next state, next symbol and movement of the head. • current state next state; • current symbol next symbol; • next position = current position+movement. If is not defined, TM halts. check whether current state is an accepting state.
Turing Machine in ATLaS(2) TABLE current(stat Char(1), symbol Char(1), pos Int); TABLE tape(symbol Char(1), pos Int); TABLE transition(curstate Char(1), cursymbol Char(1), move int, nextstate Char(1), nextsymbol Char(1)); TABLE accept(accept Char(1));
Turing Machine in ATLaS(3) AGGREGATE turing(stat Char(1), symbol Char(1), curpos Int) : Int { INITIALIZE: ITERATE: { /*If TM halts, return 1 or 0 (accept or reject)*/ INSERT INTO RETURN SELECT R.C FROM (SELECT count(accept) C FROM accept A WHERE A.accept = stat) R WHERE NOT EXISTS ( SELECT * FROM transition T WHERE stat = T.curstate AND symbol = T.cursymbol);
Turing Machine in ATLaS(4) /* write tape */ DELETE FROM tape WHERE pos = curpos; INSERT INTO tape SELECT T.nextsymbol, curpos FROM transition T WHERE T.curstate = stat AND T.cursymbol = symbol;
Turing Machine in ATLaS(5) /* add blank symbol if necessary */ INSERT INTO tape SELECT '!', curpos + T.move FROM transition T WHERE T.curstate = stat AND T.cursymbol = symbol AND NOT EXISTS ( SELECT * FROM tape WHERE pos = curpos +T.move);
Turing Machine in ATLaS(6) /* move head to the next position */ INSERT INTO current SELECT T.nextstate, A.symbol, A.pos FROM tape A, transition T WHERE T.curstate = stat AND T.cursymbol = symbol) AND A.pos=curpos+T.move; } }
Turing Machine in ATLaS(7) INSERT INTO current SELECT 'p', A.symbol, 0 FROM tape A WHERE A.pos = 0; SELECT turing(stat, symbol, pos) FROM current;
Data Streams(1) • For stream processing and continuous queries, results must be returned promptly, without waiting for future input. • Non-blocking operators are suitable. • No terminate state is needed.
Data Streams(2) • ATLaS UDAs is a powerful tool for approximate aggregates, synopses, and adaptive data mining. • Stream is a source of data instead of table. • Data from a stream are time-stamped. • Example: Compute the average of long distance calls placed by each customer.
Data from a stream STREAM calls(customer_id Int, type Char(6), minutes Int, Tstamp: Timestamp) SOURCE mystream; SELECT periodic_avg(S.minutes) FROM Calls S WHERE S.type = 'Long Distance‘ GROUP BY S.customer_id
Average of the last 200 inputs AGGREGATE periodic_avg(Next Int, Time: Timestamp) : (Real, Timestamp) { TABLE state(tsum Int, cnt Int); INITIALIZE : { INSERT INTO state VALUES (Next, 1); } ITERATE : { UPDATE state SET tsum=tsum+Next, cnt=cnt+1; INSERT INTO RETURN SELECT tsum/cnt, Time FROM state WHERE cnt % 200 = 0; UPDATE state SET tsum=0, cnt=0 WHERE cnt % 200 = 0; } }
Average on a five-minute window(1) AGGREGATE window_avg(Next Int, Time Timestamp): (Real,Timestamp) { TABLE state(tsum Int); TABLE memo(mNext Int, mTime Timestamp); INITIALIZE : { INSERT INTO state VALUES (Next); INSERT INTO memo VALUES (Next, Time); }
Average on a five-minute window(2) ITERATE: { INSERT INTO memo VALUES (Next, Time); UPDATE state SET tsum=tsum+Next - ( SELECT sum(mNext) FROM memo WHERE mTime+ 5 MINUTES < Time); DELETE * FROM memo WHERE mTime+ 5 MINUTES < Time; INSERT INTO RETURN SELECT tsum, Time FROM state; } }
Blocking and Monotonicity • A blocking query operator is a query operator that is unable to produce the first tuple of the output until it has seen the entire input • e.g. myavg in Example 1 • A non-blocking query operator is one that produces all the tuples of the output before it has detected the end of the input. • e.g. online_avg in Example 2
Definition • Sequence of Length n: S=[t1,…,tn]. [ ] has length 0. • Presequence: Sk=[t1,…tk], 0<k≤n. • S L: Lk = S for some k. ( is partial order.) • For an operator G, Gj(S) is the cumulative output produced up to step j by G. • blocking: Gj(S) = [ ] for j<n. • non-blocking: Gj(S)= Gj(Sj)=G(Sj). • partial blocking: [ ] < Gj(S) < G(Sj).
ATLaS UDA is • blocking: returns answers only in TERMINATE state; • non-blocking: returns answers only in INITIALIZE or ITERATE states; • partially blocking: returns answers in TERMINATE and other states.
A function F can be computed using a non-blocking operator if F is monotonic wrt . • ATLaS is complete wrt non-blocking computations. • Every monotonic function can be computed by an ATLaS program with no TERMINATE state.