Extracting Events from Probabilistic Streams

Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu, University of Washington

One Slide Overview • Motivating App: RFID Ecosystem • Tagged people, cups, books, keys, laptops, etc. • Event queries [Cayuga, SASE, Snoop] • Alert when anyone enters the coffee room • Two problems • Missed readings: read rates in practice are low • Granularity mismatch, e.g. Office v. Antenna 41 • Instead, infer location from sensors • Proposal: keep the probabilities and query them with PEEX+ • PEEX+ (Probabilistic Event EXtraction) keeps data probabilistic to get higher precision and recall (P/R) while remaining efficient.

Motivating Apps • RFID apps • Diary and Active Calendar Application • Alert if I go to a database meeting • Supply chain • Alert if Mach 3 razors are being stolen • Many independent HMMs • Elder care [Intel, Patterson] • Alert if an elder takes their medicine with water • Financial applications on predictive HMMs • Alert on a head-and-shoulders market pattern

Outline • RFID to Probabilities via Particle Filters • PEEX+ query language • Extended Regular Query Algorithm • Experiments

The source of probabilities Connectivity Diagram 6th Floor in PAC Antennas Blue ring is ground truth Each orange particle is a guess of true location

PFs to a (prob) DB person At(tag,loc) To query Particle Filter output, query At
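
A minimal sketch of how particle filter output could be exposed as the At(tag, loc) relation, assuming each particle is an equally weighted (tag, location) guess. The function name `particles_to_at` and the sample particles are hypothetical, not taken from the talk.

```python
from collections import Counter

def particles_to_at(particles):
    """Turn equally weighted (tag, loc) particles into {(tag, loc): probability}."""
    per_tag = Counter(tag for tag, _ in particles)   # particles per tag
    per_pair = Counter(particles)                    # particles per (tag, loc)
    return {(tag, loc): n / per_tag[tag] for (tag, loc), n in per_pair.items()}

# Hypothetical particle set for one tag at one timestep.
particles = [("joe", "H2"), ("joe", "H2"), ("joe", "O2"), ("joe", "H3"), ("joe", "H2")]
at = particles_to_at(particles)
```

Each tag's probabilities sum to one, so a query over At sees one distribution over locations per tag and timestep.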

Semantics of the Model possible stream (worlds) Prob =0.4 * 0.6 * … NB: Markovian correlations OK At(tag,loc) “Joe enter O2 at t=8” Query Semantic: sum weight of all worlds where Q is true at time t Probability outside O2 (in H2,H3)
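
The possible-worlds semantics above can be sketched by brute force: enumerate every possible stream, weight it by the product of its per-timestep probabilities, and sum the weights of the worlds where the query holds. This sketch assumes independent timesteps (the slide notes that Markovian correlations are also supported); the marginals and the helper names are hypothetical.

```python
from itertools import product

# Hypothetical marginals for Joe's location at two consecutive timesteps.
stream = [
    {"H2": 0.6, "O2": 0.4},
    {"H3": 0.4, "O2": 0.6},
]

def enters(world, room, t):
    """True if the tag is outside `room` at t-1 and inside it at t."""
    return t > 0 and world[t - 1] != room and world[t] == room

def query_probability(stream, predicate):
    """Sum the weights of all possible worlds in which `predicate` holds."""
    total = 0.0
    for world in product(*(step.keys() for step in stream)):
        weight = 1.0
        for step, loc in zip(stream, world):
            weight *= step[loc]          # independence assumption
        if predicate(world):
            total += weight
    return total

p_enter = query_probability(stream, lambda w: enters(w, "O2", 1))
```

Enumeration is exponential in the stream length, which is why the algorithms later in the talk avoid materializing worlds.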

A hierarchy of PEEX+ queries • Regular Queries • Alert me when Joe goes to the coffee room • Extended Regular • Alert when anyone goes to the coffee room • Safe • Alert when anyone goes to the coffee room and a DB member follows them. • Hard Others (Simulation) • This line is sharp for some queries

PEEX+ Queries • Fragment of Cayuga, queries define events. p in some location Same p in both Technical Point: Left-to-right eval,

Regular and Extended Regular Queries • A query is regular if no variable is shared between subgoals • A query is extended regular if any variable shared by two subgoals is shared by all subgoals, i.e. it is a templated regular query (in the example, p is shared between subgoals)
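
The two definitions above can be checked mechanically. The sketch below assumes a query is represented as a list of subgoals, each given as the set of variable names it mentions; this representation and the function name `classify` are illustrative assumptions.

```python
def classify(subgoals):
    """Classify a query given as a list of variable-name sets, one per subgoal."""
    shared = set()
    for i, a in enumerate(subgoals):
        for b in subgoals[i + 1:]:
            shared |= a & b              # variables appearing in two subgoals
    if not shared:
        return "regular"
    if all(shared <= goal for goal in subgoals):
        return "extended regular"
    return "neither"

joe_query = classify([set(), set()])                 # only constants such as Joe
anyone_query = classify([{"p"}, {"p"}])              # same p in both subgoals
follow_query = classify([{"p"}, {"p", "q"}, {"q"}])  # p and q each miss a subgoal
```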

Wrinkle in the language: Filter v. Selection “Alert next time Joe is in 502 after he is in 501” Yes “Alert if the next place Joe is in after 501 is 502” No At Time
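
A small deterministic sketch of the distinction, under one reading of the slide: a filter decides which events count as "next", and a selection tests the event that was picked. The stream 501, 503, 502 and the helper `next_matching` are assumptions chosen to reproduce the Yes/No answers.

```python
def next_matching(stream, start_loc, keep, accept):
    """After the first `start_loc`, take the next event passing `keep` (the filter),
    then report whether that event passes `accept` (the selection)."""
    seen_start = False
    for loc in stream:
        if not seen_start:
            seen_start = loc == start_loc
        elif keep(loc):
            return accept(loc)
    return False

joe = ["501", "503", "502"]  # hypothetical sequence of Joe's locations

# "Next time Joe is in 502 after he is in 501": 503 is filtered out.
filter_q = next_matching(joe, "501", keep=lambda l: l == "502", accept=lambda l: True)

# "The next place Joe is in after 501 is 502": 503 is the next place, so no alert.
select_q = next_matching(joe, "501", keep=lambda l: True, accept=lambda l: l == "502")
```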

Why are ER queries hard? • Regular Queries ~ Regular Expressions • Mapping is non-trivial • similar to Cayuga [Demers et al. 06] • Queries have #P combined complexity • Can encode mDNF as regular expression • Intuition: n-sized automaton leads to • Extended regular ~ 1 NFA per person • k persons implies O(k)-size automaton • Exponential cost • When ER, can avoid blowup

Algorithm for Regular Queries Overview Deterministic Algorithm • Compile a query q • NFA-like thing in a language • Mapping events to subsets of • At runtime, at time t have events E • Create set of symbols at time t: • Process NFA on Focus on the compilation

Compile Select and Filter • Intuition: goal maps to two letters: • match (m) : matches filter • accept (a) : accepted by select Does not contain Final language and automaton are the same for both queries Does contain

The difference is the mapping Does not contain Final Does contain

Regular Queries w. Probabilities State at t+1 only depends on state at t and input at t+1 Probabilistic Algorithm • Compile a query q • NFA with transition in a language • Mapping events to subsets of • At time t have events E with probs • Create set of symbols at time t: • Process NFA on Stays the same Algorithm is constant in data, exponential in |Q| distribution on inputs distribution on states
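
The step from a distribution on inputs to a distribution on states can be sketched as follows. This is an illustration, not the compiled automaton from the talk: it assumes independent timesteps, tracks a distribution over sets of NFA states, and uses a hypothetical three-state NFA for "Joe is in 501 and, at the next step, in 502".

```python
from collections import defaultdict

def step(dist, delta, symbol_probs):
    """Push a distribution over NFA state sets through one uncertain input symbol.

    dist: {frozenset_of_states: prob}
    delta: {(state, symbol): set_of_next_states}
    symbol_probs: {symbol: prob} for the current timestep
    """
    out = defaultdict(float)
    for states, p in dist.items():
        for sym, q in symbol_probs.items():
            nxt = frozenset(s2 for s in states for s2 in delta.get((s, sym), ()))
            out[nxt] += p * q
    return dict(out)

# Hypothetical NFA: 0 = waiting, 1 = just saw 501, 2 = accept.
delta = {
    (0, "501"): {0, 1},
    (0, "502"): {0},
    (0, "other"): {0},
    (1, "502"): {2},
}

dist = {frozenset({0}): 1.0}
dist = step(dist, delta, {"501": 0.5, "other": 0.5})
dist = step(dist, delta, {"502": 0.8, "other": 0.2})

# Probability that the event fires at the second timestep.
p_accept = sum(p for states, p in dist.items() if 2 in states)
```

The dictionary never holds more than 2^|states| entries regardless of stream length, which matches the slide's remark that the cost is constant in the data and exponential in |Q|.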

Extension to Extended Regular • “Alert when anyone is in 501 and next step in 502” • If a constant is substituted for p, the result is regular • Bindings use disjoint sets of tuples • Algorithm: independent copies, multiply • Depends on # distinct values (shared vars), not # of timesteps, so it can stream
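
One plausible reading of "independent copies, multiply": run one regular query per binding of p, then combine the per-person results as independent events. The sketch assumes the per-person probabilities have already been computed; the names and numbers are hypothetical.

```python
import math

def prob_anyone(per_person):
    """P(at least one binding satisfies the query), assuming independent bindings."""
    # Multiply the probabilities that each person does NOT trigger the event.
    return 1.0 - math.prod(1.0 - p for p in per_person.values())

# Hypothetical per-person probabilities at one timestep.
p_any = prob_anyone({"joe": 0.4, "sue": 0.5})
```

The work grows with the number of distinct people, not with the number of timesteps, because each copy only keeps its current state distribution.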

Recap of Algorithms • Regular Queries • Compiled them to an NFA, then used image • Data complexity O(1) • Extended regular • Several regulars multiplied together • Depends on number of distinct people in the data, not number of time steps. • Markov Correlations: more arithmetic & state

PEEX+ Algorithms and Analysis • Compilation procedures • Safe plans. • More complicated based on algebra • cost grows with data (useful for archives) • Aggregates • Complexity: Can we do better? • For a restricted class, draw a crisp line • Minor variants of safe result in hardness

Experimental Setup • Quality Experiment • 52 objects, 352 locations, 10k sq. ft. • 2 x 30 min trace with a 10 min break in between • Participants marked down true locations • “Alert when anyone enters the Coffee Room” • Consider two scenarios • Realtime (No correlations) v. MLE • Archived (Smoothing) v. Viterbi • In practice, can smooth in a short time

Quality: Realtime • Declare an event “true”, if its Pr > threshold • Vary threshold 10% improvement in F1 Precision Recall F1
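
The thresholding step and the three metrics can be sketched as below. The event probabilities and ground truth are made-up values for illustration, not results from the experiment.

```python
def precision_recall_f1(event_probs, truth, threshold):
    """event_probs: {timestep: Pr[event]}; truth: timesteps where the event really occurred."""
    declared = {t for t, p in event_probs.items() if p > threshold}
    tp = len(declared & truth)                       # correctly declared events
    precision = tp / len(declared) if declared else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical query output and ground truth.
event_probs = {1: 0.9, 2: 0.3, 3: 0.6, 4: 0.1}
truth = {1, 2}
precision, recall, f1 = precision_recall_f1(event_probs, truth, threshold=0.5)
```

Sweeping `threshold` over a range of values traces out the precision/recall trade-off described on the slide.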

Quality: Archived • Smoothing v. Viterbi • PEEX keeps track of Markovian Correlations Approx ~30% gain in F1 Precision Recall F1

Performance

Conclusion • Showed PEEX+ • Processed output of several inference tasks • Applies more generally than just RFID • Quality (F1) gains by keeping probability • 50% from probs, 50% from correlations • Performance was usable in real-time • No indexing! • Preprint available on request

Future Work • Implementing archived stream indexing • Aggregations in time • Aggressive indexing • Ranking? Top-K? • Sharper lines for complexity • Are there more streamable queries? • Richer language • Similar to linear style plans • What do people need? • Temporal Models! • Consistency

Correlations

Sequencing by example • Sequencing is parameterized [Cayuga] Semicolon means “the next event among those that match next goal” Semicolon is not “after” Time

Compilation by example • Each goal “corresponds” to two letters: • move (m) – the query should advance • accept (a) – the next subgoal accepts Does not contain Final Any other maps to empty set Does contain

Subtle example.. • What about: Does not contain Final Any other maps to empty set Does contain
