PQLite: An Overly Simplistic Query Language for Data Provenance
Michael {Leece, Sevilla}
mleece@soe.ucsc.edu, msevilla@soe.ucsc.edu
CMPS203 Final Project
University of California, Santa Cruz
Jack Baskin School of Engineering
Overview
• Introduction
• Current Work
• Design and Implementation
• Conclusions
Terminology
• Provenance: history + ancestry of an object [1]
• Processes
• Data
• Provenance Aware Storage (PASS)
• Transparent collection
• PQL: Path Query Language
• Useful for provenance
[Figure: Ancestry Graph]
Applications
• Security
• File System Search
• The Cloud
• New Hierarchical File Systems
• Yan Li’s Photo Album
PQL Broken
• Obtained PASSv2
• Ran PQL query on provenance database
• Infinite loops
• {}
• “The problem with PQL and Sage is that the implementation… is really slow, and it’s perhaps too easy to generate PQL queries that do not return any data.” (PASS Team)
PQL Undocumented
Overview
[Architecture diagram. User Space: App1, App2. Kernel Space: PASSv2 Modules, VFS, Lasagna FS. Other components shown: .twig, Waldo, BDB, Database Dump]
Use Case
• What we have
[ P ] 1.0 INODE 4 INODE 12
[ P ] 1.0 NAME 9 "/file.txt"
[ P ] 1.0 TYPE 4 "FILE"
[ P ] 1.0 FREEZETIME 8 TIME 1329510432.493134083
[ P ] 1.0 FREEZETIME 8 TIME 1329510618.420311721
[ P ] 1.0 FREEZETIME 8 TIME 1329510676.040716382
[AP ] 1.1 INPUT 12 --> 2.1
[AP ] 1.2 INPUT 12 --> 8.1
[AP ] 1.3 INPUT 12 --> 16.2
[ PT] 2.0 ARGV 4 [1]"cat"
[ PT] 2.0 ENV 64 [2]"SHELL=/bin/bash" [3]"TERM=xterm" [4]"XDG_SESSION_COOKIE=06c3f2775eb071081dfacb984bf6c364-1329508695.722050-291519720" [5]"USER=root" [6]"LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.svgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:" [7]"MAIL=/var/mail/root" [8]"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" [9]"PWD=/test" [10]"LANG=en_US.UTF-8" [11]"SHLVL=1" [12]"HOME=/root" [13]"LOGNAME=root" [14]"LESSOPEN=| /usr/bin/lesspipe %s" [15]"LESSCLOSE=/usr/bin/lesspipe %s %s" [16]"_=/bin/cat" [17]"OLDPWD=/"
[ ] 2.0 EXECTIME 8 TIME 1329510428.104272662
[ P ] 2.0 TYPE 4 "PROC"
[ ] 2.0 PID 4 INT 13739
[ P ] 2.0 NAME 8 "/bin/cat"
[A ] 2.0 FORKPARENT 12 --> 14762.0
[ P ] 2.0 FREEZETIME 8 TIME 1329510428.104272662
• What we want
• A list of files or processes that are one-step ancestors of “/file.txt”
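One way the cross-reference records in the dump could be turned into ancestry edges is sketched below in Haskell. This is not the project's dump parser: the types `Ref` and `Edge`, all function names, and the `sampleDump` subset are hypothetical, and the sketch assumes that a `-->` record points from an object version to one of its direct ancestors.

```haskell
import Data.Maybe (mapMaybe)

-- | A versioned object written "pnode.version" in the dump, e.g. "2.1".
data Ref = Ref { pnode :: Int, version :: Int } deriving (Eq, Show)

-- | One cross-reference record: source object, attribute name, target object.
data Edge = Edge { source :: Ref, attr :: String, target :: Ref }
  deriving (Eq, Show)

-- | Parse "2.1" into Ref 2 1; anything else is rejected.
parseRef :: String -> Maybe Ref
parseRef s = case break (== '.') s of
  (p, '.' : v) | [(p', "")] <- reads p, [(v', "")] <- reads v -> Just (Ref p' v')
  _ -> Nothing

-- | Parse a record such as "[AP ] 1.1 INPUT 12 --> 2.1".
-- The bracketed flags are skipped; records without "-->" yield Nothing.
parseEdge :: String -> Maybe Edge
parseEdge line =
  case words (drop 1 (dropWhile (/= ']') line)) of
    [src, name, _size, "-->", dst] ->
      Edge <$> parseRef src <*> pure name <*> parseRef dst
    _ -> Nothing

-- | Keep only the lines that are cross-reference records.
edgesFromDump :: [String] -> [Edge]
edgesFromDump = mapMaybe parseEdge

-- | Targets of every edge whose source pnode matches (any version).
-- Assumption: the target of an edge is a direct ancestor of its source.
oneStepAncestors :: Int -> [Edge] -> [Ref]
oneStepAncestors p es = [target e | e <- es, pnode (source e) == p]

-- | A few records copied from the dump above.
sampleDump :: [String]
sampleDump =
  [ "[ P ] 1.0 NAME 9 \"/file.txt\""
  , "[AP ] 1.1 INPUT 12 --> 2.1"
  , "[AP ] 1.2 INPUT 12 --> 8.1"
  , "[AP ] 1.3 INPUT 12 --> 16.2"
  , "[A ] 2.0 FORKPARENT 12 --> 14762.0"
  ]
```

Under these assumptions, `oneStepAncestors 1 (edgesFromDump sampleDump)` returns the references 2.1, 8.1 and 16.2, which are the objects the three versions of pnode 1 ("/file.txt") took as input. Mapping those references back to names such as "/bin/cat" would need the NAME records as well, which this sketch leaves out.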
Use Case (cont.)
Query: SELECT r FROM Graph AS r WHERE r.child = "/file.txt"
Abstract Syntax Tree: Select "r" [From [Alias "Graph" "r"]] [Duo Eq (PathType "r" ["child"]) (Str "/file.txt")]
Label Map: 1 -> file.txt 2 -> jazz.jpg 3 -> bacon.txt …
Response: [(MyNode "/usr/bin/pico" 1,1,[2]), (MyNode "/usr/bin/vi" 2,3,[17,16,15]), (MyNode "/bin/cat" 1,4,[0])]
[Pipeline diagram. Components shown: Query, Query Parser, Abstract Syntax Tree, Evaluator, Response, Waldo Database Dump, Dump Parser, Label Map, Ancestry Graph]
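The AST printed on this slide suggests data types along the following lines. This is a minimal Haskell sketch reconstructed only from the printed value, not the project's actual definitions: the type names `Query`, `From`, `Alias`, `Op`, `Expr`, the `Graph` representation, and the functions `eval` and `holds` are all assumptions, and the edges in `sampleGraph` are made up for illustration.

```haskell
-- Types inferred from the AST text on the slide (assumed, not confirmed).
data Query = Select String [From] [Expr] deriving (Eq, Show)
newtype From = From [Alias] deriving (Eq, Show)
data Alias = Alias String String deriving (Eq, Show)
data Op = Eq deriving (Eq, Show)
data Expr
  = Duo Op Expr Expr          -- binary operation on two sub-expressions
  | PathType String [String]  -- variable plus field path, e.g. r.child
  | Str String                -- string literal
  deriving (Eq, Show)

-- | Simplified ancestry graph: (ancestor name, child name) pairs.
type Graph = [(String, String)]

-- | Return the ancestors of every edge that satisfies all conditions.
-- Only the single query shape from the slide is supported.
eval :: Graph -> Query -> [String]
eval g (Select var _ conds) =
  [parent | (parent, child) <- g, all (holds var child) conds]

-- | Check a condition of the form  var.child = "literal"  against one edge.
holds :: String -> String -> Expr -> Bool
holds var child (Duo Eq (PathType v ["child"]) (Str s)) = v == var && child == s
holds _ _ _ = False

-- | The query from the slide, written as an AST value.
sampleQuery :: Query
sampleQuery =
  Select "r" [From [Alias "Graph" "r"]]
    [Duo Eq (PathType "r" ["child"]) (Str "/file.txt")]

-- | Hypothetical edges; the real graph comes from the dump parser.
sampleGraph :: Graph
sampleGraph =
  [ ("/usr/bin/pico", "/file.txt")
  , ("/usr/bin/vi", "/file.txt")
  , ("/bin/cat", "/file.txt")
  , ("/bin/cat", "/other.txt")
  ]
```

With the derived `Show` instances, `show sampleQuery` prints the same text as the AST on the slide, which is the main reason for choosing these constructor shapes. `eval sampleGraph sampleQuery` returns the three program names that have "/file.txt" as a child; the numeric fields of `MyNode` in the slide's response are not modelled here.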
Language Specification Select Statement
Language Specification Expression
What We Did Well
• Functional
• It works. (PQLite > PQL)
• Easy to use
• Intuitive (SQL-like) way of querying a provenance graph
• Getting stuff we care about
Lessons Learned
• Infinite recursion in parsing
• Left recursion in a recursive descent parser
• Refined syntax
• Began coding too soon
• Monads are useful
• IO(), Maybe, State, Parsec
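The left-recursion problem can be illustrated with a toy grammar. The sketch below is not PQLite's grammar: the rule `expr ::= expr '-' num | num`, the type `E`, and all function names are hypothetical. It also swaps Parsec for `Text.ParserCombinators.ReadP` from base so that it has no package dependencies; Parsec provides a `chainl1` combinator with the same role.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- | Expression tree for the toy grammar  expr ::= expr '-' num | num
data E = Num Int | Sub E E deriving (Eq, Show)

-- | Run a parser, then skip trailing whitespace.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

num :: ReadP E
num = lexeme (Num . read <$> munch1 isDigit)

-- A direct transcription of the left-recursive rule would be
--   expr = (Sub <$> expr <*> (lexeme (char '-') *> num)) <|> num
-- which calls expr again before consuming any input, so a recursive
-- descent parser recurses forever.

-- | Same language with the left recursion replaced by iteration:
-- parse one num, then fold in any number of "- num" suffixes, left to right.
expr :: ReadP E
expr = chainl1 num (Sub <$ lexeme (char '-'))

-- | Parse a whole string; Nothing unless exactly one full parse exists.
parseExpr :: String -> Maybe E
parseExpr s = case readP_to_S (skipSpaces *> expr <* eof) s of
  [(e, "")] -> Just e
  _         -> Nothing
```

For example, `parseExpr "10 - 3 - 2"` gives `Just (Sub (Sub (Num 10) (Num 3)) (Num 2))`, so the rewritten parser keeps the left-associative tree that the left-recursive rule describes.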
References
• Margo Seltzer, Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Jonathan Ledlie. Provenance-Aware Storage Systems. Harvard University Computer Science Technical Report TR-18-05, July 2005.
• Stephanie Jones, Christina Strong, Darrell D. E. Long, and Ethan L. Miller. Tracking Emigrant Data via Transient Provenance. In Proceedings of the 3rd USENIX Workshop on the Theory and Practice of Provenance (TaPP '11), June 2011.
• Kiran-Kumar Muniswamy-Reddy, Uri Braun, David A. Holland, Peter Macko, Diana Maclean, Daniel Margo, Margo Seltzer, and Robin Smogor. Layering in Provenance Systems. In Proceedings of the 2009 USENIX Annual Technical Conference, San Diego, CA, June 2009.
• PQL Language Guide and Reference.