YOLT
Yuan Zheng, Omar Ahmed, Lukas Dudkowski, T. Mark Kuba
Overview of YOLT
• Simple scripting language
• Easy to write and maintain
• Regular expression support
• := (HTTP get) and @ (regular-expression match) operators
• “Web-scraping” uses:
  • Natural language processing
  • Generating RSS feeds
  • Reformatting HTML for other uses (XML, etc.)
Semantics
• The YOLT semantic checker is extremely simple. It serves a few main tasks:
  • Make sure that functions are declared properly, i.e. function declarations match their functions, and function calls match the declarations
  • Make sure that variables are initialized before they are used (or, in some cases, uninitialized)
  • (Redundant) Make sure that the tree is properly formed (e.g. an if-then-else node has exactly three children)
• Note: there was once basic type checking, but it has since been removed.
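A minimal sketch of those checks, written in Java like the code generator. The slides do not show YOLT's AST, so the `Node` record, its kind tags ("func", "call", "assign", "var", "if") and the error messages are all hypothetical. In keeping with the note above, it does no type checking.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical AST node for this sketch: kind is a tag such as "program", "func",
// "call", "assign", "var", "str" or "if"; text holds a name or literal (may be null).
record Node(String kind, String text, List<Node> kids) {
    static Node of(String kind, String text, Node... kids) {
        return new Node(kind, text, List.of(kids));
    }
}

final class SemanticChecker {
    private final Map<String, Integer> arity = new HashMap<>(); // function name -> parameter count
    private final Set<String> initialized = new HashSet<>();    // variables assigned so far
    private final List<String> errors = new ArrayList<>();

    List<String> check(Node program) {
        // Pass 1: record each function declaration ("func" node, children = parameters).
        for (Node n : program.kids()) {
            if (n.kind().equals("func") && arity.put(n.text(), n.kids().size()) != null) {
                errors.add("function declared twice: " + n.text());
            }
        }
        // Pass 2: walk the remaining statements in program order.
        for (Node n : program.kids()) {
            if (!n.kind().equals("func")) {
                walk(n);
            }
        }
        return errors;
    }

    private void walk(Node n) {
        switch (n.kind()) {
            case "assign" -> {
                walk(n.kids().get(1));                   // the right-hand side is read first
                initialized.add(n.kids().get(0).text()); // then the target counts as initialized
            }
            case "var" -> {
                if (!initialized.contains(n.text())) {
                    errors.add("used before initialization: " + n.text());
                }
            }
            case "call" -> {
                Integer expected = arity.get(n.text());
                if (expected == null) {
                    errors.add("undeclared function: " + n.text());
                } else if (expected != n.kids().size()) {
                    errors.add("wrong argument count for " + n.text());
                }
                n.kids().forEach(this::walk);
            }
            case "if" -> {
                if (n.kids().size() != 3) {
                    errors.add("if-then-else node must have exactly 3 children");
                }
                n.kids().forEach(this::walk);
            }
            default -> n.kids().forEach(this::walk);
        }
    }
}
```

Declarations are collected in a first pass so that a call may appear before the function it names; variable initialization, by contrast, is checked strictly in program order.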
Semantics: Lessons Learned
• It is very easy to do too much in semantic checking
• Either there are types or there are no types (NO MIDDLE GROUND)
• Scripting languages are an enormous relief to a semantic checker: they take away the biggest hassles
• The tree walker should know EXACTLY what the AST will look like and cannot make ANY assumptions; as we found, things break down when you least expect them to.
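One way to follow the last lesson is to have every tree-walker routine verify the exact shape of its node before reading any child, and fail loudly otherwise. The sketch below is hypothetical (the `Node` record, the `expect` helper and the leaf texts are illustrative, not YOLT's code) and shows the idea for an if-then-else node.

```java
import java.util.List;

// Hypothetical AST node for this sketch.
record Node(String kind, String text, List<Node> kids) {
    static Node of(String kind, String text, Node... kids) {
        return new Node(kind, text, List.of(kids));
    }
    static Node leaf(String text) {
        return new Node("leaf", text, List.of());
    }
}

final class StrictWalker {
    // Fail fast: check the exact kind and child count before reading any child.
    static Node expect(Node n, String kind, int childCount) {
        if (!n.kind().equals(kind) || n.kids().size() != childCount) {
            throw new IllegalStateException("expected " + kind + " with " + childCount
                + " children, found " + n.kind() + " with " + n.kids().size());
        }
        return n;
    }

    // Emit Perl for an if-then-else node. A two-child "if" is reported as a malformed
    // tree instead of being quietly treated as an if without an else branch.
    static String ifThenElse(Node n) {
        List<Node> k = expect(n, "if", 3).kids();
        return "if (" + k.get(0).text() + ") { " + k.get(1).text()
            + " } else { " + k.get(2).text() + " }";
    }
}
```

The exception message names both the expected and the actual shape, which points straight at the parser rule that produced the unexpected tree.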
Code Generation
• Written in Java
• Input: a correct AST
• Output: a Perl program
[Diagram: AST → code generator (Java) → Perl program]
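The AST-in, Perl-out contract can be sketched as a single function from the tree to program text. This is a minimal sketch under assumptions: the node kinds ("assign", "echo", "str", "var", "concat") are hypothetical, and only plain assignment and `echo` are covered. Because the input is assumed correct, the generator does no validation of its own.

```java
import java.util.List;

// Hypothetical AST node for this sketch.
record Node(String kind, String text, List<Node> kids) {
    static Node of(String kind, String text, Node... kids) {
        return new Node(kind, text, List.of(kids));
    }
}

final class PerlGenerator {
    // Input: an AST that already passed the semantic checker, so nothing is re-checked here.
    // Output: the text of a Perl program, one statement per line.
    static String generate(Node program) {
        StringBuilder out = new StringBuilder();
        for (Node stmt : program.kids()) {
            out.append(statement(stmt)).append('\n');
        }
        return out.toString();
    }

    private static String statement(Node n) {
        return switch (n.kind()) {
            case "assign" -> n.kids().get(0).text() + " = " + expr(n.kids().get(1)) + ";";
            case "echo" -> "print " + expr(n.kids().get(0)) + ";"; // YOLT echo becomes Perl print
            default -> throw new IllegalArgumentException("unsupported statement: " + n.kind());
        };
    }

    private static String expr(Node n) {
        return switch (n.kind()) {
            case "str" -> "\"" + n.text() + "\""; // literal text is assumed to be already escaped
            case "var" -> n.text();
            case "concat" -> expr(n.kids().get(0)) + "." + expr(n.kids().get(1)); // Perl's . operator
            default -> throw new IllegalArgumentException("unsupported expression: " + n.kind());
        };
    }
}
```

For example, an assignment of "http://www.columbia.edu/" to `$home` followed by `echo $home."\n"` generates `$home = "http://www.columbia.edu/";` and `print $home."\n";`.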
Implementation
• Walk the AST
• Depending on the information in each node, either generate code or go down to its child nodes
• Example: for the tree := ($a, http://www.columbia.edu), go down the tree at node “:=”, then generate code at nodes “$a” and “http://www.columbia.edu”
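The slide's example can be sketched as a recursive walk: leaves generate their own text, and the ":=" node goes down to both children and then combines the results into the Perl sequence seen in the generated example program (wget into a temp file, read it into an array, rm, keep the URL in a scalar). The `wget -q -O -` flags are copied from that generated Perl; naming the temp file after the variable is an assumption inferred from `$toothpaste` producing `toothpaste.txt`. The `Node` record and class names are hypothetical.

```java
import java.util.List;

// Hypothetical AST node: interior nodes carry an operator label, leaves carry their text.
record Node(String label, List<Node> kids) {
    static Node of(String label, Node... kids) {
        return new Node(label, List.of(kids));
    }
    boolean isLeaf() {
        return kids.isEmpty();
    }
}

final class HttpGetGen {
    // Leaves generate code directly: a variable name or a URL.
    static String leaf(Node n) {
        return n.label();
    }

    // Interior ":=" node: go down to both children, then combine their code.
    static String walk(Node n) {
        if (n.isLeaf()) {
            return leaf(n);
        }
        if (!n.label().equals(":=")) {
            throw new IllegalArgumentException("unsupported node: " + n.label());
        }
        String var = walk(n.kids().get(0));  // e.g. "$a"
        String url = walk(n.kids().get(1));  // e.g. "http://www.columbia.edu"
        String name = var.substring(1);       // drop the leading '$'
        String tmp = name + ".txt";           // assumed naming: temp file named after the variable
        return String.join("\n",
            "system('wget -q -O - " + url + " > " + tmp + "');", // download the page
            "open INFILE, \"" + tmp + "\";",
            "@" + name + "=<INFILE>;",                            // every line into a Perl array
            "close INFILE;",
            "system ('rm " + tmp + "');",                         // remove the temp file
            "$" + name + " = \"" + url + "\";");                  // keep the address in a scalar
    }
}
```

Like the generated Perl on the slides, this pastes the URL into a shell command unquoted, so it is only safe for trusted URLs.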
Implementation (tricks)
• The httpget operator :=
  • Invokes the UNIX command “wget” (through Perl's system()) to download the web page into a temp file
  • Reads the file line by line and stores the lines in a Perl array
  • Invokes another UNIX command, “rm”, to remove the temp file
  • Keeps the web address in a Perl scalar
• Scalars and arrays use the same syntax
  • The compiler (code generator) “guesses” whether a variable is a scalar or an array
  • Arrays can only appear in certain places (e.g. foreach)
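The slides do not say how the guess is made. The rule below is a hypothetical heuristic inferred from the example program: `$elements` (the result of @) becomes `@elements` in the foreach, and `$toothpaste` (the result of :=) appears both as `@toothpaste` and `$toothpaste`. Variables assigned from := or @ are marked list-capable; array contexts (a foreach source, the right operand of @) get the `@` sigil, everything else gets `$`. The class and method names are illustrative.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sigil heuristic; YOLT writes every variable as $name.
final class SigilGuesser {
    enum Context { SCALAR, ARRAY }

    private final Set<String> arrayVars = new HashSet<>(); // names (without '$') that hold lists

    // Record the top-level operator on the right-hand side of each assignment.
    void assigned(String yoltVar, String operator) {
        if (operator.equals(":=") || operator.equals("@")) {
            arrayVars.add(yoltVar.substring(1));
        }
    }

    // Pick the Perl sigil from where the variable appears.
    String render(String yoltVar, Context ctx) {
        String name = yoltVar.substring(1);
        if (ctx == Context.ARRAY) {
            if (!arrayVars.contains(name)) {
                throw new IllegalStateException(yoltVar + " never holds a list");
            }
            return "@" + name;
        }
        return "$" + name;
    }

    // A foreach source is always an array context.
    String foreachHeader(String loopVar, String listVar) {
        return "foreach " + loopVar + " ( " + render(listVar, Context.ARRAY) + " ) {";
    }
}
```

With `$elements` recorded as assigned from @, `foreachHeader("$x", "$elements")` yields `foreach $x ( @elements ) {`, matching the generated Perl in the example.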
Documentation and Testing
• Test cases for the lexer/parser and the semantic checker, split into good and bad
• Log results: good should be good, bad should be bad
• Diff the output against a reference file: what I think it should produce
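The good/bad and reference-file checks can be sketched as two small helpers. This is a hypothetical harness, not the project's test scripts: it works on in-memory line lists, and a real run would load the reference file with something like `Files.readAllLines(Path.of(...))`.

```java
import java.util.List;
import java.util.Optional;

final class GoldenTest {
    // "Good should be good, bad should be bad": a case passes when acceptance matches its label.
    static boolean verdictOk(boolean expectedGood, boolean accepted) {
        return expectedGood == accepted;
    }

    // Compare produced output with the reference file line by line, like a minimal diff:
    // report the first line that differs, or empty if the outputs match exactly.
    static Optional<String> firstDifference(List<String> actual, List<String> reference) {
        int n = Math.max(actual.size(), reference.size());
        for (int i = 0; i < n; i++) {
            String a = i < actual.size() ? actual.get(i) : "<missing>";
            String r = i < reference.size() ? reference.get(i) : "<missing>";
            if (!a.equals(r)) {
                return Optional.of("line " + (i + 1) + ": expected [" + r + "] but got [" + a + "]");
            }
        }
        return Optional.empty();
    }
}
```

Reporting only the first mismatch keeps the log short; a full diff tool remains the better choice when many lines change at once.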
Integration Testing
• Trying little YOLT programs to see functionality, code generation, etc.
• Working out bugs in implementation & design

Example
Goal: display any comics that have the word hamster in the URL of www.toothpastefordinner.com, Summer 2002 archive.

YOLT program:
```
begin
$toothpaste_home="http://www.toothpastefordinner.com/";
$toothpaste:="http://www.toothpastefordinner.com/archives-sum02.php";
$tags="<a href=\"(.*)\">.*hamster.*</a>";
$elements = $tags @ $toothpaste;
foreach $x in $elements {
    echo "<img src=\"".$toothpaste_home.$x."\""."><br>";
    echo "\n";
}
end
```

Generated Perl:
```perl
$toothpaste_home ="http://www.toothpastefordinner.com/";
system('wget -q -O - http://www.toothpastefordinner.com/archives-sum02.php > toothpaste.txt');
open INFILE, "toothpaste.txt";
@toothpaste=<INFILE>;
close INFILE;
system ('rm toothpaste.txt');
$toothpaste = "http://www.toothpastefordinner.com/archives-sum02.php";
$tags ="<a href=\"(.*)\">.*hamster.*</a>";
@tmp1=();
foreach ( @toothpaste) {
    if ($_=~m/($tags)/i){ push @tmp1, $2}
}
@elements = @tmp1;
foreach $x ( @elements ) {
    print "<img src=\"".$toothpaste_home.$x."\""."><br>";
    print "\n";
}
```

Resultant HTML:
```html
<img src="http://www.toothpastefordinner.com/072802/hamster-table-tennis.gif"><br>
<img src="http://www.toothpastefordinner.com/072502/even-hamsters.gif"><br>
<img src="http://www.toothpastefordinner.com/060602/hamsters-are-the-best.gif"><br>
```
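The `$tags @ $toothpaste` expression compiles to the foreach/push loop in the generated Perl. As a cross-check of what that loop computes, here is a minimal Java re-expression of its semantics; it is not part of the YOLT compiler, and the page lines in the usage test are made up. Each line contributes at most one result (the Perl has no /g flag), and the match is case-insensitive (the /i flag). The generated Perl wraps the pattern in an extra pair of parentheses, which is why it pushes `$2`; here the same value is the pattern's own group 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AtOperator {
    // pattern @ page: for each line, take the first match and keep capture group 1.
    static List<String> apply(String regex, List<String> pageLines) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE); // Perl's /i
        List<String> out = new ArrayList<>();
        for (String line : pageLines) {
            Matcher m = p.matcher(line);
            if (m.find()) {            // one match per line, like m// without /g
                out.add(m.group(1));
            }
        }
        return out;
    }
}
```

Note that `.*hamster.*` sits after the closing `">` in this pattern, so the word must appear after the end of the href attribute (in the link text) for a line to match.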
The Result
[Screenshots: the source site; the end result]
Lessons Learned
• Develop and test incrementally
• There are ALWAYS bugs; you just haven’t found them yet
• CLIC is not designed to be lived in