
Perl 6 Update - PGE and Pugs

Perl 6 Update - PGE and Pugs. Dr. Patrick R. Michaud, April 26, 2005. Rules and Grammars: Perl 6 completely redesigns the regular expression syntax; regular expressions are now "rules"; rules can call or embed other rules; groups of rules can be combined into grammars. Current events in Perl 6.



  1. Perl 6 Update - PGE and Pugs Dr. Patrick R. Michaud April 26, 2005

  2. Rules and Grammars
     • Perl 6 completely redesigns the regular expression syntax
     • Regular expressions are now "rules"
     • Rules can call/embed other rules
     • Groups of rules can be combined into Grammars

  3. Current events in Perl 6
     • Parrot 0.1.2 released
     • The Perl Foundation receives $25,000 for completion of Parrot milestones
     • New Parrot pumpking - Chip Salzenberg
     • New version of Parrot Grammar Engine (PGE / Perl 6 rules) to be released this week
     • Pugs - Autrijus Tang
     • Perl 6 test suite

  4. Pugs
     • Perl 6 compiler written in Haskell
     • Started by Autrijus Tang
     • Compiles directly to Haskell or to Parrot AST
     • Being used to develop Perl 6 tests and experiment with Perl 6 design
     • Available at http://pugscode.org
     • Discussion on perl6-compiler@perl.org mailing list

  5. Perl 6 rules / Parrot Grammar Engine
     • The heart of the Perl 6 compiler is the Perl/Parrot Grammar Engine (PGE)
     • Implements the Perl 6 rules syntax, compiles to Parrot code
     • Perl 6 rules compiler currently written in C
     • Bootstrap to Perl 6

  6. Steps to Perl 6 compiler
     • Finish PGE bootstrap in C
     • Parse p6 "rule" statements and grammars
     • Use p6 rules to define the Perl 6 grammar
     • P6 grammar can be used to generate Parrot abstract syntax trees from Perl 6 programs
     • Compile, (optimize), execute the abstract syntax tree to get working Perl 6 program
     • Use Perl 6 to rewrite the grammar engine in Perl 6 (faster)

  7. Current state of PGE
     • Handles concatenation, alternation, quantifiers, captures*, subpatterns, subrules
     • * Capture semantics redefined in Dec 2004, still not final
     • To be added next
       • Character classes (note: Unicode)
       • Patterns containing scalars, arrays, hashes

  8. P6 rule syntax
     • Changes from Perl 5
       • No more trailing /e, /x, /s options
       • [...] denotes non-capturing groups
       • ^ and $ are beginning/end of string
       • ^^ and $$ are beginning/end of line
       • . matches any character, including newline
       • \n and \N match newline/non-newline
       • # marks a comment (to end of line)
       • Quantifiers are *, +, ?, and **{m..n}
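As a rough illustration of the anchor and dot changes listed above, the following Python sketch shows the closest equivalents in Python's `re` module. It is only an analogy under that assumption, not PGE or Perl 6 code; the sample text is made up.

```python
import re

text = "first line\nsecond line"

# Perl 6 `^` / `$` anchor to the whole string, like Python's \A and \Z.
string_start = re.findall(r"\A\w+", text)

# Perl 6 `^^` / `$$` anchor to each line, like ^ / $ under re.MULTILINE.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# Perl 6 `.` matches any character including newline, like `.` under re.DOTALL.
dot_any = re.match(r"first.*second", text, flags=re.DOTALL)

# Perl 6 `\N` (non-newline) behaves like Python's default `.`.
dot_no_newline = re.match(r"first.*second", text)

print(string_start, line_starts, dot_any is not None, dot_no_newline is not None)
```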

  9. Character classes
     • [aeiou] changed to <[aeiou]>
     • [^0-9] now <-[0..9]>
     • Properties defined as
       • <alpha>
       • <digit>
       • <alnum>
     • Combine classes using +/- syntax:
       <+<alpha>-[aeiou]>
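The class arithmetic above can be approximated in Python, which has no direct set-difference syntax in `re`. This sketch assumes ASCII letters stand in for `<alpha>` (Perl 6 classes are Unicode-aware) and uses a negative lookahead to do the subtraction.

```python
import re

# Rough stand-in for <+<alpha>-[aeiou]>: a letter that is not a lowercase vowel.
# The negative lookahead removes [aeiou] from the letter class.
CONSONANT = re.compile(r"(?![aeiou])[A-Za-z]")

# Rough stand-in for <-[0..9]>: any character that is not a digit.
NON_DIGIT = re.compile(r"[^0-9]")

consonants = CONSONANT.findall("perl six")
non_digits = NON_DIGIT.findall("a1b22c")
print(consonants, non_digits)
```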

  10. Subrules
     • Patterns are now called "rules"
     • Analogous to subroutines and closures
     • Like {...}, /.../ compiles into a "rule" subroutine
     • P6 rule statement allows named rules:
       rule ident / [<alpha>|_] \w* /;
     • Named rules can be easily used in other rules:
       m / <ident> \:= (.*) /;
       rule expr / <term> [ <[+-]> <term> ]* /;
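The idea of one named rule embedding another can be sketched in Python by composing pattern fragments. The `RULES` table and `matches` helper are hypothetical names for this illustration, and the definition of `term` is an assumption, since the slide does not define it.

```python
import re

# Hypothetical rule table: each named "rule" is a pattern fragment.
RULES = {
    "ident": r"(?:[A-Za-z]|_)\w*",   # rule ident / [<alpha>|_] \w* /
    "term": r"\d+",                  # assumed definition of <term>
}
# rule expr / <term> [ <[+-]> <term> ]* /, built by embedding the term rule.
RULES["expr"] = r"(?:{term})(?:[+-](?:{term}))*".format(**RULES)

def matches(name, text):
    """Return True when the named rule matches the whole of text."""
    return re.fullmatch(RULES[name], text) is not None

print(matches("ident", "_foo1"), matches("expr", "1+22-3"), matches("expr", "1+"))
```

Plain string composition like this cannot express recursive rules, which is one thing a real grammar engine adds.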

  11. Interpolation
     • Variables no longer interpolate directly, thus / $var / matches the contents of $var literally, even if it contains rule metacharacters. (No \Q and \E)
     • To treat $var as a rule, use / <$var> /
     • Interpolated arrays match as an alternation; the two rules below are equivalent:
       / @cmds /
       / [ @cmds[0] | @cmds[1] | @cmds[2] | ... ] /
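A Python analogy for the three interpolation cases above, assuming `re.escape` as the counterpart of literal interpolation. The variable contents are invented for the example.

```python
import re

var = "a.c"                      # contains a metacharacter
cmds = ["get", "set", "list"]

# / $var /   : the contents are matched literally.
literal = re.compile(re.escape(var))

# / <$var> / : the contents are treated as a pattern.
as_rule = re.compile(var)

# / @cmds /  : the elements match as an alternation of literals.
any_cmd = re.compile("|".join(re.escape(c) for c in cmds))

print(literal.search("abc") is None)          # "." is not a wildcard here
print(as_rule.search("abc") is not None)      # "." acts as a wildcard here
print(any_cmd.fullmatch("set") is not None)
```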

  12. Interpolation, cont'd
     • Hashes match the keys of the hash, and the value of the hash is either
       • Executed if it is a closure
       • Treated as a subrule if it's a string or rule object
       • Succeeds if value is 1
       • Fails for any other value
     • Useful for parsed languages
       rule expr / <term> [ %infixop <expr> ]? /

  13. < metasyntax >
     • The < ... > introduce various forms of metasyntax
     • A leading alphabetic character indicates a subrule or grammatical assertion
       <alpha>
       <expr>
       <before pattern>
       <after pattern>
     • A leading ! negates the match
       <!before pattern>

  14. < metasyntax >
     • Leading ' matches a literal string
       <'match this exactly (whitespace matters)'>
     • Leading " matches an interpolated string
       <"match $THIS exactly (whitespace matters)">
     • Leading '+' or '-' are character classes
       /<-[a..z]> <-<alpha>>/

  15. < metacharacters >
     • Leading '(' indicates code assertion
       /(\d**{1..3}) <( $1 < 256 )>/
       # (fail if $1 is not less than 256)
     • A $, @, or % indicates a variable subrule, where each value (or key) is a subrule to be matched
       <$myrule>
       <@cmds>
       <%commands>
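Python's `re` has no code assertions, so the `<( $1 < 256 )>` example above has to be emulated by hand. This sketch assumes the engine backtracks into the quantifier when the assertion fails, so shorter digit runs are retried; `match_octet` is a hypothetical helper name.

```python
import re

def match_octet(text):
    """Match one to three leading digits whose numeric value is below 256."""
    m = re.match(r"\d{1,3}", text)
    if m is None:
        return None
    digits = m.group(0)
    # Try the longest candidate first, then shorter ones, as backtracking would.
    for end in range(len(digits), 0, -1):
        if int(digits[:end]) < 256:      # the code assertion
            return digits[:end]
    return None

print(match_octet("255.0"), match_octet("300"), match_octet("abc"))
```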

  16. A cool and somewhat scary example
       %cmd{'^\d+'} = { say "You entered a number" };
       %cmd{'^hello'} = { say "world" };
       %cmd{'^print \s (.*)'} = { say $1; };
       %cmd{'^exit'} = { exit() };
       while =$*IN {
           /<%cmd>/ || say "Unrecognized command";
       }
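The dispatch-table idea in this example can be sketched in Python with a dict that maps patterns to callables. This is only an analogy: it tries keys in insertion order rather than following Perl 6 hash-matching rules, it collects output in a list instead of printing, and it leaves out the exit command. `COMMANDS`, `dispatch`, and `output` are hypothetical names.

```python
import re

output = []

# Hypothetical command table: pattern -> action taking the match object.
COMMANDS = {
    r"^\d+": lambda m: output.append("You entered a number"),
    r"^hello": lambda m: output.append("world"),
    r"^print\s(.*)": lambda m: output.append(m.group(1)),
}

def dispatch(line):
    """Run the action for the first key pattern that matches the line."""
    for pattern, action in COMMANDS.items():
        m = re.search(pattern, line)
        if m:
            action(m)
            return True
    output.append("Unrecognized command")
    return False

for line in ["42", "hello", "print hi there", "quit"]:
    dispatch(line)

print(output)
```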

  17. Backtracking control
     • A single colon prevents backtracking into the previous atom
       m/ \( <expr> [ , <expr> ]* : \) /
       (if we don't find closing paren, no point in trying to match fewer <expr>s)
     • Two colons break an alternation:
       m:w/ [ if :: <expr> <block>
            | for :: <list> <block>
            | loop :: <loop_controls>? <block>
            ] /
       (once we've found "if", "for", or "loop", no point in trying the other branches of the alternation)

  18. Backtracking control
     • Three colons (:::) fail the current rule if backtracked over
     • The <commit> assertion, if backtracked over, fails the entire match (including any rules that called the current rule)
     • The <cut> assertion matches successfully, removes the matched portion of the string up to the <cut>, and if backtracked over fails the match entirely
       • Useful for throwing away successfully processed input when matching from an input stream
       • Like, say, when writing a compiler :-)

  19. Backslash
     • \L, \U, \Q, \E, \A, \z gone from rules
     • \n and \N match newline/not newline
     • \s matches any Unicode space
     • Backreferences are gone, use $1, $2, $3 (non-interpolated)
     • Perl 6 allows defining custom backslash sequences for use in rules

  20. Closures
     • Anything in curlies is executed as a Perl 6 closure
       / (\w+) { say "Got $1"; } /

  21. Capture semantics
     • Captures are different in Perl 6
     • The result of a match is a "match object"
     • If a match succeeds, the match object has:
       • Boolean value true
       • Numeric value 1 (except for global matches)
       • String value the matched substring
       • Array component is matched subpatterns
       • Hash component is matched subrules
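To make the several "faces" of a match object concrete, here is a small Python wrapper around `re` match results. `MatchObject` is a hypothetical class written for this sketch, and the values it returns for a failed match (false, 0, empty string) are assumptions the slide does not state.

```python
import re

class MatchObject:
    """Hypothetical wrapper exposing boolean, numeric, string, array and hash views."""

    def __init__(self, m):
        self._m = m

    def __bool__(self):            # boolean value: did the match succeed?
        return self._m is not None

    def __int__(self):             # numeric value: 1 on success (0 assumed on failure)
        return 1 if self._m else 0

    def __str__(self):             # string value: the matched substring
        return self._m.group(0) if self._m else ""

    def __getitem__(self, key):    # integer key: subpattern; string key: named capture
        return self._m.group(key)

result = MatchObject(re.search(r"(\w+)@(?P<host>\w+)", "mail bob@example now"))
failed = MatchObject(re.search(r"\d", "abc"))
print(bool(result), int(result), str(result), result[1], result["host"], bool(failed))
```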

  22. Subpattern captures
     • Part of a rule in parentheses is a subpattern
     • Each subpattern produces its own match object
       /Scooby (dooby) (doo)!/
               $1      $2
     • Quantified subpatterns produce arrays of match objects:
       /Scooby (\w+ \s+)* (doo)!/
               $1         $2
       $1 is a (possibly empty) array of matches
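For contrast, Python's `re` keeps only the last repetition of a quantified group, so getting an array of every repetition takes a second pass. The sketch below shows both behaviours; whitespace is written explicitly because Python patterns do not ignore it, and the sample sentence is invented.

```python
import re

text = "Scooby dooby dooby doo!"

# A quantified group in Python remembers only its final repetition.
last_only = re.match(r"Scooby\s+(\w+\s+)*(doo)!", text).group(1)

# Capture the whole repeated span, then split it to get one entry per
# repetition, which is closer to the array described above.
m = re.match(r"Scooby\s+((?:\w+\s+)*)(doo)!", text)
repetitions = re.findall(r"\w+\s+", m.group(1))

print(repr(last_only), repetitions, m.group(2))
```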

  23. Non-capturing groups
     • Brackets do not capture, thus they don't result in a match object
       /Scooby [ (\w+ \s+)* (doo) ]!/
                 $1         $2
     • Quantified brackets replace nested subpatterns with the last component matched:
       /Scooby [ (\w+ \s+)* (doo) ]+ !/
                 $1         $2

  24. Nested capturing subpatterns
     • Each capturing subpattern introduces a new lexical scope, with nested captures inside the new match object:
       /Scooby ( (\w+ \s+)* (doo) ) !/
                 $1[0]      $1[1]
               <------- $1 ------->

  25. Alternations
     • Alternations introduce a new lexical scope, thus subpatterns restart counting at zero for each alternative branch (unlike p5):
                 $1       $2
       m/ Scooby (dooby)* (doo)!
        | Yabba  (dabba)* (doo) /
                 $1       $2
     • This avoids lots of empty subpatterns when an alternation doesn't match.

  26. Subrules
     • Subrules capture into a hash keyed by the name of the subrule:
       rule ident / [<alpha>|_] \w* /;
       rule num / \d+ /;
       m/ <ident> \:= <num> /;
       places match objects into $<ident> and $<num>
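Python's named groups give a similar "hash keyed by name" view of a match. In this sketch the `IDENT` and `NUM` fragments are assumed translations of the two rules above, and `groupdict()` plays the role of the hash component.

```python
import re

IDENT = r"(?:[A-Za-z]|_)\w*"   # rule ident / [<alpha>|_] \w* /
NUM = r"\d+"                   # rule num / \d+ /

# m/ <ident> \:= <num> /, with each subrule captured under its own name.
m = re.match(rf"(?P<ident>{IDENT}):=(?P<num>{NUM})", "count:=42")
captures = m.groupdict()
print(captures)
```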

  27. Quantified subrules
     • Like subpatterns, quantified subrules produce arrays of matches
       m:w / dir <file>* /
       produces matches in $<file>[0], $<file>[1], etc.
     • Nested parens in a subrule capture to the subrule's match object

  28. Named captures
     • Portions of a match can be captured directly into a match object without a subrule:
       m:w/ $<name> := \w+ , $<val> := \d+ /
       captures the first sequence of alphanumerics into $<name>, and digits following the comma into $<val>.

  29. Grammars
     • Rules can be packaged together into separate name spaces to form Grammars
       grammar Perl6 {
           rule ident { ... };
           rule term { ... };
           rule expr { ... };
       }
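The namespace idea can be sketched in Python with a class whose attributes are rules. Everything here is hypothetical: the `Grammar` base class, the `{name}` reference syntax, and the `Arithmetic` rule bodies (the slide elides the real ones). The sketch expands references textually, so it cannot handle recursive rules and would misread a regex quantifier such as `{3}` as a rule reference.

```python
import re

class Grammar:
    """Hypothetical base: rules are class attributes; {name} refers to another rule."""

    @classmethod
    def pattern(cls, name):
        raw = getattr(cls, name)
        # Expand {rule} references recursively (no cycle detection in this sketch).
        return re.sub(r"\{(\w+)\}",
                      lambda m: "(?:" + cls.pattern(m.group(1)) + ")",
                      raw)

    @classmethod
    def parse(cls, name, text):
        """Match the named rule against the whole of text."""
        return re.fullmatch(cls.pattern(name), text)

class Arithmetic(Grammar):
    # Illustrative rule bodies only.
    ident = r"[A-Za-z_]\w*"
    term = r"{ident}|\d+"
    expr = r"{term}(?:[+-]{term})*"

print(Arithmetic.parse("expr", "x+12-y_1") is not None)
print(Arithmetic.parse("expr", "x+") is not None)
```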

  30. :parsetree
     • The :parsetree flag to a rule causes the grammar engine to keep all information about a match.
     • Thus, one can do something like
       $parse = ($source ~~ Perl6::program);
       to get the entire parsetree for a program (including comments)

  31. Questions?
