A discussion on Perl 6 Rules and Grammars including challenges, limitations, and implementations, with a working example.
    sub ScanDirectory {
        my $workDir = shift;
        my @files;
        # Lexical handle: a bareword DIRHND would be clobbered by the recursive call.
        opendir( my $dirHnd, $workDir ) or die "Unable to open $workDir: $!\n";
        while ( defined( my $file = readdir( $dirHnd ) ) ) {
            next if $file eq "." || $file eq "..";
            # Join with "/" so the test works whether or not $workDir ends in a slash.
            if ( -d "$workDir/$file" ) {
                push( @files, ScanDirectory( "$workDir/$file" ) );
                next;
            }
            push( @files, "$file\n" );
        }
        closedir( $dirHnd );
        return @files;
    }

    my $dir = shift;
    my @dirContent = ScanDirectory( $dir );
    print @dirContent;

kw.pm Perl 6 Rules and Grammars
Introduction

Talk in two parts:
    A quick look and evaluation of what exists to use now.
    A quick look at what is proposed, and what is good about it.
Then beer.

perl 6 rules and grammars

State of m:i/Onion/

Here is what exists currently in terms of implementation:
    Perl6::Rules: a CPAN module that implements a subset of the spec.
    The Parrot/Perl6 implementation has nothing so far on pattern matching.
    Pugs is in the same state.

Perl6::Rules

Perl6::Rules was created and published to CPAN in April 2004 by Damian Conway. It was originally part of the Perl6 bootstrapping plan: They would implement Ponie and Parrot, and then use Perl6::Rules to compile the initial Perl6. Then Perl6 could be self-hosting.

Perl6::Rules

But Perl6::Rules simply isn’t very good. At least in my opinion. It is rich on Damian-esque Source-Filter magic, but a little low on… the set of things not Source-Filter magic.

First of all, getting it going on Cygwin took me some time:

Perl6::Rules

I ended up with stuff like this:

    “Failed 12/41 test scripts, 70.73% okay. -124/2401 subtests failed, 105.16% okay.”
    “test program seems to have generated a core”
    “Fatal error in one or more Perl 6 rules panic: regfree data code 'ÿ' during global destruction.”

Perl6::Rules

I know Cygwin is a less popular, less robust place to do business, but 6 core dumps seemed a little high. However, CPAN Testers shows that it also fails on Solaris, BSD and Linux.

I also disliked that it depended on YAML, but not in a way that CPAN.pm understood.

Lastly, it was slow. Deadly slow.

Perl6::Rules

The other-other issue was that it doesn’t completely implement Rules, nor is it consistent with the current spec.

All that said, I was able to play around with the module, and get a bit of a feel for Rules.

My suspicion is that Perl6 will bootstrap with Pugs.

Working Example

    __DATA__
    # id: centre points (relative to 1x1 unit square)
    # centre indicates squares that are rotationally
    # symmetrical
    # -------------------------------------------------
    1: (centre) 0, 0; 1, 0; 1, 1; 0, 1
    2: 0, 1; 1, 1; 1, 0
    3: 0, 0; 0.5, 1; 1, 0
    4: (centre) 0, 0.5; 0.5, 1; 1, 0.5; 0.5, 0
    5: 0.5, 0; 0.5, 1; 1, 1; 1, 0
    6: 0, 0; 0, 0.5; 1, 1; 0.5, 0
    7: 0.25, 0.5; 0.5, 0; 0, 0; 0.5, 0.99; \
       1, 0; 0.5, 0; 0.75, 0.5; 0.25, 0.5
    # …etc…

Working Example

    grammar NineBlock {
        # meta bits
        rule continuation { \\ \n <ws>* };
        rule blankline    { ^^ <ws>* $$ };
        rule comment      { ^^ \# .+? $$ };

        # top level
        rule config  { <line>+ };
        rule line :w { ^^ <idnum> <centre>? <points> $$ };

        # line components
        rule idnum  :w { ( \d+ ) \: { $1 && print "$1\n" } };
        rule centre    { \( [ centre || center ] \) };
        rule points :w { <pair> [ ; <pair> ]* };
        rule pair   :w { <point> , <point> };
        rule point     { \d+ [ \. \d+ ]? };
    }

Working Example

    # slurp
    my $config = do{ local $/; <DATA> };

    # heal continuations, ditch blanks and comments
    $config =~ s{ <NineBlock.continuation>
               || <NineBlock.blankline>
               || <NineBlock.comment> }{}g;

    # validate
    print "valid\n" if $config =~ m/<NineBlock.config>/;

Working Example

And that was about as far as I got with my working example. I spent an awfully long time trying to get it to capture things automatically, but without much luck.

The captured results are returned into a tied global called $0, which was quirky and unmanageable.

Interpolation and Closures

    # Perl 5
    @cmd = qw{ get put try find copy };
    $cmd = join '|', map { quotemeta $_ } @cmd;
    $str =~ / (?:$cmd) \( .*? \) /x;

    # Perl 6
    @cmd = qw{ get put try find copy };
    $str ~~ m:w/ @cmd \( .*? \) /;

    # Or
    my @cmds;
    $str ~~ m:w:each/^^ ( @cmd ) ::: ( \( \N*? \) )$$
        { push @cmds, { :cmd($1) :arg($2) } } /;

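The Perl 5 half of the slide above can be run once it has some input. This is a minimal sketch: the sample $str and the @calls array are made up for illustration and are not from the talk. It also collects command/argument pairs, which is roughly what the :each closure version is after.

```perl
use strict;
use warnings;

# Assumed sample input; the slide does not show one.
my $str = "get(a) put(b, c) frob(d) copy(e)";

my @cmd = qw{ get put try find copy };
# Escape each command and join them into one alternation, as on the slide.
my $cmd = join '|', map { quotemeta $_ } @cmd;

# Collect every "command(args)" occurrence as a small hash.
# @calls is a hypothetical name for the result list.
my @calls;
while ( $str =~ / \b ($cmd) \( (.*?) \) /xg ) {
    push @calls, { cmd => $1, arg => $2 };
}

print "$_->{cmd}: $_->{arg}\n" for @calls;
```

With this input the loop should pick up get, put and copy, and skip frob because it is not in @cmd.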
Qualifiers

    # Other qualifiers
    my $foo = "fee fi fo feh far foo fum ";
    $foo ~~ s:each:2nd/ ( f \w+ ) /bar/;
    # $foo now "fee bar fo bar far bar fum ";

    my $str = "ahhh";
    @matches = $str ~~ m/ ah* /;      # returns "ahhh"
    @matches = $str ~~ m:any/ ah* /;  # returns "ahhh", "ahh", "ah", "a"

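For comparison, here is one way to get the effect of s:each:2nd in Perl 5. It is a sketch, not part of the talk: the counter $n is an illustrative helper, and the /e flag evaluates the replacement as code so that only every second match is swapped.

```perl
use strict;
use warnings;

my $foo = "fee fi fo feh far foo fum ";

# Count matches as we go; odd-numbered matches are put back unchanged,
# even-numbered ones become "bar".
my $n = 0;
$foo =~ s/(f\w+)/ ++$n % 2 ? $1 : "bar" /ge;

print "$foo\n";
```

This should leave $foo as "fee bar fo bar far bar fum ", the same string the slide gives for the Perl 6 version.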
Flow control

The backtracking controls are cleaner and way more powerful. But, they force you to think in a different way.

    :         # don’t retry previous atom
    ::        # fail out of current [] alternation
    :::       # fail out of current rule
    <commit>  # fail out of entire match

    $uri ~~ m{ [http\:|ftp\:|mailto\:] <commit> // <path> };

Flow control

Specifically, they cause you to think about your regex as a little program.

    # Better examples:
    m/ rm @options* ::: <path>+ $$/;

    m:w/ [ if   :: <expr> <block>
         | for  :: <list> <block>
         | loop :: <loop_controls>? <block>
         ]
       /

    rule subname {
        ( [ <alpha> | _ ] \w* ) <commit>
        { fail if %reserved{$1} }
    }

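Perl 5 has no ::, ::: or <commit>, but its atomic group (?>...) and the possessive quantifiers added in 5.10 behave roughly like the single colon ("don’t retry previous atom"). The sketch below is an illustration of that one idea, not of the full set of controls; the variable names are made up.

```perl
use strict;
use warnings;

my $text = "aaab";

# Ordinary greedy match: a+ first grabs "aaa", then gives one "a" back
# so that the literal "ab" can match.
my $plain = $text =~ /^a+ab/ ? 1 : 0;

# Atomic group: once a+ has matched "aaa" it may not give anything back,
# so the following "ab" can never match and the whole pattern fails.
my $atomic = $text =~ /^(?>a+)ab/ ? 1 : 0;

# Possessive quantifier (Perl 5.10+): same effect, shorter spelling.
my $possessive = $text =~ /^a++ab/ ? 1 : 0;

print "plain=$plain atomic=$atomic possessive=$possessive\n";
```

The plain pattern should match and the other two should fail, which is the "little program" way of thinking: you decide where the matcher is allowed to change its mind.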
Rules/Assertions

Predefined Rules/Assertions:

    <ws> <alpha> <ident>
    <lt>, <gt>, <dot>

    # lookahead/behind
    <before pattern>     (was (?=pattern))
    <!before pattern>

    <cut>, <null>

More Assertions

    m/ (<ident>) <{ %cache{$1} //= get_body($1) }> /

    <{ }>                 # a closure that evaluates to a rule
    { }                   # closure that runs at canonical time
    <( )>                 # assertion that can be optimized away
                          # ie. m/^ (.*) <( print $1 )> bar $/;
    <::($somename)>       # symbolic indirect
    <&foo()>              # short for <{ foo() }>
    <[a-z_]>              # same as p5 [a-z_]
    <+[a-z_]>             # same
    <-[a-z_]>             # negated [^a-z_]
    <+<alpha>-[A-Za-z]>   # set ops

Yet More Assertions

    # aggregate rules
    <%hash> | <@array>
    # much like bare %hash && @array, but only
    # evaluate to rules (string or ref).

    my @dog = (
        rx:i{ Butch },
        rx:i{ Duke },
        rx:i{ Teacup },
    );
    m:w/Let's take <@dog> for a walk./;

    # Literal assertions:
    <'Spaces matter ,literally'>
    <"Literally\tafter $interpolation.">

Rules are Subs/Methods

    rule badline( Str $errmsg ) {
        ( \N* ) ::: { fail "$errmsg: $1" }
    }

    # other rules

    given( $lines ) {
        m { ^ [   # line group
                ^^[ @cmd
                  | <badline( "invalid line" )>
                  ]::
                <etc>   # command followed by etc.
                $$
              ]*
            $
          }
    }

Rules are Subs/Methods

    rule block {
        \( [   # stuff that isn’t a block, or a block
               <-[()]>* :
             | <block>
           ]*
        \)
    }

    rule xblock($left, $right) {
        $left [ <-[$left $right]>* :
              | <xblock $left $right>
              ]*
        $right
    }

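The recursive block rule above has a rough counterpart in Perl 5.10 and later, where a named group can call itself with (?&name). This is a sketch under that assumption, not something from the talk; %result and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Named, recursive subpattern: a block is "(" followed by any mix of
# non-parenthesis runs and nested blocks, followed by ")".
my $block = qr{
    (?<block>
        \(
        (?: [^()]++ | (?&block) )*
        \)
    )
}x;

# Hypothetical sample inputs, mapped to the verdict for each.
my %result;
for my $t ( "(a (b c) (d (e)))", "(a (b c)", "()" ) {
    $result{$t} = $t =~ m{\A $block \z}x ? "balanced" : "not balanced";
    print "$t: $result{$t}\n";
}
```

Unlike the parameterised xblock rule, the delimiters here are hard-coded; Perl 5 named groups do not take arguments.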
Readable 0-width assertions

    rule quoted_string ($type) {
        $type
        [ <-[$type]>+ :
        | [ <after \\ > $type ]
        ]*
        $type
    }

    rule string { ( <quoted_string "> | <quoted_string '> ) }

Assertions and Binding

    rule quoted_string ($type) {
        $type
        [ <-[$type]>+ :
        | [ <after $esc := ( \\* ) <( $esc.length() % 2 )> > $type ]
        ]*
        $type
    }

Binding

So, the part I like best about Perl6 Rules is the ease of reasonable capturing. In p5, I do this sort of junk all the time:

    %hash = $str =~ m/(\w+)=(\w+)/g;

Or, I see people doing stuff like this:

    $str =~ m/Number:\s*(\d+)/;
    my $num = $1;

Binding

Or struggling with things like this:

    m/(?: attr="(\w+)" )+/xi;
    # or
    m/I ((don't)? know (how|where)+) to count from./

In Perl6 rules, there are lots of nice, clean ways of binding your results to meaningful, lexical names.

Hypothetical Binding

Starting from least cool, moving towards most cool:

    # normal lexical
    my $num;
    m/ (\d+) { let $num := $1 } (<alpha>+) /

    # match lexical:
    m/ (\d+) { let $<num> := $1 } (<alpha>+) /

    # In this case the variable lives in match
    # variable $/ rather than the surrounding
    # lexical pad. $/{'num'} or $/«num».
    # Also available as $<num> while $/ in scope.

Hypothetical Binding

    # Alternations
    m/ [ (\d+)      { let $<num>   := $1 }
       | (<alpha>+) { let $<alpha> := $2 }
       | (.)        { let $<other> := $3 }
       ]
     /

    # Shorthand for binding hypothetically:
    m/ [ $<num>   := (\d+)
       | $<alpha> := (<alpha>+)
       | $<other> := (.)
       ]
     /

Hypothetical Binding

    # hypo-bind $n, if desired:
    my ($key, $val) = m:w{ $1 := (\w+) =\> $2 := (.*?)
                         | $2 := (.*?) \<= $1 := (\w+)
                         };

    # Repeated captures can be bound to arrays:
    m/ @<values> := [ (.*?) , ]* /

    # Pairs of repeated captures can be bound to hashes:
    m/ %<options> := [ (<ident>) = (\N+) ]* /
    m/ %<nonblank> := [ ^^ (\N+) $$ ]* /

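Perl 5.10 and later offer a more modest version of the named bindings above: named capture groups, read back through the %+ hash. The sketch below mirrors the num/alpha/other alternation; classify is a made-up helper name, and this is not the hypothetical-variable mechanism itself, only the nearest thing Perl 5 has.

```perl
use strict;
use warnings;

# Report which named alternative matched at the start of the input.
# classify() is a hypothetical helper, not from the talk.
sub classify {
    my ($input) = @_;
    return unless $input =~ m{
        \A (?: (?<num>   \d+ )
             | (?<alpha> [[:alpha:]]+ )
             | (?<other> . )
           )
    }x;
    # %+ only lists the named groups that actually captured something.
    my ($name) = keys %+;
    return $name . "=" . $+{$name};
}

print classify($_), "\n" for "42abc", "abc42", "?!";
```

The names are still attached to the match rather than to lexical variables, so the result has to be copied out of %+ before the next successful match overwrites it.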
Auto Hypothetical Binding

    # <rule> captures to $<<rule>> by default:
    m/ <key> «ws» =\> «ws» <value>
       { %hash{$«key»} = $«value» }
     /;
    # <<rule>> suppresses autocapture

What exactly happens in recursive-auto-hypo-bind cases is entirely hazy to me at this time. And I can’t test it. In theory we are supposed to end up with a nice tree structure.

Example

    grammar NineBlock {
        # meta bits
        rule continuation { \\ \n <ws>* };
        rule blankline    { ^^ <ws>* $$ };
        rule comment      { ^^ \# .+? $$ };

        # top level
        rule config  { <line>+ };
        rule line :w { ^^ <idnum> <centre>? <points> $$ };

        # line components
        rule idnum  :w { ( \d+ ) \: { $1 && print "$1\n" } };
        rule centre    { \( [ centre || center ] \) };
        rule points :w { <pair> [ ; <pair> ]* };
        rule pair   :w { <point> , <point> };
        rule point     { \d+ [ \. \d+ ]? };
    }

Ameliorated Example

    grammar NineBlock {
        rule config {
            [ <line> :::
              { push @<nodes>, { :id    ( $<idnum> )
                                 :centre( $<<centre>> ? 1:0 )
                                 :coord ( @<points> ) }
              }
            ]+
        };
        rule line :w { ^^ <idnum> <centre>? <points> $$ };
        rule points :w {
            { @<points> = () }
            [ ;? <pair> ::: { push @<points>, [ $<x>,$<y> ] } ]+
        };
        rule pair  :w { $<x> := <point> , $<y> := <point> };
        rule point    { ( \d+ [\.\d+]? )::: <( $1 <= 1 )> };
        rule idnum :w { $<idnum> := ( \d+ ) \: };
        rule centre   { \( [ centre || center ] \) };
    }

Ameliorated Example

    # slurp
    my $conffile = do{ local $/; <DATA> };

    # Parse!
    my @nodes;
    if ( $conffile ~~ m/^ <NineBlock.config> $/ ) {
        @nodes := @<nodes>;
    } else {
        die "couldn't parse conffile!";
    }

    if ( @nodes[0]«centre» ) {
        # etc…

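The data structure the ameliorated grammar is meant to build can be sketched in plain Perl 5, without Perl6::Rules. To be clear, this is not the grammar approach from the slides: it swaps in ordinary regexes and split, uses a shortened copy of the sample data, and the variable names are illustrative.

```perl
use strict;
use warnings;

# A shortened copy of the sample data from the earlier slide.
my $conffile = <<'END';
# id: centre points (relative to 1x1 unit square)
1: (centre) 0, 0; 1, 0; 1, 1; 0, 1
2: 0, 1; 1, 1; 1, 0

7: 0.25, 0.5; 0.5, 0; 0, 0; 0.5, 0.99; \
   1, 0; 0.5, 0; 0.75, 0.5; 0.25, 0.5
END

# heal continuations, ditch comments and blank lines
$conffile =~ s/\\\n\s*//g;
$conffile =~ s/^#.*\n//mg;
$conffile =~ s/^\s*\n//mg;

my $point = qr/\d+(?:\.\d+)?/;
my @nodes;
for my $line ( split /\n/, $conffile ) {
    # id, optional "(centre)" or "(center)" marker, then the point list
    my ( $id, $centre, $points ) = $line =~ m{
        \A (\d+) : \s* ( \( cent(?:re|er) \) )? \s* (.+) \z
    }x or die "couldn't parse line: $line\n";

    my @coord;
    for my $pair ( split /\s*;\s*/, $points ) {
        my ( $x, $y ) = $pair =~ /\A($point)\s*,\s*($point)\s*\z/
            or die "bad pair '$pair' on line $id\n";
        # Same range check as the <( $1 <= 1 )> assertion in the grammar.
        die "point out of range on line $id\n" if $x > 1 || $y > 1;
        push @coord, [ $x, $y ];
    }
    push @nodes, { id => $id, centre => $centre ? 1 : 0, coord => \@coord };
}

printf "%d: centre=%d, %d points\n",
    $_->{id}, $_->{centre}, scalar @{ $_->{coord} }
    for @nodes;
```

Each element of @nodes ends up as a hash with id, centre and coord keys, which is the shape the push in rule config is aiming for.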
Conclusion

In fact, I shouldn’t have to do all that pushing and manual capturing. I should be able to let the grammar do it for me. But, I don’t know how yet, and I suspect no one does.

Questions? Beers?