This appendix discusses the SLLGEN parsing system. Scanning divides a character sequence into units such as whitespace, comments, identifiers, and numbers; the scanner returns tokens carrying a lexical class, descriptive data, and an indication of the input position.
Environment-Passing Interpreters • Programming Language Essentials • 2nd edition • Appendix: The SLLGEN Parsing System
Scanning
• divide the character sequence into units such as whitespace, comments, identifiers, numbers, …
• commonly expressed through regular expressions, i.e., patterns.
• the scanner should return a token consisting of a lexical class, descriptive data, and an indication of the input position; the data depends on the class:
  • identifier → symbol
  • number → numerical value
  • literal → string value
Scanner
    (define the-lexical-spec
      '((white-sp (whitespace) skip)
        (comment ("%" (arbno (not #\newline))) skip)
        (id (letter (arbno (or letter digit "?"))) symbol)
        (number (digit (arbno digit)) number)))
Lexical Specification
    ((class (regexp ..) outcome) ..)
• class: a symbol that will be used in the grammar specification.
• regexp: a pattern to be matched.
• outcome: one of
  • skip to ignore the token,
  • symbol to return a Scheme symbol as data,
  • number to return a Scheme number as data,
  • string to return a Scheme string as data.
• the scanner takes the longest match; ties are resolved as string rather than symbol.
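The longest-match rule and the four outcomes can be illustrated outside Scheme. The following is a Python sketch, not SLLGEN itself: `LEXICAL_SPEC` is a hand translation of the-lexical-spec into Python regular expressions, and `Token`, `scan`, and the `keywords` parameter are hypothetical names. It reads the tie rule as "a literal string from the grammar (a keyword) beats a symbol of the same length", which is an interpretation of the slide.

```python
import re
from dataclasses import dataclass

# Hypothetical hand translation of the-lexical-spec: (class, regex, outcome).
LEXICAL_SPEC = [
    ("white-sp", r"\s+", "skip"),
    ("comment", r"%[^\n]*", "skip"),
    ("id", r"[A-Za-z][A-Za-z0-9?]*", "symbol"),
    ("number", r"[0-9]+", "number"),
]

# Hypothetical keyword list: the literal strings of the-grammar.
KEYWORDS = ["+", "-", "*", "add1", "sub1", "(", ")", ","]

@dataclass
class Token:
    cls: str      # lexical class, or the keyword itself for literal strings
    data: object  # symbol name, numeric value, or string
    pos: int      # offset of the token's first character in the input

def scan(text, spec, keywords):
    """Longest-match scanner; `keywords` are the literal strings of a grammar."""
    compiled = [(cls, re.compile(rx), outcome) for cls, rx, outcome in spec]
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (length, class, lexeme, outcome) of the longest match so far
        for cls, rx, outcome in compiled:
            m = rx.match(text, pos)
            if m and m.end() > pos and (best is None or m.end() - pos > best[0]):
                best = (m.end() - pos, cls, m.group(), outcome)
        for kw in keywords:
            # ">=": on a tie in length the keyword wins (our reading of the tie rule)
            if text.startswith(kw, pos) and (best is None or len(kw) >= best[0]):
                best = (len(kw), kw, kw, "string")
        if best is None:
            raise ValueError(f"no token matches at position {pos}")
        length, cls, lexeme, outcome = best
        if outcome == "symbol":
            tokens.append(Token(cls, lexeme, pos))
        elif outcome == "number":
            tokens.append(Token(cls, int(lexeme), pos))
        elif outcome == "string":
            tokens.append(Token(cls, lexeme, pos))
        # outcome "skip": whitespace and comments produce no token
        pos += length
    return tokens

for token in scan("add1(x, 42) % note", LEXICAL_SPEC, KEYWORDS):
    print(token)
```

With longest match, `add1x` scans as one identifier, while `add1` alone ties with the keyword and becomes the keyword.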
regexp
    regexp : string
           : letter | digit | whitespace | any
           : (not character)
           : (or regexp ..)
           : (arbno regexp)
           : (concat regexp ..)
• this is more than grep and less than egrep or lex.
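Each regexp form has a direct counterpart in conventional regular-expression syntax. A sketch under assumptions: regexps are encoded as Python data (a literal string as `str`, every other form as a tuple whose first element names it), letter and digit are taken to be ASCII, whitespace is approximated by `\s`, and `to_python_regex` is a hypothetical helper, not part of SLLGEN.

```python
import re

# Hypothetical encoding of SLLGEN regexps as Python data:
#   "abc"               literal string
#   ("letter",) ("digit",) ("whitespace",) ("any",)
#   ("not", c)          any single character except c
#   ("or", r1, ...)     alternation
#   ("arbno", r)        zero or more repetitions
#   ("concat", r1, ...) sequence
def to_python_regex(r):
    """Translate an encoded SLLGEN regexp into Python `re` syntax."""
    if isinstance(r, str):
        return re.escape(r)
    op, *args = r
    if op == "letter":
        return "[A-Za-z]"                    # assumption: ASCII letters only
    if op == "digit":
        return "[0-9]"
    if op == "whitespace":
        return r"\s"
    if op == "any":
        return "(?s:.)"                      # '.' that also matches newline
    if op == "not":
        return "[^" + re.escape(args[0]) + "]"
    if op == "or":
        return "(?:" + "|".join(to_python_regex(a) for a in args) + ")"
    if op == "arbno":
        return "(?:" + to_python_regex(args[0]) + ")*"
    if op == "concat":
        return "(?:" + "".join(to_python_regex(a) for a in args) + ")"
    raise ValueError(f"unknown regexp form: {op!r}")

# The identifier and comment patterns of the-lexical-spec in this encoding.
identifier = ("concat", ("letter",), ("arbno", ("or", ("letter",), ("digit",), "?")))
comment = ("concat", "%", ("arbno", ("not", "\n")))
print(to_python_regex(identifier))
```

Since arbno is a Kleene star and or is alternation, the forms above cover what the slide compares to grep and egrep.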
Parsing
• organize the token sequence into an abstract syntax tree over a defined datatype, based on a context-free grammar
• grammar → representation:
  • nonterminal → datatype
  • each alternative right-hand side → variant in the datatype
  • identifier in a right-hand side → field holding a symbol
  • number → field holding a number
  • nonterminal → field holding an AST value
  • string → not collected
Parser
    (define the-grammar
      '((program (expression) a-program)
        (expression (number) lit-exp)
        (expression (id) var-exp)
        (expression
          (primitive "(" (separated-list expression ",") ")")
          primapp-exp)
        (primitive ("+") add-prim)
        (primitive ("-") subtract-prim)
        (primitive ("*") mult-prim)
        (primitive ("add1") incr-prim)
        (primitive ("sub1") decr-prim)))
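The datatypes built for the-grammar have one type per nonterminal and one variant per alternative right-hand side. As a rough Python analogue, assuming a dataclass stands in for each define-datatype variant and a `Union` for each nonterminal type; all class and field names here are illustrative only, not what SLLGEN generates.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import List, Union

# primitive: one variant per alternative; the keyword itself is not stored
@dataclass
class AddPrim: pass

@dataclass
class SubtractPrim: pass

@dataclass
class MultPrim: pass

@dataclass
class IncrPrim: pass

@dataclass
class DecrPrim: pass

Primitive = Union[AddPrim, SubtractPrim, MultPrim, IncrPrim, DecrPrim]

# expression: lit-exp holds a number, var-exp a symbol, and primapp-exp
# a primitive plus the list collected by separated-list
@dataclass
class LitExp:
    datum: int

@dataclass
class VarExp:
    id: str

@dataclass
class PrimappExp:
    prim: Primitive
    rands: List[Expression]

Expression = Union[LitExp, VarExp, PrimappExp]

# program: the start symbol, with a single variant wrapping an expression
@dataclass
class AProgram:
    exp: Expression

# Hand-built AST for the input "+(1, x)"; "(", "," and ")" leave no trace.
example = AProgram(PrimappExp(AddPrim(), [LitExp(1), VarExp("x")]))
print(example)
```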
Grammar Specification
    ((nonterminal (item ..) variant) ..)
• nonterminal: a symbol representing a nonterminal; the first one is the start symbol.
• item ..: a sequence defining one alternative right-hand side.
• variant: a symbol used as the variant name in the datatype representing the nonterminal.
• alternative right-hand sides are specified by repeating the nonterminal with a different variant.
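Because alternatives are written by repeating the nonterminal, a tool reading the specification has to regroup the flat list of productions. A small Python sketch, assuming the-grammar is encoded as `(nonterminal, rhs, variant)` triples with keywords wrapped as `("lit", s)`; `THE_GRAMMAR` and `group_productions` are hypothetical names.

```python
# Hypothetical Python encoding of the-grammar: keywords are ("lit", s),
# nonterminals and lexical classes are plain strings.
THE_GRAMMAR = [
    ("program", ["expression"], "a-program"),
    ("expression", ["number"], "lit-exp"),
    ("expression", ["id"], "var-exp"),
    ("expression",
     ["primitive", ("lit", "("), ("separated-list", "expression", ("lit", ",")), ("lit", ")")],
     "primapp-exp"),
    ("primitive", [("lit", "+")], "add-prim"),
    ("primitive", [("lit", "-")], "subtract-prim"),
    ("primitive", [("lit", "*")], "mult-prim"),
    ("primitive", [("lit", "add1")], "incr-prim"),
    ("primitive", [("lit", "sub1")], "decr-prim"),
]

def group_productions(grammar):
    """Return the start symbol and, per nonterminal, its (variant, rhs) alternatives."""
    start = grammar[0][0]  # the first nonterminal is the start symbol
    alternatives = {}
    for nonterminal, rhs, variant in grammar:
        # repeating a nonterminal adds one more alternative to it
        alternatives.setdefault(nonterminal, []).append((variant, rhs))
    return start, alternatives

start, alternatives = group_productions(THE_GRAMMAR)
for nonterminal, alts in alternatives.items():
    print(nonterminal, [variant for variant, _ in alts])
```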
item
    item : nonterminal | class | string
         : (arbno item ..)
         : (separated-list item .. string)
• this represents extended BNF, without notations for optional items, items repeated at least once, or alternatives.
• SLLGEN checks that the grammar is LL(1).
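LL(1) means one token of lookahead is enough to choose among a nonterminal's alternatives. A simplified Python check, assuming every right-hand side is non-empty and does not start with arbno or separated-list, so FOLLOW sets can be ignored; this is not SLLGEN's actual algorithm, and `first_of_item` and `ll1_conflicts` are hypothetical names using the same `("lit", s)` keyword encoding as above.

```python
def first_of_item(item, alternatives, path=frozenset()):
    """FIRST set of one leading item, as ("lit", s) or ("class", name) tokens."""
    if isinstance(item, tuple):
        if item[0] == "lit":
            return {item}
        raise NotImplementedError("arbno/separated-list as a leading item is not handled")
    if item in alternatives:  # nonterminal: union over its alternatives
        if item in path:
            raise ValueError(f"left recursion through {item}")
        first = set()
        for _, rhs in alternatives[item]:
            first |= first_of_item(rhs[0], alternatives, path | {item})
        return first
    return {("class", item)}  # lexical class such as id or number

def ll1_conflicts(grammar):
    """List (nonterminal, variant1, variant2, token) where one token predicts two alternatives."""
    alternatives = {}
    for nonterminal, rhs, variant in grammar:
        alternatives.setdefault(nonterminal, []).append((variant, rhs))
    conflicts = []
    for nonterminal, alts in alternatives.items():
        predicted_by = {}  # token -> the variant it already predicts
        for variant, rhs in alts:
            for token in sorted(first_of_item(rhs[0], alternatives)):
                if token in predicted_by:
                    conflicts.append((nonterminal, predicted_by[token], variant, token))
                else:
                    predicted_by[token] = variant
    return conflicts

# A fragment of the-grammar (LL(1)) and a grammar that is not LL(1).
FRAGMENT = [
    ("expression", ["number"], "lit-exp"),
    ("expression", ["id"], "var-exp"),
    ("expression",
     ["primitive", ("lit", "("), ("separated-list", "expression", ("lit", ",")), ("lit", ")")],
     "primapp-exp"),
    ("primitive", [("lit", "+")], "add-prim"),
    ("primitive", [("lit", "add1")], "incr-prim"),
]
AMBIGUOUS = [
    ("expression", ["id"], "var-exp"),
    ("expression", ["id", ("lit", "("), "expression", ("lit", ")")], "call-exp"),
]
print(ll1_conflicts(FRAGMENT), ll1_conflicts(AMBIGUOUS))
```

In `AMBIGUOUS`, an `id` token could start either alternative, so a one-token-lookahead parser could not decide; a real LL(1) check would also need FOLLOW sets for the nullable arbno and separated-list items.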
Operations
• scan and parse stand for the lexical specification and the grammar, e.g., the-lexical-spec and the-grammar
• (sllgen:list-define-datatypes scan parse)
• (sllgen:make-define-datatypes scan parse)
  • display or create, respectively, the AST datatypes
• (sllgen:make-string-scanner scan parse)
• (sllgen:make-string-parser scan parse)
  • return functions that accept a string and return a token list or an AST, respectively
• (sllgen:make-rep-loop prompt eval (sllgen:make-stream-parser scan parse))
  • returns a parameterless function that runs a read-eval-print loop
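A string parser is an ordinary function from text to an AST. As a hedged illustration of what such a function does for the-grammar, here is a hand-written recursive-descent parser in Python. `make_string_parser` only mimics the role of the SLLGEN operation: it is hardwired to the-grammar and takes no specifications, ASTs are tuples tagged with variant names, and every name in the block is hypothetical. Each alternative is chosen from one token of lookahead, which is what the LL(1) check guarantees.

```python
import re

# Hypothetical tokenizer for the-grammar: whitespace and %-comments are
# skipped; numbers, identifiers, and the grammar's keywords become tokens.
TOKEN_RE = re.compile(
    r"(?P<skip>\s+|%[^\n]*)"
    r"|(?P<number>[0-9]+)"
    r"|(?P<id>[A-Za-z][A-Za-z0-9?]*)"
    r"|(?P<punct>[-+*(),])"
)
# primitive alternatives of the-grammar: keyword -> variant name
PRIMS = {"+": "add-prim", "-": "subtract-prim", "*": "mult-prim",
         "add1": "incr-prim", "sub1": "decr-prim"}

def scan(text):
    """Tokens as (kind, data); a keyword uses itself as its kind."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "number":
            tokens.append(("number", int(lexeme)))
        elif kind == "id":  # a word that is a keyword is not an identifier
            tokens.append((lexeme, lexeme) if lexeme in PRIMS else ("id", lexeme))
        elif kind == "punct":
            tokens.append((lexeme, lexeme))
        pos = m.end()
    tokens.append(("eof", None))
    return tokens

def expect(tokens, i, kind):
    """Consume a token of the given kind at index i or raise SyntaxError."""
    if tokens[i][0] != kind:
        raise SyntaxError(f"expected {kind!r}, got {tokens[i][1]!r}")
    return i + 1

def parse_expression(tokens, i):
    """Parse one expression at tokens[i]; return (ast, index after it)."""
    kind, data = tokens[i]
    if kind == "number":                     # (expression (number) lit-exp)
        return ("lit-exp", data), i + 1
    if kind == "id":                         # (expression (id) var-exp)
        return ("var-exp", data), i + 1
    if kind in PRIMS:                        # primitive "(" separated-list ")"
        prim = (PRIMS[kind],)
        i = expect(tokens, i + 1, "(")
        rands = []
        if tokens[i][0] != ")":              # the list may be empty
            rand, i = parse_expression(tokens, i)
            rands.append(rand)
            while tokens[i][0] == ",":
                rand, i = parse_expression(tokens, i + 1)
                rands.append(rand)
        i = expect(tokens, i, ")")
        return ("primapp-exp", prim, rands), i
    raise SyntaxError(f"unexpected token {data!r}")

def make_string_parser():
    """Return a string -> AST function, loosely mirroring sllgen:make-string-parser."""
    def parse(text):
        tokens = scan(text)
        exp, i = parse_expression(tokens, 0)
        expect(tokens, i, "eof")             # the whole input must be consumed
        return ("a-program", exp)
    return parse

parse = make_string_parser()
print(parse("add1(+(1, x)) % a comment"))
```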
AST Mapping
    ((nonterminal (item ..) variant) ..)
    item : nonterminal | class | string
         : (arbno item ..)
         : (separated-list item .. string)
• a nonterminal item becomes a field holding an AST value:
    (define-datatype nonterminal nonterminal?
      (variant
        (field-name nonterminal?)
        ..)
      ..)
AST Mapping
    ((nonterminal (item ..) variant) ..)
    item : nonterminal | class | string
         : (arbno item ..)
         : (separated-list item .. string)
• a class item such as an identifier becomes a field holding a symbol:
    (define-datatype nonterminal nonterminal?
      (variant
        (field-name symbol?)
        ..)
      ..)
AST Mapping
    ((nonterminal (item ..) variant) ..)
    item : nonterminal | class | string
         : (arbno item ..)
         : (separated-list item .. string)
• a string, i.e., a keyword, is not represented in the AST; if it must be represented, the grammar has to map the string to a nonterminal, as the-grammar does by mapping "+" to the add-prim variant of primitive.
AST Mapping
    ((nonterminal (item ..) variant) ..)
    item : nonterminal | class | string
         : (arbno nt class ..)
         : (separated-list nt class .. string)
• each item inside arbno or separated-list becomes a field holding a list:
    (define-datatype nonterminal nonterminal?
      (variant
        (field-name1 (list-of nt?))
        (field-name2 (list-of symbol?))
        ..)
      ..)
AST Mapping
    ((nonterminal (item ..) variant) ..)
    item : (arbno (separated-list nt class .. string))
    (define-datatype nonterminal nonterminal?
      (variant
        (name1 (list-of (list-of nt?)))
        (name2 (list-of (list-of symbol?)))
        ..)
      ..)
• the item sequence (nt class ..) is flattened into the variant: each item gets its own field, wrapped in one list-of per enclosing arbno or separated-list.
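The AST mapping rules can be condensed into one function from a right-hand side to the field predicates of its variant. A Python sketch under assumptions: right-hand sides use a hypothetical encoding (keywords as `("lit", s)`, arbno and separated-list as tuples whose separator is itself a `("lit", s)` item), a class's predicate is assumed to follow its outcome (`symbol?` for id, `number?` for number), and `field_types` is not an SLLGEN function.

```python
def field_types(rhs, nonterminals, class_predicates):
    """Predicate for each field a variant gets from its right-hand side."""
    fields = []
    for item in rhs:
        if isinstance(item, tuple) and item[0] == "lit":
            continue                     # keywords are not kept in the AST
        if isinstance(item, tuple) and item[0] in ("arbno", "separated-list"):
            # each item inside the repetition becomes its own list-valued field
            for inner in field_types(item[1:], nonterminals, class_predicates):
                fields.append(f"(list-of {inner})")
        elif item in nonterminals:
            fields.append(f"{item}?")    # field holding an AST value
        else:
            fields.append(class_predicates[item])  # field holding scanner data

    return fields

NONTERMINALS = {"program", "expression", "primitive"}
CLASS_PREDICATES = {"id": "symbol?", "number": "number?"}

# primapp-exp from the-grammar: primitive "(" (separated-list expression ",") ")"
primapp_rhs = ["primitive", ("lit", "("),
               ("separated-list", "expression", ("lit", ",")), ("lit", ")")]
print(field_types(primapp_rhs, NONTERMINALS, CLASS_PREDICATES))
# ['primitive?', '(list-of expression?)']
```

Nesting separated-list inside arbno applies the wrapping twice, which yields the `(list-of (list-of ...))` fields shown on the last slide.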