Environment-Passing Interpreters • Programming Language Essentials • 2nd edition • Appendix: The SLLGEN Parsing System
Scanning
• divide the character sequence into units such as whitespace, comments, identifiers, numbers, …
• commonly expressed through regular expressions, i.e., patterns.
• the scanner should return a token: lexical class, descriptive data, and an indication of the input position. Descriptive data by class:
  • identifier: symbol
  • number: numerical value
  • literal: string value
Scanner
    (define the-lexical-spec
      '((white-sp (whitespace) skip)
        (comment ("%" (arbno (not #\newline))) skip)
        (id (letter (arbno (or letter digit "?"))) symbol)
        (number (digit (arbno digit)) number)))
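For illustration, the specification above can be mirrored by a hand-written scanner in portable R7RS Scheme. This is a sketch, not SLLGEN's implementation: the token layout (class data position), the literal class, and the names scan, span and make-token are hypothetical, and the keywords and punctuation are assumed to be the literal strings of the grammar on the Parser slide. Each token carries its data converted as the outcome column prescribes, plus its starting offset as the position indication.

```scheme
(import (scheme base) (scheme char))

;; Hypothetical token layout: (class data position), position being the
;; 0-based offset of the token's first character in the input string.
(define (make-token class data pos) (list class data pos))

;; Assumed literal strings from the grammar: keywords win ties against id,
;; and single-character punctuation is scanned directly.
(define keywords '("add1" "sub1"))
(define punctuation '(#\( #\) #\, #\+ #\- #\*))

;; Index just past the run of characters, starting at START, that satisfy PRED.
(define (span str start pred)
  (let loop ((i start))
    (if (and (< i (string-length str)) (pred (string-ref str i)))
        (loop (+ i 1))
        i)))

;; Characters allowed after the first letter of an id: letter, digit or "?".
(define (id-char? c)
  (or (char-alphabetic? c) (char-numeric? c) (char=? c #\?)))

(define (scan str)
  (let loop ((i 0) (tokens '()))
    (if (>= i (string-length str))
        (reverse tokens)
        (let ((c (string-ref str i)))
          (cond
            ;; white-sp: skipped
            ((char-whitespace? c)
             (loop (span str i char-whitespace?) tokens))
            ;; comment: "%" up to the end of the line, skipped
            ((char=? c #\%)
             (loop (span str i (lambda (ch) (not (char=? ch #\newline)))) tokens))
            ;; id: a letter followed by letters, digits or "?"
            ((char-alphabetic? c)
             (let* ((end (span str (+ i 1) id-char?))
                    (lexeme (substring str i end)))
               (loop end
                     (cons (if (member lexeme keywords)
                               (make-token 'literal lexeme i)
                               (make-token 'id (string->symbol lexeme) i))
                           tokens))))
            ;; number: one or more digits, converted to a Scheme number
            ((char-numeric? c)
             (let ((end (span str i char-numeric?)))
               (loop end
                     (cons (make-token 'number (string->number (substring str i end)) i)
                           tokens))))
            ;; punctuation strings from the grammar
            ((memv c punctuation)
             (loop (+ i 1) (cons (make-token 'literal (string c) i) tokens)))
            (else (error "scan: unexpected character" c)))))))
```

Because add1 also matches the id pattern, the keyword check after an identifier has been scanned is what lets the grammar string win the tie.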
Lexical Specification
• ((class (regexp ..) outcome) ..)
• class: a symbol that will be used in the grammar specification.
• regexp ..: patterns to be matched one after another.
• outcome: one of
  • skip to ignore the token,
  • symbol to return a Scheme symbol as data,
  • number to return a Scheme number as data,
  • string to return a Scheme string as data.
• the longest match wins; on a tie, string is preferred over symbol.
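The outcome column and the longest-match rule can be seen in a small driver. The sketch assumes a hypothetical entry format (class matcher outcome), where a matcher procedure stands in for a regexp and returns the match length at a position. Ties between equal-length matches go to a string outcome over a symbol outcome, as stated above; any other tie goes to the earlier entry, which is an assumption of this sketch. Positions are left out of the tokens for brevity.

```scheme
(import (scheme base) (scheme char))

;; Hypothetical spec entry: (class matcher outcome), where matcher is a
;; procedure (str pos) -> length of the match at pos, or #f / 0 for none.
(define (entry-class e) (list-ref e 0))
(define (entry-matcher e) (list-ref e 1))
(define (entry-outcome e) (list-ref e 2))

;; Turn a matched lexeme into token data according to its outcome;
;; skip produces no token (#f).
(define (apply-outcome class outcome lexeme)
  (case outcome
    ((skip) #f)
    ((symbol) (list class (string->symbol lexeme)))
    ((number) (list class (string->number lexeme)))
    ((string) (list class lexeme))
    (else (error "apply-outcome: unknown outcome" outcome))))

;; Tie-break rank: a string outcome beats a symbol outcome.
(define (rank outcome) (if (eq? outcome 'string) 1 0))

;; Entry with the longest nonempty match at POS; on equal length prefer the
;; higher rank, otherwise keep the earlier entry. Returns (entry . length) or #f.
(define (best-match spec str pos)
  (let loop ((entries spec) (best #f) (best-len 0))
    (if (null? entries)
        (and best (cons best best-len))
        (let* ((e (car entries))
               (len (or ((entry-matcher e) str pos) 0)))
          (if (and (> len 0)
                   (or (not best)
                       (> len best-len)
                       (and (= len best-len)
                            (> (rank (entry-outcome e)) (rank (entry-outcome best))))))
              (loop (cdr entries) e len)
              (loop (cdr entries) best best-len))))))

;; Repeatedly take the best match, keeping the tokens that are not skipped.
(define (scan-with spec str)
  (let loop ((pos 0) (tokens '()))
    (if (>= pos (string-length str))
        (reverse tokens)
        (let ((m (best-match spec str pos)))
          (if (not m)
              (error "scan-with: no entry matches at position" pos)
              (let* ((e (car m))
                     (end (+ pos (cdr m)))
                     (token (apply-outcome (entry-class e) (entry-outcome e)
                                           (substring str pos end))))
                (loop end (if token (cons token tokens) tokens))))))))

;; Matcher for one or more characters satisfying PRED.
(define (run pred)
  (lambda (str pos)
    (let loop ((i pos))
      (if (and (< i (string-length str)) (pred (string-ref str i)))
          (loop (+ i 1))
          (- i pos)))))

;; Matcher for an exact literal string.
(define (exact lit)
  (lambda (str pos)
    (let ((end (+ pos (string-length lit))))
      (and (<= end (string-length str))
           (string=? (substring str pos end) lit)
           (string-length lit)))))

;; A toy spec: "let" is listed after id but still wins the tie.
(define demo-spec
  (list (list 'white-sp (run char-whitespace?) 'skip)
        (list 'id (run char-alphabetic?) 'symbol)
        (list 'number (run char-numeric?) 'number)
        (list 'keyword (exact "let") 'string)))
```

Here (scan-with demo-spec "let letter 12") gives ((keyword "let") (id letter) (number 12)): "let" ties with id and the string outcome wins, while "letter" is longer than "let" and therefore stays an identifier.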
regexp
• regexp: string | letter | digit | whitespace | any
        | (not character)
        | (or regexp ..)
        | (arbno regexp)
        | (concat regexp ..)
• this is more than grep and less than egrep or lex.
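The regexp language is small enough to interpret directly. The sketch below (hypothetical names match-ends and longest-match) computes every position at which a regexp can stop and takes the largest. A lexical-spec entry's (regexp ..) list behaves like an implicit concat, so the examples wrap it in concat. It illustrates the semantics with backtracking over sets of positions and is not a description of how SLLGEN builds its own scanner.

```scheme
(import (scheme base) (scheme char))

(define (append-map f xs) (apply append (map f xs)))

;; Elements of XS that are not in SEEN, without duplicates.
(define (remove-seen xs seen)
  (let loop ((xs xs) (acc '()))
    (cond ((null? xs) (reverse acc))
          ((or (memv (car xs) seen) (memv (car xs) acc)) (loop (cdr xs) acc))
          (else (loop (cdr xs) (cons (car xs) acc))))))

;; All positions at which regexp RE can finish matching STR from POS.
(define (match-ends re str pos)
  (define (one-char ok?)
    (if (and (< pos (string-length str)) (ok? (string-ref str pos)))
        (list (+ pos 1))
        '()))
  (cond
    ((string? re)                            ; a literal string matches itself
     (let ((end (+ pos (string-length re))))
       (if (and (<= end (string-length str)) (string=? (substring str pos end) re))
           (list end)
           '())))
    ((eq? re 'letter) (one-char char-alphabetic?))
    ((eq? re 'digit) (one-char char-numeric?))
    ((eq? re 'whitespace) (one-char char-whitespace?))
    ((eq? re 'any) (one-char (lambda (c) #t)))
    ((and (pair? re) (eq? (car re) 'not))    ; any character except the given one
     (one-char (lambda (c) (not (char=? c (cadr re))))))
    ((and (pair? re) (eq? (car re) 'or))
     (remove-seen (append-map (lambda (r) (match-ends r str pos)) (cdr re)) '()))
    ((and (pair? re) (eq? (car re) 'concat))
     (concat-ends (cdr re) str (list pos)))
    ((and (pair? re) (eq? (car re) 'arbno))
     (arbno-ends (cadr re) str pos))
    (else (error "match-ends: unknown regexp" re))))

;; Match each regexp of RES in sequence, starting from every position in STARTS.
(define (concat-ends res str starts)
  (if (null? res)
      starts
      (concat-ends (cdr res) str
                   (remove-seen (append-map (lambda (p) (match-ends (car res) str p)) starts)
                                '()))))

;; Zero or more repetitions: every position reachable from POS
;; (positions already seen are not revisited, so this terminates).
(define (arbno-ends re str pos)
  (let loop ((frontier (list pos)) (seen (list pos)))
    (let ((next (remove-seen (append-map (lambda (p) (match-ends re str p)) frontier)
                             seen)))
      (if (null? next)
          seen
          (loop next (append next seen))))))

;; Length of the longest match of RE at POS, or #f if RE cannot match there.
(define (longest-match re str pos)
  (let ((ends (match-ends re str pos)))
    (if (null? ends) #f (- (apply max ends) pos))))
```

For example, (longest-match '(concat letter (arbno (or letter digit "?"))) "x2? + y" 0) is 3, the length of the identifier x2?.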
Parsing
• organize the token sequence into an abstract syntax tree over a defined datatype, based on a context-free grammar
• grammar: representation
  • nonterminal: datatype
  • each alternative right-hand side: variant in the datatype
  • identifier in a right-hand side: field with symbol
  • number: field with number
  • nonterminal: field with AST value
  • string: not collected
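The table can be read as a rule for building a node once every item of a right-hand side has been parsed: keep the values of nonterminals and classes, drop the strings. The sketch uses hypothetical tagged lists such as (var-exp x) in place of define-datatype records; make-node and collect-fields are illustrative names, not SLLGEN procedures.

```scheme
(import (scheme base))

;; Given a right-hand side and one parsed value per item, keep the values
;; that become AST fields: everything except literal strings (keywords).
(define (collect-fields rhs values)
  (cond ((null? rhs) '())
        ((string? (car rhs)) (collect-fields (cdr rhs) (cdr values)))
        (else (cons (car values) (collect-fields (cdr rhs) (cdr values))))))

;; A node is the variant name followed by its collected fields.
(define (make-node variant rhs values)
  (cons variant (collect-fields rhs values)))
```

Note that (make-node 'add-prim '("+") '("+")) yields just (add-prim): the keyword itself is gone, and only the variant name records which primitive was written.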
Parser
    (define the-grammar
      '((program (expression) a-program)
        (expression (number) lit-exp)
        (expression (id) var-exp)
        (expression
          (primitive "(" (separated-list expression ",") ")")
          primapp-exp)
        (primitive ("+") add-prim)
        (primitive ("-") subtract-prim)
        (primitive ("*") mult-prim)
        (primitive ("add1") incr-prim)
        (primitive ("sub1") decr-prim)))
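Because every alternative of expression starts with a different kind of token, one token of lookahead decides which rule applies. A hand-written recursive-descent parser for this grammar might look as follows. It assumes already-scanned tokens (Scheme numbers for number, symbols for id, strings for literal grammar strings) and builds hypothetical tagged-list ASTs rather than the define-datatype records SLLGEN would produce.

```scheme
(import (scheme base))

;; Literal strings of the primitive nonterminal and their variant names.
(define primitives
  '(("+" . add-prim) ("-" . subtract-prim) ("*" . mult-prim)
    ("add1" . incr-prim) ("sub1" . decr-prim)))

;; Consume the literal string STR or signal an error.
(define (expect str tokens)
  (if (and (pair? tokens) (equal? (car tokens) str))
      (cdr tokens)
      (error "parse: expected" str)))

;; program: expression, with no tokens left over.
(define (parse-program tokens)
  (let ((r (parse-expression tokens)))
    (if (null? (cdr r))
        (list 'a-program (car r))
        (error "parse: trailing tokens" (cdr r)))))

;; expression: the first token picks the alternative (LL(1)).
;; Returns (ast . remaining-tokens).
(define (parse-expression tokens)
  (cond
    ((null? tokens) (error "parse: unexpected end of input"))
    ((number? (car tokens))
     (cons (list 'lit-exp (car tokens)) (cdr tokens)))
    ((symbol? (car tokens))
     (cons (list 'var-exp (car tokens)) (cdr tokens)))
    ((and (string? (car tokens)) (assoc (car tokens) primitives))
     => (lambda (prim)
          ;; primitive "(" (separated-list expression ",") ")"
          (let ((r (parse-rands (expect "(" (cdr tokens)))))
            (cons (list 'primapp-exp (list (cdr prim)) (car r)) (cdr r)))))
    (else (error "parse: unexpected token" (car tokens)))))

;; (separated-list expression ",") followed by ")".
;; Returns (list-of-asts . remaining-tokens).
(define (parse-rands tokens)
  (if (and (pair? tokens) (equal? (car tokens) ")"))
      (cons '() (cdr tokens))
      (let loop ((r (parse-expression tokens)) (acc '()))
        (let ((acc (cons (car r) acc))
              (rest (cdr r)))
          (if (and (pair? rest) (equal? (car rest) ","))
              (loop (parse-expression (cdr rest)) acc)
              (cons (reverse acc) (expect ")" rest)))))))
```

For instance, (parse-program (list "add1" "(" "+" "(" 3 "," 'x ")" ")")) yields (a-program (primapp-exp (incr-prim) ((primapp-exp (add-prim) ((lit-exp 3) (var-exp x)))))).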
Grammar Specification
• ((nonterminal (item ..) variant) ..)
• nonterminal: a symbol representing a nonterminal; the first one is the start symbol.
• item ..: a sequence defining one alternative right-hand side.
• variant: a symbol used as the variant name in the datatype representing the nonterminal.
• alternative right-hand sides are specified by repeating the nonterminal with a different variant.
item
• item: nonterminal | class | string
      | (arbno item ..)
      | (separated-list item .. string)
• this is extended BNF, but without notation for optional items, items repeated at least once, or alternatives within a right-hand side.
• SLLGEN checks that the grammar is LL(1).
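A simplified version of the LL(1) check compares, for each nonterminal, the terminals that can begin its alternatives. The sketch assumes the slide's grammar format and that no right-hand side begins with arbno or separated-list, so nothing is nullable; a full check would also have to handle those cases. The names ll1-conflicts and first-of-item are hypothetical.

```scheme
(import (scheme base))

(define (append-map f xs) (apply append (map f xs)))

;; Grammar format as on the slides: ((nonterminal (item ..) variant) ..).
(define (lhs p) (car p))
(define (rhs p) (cadr p))

(define (productions-for nt grammar)
  (let loop ((ps grammar))
    (cond ((null? ps) '())
          ((eq? (lhs (car ps)) nt) (cons (car ps) (loop (cdr ps))))
          (else (loop (cdr ps))))))

(define (nonterminal? sym grammar)
  (and (symbol? sym) (pair? (productions-for sym grammar))))

;; Elements of XS that also occur in YS (compared with equal?).
(define (common xs ys)
  (cond ((null? xs) '())
        ((member (car xs) ys) (cons (car xs) (common (cdr xs) ys)))
        (else (common (cdr xs) ys))))

;; Remove duplicate symbols, keeping first occurrences in order.
(define (distinct xs)
  (let loop ((xs xs) (acc '()))
    (cond ((null? xs) (reverse acc))
          ((memq (car xs) acc) (loop (cdr xs) acc))
          (else (loop (cdr xs) (cons (car xs) acc))))))

;; Terminals (class symbols or literal strings) that can begin ITEM.
(define (first-of-item item grammar visiting)
  (cond
    ((pair? item) (error "first-of-item: unsupported leading item" item))
    ((nonterminal? item grammar)
     (if (memq item visiting)
         (error "first-of-item: left recursion through" item)
         (append-map (lambda (p) (first-of-item (car (rhs p)) grammar (cons item visiting)))
                     (productions-for item grammar))))
    (else (list item))))

;; (nonterminal terminal) pairs where two alternatives of the same
;; nonterminal can start with the same terminal; () means no conflict.
(define (ll1-conflicts grammar)
  (define (conflicts-for nt)
    (let loop ((alts (map (lambda (p) (first-of-item (car (rhs p)) grammar '()))
                          (productions-for nt grammar)))
               (seen '())
               (found '()))
      (if (null? alts)
          found
          (loop (cdr alts)
                (append (car alts) seen)
                (append found (map (lambda (t) (list nt t))
                                   (common (car alts) seen)))))))
  (append-map conflicts-for (distinct (map lhs grammar))))
```

Applied to the-grammar from the Parser slide it returns (); adding a hypothetical alternative (expression (id "(" expression ")") app-exp) makes it return ((expression id)), since two alternatives of expression would then start with id.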
Operations
• (sllgen:list-define-datatypes scan parse)
  (sllgen:make-define-datatypes scan parse)
  display or create the AST datatypes; scan and parse are the lexical and grammar specifications
• (sllgen:make-string-scanner scan parse)
  (sllgen:make-string-parser scan parse)
  return functions that accept a string and return a token list or an AST, respectively
• (sllgen:make-rep-loop prompt eval (sllgen:make-stream-parser scan parse))
  returns a parameterless function that runs a read-eval-print loop
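The eval argument of sllgen:make-rep-loop is called on each parsed program. For the Parser slide's grammar, a minimal environment-passing evaluator could look like this. Assumptions: the AST is the hypothetical tagged-list form (for example (lit-exp 3)) instead of define-datatype records examined with cases, and init-env is an association list invented for the example.

```scheme
(import (scheme base))

;; Hypothetical initial environment as an association list.
(define init-env '((i . 1) (v . 5) (x . 10)))

(define (apply-env env sym)
  (let ((binding (assq sym env)))
    (if binding
        (cdr binding)
        (error "apply-env: unbound variable" sym))))

;; One-argument evaluator, the shape expected for the eval parameter.
(define (eval-program pgm)
  (eval-expression (cadr pgm) init-env))

;; The environment is passed explicitly to every recursive call.
(define (eval-expression exp env)
  (case (car exp)
    ((lit-exp) (cadr exp))
    ((var-exp) (apply-env env (cadr exp)))
    ((primapp-exp)
     (apply-primitive (car (cadr exp))
                      (map (lambda (rand) (eval-expression rand env))
                           (list-ref exp 2))))
    (else (error "eval-expression: unknown expression" exp))))

(define (apply-primitive prim args)
  (case prim
    ((add-prim) (+ (car args) (cadr args)))
    ((subtract-prim) (- (car args) (cadr args)))
    ((mult-prim) (* (car args) (cadr args)))
    ((incr-prim) (+ (car args) 1))
    ((decr-prim) (- (car args) 1))
    (else (error "apply-primitive: unknown primitive" prim))))
```

To be used with SLLGEN, eval-program would have to consume the datatype values SLLGEN builds instead of these tagged lists.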
AST Mapping
• ((nonterminal (item ..) variant) ..)
• item: nonterminal | class | string
      | (arbno item ..)
      | (separated-list item .. string)
• a nonterminal item becomes a field tested by that nonterminal's predicate:
    (define-datatype nonterminal nonterminal?
      (variant
        (field-name nonterminal?)
        ..)
      ..)
AST Mapping
• ((nonterminal (item ..) variant) ..)
• item: nonterminal | class | string
      | (arbno item ..)
      | (separated-list item .. string)
• a class item becomes a field holding the token's data, e.g. symbol? for a class with outcome symbol:
    (define-datatype nonterminal nonterminal?
      (variant
        (field-name symbol?)
        ..)
      ..)
AST Mapping
• ((nonterminal (item ..) variant) ..)
• item: nonterminal | class | string
      | (arbno item ..)
      | (separated-list item .. string)
• a string, i.e., a keyword, is not represented in the AST; if it matters, the grammar has to map the string to a nonterminal (as primitive does for "+" and "add1") so that it is represented by a variant.
AST Mapping
• ((nonterminal (item ..) variant) ..)
• item: nonterminal | class | string
      | (arbno nt class ..)
      | (separated-list nt class .. string)
• each item inside arbno or separated-list becomes a list-valued field:
    (define-datatype nonterminal nonterminal?
      (variant
        (field-name1 (list-of nt?))
        (field-name2 (list-of symbol?))
        ..)
      ..)
AST Mapping
• ((nonterminal (item ..) variant) ..)
• item: (arbno (separated-list nt class .. string))
    (define-datatype nonterminal nonterminal?
      (variant
        (name1 (list-of (list-of nt?)))
        (name2 (list-of (list-of symbol?)))
        ..)
      ..)
• the item sequence is flattened into the variant: each item becomes its own field, wrapped in one list-of per level of repetition.
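Taken together, the AST Mapping slides describe a function from a right-hand side to the predicates of the variant's fields. The sketch computes only the predicates (SLLGEN also generates the field names); field-preds, item-preds and the class-to-outcome association list are hypothetical.

```scheme
(import (scheme base))

;; Predicate name for a nonterminal, e.g. expression -> expression?
(define (nt-pred nt)
  (string->symbol (string-append (symbol->string nt) "?")))

;; Field predicate for a token class, based on its lexical outcome.
(define (outcome-pred outcome)
  (case outcome
    ((symbol) 'symbol?)
    ((number) 'number?)
    ((string) 'string?)
    (else (error "outcome-pred: no AST field for outcome" outcome))))

;; Field predicates contributed by a sequence of items, flattened in order.
;; NTS: list of nonterminals; OUTCOMES: alist from class to outcome.
(define (field-preds items nts outcomes)
  (apply append (map (lambda (item) (item-preds item nts outcomes)) items)))

(define (item-preds item nts outcomes)
  (cond
    ((string? item) '())                          ; keywords are not represented
    ((and (symbol? item) (memq item nts)) (list (nt-pred item)))
    ((and (symbol? item) (assq item outcomes))
     => (lambda (entry) (list (outcome-pred (cdr entry)))))
    ((and (pair? item) (memq (car item) '(arbno separated-list)))
     ;; each inner field becomes its own list-valued field of the variant;
     ;; the separator of separated-list is a string and contributes nothing
     (map (lambda (p) (list 'list-of p)) (field-preds (cdr item) nts outcomes)))
    (else (error "item-preds: unknown item" item))))
```

For the primapp-exp rule it gives (primitive? (list-of expression?)); for (arbno (separated-list expression id ",")) it gives ((list-of (list-of expression?)) (list-of (list-of symbol?))), the shape shown on the last slide.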