CPSC320 Tutorial 5: Introduction to SLLGEN
19 Oct. 2010, 09 Nov. 2010
Narek Nalbandyan
The Concept
Programs are just strings of characters. To process a program, we need to group these characters into meaningful units. The grouping is usually divided into two stages: scanning and parsing.
Scanning is the process of dividing the sequence of characters into words, punctuation, etc. These units are called lexemes or tokens.
Parsing is the process of organizing the sequence of tokens into hierarchical syntactic structures such as expressions, statements, and blocks.
SLLGEN is a package for generating scanners and parsers in Racket.
Scanning
The way in which a given stream of characters is to be separated into lexical items is part of the language specification, called the lexical specification. Typical pieces of a lexical specification might be:
- Any sequence of spaces and newlines is equivalent to a single space.
- A comment begins with % and continues until the end of the line.
- An identifier is a sequence of letters and digits starting with a letter.
The job of the scanner is to go through the input, apply these rules, and produce data structures (tokens) representing the lexical items.
Scanning cont…
One could write a scanner from scratch, but that would be tedious and error-prone. A better approach is to write down the lexical specification as regular expressions. The language of regular expressions is defined as follows:
R ::= Character | RR | R ∪ R | R* | ¬Character
(a single character, concatenation, union, zero or more repetitions, and any character other than the given one).
The specifications of our example would be:
whitespace = (space ∪ newline)(space ∪ newline)*
comment = % (¬newline)*
identifier = letter (letter ∪ digit)*
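To make the regular-expression approach concrete, the three specifications can be transcribed into Racket's `#px` (pregexp) syntax and driven by a tiny hand-rolled loop. This is only a sketch of the idea, not how SLLGEN works internally; the name `tokenize` and the choice to return identifiers as symbols are assumptions made for illustration.

```racket
#lang racket
;; Hypothetical hand-rolled scanner for the three example specifications.
;; Each pattern is anchored with ^ so it only matches at the current position.
(define whitespace-rx #px"^[ \n]+")               ; (space ∪ newline)(space ∪ newline)*
(define comment-rx    #px"^%[^\n]*")              ; % (¬newline)*
(define identifier-rx #px"^[a-zA-Z][a-zA-Z0-9]*") ; letter (letter ∪ digit)*

;; tokenize : string -> (listof symbol)
;; Skips whitespace and comments; returns each identifier as a symbol.
(define (tokenize str)
  (let loop ([pos 0] [acc '()])
    (if (= pos (string-length str))
        (reverse acc)
        (let* ([remaining (substring str pos)]
               ;; position just past a successful match m
               [advance (λ (m) (+ pos (string-length (car m))))])
          (cond
            [(regexp-match whitespace-rx remaining)
             => (λ (m) (loop (advance m) acc))]
            [(regexp-match comment-rx remaining)
             => (λ (m) (loop (advance m) acc))]
            [(regexp-match identifier-rx remaining)
             => (λ (m) (loop (advance m) (cons (string->symbol (car m)) acc)))]
            [else
             (error 'tokenize "unexpected character ~a at position ~a"
                    (string-ref str pos) pos)])))))

(tokenize "foo bar1 % a comment\n  baz") ; => '(foo bar1 baz)
```

Because the example specification has no rule for numbers or punctuation, any other character (for instance a leading digit) is reported as an error.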
Parsing
Parsing is the process of organizing the sequence of tokens into hierarchical syntactic structures such as expressions, statements, blocks, etc. The syntactic structure of a language is typically specified using a context-free grammar. The parser takes a sequence of tokens as input and produces an abstract syntax tree as output. The abstract syntax trees produced by an SLLGEN parser can be described by define-datatype.
Parsing cont…
Statement  ::= { Statement ; Statement }
           ::= while Expression do Statement
           ::= Identifier := Expression
Expression ::= Identifier
           ::= (Expression - Expression)
The trees produced by this grammar could be described by this data type:
  (define-datatype statement statement?
    (compound-statement (stmt1 statement?) (stmt2 statement?))
    (while-statement (test expression?) (body statement?))
    (assign-statement (lhs symbol?) (rhs expression?)))
Parsing cont…
  (define-datatype expression expression?
    (var-exp (var symbol?))
    (diff-exp (exp1 expression?) (exp2 expression?)))
And the input { x := foo; while x do x := (x - bar) } produces the output
  #(struct:compound-statement
     #(struct:assign-statement x #(struct:var-exp foo))
     #(struct:while-statement
        #(struct:var-exp x)
        #(struct:assign-statement x
           #(struct:diff-exp #(struct:var-exp x) #(struct:var-exp bar)))))
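Trees described by define-datatype are taken apart with cases. The sketch below assumes Racket's `#lang eopl`, repeats the two declarations (expression first, so that statement's field predicates refer to an already-defined name), builds the tree for the example input by hand, and collects the variables assigned in a statement. The names `assigned-vars` and `example-tree` are hypothetical.

```racket
#lang eopl
;; The two datatypes from the slides.
(define-datatype expression expression?
  (var-exp (var symbol?))
  (diff-exp (exp1 expression?) (exp2 expression?)))

(define-datatype statement statement?
  (compound-statement (stmt1 statement?) (stmt2 statement?))
  (while-statement (test expression?) (body statement?))
  (assign-statement (lhs symbol?) (rhs expression?)))

;; Hypothetical helper: the variables assigned anywhere in a statement,
;; in left-to-right order (duplicates kept). One cases clause per variant.
(define assigned-vars
  (lambda (stmt)
    (cases statement stmt
      (compound-statement (stmt1 stmt2)
        (append (assigned-vars stmt1) (assigned-vars stmt2)))
      (while-statement (test body)
        (assigned-vars body))
      (assign-statement (lhs rhs)
        (list lhs)))))

;; The tree for { x := foo; while x do x := (x - bar) }, built by hand.
(define example-tree
  (compound-statement
    (assign-statement 'x (var-exp 'foo))
    (while-statement
      (var-exp 'x)
      (assign-statement 'x (diff-exp (var-exp 'x) (var-exp 'bar))))))

(assigned-vars example-tree) ; returns the list (x x)
```

Since cases checks that every variant is handled (or that an else clause is present), adding a new production to the grammar makes such procedures fail loudly until they are updated.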
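Scanners and Parsers in SLLGEN
In SLLGEN, scanners are specified by regular expressions. Our example would be written in SLLGEN as follows:
  (define scanner-spec-a
    '((white-sp (whitespace) skip)
      (comment ("%" (arbno (not #\newline))) skip)
      (identifier (letter (arbno (or letter digit))) symbol)
      (number (digit (arbno digit)) number)))
Each entry names a token class, gives its regular expression (arbno is the * operator, or is union, not is ¬), and says what to do with a match: skip it, or turn it into a symbol or a number.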
Scanners and Parsers in SLLGEN cont…
SLLGEN also includes a language for specifying grammars. Our grammar of statements and expressions would be written in SLLGEN as:
  (define grammar-a1
    '((statement ("{" statement ";" statement "}") compound-statement)
      (statement ("while" expression "do" statement) while-statement)
      (statement (identifier ":=" expression) assign-statement)
      (expression (identifier) var-exp)
      (expression ("(" expression "-" expression ")") diff-exp)))
Each production gives its left-hand side, its right-hand side (literal strings, nonterminals, and token classes from the lexical specification), and the name of the variant it builds.
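SLLGEN builds LL(1) parsers: at each point the next token alone decides which production applies (`{`, `while`, or an identifier for a statement; an identifier or `(` for an expression). The sketch below hand-codes that decision for the grammar above in `#lang racket`. It is not SLLGEN's generated code; it assumes the input is already scanned into a list where keywords and punctuation are strings and identifiers are symbols, and it returns nested lists instead of define-datatype values. The names `parse`, `parse-statement`, `parse-expression`, and `expect` are hypothetical.

```racket
#lang racket
;; expect : string (listof token) -> (listof token)
;; Consumes one literal token or signals an error.
(define (expect lit toks)
  (if (and (pair? toks) (equal? (car toks) lit))
      (cdr toks)
      (error 'parse "expected ~s, got ~s" lit (if (pair? toks) (car toks) 'eof))))

;; Each parser returns two values: the tree and the remaining tokens.
;; The first token selects the production, as in an LL(1) parser.
(define (parse-statement toks)
  (match toks
    [(cons "{" more)                  ; Statement ::= { Statement ; Statement }
     (let*-values ([(s1 r1) (parse-statement more)]
                   [(s2 r2) (parse-statement (expect ";" r1))])
       (values (list 'compound-statement s1 s2) (expect "}" r2)))]
    [(cons "while" more)              ; Statement ::= while Expression do Statement
     (let*-values ([(e r1) (parse-expression more)]
                   [(s r2) (parse-statement (expect "do" r1))])
       (values (list 'while-statement e s) r2))]
    [(cons (? symbol? id) more)       ; Statement ::= Identifier := Expression
     (let-values ([(e r) (parse-expression (expect ":=" more))])
       (values (list 'assign-statement id e) r))]
    [_ (error 'parse "unexpected tokens in statement: ~s" toks)]))

(define (parse-expression toks)
  (match toks
    [(cons (? symbol? id) more)       ; Expression ::= Identifier
     (values (list 'var-exp id) more)]
    [(cons "(" more)                  ; Expression ::= (Expression - Expression)
     (let*-values ([(e1 r1) (parse-expression more)]
                   [(e2 r2) (parse-expression (expect "-" r1))])
       (values (list 'diff-exp e1 e2) (expect ")" r2)))]
    [_ (error 'parse "unexpected tokens in expression: ~s" toks)]))

;; Parse a whole token list, insisting that nothing is left over.
(define (parse toks)
  (let-values ([(tree remaining) (parse-statement toks)])
    (if (null? remaining)
        tree
        (error 'parse "trailing tokens: ~s" remaining))))

;; Tokens for { x := foo; while x do x := (x - bar) }
(parse '("{" x ":=" foo ";" "while" x "do" x ":=" "(" x "-" bar ")" "}"))
```

The result, `'(compound-statement (assign-statement x (var-exp foo)) (while-statement (var-exp x) (assign-statement x (diff-exp (var-exp x) (var-exp bar)))))`, has the same shape as the SLLGEN output shown earlier. A grammar in which two productions for the same nonterminal could start with the same token would not fit this one-token decision scheme.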
Scanners and Parsers in SLLGEN cont…
To turn these lexical specifications and grammars into an executable parser, SLLGEN provides several procedures:
  (sllgen:make-define-datatypes the-lexical-spec the-grammar)
generates a define-datatype declaration for each nonterminal of the grammar, with one variant per production, for use by cases.
  (define show-the-datatypes
    (lambda () (sllgen:list-define-datatypes the-lexical-spec the-grammar)))
generates the same define-datatype declarations, but lists them instead of executing them.
  (define just-scan
    (sllgen:make-string-scanner the-lexical-spec the-grammar))
takes the lexical specification and the grammar and generates a scanning procedure. This procedure may be applied to a string and produces a list of tokens.
  (define scan&parse
    (sllgen:make-string-parser the-lexical-spec the-grammar))
generates the parser. It takes a string, scans and parses it, and returns an abstract syntax tree.
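Putting the pieces together, a complete module might look like the sketch below. It assumes Racket's `#lang eopl`, which provides define-datatype, cases, and the sllgen: procedures listed above; the lexical specification and grammar are the ones from the previous slides, stored under the generic names `the-lexical-spec` and `the-grammar`. The helper `assigned-vars` and the name `example-ast` are hypothetical and only show that the generated datatypes work with cases.

```racket
#lang eopl

(define the-lexical-spec
  '((white-sp (whitespace) skip)
    (comment ("%" (arbno (not #\newline))) skip)
    (identifier (letter (arbno (or letter digit))) symbol)
    (number (digit (arbno digit)) number)))

(define the-grammar
  '((statement ("{" statement ";" statement "}") compound-statement)
    (statement ("while" expression "do" statement) while-statement)
    (statement (identifier ":=" expression) assign-statement)
    (expression (identifier) var-exp)
    (expression ("(" expression "-" expression ")") diff-exp)))

;; Defines statement, statement?, compound-statement, ..., expression, var-exp, ...
(sllgen:make-define-datatypes the-lexical-spec the-grammar)

(define show-the-datatypes
  (lambda () (sllgen:list-define-datatypes the-lexical-spec the-grammar)))

(define just-scan
  (sllgen:make-string-scanner the-lexical-spec the-grammar))

(define scan&parse
  (sllgen:make-string-parser the-lexical-spec the-grammar))

;; Hypothetical helper over the generated datatype; field names in a
;; cases clause are positional, so any variable names may be used.
(define assigned-vars
  (lambda (stmt)
    (cases statement stmt
      (compound-statement (stmt1 stmt2)
        (append (assigned-vars stmt1) (assigned-vars stmt2)))
      (while-statement (test body) (assigned-vars body))
      (assign-statement (lhs rhs) (list lhs)))))

(define example-ast
  (scan&parse "{ x := foo; while x do x := (x - bar) }"))

(assigned-vars example-ast) ; returns the list (x x)
```

Literal strings in the grammar such as "while", "do", ":=", and ";" are recognized by the generated scanner as keywords and punctuation, so they need no entry of their own in the lexical specification.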