Understanding the Computation Involved in Scanner Implementation

Programming Language Syntax 3 http://flic.kr/p/zCyMp

Why separate scanner and parser? ANTLR generates for you …
• The parser is much more computationally intensive than the scanner
• The scanner considerably reduces the number of items that the parser must inspect
But what computation is involved in a scanner? That’s today’s topic…

Consider these “calculator language” tokens

Complete this ad hoc scanner Tokens

Here’s a solution Tokens
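The slide's token table and scanner code are not reproduced in this transcript. As a rough illustration of what an ad hoc scanner looks like, here is a minimal Python sketch. The token set (assign, plus, minus, times, div, lparen, rparen, id, number) and the function name `scan` are assumptions chosen for illustration, not the slide's own solution.

```python
import string

DIGITS = string.digits
LETTERS = string.ascii_letters

# Assumed single-character tokens for a calculator language.
SINGLE = {"+": "plus", "-": "minus", "*": "times", "/": "div",
          "(": "lparen", ")": "rparen"}


def scan(text):
    """Return a list of (kind, lexeme) pairs, deciding by hand on each character."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():
            i += 1                      # skip white space
        elif c in SINGLE:
            tokens.append((SINGLE[c], c))
            i += 1
        elif c == ":":
            # ":" is only valid as the first half of ":="
            if i + 1 < n and text[i + 1] == "=":
                tokens.append(("assign", ":="))
                i += 2
            else:
                raise ValueError(f"expected '=' after ':' at position {i}")
        elif c in LETTERS:
            start = i
            while i < n and (text[i] in LETTERS or text[i] in DIGITS):
                i += 1
            tokens.append(("id", text[start:i]))
        elif c in DIGITS or c == ".":
            start = i
            while i < n and text[i] in DIGITS:
                i += 1
            if i < n and text[i] == ".":
                i += 1
                while i < n and text[i] in DIGITS:
                    i += 1
            lexeme = text[start:i]
            if lexeme == ".":
                raise ValueError(f"lone '.' at position {start}")
            tokens.append(("number", lexeme))
        else:
            raise ValueError(f"unexpected character {c!r} at position {i}")
    return tokens
```

For example, `scan("x := 3.5 * (y + .25)")` yields an id, an assign, a number, and so on. Each branch hard-codes one token's shape, which is what makes this style fast but tedious to change.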

Ad hoc scanner implementations are common for production languages
• Fast, compact code
But…
• Finite automata can be generated automatically from a set of regular expressions
  • Good for languages still under development
  • Easy to regenerate the scanner

Calculator language scanner automaton
• Note that this is a deterministic finite automaton (DFA)
• Only ever one possible transition out of a state for a given input character

Three steps to generate a scanner
• Generate a nondeterministic finite automaton (NFA)
  • Multiple transitions out of a state for the same character
  • Epsilon transitions (ε)
• Convert the NFA to a DFA
  • No need to search all paths in a DFA
• Optimize by minimizing the number of states in the DFA

NFA building blocks
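The building-block diagrams are not reproduced in this transcript. A minimal Python sketch of the usual blocks (single symbol, concatenation, alternation, and Kleene closure, each glued together with ε transitions) might look like the following. The `NFA` class and all function names are hypothetical, and `None` stands in for ε.

```python
from dataclasses import dataclass
from itertools import count

EPS = None          # label used for epsilon transitions (an assumption of this sketch)
_ids = count()      # source of fresh state numbers


@dataclass
class NFA:
    start: int
    accept: int
    edges: dict     # state -> list of (symbol or EPS, target state)


def _merge(*edge_maps):
    """Combine several edge maps into a new one without mutating the inputs."""
    merged = {}
    for edge_map in edge_maps:
        for state, out in edge_map.items():
            merged.setdefault(state, []).extend(out)
    return merged


def literal(symbol):
    s, a = next(_ids), next(_ids)
    return NFA(s, a, {s: [(symbol, a)]})


def concat(first, second):
    # first's accept state flows into second's start state via epsilon
    edges = _merge(first.edges, second.edges,
                   {first.accept: [(EPS, second.start)]})
    return NFA(first.start, second.accept, edges)


def alternate(left, right):
    s, a = next(_ids), next(_ids)
    edges = _merge(left.edges, right.edges,
                   {s: [(EPS, left.start), (EPS, right.start)],
                    left.accept: [(EPS, a)],
                    right.accept: [(EPS, a)]})
    return NFA(s, a, edges)


def star(inner):
    s, a = next(_ids), next(_ids)
    edges = _merge(inner.edges,
                   {s: [(EPS, inner.start), (EPS, a)],
                    inner.accept: [(EPS, inner.start), (EPS, a)]})
    return NFA(s, a, edges)


def eps_closure(nfa, states):
    """All states reachable from `states` using only epsilon transitions."""
    seen, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for symbol, target in nfa.edges.get(state, []):
            if symbol is EPS and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def accepts(nfa, text):
    """Simulate the NFA by tracking every state it might be in."""
    current = eps_closure(nfa, {nfa.start})
    for ch in text:
        moved = {t for s in current
                 for symbol, t in nfa.edges.get(s, []) if symbol == ch}
        current = eps_closure(nfa, moved)
    return nfa.accept in current
```

With the character `d` standing in for any digit, as on the slides, `alternate(concat(literal("."), literal("d")), concat(literal("d"), literal(".")))` builds an NFA for `.d|d.`. Each sub-expression needs its own `literal(...)` call so that its states are fresh.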

Construct an NFA for this regex
Solve in this order:
1. .
2. d
3. d*
4. .d
5. d.
6. .d|d.

A solution…

NFA to DFA conversion
• “Set of subsets” construction
• State of DFA after reading given input represents set of states NFA might have reached
• Example: the DFA start state is {1, 2, 4}; on input d it moves to {2, 3, 4}
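The construction can be sketched in Python as below. The NFA encoding (a dict from state to a list of `(symbol, target)` pairs, with `None` for ε) and the function names are assumptions of this sketch. The sample NFA is a hypothetical fragment for `d*`, numbered so that it reproduces the state sets {1, 2, 4} and {2, 3, 4} from the example; the slide's full NFA is not available in this transcript.

```python
def eps_closure(edges, states):
    """All NFA states reachable from `states` via epsilon (None) transitions."""
    seen, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for symbol, target in edges.get(state, []):
            if symbol is None and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def subset_construction(edges, start, accepting, alphabet):
    """Return (dfa_start, dfa_transitions, dfa_accepting).

    Each DFA state is a frozenset of NFA states.
    """
    dfa_start = frozenset(eps_closure(edges, {start}))
    dfa = {}
    worklist = [dfa_start]
    while worklist:
        current = worklist.pop()
        if current in dfa:
            continue                    # already expanded
        dfa[current] = {}
        for symbol in alphabet:
            moved = {t for s in current
                     for label, t in edges.get(s, []) if label == symbol}
            if not moved:
                continue                # no transition on this symbol
            target = frozenset(eps_closure(edges, moved))
            dfa[current][symbol] = target
            worklist.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accepting = {subset for subset in dfa if subset & accepting}
    return dfa_start, dfa, dfa_accepting


# Hypothetical NFA fragment for d*: state 4 is assumed to be accepting.
EDGES = {
    1: [(None, 2), (None, 4)],
    2: [("d", 3)],
    3: [(None, 2), (None, 4)],
}

start, dfa, accepting = subset_construction(EDGES, 1, {4}, "d")
```

Here `start` is `{1, 2, 4}` and `dfa[start]["d"]` is `{2, 3, 4}`, which then loops to itself on further `d` input.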

Given this NFA…

Fill in the blanks

The solution…

Minimizing the DFA
Steps:
• Merge all non-final states into a single state and merge all final states into a single state (expect ambiguity: a merged state may now lead to different states on the same input)
• For each ambiguous input, split the merged state back into groups of its original states until that input is no longer ambiguous

Minimizing the DFA
Step 1: Merge states into non-final and final

Minimizing the DFA
Step 2 (repeated): Disambiguate by splitting
First, we disambiguate d
Now, how to disambiguate “.”?

Minimizing the DFA
Step 2 (repeated): Disambiguate by splitting
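The merge-then-split procedure can be sketched as partition refinement. This is a minimal Python sketch under assumed names (`minimize`, `delta`) and an assumed DFA encoding (`delta[state][symbol]` gives the next state, with missing entries meaning no transition). It splits on all input symbols in each pass rather than one ambiguous input at a time, which reaches the same final grouping.

```python
def minimize(states, alphabet, delta, accepting):
    """Return a list of sets; each set is one state of the minimized DFA."""
    states, accepting = set(states), set(accepting)

    # Step 1: one group for non-final states, one for final states.
    partition = [g for g in (states - accepting, states & accepting) if g]

    # Step 2 (repeated): split any group whose members disagree on where
    # some input symbol leads.
    while True:
        block = {s: i for i, group in enumerate(partition) for s in group}
        refined = []
        for group in partition:
            buckets = {}
            for s in group:
                # For each symbol, record which group the transition lands in
                # (None when there is no transition).
                signature = tuple(block.get(delta.get(s, {}).get(a))
                                  for a in alphabet)
                buckets.setdefault(signature, set()).add(s)
            refined.extend(buckets.values())
        if len(refined) == len(partition):
            return refined              # nothing was split: done
        partition = refined


# Hypothetical DFA for "one or more d": B and C behave identically.
DELTA = {"A": {"d": "B"}, "B": {"d": "C"}, "C": {"d": "B"}}
groups = minimize({"A", "B", "C"}, ["d", "."], DELTA, {"B", "C"})
```

In this example `groups` contains `{"A"}` and `{"B", "C"}`, so the two final states collapse into one.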

How to implement a scanner based on an automaton?
Two common approaches:
• Nested case statements
  • Tend to be ad hoc
• Table and driver
  • Tend to be generated

Nested-case approach
• Outer cases handle states
• Inner cases handle transitions (set a new state)
• Note: Look-ahead may be necessary to accept the longest possible token
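A minimal Python sketch of the nested-case shape follows, with `if`/`elif` chains standing in for case statements. The state names, the small token set (numbers and `+`), and the function name `next_token` are all assumptions for illustration. The scanner remembers the last accepting position so it can return the longest token seen once no transition applies.

```python
import string

DIGITS = string.digits

# Accepting states and the token kind each one yields (assumed names).
ACCEPTING = {"int": "number", "frac": "number", "plus": "plus"}


def next_token(text, pos):
    """Return (kind, lexeme, new_pos) for the longest token starting at pos."""
    state = "start"
    last_accept = None                  # (kind, end position) of longest match so far
    i = pos
    while i < len(text):
        c = text[i]
        # Outer cases: current state. Inner cases: input character.
        if state == "start":
            if c in DIGITS:
                state = "int"
            elif c == ".":
                state = "dot"
            elif c == "+":
                state = "plus"
            else:
                break
        elif state == "int":
            if c in DIGITS:
                state = "int"
            elif c == ".":
                state = "frac"
            else:
                break
        elif state == "dot":
            if c in DIGITS:
                state = "frac"
            else:
                break
        elif state == "frac":
            if c in DIGITS:
                state = "frac"
            else:
                break
        else:                           # "plus" has no outgoing transitions
            break
        i += 1
        if state in ACCEPTING:
            last_accept = (ACCEPTING[state], i)
    if last_accept is None:
        raise ValueError(f"no token at position {pos}")
    kind, end = last_accept
    return kind, text[pos:end], end
```

Calling `next_token("3.14+2", 0)` consumes `3.14` and stops when it peeks at `+`, which is the look-ahead mentioned above; the caller resumes scanning from the returned position.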

Table and driver approach
Good news! We’ll let ANTLR handle the implementation!
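For a sense of what the table-and-driver style involves, here is a minimal Python sketch. It is not a depiction of ANTLR's output; the table layout, the character classes, and the names `TABLE`, `char_class`, and `drive` are assumptions for illustration. The automaton lives entirely in data, and the driver loop stays the same no matter which tokens the table describes.

```python
import string

DIGITS = string.digits


def char_class(c):
    """Map a character to a column of the transition table."""
    if c in DIGITS:
        return "digit"
    if c == ".":
        return "dot"
    return "other"


# Transition table: state -> character class -> next state.
# Missing entries mean "no transition".
TABLE = {
    "start": {"digit": "int", "dot": "dot"},
    "int":   {"digit": "int", "dot": "frac"},
    "dot":   {"digit": "frac"},
    "frac":  {"digit": "frac"},
}

# Accepting states and the token kind each one yields.
ACCEPTING = {"int": "number", "frac": "number"}


def drive(text, pos, table=TABLE, accepting=ACCEPTING, start="start"):
    """Generic driver: return (kind, lexeme, new_pos) for the longest match."""
    state = start
    last_accept = None
    i = pos
    while i < len(text):
        nxt = table.get(state, {}).get(char_class(text[i]))
        if nxt is None:
            break                       # no transition: stop and use longest match
        state = nxt
        i += 1
        if state in accepting:
            last_accept = (accepting[state], i)
    if last_accept is None:
        raise ValueError(f"no token at position {pos}")
    kind, end = last_accept
    return kind, text[pos:end], end
```

Changing the language means regenerating `TABLE` and `ACCEPTING` only, which is why this form suits automatically generated scanners.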

What’s next? • Homework 1 due next class!
