Regular Expressions (RE's) – Review
• A means of describing a possibly infinite language in finite terms.
• We aim to turn a RE into a Deterministic Finite State Automaton (DFA).
• Steps:
  1: RE -> Non-Deterministic Finite State Automaton (NFA)
  2: NFA -> DFA
  3: DFA -> minDFA (minimised DFA)
• The aim is to create a mechanism to recognise valid words in a language.
• In our course this means recognising words like int, float, public etc.
• These are called Tokens.
• NB: It also classifies the Tokens!

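
As a rough illustration of what the finished DFA does, the sketch below hand-codes a tiny automaton that recognises a word and classifies it as a token. The class name TokenDfa, the state names and the token kinds (KEYWORD, ID, NUM, ERROR) are assumptions made for this example and do not come from the course. Keywords are checked against a set after the DFA accepts, which is a common shortcut rather than what a generated DFA would necessarily do.

```java
import java.util.Set;

// Hand-coded DFA sketch: classifies one word as KEYWORD, ID, NUM or ERROR.
final class TokenDfa {
    // DFA states (hypothetical names chosen for this sketch).
    enum State { START, IN_ID, IN_NUM, ERROR }

    // Keywords taken from the examples on the slide.
    static final Set<String> KEYWORDS = Set.of("int", "float", "public");

    // Transition function: one step of the DFA on character c.
    static State step(State s, char c) {
        boolean letter = Character.isLetter(c);
        boolean digit = Character.isDigit(c);
        switch (s) {
            case START:  return letter ? State.IN_ID : digit ? State.IN_NUM : State.ERROR;
            case IN_ID:  return (letter || digit) ? State.IN_ID : State.ERROR;
            case IN_NUM: return digit ? State.IN_NUM : State.ERROR;
            default:     return State.ERROR; // ERROR is a trap state
        }
    }

    // Runs the DFA over the whole word, then maps the final state to a token kind.
    static String classify(String word) {
        State s = State.START;
        for (char c : word.toCharArray()) {
            s = step(s, c);
        }
        if (s == State.IN_NUM) return "NUM";
        if (s == State.IN_ID) return KEYWORDS.contains(word) ? "KEYWORD" : "ID";
        return "ERROR"; // START (empty input) and ERROR are non-accepting
    }

    public static void main(String[] args) {
        for (String w : new String[] {"int", "float", "count", "42", "4x"}) {
            System.out.println(w + " -> " + classify(w));
        }
    }
}
```

Each accepting state carries a token kind, which is how recognising a word and classifying it happen in the same pass.
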
JLex
• Java version of Lex.
• Input: a file containing RE's and JLex macros (a .lex file).
• We run JLex over this .lex file and a .java file (the lexer) is produced.
• We then call the generated lexer to produce a Token by using next_token().
• No need to code the DFA ourselves: it is generated automatically, which saves time.

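
To make the calling pattern concrete, here is a hand-written stand-in for a generated lexer. It is not JLex output: the class MiniLexer, its token kinds and the "KIND(text)" string format are all hypothetical. As far as I know, a JLex lexer built in CUP compatibility mode exposes next_token() returning a java_cup.runtime.Symbol; the sketch returns a plain String so that it stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for a generated lexer (NOT produced by JLex).
final class MiniLexer {
    private final String input;
    private int pos = 0;

    MiniLexer(String input) {
        this.input = input;
    }

    // Returns the next token as "KIND(text)", or null at end of input.
    String next_token() {
        // Skip whitespace between tokens.
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
        if (pos >= input.length()) return null;

        int start = pos;
        char c = input.charAt(pos);
        if (Character.isDigit(c)) {
            // Longest match for [0-9]+
            while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
            return "NUM(" + input.substring(start, pos) + ")";
        }
        if (Character.isLetter(c)) {
            // Longest match for a letter followed by letters or digits
            while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) pos++;
            return "ID(" + input.substring(start, pos) + ")";
        }
        pos++; // any other single character becomes its own token
        return "SYM(" + c + ")";
    }

    // Typical driver loop: keep asking for tokens until the input is exhausted.
    static List<String> tokenize(String text) {
        MiniLexer lexer = new MiniLexer(text);
        List<String> out = new ArrayList<>();
        for (String t = lexer.next_token(); t != null; t = lexer.next_token()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x1 = 3+78"));
    }
}
```

With JLex, the body of next_token() is what gets generated from the .lex file; only a driver loop like tokenize() would be written by hand.
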
Limitations of RE's
• Say we define the following RE's:
  digits = [0-9]+
  sum = (digits "+")* digits
• With these we can define sums like 3+78+9 etc.
• If we have:
  digits = [0-9]+
  sum = expr "+" expr
  expr = "(" sum ")" | digits
• then we can define (1+(5+8)) etc.
• But sum and expr are now defined in terms of each other, so they cannot be expanded into a plain RE.
• It is impossible for a RE to recognise balanced parentheses.
• A machine with only N states cannot keep track of more than N levels of parenthesis nesting.
• Therefore we need a new notation to represent the language above.
• We move on to Context Free Grammars.

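
The recursive definitions of sum and expr can be recognised with a few mutually recursive methods, sketched below. The class name SumRecogniser and its method names are assumptions for this example. Nesting depth is tracked by the Java call stack, which can grow with the input, and that is exactly the resource a fixed set of DFA states does not have.

```java
// Recursive-descent recogniser for the recursive definitions on the slide:
//   digits = [0-9]+
//   sum    = expr "+" expr
//   expr   = "(" sum ")" | digits
final class SumRecogniser {
    private final String s;
    private int pos = 0;

    private SumRecogniser(String s) {
        this.s = s;
    }

    // Consumes character c if it is next in the input.
    private boolean eat(char c) {
        if (pos < s.length() && s.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    // digits = [0-9]+ (at least one digit required)
    private boolean digits() {
        int start = pos;
        while (pos < s.length() && s.charAt(pos) >= '0' && s.charAt(pos) <= '9') pos++;
        return pos > start;
    }

    // expr = "(" sum ")" | digits
    private boolean expr() {
        if (eat('(')) return sum() && eat(')');
        return digits();
    }

    // sum = expr "+" expr (exactly one "+" at this level, as written on the slide)
    private boolean sum() {
        return expr() && eat('+') && expr();
    }

    static boolean isExpr(String text) {
        SumRecogniser r = new SumRecogniser(text);
        return r.expr() && r.pos == text.length();
    }

    static boolean isSum(String text) {
        SumRecogniser r = new SumRecogniser(text);
        return r.sum() && r.pos == text.length();
    }

    public static void main(String[] args) {
        System.out.println(isExpr("(1+(5+8))")); // balanced
        System.out.println(isExpr("(1+(5+8)"));  // missing ")"
        System.out.println(isSum("3+78"));
    }
}
```

Every "(" triggers a new call to sum(), and every ")" is matched when that call returns, so unbalanced input is rejected at any depth.
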
Context Free Grammars (CFG's)
• RE's define lexical structure declaratively.
• Similarly, CFG's define syntactic structure declaratively.
• Definitions:
  • A language is a set of strings.
  • Each string is a finite sequence of symbols.
  • Symbols come from a finite alphabet.
• A CFG describes a language and is formed of productions.
• E.g. symbol -> sym1 sym2 sym3 ... sym(N)
• Symbols are either:
  1: Terminal: corresponds to a Token.
  2: Non-Terminal: a variable that denotes a set of strings.
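
One way to picture productions and the two kinds of symbol is as plain data. The sketch below stores the sum/expr grammar from the previous slide as a list of productions. The Grammar and Production classes are hypothetical, and the sketch assumes that digits is delivered by the lexer as a single token. Non-terminals are taken to be the symbols that appear on some left-hand side, and every other symbol is treated as a terminal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// A CFG stored as a list of productions (hypothetical classes for illustration).
final class Grammar {
    // One production: lhs -> rhs[0] rhs[1] ... rhs[N-1]
    static final class Production {
        final String lhs;
        final List<String> rhs;

        Production(String lhs, String... rhs) {
            this.lhs = lhs;
            this.rhs = Arrays.asList(rhs);
        }

        @Override
        public String toString() {
            return lhs + " -> " + String.join(" ", rhs);
        }
    }

    final List<Production> productions = new ArrayList<>();

    Grammar add(String lhs, String... rhs) {
        productions.add(new Production(lhs, rhs));
        return this;
    }

    // Non-terminals: every symbol that appears on the left of a production.
    Set<String> nonTerminals() {
        Set<String> out = new LinkedHashSet<>();
        for (Production p : productions) out.add(p.lhs);
        return out;
    }

    // Terminals: right-hand-side symbols that never appear on a left-hand side.
    Set<String> terminals() {
        Set<String> nts = nonTerminals();
        Set<String> out = new LinkedHashSet<>();
        for (Production p : productions) {
            for (String sym : p.rhs) {
                if (!nts.contains(sym)) out.add(sym);
            }
        }
        return out;
    }

    // The grammar from the "Limitations of RE's" slide; "digits" is assumed to be a token.
    static Grammar sumGrammar() {
        return new Grammar()
            .add("sum", "expr", "+", "expr")
            .add("expr", "(", "sum", ")")
            .add("expr", "digits");
    }

    public static void main(String[] args) {
        Grammar g = sumGrammar();
        for (Production p : g.productions) System.out.println(p);
        System.out.println("Non-terminals: " + g.nonTerminals());
        System.out.println("Terminals:     " + g.terminals());
    }
}
```

The alternative expr = "(" sum ")" | digits is stored as two separate productions with the same left-hand side, which is the usual way of writing "|" in production form.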