Programming Languages, Third Edition
Chapter 6, Part I: Syntax / Regular Expressions
Objectives
• Understand the lexical structure of programming languages
• Understand regular expressions
• Read Section 6.1, pp. 204-208
Programming Languages, Third Edition
Introduction
• Syntax is the structure of a language
• Syntax rules are analogous to the grammar rules of a natural language
• John Backus and Peter Naur developed a notational system for describing these grammars, now called Backus-Naur form, or BNF
• BNF was first used to describe the syntax of Algol60
• Every modern computer scientist needs to know how to read, interpret, and apply BNF descriptions of language syntax
Flowchart for Compilation
Source Code (your program) → Compiler → Object Code (machine language)
Flowchart for Compilation: Details
Source Code (your program = char stream)
→ Scanner (lexical analysis)
→ Lexical items / Tokens
→ Parser (syntactic analysis)
→ Parse tree
→ Semantic analysis (analyzes meaning)
→ Intermediate Code
→ Optimization
→ Object Code (machine language)
Lexical Structure of Programming Languages
• Lexical structure: the structure of the tokens, or words, of a language
• Scanning phase: the phase in which a translator collects sequences of characters from the input program and forms them into tokens
• Parsing phase: the phase in which the translator processes the tokens, determining the program’s syntactic structure
Lexical Structure of Programming Languages (cont’d.)
• Tokens generally fall into several categories:
  – Reserved words (or keywords)
  – Literals or constants
  – Special symbols, such as “;”, “<=”, and “+”
  – Identifiers
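To make the four categories concrete, here is a minimal Python sketch that labels a single, already-separated lexeme. The reserved-word and special-symbol sets are hypothetical choices for a small C-like language, not taken from the text. Reserved words are checked before identifiers because a word such as `while` also fits the identifier pattern.

```python
import re

# Hypothetical token sets for a small C-like language (assumptions for illustration).
RESERVED = {"if", "else", "while", "return", "int"}
SPECIAL = {";", "<=", "+", "=", "(", ")", "{", "}"}

def classify(lexeme: str) -> str:
    """Return the token category of one lexeme that has already been split out."""
    if lexeme in RESERVED:                      # keywords first: they also look like identifiers
        return "reserved word"
    if re.fullmatch(r"[0-9]+", lexeme):         # only integer literals in this sketch
        return "literal"
    if lexeme in SPECIAL:
        return "special symbol"
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", lexeme):
        return "identifier"
    return "error"                              # not a legal token in this toy language

for lex in ["while", "count", "42", "<=", ";"]:
    print(lex, "->", classify(lex))
```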
Lexical Structure of Programming Languages (cont’d.)
• Token delimiters (or white space): formatting that affects the way tokens are recognized
• Indentation can be used to determine structure (as in Python)
• Free-format language: one in which format has no effect on program structure other than satisfying the principle of longest substring (the scanner always forms the longest possible token, so doif is one identifier rather than do followed by if)
• Fixed-format language: one in which all tokens must occur in prespecified locations on the page
• Tokens can be formally described by regular expressions
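The principle of longest substring can be sketched as a small scanner: at each position it tries every token pattern and keeps the longest match, skipping white space between tokens. The token names and patterns below are a hypothetical toy token set, and breaking ties in favor of the earlier pattern (so `if` is reserved rather than an identifier) is one common convention, assumed here.

```python
import re

# Hypothetical token set; LT is listed before LE on purpose to show that
# the longest match wins regardless of pattern order.
TOKEN_PATTERNS = [
    ("RESERVED", r"do|if|else|while"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"[0-9]+"),
    ("LT", r"<"),
    ("LE", r"<="),
    ("ASSIGN", r"="),
    ("SEMI", r";"),
    ("PLUS", r"\+"),
]
COMPILED = [(name, re.compile(p)) for name, p in TOKEN_PATTERNS]
WHITESPACE = re.compile(r"[ \t\n]+")

def scan(text):
    """Split text into (token_name, lexeme) pairs using longest-match scanning."""
    tokens, pos = [], 0
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:                                   # white space only delimits tokens
            pos = ws.end()
            continue
        best_name, best_end = None, pos
        for name, pattern in COMPILED:
            m = pattern.match(text, pos)
            # Strictly longer matches replace the current best; on a tie the
            # earlier pattern is kept.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens

print(scan("doif"))                # [('IDENT', 'doif')]
print(scan("if ifx<=10; x=x+1"))
```

In the second call, `ifx` becomes one identifier and `<=` one token, even though `if` and `<` also match at those positions.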
Scanning: Regular Expressions
• A metalanguage for describing patterns of character strings; the metasymbols are:
  – | means choice
  – * means zero or more occurrences
  – + means one or more occurrences
  – ? means zero or one occurrence (the item is optional)
  – [ ] matches one character from the list in brackets; a range such as a-z can be used
  – . (period) matches any single character
  – ( ) can be used for grouping
  – \ placed before a metasymbol makes it match that character literally (e.g., \. matches a period)
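Each metasymbol can be tried out with Python's `re` module, used here as one convenient regex engine (its syntax goes beyond the list above). `re.fullmatch` succeeds only if the whole string matches, which mirrors asking whether a string belongs to the pattern's language. One Python-specific detail: `.` does not match a newline by default.

```python
import re

# (pattern, string) pairs, one per metasymbol from the list.
examples = [
    (r"a|b", "b"),          # | : choice
    (r"ab*", "a"),          # * : zero b's is allowed
    (r"ab+", "a"),          # + : at least one b is required, so no match
    (r"colou?r", "color"),  # ? : the u is optional
    (r"[a-c]x", "bx"),      # [ ] with a range
    (r"a.c", "a-c"),        # . : any single character
    (r"(ab)+", "abab"),     # ( ) : grouping, then repeat the group
    (r"3\.14", "3.14"),     # \. : a literal period
    (r"3\.14", "3x14"),     # \. is not a wildcard, so no match
]

for pattern, s in examples:
    print(f"{pattern:10} {s:6} {bool(re.fullmatch(pattern, s))}")
```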
Regular Expressions (cont’d.)
• Most modern text editors support regular expressions in text searches
• Utilities such as lex can automatically turn a regular expression description of a language’s tokens into a scanner
Regular Expressions (cont’d.)
• Examples:
  – [aeiou]
  – [aeiouAEIOU]
  – [aeiouAEIOU]+
  – [aeiouAEIOU]*
  – (a|b)*c
  – [ab]*c
  – (ab|ba|aa)*c
  – [A-Z][a-z]*
  – [A-Z]+[a-z]
  – [A-Za-z]*
  – [0-9]+
  – [0-9]+(\.[0-9]+)
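One way to check your reading of these patterns is to test sample strings against them. The sketch below uses Python's `re.fullmatch`; the sample strings are invented for illustration and cover only some of the examples.

```python
import re

# (pattern, sample string, should it match?)
cases = [
    (r"[aeiou]", "e", True),
    (r"[aeiou]", "E", False),                 # lowercase vowels only
    (r"[aeiouAEIOU]+", "AeI", True),
    (r"[aeiouAEIOU]*", "", True),             # * also accepts the empty string
    (r"(a|b)*c", "abbac", True),
    (r"[ab]*c", "c", True),
    (r"(ab|ba|aa)*c", "abbac", True),         # ab, ba, then c
    (r"(ab|ba|aa)*c", "abac", False),         # after ab, "ac" is not a valid pair
    (r"[A-Z][a-z]*", "Hello", True),
    (r"[A-Z][a-z]*", "HEllo", False),         # only the first letter may be uppercase
    (r"[0-9]+", "2024", True),
    (r"[0-9]+(\.[0-9]+)", "3.14", True),
    (r"[0-9]+(\.[0-9]+)", "3", False),        # the fraction part is required here
]

for pattern, s, expected in cases:
    assert bool(re.fullmatch(pattern, s)) == expected, (pattern, s)
print("all cases behave as expected")
```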
Regular Expressions (cont’d.)
• Let’s try writing some:
  – Signed integers, sign not optional
  – Signed integers, sign optional
  – Signed integers, sign optional, no signed zero
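Here is one possible set of answers, checked with Python's `re.fullmatch`. Other correct regular expressions exist. For the third exercise, this sketch assumes leading zeros are allowed, so `-007` is acceptable but `-0` and `+00` are not. If leading zeros should also be rejected, `0|[+-]?[1-9][0-9]*` is a stricter alternative.

```python
import re

# One possible answer per exercise.
SIGN_REQUIRED = r"[+-][0-9]+"
SIGN_OPTIONAL = r"[+-]?[0-9]+"
# Assumption: leading zeros are allowed. The pattern accepts any unsigned digit
# string, or a sign followed by digits that contain at least one nonzero digit.
NO_SIGNED_ZERO = r"[0-9]+|[+-]0*[1-9][0-9]*"

def matches(pattern, s):
    """True if the whole string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

for s in ["42", "+42", "-7", "0", "-0", "+00", "-007"]:
    print(f"{s:5}", matches(SIGN_REQUIRED, s), matches(SIGN_OPTIONAL, s), matches(NO_SIGNED_ZERO, s))
```

Note that `re.fullmatch` applies to the whole alternation. With `re.match`, a top-level `|` could match only a prefix of the string.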
Regular Expressions (cont’d.)
• Let’s try writing some for license plates:
  – Start with VA, followed by zero or more digits
  – Start with VA, followed by one or more digits
  – Start with VA, followed by 2 digits, followed by zero or more lowercase letters
  – Start with V or A, followed by -, followed by 2-4 digits
  – Start with VA, any case, followed by 2-3 digits or 2-3 letters
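Below are possible answers for the license-plate rules, written using only the metasymbols listed earlier. Counts such as "2-4 digits" are therefore spelled out with repeated classes and `?`, not with the `{m,n}` shorthand that most regex engines also accept. The last rule assumes the 2-3 letters may be either case. The rule labels are informal names chosen for this sketch.

```python
import re

# One possible answer per rule (hypothetical rule labels).
PLATES = {
    "VA + zero or more digits": r"VA[0-9]*",
    "VA + one or more digits": r"VA[0-9]+",
    "VA + 2 digits + lowercase letters": r"VA[0-9][0-9][a-z]*",
    "V or A, '-', 2-4 digits": r"[VA]-[0-9][0-9][0-9]?[0-9]?",
    # Assumption: the 2-3 letters may be upper- or lowercase.
    "VA any case + 2-3 digits or letters": r"[Vv][Aa]([0-9][0-9][0-9]?|[A-Za-z][A-Za-z][A-Za-z]?)",
}

def check(rule, plate):
    """True if the whole plate string satisfies the named rule."""
    return re.fullmatch(PLATES[rule], plate) is not None

for plate in ["VA", "VA123", "VA12abc", "A-1234", "V-1", "va12", "vAxyz"]:
    passing = [rule for rule in PLATES if check(rule, plate)]
    print(plate, passing)
```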
Regular Expression Fun
• Regular Expression Crossword Puzzles
• http://regexcrossword.com/