Explore the concepts of languages, formal grammars, and generative grammars in software engineering. Learn about formal language specifications, production operations, and Chomsky's generative grammar components.
LANGUAGE AND GRAMMARS COMP 319
Contents • Languages and Grammars • Formal languages • Formal grammars • Generative grammars • Analytic grammars • Context-free grammars • LL parsers • LR parsers • Rewrite systems • L-systems
Software Engineering Foundation • Software engineering may be summarised as the construction of programs to solve problems. It has three parts: • Construction/engineering and methods • Problems and problem solving • Programs
Languages and grammar • Languages are spoken and written (linguistics) • To be effective they must be based on a shared set of rules: a grammar • Grammars are introspective: they are based on and couched in language • Natural language grammars are constantly shifting and locally negotiated • A grammar is a formal language in which the rules of discourse are both the subject and the goal of the discussion
Formal language concepts • The concept arises from the need to define the rules of a language precisely • Formally, a formal language is a collection of words (strings) composed of smaller, atomic units (the symbols of an alphabet) • Issues of concern are • the number and nature of the atomic units, • the precision level required, • the completeness of the formalism
Examples of formal languages • The set of all words over {a, b} • The set {a^n : n is a prime number} • The set of syntactically correct programs in a given computer programming language • The set of inputs upon which a certain Turing machine halts
Formal language specification • A formal language can be specified in many ways, e.g. • the strings produced by a formal grammar • the strings matched by a regular expression • the strings accepted by an automaton • logic and other formalisms
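To make the point that one language can have several specifications concrete, here is a minimal Python sketch. The language chosen (words over {a, b} that end in "ab") is an illustrative assumption, not taken from the slides, as are the names `PATTERN`, `DFA`, `regex_accepts`, `dfa_accepts` and `words_up_to`.

```python
import re
from itertools import product

# Specification 1: a regular expression for "words over {a, b} ending in ab".
PATTERN = re.compile(r"(a|b)*ab")

# Specification 2: a deterministic finite automaton for the same language.
# State 0: start / no progress, state 1: last symbol was 'a', state 2: last two were "ab".
DFA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
ACCEPTING = {2}


def regex_accepts(word):
    """True if the whole word matches the regular expression."""
    return PATTERN.fullmatch(word) is not None


def dfa_accepts(word):
    """True if the automaton ends in an accepting state after reading the word."""
    state = 0
    for symbol in word:
        state = DFA[(state, symbol)]
    return state in ACCEPTING


def words_up_to(max_len):
    """All words over {a, b} of length 0..max_len."""
    return ["".join(p) for n in range(max_len + 1) for p in product("ab", repeat=n)]
```

Checking `regex_accepts(w) == dfa_accepts(w)` for every `w` in `words_up_to(6)` is a quick sanity test that the two specifications describe the same language, at least for short words.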
Language Production Operations • New languages can be produced from existing ones by: • Concatenation of strings drawn from the two languages • Intersection (strings common to both languages) or union (strings in either language) • Complement of one language • Right quotient of one by the other • Kleene star operation on one language • Reverse of a language • Shuffle combination of languages
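Several of these operations can be sketched in Python for finite languages represented as sets of strings. This is only an illustration under that assumption: the function names are hypothetical, Kleene star is truncated at a maximum word length because the full result is infinite, and complement and shuffle are omitted. Union and intersection are the built-in set operators `|` and `&`.

```python
def concat(l1, l2):
    """Concatenation: every word of l1 followed by every word of l2."""
    return {u + v for u in l1 for v in l2}


def reverse(lang):
    """Reverse: every word written backwards."""
    return {w[::-1] for w in lang}


def right_quotient(l1, l2):
    """Right quotient l1 / l2: all u such that u + v is in l1 for some v in l2."""
    return {w[:len(w) - len(v)] for w in l1 for v in l2 if w.endswith(v)}


def kleene_star(lang, max_len):
    """Kleene star, truncated to words of at most max_len symbols."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest words by one more word of lang, keeping only new results.
        frontier = {u + v for u in frontier for v in lang if len(u + v) <= max_len} - result
        result |= frontier
    return result
```

For example, `right_quotient({"ab", "abb", "ba"}, {"b"})` strips a trailing "b" where possible and gives `{"a", "ab"}`.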
Formal Grammars • Noam Chomsky • Linguist, philosopher at MIT • 1956, papers on information and grammar • Types of formal grammar • Generative grammar • Analytic grammar
Generative formal grammars • Generative grammars: A set of rules by which all possible strings in the language being described can be generated by successively rewriting strings, starting from a designated start symbol. In effect it formalises an algorithm that generates strings in the language.
Analytic formal grammars • Analytic grammars: A set of rules that takes an arbitrary string as input and successively reduces or analyses that string to yield a final boolean "yes/no" that indicates whether the string is a member of the language described by the grammar. In effect it is a parser or recogniser for a language.
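As a small illustration of the "reduce the input to a yes/no answer" idea, the sketch below recognises balanced parentheses by repeatedly deleting the substring "()". The language and the function name `is_balanced` are assumptions chosen for the example; they do not come from the slides.

```python
def is_balanced(text):
    """Recogniser for balanced parentheses: reduce "()" to nothing until stuck."""
    if any(c not in "()" for c in text):
        return False  # symbol outside the alphabet
    while "()" in text:
        text = text.replace("()", "")  # one round of reductions
    # The string belongs to the language exactly when everything was reduced away.
    return text == ""
```

Here `is_balanced("(()())")` answers yes, while `is_balanced("(()")` answers no because an unmatched "(" is left over.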
Generative grammar components • Chomsky's definition, essentially for linguistics but well suited to formal computing grammars, consists of the following components: • A finite set N of nonterminal symbols • A finite set Σ of terminal symbols, disjoint from N • A finite set P of production rules, where a rule is of the form: string in (Σ ∪ N)* → string in (Σ ∪ N)*, and the left-hand string contains at least one nonterminal • A symbol S in N that is identified as the start symbol
Generative grammar definition • The language of a formal grammar G = (N, Σ, P, S) • Is denoted by L(G) • And is defined as all those strings over Σ that can be generated by starting from the symbol S and then applying the rules in P until no more nonterminal symbols are present
A generative formal grammar • Given the terminals {a, b} and the nonterminals {S, A, B}, where S is the special start symbol • Productions: • S → ABS • S → ε (the empty string) • BA → AB • BS → b • Bb → bb • Ab → ab • Aa → aa • Defines all the words of the form a^n b^n (i.e. n copies of a followed by n copies of b)
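The grammar above can be explored mechanically with a breadth-first search over sentential forms. The Python sketch below is one possible way to do that, not a general algorithm for unrestricted grammars: it relies on the observation that, for this particular grammar, no intermediate string needs to be more than one symbol longer than the final word. The names `PRODUCTIONS` and `generate` are illustrative.

```python
from collections import deque

# The seven productions from the slide; "" stands for the empty string.
PRODUCTIONS = [
    ("S", "ABS"), ("S", ""), ("BA", "AB"), ("BS", "b"),
    ("Bb", "bb"), ("Ab", "ab"), ("Aa", "aa"),
]


def generate(max_len):
    """Return every terminal word of length <= max_len derivable from S."""
    limit = max_len + 1  # assumption: forms are at most one symbol longer than the word
    seen = {"S"}
    queue = deque(["S"])
    words = set()
    while queue:
        form = queue.popleft()
        if all(c in "ab" for c in form):  # no nonterminals left
            if len(form) <= max_len:
                words.add(form)
            continue
        for lhs, rhs in PRODUCTIONS:
            start = form.find(lhs)
            while start != -1:  # try the rule at every position where it matches
                new = form[:start] + rhs + form[start + len(lhs):]
                if len(new) <= limit and new not in seen:
                    seen.add(new)
                    queue.append(new)
                start = form.find(lhs, start + 1)
    return sorted(words, key=len)
```

Calling `generate(6)` yields the empty word, "ab", "aabb" and "aaabbb", which agrees with the a^n b^n description.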
Context Free Grammars • Theoretical basis of the syntax of most programming languages. • Easy to generate a parser using a compiler-compiler (parser generator). • Two main approaches exist: top-down parsing e.g. LL parsers, and bottom-up parsing e.g. LR parsers.
LL parser • Table-based, top-down parser for a subset of the context-free grammars (LL grammars). • Parsing is Left to right, and constructs a Leftmost derivation of the sentence. • LL(k) parsers use k tokens of look-ahead to parse a sentence of an LL(k) grammar. • LL(1) grammars are popular and fast to parse because only the next token is considered in parsing decisions.
Table-based LL parsing • Consider the grammar • (1) S → F • (2) S → ( S + F ) • (3) F → 1 • This has the parsing table: row S: input 1 → rule 1, input ( → rule 2; row F: input 1 → rule 3; every other entry is an error • e.g. input 1 with S on top of the stack implies rule 1 • i.e. S on the stack is replaced with F • and the rule number 1 is written to the output • Top of stack and next input the same terminal = delete both • Top of stack and next input different terminals = error • Example input • ( 1 + 1 ) $ • Architecture: the parser reads from an input buffer, works on a stack (initially S on top of $), consults the parsing table, and writes to an output
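The procedure described on this slide can be sketched as a short table-driven parser in Python. The table entries below are derived by hand from the three grammar rules (numbered 1 to 3 in the order they are listed), and the names `RULES`, `TABLE` and `ll1_parse` are illustrative assumptions.

```python
# Rule number -> (left-hand side, right-hand side symbols)
RULES = {
    1: ("S", ["F"]),
    2: ("S", ["(", "S", "+", "F", ")"]),
    3: ("F", ["1"]),
}
# (nonterminal on top of stack, next input token) -> rule number; missing entry = error
TABLE = {("S", "1"): 1, ("S", "("): 2, ("F", "1"): 3}
NONTERMINALS = {"S", "F"}


def ll1_parse(text):
    """Return the list of rule numbers applied, or raise SyntaxError."""
    tokens = list(text) + ["$"]  # one character per token, "$" marks end of input
    stack = ["$", "S"]           # top of the stack is the end of the list
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            rule = TABLE.get((top, look))
            if rule is None:
                raise SyntaxError(f"no rule for {top!r} on input {look!r}")
            output.append(rule)
            stack.extend(reversed(RULES[rule][1]))  # leftmost symbol ends up on top
        elif top == look:
            pos += 1  # stack and input agree: delete both
        else:
            raise SyntaxError(f"expected {top!r} but found {look!r}")
    return output
```

For the example input, `ll1_parse("(1+1)")` applies rules 2, 1, 3, 3 in that order, which is a leftmost derivation of the sentence.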
LR parser • Bottom-up parser for context-free grammars, used by many programming language compilers • Parsing is Left to right, and produces a Rightmost derivation (in reverse). • LR(k) parsers use k tokens of look-ahead. • LR(1) is the most common type, used for many programming languages. • LR parsers are almost always generated by a parser generator, which constructs the parsing table; variants include Simple LR (SLR), Look-Ahead LR (LALR, e.g. Yacc) and Canonical LR.
LR parser example • Rules • (1) E → E * B • (2) E → E + B • (3) E → B • (4) B → 0 • (5) B → 1
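A shift-reduce parser for these five rules can be sketched in Python as follows. The ACTION and GOTO tables are not given in the slide text; the ones below were built by hand with the standard LR(0) item-set construction for this grammar, so treat the state numbers as an assumption. The names `RULES`, `ACTION`, `GOTO` and `lr_parse` are illustrative.

```python
# Rule number -> (left-hand side, number of symbols on the right-hand side)
RULES = {1: ("E", 3), 2: ("E", 3), 3: ("E", 1), 4: ("B", 1), 5: ("B", 1)}

ACTION = {}
# States that hold a completed rule reduce on every token (LR(0) behaviour).
for state, rule in {1: 4, 2: 5, 4: 3, 7: 1, 8: 2}.items():
    for token in "*+01$":
        ACTION[(state, token)] = ("reduce", rule)
# States that expect an operand shift a digit.
for state in (0, 5, 6):
    ACTION[(state, "0")] = ("shift", 1)
    ACTION[(state, "1")] = ("shift", 2)
# State 3 has seen a complete E: shift an operator or accept at end of input.
ACTION[(3, "*")] = ("shift", 5)
ACTION[(3, "+")] = ("shift", 6)
ACTION[(3, "$")] = ("accept", None)

GOTO = {(0, "E"): 3, (0, "B"): 4, (5, "B"): 7, (6, "B"): 8}


def lr_parse(text):
    """Return the rule numbers in the order they are reduced, or raise SyntaxError."""
    tokens = list(text) + ["$"]
    states = [0]  # stack of parser states
    pos = 0
    output = []
    while True:
        action = ACTION.get((states[-1], tokens[pos]))
        if action is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        kind, arg = action
        if kind == "shift":
            states.append(arg)
            pos += 1
        elif kind == "reduce":
            head, length = RULES[arg]
            del states[-length:]  # pop one state per right-hand-side symbol
            states.append(GOTO[(states[-1], head)])
            output.append(arg)
        else:  # accept
            return output
```

For the input "1+1" the reductions are 5, 3, 5, 2; read backwards, that sequence is a rightmost derivation of the sentence.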
Re-writing • Rewriting is a general process involving strings and alphabets, classified according to what is rewritten, e.g. strings, terms, graphs, etc. • A rewrite system is a set of rewrite rules (oriented equations) that characterises a system of computation; it provides one method of automating theorem proving. • Examples of practical systems that use this approach include the software Mathematica.
Re-writing logic example • !!A = A // eliminate double negation • !(A AND B) = !A OR !B // De Morgan's law
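The two rules above can be applied mechanically to expression trees. The Python sketch below is a minimal illustration, assuming formulas are encoded as nested tuples such as `("not", ("and", "A", "B"))`; the encoding and the names `step` and `normalise` are assumptions made for the example.

```python
def step(term):
    """Apply the two rewrite rules once, top-down, wherever they match."""
    if isinstance(term, str):
        return term  # a variable such as "A"
    if term[0] == "not" and not isinstance(term[1], str):
        inner = term[1]
        if inner[0] == "not":
            # !!A = A
            return step(inner[1])
        if inner[0] == "and":
            # !(A AND B) = !A OR !B
            return ("or", step(("not", inner[1])), step(("not", inner[2])))
    # No rule matches at the root: rewrite the sub-terms.
    return (term[0],) + tuple(step(arg) for arg in term[1:])


def normalise(term):
    """Rewrite until no rule applies any more (a normal form)."""
    while True:
        rewritten = step(term)
        if rewritten == term:
            return term
        term = rewritten
```

For instance, `normalise(("not", ("and", ("not", "A"), "B")))` first applies De Morgan's law and then removes the double negation, giving `("or", "A", ("not", "B"))`.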
L-systems • Named after Aristid Lindenmayer (1925-1989), a Hungarian theoretical biologist and botanist who worked at the University of Utrecht (Netherlands) • They are a formal grammar used to model the growth and morphology of plants and animals • In plant and animal modelling a special form, the parametric L-system, is used; it is based on rewriting. • Because of their recursive, parallel, and unbounded nature they lead to concepts of self-similarity, fractional dimension and fractal-like forms.
L-system structure • The basic system has the same structure as a formal grammar: G = (V, S, Ω, P) • where G is the grammar being defined • V (the alphabet) is a set of symbols that can be replaced (variables) • S is a set of symbols that remain fixed (constants; labelled C in the examples) • Ω (start, axiom or initiator) is a string of symbols from V, the initial state • P is a set of rules or productions defining the ways variables can be replaced by combinations of constants and other variables • Each rule consists of a LHS (predecessor) and a RHS (successor) • Unlike a generative grammar, the rules are applied to every symbol in parallel at each step
Example 1: Fibonacci numbers • N=0 A • N=1 → B • N=2 → AB • N=3 → BAB • N=4 → ABBAB • N=5 → BABABBAB • N=6 → ABBABBABABBAB • N=7 → BABABBAB... • Counting lengths we get 1, 1, 2, 3, 5, 8, 13, 21, ...: the Fibonacci numbers • V: A B • C: none • Ω: A • P: p1: A → B • p2: B → AB
Example 2: Algal growth • N=0 A • N=1 → AB • N=2 → ABA • N=3 → ABAAB • N=4 → ABAABABA • V: A B • C: none • Ω: A • P: p1: A → AB • p2: B → A
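Both examples can be reproduced with a few lines of Python. The sketch below assumes single-character symbols and represents the production set P as a dictionary; the function name `derive` and the constants `FIBONACCI_RULES` and `ALGAE_RULES` are illustrative.

```python
def derive(axiom, rules, steps):
    """Apply the L-system rules to every symbol in parallel, `steps` times."""
    word = axiom
    for _ in range(steps):
        # Symbols without a rule (constants) are copied unchanged.
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word


FIBONACCI_RULES = {"A": "B", "B": "AB"}  # Example 1
ALGAE_RULES = {"A": "AB", "B": "A"}      # Example 2
```

With these definitions, `derive("A", FIBONACCI_RULES, 4)` gives "ABBAB" and `derive("A", ALGAE_RULES, 3)` gives "ABAAB". The lengths of successive Fibonacci-system words follow 1, 1, 2, 3, 5, 8, 13, 21.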
Example 3: Koch curve (right-angle variant) • N=0 F • N=1 → F+F-F-F+F • N=2 → F+F-F-F+F+F... • N=3 etc • V: F • C: + - • Ω: F • P: p1: F → F+F-F-F+F • Here F means draw forward, + means turn left 90° and - means turn right 90° • COMP319 Software Engineering II
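To show how such a string becomes a shape, the Python sketch below expands the rule and then interprets the result as drawing commands on an integer grid. It assumes the usual reading of the symbols (F draws one unit forward, + turns left 90°, - turns right 90°); the names `expand` and `trace` are illustrative, and no graphics library is used, only a list of visited points.

```python
KOCH_RULES = {"F": "F+F-F-F+F"}


def expand(axiom, rules, steps):
    """Rewrite every symbol in parallel; + and - have no rule and stay fixed."""
    word = axiom
    for _ in range(steps):
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word


def trace(commands):
    """Follow the commands from (0, 0) facing east; return the visited points."""
    x, y = 0, 0
    dx, dy = 1, 0
    points = [(x, y)]
    for command in commands:
        if command == "F":
            x, y = x + dx, y + dy
            points.append((x, y))
        elif command == "+":
            dx, dy = -dy, dx   # rotate heading 90 degrees left
        elif command == "-":
            dx, dy = dy, -dx   # rotate heading 90 degrees right
    return points
```

One rewriting step traces the points (0,0), (1,0), (1,1), (2,1), (2,0), (3,0): a unit square bump on a line of length 3. Each further step replaces every segment with a copy of that shape, so the number of segments grows by a factor of 5.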
Example 4: 3D Hilbert curve
Example 5: Branching