Languages & Strings
• String Operations
• Language Definitions
Strings
• A string x (over alphabet A) is a finite sequence x = x1 x2 ... xn where each xi ∈ A.
• Length – the length of x is the number of characters, n, in the sequence.
• Empty String – λ denotes the empty string of length 0.
• Recursive definition of the set of strings A* over alphabet A
  • Basis: The empty string λ ∈ A*
  • Recursive Step: If x ∈ A* and a ∈ A, then xa ∈ A*
  • Closure: A* contains no other strings
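The recursive definition of A* can be turned into a small enumeration. The sketch below is only an illustration: the function name `strings_up_to` and the length cutoff are assumptions introduced here, since A* itself is infinite.

```python
def strings_up_to(alphabet, max_len):
    """Return all strings over `alphabet` of length <= max_len, shortest first."""
    result = [""]        # Basis: the empty string (λ) is in A*
    frontier = [""]      # strings of the most recently built length
    for _ in range(max_len):
        # Recursive step: if x is in A* and a is in A, then xa is in A*
        frontier = [x + a for x in frontier for a in alphabet]
        result.extend(frontier)
    return result


print(strings_up_to("ab", 2))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Because nothing is added except through the basis and the recursive step, the closure condition holds by construction.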
Languages and String Operations
• Languages
  • A language L over alphabet A is any subset of A*
• Concatenation: The concatenation of two strings x, y is xy, a string whose length is (length of x) + (length of y).
• Concatenation of two languages: The concatenation of two languages L and M is LM, where LM = { z | z = xy where x ∈ L, y ∈ M }.
• Example: T = D* and O = {"+", "-"} where D = {0, 1, ..., 9}. Then TOT is the language {"1+1", "12+24", ...}
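For finite languages, concatenation can be written directly as a set comprehension. This is a minimal sketch: the helper name `concat` is hypothetical, and T is replaced by a small finite sample of D* because D* itself cannot be stored.

```python
def concat(L, M):
    """Concatenation LM = { xy | x in L, y in M } of two finite languages."""
    return {x + y for x in L for y in M}


# Assumption: a three-element sample standing in for the infinite set T = D*.
T = {"1", "12", "24"}
O = {"+", "-"}

TOT = concat(concat(T, O), T)
print("1+1" in TOT, "12+24" in TOT)   # True True
print(len(TOT))                        # 18 = 3 * 2 * 3
```

Concatenating with {λ} (here `{""}`) leaves a language unchanged, while concatenating with the empty language gives the empty language.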
Recursive Definition of Regular Sets
• Let A be an alphabet. The regular sets over A are:
  • Basis: ∅, {λ} and {a}, for each a ∈ A, are regular sets
  • Recursive Step: If X, Y are regular sets, so are
    • X ∪ Y
    • XY
    • X*
  • Closure: X is a regular set over A iff it can be obtained from the basis sets by a finite number of applications of the recursive step
Regular Set Examples
• Signed and unsigned integers
• Unparenthesized expressions with variable operands and binary operators, where a variable is a letter (l) followed by a string of letters and digits (d)
• English sentences with structure <noun phrase><verb phrase><noun phrase>
  • Lexical categories: d – determiner, a – adjective, n – noun, x – adverb, v – verb
Regular Set Examples
• Signed and unsigned integers
  • ({λ} ∪ {+} ∪ {-}){d}{d}*
• Expressions without parentheses
  • ({l}({l} ∪ {d})*)(({+} ∪ {*})({l}({l} ∪ {d})*))*
• Sentences
  • ({d}{a}*{n})(({λ} ∪ {x}){v})({d}{a}*{n})
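These three regular sets can be tried out with Python's `re` module, which writes union as `|` (or a character class) and an optional λ alternative as `?`. The pattern names below are assumptions for illustration, as is reading "letter" as `[A-Za-z]` and "digit" as `[0-9]`.

```python
import re

# ({λ} ∪ {+} ∪ {-}){d}{d}*
INTEGER = re.compile(r"[+-]?[0-9][0-9]*")

# ({l}({l} ∪ {d})*)(({+} ∪ {*})({l}({l} ∪ {d})*))*
VARIABLE = r"[A-Za-z][A-Za-z0-9]*"
EXPRESSION = re.compile(VARIABLE + r"([+*]" + VARIABLE + r")*")

# ({d}{a}*{n})(({λ} ∪ {x}){v})({d}{a}*{n}), over the category letters d, a, n, x, v
SENTENCE = re.compile(r"da*nx?vda*n")


def matches(pattern, text):
    """True when the whole string (not just a prefix) is in the regular set."""
    return pattern.fullmatch(text) is not None


print(matches(INTEGER, "-42"), matches(INTEGER, "+"))            # True False
print(matches(EXPRESSION, "x1+y*z2"), matches(EXPRESSION, "x+"))  # True False
print(matches(SENTENCE, "danxvdn"), matches(SENTENCE, "dnv"))     # True False
```

`fullmatch` is used rather than `match` because membership in a regular set concerns the entire string.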
Regular Expressions
• The set of strings which begin with an "a" and end with a "b" is a regular set over {a,b} since it equals {a}({a} ∪ {b})*{b}.
• Regular expressions represent regular sets as follows:
  • ∅, λ and a represent ∅, {λ} and {a}.
  • If u and v are regular expressions (representing regular sets) then (u ∪ v), (uv) and (u*) are regular expressions representing their union, concatenation and Kleene closure.
• Dropping superfluous parentheses, a(a ∪ b)*b represents the regular set of all strings starting with a and ending with b.
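The claim that {a}({a} ∪ {b})*{b} is exactly the set of strings starting with a and ending with b can be checked for short strings by building the set from the three operations. This is a bounded sketch, not a proof: the helpers `concat` and `star` and the cutoff `MAX` are assumptions introduced here.

```python
from itertools import product


def concat(X, Y, max_len):
    """Concatenation XY, keeping only strings of length <= max_len."""
    return {x + y for x in X for y in Y if len(x + y) <= max_len}


def star(X, max_len):
    """Kleene closure X*, truncated to strings of length <= max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        # Append one more element of X to every newly found string.
        frontier = concat(frontier, X, max_len) - result
        result |= frontier
    return result


MAX = 4
a, b = {"a"}, {"b"}

# {a}({a} ∪ {b})*{b}, truncated at length MAX
regular_set = concat(concat(a, star(a | b, MAX), MAX), b, MAX)

# Direct description: strings over {a,b} that start with a and end with b
expected = {
    "".join(p)
    for n in range(1, MAX + 1)
    for p in product("ab", repeat=n)
    if p[0] == "a" and p[-1] == "b"
}

print(regular_set == expected)   # True
print(sorted(regular_set, key=lambda s: (len(s), s)))
```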
Grammars
A context free grammar G is a 4-tuple: G = (V, Σ, P, S) where
1. V is a set of nonterminals (or string variables), each representing a sublanguage from which the variable takes its values. Examples are <noun phrase>, which can take on values such as "the big box", and T, which can take on string values used to represent products in an algebraic expression.
2. Σ is a finite alphabet. Examples are the English vocabulary (consisting of over a hundred thousand words, each treated as an atomic symbol) and the printable ASCII character set. The binary alphabet is {0,1}. The alphabet contains the symbols from which language strings are formed.
Grammars Continued
3. P is a finite set of productions or rules used to define the sublanguages represented by the nonterminals. In a context free grammar, a rule has the format A → X where A ∈ V and X ∈ (V ∪ Σ)*. The interpretation is that the strings in the sublanguage represented by A can be constructed according to the format indicated by X: each terminal character in X appears unchanged in the string for A, and each variable in X is replaced by a string from that variable's sublanguage. Examples are <noun phrase> → <determiner> <adj-list> <noun> and T → a * T.
4. S is a designated variable (referred to as the start symbol or the head of the language). It represents the language being defined by the grammar G.
Grammar Examples
• Signed and unsigned integers
  • I → SD, S → + | - | λ, D → dD, D → d
• Unparenthesized expressions with variable operands and binary operators, where a variable is a letter (l) followed by a string of letters and digits (d)
  • E → VE, E → V, V → lU, U → lU, U → dU, U → λ
• English sentences with structure <noun phrase><verb phrase><noun phrase>
  • Lexical categories: d – determiner, a – adjective, n – noun, x – adverb, v – verb
Grammars and Derivations
• Derivations: If u, v are strings in (V ∪ Σ)*, A is in V and A → X is in P, then uAv ⇒ uXv, read as uAv "derives" uXv by application of the rule A → X.
• For repeated application of 0 or more rules, the symbol ⇒* is used.
• Language Definition: The language L(G) defined by G is { x | x ∈ Σ*, S ⇒* x }
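A derivation step and the definition of L(G) can be sketched for the signed-integer grammar (I → SD, S → + | - | λ, D → dD | d). Everything named below (`RULES`, `step`, `language`) is hypothetical, symbols are single characters, `""` stands for λ, and only leftmost derivations are explored. The length cutoff is an assumption that keeps the search finite for this particular grammar; it is not a general-purpose procedure.

```python
from collections import deque

# Productions of the signed-integer grammar; "" stands for the empty string λ.
RULES = {
    "I": ["SD"],
    "S": ["+", "-", ""],
    "D": ["dD", "d"],
}
START = "I"


def step(form):
    """Yield each form obtained by rewriting the leftmost nonterminal: uAv => uXv."""
    for i, symbol in enumerate(form):
        if symbol in RULES:
            for rhs in RULES[symbol]:
                yield form[:i] + rhs + form[i + 1:]
            return


def language(max_len):
    """Return { x | x in Σ*, S =>* x } restricted to at most max_len terminals."""
    found, seen, queue = set(), {START}, deque([START])
    while queue:
        form = queue.popleft()
        if not any(s in RULES for s in form):
            found.add(form)          # no nonterminals left: form is in L(G)
            continue
        for nxt in step(form):
            terminals = sum(1 for s in nxt if s not in RULES)
            if terminals <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found


print(list(step("SD")))       # ['+D', '-D', 'D']
print(sorted(language(2)))    # ['+d', '-d', 'd', 'dd']
```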
Language Definition
• Language definition is a means of specifying which strings belong to the language. Two approaches to language definition are:
• Acceptive – Given a string, a device specifies whether or not it belongs to the language.
  • An automaton A which processes a language string x accepts x as belonging to the language if its final state belongs to the set of legal final states.
  • A parser constructed from the grammar defining the language accepts the string if it can parse it.
• Generative – Given an alphabet, a generative device tells how strings in the language are formed.
  • A language manual which tells how strings are formed can be used to generate language strings.
  • A grammar is a generative means of specification. Any string which can be derived from the start symbol by applying grammar rules is in the language.
Finite state automata and language recognition
[State diagram: S -d-> I, I -d-> I, S -•-> F, I -•-> D, F -d-> D, D -d-> D]
The finite state automaton has Σ = {d, •}, start state S and legal final states I and D. The transition function is represented by the diagram above or the table below:

      d    •
S     I    F
I     I    D
F     D    -
D     D    -

Accepts: ddd, d.dd, .ddd
Rejects: d.dd.d
Automata as Acceptors
• The string ddd.d produces the state sequence SIIIDD and is accepted in L because the last state, D, is a legal final state.
• The string .dd produces the state sequence SFDD and is accepted because D is a legal final state.
• The string ddd produces the state sequence SIII and is accepted because I is a legal final state.
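The state sequences above can be reproduced with a table-driven simulation. This is a minimal sketch: the names `DELTA`, `FINAL` and `run` are hypothetical, the decimal point is written as ".", and any state/symbol pair missing from the table is assumed to mean "no transition, reject".

```python
# Transition function of the decimal-number automaton as a lookup table.
# Assumption: pairs that are absent (e.g. a second "." in state D) have no transition.
DELTA = {
    ("S", "d"): "I", ("S", "."): "F",
    ("I", "d"): "I", ("I", "."): "D",
    ("F", "d"): "D",
    ("D", "d"): "D",
}
FINAL = {"I", "D"}   # legal final states


def run(string, start="S"):
    """Return (state sequence, accepted?) for the given input string."""
    states = [start]
    for ch in string:
        nxt = DELTA.get((states[-1], ch))
        if nxt is None:              # undefined transition: reject immediately
            return "".join(states), False
        states.append(nxt)
    return "".join(states), states[-1] in FINAL


for s in ["ddd.d", ".dd", "ddd", "d.dd.d"]:
    print(s, run(s))
# ddd.d ('SIIIDD', True)
# .dd ('SFDD', True)
# ddd ('SIII', True)
# d.dd.d ('SIDDD', False)
```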
Parsing
• Given a grammar G with distinguished nonterminal S and a string X over the alphabet, does S ⇒* X?
• Parsing attempts to find a sequence of rules by which S ⇒* X.
Parse tree for dd.ddd

I
├─ d
└─ I
   ├─ d
   └─ I
      ├─ •
      └─ D
         ├─ d
         └─ D
            ├─ d
            └─ D
               └─ d

Grammar for Decimal Numbers
I → d I
I → d
I → • D
D → d D
D → d

A parse tree has an interior node for each nonterminal, a child node for each right-hand-side symbol of the production used to replace that nonterminal, and a leaf node for each character of the language string produced by the derivation. The language is the set of strings for which parse trees exist.
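A parser for this small grammar can be sketched by hand as a pair of mutually recursive functions that return nested tuples as parse trees. The function names, the tuple representation, and the use of "." for the decimal point are assumptions made for illustration. One character of lookahead is enough here to choose between I → d I and I → d (and likewise for D).

```python
def parse_I(s, i=0):
    """Parse s[i:] as an I; return a parse tree, or None if no rule sequence fits."""
    if i < len(s) and s[i] == "d":
        if i + 1 == len(s):
            return ("I", "d")                        # I → d
        sub = parse_I(s, i + 1)
        return ("I", "d", sub) if sub else None      # I → d I
    if i < len(s) and s[i] == ".":
        sub = parse_D(s, i + 1)
        return ("I", ".", sub) if sub else None      # I → • D
    return None


def parse_D(s, i):
    """Parse s[i:] as a D; return a parse tree, or None."""
    if i < len(s) and s[i] == "d":
        if i + 1 == len(s):
            return ("D", "d")                        # D → d
        sub = parse_D(s, i + 1)
        return ("D", "d", sub) if sub else None      # D → d D
    return None


def leaves(tree):
    """Read the leaf characters left to right: the string the tree derives."""
    return "".join(c if isinstance(c, str) else leaves(c) for c in tree[1:])


tree = parse_I("dd.ddd")
print(tree)
# ('I', 'd', ('I', 'd', ('I', '.', ('D', 'd', ('D', 'd', ('D', 'd'))))))
print(leaves(tree))            # dd.ddd
print(parse_I("d.dd.d"))       # None: no parse tree exists, so the string is rejected
```

The first element of each tuple is the nonterminal at that node, and the remaining elements are its children in right-hand-side order.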