Languages & Strings

cathal

Presentation Transcript


  1. Languages & Strings: String Operations, Language Definitions

  2. Strings
     • A string x (over alphabet A) is a finite sequence x = x1 x2 ... xn where each xi ∈ A.
     • Length: the length of x is the number of characters, n, in the sequence.
     • Empty string: λ denotes the empty string, of length 0.
     • Recursive definition of the set of strings A* over alphabet A:
       • Basis: the empty string λ ∈ A*.
       • Recursive step: if x ∈ A* and a ∈ A, then xa ∈ A*.
       • Closure: A* contains no other strings; x ∈ A* only if it is obtained from λ by finitely many applications of the recursive step.
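
The recursive definition of A* can be turned directly into an enumeration. The following is a minimal sketch, assuming single-character symbols; the function name `strings_up_to` and the length bound are illustrative choices, not part of the slides (A* itself is infinite, so only a finite slice can be listed).

```python
def strings_up_to(alphabet, max_len):
    """Return all strings over `alphabet` of length <= max_len, shortest first."""
    level = [""]               # basis: the empty string λ is in A*
    result = list(level)
    for _ in range(max_len):
        # recursive step: if x is in A* and a is in A, then xa is in A*
        level = [x + a for x in level for a in alphabet]
        result.extend(level)
    return result

print(strings_up_to("ab", 2))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Each pass of the loop applies the recursive step once to every string of the previous length, so a string of length n appears after exactly n applications.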

  3. Languages and String Operations
     • Languages: a language L over alphabet A is any subset of A*.
     • Concatenation of strings: the concatenation of two strings x and y is xy, a string whose length is length(x) + length(y).
     • Concatenation of languages: the concatenation of two languages L and M is LM = { z | z = xy where x ∈ L, y ∈ M }.
     • Example: let T = D* and O = {"+", "-"}, where D = {0, 1, ..., 9}. Then TOT contains strings such as "1+1" and "12+24". Because λ ∈ D*, it also contains strings with an empty operand, such as "+" and "1-".
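
Language concatenation is easy to try out on finite sets. The sketch below assumes a finite slice of D* (digit strings of length at most 2) in place of the infinite language T; the helper name `concat` is hypothetical.

```python
def concat(L, M):
    """LM = { xy | x in L, y in M } for finite languages given as sets."""
    return {x + y for x in L for y in M}

D = set("0123456789")
T = {""} | D | concat(D, D)    # finite slice of D*: digit strings of length <= 2
O = {"+", "-"}
TOT = concat(concat(T, O), T)

print("1+1" in TOT, "12+24" in TOT, "+" in TOT, "1+2+3" in TOT)
# True True True False
```

The third result shows the effect of λ ∈ D*: a bare operator is a member of TOT.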

  4. Recursive Definition of Regular Sets
     • Let A be an alphabet. The regular sets over A are defined as follows:
       • Basis: ∅, {λ} and {a}, for each a ∈ A, are regular sets.
       • Recursive step: if X and Y are regular sets, so are
         • X ∪ Y
         • XY
         • X*
       • Closure: X is a regular set over A iff it can be obtained from the basis sets by a finite number of applications of the recursive step.

  5. Regular Set Examples
     • Signed and unsigned integers
     • Unparenthesized expressions with variable operands and binary operators, where a variable is a letter l followed by a string of letters l and digits d
     • English sentences with structure <noun phrase><verb phrase><noun phrase>, with
       • Lexical categories: d = determiner, a = adjective, n = noun, x = adverb, v = verb

  6. Regular Set Examples
     • Signed and unsigned integers
       • ({λ} ∪ {+} ∪ {-}){d}{d}*
     • Expressions without parentheses
       • ({l}({l} ∪ {d})*)(({+} ∪ {*})({l}({l} ∪ {d})*))*
     • Sentences
       • ({d}{a}*{n})(({λ} ∪ {x}){v})({d}{a}*{n})
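
These three regular sets can be checked with Python's `re` module. The patterns below are assumed translations, not part of the slides: d is taken to be a decimal digit, l a lowercase ASCII letter, and the sentence pattern is matched over the category symbols d, a, n, x, v themselves rather than over English words.

```python
import re

# Assumed translations of the three regular sets into `re` syntax.
INTEGER = re.compile(r"[+-]?[0-9][0-9]*")
EXPRESSION = re.compile(r"[a-z][a-z0-9]*([+*][a-z][a-z0-9]*)*")
SENTENCE = re.compile(r"da*nx?vda*n")

def matches(pattern, s):
    """True if the whole string s belongs to the set described by pattern."""
    return pattern.fullmatch(s) is not None

print(matches(INTEGER, "-42"), matches(INTEGER, "+"))
print(matches(EXPRESSION, "x1+y*z"), matches(EXPRESSION, "1x+y"))
print(matches(SENTENCE, "daanxvdn"), matches(SENTENCE, "dnvn"))
```

`fullmatch` is used instead of `match` so that membership is decided for the whole string, which is what set membership requires.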

  7. Regular Expressions
     • The set of strings which begin with an "a" and end with a "b" is a regular set over {a,b}, since it equals {a}({a} ∪ {b})*{b}.
     • Regular expressions represent regular sets as follows:
       • ∅, λ and a represent ∅, {λ} and {a}.
       • If u and v are regular expressions (representing regular sets), then (u ∪ v), (uv) and (u*) are regular expressions representing their union, concatenation and Kleene closure.
     • Dropping superfluous parentheses, a(a ∪ b)*b represents the regular set of all strings starting with a and ending with b.
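
The three operations behind regular expressions can be sketched on finite sets of strings. Because X* is infinite in general, the version below truncates every result at an assumed length bound `max_len`; the function names are illustrative only.

```python
def union(X, Y):
    return X | Y

def concat(X, Y, max_len):
    """Strings xy with x in X, y in Y and total length <= max_len."""
    return {x + y for x in X for y in Y if len(x + y) <= max_len}

def star(X, max_len):
    """Strings of the Kleene closure X* with length <= max_len."""
    result = {""}              # λ is always in X*
    frontier = {""}
    while frontier:
        # extend the newest strings by one more factor from X
        frontier = concat(frontier, X, max_len) - result
        result |= frontier
    return result

a, b = {"a"}, {"b"}
# a(a ∪ b)*b restricted to strings of length <= 3
R = concat(concat(a, star(union(a, b), 3), 3), b, 3)
print(sorted(R))
# ['aab', 'ab', 'abb']
```

Every string in the result starts with a and ends with b, as the regular expression requires.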

  8. Grammars
     A context-free grammar G is a 4-tuple G = (V, Σ, P, S), where
     1. V is a set of nonterminals (or string variables), each representing a sublanguage from which the variable takes its values. Examples are <noun phrase>, which can take on values such as "the big box", and T, which can take on string values used to represent products in an algebraic expression.
     2. Σ is a finite alphabet containing the symbols from which language strings are formed. Examples are the English vocabulary (over a hundred thousand words, each treated as an atomic symbol), the printable ASCII character set, and the binary alphabet {0,1}.

  9. Grammars Continued
     3. P is a finite set of productions (rules) used to define the sublanguages represented by the nonterminals. In a context-free grammar, a rule has the form A → X, where A ∈ V and X ∈ (V ∪ Σ)*. The interpretation is that the strings in the sublanguage represented by A can be constructed according to the pattern X: each terminal character in X appears as itself in the A string, and each variable in X is replaced by a string from that variable's sublanguage. Examples are <noun phrase> → <determiner> <adj-list> <noun> and T → a * T.
     4. S is a designated variable (referred to as the start symbol or the head of the language). It represents the language being defined by the grammar G.
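
The 4-tuple can be written down as a small data structure. This is only one possible encoding, with hypothetical field names; the rule T → a is an assumption added so that the example grammar can terminate, since the slides give only T → a * T.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset    # V
    alphabet: frozenset        # Σ
    productions: tuple         # P, as (A, X) pairs with X a tuple of symbols
    start: str                 # S

    def is_context_free(self):
        """Check that S is in V and every rule A → X has A in V, X in (V ∪ Σ)*."""
        symbols = self.nonterminals | self.alphabet
        return self.start in self.nonterminals and all(
            A in self.nonterminals and all(s in symbols for s in X)
            for A, X in self.productions
        )

# T → a * T is from the slide; T → a is an assumed terminating rule.
G = Grammar(
    nonterminals=frozenset({"T"}),
    alphabet=frozenset({"a", "*"}),
    productions=(("T", ("a", "*", "T")), ("T", ("a",))),
    start="T",
)
print(G.is_context_free())
# True
```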

  10. Grammar Examples
     • Signed and unsigned integers
     • Unparenthesized expressions with variable operands and binary operators, where a variable is a letter l followed by a string of letters l and digits d
     • English sentences with structure <noun phrase><verb phrase><noun phrase>, with
       • Lexical categories: d = determiner, a = adjective, n = noun, x = adverb, v = verb

  11. Grammar Examples
     • Signed and unsigned integers
       • I → SD, S → + | - | λ, D → dD, D → d
     • Unparenthesized expressions with variable operands and binary operators, where a variable is a letter l followed by a string of letters l and digits d
       • E → V O E, E → V, O → + | *, V → lU, U → lU, U → dU, U → λ
     • English sentences with structure <noun phrase><verb phrase><noun phrase>, with
       • Lexical categories: d = determiner, a = adjective, n = noun, x = adverb, v = verb

  12. Grammars and Derivations
     Derivations: If u, v are strings in (V ∪ Σ)*, A is in V and A → X is in P, then uAv ⇒ uXv, read as uAv "derives" uXv by application of the rule A → X. For repeated application of zero or more rules, the symbol ⇒* is used.
     Language Definition: The language L(G) defined by G is { x | x ∈ Σ*, S ⇒* x }.
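
A single derivation step and a bounded slice of L(G) can be computed mechanically. The sketch below assumes the signed-integer grammar I → SD, S → + | - | λ, D → dD | d, encoded as a dictionary whose keys are the nonterminals and with "" standing for λ; the names `RULES`, `step` and `language` are hypothetical.

```python
from collections import deque

# Assumed encoding: keys are nonterminals, "" stands for λ.
RULES = {"I": ["SD"], "S": ["+", "-", ""], "D": ["dD", "d"]}

def step(form):
    """All sentential forms uXv obtainable from form = uAv in one step."""
    out = []
    for i, sym in enumerate(form):
        for rhs in RULES.get(sym, []):
            out.append(form[:i] + rhs + form[i + 1:])
    return out

def language(start, max_len):
    """Terminal strings x with start ⇒* x and len(x) <= max_len."""
    seen, found, queue = {start}, set(), deque([start])
    while queue:
        form = queue.popleft()
        if not any(c in RULES for c in form):
            if len(form) <= max_len:
                found.add(form)
            continue
        for nxt in step(form):
            # Only S can vanish, and at most once, so a form longer than
            # max_len + 1 can never shrink back to max_len.
            if len(nxt) <= max_len + 1 and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found

print(step("SD"))
# ['+D', '-D', 'D', 'SdD', 'Sd']
print(sorted(language("I", 2)))
# ['+d', '-d', 'd', 'dd']
```

The pruning rule in `language` is specific to this grammar; a grammar with more erasing rules would need a different bound.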

  13. Language Definition
     • Language definition is a means of specifying which strings belong to the language. Two approaches to language definition are:
     • Acceptive: given a string, a device specifies whether or not it belongs to the language.
       • An automaton A which processes a language string x accepts x as belonging to the language if its final state belongs to the set of legal final states.
       • A parser constructed from the grammar defining the language accepts the string if it can parse it.
     • Generative: given an alphabet, a generative device tells how strings in the language are formed.
       • A language manual which tells how strings are formed can be used to generate language strings.
       • A grammar is a generative means of specification. Any string which can be derived from the start symbol by applying grammar rules is in the language.

  14. Grammars and Derivations
     • Derivations: If u, v are strings in (V ∪ Σ)*,
       • A is in V and
       • A → X is in P,
       • then uAv ⇒ uXv, read as uAv "derives" uXv by application of the rule A → X.
     • For repeated application of zero or more rules, the symbol ⇒* is used.
     • Language Definition: The language L(G) defined by G is
       • { x | x ∈ Σ*, S ⇒* x }

  15. Finite state automata and language recognition
     [State diagram: states S, I, F, D; transitions labelled d and •]
     The finite state automaton has Σ = {d, •}, start state S and legal final states I and D. The transition function is represented by the diagram above or the table below (- marks an undefined transition; • is written "." in the example strings):

          d   •
      S   I   F
      I   I   D
      F   D   -
      D   D   -

     Accepts: ddd, d.dd, .ddd
     Rejects: d.dd.d
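
The transition table can be simulated in a few lines. This is a minimal sketch that assumes "." stands for the point symbol • and treats a missing table entry as immediate rejection; the names `DELTA`, `trace` and `accepts` are illustrative.

```python
# Transition table from the slide; "." stands for the point symbol •.
DELTA = {
    ("S", "d"): "I", ("S", "."): "F",
    ("I", "d"): "I", ("I", "."): "D",
    ("F", "d"): "D",
    ("D", "d"): "D",
}
START, FINAL = "S", {"I", "D"}

def trace(s):
    """Return the list of states visited on s, or None if a transition is missing."""
    states = [START]
    for ch in s:
        nxt = DELTA.get((states[-1], ch))
        if nxt is None:
            return None
        states.append(nxt)
    return states

def accepts(s):
    """True if the automaton ends in a legal final state after reading s."""
    states = trace(s)
    return states is not None and states[-1] in FINAL

for s in ["ddd", "d.dd", ".ddd", "d.dd.d"]:
    print(s, accepts(s))
```

The first three strings are accepted and the last is rejected, because state D has no transition on a second point.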

  16. Automata as Acceptors
     [State diagram: states S, I, F, D; transitions labelled d and •]
     • The string ddd.d produces the state sequence SIIIDD and is accepted because the last state, D, is a legal final state.
     • The string .dd produces the state sequence SFDD and is accepted because D is a legal final state.
     • The string ddd produces the state sequence SIII and is accepted because I is a legal final state.

  17. Parsing
     • Given a grammar G with distinguished nonterminal S and a string X over the alphabet, does S ⇒* X?
     • Parsing attempts to find a sequence of rules by which S ⇒* X.

  18. Parse tree for d d . d d d

      I
      ├─ d
      └─ I
         ├─ d
         └─ I
            ├─ •
            └─ D
               ├─ d
               └─ D
                  ├─ d
                  └─ D
                     └─ d

     Grammar for Decimal Numbers
       I → d I
       I → d
       I → • D
       D → d D
       D → d

     A parse tree has an intermediate node for each nonterminal, a child node for each right-hand-side character in the production used to replace that nonterminal, and a leaf node for each character in the language string produced by the derivation. The language is the set of strings for which parse trees exist.
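
A parse tree for this grammar can be built by a small recursive-descent parser. The sketch below is one possible approach, not necessarily the one the slides have in mind: it assumes "." stands for •, represents a tree as nested tuples (nonterminal, child, ...), and chooses between A → d A and A → d by checking whether the d is the last input character.

```python
def parse(s):
    """Return a parse tree for s as nested tuples, or None if s is rejected.

    Grammar: I → d I | d | . D      D → d D | d      ("." stands for •)
    """
    def parse_I(i):
        if i < len(s) and s[i] == "d":
            if i + 1 == len(s):
                return ("I", "d")                          # I → d
            rest = parse_I(i + 1)
            return None if rest is None else ("I", "d", rest)   # I → d I
        if i < len(s) and s[i] == ".":
            rest = parse_D(i + 1)
            return None if rest is None else ("I", ".", rest)   # I → . D
        return None

    def parse_D(i):
        if i < len(s) and s[i] == "d":
            if i + 1 == len(s):
                return ("D", "d")                          # D → d
            rest = parse_D(i + 1)
            return None if rest is None else ("D", "d", rest)   # D → d D
        return None

    return parse_I(0)

print(parse("dd.ddd"))
# ('I', 'd', ('I', 'd', ('I', '.', ('D', 'd', ('D', 'd', ('D', 'd'))))))
```

The nested tuples mirror the tree on the slide: each tuple is an intermediate node, and the strings inside it are its leaf children.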
