COGN1001: Introduction to Cognitive Science
Topics in Computer Science: Formal Languages and Models of Computation
Qiang HUO, Department of Computer Science, The University of Hong Kong (E-mail: qhuo@cs.hku.hk)
Outline
• What is a Formal Language?
• Phrase-Structure Grammars
• Finite State Automata
• Formal Languages and Models of Computation
Natural Language vs. Formal Language
• Natural language: the written and/or spoken languages of the world, such as Chinese, English, Japanese, German, French, and Spanish.
  • Syntax
  • Semantics
• Formal language: a language specified by a well-defined set of rules of syntax.
• The study of formal languages is important to computer science.
• For example, we need to know which statements are acceptable in the C programming language. Checking this is a task of the compiler for that language.
Formal Language
• We will describe the sentences of a formal language using a grammar.
• How can we determine whether a combination of words is a valid sentence in a formal language?
• How can we generate the valid sentences of a formal language?
• We will only be interested in the syntax, not the semantics (meaning), of a language.
If we define a subset of English using the following list of rules, which describe how a valid sentence can be produced, what does the language look like?
• a sentence is made up of a noun-phrase followed by a verb-phrase;
• a noun-phrase is made up of an article followed by an adjective followed by a noun, or
• a noun-phrase is made up of an article followed by a noun;
• a verb-phrase is made up of a verb followed by an adverb, or
• a verb-phrase is made up of a verb;
• an article is a, or
• an article is the;
• an adjective is large, or
• an adjective is hungry;
• a noun is rabbit, or
• a noun is mathematician;
• a verb is eats, or
• a verb is hops;
• an adverb is quickly, or
• an adverb is wildly.
Example: a Subset of English
• From the previous rules we can form valid sentences by a series of replacements, continuing until no more rules can be applied.
• For instance, the valid sentence "the large rabbit hops quickly" can be obtained by the following sequence of replacements:
  • sentence
  • noun-phrase verb-phrase
  • article adjective noun verb-phrase
  • article adjective noun verb adverb
  • the adjective noun verb adverb
  • the large noun verb adverb
  • the large rabbit verb adverb
  • the large rabbit hops adverb
  • the large rabbit hops quickly
• Some other valid sentences:
  • a hungry mathematician eats wildly
  • the rabbit eats quickly
• An invalid sentence: the quickly eats mathematician
Some Terminology
• A vocabulary (or alphabet) V is a finite, nonempty set of elements called symbols.
• A word (or sentence) over V is a string of finite length of elements of V.
• The empty string or null string, denoted by $\lambda$, is the string containing no symbols.
• The set of all words (or sentences) over V is denoted by $V^*$.
• A language over V is a subset of $V^*$.
• Example: In English,
  • the alphabet V consists of the English letters and other symbols;
  • a word (or sentence) over V is a finite string of symbols;
  • the set of meaningful words (or sentences) of English is a subset of $V^*$.
How to specify a language?
• List all the words (or sentences) in the language; or
• give some criteria that a word (or sentence) must satisfy to be in the language; or
• specify the language through the use of a grammar, such as the set of rules given in the previous example of a subset of English.
What is a Phrase-Structure Grammar?
• A phrase-structure grammar is G = (V, T, S, P), where
  • V is a vocabulary;
  • T is a subset of V consisting of terminal elements (i.e., the elements of V that cannot be replaced by other symbols);
  • the elements of N = V − T are called nonterminal symbols (i.e., the elements of V that can be replaced by other symbols);
  • S is a start symbol from V (i.e., the element of V that we always begin with);
  • P is a set of productions.
• We denote by $w_0 \to w_1$ the production that specifies that $w_0$ can be replaced by $w_1$.
• Every production in P must contain at least one nonterminal on its left side.
Example: a Phrase-Structure Grammar
• G = (V, T, S, P), where
  • V = { a, the, large, hungry, rabbit, mathematician, eats, hops, quickly, wildly, sentence, noun-phrase, verb-phrase, article, adjective, noun, verb, adverb };
  • T = { a, the, large, hungry, rabbit, mathematician, eats, hops, quickly, wildly };
  • V − T = { sentence, noun-phrase, verb-phrase, article, adjective, noun, verb, adverb };
  • S = sentence;
  • the production rules P are:
    sentence $\to$ noun-phrase verb-phrase
    noun-phrase $\to$ article adjective noun
    noun-phrase $\to$ article noun
    verb-phrase $\to$ verb adverb
    verb-phrase $\to$ verb
    article $\to$ a
    article $\to$ the
    adjective $\to$ large
    adjective $\to$ hungry
    noun $\to$ rabbit
    noun $\to$ mathematician
    verb $\to$ eats
    verb $\to$ hops
    adverb $\to$ quickly
    adverb $\to$ wildly
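Because this grammar has no recursive rules, its language is finite and can be listed exhaustively. The following is a minimal Python sketch of that enumeration; the dictionary layout and the names `PRODUCTIONS`, `expand`, and `SENTENCES` are illustrative assumptions, not part of the lecture.

```python
from itertools import product

# Productions of the example grammar: nonterminal -> list of right-hand sides.
PRODUCTIONS = {
    "sentence": [["noun-phrase", "verb-phrase"]],
    "noun-phrase": [["article", "adjective", "noun"], ["article", "noun"]],
    "verb-phrase": [["verb", "adverb"], ["verb"]],
    "article": [["a"], ["the"]],
    "adjective": [["large"], ["hungry"]],
    "noun": [["rabbit"], ["mathematician"]],
    "verb": [["eats"], ["hops"]],
    "adverb": [["quickly"], ["wildly"]],
}


def expand(symbol):
    """Return every list of terminal words derivable from `symbol`."""
    if symbol not in PRODUCTIONS:  # a terminal cannot be replaced further
        return [[symbol]]
    results = []
    for rhs in PRODUCTIONS[symbol]:
        # Expand each symbol of the right-hand side, then combine all choices.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


SENTENCES = {" ".join(words) for words in expand("sentence")}
```

The set contains every valid sentence of the example (12 noun-phrases times 6 verb-phrases), so membership in `SENTENCES` is a direct validity test for this particular grammar.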
Some Terminology
Let G = (V, T, S, P) be a phrase-structure grammar. Let $w_0 = l z_0 r$ and $w_1 = l z_1 r$ be strings over V.
• If $z_0 \to z_1$ is a production of G, we say that $w_1$ is directly derivable from $w_0$, and we write $w_0 \Rightarrow w_1$.
• Example: the adjective noun verb adverb $\Rightarrow$ the large noun verb adverb, because adjective $\to$ large is a production.
• If $w_0, w_1, \ldots, w_n$, $n \ge 0$, are strings over V such that $w_0 \Rightarrow w_1$, $w_1 \Rightarrow w_2$, …, $w_{n-1} \Rightarrow w_n$, then
  • we say that $w_n$ is derivable from $w_0$, and
  • we write $w_0 \overset{*}{\Rightarrow} w_n$.
• The sequence of steps used to obtain $w_n$ from $w_0$ is called a derivation.
Example: sentence $\overset{*}{\Rightarrow}$ the large rabbit hops quickly, via the following derivation:
• sentence $\Rightarrow$ noun-phrase verb-phrase,
• noun-phrase verb-phrase $\Rightarrow$ article adjective noun verb-phrase,
• article adjective noun verb-phrase $\Rightarrow$ article adjective noun verb adverb,
• article adjective noun verb adverb $\Rightarrow$ the adjective noun verb adverb,
• the adjective noun verb adverb $\Rightarrow$ the large noun verb adverb,
• the large noun verb adverb $\Rightarrow$ the large rabbit verb adverb,
• the large rabbit verb adverb $\Rightarrow$ the large rabbit hops adverb,
• the large rabbit hops adverb $\Rightarrow$ the large rabbit hops quickly.
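The definition of "directly derivable" can be checked mechanically: a step is valid if replacing one occurrence of some left side $z_0$ by its right side $z_1$ turns the first string into the second. Here is a small Python sketch under the assumption that strings are lists of words; the names `RULES`, `directly_derivable`, and `is_derivation` are hypothetical.

```python
RULES = """
sentence -> noun-phrase verb-phrase
noun-phrase -> article adjective noun
noun-phrase -> article noun
verb-phrase -> verb adverb
verb-phrase -> verb
article -> a
article -> the
adjective -> large
adjective -> hungry
noun -> rabbit
noun -> mathematician
verb -> eats
verb -> hops
adverb -> quickly
adverb -> wildly
"""

# Each production becomes a pair (left side, right side) of word lists.
PRODUCTIONS = [
    tuple(side.split() for side in line.split("->"))
    for line in RULES.strip().splitlines()
]


def directly_derivable(w0, w1, productions):
    """True if w1 = l z1 r and w0 = l z0 r for some production z0 -> z1."""
    for z0, z1 in productions:
        for i in range(len(w0) - len(z0) + 1):
            if w0[i:i + len(z0)] == z0 and w0[:i] + z1 + w0[i + len(z0):] == w1:
                return True
    return False


def is_derivation(steps, productions):
    """True if every string in `steps` is directly derivable from the previous one."""
    return all(directly_derivable(a, b, productions) for a, b in zip(steps, steps[1:]))


DERIVATION = [s.split() for s in [
    "sentence",
    "noun-phrase verb-phrase",
    "article adjective noun verb-phrase",
    "article adjective noun verb adverb",
    "the adjective noun verb adverb",
    "the large noun verb adverb",
    "the large rabbit verb adverb",
    "the large rabbit hops adverb",
    "the large rabbit hops quickly",
]]
```

Calling `is_derivation(DERIVATION, PRODUCTIONS)` confirms that each step of the example uses exactly one production.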
What is the language generated by a Phrase-Structure Grammar?
• Let G = (V, T, S, P) be a phrase-structure grammar.
• The language generated by G (or the language of G), denoted by L(G), is the set of all strings of terminals that are derivable from the starting symbol S:
  $L(G) = \{ w \in T^* \mid S \overset{*}{\Rightarrow} w \}$
• Example: Suppose G = (V, T, S, P), where V = {a, b, A, B, S}, T = {a, b}, S is the start symbol, and $P = \{ S \to ABa,\ A \to BB,\ B \to ab,\ AB \to b \}$. The "sentences" (words) generated by this grammar are {abababa, ba}, since
  $S \Rightarrow ABa \Rightarrow BBBa \overset{*}{\Rightarrow} abababa$
  $S \Rightarrow ABa \Rightarrow ba$
• Example: Let G be the grammar with V = {S, 0, 1}, T = {0, 1}, starting symbol S, and production rules $P = \{ S \to 11S,\ S \to 0 \}$. Then $L(G) = \{ (11)^n 0 \mid n = 0, 1, 2, \ldots \}$.
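For small grammars, L(G) can be explored by brute force: start from S and keep applying productions at every possible position until only terminals remain. The sketch below is one way to do this in Python. It assumes single-character symbols, and the length bound `max_len` is a hypothetical cut-off added only to guarantee termination for infinite languages, so strings whose derivations need longer intermediate forms would be missed.

```python
from collections import deque


def generate(productions, start, terminals, max_len):
    """Collect terminal strings derivable from `start` via forms of length <= max_len."""
    seen = {start}
    queue = deque([start])
    language = set()
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            language.add(form)  # no nonterminal left, so no production applies
            continue
        for lhs, rhs in productions:
            i = form.find(lhs)
            while i != -1:  # try every occurrence of the left side
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return language


# The two example grammars from this slide, as (left side, right side) pairs.
G1 = [("S", "ABa"), ("A", "BB"), ("B", "ab"), ("AB", "b")]
G2 = [("S", "11S"), ("S", "0")]

LANGUAGE_G1 = generate(G1, "S", set("ab"), max_len=10)
LANGUAGE_G2 = generate(G2, "S", set("01"), max_len=5)
```

For the first grammar the search space is finite, so the result is the whole language; for the second it is only the part of L(G) reachable within the bound.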
How to construct a grammar that generates a given language?
• Example: Find a phrase-structure grammar to generate the set $\{ 0^n 1^n \mid n = 0, 1, 2, \ldots \}$.
• Solution: G = (V, T, S, P), where V = {S, 0, 1}, T = {0, 1}, S is the start symbol, and $P = \{ S \to 0S1,\ S \to \lambda \}$.
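As a quick check of this solution, here is a worked derivation for n = 3 (the value of n is chosen only for illustration). It applies $S \to 0S1$ three times and then $S \to \lambda$:

```latex
S \Rightarrow 0S1 \Rightarrow 00S11 \Rightarrow 000S111 \Rightarrow 000111 = 0^3 1^3
```

In general, n applications of $S \to 0S1$ followed by one application of $S \to \lambda$ yield $0^n 1^n$, and no other terminal strings arise, because $S \to \lambda$ is the only production that eliminates S.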
How to construct a grammar that generates a given language? (2)
• Example: Find a phrase-structure grammar to generate the set $\{ 0^m 1^n \mid m, n = 0, 1, 2, \ldots \}$.
• Solution 1: $G_1$ = (V, T, S, P), where V = {S, 0, 1}, T = {0, 1}, S is the start symbol, and $P = \{ S \to 0S,\ S \to S1,\ S \to \lambda \}$.
• Solution 2: $G_2$ = (V, T, S, P), where V = {S, A, 0, 1}, T = {0, 1}, S is the start symbol, and $P = \{ S \to 0S,\ S \to 1A,\ S \to 1,\ A \to 1A,\ A \to 1,\ S \to \lambda \}$.
• Two grammars can generate the same language!
How to construct a grammar that generates a given language? (3)
• There are many techniques from the theory of computation that can be used to systematically construct a grammar for a given formal language, but
• this is beyond the scope of this course.
Types of Phrase-Structure Grammars (1)
• Phrase-structure grammars can be classified according to the types of productions that are allowed.
• The classification scheme introduced by Noam Chomsky is as follows:
• Type 0 grammar: has no restrictions on its productions.
• Type 1, or context-sensitive, grammar: can have productions only of the form
  • $w_1 \to w_2$, where $l(w_1) \le l(w_2)$ and $l(w)$ denotes the length of w, or of the form
  • $w_1 \to \lambda$.
• Type 2, or context-free, grammar: can have productions only of the form
  • $A \to w_2$, where A is a nonterminal symbol.
Types of Phrase-Structure Grammars (2)
• Type 3, or regular, grammar: can have productions only of the form
  • $A \to aB$,
  • $A \to a$,
  • $S \to \lambda$,
  where A and B are nonterminal symbols, S is the start symbol, and a is a terminal symbol.
• A language generated by a
  • type 1 grammar is called a context-sensitive language;
  • type 2 grammar is called a context-free language;
  • type 3 grammar is called a regular language.
Examples
• $\{ 0^m 1^n \mid m, n = 0, 1, 2, \ldots \}$ is a regular language, since it can be generated by a regular grammar G with productions
  $P = \{ S \to 0S,\ S \to 1A,\ S \to 1,\ A \to 1A,\ A \to 1,\ S \to \lambda \}$
• $\{ 0^n 1^n \mid n = 0, 1, 2, \ldots \}$ is a context-free language, since it can be generated by a context-free grammar G with productions
  $P = \{ S \to 0S1,\ S \to \lambda \}$
• $\{ 0^n 1^n 2^n \mid n = 0, 1, 2, \ldots \}$ is a context-sensitive language, since it can be generated by a type 1 grammar G = (V, T, S, P) with V = {0, 1, 2, S, A, B}, T = {0, 1, 2}, starting symbol S, and productions
  $P = \{ S \to 0SAB,\ S \to \lambda,\ BA \to AB,\ 0A \to 01,\ 1A \to 11,\ 1B \to 12,\ 2B \to 22 \}$,
  but not by any type 2 grammar.
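The production forms listed for each type translate directly into a checker that reports the most restrictive type a grammar satisfies. This Python sketch follows the definitions as stated on these slides; it assumes single-character symbols, represents $\lambda$ as the empty string, and the function name `grammar_type` is hypothetical.

```python
def grammar_type(productions, nonterminals, terminals, start):
    """Return the largest type (3, 2, 1 or 0) whose restrictions every production meets."""

    def regular(lhs, rhs):
        if lhs not in nonterminals:          # left side must be one nonterminal
            return False
        if rhs == "":
            return lhs == start              # S -> lambda
        if len(rhs) == 1:
            return rhs in terminals          # A -> a
        return len(rhs) == 2 and rhs[0] in terminals and rhs[1] in nonterminals  # A -> aB

    def context_free(lhs, rhs):
        return lhs in nonterminals           # A -> w2

    def context_sensitive(lhs, rhs):
        return rhs == "" or len(lhs) <= len(rhs)  # w1 -> lambda, or l(w1) <= l(w2)

    for type_number, allowed in ((3, regular), (2, context_free), (1, context_sensitive)):
        if all(allowed(lhs, rhs) for lhs, rhs in productions):
            return type_number
    return 0


# The three example grammars above, with lambda written as "".
REGULAR = [("S", "0S"), ("S", "1A"), ("S", "1"), ("A", "1A"), ("A", "1"), ("S", "")]
CONTEXT_FREE = [("S", "0S1"), ("S", "")]
CONTEXT_SENSITIVE = [("S", "0SAB"), ("S", ""), ("BA", "AB"), ("0A", "01"),
                     ("1A", "11"), ("1B", "12"), ("2B", "22")]
```

Note that this classifies a grammar, not a language: a language is regular if some regular grammar generates it, even when another grammar for it has a lower type number.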
Example: Backus-Naur Form
• What is the Backus-Naur Form of the grammar for a subset of English described before?
  <sentence> ::= <noun phrase><verb phrase>
  <noun phrase> ::= <article><adjective><noun> | <article><noun>
  <verb phrase> ::= <verb><adverb> | <verb>
  <article> ::= a | the
  <adjective> ::= large | hungry
  <noun> ::= rabbit | mathematician
  <verb> ::= eats | hops
  <adverb> ::= quickly | wildly
What is Backus-Naur Form (BNF)?
• There is another notation that is used to specify a type 2 (context-free) grammar, called the Backus-Naur Form:
  • all productions having the same nonterminal as their left-hand side are combined into one statement, with the different right-hand sides of these productions separated by a bar ( | ),
  • nonterminal symbols are enclosed in angular brackets (<>), and
  • the symbol $\to$ is replaced by ::=
• Example: The Backus-Naur form for a grammar that produces signed integers is as follows:
  <signed integer> ::= <sign><integer>
  <sign> ::= + | -
  <integer> ::= <digit> | <digit><integer>
  <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
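Each BNF rule for signed integers can be turned into one small recognizer function whose body mirrors the alternatives of the rule. The following Python sketch does this; the function names are hypothetical and simply follow the nonterminal names.

```python
DIGITS = set("0123456789")


def is_digit(s):
    # <digit> ::= 0|1|2|3|4|5|6|7|8|9
    return len(s) == 1 and s in DIGITS


def is_integer(s):
    # <integer> ::= <digit> | <digit><integer>
    return len(s) >= 1 and is_digit(s[0]) and (len(s) == 1 or is_integer(s[1:]))


def is_sign(s):
    # <sign> ::= + | -
    return s in ("+", "-")


def is_signed_integer(s):
    # <signed integer> ::= <sign><integer>
    return len(s) >= 2 and is_sign(s[0]) and is_integer(s[1:])
```

With this grammar the sign is mandatory, so `is_signed_integer("+123")` holds while `is_signed_integer("123")` does not.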
What is a Derivation (or Parse) Tree?
• A derivation in the language generated by a context-free grammar can be represented graphically using an ordered rooted tree, called a derivation (or parse) tree:
  • the root represents the starting symbol,
  • internal vertices represent nonterminals,
  • leaves represent terminals, and
  • the children of a vertex are the symbols on the right side of a production, in order from left to right, where the symbol represented by the parent is on the left-hand side.
Example
• Construct a derivation tree for the derivation of the sentence "the hungry rabbit eats quickly", discussed previously.
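One possible way to write such a tree down without a figure is as nested pairs of the form (symbol, children), with plain strings as leaves. The sketch below encodes a tree for "the hungry rabbit eats quickly" under the example grammar; the representation and the names `TREE` and `leaves` are illustrative assumptions.

```python
# Each internal vertex is (nonterminal, [children]); each leaf is a terminal word.
TREE = ("sentence", [
    ("noun-phrase", [
        ("article", ["the"]),
        ("adjective", ["hungry"]),
        ("noun", ["rabbit"]),
    ]),
    ("verb-phrase", [
        ("verb", ["eats"]),
        ("adverb", ["quickly"]),
    ]),
])


def leaves(node):
    """Read the leaves from left to right; together they spell the derived sentence."""
    if isinstance(node, str):
        return [node]
    _symbol, children = node
    return [word for child in children for word in leaves(child)]
```

Joining `leaves(TREE)` with spaces gives back the sentence, which is a useful sanity check on a hand-built tree.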
How to determine whether a string is in the language generated by a context-free grammar?
• Top-down parsing:
  • begins with the starting symbol and proceeds by successively applying productions to see whether the given string can be derived.
• Bottom-up parsing:
  • works backwards: begins with the given string and applies productions in reverse to see whether the starting symbol can be reached.
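• Example: Determine whether the word cbab belongs to L(G), where G = (V, T, S, P) with V = {a, b, c, A, B, C, S}, T = {a, b, c}, S is the starting symbol, and the productions are
  $S \to AB$, $A \to Ca$, $B \to Ba$, $B \to Cb$, $B \to b$, $C \to cb$, $C \to b$
• Top-down parsing (each line extends the derivation by one step):
  $S \Rightarrow AB$
  $S \Rightarrow AB \Rightarrow CaB$
  $S \Rightarrow AB \Rightarrow CaB \Rightarrow cbaB$
  $S \Rightarrow AB \Rightarrow CaB \Rightarrow cbaB \Rightarrow cbab$
• Bottom-up parsing (each line works one step further back from cbab):
  $Cab \Rightarrow cbab$
  $Ab \Rightarrow Cab \Rightarrow cbab$
  $AB \Rightarrow Ab \Rightarrow Cab \Rightarrow cbab$
  $S \Rightarrow AB \Rightarrow Ab \Rightarrow Cab \Rightarrow cbab$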
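Top-down parsing can be sketched as a search over leftmost derivations that abandons any sentential form that can no longer lead to the target. The Python sketch below is one such search, not a production parser. It assumes single-character symbols and relies on this grammar having no $\lambda$ productions, so a form longer than the target can be discarded; the name `top_down_accepts` is hypothetical.

```python
# Productions of the example grammar: nonterminal -> list of right-hand sides.
PRODUCTIONS = {
    "S": ["AB"],
    "A": ["Ca"],
    "B": ["Ba", "Cb", "b"],
    "C": ["cb", "b"],
}


def top_down_accepts(productions, start, target):
    """Search leftmost derivations from `start` for one that yields `target`."""
    stack = [start]
    seen = set()
    while stack:
        form = stack.pop()
        if form == target:
            return True
        # Forms never shrink (no lambda productions), so overlong forms are dead ends.
        if form in seen or len(form) > len(target):
            continue
        seen.add(form)
        # Locate the leftmost nonterminal, if any.
        i = next((k for k, ch in enumerate(form) if ch in productions), None)
        if i is None:
            continue  # all terminals, but not the target
        if form[:i] != target[:i]:
            continue  # the terminal prefix already disagrees with the target
        for rhs in productions[form[i]]:
            stack.append(form[:i] + rhs + form[i + 1:])
    return False
```

The length check also keeps the left-recursive production $B \to Ba$ from causing an endless loop, which a naive recursive top-down parser would run into.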
Finite State Machines with No Output
• Finite-state machines with no output are also called finite-state automata.
• Finite-state automata do not generate output, but they have a set of special states, called final states.
• A finite-state automaton is often used for language recognition.
• This application plays a fundamental role in the design and construction of compilers for programming languages.
What is a Deterministic Finite-State Automaton?
• A finite-state automaton M = (S, I, f, $s_0$, F) consists of
  • a finite set S of states,
  • a finite input alphabet I,
  • a transition function f that assigns a state to every pair of state and input,
  • an initial state $s_0$, and
  • a subset F of S consisting of final states.
How to represent a Finite-State Automaton?
• We can represent a finite-state automaton using either a state table or a state diagram. Final states are indicated in the state diagram by double circles.
• What is the state table of the above finite-state automaton?
What is the language recognized by a given Finite-State Automaton?
• An input string is recognized or accepted by an automaton M if the string takes the automaton from its initial state to one of its final states.
• The language recognized by an automaton M, denoted by L(M), is the set of all strings that are recognized by M.
• The language recognized by the above finite-state automaton M is $L(M) = \{ 0^n,\ 0^n 1 0 x \mid n = 0, 1, 2, \ldots, \text{ and } x \text{ is any string} \}$.
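Recognition by a deterministic automaton amounts to following one transition per input symbol and checking whether the final state reached is in F. The Python sketch below uses a hypothetical four-state automaton built to recognize the language stated above; since the original figure is not reproduced here, the state names and transitions are assumptions and may differ from the lecture's diagram.

```python
# Transition function f as a table: (state, input) -> next state.
TRANSITIONS = {
    ("s0", "0"): "s0", ("s0", "1"): "s1",
    ("s1", "0"): "s2", ("s1", "1"): "s3",
    ("s2", "0"): "s2", ("s2", "1"): "s2",
    ("s3", "0"): "s3", ("s3", "1"): "s3",   # s3 is a dead state
}
START = "s0"
FINALS = {"s0", "s2"}


def accepts(string, transitions=TRANSITIONS, start=START, finals=FINALS):
    """Run the automaton on `string` and report whether it ends in a final state."""
    state = start
    for symbol in string:
        state = transitions[(state, symbol)]
    return state in finals
```

The dictionary doubles as the state table of this automaton: one row per state, one column per input symbol.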
Deterministic vs. Nondeterministic Finite-State Automata
• The finite-state automata discussed so far are deterministic, since for each pair of state and input value there is a unique next state given by the transition function.
• There is another important type of finite-state automaton in which there may be several possible next states for each pair of state and input value.
• Such machines are called nondeterministic.
• Nondeterministic finite-state automata are important in determining which languages can be recognized by a finite-state automaton.
What is a Nondeterministic Finite-State Automaton?
• A nondeterministic finite-state automaton M = (S, I, f, $s_0$, F) consists of
  • a finite set S of states,
  • a finite input alphabet I,
  • a transition function f that assigns a set of states to each pair of state and input,
  • an initial state $s_0$, and
  • a subset F of S consisting of final states.
How to represent a Nondeterministic Finite-State Automaton?
• Using a state table: for each pair of state and input value we give a list of possible next states.
• Using a state diagram: include an edge from each state to all possible next states, labelling edges with the input(s) that lead to this transition.
What is the language recognized by a given Nondeterministic Finite-State Automaton?
• What does it mean for a nondeterministic finite-state automaton to recognize a string $x = x_1 x_2 \ldots x_k$?
  • $x_1$ takes the starting state $s_0$ to a set $S_1$ of states;
  • $x_2$ takes each of the states in $S_1$ to a set of states. Let $S_2$ be the union of these sets;
  • continue this process, including at each stage all states that can be obtained using
    • a state obtained at the previous stage and
    • the current input symbol;
  • the string x is recognized or accepted if there is a final state in the set of all states that can be obtained from $s_0$ using x.
• The language recognized by a nondeterministic finite-state automaton is the set of all strings recognized by this automaton.
Example
• Determine the language recognized by the nondeterministic finite-state automaton M shown in the following figure.
• Solution: $L(M) = \{ 0^n,\ 0^n 01,\ 0^n 11 \mid n = 0, 1, 2, \ldots \}$.
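The acceptance rule for nondeterministic automata can be simulated by tracking the set of all states reachable after each input symbol. The Python sketch below does this for a hypothetical four-state automaton constructed to recognize the language in the solution; the figure is not reproduced here, so the states `q0` to `q3` and their transitions are assumptions rather than the lecture's diagram.

```python
# Transition function: (state, input) -> set of possible next states.
# Missing entries mean "no next state".
NFA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q2"},
    ("q1", "1"): {"q3"},
    ("q2", "1"): {"q3"},
}
NFA_START = "q0"
NFA_FINALS = {"q0", "q3"}


def nfa_accepts(string, transitions=NFA, start=NFA_START, finals=NFA_FINALS):
    """Track every state reachable on `string`; accept if any of them is final."""
    current = {start}
    for symbol in string:
        current = {t for s in current for t in transitions.get((s, symbol), set())}
    return bool(current & finals)
```

The variable `current` plays the role of the sets $S_1, S_2, \ldots$ in the definition of recognition.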
An Important Fact
• Theorem: If the language L is recognized by a nondeterministic finite-state automaton $M_0$, then L is also recognized by a deterministic finite-state automaton $M_1$.
• Two finite-state automata are called equivalent if they recognize the same language.
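The slide states the theorem without proof. One standard way to obtain an equivalent deterministic automaton is the subset construction, in which each deterministic state is a set of nondeterministic states. The Python sketch below illustrates that technique on a small hypothetical automaton; the names `determinize`, `nfa_accepts`, and `dfa_accepts` are assumptions for illustration.

```python
# A small hypothetical nondeterministic automaton over {0, 1}.
NFA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q2"},
    ("q1", "1"): {"q3"},
    ("q2", "1"): {"q3"},
}
NFA_START, NFA_FINALS, ALPHABET = "q0", {"q0", "q3"}, "01"


def determinize(nfa, alphabet, start, finals):
    """Subset construction: build a DFA whose states are sets of NFA states."""
    start_set = frozenset([start])
    dfa, seen, todo = {}, {start_set}, [start_set]
    while todo:
        subset = todo.pop()
        for symbol in alphabet:
            target = frozenset(t for s in subset for t in nfa.get((s, symbol), ()))
            dfa[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A subset is final if it contains at least one final NFA state.
    dfa_finals = {subset for subset in seen if subset & finals}
    return dfa, start_set, dfa_finals


def nfa_accepts(string):
    current = {NFA_START}
    for symbol in string:
        current = {t for s in current for t in NFA.get((s, symbol), set())}
    return bool(current & NFA_FINALS)


DFA, DFA_START, DFA_FINALS = determinize(NFA, ALPHABET, NFA_START, NFA_FINALS)


def dfa_accepts(string):
    state = DFA_START
    for symbol in string:
        state = DFA[(state, symbol)]
    return state in DFA_FINALS
```

The empty set of states acts as a dead state in the resulting automaton, and comparing `nfa_accepts` with `dfa_accepts` on many strings is a simple empirical check of equivalence.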
Build an FSA from a Regular Grammar
• Suppose that G = (V, T, S, P) is a regular grammar generating the set L(G), where each production is of the form $S \to \lambda$, $A \to a$, or $A \to aB$, with a being a terminal symbol and A and B being nonterminal symbols.
• We can build a nondeterministic finite-state machine M = (S, I, f, $s_0$, F) that recognizes L(G).
M = (S, I, f, $s_0$, F)
• The set of states S contains a state $s_A$ for each nonterminal symbol A of G, and an additional final state $s_F$;
• the start state $s_0$ is the state formed from the start symbol S;
• a transition from $s_A$ to $s_F$ on input a is included if $A \to a$ is a production;
• a transition from $s_A$ to $s_B$ on input a is included if $A \to aB$ is a production;
• $s_0$ will also be a final state if $S \to \lambda$ is a production.
• It can be shown that L(M) = L(G).
Example
• Construct a nondeterministic finite-state automaton that recognizes the language generated by the regular grammar G = (V, T, S, P), where
  • V = {0, 1, A, S},
  • T = {0, 1}, and
  • the productions in P are $S \to 1A$, $S \to 0$, $S \to \lambda$, $A \to 0A$, $A \to 1A$, and $A \to 1$.
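The construction rules can be applied mechanically to this example. In the Python sketch below, each nonterminal doubles as the name of its state, `"sF"` stands for the additional final state $s_F$, and $\lambda$ is written as the empty string; these naming choices and the function names are assumptions made for illustration.

```python
PRODUCTIONS = [("S", "1A"), ("S", "0"), ("S", ""),
               ("A", "0A"), ("A", "1A"), ("A", "1")]


def grammar_to_nfa(productions, start):
    """Build (transitions, start state, final states) from a regular grammar."""
    transitions = {}
    finals = {"sF"}                      # the additional final state
    for lhs, rhs in productions:
        if rhs == "":                    # S -> lambda: the start state is final
            finals.add(lhs)
        elif len(rhs) == 1:              # A -> a: transition from s_A to s_F on a
            transitions.setdefault((lhs, rhs), set()).add("sF")
        else:                            # A -> aB: transition from s_A to s_B on a
            transitions.setdefault((lhs, rhs[0]), set()).add(rhs[1])
    return transitions, start, finals


TRANSITIONS, START, FINALS = grammar_to_nfa(PRODUCTIONS, "S")


def accepts(string):
    """Simulate the nondeterministic automaton by tracking all reachable states."""
    current = {START}
    for symbol in string:
        current = {t for s in current for t in TRANSITIONS.get((s, symbol), set())}
    return bool(current & FINALS)
```

For this grammar the automaton accepts the empty string, the string 0, and every string that begins and ends with 1 and has length at least two.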
Construct a Regular Grammar from an FSA
• Suppose that M = (S, I, f, $s_0$, F) is a finite-state machine with the property that $s_0$ is never the next state for a transition.
• A regular grammar G = (V, T, S, P) can be defined as follows:
  • V is formed by assigning a symbol to each state of S and each input symbol in I;
  • T is formed from the input symbols in I;
  • S is the symbol formed from the start state $s_0$;
  • the set P of productions is formed from the transitions in M:
    • $A_s \to a$ is included if the state s goes to a final state under input a, where $A_s$ is the nonterminal symbol formed from s;
    • $A_s \to a A_t$ is included if the state s goes to t under input a;
    • $S \to \lambda$ is included if and only if $\lambda \in L(M)$.
• It can be shown that L(G) = L(M).
Example
• Find a regular grammar that generates the language recognized by the finite-state automaton shown in the following figure.
• Solution: G = (V, T, S, P), where
  • V = {S, A, B, 0, 1}, and the symbols S, A, and B correspond to the states $s_0$, $s_1$, and $s_2$, respectively;
  • T = {0, 1};
  • S is the start symbol; and
  • the productions are $S \to 0A$, $S \to 1B$, $S \to 1$, $S \to \lambda$, $A \to 0A$, $A \to 1B$, $A \to 1$, $B \to 0A$, $B \to 1B$, $B \to 1$.
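The same construction can be run as code. Since the figure is not reproduced here, the automaton in this Python sketch is an assumption inferred from the productions in the solution (states `s0` and `s2` final, `s1` not final); it is not taken from the original diagram, and the function name `fsa_to_grammar` is hypothetical.

```python
# Deterministic transitions inferred from the solution: (state, input) -> next state.
TRANSITIONS = {
    ("s0", "0"): "s1", ("s0", "1"): "s2",
    ("s1", "0"): "s1", ("s1", "1"): "s2",
    ("s2", "0"): "s1", ("s2", "1"): "s2",
}
START = "s0"
FINALS = {"s0", "s2"}
NAMES = {"s0": "S", "s1": "A", "s2": "B"}   # nonterminal assigned to each state


def fsa_to_grammar(transitions, start, finals, names):
    """Return the productions as (left side, right side) pairs; lambda is ''."""
    productions = set()
    for (state, symbol), target in transitions.items():
        productions.add((names[state], symbol + names[target]))   # A_s -> a A_t
        if target in finals:
            productions.add((names[state], symbol))               # A_s -> a
    if start in finals:                                            # lambda is in L(M)
        productions.add((names[start], ""))                        # S -> lambda
    return productions


GRAMMAR = fsa_to_grammar(TRANSITIONS, START, FINALS, NAMES)
```

Under these assumptions the ten productions produced match the ones listed in the solution.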
More Powerful Types of Machines (1)
• The main limitation of finite-state automata is their finite amount of memory. This prevents them from recognizing languages that are not regular, such as $\{ 0^n 1^n \mid n = 0, 1, 2, \ldots \}$.
• A more powerful model of computation, called a pushdown automaton, can be used to recognize the above language.
• Theorem: A set is recognized by a pushdown automaton if and only if it is the language generated by a context-free grammar.
• However, there are sets that cannot be expressed as the language generated by a context-free grammar. One such set is $\{ 0^n 1^n 2^n \mid n = 0, 1, 2, \ldots \}$.
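The extra power of a pushdown automaton comes from its stack, which gives unbounded memory with last-in, first-out access. The Python sketch below recognizes $\{ 0^n 1^n \}$ in that spirit: it pushes one marker per 0 and pops one per 1. It is an informal illustration with a hypothetical function name, not a formal pushdown automaton definition.

```python
def accepts_0n1n(string):
    """Accept exactly the strings 0^n 1^n, using a stack as the only unbounded memory."""
    stack = []
    reading_ones = False            # control state: have we started reading 1s?
    for symbol in string:
        if symbol == "0":
            if reading_ones:        # a 0 after a 1 is never allowed
                return False
            stack.append("0")       # remember one 0
        elif symbol == "1":
            reading_ones = True
            if not stack:           # more 1s than 0s
                return False
            stack.pop()             # match this 1 against a remembered 0
        else:
            return False            # symbol outside the alphabet {0, 1}
    return not stack                # every 0 must have been matched
```

A finite-state automaton cannot do this, because it would need a separate state for every possible number of pending 0s.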
More Powerful Types of Machines (2)
• There is an even more powerful class of machines than pushdown automata, called linear bounded automata, which
  • can recognize context-sensitive languages such as the set $\{ 0^n 1^n 2^n \mid n = 0, 1, 2, \ldots \}$; but
  • cannot recognize all the languages generated by phrase-structure grammars.
• The most general model of a computing machine is the Turing machine, which can
  • recognize all languages generated by phrase-structure grammars;
  • model all the computations that can be performed on a computing machine.
Future: Scientists vs. Engineers
• Scientists try to understand what is.
• Engineers try to create what has never been!
• The really great engineers have a strong background in science, so that they thoroughly understand what is.
• These special people also have the imagination to create what has never been, and this is what really sets them apart!
• The methodology of engineering research:
  • there exists some phenomenon of nature for which a model should be found;
  • mathematical analysis is just a tool that helps one to find this model;
  • the results of any analysis should be confirmed by experiments.
• Future: what you make it to be!