Formal syntax and related matters

This sentence is true.

This sentence is false.

The sentence “This sentence is false.” is not really well-formed: we must distinguish between the language (blue, in quotes) and the meta-language (black).
For example, there is nothing paradoxical in:

    “Noun phrase” is a noun phrase.
Consider the following collection of words:

    “John”, “Mary”, “loves”, “bad”, “dog”, “hates”, “nice”, “children”

One might want to construct simple sentences out of these words, by putting them together. For example:

    “Mary loves bad dog”
    “bad dog hates nice children”
    “John Mary dog bad loves nice”

Hmm. Apparently not all sequences of words will do. What is needed is some consideration of the syntax of English sentences: they must be constructed according to certain rules. Such a collection of rules is called a grammar.
A very simple grammar, for a very small subset of English, could look like this:

    Sentence = NounPhrase Verb NounPhrase .

This means that a sentence consists of a noun phrase, followed by a verb, followed by a noun phrase. Sentence is being defined here: it is on the left-hand side (LHS) of the equality sign. The symbols on the right-hand side (RHS) form the definition. The rule ends with a period.
    NounPhrase = [ Adjective ] Noun .

A noun phrase is either an adjective followed by a noun, or just a noun. The square brackets mean that their contents are optional.
    Adjective = “bad” | “nice” .

An adjective is either the word “bad” or the word “nice”. The vertical bar separates alternatives. The double quotes enclose actual symbols (words) that would appear in a sentence. Technically, these are called terminal symbols (i.e., symbols of the defined language). The other symbols, such as Sentence, NounPhrase etc., are called nonterminal symbols (i.e., symbols of the meta-language being used to define the language).
The complete grammar then reads:

    Sentence    = NounPhrase Verb NounPhrase .
    NounPhrase  = [ Adjective ] Noun .
    Adjective   = “bad” | “nice” .
    Verb        = “loves” | “hates” .
    Noun        = ProperName | “dog” | “children” .
    ProperName  = “John” | “Mary” .
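Because this grammar has no repetition, the language it defines is finite, and one way to get a feel for it is to enumerate every sentence. The following is a minimal sketch; the names `noun_phrases`, `sentences` and `ALL_SENTENCES` are illustrative and not part of the lecture.

```python
from itertools import product

# Terminal symbols, grouped by the rule that introduces them.
ADJECTIVES = ["bad", "nice"]
VERBS = ["loves", "hates"]
PROPER_NAMES = ["John", "Mary"]
NOUNS = PROPER_NAMES + ["dog", "children"]  # Noun = ProperName | "dog" | "children" .


def noun_phrases():
    """Yield every string derivable from NounPhrase = [ Adjective ] Noun ."""
    for adjective in [None] + ADJECTIVES:  # None models the omitted optional part
        for noun in NOUNS:
            yield noun if adjective is None else f"{adjective} {noun}"


def sentences():
    """Yield every string derivable from Sentence = NounPhrase Verb NounPhrase ."""
    phrases = list(noun_phrases())
    for subject, verb, obj in product(phrases, VERBS, phrases):
        yield f"{subject} {verb} {obj}"


ALL_SENTENCES = set(sentences())
print(len(ALL_SENTENCES))                      # 12 noun phrases * 2 verbs * 12 noun phrases
print("Mary loves bad dog" in ALL_SENTENCES)
print("John Mary dog bad loves nice" in ALL_SENTENCES)
```

Enumeration only works here because the grammar is so small; as soon as a rule allows repetition, the language becomes infinite and a recognizer is needed instead.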
Here is an example of a sentence:

    Sentence
        NounPhrase
            Noun
                ProperName
                    Mary
        Verb
            loves
        NounPhrase
            Adjective
                bad
            Noun
                dog

However, we would not be able to construct such a tree for

    John Mary dog bad loves nice
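Constructing such a tree can be mechanized. The sketch below is one possible hand-written parser for the six rules of the toy grammar; it returns the tree as nested tuples, or `None` when no tree exists. The function names and the tuple representation are assumptions made for illustration.

```python
ADJECTIVES = {"bad", "nice"}
VERBS = {"loves", "hates"}
PROPER_NAMES = {"John", "Mary"}
COMMON_NOUNS = {"dog", "children"}


def parse_noun_phrase(words, i):
    """Parse NounPhrase = [ Adjective ] Noun . starting at words[i].

    Returns (tree, next_index), or None if no noun phrase starts at i.
    """
    children = []
    if i < len(words) and words[i] in ADJECTIVES:  # the optional part
        children.append(("Adjective", words[i]))
        i += 1
    if i < len(words) and words[i] in PROPER_NAMES:
        children.append(("Noun", ("ProperName", words[i])))
    elif i < len(words) and words[i] in COMMON_NOUNS:
        children.append(("Noun", words[i]))
    else:
        return None
    return ("NounPhrase", *children), i + 1


def parse_sentence(text):
    """Parse Sentence = NounPhrase Verb NounPhrase . and return its tree or None."""
    words = text.split()
    first = parse_noun_phrase(words, 0)
    if first is None:
        return None
    subject, i = first
    if i >= len(words) or words[i] not in VERBS:
        return None
    verb = ("Verb", words[i])
    second = parse_noun_phrase(words, i + 1)
    if second is None:
        return None
    obj, end = second
    if end != len(words):  # leftover words mean the input is not a sentence
        return None
    return ("Sentence", subject, verb, obj)


print(parse_sentence("Mary loves bad dog"))
print(parse_sentence("John Mary dog bad loves nice"))  # None: no tree exists
```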
Here is another example:

    Sentence
        NounPhrase
            Noun
                children
        Verb
            loves
        NounPhrase
            Adjective
                nice
            Noun
                dog

The grammar accepts this sentence, even though the subject and the verb do not agree in number. So things get complicated rather quickly! We could extend our grammar to handle number, but this would get quite hairy quite soon. It is better to keep this simple context-free grammar, and to enforce contextual constraints, such as maintaining number, by other means.
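One possible reading of "by other means" is a separate check that runs after the grammar has accepted a sentence. The sketch below assumes hypothetical lookup tables `NOUN_NUMBER` and `VERB_NUMBER`, and assumes its input has already been accepted by the grammar, so the word just before the verb is the subject noun.

```python
# Hypothetical tables recording grammatical number; not part of the grammar itself.
NOUN_NUMBER = {
    "John": "singular",
    "Mary": "singular",
    "dog": "singular",
    "children": "plural",
}
VERB_NUMBER = {"loves": "singular", "hates": "singular"}


def subject_agrees(text):
    """Check subject-verb number agreement for a sentence the grammar accepts."""
    words = text.split()
    # The grammar guarantees exactly one verb, directly preceded by the subject noun.
    verb_index = next(i for i, word in enumerate(words) if word in VERB_NUMBER)
    subject_noun = words[verb_index - 1]
    return NOUN_NUMBER[subject_noun] == VERB_NUMBER[words[verb_index]]


print(subject_agrees("Mary loves bad dog"))       # True
print(subject_agrees("children loves nice dog"))  # False
```

Keeping the check separate leaves the grammar at six short rules; folding number into the grammar would need singular and plural variants of several rules.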
Of course, if we want to handle a reasonably complete subset of natural language, we have to take advantage not only of surface syntax (expressed by a context-free grammar) and simple contextual information, but also of usage etc. Above all, we must be able to figure out and maintain some information about meaning. This is very, very difficult…

The task of dealing with syntax is somewhat simpler for artificially constructed languages, such as programming languages. These are constructed with the explicit goal of making it relatively easy to distinguish “sentences” from meaningless jumbles of “words”.

But the question of meaning, i.e., “semantics”, is not simple: we will make a short detour into that area in another lecture.
We have already seen the general form of a grammar rule, how to distinguish terminal symbols from nonterminal symbols, and how to introduce alternatives (“|”) and optional constructs (“[]”).

There is also a way to specify constructs that can be repeated zero or more times: these are enclosed in curly braces (“{}”). For example:

    Digit  = “0” | “1” | “2” | “3” | “4” | “5” | “6” | “7” | “8” | “9” .
    Number = [ “+” | “–” ] Digit { Digit } .

A number begins with an optional sign. It must contain at least one digit, and that digit can be followed by any number of digits.

Note that the grammar does not specify the maximum size of an integer: that would be rather difficult to do, and would make the grammar unreadable. It is better to treat that issue separately.
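Each EBNF construct maps naturally onto a control structure: the optional part becomes an `if`, the repetition becomes a `while`. The following is a minimal sketch of a recognizer for Number written that way; the function name `is_number` is illustrative, and the ASCII hyphen `-` stands in for the “–” of the slides.

```python
DIGITS = "0123456789"  # Digit = "0" | "1" | ... | "9" .


def is_number(text):
    """Recognize Number = [ "+" | "-" ] Digit { Digit } ."""
    i = 0
    # [ "+" | "-" ]: the sign is optional, so consume it only if present.
    if i < len(text) and text[i] in ("+", "-"):
        i += 1
    # Digit: at least one digit is mandatory.
    if i >= len(text) or text[i] not in DIGITS:
        return False
    i += 1
    # { Digit }: zero or more further digits.
    while i < len(text) and text[i] in DIGITS:
        i += 1
    # Anything left over means the input is not a number.
    return i == len(text)


for sample in ["7", "+42", "-003", "", "+", "4a"]:
    print(repr(sample), is_number(sample))
```

Note that, like the grammar, this recognizer places no limit on the number of digits; a range check would be a separate step.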
We might now want to define the syntax of simple arithmetic expressions:

    Expression = Term { ( “+” | “–” ) Term } .

An expression is a number of terms separated by addition or subtraction operators. Note the use of parentheses to override the precedence of “|”.
    Term = Factor { ( “*” | “div” | “mod” ) Factor } .

Similarly, a term is a number of factors separated by multiplicative operators.
    Factor = Number | Identifier | “(” Expression “)” .

A factor can be a number, or the name of a variable, or an expression in parentheses!
For example,

    7 + ( a * ( b – 3)) – c

is an expression.
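The three rules for Expression, Term and Factor translate almost line by line into a recursive-descent recognizer, with one method per nonterminal. The sketch below rests on several assumptions: the slides do not define Identifier, so a letter followed by letters, digits or underscores is assumed; the ASCII `-` stands in for “–”; and a sign is tokenized separately, so whitespace between a sign and its digits is tolerated. All names are illustrative.

```python
import re

# One token: an unsigned digit string, a word (identifier, "div", "mod"), or a symbol.
TOKEN = re.compile(r"\s*([0-9]+|[A-Za-z][A-Za-z0-9_]*|[-+*()])")
KEYWORDS = {"div", "mod"}


def tokenize(text):
    """Split text into tokens; raise SyntaxError on any unexpected character."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def expression(self):
        # Expression = Term { ( "+" | "-" ) Term } .
        self.term()
        while self.peek() in ("+", "-"):
            self.take()
            self.term()

    def term(self):
        # Term = Factor { ( "*" | "div" | "mod" ) Factor } .
        self.factor()
        while self.peek() in ("*", "div", "mod"):
            self.take()
            self.factor()

    def factor(self):
        # Factor = Number | Identifier | "(" Expression ")" .
        token = self.take()
        if token in ("+", "-"):  # optional sign of a Number
            if not self.take()[0].isdigit():
                raise SyntaxError("digit expected after sign")
        elif token == "(":
            self.expression()
            if self.take() != ")":
                raise SyntaxError("')' expected")
        elif token[0].isdigit():
            pass  # unsigned Number
        elif token[0].isalpha() and token not in KEYWORDS:
            pass  # Identifier
        else:
            raise SyntaxError(f"unexpected token {token!r}")


def is_expression(text):
    """Return True if the whole text is derivable from Expression."""
    try:
        parser = Parser(tokenize(text))
        parser.expression()
        return parser.pos == len(parser.tokens)
    except SyntaxError:
        return False


print(is_expression("7 + ( a * ( b - 3)) - c"))  # True
print(is_expression("7 + * 3"))                  # False
```

The recursion in `factor` calling `expression` mirrors the recursion in the grammar, which is what allows parentheses to nest to any depth.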
Note that the syntax does not express everything: we need some contextual information. For example, if a variable v has been declared to be of type char, then we do not want to allow the expression

    v div 10
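A contextual check of this kind is usually done after parsing, using a table of declarations. The following sketch assumes a hypothetical tuple representation of expressions, such as `("div", ("var", "v"), ("num", 10))`, and a dictionary `declared` mapping variable names to type names; neither comes from the lecture.

```python
def type_of(node, declared):
    """Return the type of an expression tree, or raise TypeError if it is ill-typed.

    node is ("num", value), ("var", name) or (operator, left, right).
    declared maps each variable name to its declared type name.
    """
    kind = node[0]
    if kind == "num":
        return "integer"
    if kind == "var":
        return declared[node[1]]
    operator, left, right = node
    left_type = type_of(left, declared)
    right_type = type_of(right, declared)
    # Assumption: every arithmetic operator here requires integer operands.
    if left_type != "integer" or right_type != "integer":
        raise TypeError(f"{operator} needs integer operands, got {left_type} and {right_type}")
    return "integer"


declared = {"v": "char", "n": "integer"}
print(type_of(("div", ("var", "n"), ("num", 10)), declared))  # integer

try:
    type_of(("div", ("var", "v"), ("num", 10)), declared)
except TypeError as error:
    print("rejected:", error)
```

Both `n div 10` and `v div 10` are syntactically valid; only the declaration table tells them apart.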
The notation that we have seen here is called EBNF, which stands for Extended Backus-Naur Form.

The original BNF was applied in the Report on the Algorithmic Language Algol 60. John Backus was an American, considered the inventor of Fortran. Peter Naur was a Dane, the editor of the report.

BNF was based on the formalism introduced by Noam Chomsky, who was then a linguist.

BNF had no notation for optional or iterated constructs. For example, the rule for Number could be expressed as follows:

    <number> ::= <sign> <digit> <rest_of_number>
    <sign> ::= +
    <sign> ::= –
    <sign> ::=
    <rest_of_number> ::= <digit> <rest_of_number>
    <rest_of_number> ::=

Compare this with

    Number = [ “+” | “–” ] Digit { Digit } .

This particular form of EBNF was invented by Niklaus Wirth, and published in 1977.
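To see how BNF gets by with only alternatives, empty right-hand sides and recursion, the productions can be stored as plain data and interpreted by a small generic recognizer. This is a sketch only: the dictionary layout and the names `GRAMMAR`, `derive` and `matches_number` are assumptions, the rule for `<digit>` is filled in because the slide does not list it, and the ASCII `-` stands in for “–”. The approach works for this grammar because it has no left recursion.

```python
# Each nonterminal maps to a list of alternatives; each alternative is a list of symbols.
# An empty list is an empty right-hand side, as in "<sign> ::=".
GRAMMAR = {
    "number": [["sign", "digit", "rest_of_number"]],
    "sign": [["+"], ["-"], []],
    "rest_of_number": [["digit", "rest_of_number"], []],
    "digit": [[d] for d in "0123456789"],  # assumed; not shown on the slide
}


def derive(symbol, text, pos):
    """Yield every position that can be reached by matching symbol from pos."""
    if symbol not in GRAMMAR:  # a terminal symbol must match literally
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:
        positions = [pos]
        for part in alternative:
            # Advance every candidate position through the next symbol.
            positions = [q for p in positions for q in derive(part, text, p)]
        yield from positions


def matches_number(text):
    """Return True if the whole text is derivable from <number>."""
    return len(text) in derive("number", text, 0)


for sample in ["42", "+42", "-7", "", "+", "4a"]:
    print(repr(sample), matches_number(sample))
```

The recursive rule for `<rest_of_number>` plays the role that `{ Digit }` plays in EBNF, and the empty alternative for `<sign>` plays the role of the square brackets.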
Since EBNF is itself a small artificially constructed language, we can use it to express its own grammar!

    grammar = { production } .