Introduction to Cognitive Science Linguistics Component. Lecture 2 September 22, 2005. (2.00 p.m. – 3.50 p.m.) Venue: Meng Wah Complex Room 324 Lecturer: Dr. A. B. Bodomo Department of Linguistics <abbodomo@hku.hk>. Topic 3: Formal Grammar: Parsing and Generation. Introduction.
Introduction • In my previous lectures, we discussed how tacit linguistic knowledge can be represented at various levels of phonology, morphology, syntax, semantics, pragmatics, and their interfaces, including morphophonology, morphosyntax, and the syntax-semantics interrelationships. • In this lecture, we shall look closely at how these linguistic knowledge representations can be formalised into an algorithm, a computational procedure for processing this linguistic knowledge.
Keywords • Constituent structure rules • initial symbol • terminal symbol • non-terminal symbol • generative grammar • formal grammar
Formal devices and notation • The symbol ‘→’ indicates that a node is ‘rewritten as…’, ‘consists of’, or ‘has the constituents…’ • It is used in rewrite rules of the type: S → NP + VP, i.e. a sentence, S, has the constituents noun phrase (NP) and verb phrase (VP) • Optionality in the grammar is expressed as {X, Y}: apply either X or Y, but not both
Formal devices and notation (cont’d) • The symbol # indicates a constituent boundary, e.g. #_ is word-initial while _# is word-final • The notation X (Y) means that X is obligatory and may be followed by Y • Initial symbol: the symbol from which a rewrite rule begins (e.g. S) • Terminal symbol: an end symbol from which no constituent structure can be further developed (N, V, Art) • All other symbols are non-terminal symbols (e.g. NP, VP)
Two main aspects of grammatical information processing: generating and parsing sentences • Before we begin, let us illustrate with a simple grammar and lexicon, using the following sentence: • The students greeted the teacher.
Grammar: • S → NP + VP • VP → V + NP • NP → Art + N • Example sentence: The students greeted the teacher. • Lexicon 1: • greeted: V, _NP • students: N • the: Art • teacher: N • This grammar can also generate (i.e. produce) the following sentences: • The teacher greeted the students • The teacher scared the students • The child ate an apple • But you have to augment (i.e. increase) the lexicon as follows: • Lexicon 2: • an: Art • teacher: N • greeted: V, _NP • the: Art • students: N • scared: V, _NP • apple: N • ate: V, _NP • child: N
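As a rough illustration, the grammar and the two lexicons on this slide can be written down as plain Python data. The dictionary layout and the helper name `unknown_words` are assumptions made for this sketch, not part of the lecture; the point is only to show why Lexicon 1 has to be augmented before the extra sentences can be produced.

```python
# Phrase structure rules: each non-terminal maps to its ordered daughters.
GRAMMAR = {
    "S": ["NP", "VP"],
    "VP": ["V", "NP"],
    "NP": ["Art", "N"],
}

# Each word maps to (word class, subcategorization frame or None).
LEXICON1 = {
    "the": ("Art", None),
    "students": ("N", None),
    "teacher": ("N", None),
    "greeted": ("V", "_NP"),
}

# Lexicon 2 is Lexicon 1 plus the new entries listed on the slide.
LEXICON2 = {
    **LEXICON1,
    "an": ("Art", None),
    "apple": ("N", None),
    "child": ("N", None),
    "scared": ("V", "_NP"),
    "ate": ("V", "_NP"),
}


def unknown_words(sentence, lexicon):
    """Return the words of the sentence that the lexicon does not list."""
    words = sentence.lower().rstrip(".").split()
    return [w for w in words if w not in lexicon]


print(unknown_words("The child ate an apple", LEXICON1))  # ['child', 'ate', 'an', 'apple']
print(unknown_words("The child ate an apple", LEXICON2))  # []
```

With Lexicon 1 only "the" is known, so the sentence cannot be produced; with Lexicon 2 every word has an entry.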
Sentence generation: the algorithm • To produce a sentence we need three things: • A set of phrase structure rules (as illustrated above) • A lexicon (as illustrated above), and • A lexical insertion rule (as explained below) • A lexical insertion rule is an instruction to select the right word from a lexicon • The following is an example of a lexical insertion rule:
Lexical insertion rule • For each terminal symbol of a phrase structure rule, select a word from the lexicon that satisfies the following conditions: • It is a member of the class of the terminal symbol (e.g. N, V) • Its subcategorization frame matches that of the terminal symbol (e.g. V, _NP) • Attach this word as the daughter of this terminal symbol. • The set of rules above constitutes what is known as a sentence generator.
The whole procedure of beginning with an initial symbol, working through the phrase structure rules, and then adding the lexical items via lexical insertion rules is driven by an algorithm, i.e. a set of instructions. • Let us set out an algorithm for the generation (production) of the sentence ‘The students greeted the teacher’, given a grammar and a lexicon as follows:
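The two conditions of the lexical insertion rule can be sketched as a filter over the lexicon. The function names `candidates` and `lexical_insertion`, and the choice to represent a missing frame as `None`, are assumptions of this sketch.

```python
import random

# Lexicon 1: word -> (word class, subcategorization frame or None).
LEXICON = {
    "the": ("Art", None),
    "students": ("N", None),
    "teacher": ("N", None),
    "greeted": ("V", "_NP"),
}


def candidates(terminal, frame, lexicon):
    """Words whose class is the terminal symbol and whose frame matches."""
    return sorted(
        word
        for word, (word_class, word_frame) in lexicon.items()
        if word_class == terminal and word_frame == frame
    )


def lexical_insertion(terminal, frame, lexicon, choose=random.choice):
    """Attach one matching word as the daughter of the terminal symbol."""
    options = candidates(terminal, frame, lexicon)
    if not options:
        raise LookupError(f"no word of class {terminal} with frame {frame}")
    return (terminal, choose(options))


print(candidates("N", None, LEXICON))          # ['students', 'teacher']
print(lexical_insertion("V", "_NP", LEXICON))  # ('V', 'greeted')
```

Where several words qualify, as with N here, the rule itself does not say which one to pick; the sketch leaves that to the `choose` argument.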
Example sentence: The students greeted the teacher • Lexicon 1: • greeted: V, _NP • students: N • the: Art • teacher: N • Grammar: • PS Rule (a): S → NP + VP • PS Rule (b): VP → V + NP • PS Rule (c): NP → Art + N • Rule 1: Start with the initial symbol, S. • Rule 2: For every non-terminal symbol, X, find a phrase structure rule with X as the left-hand symbol and others as the right-hand symbol(s), and develop a rewrite rule with X as the mother and the right-hand symbols as ordered daughters. • Rule 3: Apply Rule 2 until all branches end in terminal symbols. • Rule 4: Apply the lexical insertion rule iteratively until every terminal symbol is replaced by a lexical item.
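Rules 1 to 4 can be sketched in a few lines of Python. This is a simplified illustration rather than a full generator: the names `expand` and `generate` are assumptions, the lexicon stores word classes only, and subcategorization frames are left out because the single VP rule always supplies the object NP.

```python
from itertools import product

# PS rules (a)-(c): mother -> ordered daughters.
GRAMMAR = {
    "S": ["NP", "VP"],
    "VP": ["V", "NP"],
    "NP": ["Art", "N"],
}

# Lexicon 1, reduced to word -> word class for this sketch.
LEXICON = {"the": "Art", "students": "N", "teacher": "N", "greeted": "V"}


def expand(symbol):
    """Rules 2 and 3: rewrite until only terminal symbols remain."""
    if symbol not in GRAMMAR:  # terminal symbol: nothing left to rewrite
        return [symbol]
    leaves = []
    for daughter in GRAMMAR[symbol]:
        leaves.extend(expand(daughter))
    return leaves


def generate():
    """Rule 1 starts at S; Rule 4 fills every terminal with a matching word."""
    terminals = expand("S")
    choices = [
        sorted(word for word, word_class in LEXICON.items() if word_class == t)
        for t in terminals
    ]
    return [" ".join(words) for words in product(*choices)]


print(expand("S"))  # ['Art', 'N', 'V', 'Art', 'N']
for sentence in generate():
    print(sentence)
```

With Lexicon 1 this enumerates four sentences, one of which is "the students greeted the teacher"; the others arise because both nouns fit either N slot.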
Illustrating the algorithm • [Figure: a tree built up in four panels labelled ‘Applying Rule 1’, ‘Applying Rule 2, 3’, ‘Applying Rule 3’ and ‘Applying Rule 4’. Final tree: [S [NP [Art The] [N professor]] [VP [V greeted] [NP [Art the] [N students]]]]]
From the above we can see that we started from an initial symbol and ended with terminal symbols that have lexical items as their daughters. A sentence has thus been generated (produced), and the derivation tells us how this sentence is built up. • Now, let us see how we can begin with an existing sentence and then break it down into its component parts by applying rules.
Sentence parsing: the algorithm • To parse a sentence means to analyse it into its constituent parts by the systematic application of lexical insertion rules and some phrase structure rules. • It is like the reverse process of generation.
Types of Parsing • Top-down: Begin with the symbol S. • Bottom-up: Begin with terminal symbols (words). Possible research: Which types of parsing in natural languages provide the most cognitively realistic and efficient parser?
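To make the top-down strategy concrete, here is a minimal recursive-descent sketch that begins with S and works down to the words. It assumes the small grammar and Lexicon 1 used earlier in the lecture; the function names and the tuple-based tree format are choices made for this sketch. Because each non-terminal has exactly one rule, no backtracking is needed, which would not hold for a realistic grammar.

```python
GRAMMAR = {
    "S": ["NP", "VP"],
    "VP": ["V", "NP"],
    "NP": ["Art", "N"],
}
LEXICON = {"the": "Art", "students": "N", "teacher": "N", "greeted": "V"}


def parse_top_down(symbol, words, pos=0):
    """Derive a prefix of words[pos:] from symbol.

    Returns (tree, next_position), or None if the derivation fails.
    """
    if symbol in GRAMMAR:
        children = []
        for daughter in GRAMMAR[symbol]:
            result = parse_top_down(daughter, words, pos)
            if result is None:
                return None
            subtree, pos = result
            children.append(subtree)
        return (symbol, children), pos
    # Terminal symbol: the next word must belong to this word class.
    if pos < len(words) and LEXICON.get(words[pos]) == symbol:
        return (symbol, words[pos]), pos + 1
    return None


def parse(sentence):
    """Return a tree rooted in S that covers the whole sentence, else None."""
    words = sentence.lower().rstrip(".").split()
    result = parse_top_down("S", words)
    if result is None or result[1] != len(words):
        return None
    return result[0]


print(parse("The students greeted the teacher."))
print(parse("greeted the teacher"))  # None
```

The second call fails because the parser expects an NP first and finds a verb instead.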
Some sentence parsing rules which constitute a PARSER • For a sentence, S: • Rule 1: Determine from the lexicon the word class of every item and develop a partial tree for each word, where the word class label dominates the word. • Rule 2: Find a PS rule of the type X → Y, Z where the right-hand symbols match some sequence of categories in the structure so far, and develop a partial tree with X as the mother and the right-hand symbols as ordered daughters. • Rule 3: Continue Rule 2 until the root, S, is reached and there are no unattached strings.
Example sentence: The man drank the tea. • Lexicon 1: • drank: V, _NP • man: N • the: Art • tea: N • Grammar: • PS Rule 1: S → NP + VP • PS Rule 2: VP → V + NP • PS Rule 3: NP → Art + N • [Figure, ‘Applying Rule 1’: [Art The] [N man] [V drank] [Art the] [N tea]] • [Figure, ‘Applying Rule 2’: [NP [Art The] [N man]] [V drank] [NP [Art the] [N tea]]]
[Figure, ‘Applying Rule 3’: first [NP [Art The] [N man]] [VP [V drank] [NP [Art the] [N tea]]], then [S [NP [Art The] [N man]] [VP [V drank] [NP [Art the] [N tea]]]]]
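The three parsing rules can be sketched as a simple bottom-up reducer. The function name `parse_bottom_up` and the tuple-based tree format are assumptions of this sketch. It reduces greedily, taking the first matching rule each time and never backtracking, which is enough for this small grammar but not for grammars in which a sequence of categories can be grouped in more than one way.

```python
# PS Rules 1-3 as (mother, ordered daughters).
GRAMMAR = [
    ("S", ["NP", "VP"]),
    ("VP", ["V", "NP"]),
    ("NP", ["Art", "N"]),
]
LEXICON = {"the": "Art", "man": "N", "tea": "N", "drank": "V"}


def parse_bottom_up(sentence):
    """Return a tree rooted in S, or None if the sentence cannot be parsed."""
    words = sentence.lower().rstrip(".").split()
    if any(word not in LEXICON for word in words):
        return None
    # Rule 1: one partial tree per word, word class dominating the word.
    forest = [(LEXICON[word], word) for word in words]
    # Rules 2 and 3: keep combining adjacent partial trees while a PS rule fits.
    changed = True
    while changed:
        changed = False
        for mother, daughters in GRAMMAR:
            n = len(daughters)
            for i in range(len(forest) - n + 1):
                labels = [tree[0] for tree in forest[i:i + n]]
                if labels == daughters:
                    forest[i:i + n] = [(mother, forest[i:i + n])]
                    changed = True
                    break
            if changed:
                break
    # Success only if a single tree rooted in S remains (no unattached strings).
    if len(forest) == 1 and forest[0][0] == "S":
        return forest[0]
    return None


print(parse_bottom_up("The man drank the tea."))
print(parse_bottom_up("drank the tea"))  # None
```

For "The man drank the tea" the reductions happen in the same order as in the illustration: the two NPs first, then the VP, then S.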
Conclusion • Parsing and generation of natural language data is a very important area of linguistics, especially in computer applications of natural language, which have become an important part of the computer and information processing industry.
Keywords • language acquisition • innateness hypothesis • language faculty / Language Acquisition Device (LAD) • literacy • levels of literacy • literacy acquisition
Introduction • Theme • A survey of how linguistic knowledge is acquired/learnt by speakers of a language, from the point of view of spoken language and from the point of view of literacy (reading and writing). • Objective • an understanding of the basic terms and issues in language and literacy acquisition • an interface approach: rather than rigidly treating language acquisition as separate and different from literacy acquisition, we will look at how language acquisition relates to literacy acquisition.
What is language acquisition? • Gleitman and Bloom 1999: 434: ‘refers to the process of attaining a specific variant of human language…the fundamental puzzle in understanding this process has to do with the open-ended nature of what is learned: children appropriately use words acquired in one context to make reference in the next, and they construct novel sentences to make known their changing thoughts and desires’ (in MIT Encyclopedia of the Cognitive Sciences). • Crystal 1997: 430: The process of learning a first language in children. The analogous process of gaining a foreign or second language.
Explaining how languages are acquired • In previous lectures we have tried to account for how all and only the grammatical sentences of a language are produced and represented in the brain of the speakers of a language. • However, a complete account of linguistic knowledge representation must address the issue of how we acquire a language as children and how we learn foreign languages as adults. • We will mainly be concerned with first language acquisition and not foreign language learning.
Stages of language development • the single word stage (12-18 months) • the language of the child consists of just a few isolated words of the target language, e.g. ‘mamma’, ‘daddy’, etc. • very little grammatical development • the grammar stage (19-29 months) • marked by the emergence of a few nominal and verbal inflections in languages that have these. • a few phrases and word utterances apparently strung together: ‘mammy, milk’; ‘daddy bye bye’, etc. • 30 months • can produce more adult-like speech: ‘Where's daddy?’ ‘Daddy, I want to go with you.’
Explaining language acquisition: • The reason for the uniformity and rapidity in child language acquisition is contained in the innateness hypothesis. • This is, at least, the position of Chomsky and most cognitive approaches to linguistic explanation. • In this hypothesis, language acquisition is determined by a biologically endowed innate language faculty (also called Language Acquisition Device (LAD)). • LAD or language learning ‘program’ in children’s brains provides them with a set of procedures (let us call it an ‘algorithm’ since we are computer/cognitive science inclined) for developing a grammar. • Input: linguistic experience they get from the parents and teachers.
The nature of the language faculty • Children can acquire any language as their native tongue. • e.g. a child of Cantonese speaking parents growing up in England can learn to speak perfect English as her native tongue. • Those aspects of language innately determined are universal • language faculty does not vary significantly from human to human An important aspect in the language faculty is the search for principles of Universal Grammar!
Universal Grammar (UG) • A theory of the human language faculty, i.e. a module of the mind/brain involved in the basic design of language (Noam Chomsky) • It is part of an innate biologically endowed language faculty, an innate mental organ specific to the human species • It allows us to perceive and interpret information governed by certain formal constraints • These formal constraints refer to a system of rules and representations and one of its operations (its grammar) by which the acceptable sentences of a language can be generated • Examples of formal universals, linguistic constraints of an abstract nature: the binding principles determining what can or cannot be the antecedent of an anaphoric, pronominal, or fully referential nominal element, etc.
Literacy Acquisition • Literacy: the ability to read, write and calculate basic numbers • Difficult to define: • can mean different things to different people in different areas: computer literacy, investment literacy, etc. • Is literacy part of our mental, cognitive faculty? • Yes, because any human can acquire literacy i.e. learn how to read, write and calculate basic numbers given the right environment
Levels of Literacy (cf. Stages of language acquisition) • 6 stages of reading (Daswani 1999) • Stages 1-3: Pre-reading, decoding, fluency (approx. grades 1 – 3) • Stage 4: Acquiring new knowledge (approx. grades 4 – 8) • Stage 5: Reading a range of complex materials critically (grades 9 – 12) • Stage 6: Mature reader: able to read for various purposes: professional, personal, civic (university and beyond)
The relationship between language and literacy acquisition • Traditional/historical view of child language acquisition: • learning to speak happens up to the age of five years, while learning to read happens after five. • Now they are seen as very intertwined i.e. very related: learning to speak and learning to be literate both deal with learning to use language • the basis of learning to speak has been outlined to provide an ecology for literacy. The most important lesson is that learning to speak and learning to read are very much interwoven.
Evidence of the interface of language and literacy acquisition • They are both part of learning to USE language. • Both need input from the environment. • can be compared with Vygotsky's idea of ZOPED, zone of proximal development, i.e. the distance between child initiative and ability of child to do things under the influence of parental support. • The learning environment: participants, situation, activity and a mechanism • Literacy acquisition is like language acquisition (cf. Givon's idea of literacy acquisition as a weak reflex of language acquisition). • Literacy is best acquired in a language one has acquired.
Conclusion • Literacy (reading and writing) is then another level/kind of linguistic knowledge representation. • Spoken and written linguistic knowledge representation interface with each other and are very intertwined. • Language and literacy acquisition have very important social, educational and cognitive implications. • Language and Literacy acquisition should therefore form an integral part of cognitive science.
References • David Barton. 1994. The roots of literacy. In Literacy: An Introduction to the Ecology of Written Language. Oxford UK and Cambridge USA: Blackwell. Chapter 9, pp. 130-139. • C. J. Daswani. 1999. Literacy. In Bernard Spolsky (ed.) 1999. Concise Encyclopedia of Educational Linguistics. Oxford: Elsevier Science Ltd. • Viv Edwards and David Corson (eds.) 1997. Encyclopedia of Language and Education, Volume 2: Literacy. Netherlands: Kluwer Academic Publishers. • Talmy Givon. 1998. The grammar of Literacy. In Syntaxis, 1, 1998: 1-40. • Elfrieda Hiebert. 1994. Literacy in preschool programs. In Alan C. Purves et al. (eds.) 1994. Encyclopedia of English Studies and Language Arts. New York: Scholastic. 754-756. • Ernest Lepore and Zenon Pylyshyn (eds.) 1999. What Is Cognitive Science. Blackwell Publishers. (especially chapters 10, 11, 12, and 13) • Neil Stillings and others. 1995. Cognitive Science: An Introduction. MIT Press. (especially chapters 6, 9, 10, and 11) • Daniel A. Wagner. 1994. Literacy: definitions. In Alan C. Purves et al. (eds.) 1994. Encyclopedia of English Studies and Language Arts. New York: Scholastic. 748-752. • R. Wilson and Frank C. Keil (eds.) 1999. The MIT Encyclopedia of the Cognitive Sciences. MIT Press. In particular: Lila Gleitman and Paul Bloom, Language Acquisition, pp. 434-438; David Olson, Literacy, pp. 481-482.
Tentative list of research topics for Cognitive Science students • Supervisor: Dr. Adams BODOMO (abbodomo@hku.hk) • Topics in Syntax: Theory, Description and Application • Building human language components in computational systems • The LFG treatment of serial verbs, complex predicates, and other verbal constructions in various languages: French, Norwegian, Japanese, Chinese, Dagaare, etc. • Topics in Language and Literacy as cognitive processes • Chinese writing and computer technology: survey and evaluation of various inputting systems • New forms and functions of language and literacy in the age of information technology (emails, ICQ, bulletin boards, mobile phone texting, etc.): a survey of SMS texting as a cognitive and communicative process in HK • The grammar of aphasic patients
Further studies - courses by Dr Bodomo • LING1002 - Language.com: Language in the Contemporary World (1st year undergraduate, co-taught with other staff members) • LING2011 - Language and Literacy in the Information Age • LING2032 - Syntactic Theory • LING2018 - Lexical-Functional Grammar • LING2041 - Language and Information Technology • LING2050 – Grammatical Description • LING2051 – French Syntax and Universal Grammar • Also consider B.A. in Human Language Technology (HLT) as an option for a minor
Take-home Quiz • Please submit your answers to your tutor on or before September 22, 2005.