Language Processing: Humans & Computer. Psycholinguistics & Computational Linguistics. Lauren Kafka, Marina Hamoy. August 3, 2006.
Psycholinguistics: • The area of linguistics that is concerned with linguistic performance–how we use our linguistic competence–in speech (or sign) production and comprehension.
The Speech Chain: Brain-to-Brain Linking • A spoken utterance starts as a message in the speaker’s brain/mind. • The message is put into linguistic form and interpreted as articulation commands. • It emerges as an acoustic signal. • The signal is processed by the listener’s ear and sent to the brain/mind, where it is interpreted.
Comprehension • One goal of psycholinguistics is to describe the processes people normally use in speaking and understanding language. • Breakdowns in performance such as “tip-of-the-tongue” phenomena, speech errors, and failure to comprehend tricky sentences tell us a lot about how language is processed.
Can you think of any of your own? • Examples of when some word was on the tip-of-your-tongue, but you couldn’t think of it • Speech errors (Hung go) • Failure to comprehend tricky sentences • http://www.zippyvideos.com/5589295543497276/time_out-1/original
Speech Sounds: Understanding Begins with Hearing • Sound is produced whenever there is a disturbance in the position of air molecules. • Acoustic phonetics is concerned only with speech sounds, all of which can be heard by the normal human ear.
Frequency, Pitch & Volume • The speed of the variations of air pressure determines the fundamental frequency of sounds. • This is perceived by the hearer as pitch. • The magnitude, or intensity, of the variations determines the loudness of the sound.
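The relationship between frequency and pitch, and between intensity and loudness, can be illustrated with a small sketch. It is only an illustration under simplifying assumptions: a pure sine tone stands in for a speech sound, frequency is estimated by counting upward zero crossings, and RMS amplitude stands in for intensity. The function names and the 8000 Hz sample rate are hypothetical choices, not part of the original slides.

```python
import math

def make_tone(freq_hz, amplitude, sample_rate=8000, duration_s=1.0):
    """Return samples of a pure sine tone (a simplified stand-in for a voiced sound)."""
    n = int(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

def estimate_frequency(samples, sample_rate=8000):
    """Estimate frequency (Hz) by counting upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)

def rms(samples):
    """Root-mean-square amplitude, a rough proxy for intensity (heard as loudness)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

low_quiet = make_tone(freq_hz=100, amplitude=0.2)
high_loud = make_tone(freq_hz=220, amplitude=0.8)

# The faster variation is heard as higher pitch; the larger variation as louder.
print(estimate_frequency(low_quiet), rms(low_quiet))
print(estimate_frequency(high_loud), rms(high_loud))
```

Real speech is far more complex than a single sine wave, so zero-crossing counts are only a rough guide to fundamental frequency outside this toy setting.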
Speech Perception • The speech signal can be broken into strings of: • Phonemes • Syllables • Morphemes • Words • Phrases
Context & Lexical Access • Whether a listener hears “night rate” or “nitrate” depends on context • Understanding the meaning of words depends on lexical access (word recognition) • Example: A sniggle blick is procking a slar. • If you don’t recognize the words, you conclude that the sentence is nonsense.
Lexical Semantics • Processing speech to get at the meaning of what is said requires syntactic analysis as well as knowledge of lexical semantics. • Stress and intonation provide some clues to syntactic structure. Example: He lives in the white house. He lives in the White House. • Loudness, pitch, and duration of syllables provide information about meaning.
Timing & Rhythm • I vant to sock your blut. • Ivan tsuckyour blut. • Ted Koppel gave an address. • Ted Koppel gave Ann a dress. • Can you think of two sentences that include the same letters or sounds, but differ in timing, rhythm, and meaning?
Language Analysis & Computer Technology • Machine translation (MT) • Between natural languages • Analysis of authentic materials • Communication between people & computers • Artificial intelligence (AI) • World Wide Web (WWW) • Research in linguistic theories
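Frequency Analysis • Corpus: a collection (~1M words) of spoken or written language data gathered for linguistic research or analysis • Frequency analysis: counting how often each word occurs • Spoken American English (SAE): the 10 most frequent words make up about 30% of the total: and, the, to, that, of, a, I, you, it & know • Written American English (WAE): the 10 most frequent words make up about 25%: the (7%), of, and, to, a, that, in, is, was & he • English prepositions (except to) are more frequent in WAE • Profane/taboo words are more frequent in SAE • http://textalyser.net/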
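A frequency analysis of this kind can be sketched in a few lines. This is a minimal illustration, assuming a tiny made-up sample text instead of a real corpus and a deliberately simple tokenizer; the function names are hypothetical.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word occurs (lowercased, letters and apostrophes only)."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)

def top_share(counts, n=10):
    """Return (word, count, share of all tokens) for the n most frequent words."""
    total = sum(counts.values())
    return [(word, count, count / total) for word, count in counts.most_common(n)]

# Made-up sample; a real study would read a corpus of about a million words.
sample = "The cat sat on the mat and the dog sat on the rug."
counts = word_frequencies(sample)

for word, count, share in top_share(counts, n=3):
    print(f"{word}: {count} ({share:.0%})")
```

Summing the shares of the ten most frequent words in a large corpus gives the kind of percentage quoted on the slide.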
Concordance Analysis • http://www.dundee.ac.uk/english/wics/wics.htm
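A concordance lists every occurrence of a word together with its surrounding context (often called keyword-in-context, or KWIC). The sketch below is one simple way to build such a listing, assuming whitespace tokenization and a made-up sample text; it is not the method used by the tool linked above, and the function name is hypothetical.

```python
def concordance(text, keyword, width=3):
    """Return (left context, match, right context) for each occurrence of keyword."""
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        # Ignore surrounding punctuation and case when comparing.
        if token.strip(".,;:!?\"'").lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

sample = "He lives in the white house. She painted the house white."
for left, match, right in concordance(sample, "house", width=2):
    print(f"{left:>15} [{match}] {right}")
```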
Collocation Analysis • Two or more words that customarily occur together • http://esl.about.com/library/vocabulary/blcollocation_1.htm
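One rough way to look for collocations is to count how often pairs of adjacent words occur together. The sketch below assumes a tiny made-up text and raw counts only; real collocation studies typically use large corpora and statistical association measures, which are not shown here.

```python
import re
from collections import Counter

def bigram_counts(text):
    """Count adjacent word pairs (sentence boundaries are ignored in this sketch)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(zip(words, words[1:]))

sample = "She made a decision. He made a decision too, then made a mistake."
counts = bigram_counts(sample)

# Pairs that recur are candidate collocations, e.g. "made a" + "decision".
for pair, count in counts.most_common(3):
    print(" ".join(pair), count)
```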
Information Retrieval: WWW • Search engines • Databases • http://www.language-archives.org/index.html • Prevent spammers from harvesting your e-mail address from an active (clickable) e-mail link by generating the link with simple JavaScript code
Data Mining • Information extraction using keyword queries • Typical applications: customer profiling, fraud detection, credit risk analysis, promotion evaluation • Norway to Wal-Mart: We don't want your shares - Pension-fund investing with a social consciousness. • Intelligence obtained by applying data mining to a database of French theses on the subject of Brazil
Machine Translation • “There's a message coming through, captain - TRANSLATION SOFTWARE, the science-fiction dream of a machine that understands any language, has taken a step closer to reality.” • http://www.gutenberg.org/etext/6737 free download of literature
Computational Phonetics & Phonology • Computers programmed to produce synthetic speech by following a ‘recipe’ of electronic blending • Speech recognition • Speech synthesis • Text-to-speech (TTS) difficulties • More than 300 heteronyms: read [reed] & [red] • Inconsistent spelling: tough, bough, cough, dough
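The two TTS difficulties above can be made concrete with a toy lookup table. This is only an illustrative sketch: the table is hand-built for the words on the slide, pronunciations are indicated by a rhyming word instead of phonetic transcription, and the names are hypothetical.

```python
# Each word maps to the word(s) it rhymes with; more than one entry
# means the spelling alone does not determine the pronunciation.
RHYMES_WITH = {
    "read": ["need", "bed"],   # present tense vs. past tense
    "lead": ["need", "bed"],   # verb vs. the metal
    "tough": ["stuff"],
    "bough": ["cow"],
    "cough": ["off"],
    "dough": ["go"],
}

def is_heteronym(word):
    """True if the toy table lists more than one pronunciation for the spelling."""
    return len(RHYMES_WITH.get(word.lower(), [])) > 1

def rhymes_for_ending(ending):
    """Collect the listed pronunciations of all words with a given spelling ending."""
    return {w: r for w, r in RHYMES_WITH.items() if w.endswith(ending)}

print(is_heteronym("read"))
print(rhymes_for_ending("ough"))
```

A heteronym needs context (for example, tense or part of speech) to be spoken correctly, and the four "-ough" words show why a single letter-to-sound rule for that ending cannot work.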
Computational Morphology • Computers need to understand the interweaving of rules, exceptions, morphemes & word structure • Computer’s dictionary: lists morphological forms and needs continual updating • Form predictability: impossible for compounding, e.g. sky + box = skybox • Component morphemes • Monomorphemic or not: resent [reZENT] vs. re + sent [Resent] • Heteronyms: lead [leed] & [led]
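The interplay of rules and exceptions can be sketched with English plural formation. This is a simplified illustration, assuming a tiny hand-written exception list and a few spelling rules; it does not cover all of English, and the names are hypothetical.

```python
# Exceptions are checked first; they must be listed in the dictionary
# because no rule predicts them.
IRREGULAR_PLURALS = {"child": "children", "mouse": "mice", "sheep": "sheep"}

def pluralize(noun):
    """Form a plural: look up exceptions, then fall back to spelling rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if noun.endswith("y") and noun[-2:-1] not in "aeiou":
        return noun[:-1] + "ies"   # consonant + y: city -> cities
    return noun + "s"

for word in ["cat", "box", "city", "day", "child", "skybox"]:
    print(word, "->", pluralize(word))
```

The rules handle a new compound such as skybox without any dictionary update, while irregular forms still depend on the exception list being kept current.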
Computational Syntax: ELIZA • ELIZA: one of the first human-machine conversation programs, created by J. Weizenbaum • Used typed (printed) text and syntactic patterns to simulate a psychiatric session • Circuit Fix-It Shop (NCSU & DU): a spoken-dialogue program that helps a technician repair a circuit • Capable of understanding & speaking complex utterances • Computer parser
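The general idea behind an ELIZA-style exchange, matching a pattern in the user's typed sentence and echoing part of it back in a canned reply, can be sketched as follows. The rules and replies here are hypothetical examples written for this illustration; they are not taken from Weizenbaum's program.

```python
import re

# Each rule pairs a pattern with a reply template; {0} is filled with
# the text captured from the user's sentence. All rules are illustrative.
RULES = [
    (re.compile(r"\bi am (.+)", re.I), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.I), "How long have you felt {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.I), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."

def respond(utterance):
    """Return the reply for the first matching rule, or a neutral default."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1))
    return DEFAULT_REPLY

for line in ["I am sad.", "I feel tired", "My mother dislikes me", "Hello"]:
    print(">", line)
    print(respond(line))
```

A program like this has no understanding of the sentence; it only rearranges surface patterns, which is why a system that must act on what is said also needs a parser.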
References • http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2001T02 • http://www.language-archives.org/index.html • http://www.gutenberg.org/etext/6737 • http://www.nsknet.or.jp/~peterr-s/concordancing/usingconcs.html • www.otal.umd.edu/SHORE2001/crossLang/index.html • http://www.dundee.ac.uk/english/wics/wics.htm • http://textalyser.net/ • http://www.zippyvideos.com/5589295543497276/time_out-1/original