Introduction to Computational Linguistics Misty Azara
Agenda • Introduction to Computational Linguistics (CL) • Common CL applications • Using CL in theoretical linguistics (computational modeling)
What is Computational Linguistics? • CL is interdisciplinary • Linguistics • Computer Science • Mathematics • Electrical Engineering • Psychology • Speech and Hearing Science
What is Computational Linguistics? • Computational Linguistics covers many areas • Essentially, CL is any task, model, algorithm, etc. that attempts to place any type of language processing (syntax, phonology, morphology, etc.) in a computational setting
Core Areas of CL • Machine Translation • Speech Recognition • Text-to-Speech • Natural Language Generation • Human-Computer Dialogs • Information Retrieval • Computational Modeling …
Machine Translation Using computers to automate some or all of translating from one language to another
Three general models or tasks: • Tasks for which a rough translation is adequate • Tasks where a human post-editor can be used to improve the output • Tasks limited to a small sublanguage
Machine Translation (cont.) • Linguistic knowledge is extremely useful in this area of CL • MT benefits from knowledge of language typology and language-specific linguistic information
Speech Recognition Taking spoken language as input and outputting the corresponding text
Architecture • SR takes the source speech and produces “guesses” as to which words could correspond to the source via some type of acoustic model • The word with the highest probability is selected as the optimal candidate
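The selection step above can be sketched in a few lines. This is a minimal illustration, not a real recognizer: the candidate words and their probabilities are made-up values standing in for the output of an acoustic model, and `best_candidate` is a hypothetical helper name.

```python
# Hypothetical acoustic-model output: candidate words for one stretch of
# speech, each with an assumed probability (values invented for illustration).
candidates = {"beach": 0.55, "peach": 0.30, "speech": 0.15}

def best_candidate(scores):
    """Return the candidate word with the highest probability."""
    return max(scores, key=scores.get)

print(best_candidate(candidates))
```

Real systems typically combine acoustic scores with other knowledge sources before choosing; the sketch only shows the final "pick the most probable word" step described on the slide.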
Why use SR? • Allow for hands-free human-computer interaction
Text-to-Speech Taking text as input and outputting the corresponding spoken language
Three types of TTS • Articulatory- models the physiological characteristics of the vocal tract • Concatenative- uses pre-recorded segments to construct the utterance(s)
Three types of TTS (cont.) • Parametric/Formant- models the formant transitions of speech [baj]
Why is TTS so difficult? • Spelling • through, rough • Homographs • PERmit (n) vs. perMIT (v) • Prosody • Pitch, duration of segments, phrasing of segments, intonational tune, emotion “I am so angry at you. I have never been more enraged in my life!!”
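One way to see why words like permit are hard: the spelling alone does not determine the pronunciation, so the system needs extra information such as part of speech. The sketch below assumes a tiny hand-built lexicon keyed on (word, part of speech); the `LEXICON` table and the `pronounce` function are hypothetical, and a real TTS front end would need a tagger to supply the part of speech.

```python
# Hypothetical pronunciation lexicon: (spelling, part of speech) -> stress pattern.
LEXICON = {
    ("permit", "noun"): "PERmit",
    ("permit", "verb"): "perMIT",
}

def pronounce(word, pos):
    """Look up the stress pattern; fall back to the plain spelling if unknown."""
    return LEXICON.get((word.lower(), pos), word)

print(pronounce("permit", "noun"))
print(pronounce("permit", "verb"))
```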
Why use TTS? • Allows for text to be read automatically • Extremely useful for the visually impaired
Natural Language Generation Constructing linguistic outputs from non-linguistic inputs
Natural Language Generation • Maps meaning to text • Nature of the input varies greatly from one application to another (e.g., documenting the structure of a computer program) • The job of the NLG system is to extract the necessary information to drive the generation process
NLG systems have to make choices: • Content selection- the system must choose the appropriate content from the input, basing its decision on a pre-specified communicative goal • Lexical selection- the system must choose the lexical item most appropriate for expressing a concept
Sentence Structure • Aggregation- the system must apportion the content into phrase, clause, and sentence-sized chunks • Referential expression- the system must determine how to refer to the objects under discussion (not a trivial task)
Discourse structure- many NLG systems have to deal with multi-sentence discourses, which must have a coherent structure
Sample NLG output To save a file 1. Choose save from the file menu 2. Choose the appropriate folder 3. Type the file name 4. Click the save button The system will save the document. …
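Output like the sample above could come from a very simple template-based generator. The sketch below is only an illustration under stated assumptions: the `task` structure (a goal plus a list of action/object/location steps) is a hypothetical non-linguistic input, and `generate` is an invented function, not the system that produced the slide's example.

```python
# Hypothetical non-linguistic input: a goal and a list of
# (action, object, location) steps; the format is assumed for illustration.
task = {
    "goal": ("save", "a file"),
    "steps": [
        ("choose", "save", "from the file menu"),
        ("choose", "the appropriate folder", ""),
        ("type", "the file name", ""),
        ("click", "the save button", ""),
    ],
}

def generate(task):
    """Turn the structured task into a title line plus numbered instructions."""
    verb, obj = task["goal"]
    lines = [f"To {verb} {obj}"]
    for number, (action, target, place) in enumerate(task["steps"], start=1):
        # Skip empty slots so steps without a location still read naturally.
        parts = (action.capitalize(), target, place)
        phrase = " ".join(part for part in parts if part)
        lines.append(f"{number}. {phrase}")
    return "\n".join(lines)

print(generate(task))
```

A template approach like this sidesteps most of the choices listed earlier (lexical selection, aggregation, referring expressions), which is exactly why fuller NLG systems are harder to build.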
Human-Computer Dialogs Uses a mix of SR, TTS, and pre-recorded prompts to achieve some goal
Human-Computer Dialogs • Uses speech recognition, or a combination of SR and touch tone as input to the system • The system processes the spoken information and outputs appropriate TTS or pre-recorded prompts
Dialog systems have specific tasks, which limit the domain of conversation • This makes the SR problem much easier, as the potential responses become very constrained
Sample dialog system for banking … Sys: would you like information for checking or savings? User: Checking, please. Sys: Your current balance is $2,568.92. Would you like another transaction? User: Yes, has check #2431 cleared? …
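The benefit of a constrained domain can be sketched as follows: at the "checking or savings" prompt, the system only needs to find one of two expected words among the recognizer's guesses. The `EXPECTED` set and the `interpret` function are hypothetical names, and the list of hypotheses stands in for real recognizer output.

```python
# Hypothetical set of responses the banking prompt expects at this point.
EXPECTED = {"checking", "savings"}

def interpret(hypotheses):
    """Return the first expected word found in the recognizer's
    hypotheses (tried in order), or None if nothing matches."""
    for words in hypotheses:
        for word in words.lower().replace(",", " ").split():
            if word in EXPECTED:
                return word
    return None

print(interpret(["check in please", "Checking, please."]))
```

Because anything outside the expected set is ignored, a misrecognition such as "check in" does not derail the dialog as long as a later hypothesis contains an expected word.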
Linguistic knowledge in dialog systems • Discourse structure- ensuring natural flowing discourse interaction • Building appropriate vocabularies/lexicons for the tasks • Ensuring prosodic consistencies (e.g., questions sound like questions and spliced prompts sound continuous)
Why use human-computer systems? • Automate simple tasks- no need for a teller to be on the other end of the line! • Allow access to system information from anywhere, via the telephone
Information Retrieval Storage, analysis, and retrieval of text documents
Information Retrieval • Most current IR systems are based on some interpretation of compositional semantics • IR is the core of web-based searching, e.g., Google, AltaVista, etc.
Information Retrieval Architecture • User inputs a word or string of words • System processes the words and retrieves documents corresponding to the request
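The two steps above (process the query words, retrieve matching documents) can be sketched with a small inverted index. This is one common way to organize retrieval, shown here as an assumption rather than as the architecture of any particular search engine; the toy documents and the `build_index` and `search` functions are invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy collection; ids and texts are made up for illustration.
DOCS = {
    1: "speech recognition turns speech into text",
    2: "machine translation between languages",
    3: "text to speech for screen readers",
}

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return sorted ids of documents containing every query word."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)

index = build_index(DOCS)
print(search(index, "text speech"))
```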
“Bag of Words” • The dominant approach to IR systems is to ignore syntactic information and process the meaning of individual words only • Thus, “I see what I eat” and “I eat what I see” would mean exactly the same thing to the system!
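The slide's example is easy to check directly. In the minimal sketch below, a sentence is reduced to word counts with word order discarded; `bag_of_words` is a hypothetical helper, and lowercasing plus whitespace splitting is an assumed (very simplified) tokenization.

```python
from collections import Counter

def bag_of_words(text):
    """Reduce a sentence to word counts, discarding word order."""
    return Counter(text.lower().split())

a = bag_of_words("I see what I eat")
b = bag_of_words("I eat what I see")
print(a == b)  # True: both sentences have identical word counts
```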
Linguistic Knowledge in IR • Semantics • Compositional • Lexical • Syntax (depending on the model used)
Computational Modeling Computational approaches to problem solving, modeling, and development of theories
How can we use computational modeling? • Test our theories of language change, synchronic or diachronic • Develop working models of language evolution • Model speech perception, production, and processing • Almost any theoretical model can have a computational counterpart
Why Use Computational Modeling? • Forces explicitness – no black boxes or behind the scenes “magic” • Allows for modeling that would otherwise be impossible • Allows for modeling that would otherwise be unethical
Conclusions • CL applications utilize linguistic knowledge from all of the major subfields of theoretical linguistics • Computational modeling can aid linguists’ theories of language processing and structure