LING 388: Language and Computers Sandiway Fong Lecture 1: 8/23
Administrivia • Where • CCIT 309 • When • TR 3:45-5:00PM • Tuesday/Thursday September 6-8th • (no class: week of Labor Day) • Thursday November 24th • (no class: Thanksgiving) • Office Hours • TR 5:15-6:15PM (after class) • Other times by appointment • Location: Douglass 308 (Linguistics)
Administrivia • Map • Classroom (CCIT) • Office (Douglass)
Administrivia • Email: • sandiway@email.arizona.edu • Homepage: • http://dingo.sbs.arizona.edu/~sandiway • Lecture powerpoint slides: • available on homepage after each class • in both ppt and Adobe PDF formats • slides from last two years available online • caution: there will be changes from last year
Administrivia • Tips on how to take this class • No required textbook • save time • suggested readings on request • Lecture slides contain everything you need to know in order to do the homeworks • To understand the slides, you need to attend classes to “grok” the concepts • Unclear on something? • You are encouraged to ask questions in or after class • Ask while the question is still fresh in your mind • Have an idea, want to go over some of the material again, or have more in-depth questions? • Office hours • Make an appointment
Administrivia • Course Objectives • Theoretical • Introduction to natural language processing techniques • Practical • Be able to write a natural language grammar that runs on a computer • Get an idea of what’s hard and what’s easy to do on a computer
Administrivia • Laboratory Exercises • Some lectures will be laboratory sessions • 50/50 lecture/exercises on the computer in class • Homework questions will be handed out in these sessions • Homework questions are designed to continue the lab exercises • You may do the homework exercises on your own computer or at the computer laboratory
Administrivia • Grading • 6 (approx.) homeworks + 1 final “take-home” exam • Homeworks are due at midnight, 1 week after the date they are handed out • Homeworks must be submitted by email • You may discuss the homeworks with your classmates • However, you must do the work and write them up independently • If you use sources, e.g. online, you must acknowledge them • Cheaters will be sanctioned
Administrivia • Homework tips • Homeworks are based on lab exercises • make sure you show up for the lab lectures • Possible time-saving strategy: Stay on after the lecture and do the homework questions right there • exercises are fresh in your mind • may even be possible to complete the homework in an hour right there … • Nightmare strategy: Wait until the evening the homework is due, scratch your head over the lecture notes, have tons of questions and start panicking • your computer crashes, the net goes down …
Administrivia • Late Policy • All homeworks are mandatory • deduction if handed in late • You must schedule a meeting with me and explain • Upcoming Emergencies • Must let me know ahead of time or as soon as you can
Administrivia • Homework Disaster Policy • You “tank” on a homework • do badly or way worse than you expected • Schedule a meeting • What are your options? • 1: there are always extra credit questions to bump your score back up • Chances to demonstrate you really knew the material well • 2: final exam will re-test you on the areas covered • 2nd chance • Philosophy • You are not penalized for learning or making an unfortunate mistake
Administrivia • Fill out form to be passed out • Name • Email • Year • Major • Why are you interested in computers and language? • Relevant background
Natural Language Processing (NLP) / Human Language Technology (HLT) / Computational Linguistics • Question: • How to process natural languages on a computer • Intersects with: • Computer science (CS) • Mathematics/Statistics • Artificial intelligence (AI) • Linguistic Theory • Psychology: Psycholinguistics • e.g. the human sentence processor
Applications • Information retrieval • information is stored and accessed using language (keywords etc.) • document classification (email, news) • Machine translation • babelfish • http://babelfish.altavista.com/ • Google • Language Comprehension • document summarization • Speech • automated 800 toll-free directory (800 555 1212) • cellphones (handsfree dialing) • car navigation (voice-synthesized directions)
Applications • technology is still in development • computers can’t really understand language (yet) • see Babelfish or Google webpage translation • well, it’s free! • even if we are willing to pay... • machine translation has been worked on since the 1950s, after World War II • still not perfected today • why? • what are the properties of human languages that make this hard?
Natural Language Properties • Which ones are going to be difficult for computers to deal with? • Grammar (Rules for putting words together into sentences) • How many rules are there? • 100, 1000, 10000, more … • Portions learnt or innate • Do we have all the rules written down somewhere? • Lexicon (Dictionary) • How many words do we need to know? • 1000, 10000, 100000 …
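To make the grammar/lexicon split on this slide concrete, here is a minimal sketch in Python (the course itself will use Prolog). The three rules, the five words, and the names `GRAMMAR`, `LEXICON`, `parse` and `is_sentence` are all illustrative assumptions, not material from the lecture. A real grammar would need far more rules and words, which is exactly the question the slide raises.

```python
# Toy grammar: each category rewrites as a sequence of categories.
# Rules are hypothetical and deliberately tiny (no recursion).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Name"]],
    "VP": [["V", "NP"], ["V"]],
}

# Toy lexicon: each word maps to the set of categories it can have.
LEXICON = {
    "john": {"Name"},
    "the": {"Det"},
    "report": {"N"},
    "filed": {"V"},
    "slept": {"V"},
}


def parse(category, words, start):
    """Yield every end position e such that words[start:e] is a `category`."""
    if category in GRAMMAR:
        for rhs in GRAMMAR[category]:
            # Thread the set of reachable positions through the rule body.
            positions = {start}
            for child in rhs:
                positions = {e for p in positions for e in parse(child, words, p)}
            yield from positions
    elif start < len(words) and category in LEXICON.get(words[start], ()):
        # Lexical category: consume exactly one matching word.
        yield start + 1


def is_sentence(text):
    """True if the whole word sequence can be analysed as an S."""
    words = text.lower().split()
    return len(words) in set(parse("S", words, 0))


print(is_sentence("John filed the report"))
print(is_sentence("filed the John"))
```

The first call succeeds and the second fails under these rules. Adding a word means one new lexicon entry; adding a construction means a new rule, and the number of rules needed for real English is an open question.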
Computers vs. Humans • Knowledge of language • Computers are way faster than humans • They kill us at arithmetic and chess • But human beings are so good at language, we often take our ability for granted • Processed without conscious thought • Do pretty complex things
Examples • Knowledge • Which report did you file without reading? • (Parasitic gap sentence)
Examples • Changes in interpretation • John is too stubborn to talk to • John is too stubborn to talk to Bill
Examples • Ambiguity • Where can I see the bus stop? • stop: verb or part of the noun-noun compound bus stop • Context (Discourse or situation)
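The lexical side of this ambiguity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mini-lexicon, its category labels, and the function name `tag_sequences` are hypothetical, and every word except "stop" is given a single category for simplicity.

```python
from itertools import product

# Hypothetical mini-lexicon: only "stop" is listed as ambiguous (noun or verb).
LEXICON = {
    "where": ["Wh"],
    "can": ["Aux"],
    "i": ["Pron"],
    "see": ["V"],
    "the": ["Det"],
    "bus": ["N"],
    "stop": ["N", "V"],
}


def tag_sequences(sentence):
    """Return every way of assigning one category to each word."""
    words = sentence.lower().rstrip("?").split()
    options = [LEXICON[w] for w in words]
    # The Cartesian product enumerates one choice per word.
    return [list(zip(words, tags)) for tags in product(*options)]


readings = tag_sequences("Where can I see the bus stop?")
print(len(readings))  # 2
for reading in readings:
    print(reading[-2:])  # the categories chosen for "bus" and "stop"
```

The lexicon alone cannot choose between the two readings; as the slide notes, that takes context. With more ambiguous words per sentence, the number of candidate sequences multiplies.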
Examples • Ungrammaticality • *Which book did you file the report without reading? • * = ungrammatical • relative • ungrammatical vs. incomprehensible
Examples • The human parser has quirks • Ian told the man that he hired a story • Ian told the man that he hired a secretary • Garden-pathing • Temporary ambiguity • tell: someone something vs. …
Examples • More subtle differences • The reporter who the senator attacked admitted the error • The reporter who attacked the senator admitted the error • Processing time • Subject vs. object relative clauses • Q: Do we want to mimic the human parser completely?
Next time … • We will begin by gently introducing you to a programming language you will become familiar with • Two lectures • Name: PROLOG • Variant: SWI-PROLOG (free software) • Download: http://www.swi-prolog.org/ • Based on logic • “Natural” and easy to learn but powerful • Contains lots of nifty built-in features for writing grammars • language was originally designed for this purpose