
CS 124/LINGUIST 180 From Languages to Information

CS 124/LINGUIST 180 From Languages to Information. Dan Jurafsky, Stanford University. Introduction and Course Overview. What this course is about: automatically extracting meaning and structure from natural language text, speech, web pages, and social networks (and other networks).


Presentation Transcript


  1. CS 124/LINGUIST 180 From Languages to Information • Dan Jurafsky • Stanford University • Introduction and Course Overview

  2. What this course is about • Automatically extracting meaning and structure from: • Natural language text • Speech • Web pages • Social networks (and other networks) • Genome sequences

  3. Commercial World • Lots of exciting stuff going on…

  4. Question Answering: IBM’s Watson

  5. Information Extraction and Sentiment Analysis • http://www.bing.com/search?q=canon+powershot&go=&form=QBLH&qs=n • Sentiment analysis • Attribute detection • Relation extraction

  6. Sentiment • Emotional Spell Check • New York Times “10 big ideas of 2010” • http://video.nytimes.com/video/2010/12/15/magazine/1248069422438/emotional-spell-check.html?scp=1&sq=emotional%20spell%20check&st=cse

  7. Blog Analytics • Data-mining of blogs, discussion forums, message boards, user groups, and other forms of user generated media • Product marketing information • Political opinion tracking • Social network analysis • Buzz analysis (what’s hot, what topics are people talking about right now).

  8. Livejournal.com: I, me, my on or after Sep 11, 2001 Cohn, Mehl, Pennebaker. 2004. Linguistic markers of psychological change surrounding September 11, 2001. Psychological Science 15, 10: 687-693. Graph from Pennebaker slides

  9. September 11 LiveJournal.com study: We, us, our Cohn, Mehl, Pennebaker. 2004. Linguistic markers of psychological change surrounding September 11, 2001. Psychological Science 15, 10: 687-693. Graph from Pennebaker slides

  10. Machine Translation • Helping human translators • Fully automatic • Source text: 这 不过 是 一 个 时间 的 问题 . • Translation from Stanford’s Phrasal: This is only a matter of time.

  11. Google Translate • Fried ripe plantains: • http://laylita.com/recetas/2008/02/28/platanos-maduros-fritos/

  12. Information Extraction • Email: Subject: curriculum meeting; Date: January 15, 2012; To: Dan Jurafsky; “Hi Dan, we’ve now scheduled the curriculum meeting. It will be in Gates 159 tomorrow from 10:00-11:30. -Chris” • Extracted calendar entry: Event: Curriculum mtg; Date: Jan-16-2012; Start: 10:00am; End: 11:30am; Where: Gates 159 • Create new Calendar entry
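The email-to-calendar example above can be roughly imitated with a few hand-written patterns. This is only a minimal sketch: the function name `extract_event`, the regular expressions, and the rule that "tomorrow" means one day after the sent date are all assumptions made for illustration, not the method taught in the course.

```python
import re
from datetime import datetime, timedelta

def extract_event(sent_date, body):
    """Pull a date, time range, and room out of a short email (hypothetical helper)."""
    sent = datetime.strptime(sent_date, "%B %d, %Y")
    # Assumption: "tomorrow" is relative to the date the email was sent.
    if re.search(r"\btomorrow\b", body, re.IGNORECASE):
        day = sent + timedelta(days=1)
    else:
        day = sent
    # A time range such as "10:00-11:30".
    times = re.search(r"(\d{1,2}:\d{2})\s*-\s*(\d{1,2}:\d{2})", body)
    # A location of the form "in <Capitalized word> <number>", e.g. "in Gates 159".
    place = re.search(r"\bin ([A-Z]\w+ \d+)", body)
    return {
        "date": day.strftime("%Y-%m-%d"),
        "start": times.group(1) if times else None,
        "end": times.group(2) if times else None,
        "where": place.group(1) if place else None,
    }

email_body = ("Hi Dan, we've now scheduled the curriculum meeting. "
              "It will be in Gates 159 tomorrow from 10:00-11:30.")
event = extract_event("January 15, 2012", email_body)
print(event)
```

Patterns like these break as soon as the wording changes, which is one reason the later slides turn to probabilistic models and trained classifiers.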

  13. Computational Biology: Finding Genes • [Figure: gene structure, 5’ to 3’, labeled with start codon ATG, Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, splice sites, and stop codon TAG/TGA/TAA] • Pictures from Serafim Batzoglou

  14. Computational Biology: Comparing Sequences • Unaligned sequences: AGGCTATCACCTGACCTCCAGGCCGATGCCC and TAGCTATCACGACCGCGGTCGATTTGCCCGAC • Aligned (gaps shown as “-”):
      -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
      TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
      • Sequence comparison is key to • Finding genes • Determining function • Uncovering the evolutionary processes • Slide material from Serafim Batzoglou
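One common way to produce a gapped alignment like the one on this slide is global alignment by dynamic programming (Needleman-Wunsch). The sketch below is illustrative only: the scoring values (match +1, mismatch -1, gap -1) and the function name `global_align` are assumptions, and the slide does not say which method or scores produced its alignment, so the output need not match it.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Return (aligned_a, aligned_b, score) for a global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

seq1 = "AGGCTATCACCTGACCTCCAGGCCGATGCCC"
seq2 = "TAGCTATCACGACCGCGGTCGATTTGCCCGAC"
top, bottom, total = global_align(seq1, seq2)
print(top)
print(bottom)
print("score:", total)
```

Changing the gap penalty relative to the mismatch penalty shifts the result toward more gaps or more mismatched columns.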

  15. Ambiguity • Resolving ambiguity is a crucial goal throughout string and language processing

  16. Ambiguity • Find at least 5 meanings of this sentence: • I made her duck

  17. Ambiguity • Find at least 5 meanings of this sentence: • I made her duck • I cooked waterfowl for her benefit (to eat) • I cooked waterfowl belonging to her • I created the (plaster?) waterfowl she owns • I caused her to quickly lower her head or body • I waved my magic wand and turned her into undifferentiated waterfowl

  18. Ambiguity is Pervasive • I caused her to quickly lower her head or body • Syntactic category: “duck” can be a Noun or Verb • I cooked waterfowl belonging to her. • Syntactic category: “her” can be a possessive (“of her”) or dative (“for her”) pronoun • I made the (plaster) duck statue she owns • Word meaning: “make” can mean “create” or “cook”

  19. Ambiguity is Pervasive • Grammar: “make” can be: • Transitive: (verb has a noun direct object) • I cooked [waterfowl belonging to her] • Ditransitive: (verb has 2 noun objects) • I made [her] (into) [undifferentiated waterfowl] • Action-transitive (verb has a direct object + verb) • I caused [her] [to move her body]

  20. Ambiguity is Pervasive: Phonetics!!!!! • I mate or duck • I’m eight or duck • Eye maid; her duck • Aye mate, her duck • I maid her duck • I’m aid her duck • I mate her duck • I’m ate her duck • I’m ate or duck • I mate or duck

  21. Why else is natural language understanding difficult? • Non-standard English: Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥ • Segmentation issues: the New York-New Haven Railroad • Idioms: dark horse; get cold feet; lose face; throw in the towel • Neologisms: unfriend; Retweet; bromance • World knowledge: Mary and Sue are sisters. Mary and Sue are mothers. • Tricky entity names: Where is A Bug’s Life playing …; Let It Be was recorded …; … a mutation on the for gene … • But that’s what makes it fun!

  22. Making progress on this problem… • The task is difficult! What tools do we need? • Knowledge about language • Knowledge about the world • A way to combine knowledge sources • How we generally do this: • probabilistic models built from language data • P(“maison” → “house”) high • P(“L’avocat général” → “the general avocado”) low • Luckily, rough text features can often do half the job.
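One simple way to get probabilities like these from data is to count how often each source word is paired with each target word and normalize. The sketch below is illustrative only: the aligned word pairs are invented toy data, and `translation_probs` is a hypothetical helper; real translation models learn the alignments themselves rather than being handed them.

```python
from collections import Counter, defaultdict

def translation_probs(aligned_pairs):
    """Relative-frequency estimate of P(target | source) from (source, target) pairs."""
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1
    return {
        source: {target: c / sum(targets.values()) for target, c in targets.items()}
        for source, targets in counts.items()
    }

# Invented toy data: each pair is one observed word alignment.
pairs = (
    [("maison", "house")] * 3 + [("maison", "home")]
    + [("avocat", "lawyer")] * 3 + [("avocat", "avocado")]
)
probs = translation_probs(pairs)
print(probs["maison"]["house"])    # 0.75
print(probs["avocat"]["avocado"])  # 0.25
```

With more data, a rare pairing such as “avocat” with “avocado” stays possible but receives low probability, which is the behavior the slide describes.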

  23. Models • Finite state machines • Markov models • Alignment models • Genome alignment • Alignment of sentence in L1 to sentence in L2 • Alignment of text to speech • Vector space model of IR • Network models

  24. Dynamic Programming • Don’t do the same work over and over. • Avoid this by building and making use of solutions to sub-problems that must be invariant across all parts of the space. • Minimum Edit Distance • The Viterbi Algorithm • Baum-Welch/Forward-Backward • (In parsing: CKY, Earley, charts, etc)
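Minimum edit distance illustrates the idea of reusing sub-problem solutions: each table cell is computed once from three neighboring cells. The sketch below assumes unit insertion and deletion costs and a configurable substitution cost; the function name and the default cost are choices made here for illustration, not taken from the slides.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimum number of edits to turn source into target (insert/delete cost 1)."""
    n, m = len(source), len(target)
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                            # deletion
                dist[i][j - 1] + 1,                            # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost) # match or substitution
            )
    return dist[n][m]

print(min_edit_distance("intention", "execution"))              # 5
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8
```

The table has (n+1) × (m+1) cells and each is filled in constant time, so the running time is proportional to the product of the two string lengths.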

  25. Machine Learning • Machine learning based classifiers that are trained to make decisions based on features extracted from the context • Simple Classifiers: • Naïve Bayes • Decision Trees • Sequence Models: • Hidden Markov Models • Maximum Entropy Markov Models • Conditional Random Fields
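As a concrete instance of a simple classifier trained on features from text, here is a minimal Naïve Bayes sketch using word counts with add-one smoothing. The training sentences and labels are invented toy data, and the function names are hypothetical; skipping words unseen in training is one common simplification, assumed here.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs is a list of (list_of_words, label) pairs; returns the counts needed to classify."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify(words, model):
    """Return the label with the highest log prior plus summed log likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in words:
            if word not in vocab:
                continue  # assumption: ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy training data.
training = [
    ("great camera love it".split(), "pos"),
    ("terrible battery hate it".split(), "neg"),
    ("love the great lens".split(), "pos"),
    ("hate the terrible flash".split(), "neg"),
]
model = train_naive_bayes(training)
print(classify("great lens".split(), model))
print(classify("terrible battery".split(), model))
```

Working in log space keeps the product of many small probabilities from underflowing.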

  26. Course logistics in brief • Instructor: Dan Jurafsky • TAs: Leon Lin, Robin Melnick, Evan Rosen, Alden Timme, Adam Vogel • Time: TuTh 9:30-10:45, Braunlec • Requirements: • Online Video Lectures with embedded quizzes • Homeworks: In Java or Python • Online Review Exercises • Final Exam • Class sessions: • Tuesdays: Discussions/Guest Lectures • Thursdays: Open group working hours

  27. Overview of the course • http://cs124.stanford.edu
