
Computational Language






Presentation Transcript


  1. Computational Language Andrew Hippisley

  2. Computational Language • Computational language and AI • Language engineering: applied computational language • Case study: spell checkers

  3. Computational language & AI • Artificial Intelligence: “the simulation on computer of distinctly human mental functions.” Wilks (1993)

  4. Computational language & AI • Language integral to intelligent systems • Artificial Intelligence • Turing Test • ELIZA

  5. ELIZA

  6. Computational language & AI • Why language engineering? • Language integral to intelligent systems • Artificial Intelligence • Turing Test • ELIZA • Expert systems: natural language interface, natural language database

  7. Computational language & AI • Methods shared across systems • Finite State Transition Networks (FSTN) • Logic • Formal rules • Probability • Data: you know it!
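Of the methods listed, the finite state transition network is the easiest to show in a few lines. The sketch below is a minimal FSTN recogniser in Python for a hypothetical toy language (a /b/, two or more /a/s, then /!/, as in /baa!/ or /baaaa!/); the state numbers and the `TRANSITIONS` table are illustrative assumptions, not taken from the slides.

```python
# Minimal FSTN recogniser for a hypothetical toy language:
# "b", then two or more "a"s, then "!"  (e.g. "baa!", "baaaa!").
# Arcs: (current state, input symbol) -> next state
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # loop: any number of further a's
    (3, "!"): 4,
}
FINAL_STATES = {4}


def accepts(string: str, start: int = 0) -> bool:
    """Return True if consuming the whole string leaves the network in a final state."""
    state = start
    for symbol in string:
        if (state, symbol) not in TRANSITIONS:
            return False  # no arc for this symbol: reject
        state = TRANSITIONS[(state, symbol)]
    return state in FINAL_STATES


if __name__ == "__main__":
    for s in ["baa!", "baaaa!", "ba!", "baa"]:
        print(s, accepts(s))
```

Because each step depends only on the current state and the next symbol, recognition takes a single left-to-right pass over the string.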

  8. Applied computational language • History of the field • Machine Translation: 1960, 1966, post 1966 • Database access • Text interpretation • Information retrieval • Text categorisation

  9. Language engineering • Information overload • Need a way of automatically processing text documents • Information extraction

  10. Language engineering • Information extraction • GIDA: system for automatically monitoring financial market sentiment

  11. GIDA

  12. Language engineering • Information overload • Need a way of automatically processing text documents • Information extraction • Summarisation

  13. Automatic summarisation (courtesy of Paulo FERNANDES de OLIVEIRA, PhD) • Get information source; • Extract some content from it; • Present the most important part to the user

  14. Lexical Cohesion Links Example Sentence 23: J&J's stock added 83 cents to $65.49. Sentence 15: "For the stockmarket this move was so deeply discounted that I don't think it will have a major impact". Sentence 26: Flagging stockmarkets kept merger activity and new stock offerings on the wane, the firm said. Sentence 42: Lucent, the most active stock on the New York Stock Exchange, skidded 47 cents to $4.31, after falling to a low at $4.30. Text title: U.S. stocks hold some gains. Collected from Reuters’ Website on 20 March 2002.

  15. Lexical Cohesion Bonds Example 17. In other news, Hewlett-Packard said preliminary estimates showed shareholders had approved its purchase of Compaq Computer -- a result unconfirmed by voting officials. 19. In a related vote, Compaq shareholders are expected on Wednesday to back the deal, catapulting HP into contention against International Business Machines for the title of No. 1 computer company. Text title: U.S. stocks hold some gains. Collected from Reuters’ Website on 20 March 2002.
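The link and bond examples can be approximated mechanically. The Python sketch below counts only simple repetition of the same lowercased word form as a link (a simplification: it misses pairs such as /vote/ and /voting/, or /Hewlett-Packard/ and /HP/), ignores a small hypothetical stopword list, and treats two sentences as bonded when they share at least three links. The threshold, the stopword list and the function names are assumptions, not values given on the slides.

```python
import re
from itertools import combinations

# Hypothetical stopword list: function words ignored when counting links.
STOPWORDS = {"a", "an", "the", "in", "of", "on", "to", "for", "by", "and",
             "its", "had", "are", "into", "against", "no"}


def content_words(sentence: str) -> set:
    """Lowercased alphabetic word forms, minus stopwords."""
    return set(re.findall(r"[a-z]+", sentence.lower())) - STOPWORDS


def links(s1: str, s2: str) -> set:
    """Links by simple repetition: content words occurring in both sentences."""
    return content_words(s1) & content_words(s2)


def bonds(sentences, threshold: int = 3):
    """Index pairs of sentences sharing at least `threshold` links (assumed value)."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(sentences), 2)
            if len(links(a, b)) >= threshold]


s17 = ("In other news, Hewlett-Packard said preliminary estimates showed shareholders "
       "had approved its purchase of Compaq Computer -- a result unconfirmed by voting officials.")
s19 = ("In a related vote, Compaq shareholders are expected on Wednesday to back the deal, "
       "catapulting HP into contention against International Business Machines "
       "for the title of No. 1 computer company.")

if __name__ == "__main__":
    print(sorted(links(s17, s19)))  # ['compaq', 'computer', 'shareholders']
    print(bonds([s17, s19]))        # [(0, 1)]
```

Sentences 17 and 19 share /Compaq/, /computer/ and /shareholders/, so under these assumptions they form a bond; a richer notion of link (stemming, synonyms, name variants) would find more.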

  16. Language engineering • Information overload • Need a way of automatically processing text documents • Information extraction • Summarisation • Translation • Retrieve only relevant documents • Voice processing

  17. Language engineering • Two main approaches • Symbolic • Stochastic

  18. Case study spell checkers

  19. Spelling dictionaries • aim? • given a sequence of symbols: • 1. identify misspelled strings • 2. generate a list of possible ‘candidate’ correct strings • 3. select most probable candidate from the list

  20. Spelling dictionaries • Implementation: • Probabilistic framework • Bayesian rule • noisy channel model

  21. Spelling dictionaries • Types of spelling error • actual word errors • non-word errors

  22. Spelling dictionaries • Types of spelling error • actual word errors • /piece/ instead of /peace/ • /there/ instead of /their/ • non-word errors

  23. Spelling dictionaries • Types of spelling error • actual word errors • /piece/ instead of /peace/ • /there/ instead of /their/ • non-word errors • /graffe/ instead of /giraffe/

  24. Spelling dictionaries • Types of spelling error • actual word errors • /piece/ instead of /peace/ • /there/ instead of /their/ • non-word errors • /graffe/ instead of /giraffe/ • of all errors in typewritten texts, 80% are non-word errors

  25. Spelling dictionaries • non-word errors • Cognitive errors • /seperate/ instead of /separate/ • phonetically equivalent sequence of symbols has been substituted • due to lack of knowledge about spelling conventions

  26. Spelling dictionaries • non-word errors • Cognitive errors • Typographic (‘typo’) errors • influenced by keyboard • e.g. substitution of /w/ for /e/ due to its adjacency on the keyboard • /thw/ instead of /the/
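Keyboard adjacency can be tested directly. The sketch below assumes a simplified QWERTY layout (three letter rows, counting only left/right neighbours on the same row and ignoring the diagonal stagger between rows) and checks whether a typo differs from a word by exactly one substitution of a neighbouring key, as in /thw/ for /the/. The names `ROWS`, `adjacent` and `is_adjacent_substitution` are hypothetical.

```python
# Simplified QWERTY layout (assumption): letter rows only, no row stagger.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def key_position(ch: str) -> tuple:
    """(row, column) of a letter key; raises ValueError for other characters."""
    for r, row in enumerate(ROWS):
        c = row.find(ch.lower())
        if c != -1:
            return r, c
    raise ValueError(f"not a letter key: {ch!r}")


def adjacent(a: str, b: str) -> bool:
    """True if a and b are left/right neighbours on the same keyboard row."""
    (ra, ca), (rb, cb) = key_position(a), key_position(b)
    return ra == rb and abs(ca - cb) == 1


def is_adjacent_substitution(typo: str, word: str) -> bool:
    """True if typo differs from word by one substituted letter typed with a neighbouring key."""
    if len(typo) != len(word):
        return False
    diffs = [(t, w) for t, w in zip(typo, word) if t != w]
    return len(diffs) == 1 and adjacent(*diffs[0])


if __name__ == "__main__":
    print(is_adjacent_substitution("thw", "the"))  # True: w sits next to e
    print(is_adjacent_substitution("thq", "the"))  # False: q is not next to e
```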

  27. Spelling dictionaries • non-word errors • noisy channel model • The actual word has been passed through a noisy communication channel • This has distorted the word, thereby changing it in some way • The misspelled word is the distorted version of the actual word • Aim: recover the actual word by hypothesising about the possible ways in which it could have been distorted
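The noisy channel idea can be written as a decision rule. Writing t for the observed typo, c for a candidate word and C for the set of candidates (notation assumed here, matching the P(c) and P(t|c) used later in the slides), the recovered word is the candidate that is most probable given the typo. By Bayes' rule, and because P(t) is the same for every candidate, this reduces to the product of likelihood and prior:

```latex
\hat{c} = \operatorname*{arg\,max}_{c \in C} P(c \mid t)
        = \operatorname*{arg\,max}_{c \in C} \frac{P(t \mid c)\,P(c)}{P(t)}
        = \operatorname*{arg\,max}_{c \in C} P(t \mid c)\,P(c)
```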

  28. Spelling dictionaries • non-word errors • noisy channel model • What are the possible distortions? • insertion • deletion • substitution • transposition • all of these viewed as transformations that take place in the noisy channel
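Each of the four distortions can be recognised by comparing a typo with a candidate word. The Python function below is a sketch (the name `classify` is hypothetical) that names the single transformation, if any, that turns the candidate into the typo; it returns None for identical strings or for pairs that need more than one transformation.

```python
from typing import Optional


def classify(typo: str, candidate: str) -> Optional[str]:
    """Name the single transformation that turns candidate into typo, or None."""
    if len(typo) == len(candidate) + 1:
        # The typo has one extra letter: was it inserted somewhere?
        for i in range(len(typo)):
            if typo[:i] + typo[i + 1:] == candidate:
                return "insertion"
        return None
    if len(typo) == len(candidate) - 1:
        # The typo is one letter short: was a letter of the candidate deleted?
        for i in range(len(candidate)):
            if candidate[:i] + candidate[i + 1:] == typo:
                return "deletion"
        return None
    if len(typo) == len(candidate):
        diffs = [i for i in range(len(typo)) if typo[i] != candidate[i]]
        if len(diffs) == 1:
            return "substitution"
        # Two adjacent positions with swapped letters
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and typo[diffs[0]] == candidate[diffs[1]]
                and typo[diffs[1]] == candidate[diffs[0]]):
            return "transposition"
    return None


if __name__ == "__main__":
    print(classify("graffe", "giraffe"))  # deletion
    print(classify("thw", "the"))         # substitution
```

For /graffe/ and /giraffe/ it reports a deletion (the /i/ was lost in the channel). Knowing which transformation applied is what lets an error model assign a likelihood to the pair.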

  29. Spelling dictionaries • Implementing spelling identification and correction algorithm

  30. Spelling dictionaries • Implementing spelling identification and correction algorithm • STAGE 1: compare each string in document with a list of legal strings; if no corresponding string in list mark as misspelled • STAGE 2: generate list of candidates • Apply any single transformation to the typo string • Filter the list by checking against a dictionary • STAGE 3: assign probability values to each candidate in the list • STAGE 4: select best candidate
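Stages 1 and 2 can be sketched in Python as below. The dictionary is a hypothetical toy word list (a real checker would load a full lexicon), and the function names are illustrative. Candidate generation applies every single deletion, transposition, substitution and insertion to the typo and keeps only results found in the dictionary.

```python
import string

# Hypothetical toy dictionary of legal strings.
DICTIONARY = {"the", "peace", "piece", "giraffe", "separate", "there", "their",
              "actress", "across", "acres"}


def is_misspelled(token: str, dictionary: set) -> bool:
    """STAGE 1: a string with no entry in the list of legal strings is misspelled."""
    return token.lower() not in dictionary


def single_transformations(typo: str, alphabet: str = string.ascii_lowercase) -> set:
    """Every string one deletion, transposition, substitution or insertion away from typo."""
    splits = [(typo[:i], typo[i:]) for i in range(len(typo) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    substitutes = {left + c + right[1:] for left, right in splits if right for c in alphabet}
    inserts = {left + c + right for left, right in splits for c in alphabet}
    return deletes | transposes | substitutes | inserts


def candidates(typo: str, dictionary: set) -> list:
    """STAGE 2: transform the typo, then filter the results against the dictionary."""
    return sorted(w for w in single_transformations(typo.lower()) if w in dictionary)


if __name__ == "__main__":
    for token in "the graffe".split():
        if is_misspelled(token, DICTIONARY):
            print(token, candidates(token, DICTIONARY))  # graffe ['giraffe']
```

Applying one transformation to the typo undoes one transformation made in the channel (a deletion in the channel is undone by an insertion, and so on), so this covers every candidate within a single error.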

  31. Spelling dictionaries • STAGE 3 • prior probability • given all the words in English, is this candidate more likely to be what the typist meant than that candidate? • P(c) = C(c)/N, where C(c) is the number of times c occurs in a corpus and N is the number of words in the corpus • likelihood • given the possible errors, or transformations, how likely is it that error y has operated on candidate x to produce the typo? • P(t|c), calculated using a corpus of errors, or transformations • Bayesian rule: • get the product of the prior probability and the likelihood • P(c) × P(t|c)
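Stages 3 and 4 can be sketched as follows. The prior is estimated as the candidate's corpus count divided by N, the number of word tokens in the corpus; the likelihoods P(t|c) are passed in as a table, since estimating them requires a corpus of errors. All counts and likelihood values in the example are invented for illustration, not real corpus figures, and the function names are hypothetical.

```python
from collections import Counter


def prior(candidate: str, corpus_counts: Counter) -> float:
    """P(c): corpus count of the candidate divided by N, the total number of word tokens."""
    n = sum(corpus_counts.values())
    return corpus_counts[candidate] / n


def best_candidate(candidates, corpus_counts: Counter, likelihood: dict):
    """STAGE 3: score each candidate by P(c) * P(t|c); STAGE 4: return the highest scorer."""
    scores = {c: prior(c, corpus_counts) * likelihood[c] for c in candidates}
    return max(scores, key=scores.get), scores


# Invented figures for the typo "acress" (illustration only).
corpus_counts = Counter({"the": 84, "across": 10, "acres": 4, "actress": 2})  # N = 100
likelihood = {"actress": 0.0001, "across": 0.00001, "acres": 0.00002}           # P(t|c)

if __name__ == "__main__":
    best, scores = best_candidate(["actress", "across", "acres"], corpus_counts, likelihood)
    print(best)  # actress
```

Here /across/ has the highest prior, but the larger likelihood of the /actress/ error outweighs it, which is why both terms enter the product.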

  32. Spelling dictionaries • non-word errors • Implementing spelling identification and correction algorithm • STAGE 1: identify misspelled words • STAGE 2: generate list of candidates • STAGE 3a: rank candidates by probability • STAGE 3b: select best candidate • Implement: • noisy channel model • Bayesian rule

  33. Next week: Finite state machines and regular expressions
