1. Can computers understand our language?
Ruslan Mitkov
Research Institute of Information and Language Processing
University of Wolverhampton
2. Would life be easier if...
You could tell the computer what you wanted and it understood you (no programming skills required)?
You could dictate a letter to the computer, and it printed it and then saved it as a file?
Having no time to read a 1000-page book, you could ask the computer to summarise it for you and it produced a one-page summary in a few minutes?
You could ask the computer to translate a Japanese text that you did not understand?
3. The Beginning: Machine Translation
Weaver (1947): text in a source language can be encoded and then decoded into a target language
4. Limitations of the time
Computers were slow and unreliable
Programming languages were almost non-existent
There was no adequate theory of language
5. Why did Weaver fail?
Weaver would have failed even if he had had the supercomputers of today
His view was too simplistic
Language is not Mathematics
6. Understanding language involves processing at various levels
morphological (structure of words)
syntactic (structure of sentences)
semantic (meaning of words and sentences)
discourse (topic of sentences, anaphora)
pragmatic (interpretation of utterances in different contexts)
7. Initial research focused on syntax
Colourless green ideas furiously dream
Me speaker, you audience
Thank you for coming here today
8. Understanding language requires not only linguistic but also extra-linguistic knowledge
9. Bar-Hillel’s famous example
The box is in the pen.
“I do not believe that machines whose programs do not enable them to learn, in a sophisticated sense of the word, will ever be able consistently to produce high-quality translations.”
10. The ALPAC Report (1966)
“There is no immediate or predictable prospect of useful Machine Translation”
11. Revival in the eighties
• Japan’s Fifth Generation Computer Systems project
• Japanese investment in Machine Translation
• In 1985 (Japan alone) 500 000 pages translated by computer!
12. Why is language different from Mathematics?
In Mathematics, relationships can be formulated as strict theorems.
The sum of the lengths of any two sides of a triangle is greater than the length of the third side.
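Stated symbolically for a triangle with side lengths a, b and c (letters introduced here only for illustration), the theorem holds without exception for every pair of sides:

$$a + b > c, \qquad a + c > b, \qquad b + c > a$$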
13. Can we formulate theorems for language?
Each word has only one meaning.
Each sentence can be interpreted in only one way.
The utterances produced by humans are sincere.
The size of the vocabulary in English is the linear function y = 2x of the size of the vocabulary in Chinese.
14. LANGUAGE IS IRREGULAR AND AMBIGUOUS
15. Ambiguity of words (Lexical ambiguity)
Bank
File
Chair
16. Ambiguity of sentences (Syntactic ambiguity)
John saw the man with the telescope.
John saw the man in the park with the telescope.
17. Syntactic ambiguity (continued)
Sentences that no human would consider ambiguous can cause problems for computers:
She boarded the airplane with two suitcases
She boarded the airplane with two engines
18. Ambiguity of meaning (Semantic ambiguity)
The rabbit is ready for lunch.
We serve only men here.
19. Ambiguity of language use (Pragmatics)
You owe me twenty pounds
Fact?
Request?
20. Anaphoric ambiguity
John put the vase on the plate and broke it.
The soldiers shot at the women and they fell.
The soldiers shot at the women and they missed.
21. Anaphoric ambiguity (interference of preferences and constraints) (Example Y. Wilks)
Jack drank the wine on the table. It was brown and round.
World War II leaflet (Britain)
If an incendiary bomb drops next to you, don’t lose your head. Put it in a bucket and cover it with sand.
22. Understanding language successfully is not enough
ANALYSIS
INFERENCE
GENERATION
23. The production of language is another challenging task
24. The production of language is another challenging task
Many lecturers and students attended today's talk. They took part in the discussions.
Many lecturers and students attended today's talk and took part in the discussions.
25. Generation as a selection process
Many lecturers and students attended today's talk, taking part in the discussions.
Today's talk was attended by many lecturers and students. They took part in the discussions.
Today's talk was attended by many lecturers and students who took part in the discussions.
26. Computer programs should be able to reason...
27. Computer programs should be able to reason...
Researchers should model computers so that they can simulate human thinking in a reasonable way and, if possible, learn from each conversation.
28. Do we have good speech technology?
29. The problem of resources
Designing and developing a program which has a huge amount of knowledge (knowledge base), the ability to understand and produce natural language, to think and to learn, is an extremely difficult, time-consuming and labour-intensive task.
30. The problem of resources
The development of a Machine Translation program at Kyoto University took 200 person-years!
31. Any realistic, short-term solution?
32. Machine Translation
High-quality output translation (sublanguages, controlled languages, post-editing)
Low-quality output translation (casual translation)
33. Low-quality Machine Translation
Gisting (indicative translation)
Web-page translation
Email translation
Chat room translation
34. Machine Translation: when can it be really successful?
Restricting the genre, the grammar or the vocabulary
Restricting the role of the computer
35. The Sublanguage Solution
Sublanguages are used by people who share common specialised knowledge. They have a restricted vocabulary and word order, and they avoid ambiguity of meaning.
Sublanguage of weather forecasts
Sublanguage of medical reports
36. METEO: English-to-French Machine Translation
METRO TORONTO.
TODAY... MAINLY CLOUDY AND COLD WITH OCCASIONAL
FLURRIES. BRISK WESTERLY WINDS TO 50 KM/H. HIGH
NEAR MINUS 7.
TONIGHT... VARIABLE CLOUDINESS. ISOLATED FLURRIES.
DIMINISHING WINDS. LOW NEAR MINUS 15.
FRIDAY... VARIABLE CLOUDINESS. HIGH NEAR MINUS 6.
LE GRAND TORONTO.
AUJOURD'HUI... GENERALEMENT NUAGEUX ET FROID
AVEC QUELQUES AVERSES DE NEIGE. VENTS VIFS D'OUEST
A 50 KM/H. MAXIMUM D'ENVIRON MOINS 7.
CETTE NUIT... CIEL VARIABLE. AVERSES DE NEIGE
EPARSES. AFFAIBLISSEMENT DES VENTS. MINIMUM D'ENVIRON
MOINS 15.
VENDREDI... CIEL VARIABLE. MAXIMUM D'ENVIRON MOINS 6.
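One way to see why a sublanguage helps is to try translating it mechanically. The Python below is a toy phrase-table substitution over a few phrases taken from this forecast. It is an illustrative assumption, not how METEO itself worked, and it only succeeds because the vocabulary and word order are so restricted.

```python
import re

# Toy phrase table built by hand from the forecast above (illustration only,
# not METEO's actual method or resources).
PHRASE_TABLE = {
    "TODAY...": "AUJOURD'HUI...",
    "TONIGHT...": "CETTE NUIT...",
    "FRIDAY...": "VENDREDI...",
    "MAINLY CLOUDY AND COLD": "GENERALEMENT NUAGEUX ET FROID",
    "VARIABLE CLOUDINESS": "CIEL VARIABLE",
    "ISOLATED FLURRIES": "AVERSES DE NEIGE EPARSES",
    "DIMINISHING WINDS": "AFFAIBLISSEMENT DES VENTS",
}

# Temperature statements follow one fixed pattern, e.g. "LOW NEAR MINUS 15".
TEMPERATURE = re.compile(r"\b(HIGH|LOW) NEAR MINUS (\d+)")

def translate(forecast: str) -> str:
    # Map the temperature pattern first: HIGH -> MAXIMUM, LOW -> MINIMUM.
    out = TEMPERATURE.sub(
        lambda m: ("MAXIMUM" if m.group(1) == "HIGH" else "MINIMUM")
        + " D'ENVIRON MOINS " + m.group(2),
        forecast,
    )
    # Replace longer phrases first so a short entry cannot split a longer one.
    for english in sorted(PHRASE_TABLE, key=len, reverse=True):
        out = out.replace(english, PHRASE_TABLE[english])
    return out

print(translate("TONIGHT... VARIABLE CLOUDINESS. ISOLATED FLURRIES. "
                "DIMINISHING WINDS. LOW NEAR MINUS 15."))
# CETTE NUIT... CIEL VARIABLE. AVERSES DE NEIGE EPARSES. AFFAIBLISSEMENT DES VENTS. MINIMUM D'ENVIRON MOINS 15.
```

Any phrase outside the table passes through untranslated, which is why this kind of substitution breaks down as soon as the text leaves the sublanguage.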
37. The Controlled Language Solution
Controlled language is a specially simplified version of a language which contains short, unambiguous sentences and uses a restricted vocabulary.
38. Typical writing rules in a controlled language
Keep sentences short (use only simple sentences)
Use only one sense per word
Do not use anaphors
Omit redundant words
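Two of these rules, sentence length and anaphors, are mechanical enough to check automatically. The Python below is a hypothetical checker sketched for illustration, not a tool from the talk: the 20-word limit and the pronoun list are assumptions, and the other two rules (one sense per word, no redundant words) would need an approved, sense-tagged vocabulary, so they are left out.

```python
import re

# Hypothetical settings; a real controlled language defines its own limits.
MAX_WORDS = 20
ANAPHORS = {"it", "its", "they", "them", "their",
            "he", "him", "his", "she", "her"}

def check(text: str) -> list[str]:
    """Return one message per rule violation found in `text`."""
    problems = []
    # Naive sentence split: after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    for i, sentence in enumerate(sentences, start=1):
        words = re.findall(r"[A-Za-z']+", sentence)
        # Rule: keep sentences short.
        if len(words) > MAX_WORDS:
            problems.append(f"sentence {i}: {len(words)} words (limit {MAX_WORDS})")
        # Rule: do not use anaphors.
        for word in words:
            if word.lower() in ANAPHORS:
                problems.append(f"sentence {i}: anaphor '{word}'")
    return problems

print(check("Remove the cover. Then clean it with a dry cloth."))
# ["sentence 2: anaphor 'it'"]
```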
39. The Selective Solution: Machine-Aided Translation
The translator sends the simple sentences to the computer for translation and translates the more difficult, complex ones him- or herself.
40. Increased efficiency: the Penang experiment
Books/manuals averaging about 250 pages were translated both manually by a translation bureau and by a Machine-Aided Translation program (SISKEP).
Manual translation took 360 hours on average
Translation by a Machine-Aided Translation program needed 200 hours on average.
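Worked out from the two averages above, the Machine-Aided Translation program cut the time per book by

$$\frac{360 - 200}{360} = \frac{160}{360} \approx 0.44,$$

that is, by roughly 44%, which corresponds to working about 360 / 200 = 1.8 times as fast.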
41. The Human Intervention Solution
(Human pre-edits text)
Computer translates
Human post-edits
42. Examples of daily use of MT (with post-editing)
The EC has used SYSTRAN since 1976 (since 1990, around 260 000 pages a year)
SAP uses METAL extensively for translations from German to English (technical texts)
Ericsson Language Services (ELS) uses LOGOS for the translation of technical documentation from English into French, Spanish and German
METEO
43. Translation tools: proven track record
Dictionary look-up
Term extraction/translation
Translation memory
Bilingual concordancer
44. Memory-based Translation
Mainly for professional translators
Uses a database of previously translated texts and measures how closely the current sentence matches previously translated ones
Ensures that no sentence need be translated twice
Ensures consistency
Responds to the industrial need for high-quality and ‘high-speed’ translations
45. When to use Translation memory
Translation of repetitive texts
Translation of voluminous texts
Most suitable for technical manuals
46. TRADOS
A Translation Memory is a linguistic database that collects all your translations and their target language equivalents as you translate.
A Translation Memory is a database that collects all your translations and their target language equivalents as you translate.
Match 87%
linguistic → linguistische
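A translation memory needs some way to score how closely a new sentence matches a stored one. The Python below is a minimal sketch that assumes a word-level edit-distance score and a hypothetical 70% threshold; TRADOS uses its own scoring, so this sketch rates the pair above at 95% rather than the 87% shown, and the stored target text is a placeholder.

```python
def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance counted in whole words rather than characters."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete wa
                            curr[j - 1] + 1,            # insert wb
                            prev[j - 1] + (wa != wb)))  # substitute or keep
        prev = curr
    return prev[-1]

def fuzzy_match(new: str, stored: str) -> float:
    """Similarity in [0, 1]: 1 minus the word edit distance over the longer length."""
    a, b = new.split(), stored.split()
    longest = max(len(a), len(b)) or 1
    return 1 - word_edit_distance(a, b) / longest

def lookup(memory: dict[str, str], sentence: str, threshold: float = 0.7):
    """Return (score, stored source, stored target) for the best match at or
    above `threshold` (a hypothetical default), or None if nothing qualifies."""
    scored = [(fuzzy_match(sentence, src), src, tgt) for src, tgt in memory.items()]
    best = max(scored, default=None)
    if best is None or best[0] < threshold:
        return None
    return best

stored = ("A Translation Memory is a database that collects all your translations "
          "and their target language equivalents as you translate.")
new = ("A Translation Memory is a linguistic database that collects all your "
       "translations and their target language equivalents as you translate.")
memory = {stored: "<stored translation>"}  # placeholder target text

score, source, target = lookup(memory, new)
print(f"Match {score:.0%}")  # Match 95%
```

The translator would then edit the retrieved translation, adding only the equivalent of the one new word.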
47. A case study (Webb 1998)
Client saves 40% in costs and 70% in time
Translator/translation agency saves 69% in costs and 70% in time
48. Bilingual concordancer
A tool that allows the user/translator to see how a word, phrase or technical term is used throughout a source-language text and how it has been translated into the target language
Snapshot of ParaConc
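A bilingual concordancer can be sketched as a keyword-in-context search over aligned sentence pairs. The Python below is a minimal sketch assuming the texts are already sentence-aligned; the sentence pairs are made-up toy data and the layout does not reproduce ParaConc's interface.

```python
import re

def concordance(pairs, term, width=30):
    """Yield (keyword-in-context line, aligned target sentence) for every
    occurrence of `term` on the source side of the aligned pairs."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    for source, target in pairs:
        for m in pattern.finditer(source):
            left = source[max(0, m.start() - width):m.start()]
            right = source[m.end():m.end() + width]
            # Right-align the left context so all hits line up in one column.
            yield f"{left:>{width}}[{m.group()}]{right}", target

# Toy English-French sentence pairs, assumed to be aligned already.
PAIRS = [
    ("The file is saved automatically.", "Le fichier est enregistré automatiquement."),
    ("Close the window.", "Fermez la fenêtre."),
    ("Open the file again.", "Ouvrez à nouveau le fichier."),
]

for line, target in concordance(PAIRS, "file"):
    print(line, "|", target)
```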
49. The Statistical, Corpus-based and "Text as strings" Solution
Words and sentences are regarded as strings of symbols, without any meaning
50. Lecture (1)
A discourse with an educational purpose
51. Example (2): Lecture
Topic:
Lecturer:
Audience:
Venue:
Date:
Time:
52. Example (3): Lecture
"Lecture" is a sequence of 7 symbols:
The symbols are "L", "e", "c", "t", "u", "r", "e"
53. Simple and sophisticated statistical and corpus techniques
Frequency
Pattern matching
Fuzzy matching (neural network-based matching of strings)
Probabilistic theories (e.g. a Bayesian approach on the basis of evidence)
54. Language Identification
Identifying a language on the basis of the frequency and combination of words.
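One standard way to turn word frequencies into a language identifier is a naive Bayes classifier: each language gets a word-frequency model, and a text is assigned to the language under which its words are most probable. The Python below is a minimal sketch of that technique with made-up training samples; it is not necessarily the method the talk had in mind.

```python
import math
import re
from collections import Counter

# Tiny made-up training samples for illustration; a usable identifier
# needs far more text per language.
SAMPLES = {
    "english": "the cat is on the table and the dog is in the garden",
    "french": "le chat est sur la table et le chien est dans le jardin",
    "german": "die katze ist auf dem tisch und der hund ist im garten",
}

def tokenize(text):
    return re.findall(r"\w+", text.lower())

# Word frequencies per language, plus the size of the combined vocabulary.
MODELS = {lang: Counter(tokenize(text)) for lang, text in SAMPLES.items()}
VOCAB_SIZE = len(set().union(*MODELS.values()))

def identify(text):
    """Return the language whose model gives `text` the highest
    log-probability (naive Bayes, add-one smoothing, equal priors)."""
    def log_prob(counts):
        total = sum(counts.values())
        return sum(math.log((counts[w] + 1) / (total + VOCAB_SIZE))
                   for w in tokenize(text))
    return max(MODELS, key=lambda lang: log_prob(MODELS[lang]))

print(identify("the dog is on the table"))    # english
print(identify("le chien est sur la table"))  # french
```

Add-one smoothing keeps an unseen word from driving a language's probability to zero, so a single unknown word does not rule a language out.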
55. Automatic Abstracting
The production of an abstract from a longer document
Using surface clues, keywords and statistical frequencies to assign weights to sentences
The sentences with the highest aggregate scores are extracted as the most important ones
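This weighting scheme can be sketched in a few lines of Python. It is a minimal sketch assuming that a keyword's weight is its frequency in the document and that the only surface clue is a sentence opening with a cue phrase; the stop-word list, cue phrases and bonus are hypothetical values chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical word lists; real systems use longer stop lists and cue lexicons.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "was", "it", "on", "for", "with", "as", "by", "that", "this"}
CUE_PHRASES = ("in conclusion", "in summary", "to sum up")
CUE_BONUS = 5  # arbitrary weight for a sentence opening with a cue phrase

def content_words(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def summarise(text, n=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))  # keyword weight = frequency in document

    def score(sentence):
        total = sum(freq[w] for w in content_words(sentence))
        if sentence.lower().startswith(CUE_PHRASES):  # surface clue
            total += CUE_BONUS
        return total

    # Take the n highest-scoring sentences, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n]))

text = ("Translation memory stores translations. Translation memory helps "
        "translators reuse translation memory matches. Lunch was good.")
print(summarise(text, n=1))
# Translation memory helps translators reuse translation memory matches.
```

Because scores are summed, longer sentences are favoured; dividing each score by sentence length is a common adjustment.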
56. Literary texts?
Statistical methods for the identification of authorship
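A common statistical approach to authorship compares how often candidate authors use frequent function words, which writers tend to use unconsciously and consistently. The Python below is a sketch under stated assumptions: a tiny hypothetical function-word list, relative frequencies as the profile, and Manhattan distance to pick the nearest known author. The sample texts are made up.

```python
import re
from collections import Counter

# Hypothetical, very short function-word list; real studies use many more.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "but", "upon"]

def profile(text):
    """Relative frequency of each function word in `text`."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    n = len(words) or 1
    return [counts[w] / n for w in FUNCTION_WORDS]

def distance(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))  # Manhattan distance

def attribute(disputed, known):
    """`known` maps author name -> sample text; return the closest author."""
    target = profile(disputed)
    return min(known, key=lambda author: distance(target, profile(known[author])))

KNOWN = {
    "A": "upon the hill and upon the road upon the sea",
    "B": "it was in the house but it was not in the garden",
}
print(attribute("upon the river and upon the bank", KNOWN))  # A
```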
57. What is Computational Linguistics / Natural Language Processing?
The study of computer systems for understanding, producing and, in general, processing natural languages
Typical applications:
Machine Translation
Automatic abstracting
Question Answering
Information Extraction
Textual entailment
For more details, see Mitkov, R. (ed.) (2003, 2005). The Oxford Handbook of Computational Linguistics. Oxford University Press.
58. My current topics of interest
Anaphora resolution
Automatic generation of multiple-choice tests
Automatic identification of cognates and false friends
Centering
Memory-based translation
NLP applications in medicine and education
59. Anaphora resolution
The automatic identification of references in text
Examples:
Sophia Loren says she will always be grateful to Bono. The actress revealed that the U2 singer helped her calm down when she became scared by a thunderstorm while travelling on a plane.
If Peter Mandelson had been in Tony Blair’s shoes he would have demanded his resignation the day the Prime Minister forced him to leave the Cabinet.
For more details, see Mitkov, R. (2002). Anaphora Resolution. Longman.
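The interplay of constraints and preferences can be illustrated with a deliberately naive baseline: discard candidates that do not agree with the pronoun in gender and number, then choose the most recent survivor. The Python below is a sketch under stated assumptions (hand-annotated mentions, hand-counted token positions); it is not the method described in Mitkov (2002).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Mention:
    text: str
    gender: str    # "m", "f" or "n" (hand-annotated here)
    number: str    # "sg" or "pl"
    position: int  # token index in the text (counted by hand)

def resolve(pronoun: Mention, candidates: list[Mention]) -> Optional[Mention]:
    """Naive baseline: keep preceding candidates that agree in gender and
    number (constraints), then prefer the most recent one (preference)."""
    agreeing = [c for c in candidates
                if c.position < pronoun.position
                and c.gender == pronoun.gender
                and c.number == pronoun.number]
    return max(agreeing, key=lambda c: c.position, default=None)

# "Sophia Loren says she will always be grateful to Bono. The actress revealed
#  that the U2 singer helped her ..."
CANDIDATES = [
    Mention("Sophia Loren", "f", "sg", 0),
    Mention("Bono", "m", "sg", 9),
    Mention("The actress", "f", "sg", 11),
    Mention("the U2 singer", "m", "sg", 15),
]
she = Mention("she", "f", "sg", 3)
her = Mention("her", "f", "sg", 19)

print(resolve(she, CANDIDATES).text)  # Sophia Loren
print(resolve(her, CANDIDATES).text)  # The actress
```

In the Mandelson sentence every candidate (Peter Mandelson, Tony Blair, the Prime Minister) is masculine singular, so the agreement constraint removes nothing and the choice rests on recency alone; that is where extra-linguistic knowledge becomes necessary.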
60. Research Group in Computational Linguistics
1 professor
1 lecturer
4 research fellows
4 PhD students
61. Research topics/projects
Anaphora resolution
Text summarisation
Information extraction
Question answering
Term extraction
Multilingual NLP
Lexical acquisition
Translation memory
Corpus construction and annotation
NLP pre-processing tools
Textual entailment
Generation
Machine Translation (resources)
Named Entity Recognition
62. Externally funded projects
BIRD (ESRC-funded)
CAST (AHRB-funded)
Automatic translation of emails (industry-funded)
Automatic generation of multiple-choice tests (NBME-funded)
Projects funded by the British Academy, British Council, international organisations
63. Implementations, resources and demos
The Research Group is also well known for the variety of tools, resources and demos it has developed
They are available on the group’s website and are accessible to all researchers
64. CONCLUSIONS
Computers find it very difficult to understand human languages
Practical, less ambitious solutions have proved to be more successful in the short term
Increased interest, a growing number of projects and large investments are promising
Computers are becoming increasingly able to understand language
65. RECOMMENDATIONS FOR TRANSLATORS
Use MT for casual translation
Use MT for gisting (indicative translation) before sending a document to professional translators
Use MT in conjunction with (pre-editing and) post-editing
Use MT in controlled languages
Use MT in sublanguages
Use TM for professional translation of repetitive and voluminous texts
Use translation tools widely in translation projects
66. A FINAL WORD
Translators are not an “endangered species”!
Computers are not trying to replace humans. They are just trying to help.
Computers do not have the creativity and imagination of humans. But they are good at routine jobs.
67. Contact details
Ruslan Mitkov’s web page: http://www.wlv.ac.uk/~le1825
Research Group’s web page: http://clg.wlv.ac.uk