1. Anusaaraka: An Approach to Machine Translation Akshar Bharati, Vineet Chaitanya1, Amba Kulkarni2
1Chinmaya International Foundation
stationed at Rashtriya Sanskrit Vidyapeetha, Tirupati
vc@iiit.ac.in
2Department of Sanskrit Studies,
University of Hyderabad, Hyderabad
apksh@uohyd.ernet.in
2. Anusaaraka is - An Incremental Machine Translation
Layered output
Successive layers come progressively closer to MT
4. Machine Translation: Current Trends Techniques being used: Statistical
Statistical methods: Inherent limitation
Can never give a 100% reliable system
The end user can never be sure about the correctness.
Current MT systems CANNOT serve users who want to ACCESS a text in other languages
8. Human beings use World Knowledge,
Context,
Cultural knowledge and
Language conventions
to decipher it.
9. Anusaaraka Generalizes the problem from TRANSLATION to ACCESS
Anusaaraka is a Language Accessor
10. What is an Accessor? Gist Terminal is a concrete example of a SCRIPT ACCESSOR
(Developed by IIT Kanpur, and marketed by C-DAC)
One can access any text in any Indian script through
an enhanced Devanagari script.
12. Salient Features Faithful representation
Reversibility
No loss of information
Text in other script is accessible with a little extra training
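The reversibility property can be illustrated with a small sketch. It is not how the Gist Terminal works; it only assumes that the Unicode blocks for Devanagari and Telugu share a parallel layout, so that shifting code points by a fixed offset maps one script onto the other. The function names and the sample word are hypothetical.

```python
# Assumption: Unicode lays out the Devanagari and Telugu blocks in parallel,
# so a fixed offset maps most letters of one script onto the other.
DEVANAGARI_START = 0x0900
TELUGU_START = 0x0C00
BLOCK_SIZE = 0x80


def shift_script(text, src_start, dst_start):
    """Move every character of the source block to the same slot in the
    destination block; characters outside the source block pass through."""
    out = []
    for ch in text:
        cp = ord(ch)
        if src_start <= cp < src_start + BLOCK_SIZE:
            out.append(chr(cp - src_start + dst_start))
        else:
            out.append(ch)
    return "".join(out)


def telugu_to_devanagari(text):
    return shift_script(text, TELUGU_START, DEVANAGARI_START)


def devanagari_to_telugu(text):
    return shift_script(text, DEVANAGARI_START, TELUGU_START)


# Sample word: the letters ka, ma, la in Telugu script.
telugu = "\u0c15\u0c2e\u0c32"
devanagari = telugu_to_devanagari(telugu)

# Round trip: nothing is lost, so the original text is recovered exactly.
# This holds only when the input contains no destination-script characters.
round_trip = devanagari_to_telugu(devanagari)
print(round_trip == telugu)
```

Letters that exist in one script but have no counterpart in plain Devanagari are the reason an enhanced Devanagari is needed to keep the mapping lossless.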
14. Special feature of Anusaaraka Distribute the load between man and machine
Machine: Rote memory + Logic
Man: World knowledge, Common sense,
Cultural knowledge, Domain knowledge, ......
However, there is a coupling between the two loads.
15. Urdu-Hindi example [Urdu-script sample sentence]
User needs to learn some features of Urdu script
A typical Urdu text does not contain short vowels
Example: a word 'asii' may be read as usii/isii depending on the context
16. Anusaaraka is A tool for overcoming language barriers
An application of concepts from Panini's Ashtadhyayi to contemporary problems.
An exploration of the information dynamics in language
A better approach for building Machine Translation systems
A Workbench for NLP students
An opportunity for the masses to be IT contributors rather than mere IT consumers
22. Technology helps in reducing barriers Example:
Railway Network: Reduces the distance barrier
==>Time to cover the distance is reduced
Machine Translation has been attempted
since the inception of computers.
Can computers help us in reducing the language barrier?
26. Anusaaraka Anybody with an aptitude for 'language analysis' can contribute to the development of a Machine Translation system, even without any exposure to formal linguistic training.
37. Language Conventions Vary for Encoding Information Word level:
Labelling and Packaging of concepts
Sentence Level:
Expressing relations between constituent words
42. How is information coded? (Implicitly or Explicitly?) rAma dUdha pIkara skUla gayA
'Ram' 'milk' 'having drunk' 'school' 'went'
Who drank the milk?
I want him to go.
What do I want?
Mohan dropped the melon and burst
Who/What burst?
54. Information Flow For example
He scratched a figure on the rock (made)
She scratched the figure on the rock (erased)
He went to school (simple past)
He went to school everyday (habitual)
I want him to go
I want a pen to write
55. Information Dynamics : Applications Rule Preparation
Pseudo Compounds
Mirror Principle
56. Pseudo Compound For example
Simple noun phrase in English (The black box)
English has post-nominal modification (The man in blue shirt)
Adjectives occur prenominally
Adjectives do not inflect
Adjectives cannot occur without a noun, unlike Hindi (lAla ne kAloM ko mArA - )
Adjectives form a separate grammatical category in English
57. Mirror Principle English Hindi word order
The word order of the predicate in Hindi is exactly the opposite of English
I met (the man) in (blue shirt) near (my house)
1 2 3 4 5 6 7 8 9 10
mEM (apane ghara) ke_pAsa (nIlI kamIza) vAle (0 AdamI) se milI
1 9 10 8 6 7 5 3 4 2
This does not work for the adjectives
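The reordering in the example above can be sketched as follows. This is a minimal illustration, assuming that the bracketed groups on the slide are chunks and that every unbracketed word is a chunk of its own; the function name `mirror_predicate` is hypothetical.

```python
def mirror_predicate(subject, predicate_chunks):
    """Keep the subject first, reverse the order of the predicate chunks,
    and leave the word order inside each chunk unchanged."""
    return [subject] + list(reversed(predicate_chunks))


# English words paired with their 1-based positions, grouped into chunks.
subject = [(1, "I")]
predicate = [
    [(2, "met")],
    [(3, "the"), (4, "man")],
    [(5, "in")],
    [(6, "blue"), (7, "shirt")],
    [(8, "near")],
    [(9, "my"), (10, "house")],
]

reordered = mirror_predicate(subject, predicate)
positions = [pos for chunk in reordered for pos, _ in chunk]
print(positions)  # [1, 9, 10, 8, 6, 7, 5, 3, 4, 2]
```

Because the reversal operates on whole chunks, the adjective keeps its place before the noun inside a chunk ('blue shirt', 'nIlI kamIza'), which is consistent with the note that mirroring does not apply to adjectives.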
58. Rule Preparation contd Capturing topic, emphasis, focus etc.
For example,
From where are you coming ?
Where are you coming from ?
Is the preposition stranded to place emphasis on 'where'?
60. Rule Preparation contd Are 'subject to subject raising', 'tough movement', etc. special devices for 'topicalization'?
61. Information Dynamics : Applications For Automatic Word Alignment
Match the anusaaraka output at the LWG level with
Hindi translation ignoring certain idiosyncratic postpositions such as Hindi 'ne'
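A rough sketch of this matching step, under stated assumptions: the anusaaraka output is already split into word groups, a group aligns with a Hindi token only on an exact string match, and the ignored set contains just 'ne'. The sentence, the function name `align`, and the data layout are made up for illustration and are not taken from the Anusaaraka system.

```python
# Postpositions skipped during matching (only 'ne' is named on the slide).
IGNORED_POSTPOSITIONS = {"ne"}


def align(anusaaraka_groups, hindi_tokens):
    """Return (group_index, hindi_index) pairs for exact matches,
    skipping ignored postpositions on the Hindi side."""
    kept = [
        (i, tok)
        for i, tok in enumerate(hindi_tokens)
        if tok not in IGNORED_POSTPOSITIONS
    ]
    alignment = []
    used = set()
    for g_idx, group in enumerate(anusaaraka_groups):
        for h_idx, tok in kept:
            if h_idx not in used and tok == group:
                alignment.append((g_idx, h_idx))
                used.add(h_idx)
                break
    return alignment


# Hypothetical data: anusaaraka output in English word order,
# and a Hindi translation in WX notation.
anusaaraka_groups = ["rAma", "KAyA", "Ama"]
hindi_tokens = ["rAma", "ne", "Ama", "KAyA"]

pairs = align(anusaaraka_groups, hindi_tokens)
print(pairs)  # [(0, 0), (1, 3), (2, 2)]
```

A real aligner would need looser matching than string equality, for example to handle inflection differences between the two sides.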
62. Information Dynamics : Applications Use Anusaaraka for
Gradual Progression towards MT
Maintaining Reversibility
63. Anusaaraka Philosophy No Loss of Information
No effort should go to waste
Users contribute towards the development
66. Anusaaraka is Robust
Clear-cut separation of the resources that are in principle reliable from those that involve a probabilistic component.
Graceful Degradation
In case of failure it produces a 'rough' translation. It is 'rough' not in the sense that it is inaccurate or imprecise, but in the sense that it requires some human effort to understand the text. (Compare with a 'rough journey', where you are taken to the destination but the journey is not comfortable.)
67. Anusaaraka is - Completely Transparent
The whole process of Machine Translation is transparent even to a layman
68. Human Understandable Outputs For example
Chunking: Color Scheme
Parsed output: Modifier-Modified Tree
69. Anusaaraka MT Differences
70. Differences contd ...
71. Consequences
72. Consequences contd ...
73. Suitable Environment for Contributors
74. Suitable Environment for Contributors