Anusaaraka: An Approach to Machine Translation




    1. Anusaaraka: An Approach to Machine Translation. Akshar Bharati, Vineet Chaitanya (1), Amba Kulkarni (2). (1) Chinmaya International Foundation, stationed at Rashtriya Sanskrit Vidyapeetha, Tirupati; vc@iiit.ac.in. (2) Department of Sanskrit Studies, University of Hyderabad, Hyderabad; apksh@uohyd.ernet.in

    2. Anusaaraka is an incremental machine translation system with layered output: successive layers come closer and closer to MT.

    4. Machine Translation: Current Trends. Techniques being used: statistical. Inherent limitation of statistical methods: they can never give a 100% reliable system, so the end user can never be sure about correctness. Current MT systems CANNOT serve users who want to ACCESS a text in another language.

    8. Human beings use World Knowledge, Context, Cultural knowledge and Language conventions to decipher it.

    9. Anusaaraka generalizes the problem from TRANSLATION to ACCESS. Anusaaraka is a Language Accessor.

    10. What is an Accessor? Gist Terminal is a concrete example of a SCRIPT ACCESSOR (developed by IIT Kanpur and marketed by C-DAC). One can access any text in any Indian script through an enhanced Devanagari script.

    12. Salient Features: faithful representation; reversibility; no loss of information; text in another script is accessible with a little extra training.

    14. Special Feature of Anusaaraka: distribute the load between man and machine. Machine: rote memory + logic. Man: world knowledge, common sense, cultural knowledge, domain knowledge, ... However, there is a coupling between the two loads.

    15. Urdu-Hindi example. The user needs to learn some features of the Urdu script. A typical Urdu text does not contain short vowels. Example: a word 'asii' may be read as 'usii' or 'isii' depending on the context.

    16. Anusaaraka is: a tool for overcoming language barriers; an application of concepts from Panini's Ashtadhyayi to contemporary problems; an exploration of the information dynamics in language; a better approach for building machine translation systems; a workbench for NLP students; an opportunity for the masses to be IT contributors rather than mere IT consumers.

    22. Technology helps in reducing barriers. Example: the railway network reduces the distance barrier, that is, the time needed to cover the distance is reduced. Machine translation has been attempted since the inception of computers. Can computers help us reduce the language barrier?

    26. Anusaaraka: anybody with an aptitude for 'language analysis' can contribute to the development of a machine translation system, even without any exposure to formal linguistic training.

    37. Language Conventions Vary for Encoding Information. Word level: labelling and packaging of concepts. Sentence level: expressing relations between constituent words.

    42. How is information coded (implicitly or explicitly)? 'rAma dUdha pIkara skUla gayA' ('Ram' 'milk' 'having drunk' 'school' 'went'): who drank the milk? 'I want him to go': what do I want? 'Mohan dropped the melon and burst': who/what burst?

    54. Information Flow. For example: 'He scratched a figure on the rock' (made) vs. 'She scratched the figure on the rock' (erased); 'He went to school' (simple past) vs. 'He went to school every day' (habitual); 'I want him to go' vs. 'I want a pen to write'.

    55. Information Dynamics: Applications. Rule preparation; pseudo compounds; mirror principle.

    56. Pseudo Compound. For example, a simple noun phrase in English ('the black box'). English has post-nominal modification ('the man in blue shirt'). Adjectives occur prenominally. Adjectives do not inflect. Adjectives cannot occur without a noun, unlike in Hindi ('lAla ne kAloM ko mArA'). Adjectives form a separate grammatical category in English.

    57. Mirror Principle (English-Hindi word order). The word order of the predicate in Hindi is exactly the opposite of English. English: 'I met (the man) in (blue shirt) near (my house)', word positions 1 2 3 4 5 6 7 8 9 10. Hindi: 'mEM (apane ghara) ke_pAsa (nIlI kamIza) vAle (0 AdamI) se milI', corresponding English positions 1 9 10 8 6 7 5 3 4 2. This does not work for the adjectives.
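The reordering on the Mirror Principle slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Anusaaraka's actual implementation: the chunk boundaries are taken from the parentheses on the slide, the subject is assumed to stay in first position, and the names `subject`, `predicate_chunks`, and `mirror` are hypothetical.

```python
# Illustrative sketch of the mirror principle (not the Anusaaraka code).
# Each word carries its English position; chunks follow the slide's parentheses.
subject = [(1, "I")]
predicate_chunks = [
    [(2, "met")],
    [(3, "the"), (4, "man")],
    [(5, "in")],
    [(6, "blue"), (7, "shirt")],
    [(8, "near")],
    [(9, "my"), (10, "house")],
]


def mirror(subject, predicate_chunks):
    """Keep the subject first and reverse the order of the predicate chunks.

    Word order inside each chunk is left unchanged, which is why the slide
    notes that the mirroring does not apply to adjectives.
    """
    ordered = list(subject)
    for chunk in reversed(predicate_chunks):
        ordered.extend(chunk)
    return ordered


mirrored = mirror(subject, predicate_chunks)
positions = [position for position, _ in mirrored]
print(positions)  # [1, 9, 10, 8, 6, 7, 5, 3, 4, 2]
```

The printed positions reproduce the number sequence given for the Hindi sentence on the slide.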

    58. Rule Preparation (contd.). Capturing topic, emphasis, focus, etc. For example: 'From where are you coming?' vs. 'Where are you coming from?' Is the preposition stranded to place emphasis on 'where'?

    60. Rule Preparation (contd.). Are 'subject to subject raising', 'tough movement', etc. special devices for 'topicalization'?

    61. Information Dynamics: Applications. For automatic word alignment: match the Anusaaraka output at the LWG (local word group) level with the Hindi translation, ignoring certain idiosyncratic postpositions such as Hindi 'ne'.
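A minimal sketch of this matching idea, assuming word groups are plain strings and that two groups align when they are identical once the ignored postpositions are removed. The function names, the example sentence, and the grouping are invented for illustration and are not taken from the slides or from the Anusaaraka code base; only the postposition 'ne' comes from the slide.

```python
# Hypothetical illustration of group-level alignment that ignores 'ne'.
IGNORED_POSTPOSITIONS = {"ne"}  # assumption: the slide names only 'ne'


def normalize(group):
    """Drop ignored postpositions from a word group."""
    return tuple(w for w in group.split() if w not in IGNORED_POSTPOSITIONS)


def align(anusaaraka_groups, hindi_groups):
    """Return (i, j) index pairs of groups that match after normalization."""
    index = {}
    for j, group in enumerate(hindi_groups):
        index.setdefault(normalize(group), []).append(j)
    pairs = []
    for i, group in enumerate(anusaaraka_groups):
        candidates = index.get(normalize(group), [])
        if candidates:
            pairs.append((i, candidates.pop(0)))  # use each Hindi group once
    return pairs


# Invented example: Anusaaraka output keeps English order, the translation does not.
anusaaraka_groups = ["rAma", "khAyA", "Ama"]
hindi_groups = ["rAma ne", "Ama", "khAyA"]
pairs = align(anusaaraka_groups, hindi_groups)
print(pairs)  # [(0, 0), (1, 2), (2, 1)]
```

Exact matching is the simplest possible criterion; a real aligner would presumably also need to handle inflection and synonym choices, which this sketch does not attempt.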

    62. Information Dynamics: Applications. Use Anusaaraka for gradual progression towards MT; maintaining reversibility.

    63. Anusaaraka Philosophy: no loss of information; no effort should go to waste; users contribute towards the development.

    66. Anusaaraka is Robust. Clear-cut separation of the resources that are in principle reliable from those that involve a probabilistic component. Graceful degradation: in case of failure it produces a 'rough' translation. It is 'rough' not in the sense that it is inaccurate or imprecise, but in the sense that it requires some human effort to understand the text (compare a 'rough journey', where you are taken to the destination but the journey is not comfortable).

    67. Anusaaraka is Completely Transparent. The whole process of machine translation is transparent even to a layman.

    68. Human-Understandable Outputs. For example, chunking: color scheme; parsed output: modifier-modified tree.

    69. Anusaaraka MT Differences

    70. Differences contd ...

    71. Consequences

    72. Consequences contd ...

    73. Suitable Environment for Contributors

    74. Suitable Environment for Contributors
