
1st Global WordNet Conference: Uniting Words, Dividing Nations

Join us for the 1st International Conference in association with CIIL-Mysore, IIT-Mumbai, and IIIT-Hyderabad. Explore the power of words to unite people and divide nations, with discussions on linguistic harmony, language technology, and the development of WordNets. Discover the Indian linguistic scene and the initiatives of the Central Institute of Indian Languages (CIIL). Don't miss this opportunity to engage with word-smiths, word-mongers, and word-lords in a global exchange of ideas.

jesseallen

Presentation Transcript


  1. 1st International Conference In association with CIIL-Mysore, IIT-Mumbai, IIIT-Hyderabad

  2. Words unite people. Words can divide nations – they indulge in a ‘war of words’… Word-smiths fashion texts. Word-mongers talk nineteen to the dozen. Word-lords don’t tell you that they ‘double-speak’. Word-poets open the inner abyss of lanes & bye-lanes of meaning. And so do WordNets. Which is why we are all here!

  3. Welcome to the 1st Global WordNet Conference. MY ADDRESS HAS TWO PARTS: First, I shall tell you a little about what the Indian linguistic scene is like, and what we at CIIL have been doing. Then, we will offer our suggestions on what we in India could do in WordNet.

  4. CENTRAL INSTITUTE OF INDIAN LANGUAGES, Department of Education, Government of India. Initiatives in LANGUAGE TECHNOLOGY

  5. CIIL in the first three decades: Equipping Language teachers and Analysts technologically

  6. 1. An Apex Institution under Languages Division, MHRD • Completed 32 years in July 2001. • This 287-people institution works for the development of Indian languages. • CIIL has five Centers with Research Groups (16) and Service Groups (6). • 7 Regional Language Centers are at Bhubaneswar, Guwahati, Lucknow, Mysore, Patiala, Pune, & Solan.

  7. 2. Four Main Objectives • 1. Develops languages by creating content, corpus, techniques and technologies. • 2. Protects & documents Minority & Tribal languages. • 3. Creates linguistic harmony by teaching 15 Indian tongues to non-native learners. • 4. Above all, advises both Central and State governments on matters related to language.

  8. 3. Functionality and Multi-disciplinarity • Although the mainstay is Indian Languages & Linguistics, the focus of all projects and programmes is on developing materials & products in print, audio, video and computational form. • In addition, there is considerable interest in Comp. Lit., Education, Language Technology & NLP, Folklore, Geography, Statistics, Psychology, Sociology & Translation.

  9. 4. Coverage of CIIL - sizable • Archived 118 lgs data • Creating Voice Corpora • Studied 80 Tribal lgs • 35 grammars on-line soon • Published 490 books • Cassette Courses in: Assamese, Urdu, Bengali, Kashmiri & Marathi • Radio courses in Hindi through Kannada

  10. 5. Major Publications – 490+ books all produced in-house • 22 Grammars • 30 Intensive Courses • 24 2nd Lg Textbooks • 5 Common Vocab. • 18 Dictionaries • 49 Apni Boli (KVS) • 15 Pictorial Glossaries • 16 Literacy Books • 12 Folklore • 9 Bibliographies • 12 Rhymes/Lg Games • 16 Proceedings

  11. 6. The Challenge before CIIL: Enormous

  12. A truly plural world of languages • 1,576 rationalized mother-tongues; • 1,796 other mother-tongues; • 114 languages with 10,000+ speakers; • Large variation: Hindi (337 m) to Maram of Manipur with 10,144; • Large non-scheduled lgs - Bhili (6 m) and Santali (5 m); • 146 radio lgs/69 school lgs /35 lg dailies.

  13. 7. Programs - Modes of Delivery • 10 months L2 teaching: 8000 teachers trained • Distance Courses in Tamil/Telugu/Bengali/Urdu • On-line Programs in 15 Indian languages • Kannada for officials in Karnataka • Radio courses with AIR’s collaboration • 3-months Courses in Communication • Orientation for Mother-tongue teachers • Refresher Courses in Linguistics • NLP Training modules

  14. 8. Language Technology – Further Goals • Enlargement of 3-million word Corpora: • 100 m word corpora for Hindi-Urdu • Multilingual multidirectional E-Dictionaries • On-line Administrative Glossaries • Lexical databases for MT Programs • Tagging & Corpus Tools • E-Zines and E-Journals • Language Information Services • Anukriti: Web-based Translation services

  15. 9. Indian Lgs & IT at CIIL • 132-node LAN set up • V-SAT through STPI • Browsing centre • Has 2400 E-Journals & 350 paper journals • Collaborating with Schoolnet for electronic materials • New generation Lg Labs • Focus: Visual Phonetics

  16. 10. LIS-India Website • Type Language Name: • Type Area Name: • Home or http://www.ciil.org/ • General Information • Language/Area Profile: Geolinguistic; Sociolinguistic; Cultural; Literary • Language/Area History: Genealogical; Archaeological; Cultural; Textual • Language Vitality: Attitudinal; Utilitarian; Socio-political; Referential • Grammatical Information: Phonetic; Graphemic; Phonological; Morphological; Lexical; Syntactic; Semantic; Stylistic • Biblio search

  17. 11. Anukriti: A Translation with NBT/SA • WEB-BASED SERVICE SITE called ANUKRITI. To be maintained with NBT/Sahitya Akademi • E-journals • Technological Tools • Electronic lexicon • Corpus & tools • Parallel corpora • Cultural Glossaries • Thesauri • Word finders • WordNets

  18. 12. Bhasha Bharati Project. To be set up in collaboration with: • Sahitya Akademi • Sangeet Natak Academy • All India Radio • Doordarshan • National Library • National Archive • National Book Trust • Major TV Channels • Films Division • Major Newspaper houses • Numerous Foundations • Individual writers • Heirs of writers • Personal libraries • Little magazines. This rich manuscriptorium will display the plural literary and linguistic landscape of India.

  19. 13. Doctoral Programs under planning • Already available through 22 Universities: Linguistics & Psychology • Now being planned in: NLP; Folklore/Communication; Translation; Indian Gram. Tradition

  20. 14. Future Programs • Dip in Experimental Phonetics • Masters by Research in Field Linguistics • Courses in Statistical Linguistics • Diploma in Translation Studies • Dip in Folklore/Comp. Lit. & Semiotics • Internship in Linguistic Geography • Internship in NLP & Corpus Linguistics

  21. WHAT COULD WE DO TO CREATE AN

  22. • Early 20th century Indian linguistics was dominated by studies on sound-systems and etymologies. • Mid-20th C focussed on word-formation patterns. • Late 20th C emphasized syntax. • India has already had a strong lexicographical tradition. Working on WordNet, therefore, should come naturally to us. Efforts have already begun, as we see in Hindi, Tamil, Oriya and a few other languages. There does not seem to be any academic coordination, however.

  23. We haven’t so far worked seriously on Lexical Semantics • While Sociolinguistics was a favourite, serious Psycholinguistics was almost absent. • Formal Syntax was highly valued, but the intricacies of Semantics were not so attractive. • Making of Dictionaries continued throughout, but major concerted efforts in each language were highly individualistic or had happened long ago. • While writing or applying software means money, and is hence a crowded field, Language Technology has so far been neglected.

  24. So, what do we need to do now? • Create an Indian WordNet Association. • Work in a coordinated manner. • Remember to focus on areal semantic features, because with so much linguistic & cultural diversity, India is ideal to test and validate the concept of WordNet.
