Join us for the 1st International Conference in association with CIIL-Mysore, IIT-Mumbai, and IIIT-Hyderabad. Explore the power of words to unite people and divide nations, with discussions on linguistic harmony, language technology, and the development of WordNets. Discover the Indian linguistic scene and the initiatives of the Central Institute of Indian Languages (CIIL). Don't miss this opportunity to engage with word-smiths, word-mongers, and word-lords in a global exchange of ideas.
1st International Conference In association with CIIL-Mysore, IIT-Mumbai, IIIT-Hyderabad
Words unite people. Words can divide nations – they indulge in ‘war of words’… Word-smiths fashion texts. Word-mongers talk nineteen to the dozen. Word-lords don’t tell you that they ‘double-speak’. Word-poets open the inner abyss of lanes & bye-lanes of meaning. And so do WordNets. Which is why we are all here!
Welcome to the 1st Global WordNet Conference. MY ADDRESS HAS TWO PARTS: First, I shall tell you a little about what the Indian linguistic scene is like, and what we at CIIL have been doing. Then, we will offer our suggestions on what we in India could do in WordNet.
CENTRAL INSTITUTE OF INDIAN LANGUAGES, Department of Education, Government of India. Initiatives in LANGUAGE TECHNOLOGY
CIIL in the first three decades: Equipping Language teachers and Analysts technologically
1. An Apex Institution under Languages Division, MHRD • Completed 32 years in July 2001. • This 287-person institution works for the development of Indian languages. • CIIL has five Centers with Research Groups (16) and Service Groups (6). • 7 Regional Language Centers are at Bhubaneswar, Guwahati, Lucknow, Mysore, Patiala, Pune, & Solan.
2. Four Main Objectives • 1. Develops languages by creating content, corpus, techniques and technologies. • 2. Protects & documents minority & tribal languages. • 3. Creates linguistic harmony by teaching 15 Indian tongues to non-native learners. • 4. Above all, advises both Central and State governments on matters related to language.
3. Functionality and Multi-disciplinarity Although the mainstay is Indian Languages & Linguistics, the focus of all projects and programmes is on developing materials & products – in print, audio, video and computational form. In addition, there is enough interest in Comp. Lit., Education, Language Technology & NLP, Folklore, Geography, Statistics, Psychology, Sociology & Translation.
4. Coverage of CIIL - sizable • Archived data on 118 lgs • Creating Voice Corpora • Studied 80 Tribal lgs • 35 grammars on-line soon • Published 490 books • Cassette Courses in: Assamese, Urdu, Bengali, Kashmiri & Marathi • Radio courses in Hindi through Kannada
5. Major Publications – 490+ books, all produced in-house • 22 Grammars • 30 Intensive Courses • 24 2nd Lg Textbooks • 5 Common Vocab. • 18 Dictionaries • 49 Apni Boli (KVS) • 15 Pictorial Glossaries • 16 Literacy Books • 12 Folklore • 9 Bibliographies • 12 Rhymes/Lg Games • 16 Proceedings
A truly plural world of languages • 1,576 rationalized mother-tongues; • 1,796 other mother-tongues; • 114 languages with 10,000+ speakers; • Large variation: Hindi (337 m) to Maram of Manipur with 10,144; • Large non-scheduled lgs - Bhili (6 m) and Santali (5 m); • 146 radio lgs/69 school lgs /35 lg dailies.
7. Programs - Modes of Delivery • 10 months L2 teaching: 8000 teachers trained • Distance Courses in Tamil/Telugu/Bengali/Urdu • On-line Programs in 15 Indian languages • Kannada for officials in Karnataka • Radio courses with AIR’s collaboration • 3-months Courses in Communication • Orientation for Mother-tongue teachers • Refresher Courses in Linguistics • NLP Training modules
8. Language Technology – Further Goals • Enlargement of 3-million word Corpora: • 100 m word corpora for Hindi-Urdu • Multilingual multidirectional E-Dictionaries • On-line Administrative Glossaries • Lexical databases for MT Programs • Tagging & Corpus Tools • E-Zines and E-Journals • Language Information Services • Anukriti: Web-based Translation services
9. Indian Lgs & IT at CIIL • 132-node LAN set up • V-SAT through STPI • Browsing centre • Has 2400 E-Journals & 350 paper journals • Collaborating with Schoolnet for electronic materials • New generation Lg Labs • Focus: Visual Phonetics
10. LIS-India Website • Search fields: Type Language Name; Type Area Name • Home or http://www.ciil.org/ • General Information • Language/Area Profile: Geolinguistic; Sociolinguistic; Cultural; Literary • Language/Area History: Genealogical; Archaeological; Cultural; Textual • Language Vitality: Attitudinal; Utilitarian; Socio-political; Referential • Grammatical Information: Phonetic; Graphemic; Phonological; Morphological; Lexical; Syntactic; Semantic; Stylistic • Biblio search
11. Anukriti: A Translation Site with NBT/SA. A web-based service site called Anukriti, to be maintained with NBT/Sahitya Akademi • E-journals • Technological Tools • Electronic lexicon • Corpus & tools • Parallel corpora • Cultural Glossaries • Thesauri • Word finders • WordNets
12. Bhasha Bharati Project To be set up in collaboration with: • Sahitya Akademi • Sangeet Natak Akademi • All India Radio • Doordarshan • National Library • National Archive • National Book Trust • Major TV Channels • Films Division • Major Newspaper houses • Numerous Foundations • Individual writers • Heirs of writers • Personal libraries • Little magazines. This rich manuscriptorium will display the plural literary and linguistic landscape of India.
13. Doctoral Programs under planning • Already available through 22 Universities: Linguistics & Psychology • Now being planned in: NLP; Folklore/Communication; Translation; Indian Gram. Tradition
14. Future Programs • Dip in Experimental Phonetics • Masters by Research in Field Linguistics • Courses in Statistical Linguistics • Diploma in Translation Studies • Dip in Folklore/Comp. Lit. & Semiotics • Internship in Linguistic Geography • Internship in NLP & Corpus Linguistics
India has already had a strong lexicographical tradition • Early 20th-century Indian linguistics was dominated by studies of sound systems and etymologies • The mid-20th century focussed on word-formation patterns • The late 20th century emphasized syntax. Working on WordNet, therefore, should come naturally to us. Efforts have already begun, as we see in Hindi, Tamil, Oriya and a few other languages. There does not seem to be any academic coordination, however.
We haven’t so far worked seriously on Lexical Semantics • While Sociolinguistics was a favourite, serious Psycholinguistics was almost absent. • Formal Syntax was highly valued, but the intricacies of Semantics were not so attractive. • Making of Dictionaries continued throughout, but major concerted efforts in each language were highly individualistic or had happened long ago. • While writing or applying software means money, and is hence a crowded field, Language Technology has so far been neglected.
So, what do we need to do now? • Create an Indian WordNet Association • Work in a coordinated manner • Remember to focus on areal semantic features: with so much linguistic & cultural diversity, India is ideal to test and validate the concept of WordNet.