language technology @ciil
Prof. Udaya Narayan Singh, Director
Central Institute of Indian Languages
• Set up on July 17, 1969
• Located in Mysore, Karnataka
Overall Structure
• Functions under the Department of Secondary & Higher Education, Ministry of Human Resource Development
• Guided by a Governing Committee chaired by the Hon’ble HRM
• Headed by a Director
• Assisted by seven Deputy Directors
• Supported by seven Principals of RLCs
• Administered with help from an Assistant Director (Administration)
Main Objectives
• Advises and assists both Central and State Governments in matters of language
• Promotes all Indian languages by creating content and corpora
• Protects and documents minor, minority and tribal languages
• CCCK program for officials in Karnataka
• Radio courses in Hindi for listeners
• Offers 3-month courses in communication
• Orientation courses for mother-tongue teachers
• Refresher courses under Academic Staff College
• Organizes more than 100 international and national seminars/workshops
Regional Language Centres
Promote linguistic harmony by teaching 15 Indian languages to non-native learners
• 10-month L2 teaching: 8000 teachers trained
• National Integration Camps and refresher courses
• Distance courses in Tamil/Telugu/Bengali/Urdu
Originally, only four RLCs were conceived, in the four corners of India, with the following aims:
• NRLC at Patiala to handle Kashmiri, Urdu & Panjabi
• SRLC at Mysore to handle all four Dravidian languages
• WRLC at Pune to handle Marathi, Sindhi & Gujarati
• ERLC to handle Oriya, Bengali & Assamese
Two more were added later: UTRC at Solan in 1973 and UTRC at Lucknow in 1981.
The latest addition is the NERLC at Gauhati (1999).
Human Resource
• Language Specialists: 88
• Information Scientists: 12
• Hardware Persons: 5
• Software Persons: 21
• Engineers/LLTs: 7
• Supporting Staff: 125
Own printing press with all the facilities
Published 515 books:
• 22 Grammars
• 30 Intensive Courses
• 24 L2-Textbooks
• 5 Common Vocab.
• 18 Dictionaries
• 49 Apni Boli (for KVS)
• 15 Pict. Glossaries
• 16 Literacy
• 12 Folklore
• 12 Rhymes/Lg-Games
• 18 Proceedings
• 9 Bibliographies, etc.
Some other achievements
• Archived data of 118 languages
• Studied 80 tribal/border languages
• Cassette courses in four languages
• Kashmiri on the net
• Radio courses in Hindi through Kannada
Hardware
• 150-node LAN set up at CIIL and separate 10-node LANs at NRLC and ERLC
• Itanium Web server and database server at CIIL for launching sites
• High-speed V-SAT connection through STPI
• Analog audiotick computerized lab at SRLC and ERLC
• Digital audiotick computerized labs at NRLC
• 2400 electronic journals acquired for CIIL & RLCs
• Browsing section in the library
Web based language resources: Spoken language corpus
The Speech Science lab has the following hardware and software.
Hardware
• Computerized Speech Lab, Model 4100. Developed by Kay Elemetrics Corp., Lincoln Park, N.J. 07035-1488
Software (dependent on CSL hardware)
1. Computerized Speech Lab Main Programme, Version 2.5.2
2. Real-Time Spectrogram, Model 5129, Version 2.5.2
3. Video Phonetics Program and Database, Model 5150, Version 2.5.2
4. Multi-Dimensional Voice Program, Model 5105, Version 2.5.2
5. Multi-Dimensional Voice Program Advanced, Model 5105, Version 2.5.2
6. Real-Time Pitch, Model 5121, Version 2.5.2
7. Analysis Synthesis Laboratory, Model 5104, Version 2.5.2
Software (without any hardware dependency)
1. Multi-Speech Signal Analysis Workstation, Model 3700, Version 2.5.2
2. Real-Time Spectrogram, Model 5129, Version 2.5.2
3. Video Phonetics Program and Database, Model 5150, Version 2.5.2
4. Real-Time Pitch, Model 5121, Version 2.5.2
5. Analysis Synthesis Laboratory, Model 5104, Version 2.5.2
CD-ROM
• Speech Production and Perception (CD-ROM developed by Sensimetrics)
Branches of study in Speech Science
• Articulatory Phonetics
• Experimental Phonetics
• Biological & Clinical Linguistics
• Speech Technology
• Forensic Phonetics
Phonetic Readers
Angami, Ao-Naga, Balti, Bengali, Brokskat, Gojri, Gujarati, Kashmiri, Khasi, Kota, Kurux, Kuvi, Ladakhi, Lotha, Manipuri, Mishmi, Mundari, Sema, Shina, Tangkhul-Naga, Thaadou, Tripuri
Major Events
• International Institute of Phonetics
• Seminar cum Workshop on Voice Modulation and Culture
• Workshop on Aspiration
• Seminar on Voice Quality
• Workshop on Nasalization
• Workshop on Multilingual Speech Analysis and Synthesis
• Instrumental Analysis of Phonetic Features across Major Indian Languages
• Analysis of Retroflex Sounds, etc.
Training/orientation programmes in phonetics for teachers from:
• Tamil Nadu
• Uttar Pradesh
• Arunachal Pradesh
• Bihar
• Haryana
• Himachal Pradesh
• Jammu & Kashmir
• Madhya Pradesh
• Rajasthan
www.ciil-spokencorpus.net
Web based language resources
• Text corpora in major and minor Indian languages: http://www.ciilcorpora.net
• Web based Indian Languages Grammars: http://www.ciilgrammars.org
• Web based Indian Language Courses: http://www.bangla-online.info/
• Web based books and journals: http://www.ciil-ebooks.net/
Web based Translation services
http://www.anukriti.net/
In collaboration with Sahitya Akademi & NBT
Electronic journal (Translation Today) and tools for translation:
• Electronic dictionaries
• Annotated corpus & tools
• Parallel corpora
• Translational dictionaries
• Cultural glossaries
• Thesauri
• Word finders
• Technical terminologies
Linguistic Data Consortium for Indian Languages (LDC-IL)
• Takes advantage of the giant strides in Information Technology
• Model: Linguistic Data Consortium (LDC), hosted by the University of Pennsylvania, USA
• Budget: one crore per year, ten crore over ten years
• Funded by: Ministry of Human Resource Development
• Preliminary discussion: International Workshop on Creation of Linguistic Data Consortium for Indian Languages, August 16-17, 2003
• Meeting of the lead institutions to create LDC-IL: August 18, 2003, at IISc, Bangalore
LDC-IL will focus on:
• Becoming a repository of linguistic resources in all Indian languages in the form of text, speech and lexical corpora
• Facilitating creation of such databases by different member organizations
• Setting standards for data collection and storage of corpora for different research and development activities
• Supporting development and sharing of tools for data collection and management
• Facilitating training through workshops, seminars, etc. in technical as well as process-related issues
• Creating and maintaining the LDC-IL website, which would be the primary gateway for accessing LDC-IL resources
• Designing or helping to create appropriate language technology for mass use
• Providing the necessary linkages between academic institutions, individual researchers and the masses
LDC-IL: Major areas covered
• Speech corpora
• Handwritten corpora
• Text corpora, including parallel corpora
• Natural Language Processing
• Several by-products such as lexicons, thesauri, etc.
LDC-IL: Participating Institutions
• Indian Institute of Science, Bangalore
• Indian Institute of Technology, Bombay
• Indian Institute of Technology, Madras
• International Institute of Information Technology, Hyderabad
• ISI Calcutta; TIFR Mumbai; HP Labs India; BM; C-DOT; C-DAC; Tata InfoTech
• All other IITs; KHS; NCPUL; Rashtriya Sanskrit Sansthan; TDIL, MIT
LDC-IL: All academic institutes, research organizations and corporate R&D groups from India and abroad working on Indian languages will be encouraged to participate in LDC-IL, including different Indian universities with major departments of Linguistics and Computer Science/Artificial Intelligence.
Web Based Language Information Services
• General Information
• Language/Area Profile: Geolinguistic; Sociolinguistic; Cultural; Literary
• Language/Area History: Genealogical; Archaeological; Cultural; Textual
• Language Vitality: Attitudinal; Utilitarian; Socio-political; Referential
• Grammatical Information: Phonetic; Graphemic; Phonological; Morphological; Lexical; Syntactic; Semantic; Stylistic
• Biblio search
Website for Modern Indian Literary Classics in Translation
• In collaboration with Sahitya Akademi and NBT
• To promote the celebrated Indian fiction writers of the last 150 years, both within the country and abroad, through a series of initiatives
• A library of 100 major contemporary works of fiction in English and several other European languages
Digital Library and Manuscriptorium
Special library with linguistics and allied disciplines as its focus
• Over 65000 books
• Subscription to over 270 journals
• Subscription to 4200 online journals
• Back volumes of all the journals
• 7 RLC libraries with collections in Indian languages
• Has CDs (worth 50 lakhs) in Indian languages in digital form
• Library automation through the VTLS package
Bhasha Bharati will have display galleria as well as:
• Scanned copies of writings
• Audio and video tapes of interviews
• Lecture notes and recordings
• The writers' own as well as professional recitations
• Films, tele-films and serials
• Documentaries
Bhasha Bharati will also house and create hyper-texts of Indian language classics. It will provide a service to common people, who may visit either in person or virtually and seek answers to their questions and queries. It will handle questions on different topics, ranging from knowledge and interpretation of a literary or religious text to information on a speech group, or even on a word or an expression.
Web based information on Indian Scripts
Linguistic Integration Project of India
Aim:
• LIPIKA will promote greater understanding among Indian people, produce useful learning materials, and create web-based information.
• LIPIKA will show the unity in India's apparently diverse writing systems.
• LIPIKA will also help generate software with necessary tools like spell-checkers and grammar checkers.
Task 1: Preparation of a brief history of the various writing systems of India, such as Brahmi, Kharosthi, etc., and of a learners' manual (aimed at both foreigners and Indians) on the structure of the syllabic writing systems prevalent in India, including a comparison of the apparently divergent scripts used by Indian languages today.
Task 2: (a) Preparation of a CD/video version of the learners' manual, based on the expertise of C-DAC/NCST/CIIL; (b) placing the learning software in the public domain, for propagation of Indian writing systems.
Task 3: (a) Creation of new fonts and images for Devanagari and a few other major Indian writing systems through a series of workshops involving (i) calligraphists, (ii) print-making experts, (iii) computer experts, (iv) creative persons.
Some of the important collaborators of CIIL
• All IITs, IIIT Hyderabad, IISc.
• Government of Karnataka
• Andaman & Nicobar Administration
• Government of Singapore
• Lancaster University
• SASNET
• SIDA
• MGI-CIIL from Mauritius
• SchoolNet
• NCPUL
• Sahitya Akademi
• Konkani Academy
• Dogri Sansthan
• Karnataka Nataka Rangayana
• CHD
• HP Labs
• NSOU
• University of Hyderabad
• NEHU
• Delhi University
• NBT
and many more
Director’s Speech