Knowledge Management & Linguistic Pluralism
Rajeeva Ratna Shah, Secretary, Government of India, Ministry of Communications & Information Technology, Department of Information Technology (Secretary@mit.gov.in)
A CASE OF COMMUNICATION GAP
Wing Commander to Squadron Leader: At 9 o'clock tomorrow there will be an eclipse of the Sun, something which does not occur every day. Get the men to fall out in the Lal Bahadur Shastri Marg in their uniform so that they will see this rare phenomenon, and I will explain it to them. In case of rain, we will not be able to see anything, so then take the men to the gymkhana.
Squadron Leader to Flying Officer: By order of the Wing Commander, tomorrow at 9 o'clock there will be an eclipse of the Sun. If it rains, you will not be able to see it from the Lal Bahadur Shastri Marg, so then, in uniform, the eclipse of the Sun will take place in the gymkhana, something that does not occur every day.
Flying Officer to Sergeant: By order of the Wing Commander, in uniform, tomorrow at 9 o'clock in the morning the inauguration of the eclipse of the Sun will take place in the gymkhana. The Wing Commander will give the order if it should rain, something which occurs every day.
Sergeant to Corporal: Tomorrow at nine the Wing Commander, in uniform, will eclipse the Sun in the gymkhana, as it occurs every day if it is a nice day; if it rains, then in the Lal Bahadur Shastri Marg.
Corporal to Lance Corporal: Tomorrow at nine the eclipse of the Wing Commander in uniform will take place because of the Sun. If it rains in the gymkhana, something which does not take place every day, you will fall out in the Lal Bahadur Shastri Marg.
Comments among all in the unit: Tomorrow, if it rains, it looks as if the Sun will eclipse the Wing Commander in the gymkhana. It is a shame that this does not occur every day.
The Broadening Sphere of Information Technology: from data (computation) to information (communication) to knowledge (cognition)
Building Blocks & Knowledge Tools of the 21st Century
• Building blocks: bits, atoms, genes, neurons
• Knowledge tools: computers, nanotech, biotech, networks
• Knowledge of the 21st century: the macrocosm (sthula-jagat) and the microcosm (sooksma-jagat)
Erosion of Knowledge Base Due to Loss of Language
• Technologies transform societies and increase knowledge. BUT...
• From an estimated 10,000 languages in 1900, the world has about 6,700 languages surviving today.
• 33% are in Asia and 19% in the Pacific.
• Only 50 percent of the surviving languages are being taught to children.
• Half the current languages will be effectively extinct within a single generation.
• Is there a gain in knowledge or a loss of knowledge?
Sprawling digital divide
Rough sketch of the global digital divide by script:
• Latin alphabet users: 39% of the global population enjoy 84% of Internet access.
• Hanzi (Chinese ideograph) users in China, Japan and Korea: 22% of the global population enjoy 13% of Internet access.
• Arabic script users: 9% of the global population have 1.2% of Internet access.
• Indic script users: 22% of the world population have just 0.3% of Internet access.
Is the technology to divide or to unite?
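One way to read the figures above is to divide each script group's share of Internet access by its share of world population, so that 1.0 would mean parity. The following is a minimal Python sketch using only the percentages quoted on the slide; the names `SCRIPT_SHARES` and `access_ratio` are illustrative and do not come from the presentation.

```python
# Shares quoted on the slide:
# (percent of world population, percent of Internet access).
SCRIPT_SHARES = {
    "Latin": (39.0, 84.0),
    "Hanzi (China/Japan/Korea)": (22.0, 13.0),
    "Arabic": (9.0, 1.2),
    "Indic": (22.0, 0.3),
}


def access_ratio(population_pct: float, access_pct: float) -> float:
    """Return access share divided by population share (1.0 means parity)."""
    if population_pct <= 0:
        raise ValueError("population share must be positive")
    return access_pct / population_pct


if __name__ == "__main__":
    for script, (population_pct, access_pct) in SCRIPT_SHARES.items():
        ratio = access_ratio(population_pct, access_pct)
        print(f"{script:28s} {ratio:6.3f}")
```

On these numbers, Latin-script users hold a bit more than twice their population share of access, while Indic-script users hold well under a fiftieth of theirs, a gap of more than a hundredfold between the two groups.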
[Figure: Exponential Growth Trends in Computer Performance. Vertical axis: MIPS, from 100 to 1,638,400, doubling at each tick. Horizontal axis: year, 1994 to 2020. Milestones marked: Giga PC, 10G PC, 100G PC, Tera PC. Growth rates marked: doubling every 2 years and doubling every 15 months.]
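The two growth rates marked on the chart can be compared with the usual doubling formula, performance = start × 2^(elapsed / doubling period). The sketch below is only an illustration: it assumes the chart's lowest tick (100 MIPS) as the starting point and its highest tick (1,638,400 MIPS) as the target, and the function names are hypothetical.

```python
import math


def performance(start_mips: float, years: float, doubling_years: float) -> float:
    """Performance after `years`, starting at `start_mips` and
    doubling once every `doubling_years`."""
    return start_mips * 2 ** (years / doubling_years)


def years_to_reach(start_mips: float, target_mips: float, doubling_years: float) -> float:
    """Years needed to grow from `start_mips` to `target_mips`."""
    if start_mips <= 0 or target_mips <= 0 or doubling_years <= 0:
        raise ValueError("all arguments must be positive")
    return doubling_years * math.log2(target_mips / start_mips)


if __name__ == "__main__":
    # Assumption: lowest and highest axis ticks taken from the chart.
    low, high = 100, 1_638_400
    print(years_to_reach(low, high, 2.0))    # doubling every 2 years
    print(years_to_reach(low, high, 1.25))   # doubling every 15 months
```

Going from the lowest to the highest tick takes 14 doublings, which is 28 years at one doubling every 2 years but 17.5 years at one doubling every 15 months.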
Future Direction: Information Interspace
• Third wave in the ongoing evolution of the Global Information Infrastructure.
• Computing technology will transform the Internet into the Interspace.
• In future, the information infrastructure will support semantic indexing and concept navigation across widely distributed community repositories.
• Concept navigation will become a standard function in the Interspace.
Timeline: E-mail in ARPANET (1965-85); Document Browsing in INTERNET (1985-2000); Concept Navigation in INTERSPACE (2000-10)
Linguistic Pluralism in India: eighteen constitutional Indian languages and their scripts
Data on Information Technology Indicators
Country   Population  PPP     IT/GDP  IT per capita   Tel. density  PC penetration  Internet hosts  Mobile phone users
          (million)           (%)     (nominal US$)   (per 100)     (per 100)       (per 10,000)    (per 100)
USA       282         34260   5.2     2792.1          66.45         62.25           3714.01         44.42
Germany   82          25010   4.1     1699.9          63.48         33.60           294.58          68.29
France    59          25000   3.8     1706.6          57.35         33.70           132.94          60.53
China     1261        3940    4.9     37.9            13.81         1.93            0.69            11.17
India     1027        2390    3.5     15.4            3.38          0.58            0.81            0.63
Source: ITU-2001 and IMF world economic review 2001.
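The per-100 figures in the table can be turned into rough absolute counts by multiplying by population. The following Python sketch copies three columns from the table (population, telephone density, PC penetration); the names `INDICATORS` and `absolute_millions` are illustrative, and the results are back-of-envelope estimates rather than figures reported in the source.

```python
# Copied from the table: population (million),
# telephones per 100 persons, PCs per 100 persons.
INDICATORS = {
    "USA": (282, 66.45, 62.25),
    "Germany": (82, 63.48, 33.60),
    "France": (59, 57.35, 33.70),
    "China": (1261, 13.81, 1.93),
    "India": (1027, 3.38, 0.58),
}


def absolute_millions(population_million: float, per_100: float) -> float:
    """Convert a per-100-persons rate into an absolute count in millions."""
    return population_million * per_100 / 100


if __name__ == "__main__":
    for country, (population, phones, pcs) in INDICATORS.items():
        print(
            f"{country:8s} phones ~{absolute_millions(population, phones):7.1f} M"
            f"  PCs ~{absolute_millions(population, pcs):7.1f} M"
        )
```

On these figures India's PC penetration is roughly one hundredth of that of the USA, which works out to about 6 million machines against roughly 176 million.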
Language Technology Mission
• Vision: Digital unite and knowledge for all.
• Mission: Communicating without language barrier and moving up the knowledge chain.
• Objectives:
  - To develop information processing tools to facilitate human-machine interaction in Indian languages and to create and access multilingual knowledge resources/content.
  - To promote the use of information processing tools for language studies and research.
  - To consolidate technologies thus developed for Indian languages and integrate these to develop innovative user products and services.
Major Initiatives
1. Knowledge Resources (Parallel Corpora, Multilingual Libraries/Dictionaries, lexical resources)
2. Knowledge Tools (Portals, Language Processing Tools, Translation Memory Tools)
3. Translation Support Systems (Machine Translation, Multilingual Information Access, Cross Language Information Retrieval)
4. Human Machine Interface System (OCR, Voice Recognition Systems, Text-to-Speech System)
5. Localization (Adapting IT Tools and solutions in Indian Languages)
6. Language Technology Human Resource Development (in NLP & Computational Linguistics)
7. Standardization (ISCII, Unicode, XML, INSFOC, MPEG, Terminology, etc.)
Industry Involvement Through CoIL-Tech
To catalyze language technology innovation and productization in industry, and to foster interaction with academia, MAIT has nucleated a consortium named the Consortium on Innovation & Language Technology (CoIL-Tech), with members from industry and research organizations.
Major Achievements of TDIL Programme of DIT
• OCRs developed (with 97% accuracy): Hindi, Marathi, Bangla, Tamil, Telugu, Punjabi
• OCRs under development: Gujarati, Assamese, Oriya, Malayalam
• Spell checkers developed: Hindi, Marathi, Bangla, Tamil, Telugu, Punjabi, Malayalam
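The slide does not say how the 97% OCR accuracy figure was measured. One common convention, assumed here purely for illustration, is character accuracy: one minus the edit distance between the OCR output and the reference text, divided by the reference length. A minimal Python sketch with hypothetical function names:

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions turning one string into the other."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,         # drop a reference character
                    current[j - 1] + 1,      # insert a hypothesis character
                    previous[j - 1] + cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Character accuracy in [0, 1]. Assumption: counted per Unicode
    code point, not per visual syllable (grapheme cluster)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))
```

For Indic scripts the choice of unit matters: a vowel sign or conjunct that looks like part of one syllable is a separate code point, so per-code-point and per-syllable accuracy can differ noticeably for the same OCR output.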
Machine Aided Translation System (MAT)
• The Anglabharati MAT technology, with high accuracy, has been developed by IIT Kanpur.
• Text-to-Speech integrated with the MAT system has also been demonstrated.
• The on-line MAT system can be accessed on the web at: www.anglahindi.iitk.ac.in
Speech Recognition
• A Continuous Speech Recognition System for Hindi is being developed by IBM Research Lab India.
Parallel Corpora
• Development of a one-million-page Parallel Corpora (Gyan-Nidhi) for a knowledge repository has been undertaken.
• The Parallel Corpora can act as a test-bed for the OCR and EBMAT (Example Based Machine Aided Translation) systems.
Language Technology Products in Public Domain
For widespread proliferation, a number of freely downloadable software packages are available on the TDIL web-site: http://tdil.mit.gov.in. These include fonts with keyboard drivers, an e-mail client, bilingual word processors, glossaries, corpora and classic contents.
Open Source Software
INDIX (Indian Language Interface) supports Indian languages on Linux. This will ensure affordability of Indian-language software based on Linux. The open source software approach will ensure faster localization and low-cost software.
Standardisation
• 8-bit ISCII (Indian Script Code for Information Interchange) was developed in 1988; the Indic script blocks of Unicode are based on it.
• DIT (Govt. of India) is a voting member of the Unicode Consortium.
• Feedback on the revision of Unicode 3.0 for all Indian languages has been finalised.
• An International Unicode Conference in India has been proposed for 2003.
• Draft standards:
  - Display codes in the form of INSFOC (Indian Standard for Font Code): ready
  - Indian Script to Roman Transliteration (INSROT): ready
  - A multilingual lexical format has also been proposed
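One practical consequence of the ISCII heritage is that Unicode lays out its major Indic script blocks in parallel: the corresponding letter sits at the same offset within each 128-code-point block. The sketch below illustrates that property using Python's standard `unicodedata` module. It is not an implementation of INSROT or of any converter named in the presentation, and `BLOCK_START` and `to_script` are hypothetical names.

```python
import unicodedata

# First code point of each script's Unicode block.
BLOCK_START = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Gurmukhi": 0x0A00,
    "Gujarati": 0x0A80,
    "Oriya": 0x0B00,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
    "Kannada": 0x0C80,
    "Malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def to_script(char: str, source: str, target: str) -> str:
    """Map one character to the same offset in another script's block.

    Caveat: not every offset is assigned in every script, so the
    result can be an unassigned code point for some letters.
    """
    offset = ord(char) - BLOCK_START[source]
    if not 0 <= offset < BLOCK_SIZE:
        raise ValueError(f"{char!r} is not in the {source} block")
    return chr(BLOCK_START[target] + offset)


if __name__ == "__main__":
    ka = "\u0915"  # DEVANAGARI LETTER KA
    for script in BLOCK_START:
        mapped = to_script(ka, "Devanagari", script)
        print(f"U+{ord(mapped):04X} {unicodedata.name(mapped)}")
```

A real script converter needs more than an offset shift, because the scripts differ in which letters exist and in how conjuncts and vowel signs behave, but the shared layout is what makes such a first approximation possible.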
Media Lab Asia Programme - Major Project Areas
• TOMORROW'S TOOLS: PDS for ANMs, water, power, schools, crafts, GIS OLS
• DIGITAL VILLAGE: Community Connection, Village Voice
• WORLD COMPUTER: Low cost computing devices, Linux CE, Village Interfaces, Village Info Systems
• BITS FOR ALL: Wi-Fi nets, DakNet
Low cost Computing Devices - Choice of Technologies
• High-bandwidth option: IEEE 802.11b/802.11a
• Typical transfer rates:
  - 11 Mbps @ 180 m
  - 1 Mbps @ 500 m
• Prices still falling:
  - Access point, < US$180
  - Transceiver, < US$80
• Peer-to-peer supported
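To get a feel for what the two quoted link rates mean in practice, the sketch below computes the ideal time to move a file at a given rate. It is a simplification: it treats the quoted rates as usable throughput and ignores protocol overhead, so real transfers would be slower. The function name and the 10 MB example file are assumptions for illustration.

```python
def transfer_seconds(size_megabytes: float, rate_mbps: float) -> float:
    """Ideal seconds to send `size_megabytes` (10**6 bytes each)
    over a link running at `rate_mbps` (10**6 bits per second)."""
    if rate_mbps <= 0:
        raise ValueError("link rate must be positive")
    return size_megabytes * 8 / rate_mbps


if __name__ == "__main__":
    size_mb = 10  # hypothetical file size
    for rate_mbps, distance_m in ((11, 180), (1, 500)):
        seconds = transfer_seconds(size_mb, rate_mbps)
        print(f"{rate_mbps:2d} Mbps @ {distance_m} m: {seconds:.1f} s")
```

For the assumed 10 MB file this gives a little over 7 seconds at 11 Mbps and 80 seconds at 1 Mbps, which shows how much range costs in throughput.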
Low Cost Computing Devices • Ruggedized terminals with medium functionality and low cost < US$100 (also has smart card port and musical keyboard)
E-Learning: Vidya Vahini and Gyan Vahini
[Diagram: Proposed Setup for Vidya Vahini. Components shown: Internet, router, servers, LAN hub, LAN printer, UPS, computer lab, TV.]
Pilot Project - Vidya Vahini
• 200 schools in select districts
• Systems at school:
  - One Server (P-IV based), 256 MB memory, 40 GB Hard Disk Drive
  - Network Printer
  - Multimedia Personal Computer with Web Camera
  - Colour TV 27”/29”
  - 2 KVA UPS with 50/30 minutes power back-up
  - Software (MS Office full suite, Education software with Multi-lingual support, Course Curriculum software, Filtering software and School Administration Software)
  - Internet access of 128 Kbps, to be increased gradually
Technology Class Room
• A classroom in every school will be converted into a Technology Class Room.
• The Technology Class Room will have:
  - A 29” flat-screen TV connected to a PC, which in turn will be connected to the server
  - Computer-aided techniques will be used to teach the basic course curriculum
Vidya Vahini Schools as Anchor Schools
• Using VSAT-based Internet connectivity (8 Mbps)
• Using a transceiver, up to 11 Mbps of bandwidth can be disseminated within a radius of 4 to 5 km
Training - Teacher Empowerment
• The Teacher Empowerment Programme forms the heart of “Vidya Vahini”. The programme covers training of teachers in:
  - Use of Computers
  - Effective Teaching Techniques
  - Creating Lessons
  - Building Teaching Tools
  - Usage of Technology in class rooms
  - Training on Educational Software
• 7 computer labs equipped with 1 server, printer, 10 PCs, TV and educational software tools are proposed in collaboration with industry in the Pilot Project
Knowledge Portal
A Knowledge Portal will be hosted, which will have:
• Education material
• Programming Tools
• Software tools for teachers
• Software tools for students
• Language tools
• Filtering Software
• CBSE Course curriculum
• Web pages of all schools
• Circulars/Notices/directives issued by the Central Board of Secondary Education and different Boards throughout the country
Students will be able to access, harness and manage knowledge through the Portal.
Gyan Vahini
• Phase I: Set up IT infrastructure and connect all Govt. funded Universities (including Deemed Universities), Engineering Colleges and Medical Colleges in the country
• Phase II: Set up IT infrastructure and connect all Polytechnics, Degree and Dental Colleges across the country
[Diagram: Typical Campus Wide Network for High End Institutions. Components shown: Internet, router cum RAS, existing PBX, Internet and LAN servers, a central switch, and switches with computers in the academic block, administrative block, various departments, hostel block, different student hostels and residential quarters. Links shown: fibre optic cable, 100 Mbps, Cat 5 cabling.]
New Initiatives under Consideration
• e-Content (including Digital Library)
• Speech-to-Speech translation
• Open source software