Explore aAQUA, an online forum answering grassroots questions, bridging ICT gaps with multilingual, multimedia support. Learn about its technical perspective and Unicode encoding.
Building Database-backed Multilingual, Multimedia Data Repositories: The aAQUA Experience
Introduction • aAQUA (almost All Questions Answered) • An online forum where experts in the field answer questions from the grassroots. • Bridges gaps in the use of ICT • Usability • Availability • Multilinguality • Multimedia support • Multilingual storage and retrieval • Reusability
[Figure: aAQUA deployment diagram. Labels: aAQUA Server, aAQUA Offline, aAQUA Mobile Gateway, Internet (HTTP), Mobile network (SMS), Crop Doctor, Bhav Puchiye, Keyword Browser, Crop Recommendation]
aAQUA: a technical perspective • Employs a three-tier web architecture • Uses mvnForum, which is based on the MVC architecture • Uses Lucene as the search engine • Compatible with any servlet container that supports JSP 1.2 and Servlet 2.3 • Runs on Tomcat • Works with Unicode (UTF-8) compliant Oracle 9i as well as MySQL databases • Is integrated with open source digital library software
Multi-lingual Storage and Retrieval • [Figure labels: Query in Hindi, UNL graph, UNL Documents, Info repository, Result in Hindi; example query: “flowers scorch”] • Example sentence: …The plants blossom but the flowers scorch… • UNL relations for the sentence: • and(blossom(icl>develop(obj>thing)):0S.@entry.@custom, scorch(icl>dry(obj>thing)):2E.@contrast.@custom) • obj(blossom(icl>develop(obj>thing)):0S.@entry.@custom, plant(icl>organism):04.@def.@pl) • obj(scorch(icl>dry(obj>thing)):2E.@contrast.@custom, flower(icl>reproductive structure):1P.@pl.@def)
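The UNL relations on this slide have a regular shape: a relation label followed by two nodes, each with a headword, an optional constraint in parentheses, an identifier after the colon, and a list of "@" attributes. A minimal Python sketch of reading one such line into a structure is given here. It only handles the shape shown on the slide, and the names `Node`, `Relation`, `source`, `target`, and `parse_relation` are hypothetical, not part of aAQUA or of any UNL toolkit.

```python
import re
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    headword: str          # e.g. "plant"
    constraint: str        # e.g. "icl>organism" ("" if absent)
    node_id: str           # e.g. "04"
    attributes: List[str]  # e.g. ["def", "pl"]


class Relation(NamedTuple):
    label: str    # e.g. "obj"
    source: Node  # first argument (hypothetical field name)
    target: Node  # second argument (hypothetical field name)


_NODE = re.compile(
    r"^(?P<head>[^(:]+)(?:\((?P<constraint>.*)\))?:(?P<id>\w+)(?P<attrs>(?:\.@\w+)*)$"
)


def split_top_level(text: str) -> Tuple[str, str]:
    """Split at the first comma that is not nested inside parentheses."""
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return text[:i].strip(), text[i + 1:].strip()
    raise ValueError("expected two comma-separated arguments")


def parse_node(text: str) -> Node:
    match = _NODE.match(text)
    if match is None:
        raise ValueError(f"unrecognised node: {text!r}")
    return Node(
        headword=match.group("head").strip(),
        constraint=match.group("constraint") or "",
        node_id=match.group("id"),
        attributes=re.findall(r"@(\w+)", match.group("attrs")),
    )


def parse_relation(line: str) -> Relation:
    match = re.match(r"^(\w+)\((.*)\)$", line.strip())
    if match is None:
        raise ValueError(f"unrecognised relation: {line!r}")
    first, second = split_top_level(match.group(2))
    return Relation(match.group(1), parse_node(first), parse_node(second))


example = ("obj(blossom(icl>develop(obj>thing)):0S.@entry.@custom, "
           "plant(icl>organism):04.@def.@pl)")
print(parse_relation(example))
```

Once relations are held as structured records like this, a stored document and a query can be compared by label and headword rather than by surface words, which is what makes retrieval independent of the language the question was asked in.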
Unicode • Computers store letters and other characters by assigning a number to each. • Hundreds of different encoding systems exist for assigning these numbers. • Before Unicode, no single encoding could contain enough characters. • Unicode is a universal encoded character set • Enables information from any language to be stored using a single character set. • Provides a unique code value for every character, regardless of the platform, program, or language.
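The idea of one number per character can be seen directly in Python, whose strings are sequences of Unicode code points. The sketch below is only an illustration; the helper name `code_point_label` and the sample characters are chosen here, not taken from the slides.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    return f"U+{ord(ch):04X}"


# Latin, Devanagari and Cyrillic letters all live in the same character set.
samples = ["A", "\u0915", "\u0424"]  # A, DEVANAGARI LETTER KA, CYRILLIC CAPITAL EF

for ch in samples:
    print(ch, code_point_label(ch))

# The mapping is reversible: the number alone identifies the character.
assert chr(0x0915) == "\u0915"
```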
Unicode standard • UTF-8 encoding • Popular with HTML • Transforms every Unicode character into a variable-length sequence of one to four bytes. • The Unicode characters corresponding to the familiar ASCII set have the same byte values as in ASCII. • UTF-8 can therefore be used with much existing software without extensive rewrites. • UTF-16 encoding • Used when efficient access to characters is needed together with economical use of storage. • Most of the heavily used characters fit into a single 16-bit code unit, while all other characters are accessible via pairs of 16-bit code units (surrogate pairs). • Better compatibility with Java, whose char type is a 16-bit UTF-16 code unit.
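To make the variable-length idea concrete, here is a minimal hand-written sketch of both transformations, checked against Python's built-in codecs. The function names `utf8_encode` and `utf16_units` are illustrative, and the sketch assumes a valid code point (it does not reject the surrogate range U+D800 to U+DFFF or values above U+10FFFF).

```python
from typing import List


def utf8_encode(cp: int) -> bytes:
    """Encode one code point as 1 to 4 UTF-8 bytes."""
    if cp < 0x80:          # ASCII range: one byte, identical to ASCII
        return bytes([cp])
    if cp < 0x800:         # two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:       # three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf16_units(cp: int) -> List[int]:
    """Return one 16-bit code unit, or a surrogate pair above U+FFFF."""
    if cp < 0x10000:
        return [cp]
    offset = cp - 0x10000
    return [0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)]


for cp in (0x63, 0xE1, 0x6100, 0x10402):
    # The hand-written encoder agrees with the built-in codec.
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
    print(f"U+{cp:04X}", utf8_encode(cp).hex(" "), [f"{u:04X}" for u in utf16_units(cp)])
```

The first branch is the ASCII compatibility mentioned above: any code point below 0x80 is emitted as the same single byte that ASCII uses.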
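Unicode Encodings (hexadecimal; character: UTF-8 bytes / UTF-16 code units) • c: 63 / 0063 • á: C3 A1 / 00E1 • t: 74 / 0074 • U+6100: E6 84 80 / 6100 • U+10402: F0 90 90 82 / D801 DC02 (a surrogate pair in UTF-16) • d: 64 / 0064 • ö: C3 B6 / 00F6 • Ф: D0 A4 / 0424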
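Rows of this kind can be regenerated with Python's standard codecs, which is a convenient way to check an encoding table. The helper name `encoding_row` is an assumption made for this sketch. Note that a standard UTF-8 encoder produces the four bytes F0 90 90 82 for the character whose UTF-16 form is the pair D801 DC02; it never encodes the two surrogates separately.

```python
from typing import Tuple


def encoding_row(ch: str) -> Tuple[str, str]:
    """Return (UTF-8 bytes, UTF-16 code units) for one character, as hex text."""
    utf8_hex = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    raw16 = ch.encode("utf-16-be")  # big-endian, no byte order mark
    utf16_hex = " ".join(raw16[i:i + 2].hex().upper() for i in range(0, len(raw16), 2))
    return utf8_hex, utf16_hex


characters = ["c", "\u00e1", "t", "\u6100", "\U00010402", "d", "\u00f6", "\u0424"]

for ch in characters:
    utf8_hex, utf16_hex = encoding_row(ch)
    print(f"U+{ord(ch):04X}  UTF-8: {utf8_hex:<12} UTF-16: {utf16_hex}")
```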
Unicode and the Web • The preferred encoding form for Unicode characters on the web is UTF-8 • The HTTP response headers of a document should contain the line • Content-Type: text/html; charset=utf-8 (for HTML files) • Content-Type: text/plain; charset=utf-8 (for plain-text files) • Alternatively, in an HTML document, add the following line inside the HEAD element: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
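The charset parameter matters because the receiver uses it to turn the response bytes back into characters. A minimal sketch of that step using Python's standard library header parser is given below. The function names are hypothetical, and the ISO-8859-1 fallback is an assumption made for this sketch, not something stated in the slides.

```python
from email.message import Message


def charset_from_content_type(header_value: str, default: str = "iso-8859-1") -> str:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset in lower case, or None if absent.
    return msg.get_content_charset() or default


def decode_body(body: bytes, header_value: str) -> str:
    """Decode a response body using the charset declared in its header."""
    return body.decode(charset_from_content_type(header_value))


hindi = "\u0928\u092e\u0938\u094d\u0924\u0947"  # a short Devanagari word
body = hindi.encode("utf-8")

# Declared as UTF-8: the text survives the round trip.
assert decode_body(body, "text/html; charset=utf-8") == hindi

# No charset declared: the fallback decoding yields mojibake instead.
assert decode_body(body, "text/html") != hindi
```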
Creating Unicode databases • MySQL/Oracle • CREATE DATABASE database_name CHARACTER SET character_set; • Example: CREATE DATABASE confluence CHARACTER SET utf8; • Oracle 9i also supports UTF-16, as the national character set (NATIONAL CHARACTER SET AL16UTF16) • PostgreSQL • CREATE DATABASE database_name WITH ENCODING 'UTF8';
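Once the database stores Unicode, multilingual text should come back exactly as it went in, and keyword search should work on it. The round trip can be sketched as follows. As a loudly stated assumption, the sketch uses Python's bundled SQLite (which stores text as UTF-8 by default) as a stand-in for MySQL, Oracle, or PostgreSQL, so no server is needed; the table name `posts` and the function name are hypothetical.

```python
import sqlite3
from typing import Optional


def store_and_search(text: str, keyword: str) -> Optional[str]:
    """Insert one post and return the first stored post containing keyword."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not the aAQUA database
    try:
        conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
        # Parameter binding passes the Unicode string through unchanged.
        conn.execute("INSERT INTO posts (body) VALUES (?)", (text,))
        row = conn.execute(
            "SELECT body FROM posts WHERE body LIKE ?", (f"%{keyword}%",)
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()


# Sample Devanagari text: two words separated by a space.
post = "\u092a\u094c\u0927\u0947 \u092b\u0942\u0932"
found = store_and_search(post, "\u092b\u0942\u0932")
assert found == post
```

With a server database, the same check is worth running after creating the database with one of the statements above, since a mismatch between the database character set and the connection encoding typically shows up as corrupted text on retrieval.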