
Building Database-backended Multilingual, Multimedia Data Repositories: The aAQUA Experience

Explore aAQUA, an online forum answering grassroots questions, bridging ICT gaps with multilingual, multimedia support. Learn about its technical perspective and Unicode encoding.

janewillis


Presentation Transcript


  1. Building Database-backended Multilingual, Multimedia Data Repositories: The aAQUA Experience

  2. Introduction
     • aAqua stands for "almost All Questions Answered".
     • An online forum where experts in the field answer questions from the grassroots.
     • Bridges gaps in the use of ICT:
       • Usability
       • Availability
       • Multi-Linguality
       • Multi-media Support
       • Multi-Lingual Storage and Retrieval
       • Reusability

  3. Usability

  4. A Sample Thread

  5. aAqua in Operation

  6. [Architecture diagram; labels: aAqua Server, aAqua Offline, Mobile network, Internet, HTTP, Crop Doctor, aAqua, Bhav Puchiye, Keyword Browser, SMS, Crop Recommendation, aAqua Mobile Gateway]

  7. aAqua Demo

  8. aAQUA: a technical perspective
     • Employs a three-tier web architecture.
     • Uses mvnForum, which is based on the MVC architecture.
     • Uses Lucene as the search engine.
     • Compatible with any servlet container that supports JSP 1.2 and Servlet 2.3.
     • Runs on Tomcat.
     • Works with Unicode (UTF-8) compliant databases: Oracle 9i as well as MySQL.
     • Is integrated with open source digital library software.

  9. Multi-Linguality

  10. Multi-lingual Storage and Retrieval
     [Diagram; labels: Query in Hindi, UNL graph, UNL Document (three), Info repository, Result in Hindi]
     • Example sentence: "…The plants blossom but the flowers scorch…"
     • Its UNL expression:
       and(blossom(icl>develop(obj>thing)):0S.@entry.@custom, scorch(icl>dry(obj>thing)):2E.@contrast.@custom)
       obj(blossom(icl>develop(obj>thing)):0S.@entry.@custom, plant(icl>organism):04.@def.@pl)
       obj(scorch(icl>dry(obj>thing)):2E.@contrast.@custom, flower(icl>reproductive structure):1P.@pl.@def)
     • Query: “flowers Scorch”

  11. Unicode
     • Computers store letters and other characters by assigning a number to each.
     • Hundreds of different encoding systems exist for assigning these numbers.
     • Before Unicode, no single encoding could contain enough characters.
     • Unicode is a universal encoded character set.
     • Enables information from any language to be stored using a single character set.
     • Provides a unique code value for every character, regardless of the platform, program, or language.
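
The idea of "a unique code value for every character" can be illustrated with a minimal Python sketch. The helper name `code_point` and the sample characters are chosen here for illustration and do not come from the presentation.

```python
# Illustrative sketch: every character maps to exactly one Unicode code point.
# The helper name and the sample characters are assumptions for this example.

def code_point(ch: str) -> str:
    """Return the code point of a single character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"

# Latin, accented Latin, Devanagari and Cyrillic characters share one numbering.
samples = ["c", "\u00e1", "\u0915", "\u0424"]
for ch in samples:
    print(ch, code_point(ch))

# The mapping is reversible: the number alone identifies the character.
assert chr(0x0915) == "\u0915"
```

The same number identifies the character on any platform; only the byte-level encoding (UTF-8, UTF-16) differs.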

  12. Unicode standard
     • UTF-8 encoding
       • Popular with HTML.
       • Transforms every Unicode character into a variable-length sequence of bytes.
       • The Unicode characters corresponding to the familiar ASCII set have the same byte values as in ASCII.
       • Can be used with much existing software without extensive rewrites.
     • UTF-16 encoding
       • Used when efficient access to characters is needed with economical use of storage.
       • Most of the heavily used characters fit into a single 16-bit code unit; all other characters are accessible via pairs of 16-bit code units.
       • Better compatibility with Java.
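
The properties listed for UTF-8 and UTF-16 can be checked with a short Python sketch. The helper names and sample characters below are assumptions made for illustration; only the standard `str.encode` codecs are used.

```python
# Illustrative sketch of the UTF-8 and UTF-16 properties listed on this slide.
# Helper names and sample characters are assumptions for this example.

def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

def utf16_units(ch: str) -> int:
    """Number of 16-bit code units the character occupies in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2

# ASCII text is byte-for-byte identical in UTF-8.
assert "aAQUA".encode("utf-8") == "aAQUA".encode("ascii")

samples = {
    "Latin c": "c",
    "o with diaeresis": "\u00f6",
    "Devanagari ka": "\u0915",
    "U+10402 (outside the 16-bit range)": "\U00010402",
}
for label, ch in samples.items():
    print(f"{label}: {utf8_length(ch)} UTF-8 byte(s), {utf16_units(ch)} UTF-16 unit(s)")
```

Devanagari text, for instance, takes three bytes per character in UTF-8 but a single code unit in UTF-16, which is the storage trade-off the slide alludes to.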

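  13. Unicode Encodings (hexadecimal)
     Character    UTF-8          UTF-16
     c            63             0063
     á            C3 A1          00E1
     t            74             0074
     U+6100       E6 84 80       6100
     U+10402      F0 90 90 82    D801 DC02
     d            64             0064
     ö            C3 B6          00F6
     U+0424       D0 A4          0424
     Note: encoding the two surrogates D801 and DC02 separately as ED A0 81 ED B0 82 gives CESU-8, which is not valid UTF-8.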
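
The hexadecimal values on this slide can be regenerated with Python's built-in codecs. This is a minimal sketch; the function names `hex_groups`, `encodings` and `is_valid_utf8` are hypothetical helpers, not part of aAQUA.

```python
# Illustrative sketch: print the UTF-8 and UTF-16 forms of the slide's characters.
# All function names here are assumptions for this example.

def hex_groups(data: bytes, size: int) -> str:
    """Format bytes as space-separated uppercase hex groups of `size` bytes."""
    return " ".join(data[i:i + size].hex().upper() for i in range(0, len(data), size))

def encodings(ch: str) -> tuple:
    """Return (UTF-8 bytes, UTF-16 code units) for one character, as hex strings."""
    utf8 = hex_groups(ch.encode("utf-8"), 1)
    utf16 = hex_groups(ch.encode("utf-16-be"), 2)
    return utf8, utf16

def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

for ch in ["c", "\u00e1", "t", "\u6100", "\U00010402", "d", "\u00f6", "\u0424"]:
    utf8, utf16 = encodings(ch)
    print(f"U+{ord(ch):04X}  {utf8:<12} {utf16}")
```

A strict decoder rejects a surrogate encoded on its own (the byte sequence ED A0 81), so `is_valid_utf8(bytes.fromhex("EDA081"))` is false, while the four-byte form F0 90 90 82 decodes to U+10402.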

  14. Unicode and the Web
     • The preferred encoding form for Unicode characters on the web is UTF-8.
     • The HTTP header of a document should contain the line:
       • Content-Type: text/html; charset=utf-8 (for HTML files)
       • Content-Type: text/plain; charset=utf-8 (for text files)
     • Or, in an HTML document, add the following line inside the HEAD element:
       <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
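
As a rough sketch of how a server might declare UTF-8 in both places, the following minimal WSGI application sets the Content-Type header and embeds the META line. aAQUA itself runs on JSP and Tomcat, so this Python version is only an illustration; the page template, the sample Hindi word and the name `app` are assumptions.

```python
# Illustrative WSGI sketch: declare UTF-8 in the HTTP header and in the HTML head.
# This is not aAQUA's implementation; names and sample text are assumptions.

PAGE = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>UTF-8 sample</title>
</head>
<body><p>{text}</p></body>
</html>"""

SAMPLE_TEXT = "\u092a\u094c\u0927\u0947"  # a Hindi word in Devanagari script

def app(environ, start_response):
    """Return the sample page encoded as UTF-8 with a matching header."""
    body = PAGE.format(text=SAMPLE_TEXT).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),  # length in bytes, not characters
    ])
    return [body]

# To try it locally:
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8000, app).serve_forever()
```

The header and the META declaration should agree, and the body must actually be encoded in the declared charset; Content-Length counts bytes, which for Devanagari text is larger than the character count.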

  15. Creating Unicode databases
     • MySQL/Oracle
       • CREATE DATABASE database_name CHARACTER SET character_set
       • CREATE DATABASE confluence CHARACTER SET utf8;
       • Oracle 9i also supports UTF-16 (CHARACTER SET: AL16UTF16).
     • PostgreSQL
       • CREATE DATABASE database_name WITH ENCODING 'UTF8';
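
Once a database uses a Unicode encoding, multilingual text should survive a store-and-retrieve round trip unchanged. The sketch below shows that check using SQLite from Python's standard library as a stand-in, since it needs no server; it is not the MySQL, Oracle or PostgreSQL setup named on the slide, and the table name `posts` and function `round_trip` are hypothetical.

```python
# Illustrative sketch: store and retrieve multilingual text in a UTF-8 database.
# SQLite is a stand-in here, NOT one of the databases named on the slide;
# the table and function names are assumptions for this example.
import sqlite3

def round_trip(text: str) -> str:
    """Insert `text` into an in-memory UTF-8 database and read it back."""
    conn = sqlite3.connect(":memory:")
    try:
        # The encoding must be chosen before any table is created.
        conn.execute("PRAGMA encoding = 'UTF-8'")
        conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
        conn.execute("INSERT INTO posts (body) VALUES (?)", (text,))
        (stored,) = conn.execute("SELECT body FROM posts").fetchone()
        return stored
    finally:
        conn.close()

sample = "\u092b\u0942\u0932"  # a Hindi word in Devanagari script
assert round_trip(sample) == sample
```

The same round-trip test is a reasonable smoke check against any of the databases above after creating them with a Unicode character set.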

  16. Thank You
