The goal of FAO's WAICENT programme: “Fighting Hunger with Information” But how to get the information? The Web Search experience (Google etc.): • you do not get what you are looking for • you get a lot of irrelevant stuff
The Search Problem
How to evaluate Search Results?
Precision = Number of Relevant Documents Identified / Total Number of Documents Identified
Recall = Number of Relevant Documents Identified / Number of Relevant Documents in the Collection
Both parameters are ranking low today!
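The two ratios on this slide can be computed directly from sets of document identifiers. The following is a minimal sketch; the function name and the sample numbers are illustrative assumptions, not figures from the presentation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: ids of the documents the search returned
    relevant:  ids of the documents in the collection that are relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents identified
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: documents 1..10 are returned,
# documents 7..26 are the relevant ones in the collection.
p, r = precision_recall(range(1, 11), range(7, 27))
print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up case only 4 of the 10 returned documents are relevant, and they are 4 of the 20 relevant documents that exist, so both values are low in the sense the slide describes.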
The Role of Bibliographical Databases • The State of the AGRIS network • Enforcing Metadata Standards • The Agricultural Ontology Service
Since 1999 the AGRIS database has accumulated nearly 200,000 new records. But what is the real value of these records?
The new paradigm Everyone wants to find everything on the WWW, but there are several constraints: • The information must be available on the web • The information must be identified • The information must be accessible Is AGRIS (bibliographical databases) helpful in overcoming these restrictions? If so, how?
Availability Someone has to publish the information on a web server. In research there are cultural and economic reasons for the unavailability of knowledge: • Not everyone likes to share (especially scientists) • The necessary infrastructure is lacking • Some publications are career relevant and are intended only for career-relevant venues • Other publications might have a market value
Identification and Accessibility You need to know what you are looking for! It is estimated that more than 60 % of all searches are not well defined! You need to define your range of knowledge If you are looking for a very well defined information object then – Google will generally do it for you
From catalogues you can extract knowledge If, for example, we request: “all the information about ‘Rinderpest’ in Africa that was published in the year 2000 and presented at conferences” No full text search engine would give a satisfactory answer! AGRIS can! But AGRIS will give you “only” bibliographical records…
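Why a catalogue can answer this request while a full text index cannot is easier to see with structured records: each condition maps to a metadata field. The sketch below is illustrative only; the record layout, field names and sample records are invented and do not reflect the actual AGRIS schema or data.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A simplified, hypothetical bibliographical record."""
    title: str
    year: int
    doc_type: str
    keywords: set = field(default_factory=set)


def find(records, *, keywords, year, doc_type):
    """Return records that carry all keywords and match year and type."""
    wanted = {k.lower() for k in keywords}
    return [
        r for r in records
        if wanted <= {k.lower() for k in r.keywords}
        and r.year == year
        and r.doc_type == doc_type
    ]


# Invented sample catalogue.
CATALOGUE = [
    Record("Rinderpest surveillance in East Africa", 2000,
           "conference paper", {"Rinderpest", "Africa"}),
    Record("Rinderpest eradication: a review", 2000,
           "journal article", {"Rinderpest", "Africa"}),
    Record("Rinderpest vaccines in South Asia", 2000,
           "conference paper", {"Rinderpest", "Asia"}),
    Record("Cattle markets in West Africa", 1999,
           "conference paper", {"Cattle", "Africa"}),
]

hits = find(CATALOGUE, keywords={"rinderpest", "africa"},
            year=2000, doc_type="conference paper")
for r in hits:
    print(r.title)
```

Year, document type and subject are separate fields here, so none of the conditions depends on the words happening to appear in the text of the document.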
How to get the real stuff… • In pre-Web times, bibliographies gave you a call number; AGRIS gave you the address of the centre • In Web times you can get a link to the URL • But the URL is the most unstable part of a metadata record; physical locations will never be stable! • Using the physical URL as the identifier for a document is a sure path to future failure • PURL databases are no solution, because no one wants to maintain them • At the moment there is no agreed solution for a “Unique Identifier”
…better the bibliography than the link….. • A bibliographical record as entity defines an information object very well; • (Title, author, Identifiers, subject keywords…) • If these metadata are of high quality, there is no need for a direct link to the full text; • The document can be anywhere on the WWW … • The bibliographical Ontology identifies what you are looking for; • Google (or any quality full text search) will do the rest
Example (title contains “diazinon” and AGROVOC contains “toxicity”, 17 records in AGRIS)
“Effects of diazinon on large outdoor pond microcosms”
• 1997topics: ... Giddings, JM, et al. 1996. Effects of diazinon on large outdoor pond microcosms. Environmental Toxicology and Chemistry 15:618-629. 4/8 Computer models. ... www.cnr.colostate.edu/~danb/seminar/1997topics.htm
• Volume 15, Number 5: ... Multivariate Analysis (pp. 608-617) JL Shaw, JP Manning. Effects of Diazinon on Large Outdoor Pond Microcosms (pp. 618-629) JM Giddings ... www.ruf.rice.edu/~etcj/155.html
• TOP 5 CHEMICALS: ... Does it bioaccumulate? References. Giddings, JM, RC Biever, MF Annunziato and AJ Hosmer. 1996. Effects of Diazinon on Large Outdoor Pond Microcosms. ... www.science.mcmaster.ca/Biology/4S03/LB5.HTM
• Methods in Aquatic Toxicology: ... Chem. 13(3): 453-460. Giddings, JM., RC Biever, MF Annunziato, and AJ Hosmer. (1996). Effects of diazinon on large outdoor pond microcosms. Env. Tox. Chem. ... www.science.mcmaster.ca/Biology/4S03/SB7.HTM
• 9 Reference List, Danish Environmental Protection Agency: ... Giddings J, Biever RC, Annunziatio MF & AJ Hosmer. 1996. Effects of diazinon on large outdoor pond microcosms. Environmental toxicology and chemistry. ... www.mst.dk/udgiv/Publications/2001/87-7944-634-5/html/kap09_eng.htm
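The step from a bibliographical record to a full text search, as in the example above, amounts to sending the exact title as a phrase query. A minimal sketch follows, assuming a generic search service that accepts a `q` parameter; the base URL is a placeholder, not a real endpoint.

```python
from urllib.parse import urlencode


def phrase_query(title, author_surname=None):
    """Build a query that searches for the exact title, optionally
    narrowed by the first author's surname."""
    terms = [f'"{title}"']
    if author_surname:
        terms.append(author_surname)
    return " ".join(terms)


def search_url(base_url, query):
    """Append the query to a search service URL (base_url is hypothetical)."""
    return f"{base_url}?{urlencode({'q': query})}"


query = phrase_query("Effects of diazinon on large outdoor pond microcosms",
                     "Giddings")
print(query)
print(search_url("https://search.example.org/search", query))
```

The quality of the result depends entirely on the title and author fields being captured correctly in the record, which is the point the slides make about metadata quality.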
The main problem of the semantic web is not of technical nature but is creating the semantic context = Creating metadata
Conclusions • Bibliographical databases guarantee knowledge organization and especially retrieval • The collection of “bibliographical” data is the single most important factor in the retrieval of electronic information • The better, and the more versatile, our metadata, the easier the task of accessing knowledge objects on the web will become
The State of the AGRIS network
AGRIS has unique features! • Translated Metadata from otherwise inaccessible material (China, Japan, Thailand) • AGRIS is the unique source for references from many national systems (China, India, Thailand, Japan) • The AGRIS methodologies have assured quality in the capture of metadata for two decades
AGRIS is used • About 70,000 bibliographical records/year are sent to FAO: • ca. 30 % with abstracts • ca. 1 % with a link to the full text • The AGRIS website has about 9,000 users/month; 2,000 are regular users from institutions
AGRIS: the comparative advantage • Research Material • Grey Literature • South – South Transfer • Promotion of Standards • Existing documentation centres
The Next Challenges • New Standards (electronic publishing) • Better coverage (more material) • Better and easier access to the collected knowledge • More participation of the information producers • Better tools
Why do we need common metadata? • The AGRIS community has never doubted the necessity of metadata standards • Now this is a hot issue in the development of the Semantic Web • We are able to say that in a given application dc:title means “A name given to the resource” • and not, for example, a title given to a person, such as “Sir” or “Ms.” • Standardization ensures that we are talking about the same thing!
Metadata – why we need them so urgently • Common metadata allow us to: • Give lexical words a meaning • Facilitate easy exchange between systems • Facilitate resource discovery and requests for access to resources • Recombine content to be used for different purposes • Inventory across databases • Ex. send an email using all <ags:email> fields • Reduce cost by using standardized tools • AgMES document • DC.Dot
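The <ags:email> example on this slide works because every record uses the same element name, so one small routine can collect the addresses from any compliant source. The sketch below uses Python's standard XML parser; the namespace URI, the record layout and the addresses are placeholders chosen for illustration, not the published AgMES definitions.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, assumed for this sketch only.
AGS_NS = "http://example.org/ags#"

SAMPLE = f"""<records xmlns:ags="{AGS_NS}">
  <record><ags:email>centre-a@example.org</ags:email></record>
  <record><ags:email>centre-b@example.org</ags:email></record>
  <record><title>No contact given</title></record>
  <record><ags:email>centre-a@example.org</ags:email></record>
</records>"""


def collect_emails(xml_text):
    """Return every distinct ags:email value, in document order."""
    root = ET.fromstring(xml_text)
    seen = []
    for element in root.iter(f"{{{AGS_NS}}}email"):
        address = (element.text or "").strip()
        if address and address not in seen:
            seen.append(address)
    return seen


print(collect_emails(SAMPLE))
```

If each database used its own element name for the contact address, a separate extraction rule would be needed per source.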
Example: Common Environment • Parallel search queries to search across systems [Diagram: a query to search the title element for “Shrimp Production in Thailand” is sent through three separate wrappers to arc:title (DB-A), best:title (DB-B) and my:title (DB-C)]
Example: Ideal Environment • Single search across systems [Diagram: a query to search the dc:title tag for “Shrimp Production in Thailand” goes directly to dc:title in DB-A, DB-B and DB-C]
Example: Achievable Compromise • Single search across systems [Diagram: a query to search the dc:title tag for “Shrimp Production in Thailand” passes through an AgMES compliant XML mapping to arc:title (DB-A), best:title (DB-B) and my:title (DB-C)]
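The "achievable compromise" can be sketched as a lookup table that translates the common element name into each database's local field, so the user writes one query. This is an illustrative toy, not the AgMES mapping format; the database contents are invented and the field names follow the diagram labels.

```python
# Common element -> local field name per database (labels from the slides).
MAPPING = {
    "dc:title": {"DB-A": "arc:title", "DB-B": "best:title", "DB-C": "my:title"},
}

# Invented sample content; each database keeps its own field names.
DATABASES = {
    "DB-A": [{"arc:title": "Shrimp Production in Thailand"}],
    "DB-B": [{"best:title": "Rice Yields in Vietnam"},
             {"best:title": "Shrimp production in Thailand: 1999 survey"}],
    "DB-C": [{"my:title": "Forestry statistics"}],
}


def search(element, phrase):
    """Run one query against every database via the field mapping."""
    needle = phrase.lower()
    hits = []
    for db_name, records in DATABASES.items():
        local_field = MAPPING[element][db_name]
        for record in records:
            if needle in record.get(local_field, "").lower():
                hits.append((db_name, record[local_field]))
    return hits


for db_name, title in search("dc:title", "Shrimp Production in Thailand"):
    print(db_name, "->", title)
```

The databases themselves stay unchanged; only the mapping has to be agreed and maintained, which is what distinguishes this option from the ideal environment.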
The AGStandards Initiative • The Agricultural Metadata Element Set (AgMES) http://www.fao.org/agris/agmes • The new AGRIS Application Profile draft • Improving and Creating Semantic Standards participating from AGROVOC
Problems we want to solve (1) • No cross navigation between applications • Full text search engines based on statistical text analysis are imprecise • Systems based only on “machine intelligence” do not show very promising results • Web crawlers and harvesters do a good job only on already structured information sources • Cataloguing and indexing are labour-intensive processes that require special training. Tools for automating or semi-automating these processes are much in demand. • Recognition of meaning (semantic analysis) by machines is only possible by using structured meta-information and formal knowledge description • Agreed metadata schemas • Controlled vocabularies, taxonomies • We need reusable parts of web searchlets and portlets
Problems we want to solve (2) • Topic Trees from categorization schemes and thesauri are rigid and not very expressive • Machine produced clusters are “flexible”, but imprecise and at times out of context
Knowledge Organization Systems: Vocabularies
Problems: • Insufficient subject + language coverage • Only very simple encoding of semantic relations • Common concepts are not declared • No or very limited interoperability • Very limited machine readability • Severe maintenance problems
Existing Thesauri and Knowledge Organization Systems (KOSs):
Dedicated KOSs • e.g., ASFA thesaurus • e.g., the Multilingual Forestry Thesaurus • e.g., the Sustainable Development website classification • e.g., biological taxonomies such as NCBI and ITIS • Other thematic thesauri
Non-dedicated KOSs • CABI Thesaurus • AGROVOC • NAL Thesaurus • GEMET
The solution we propose - Domain Ontologies • An ontology is a formal knowledge organization system • A formal description of the application knowledge • It contains concepts and their definitions • Relations between concepts • Possibility for machine processing
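The ingredients listed on this slide (concepts, definitions, relations, machine processing) can be illustrated with a very small data structure. The sketch below is a toy, not AGROVOC or the AOS design; the class, the relation names and the sample concepts are assumptions made for illustration.

```python
class Ontology:
    """A toy ontology: concepts with definitions plus typed relations."""

    def __init__(self):
        self.definitions = {}   # concept -> definition text
        self.relations = {}     # (subject, relation) -> set of objects

    def add_concept(self, name, definition):
        self.definitions[name] = definition

    def relate(self, subject, relation, obj):
        for concept in (subject, obj):
            if concept not in self.definitions:
                raise KeyError(f"unknown concept: {concept}")
        self.relations.setdefault((subject, relation), set()).add(obj)

    def ancestors(self, concept, relation="is_a"):
        """Follow one relation transitively, e.g. all broader concepts."""
        found, stack = [], [concept]
        while stack:
            current = stack.pop()
            for parent in sorted(self.relations.get((current, relation), ())):
                if parent not in found:
                    found.append(parent)
                    stack.append(parent)
        return found


onto = Ontology()
onto.add_concept("animal disease", "A disease affecting animals")
onto.add_concept("viral disease", "An animal disease caused by a virus")
onto.add_concept("rinderpest", "A viral disease of cattle and other ruminants")
onto.add_concept("cattle", "Domesticated bovine animals")
onto.relate("viral disease", "is_a", "animal disease")
onto.relate("rinderpest", "is_a", "viral disease")
onto.relate("rinderpest", "affects", "cattle")

print(onto.ancestors("rinderpest"))
```

Because the relations are typed, a program can distinguish "rinderpest is a viral disease" from "rinderpest affects cattle", which a plain thesaurus link between the terms would not express.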
Benefits from Ontologies • Semantic organization of websites • Knowledge maps • Guided discovery of knowledge • Easy retrievability of information without using complicated Boolean logic • Text processing by machines • Text mining on the Web (meaning-oriented access) • Automatic indexing and text annotation tools • Full text search engines that create meaningful classification (FAO Schwarz not related to FAO) (semantic clustering) • Intelligent search of the Web • Building dynamic catalogues from machine readable metadata • Cross-domain search • Natural language processing • Better machine translation • Queries using natural language
The Collaborative Approach We Want to Adopt • Only agreed semantic standards guarantee knowledge discovery between different applications. • Developing Knowledge Organization Systems is resource intensive and requires stakeholders’ agreement and participation. • Hence, FAO started initiatives to bring interested partners together • The AGStandards initiative was launched in October 2000 to agree on agricultural metadata standards • The Agricultural Ontology Service (AOS) concept paper was published in July 2001. • Three AOS workshops have been organized so far • A consortium of partners is to be established
Collaboration between FAO/GIL and Kasetsart University • Enforcing National AGRIS activities • Better coverage of Thai literature • Better access to references important for Thai Agriculture • Participation in AOS consortium • AGROVOC development • Partner in a consortium • AOS workshop in Thailand