The AGRIS Search System - Some Ideas for a Semantics- and Knowledge Network enabled simple search system • L J Haravu, Kesavan Inst. of Information and Knowledge Management
Problems • The human dimension is missing • The availability of generic search engines has resulted in end-users sacrificing quality for ease of access, and relevance for speed and convenience • A domain-specific search system such as AGRIS must differentiate itself from the generic ones • Searching skills of users vary widely but all of them are treated alike • Multilingual systems such as AGRIS pose other problems as well, e.g., the need for language analyzers and language-specific stemming algorithms
Challenges in Designing Search Systems • Users need help at various stages of their interactions with the search system • They need guidance to make effective use of thesauri • The possibility to use web-based ontologies opens new means to improve search effectiveness transparently to the user • The addition of a human dimension would add greatly to the satisfaction of real needs • A search engine, if possible, should go beyond just providing a list of 'hits'. • All intervention by the system should be seen as helping the user
Use environments • Use environment: the nature of end-users • their preferences, • searching behaviours, • purposes sought to be achieved • Understanding these factors could guide the design and implementation of a search system
AGRIS use environments • Very heterogeneous • Many experienced researchers and teachers • Also many inexperienced users, e.g., students • difficulties in articulating searches • searches formulated too generically or too specifically • Multilingual, hence the search system should permit searching and retrieval in more than just the 5 UN languages • Many users work in relatively remote locations • Access to peers is not always available or possible. • The possibility to interact with knowledgeable peers after obtaining the search results is an important element contributing to the satisfaction of an end-user.
AGRIS use environments • Poorly formulated searches result in frustration instead of elucidation because of the excessive 'noise' in the retrieval. Help on how to modify and re-submit the query to obtain better results is obviously desirable. • Search results do not always resolve uncertainties; they may in fact add to them, e.g., through two papers with conflicting findings or through dated information. • Most users prefer to work with the simple search interface. A simple text box into which users enter a word or phrase before submitting it to the search engine has become the default choice of users.
Some ideas for design of a simple search system for AGRIS • Allow the user complete freedom to articulate his needs in his own words. • Any intervention (human or machine) at the first formulation stage is counterproductive. • The user's need must be captured, however inadequately defined it may be.
Some ideas for design of a simple search system for AGRIS • If a single word is entered in the simple search interface: • search the Lucene index and, if the number of hits exceeds a maximum threshold: • show the first results page (sorted by relevance) and suggest to the user: • 1. the use of one or more specific terms (displayed automatically from AGROVOC or the ontology), or 2. the use of other terms in conjunction with the term entered. • If the term entered is not an AGROVOC descriptor and the number of hits exceeds a maximum threshold: • show the results page and suggest: • 1. the use of one or more terms, listed from the Lucene index, that are orthographically close to the term entered and in the same language, or 2. the use of other terms in conjunction with the term entered.
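The single-word rule on this slide can be sketched as a small decision function. This is only an illustration under stated assumptions: the threshold value, the `NARROWER` table (a toy stand-in for AGROVOC narrower-term links) and the function name are hypothetical, and Python's `difflib` is swapped in for whatever orthographic matching the Lucene index would actually provide.

```python
import difflib

MAX_HITS = 200  # hypothetical threshold; the slides do not give a value

# Toy stand-in for AGROVOC narrower-term links (illustrative data only).
NARROWER = {"cereals": ["maize", "rice", "sorghum", "wheat"]}


def suggest_for_single_term(term, hit_count, index_terms,
                            narrower=NARROWER, max_hits=MAX_HITS):
    """Return optional suggestions for a one-word query with too many hits."""
    if hit_count <= max_hits:
        return []  # result set is manageable; no intervention needed
    key = term.lower()
    suggestions = []
    if key in narrower:
        # Treated as a descriptor here: offer its more specific terms.
        suggestions.append(("narrower", narrower[key]))
    else:
        # Not a descriptor: offer orthographically close index terms.
        close = difflib.get_close_matches(key, index_terms, n=5, cutoff=0.8)
        suggestions.append(("close", [t for t in close if t != key]))
    # In both cases, also suggest combining the term with other terms.
    suggestions.append(("combine", []))
    return suggestions
```

The results page is shown in every case; the returned list would only drive optional prompts next to it.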
Some ideas for design of a simple search system for AGRIS • The semantics of an ontology may also be exploited, e.g., if the user enters the string “sorghum diseases” • it should be possible for the search system to infer that “sorghum” belongs to the class “Cereals”, and that “sorghum” and “diseases” belong to mutually exclusive classes, which calls for the search formulation sorghum AND diseases. • If the hits exceed a maximum threshold: • Suggest that the user look at the narrower terms (sub-classes) of Sorghum and add one or more of these to the search. Also present the narrower terms (sub-classes) of Diseases and ask the user to select one or more of these to be added to the search expression.
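The “sorghum diseases” inference can be illustrated with a toy ontology. Everything in the tables below (the class name "Plant disorders", the disjointness set, the narrower terms, the threshold) is assumed for illustration and is not taken from AGROVOC; a real system would read these relations from the ontology itself.

```python
MAX_HITS = 200  # hypothetical threshold

# Toy ontology fragments (illustrative, not real AGROVOC data).
CLASS_OF = {"sorghum": "Cereals", "diseases": "Plant disorders"}
DISJOINT = {frozenset({"Cereals", "Plant disorders"})}
NARROWER = {
    "sorghum": ["sorghum bicolor"],
    "diseases": ["fungal diseases", "viral diseases"],
}


def formulate(query):
    """AND two terms when their classes are mutually exclusive."""
    terms = query.lower().split()
    classes = [CLASS_OF.get(t) for t in terms]
    if (len(terms) == 2 and None not in classes
            and frozenset(classes) in DISJOINT):
        return " AND ".join(terms)
    # Otherwise fall back to searching the input as a phrase.
    return '"' + query.lower() + '"'


def narrowing_suggestions(query, hit_count, max_hits=MAX_HITS):
    """Offer sub-classes of each recognised term when hits are excessive."""
    if hit_count <= max_hits:
        return {}
    return {t: NARROWER[t] for t in query.lower().split() if t in NARROWER}
```

For example, `formulate("sorghum diseases")` yields `sorghum AND diseases`, while an unrecognised pair is left as a phrase search.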
Some ideas for design of a simple search system for AGRIS • If a single word is entered (descriptor or not) and the number of hits is nil, suggest that the user consider: • 1. the use of one or more broader terms from AGROVOC that are shown (if the term is a descriptor), or 2. the use of other terms that are orthographically close to the term entered, taken from AGROVOC and/or the Lucene index. • If a string, sentence or phrase is entered, parse the string to identify potential single or compound AGROVOC terms that might be searched. Hyperlink each of these to the semantic network in AGROVOC. • If the number of hits exceeds a maximum threshold, show the records retrieved via the Lucene index but also suggest that the user add, alongside the terms he has entered, one or more terms that he selects by clicking the hyperlinked terms.
Some ideas for design of a simple search system for AGRIS • All requests/suggestions to the user should be optional; the user may or may not use them • Another option is to ask the user to identify the purpose of his search from a drop-down list. This could determine the balance of recall and precision the user is looking for
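Parsing a phrase for single or compound AGROVOC terms could be done with a greedy longest-match scan. The sketch below assumes a small in-memory descriptor set and a maximum compound length of three words; both are hypothetical, and language-specific analysis (stemming, stop words) is left out.

```python
import re

# Illustrative descriptor set; a real system would query AGROVOC.
DESCRIPTORS = {"sorghum", "plant diseases", "soil fertility", "rice"}
MAX_WORDS = 3  # assumed longest compound term, in words


def find_descriptors(text, descriptors=DESCRIPTORS, max_words=MAX_WORDS):
    """Greedy longest-match scan for single or compound descriptors."""
    words = re.findall(r"\w+", text.lower())
    found, i = [], 0
    while i < len(words):
        # Try the longest window first so compounds win over their parts.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in descriptors:
                found.append(candidate)
                i += n
                break
        else:
            i += 1  # no descriptor starts at this word
    return found
```

Each term returned would then be hyperlinked to its place in the semantic network, as the slide proposes.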
Query modification and reformulation • Search systems implicitly assume that a user's need is fulfilled once the results have been shown. • In reality, the user, after looking through his search results, and even after using the help and suggestions offered to formulate his query, may find that the results are not entirely satisfactory.
Issues in Query modification, reformulation • How can the search system help the user reformulate his query to obtain better results? • Can it simulate an interaction with an information specialist or a more experienced peer who then helps in reformulating the query? • Can the search system use the inherent semantics in a thesaurus or ontology and, if so, how? • Can the system provide a means for the end-user to actually enter into a dialogue with a knowledgeable peer to obtain a more meaningful interpretation of the results? • Can the search system help the user make use of leads that the search results have provided?
Ideas for Query modification, reformulation • Allow selection of one or more result records and ask for 'More like these' • This would need using terms in the selected records and the relationships of these with other terms in the thesaurus. • Inferences using an ontology may point (along with other user input) to other terms (from titles, author, geographic area) that could be used in a reformulated search
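One simple reading of 'More like these' is to collect the descriptors shared by the selected records and expand them through thesaurus relationships. The record layout (a `descriptors` list per record) and the `RELATED` table are assumptions made for this sketch.

```python
from collections import Counter

# Illustrative related-term links; a thesaurus would supply these.
RELATED = {"drought": ["water stress"]}


def more_like_these(selected_records, query_terms, related=RELATED, top_n=3):
    """Propose extra terms drawn from records the user marked as relevant."""
    seen = {t.lower() for t in query_terms}
    counts = Counter()
    for record in selected_records:
        # Count each descriptor once per record, skipping the query terms.
        counts.update({d.lower() for d in record["descriptors"]} - seen)
    # Most frequent first; ties broken alphabetically for stable output.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))[:top_n]
    expanded = []
    for term in ranked:
        expanded.append(term)
        for rel in related.get(term, []):
            if rel not in seen and rel not in expanded:
                expanded.append(rel)  # add thesaurus neighbours
    return expanded
```

The proposed terms would be shown to the user as candidates for a reformulated search rather than applied silently.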
Ideas for Query modification, reformulation • In the reverse case, allow selection of one or more result records and ask for 'Not like these'. • Ontologies make it possible to infer the broad categories to which the entered search terms belong. • In agriculture: thing (e.g., plant, crop, species, soils), action (e.g., breeding, harvesting, measurement), condition/property (e.g., diseases), agent (e.g., bacteria, viruses), space (geographic areas, countries, regions), and time (e.g., seasons). This is a facet-analytical approach to the analysis of a query. • Such an analysis could lead to transparent expansion (and/or restriction) of a reformulated query • The system presents the categorized (expanded) query and asks which concepts must be present, which may be present and which should not be present in the results.
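The facet analysis and the must/may/not dialogue can be sketched as two small steps. The `FACET_OF` table is illustrative only, and the output uses the `+term` / `-term` convention of Lucene's classic query syntax for required and prohibited terms; how AGRIS would actually build its queries is not stated in the slides.

```python
# Illustrative term-to-facet assignments (an ontology would supply these).
FACET_OF = {
    "sorghum": "thing", "breeding": "action", "diseases": "condition",
    "bacteria": "agent", "india": "space", "monsoon": "time",
}


def categorize(terms):
    """Group query terms by facet so they can be shown to the user."""
    facets = {}
    for term in terms:
        key = term.lower()
        facets.setdefault(FACET_OF.get(key, "unclassified"), []).append(key)
    return facets


def build_query(choices):
    """choices maps each term to 'must', 'may' or 'not'."""
    prefix = {"must": "+", "may": "", "not": "-"}
    parts = []
    for term, choice in choices.items():
        token = '"' + term + '"' if " " in term else term  # quote phrases
        parts.append(prefix[choice] + token)
    return " ".join(parts)
```

A user who requires sorghum and diseases, allows India and excludes bacteria would get `+sorghum +diseases india -bacteria`.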
Failed Searches • One or more terms are incorrectly spelled. Terms orthographically close to the entered term, taken from the Lucene index, are shown. • The search was too tightly formulated. • A synonym or near-synonym of an AGROVOC descriptor was used instead of the descriptor itself.
Human dimension in search systems • Information, in general, is entropic. Information alone is not enough for a user to take the action he needs to take (e.g., redo an experiment or revise procedures). • There are situations in which the user does not feel better after searching and getting results. He needs a more experienced peer to guide him.
Human dimension in search systems • Many information use surveys point out that, more than information from a database, it is the one-on-one interaction with a senior or more experienced peer that helps. • If a search system can build a human interface, many users might benefit. They will be using a knowledge network. • The search system would be going beyond 'hits' alone
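The three causes of a failed search suggest three recovery steps, sketched below. The `USE_INSTEAD` mapping from a non-descriptor to its preferred descriptor is illustrative, `difflib` again stands in for index-based orthographic matching, and relaxing the query by dropping one term at a time is only one assumed strategy.

```python
import difflib

# Illustrative non-descriptor -> preferred descriptor mapping.
USE_INSTEAD = {"corn": "maize"}


def recover_failed_search(terms, index_terms, use_instead=USE_INSTEAD):
    """Suggest fixes for a zero-hit query, one cause at a time."""
    advice = []
    for term in terms:
        key = term.lower()
        if key in use_instead:
            # A synonym was used: point to the descriptor.
            advice.append(("use_descriptor", key, [use_instead[key]]))
        elif key not in index_terms:
            # Possibly misspelled: show close terms from the index.
            close = difflib.get_close_matches(key, index_terms, n=3, cutoff=0.8)
            if close:
                advice.append(("check_spelling", key, close))
    if not advice and len(terms) > 1:
        # Every term is known, so the combination is probably too tight:
        # offer the variants obtained by dropping one term.
        relaxed = [[t for t in terms if t != dropped] for dropped in terms]
        advice.append(("relax", None, relaxed))
    return advice
```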
Human dimension in search systems • One way of doing this is to build a volunteer group of information specialists and subject experts in different areas and sub-areas of agriculture. • If a user is not happy with his search results, or his own knowledge is insufficient for him to resolve uncertainties, he could be given the option to seek synchronous or asynchronous interaction with a member of the volunteer group.
Human dimension in search systems • If the user exercises the option, the system searches for the volunteer best placed to guide him and puts the two in touch by making available the volunteer's email, telephone, or online chat contact. • This would open the doors to knowledge exchange and elucidation and not end with the provision of information alone.
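Finding an appropriate volunteer could start from something as simple as matching the query terms against each volunteer's declared subject areas and languages. The `Volunteer` record, its fields and the overlap-based ranking are all assumptions of this sketch; the slides do not say how volunteers would be selected.

```python
from dataclasses import dataclass


@dataclass
class Volunteer:
    name: str
    areas: set       # subject areas the volunteer covers (lowercase)
    languages: set   # language codes the volunteer can work in
    contact: str     # email, telephone or chat handle


def find_volunteer(volunteers, query_terms, language):
    """Pick the volunteer whose areas overlap most with the query."""
    terms = {t.lower() for t in query_terms}
    candidates = [v for v in volunteers
                  if language in v.languages and v.areas & terms]
    if not candidates:
        return None  # nobody suitable; the user keeps the search results
    return max(candidates, key=lambda v: len(v.areas & terms))
```

The contact details of the selected volunteer would then be offered to the user for synchronous or asynchronous follow-up.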
Human dimension in search systems • Feedback analysis • The collection and analysis of feedback (collected automatically as well as via user input) on user experiences with the search system. • The search system can automatically create a log of: the search terms (single terms, phrases, strings, etc.) entered by users; the language of the search term used; the instances where users actually used the help/prompts/suggestions that the search system offered, and the nature of the help/prompts/suggestions used; the number of hits that a search resulted in; the country or region of the searcher • Feedback provided directly by the user, aggregated and analyzed over time together with automatically collected feedback, provides a knowledge base that is valuable in fine-tuning the search system.
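The automatic log described on this slide maps naturally onto one record per search plus a periodic summary. The field names and the particular statistics below are hypothetical choices for illustration; they cover only the items the slide lists.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchLogEntry:
    query: str                      # term, phrase or string as entered
    language: str                   # language of the search term
    hits: int                       # number of hits returned
    suggestion_used: Optional[str]  # kind of help taken up, if any
    region: str                     # country or region of the searcher


def summarize(entries):
    """Aggregate log entries into figures useful for fine-tuning."""
    total = len(entries)
    zero_hits = sum(1 for e in entries if e.hits == 0)
    return {
        "searches": total,
        "zero_hit_rate": zero_hits / total if total else 0.0,
        "suggestions_used": dict(Counter(
            e.suggestion_used for e in entries if e.suggestion_used)),
        "by_language": dict(Counter(e.language for e in entries)),
    }
```

Direct user feedback could be stored alongside these entries and analyzed together with them over time.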