“Finding information”: Conclusion. Paul.Nieuwenhuysen@vub.ac.be, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium. Prepared for a presentation at the one-day conference organised by NVB-WB in KB, The Hague, the Netherlands, 27 April 2006.
Basic difficulties in information retrieval • Difficulty: A word or phrase is not the same as a concept. This may cause low recall. • (Diagram labels: Word, Word, Concept.)
Basic difficulties in information retrieval (continued) • When the user needs information related to a particular concept or a combination of more elementary concepts, then the user should formulate a query that covers these concepts well, by using not just a single word or term to cover each concept, but by using several words and/or terms, including synonyms, spelling variations, narrower terms, related terms, translations, and so on. • The aim is mainly to increase the recall of the search action, by covering the concept better, but also to increase the precision by including the most appropriate words and/or terms in the query.
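The query formulation described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_query`, the example terms, and the `AND`/`OR` syntax are hypothetical, and a real retrieval system may use different operators.

```python
def build_query(concepts):
    """Combine term variants with OR inside a concept and AND across concepts."""
    groups = []
    for variants in concepts:
        # Quote multi-word terms so that they are searched as phrases.
        quoted = ['"%s"' % term if " " in term else term for term in variants]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical example: two concepts, each covered by several words/terms
# (spelling variations, a translation, a related term).
concepts = [
    ["library", "libraries", "bibliotheek"],
    ["thesaurus", "thesauri", "subject headings"],
]
query = build_query(concepts)
print(query)
# (library OR libraries OR bibliotheek) AND (thesaurus OR thesauri OR "subject headings")
```

Each extra variant inside a group can only add results for that concept, which is why this style of query mainly raises recall.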
Basic difficulties in information retrieval (continued) • Difficulty: Many words suffer from ambiguity of meaning. This may cause low precision. • (Diagram labels: Word; Relevant concept; Irrelevant concept, NOT wanted.)
Basic difficulties in information retrieval (continued) • Many words and/or terms from a natural language are ambiguous, because natural languages have evolved spontaneously, without strict control. • An example is the word “pascal”, which can have several meanings: • the philosopher Blaise Pascal, • the programming language Pascal, • the physical unit of pressure, and • the name of many people…
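The two difficulties can be made concrete with a small calculation. The sketch below assumes a tiny, invented collection of five documents and a user who wants items about the programming language Pascal; the document texts, the relevance judgements and the helper names `search` and `precision_recall` are all hypothetical.

```python
def search(word, docs):
    """Return the ids of documents that contain the exact word."""
    return {doc_id for doc_id, text in docs.items() if word in text.lower().split()}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical mini-collection.
docs = {
    "d1": "pascal programming language compiler",
    "d2": "blaise pascal philosopher",
    "d3": "pressure measured in pascal",
    "d4": "turbo pascal source code",
    "d5": "delphi object programming language",  # relevant, but lacks the word
}
# Assumed relevance judgement for the concept "programming language Pascal".
relevant = {"d1", "d4", "d5"}

retrieved = search("pascal", docs)
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
# precision=0.50 recall=0.67
```

The ambiguous word pulls in d2 and d3 (lower precision), while d5 is missed because the concept is expressed with another word (lower recall).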
“Finding information” Conclusion: Proposition • Subject descriptions should be adapted to the library = context = user community! • In other words: A “general, typical user of all libraries” does not exist.
“Finding information” Conclusion: Proposition • The subject description system (classification and/or thesaurus and/or…) should be • clearly visible and usable • well integrated with the formal descriptions of documents • well explained to the user! • In most systems this is NOT well implemented.
“Finding information” Conclusion: Proposition • Even better: The system is invisible, but works well in the background (for example: automatic expansion of queries). • This is NOT done in most systems.
“Finding information” Conclusion: Proposition • The level of subject description must depend on the resources (budget and personnel) that are available to the library! • For example: no money, no subject descriptions.
“Finding information” Conclusion: Proposition • Folksonomy will be accepted and implemented.
“Finding information” Conclusion: Syllogism • Merging of different subject description systems is impossible or very expensive. • Collections with different subject description systems will be merged one day. • Therefore: Forget about subject descriptions.
“Finding information” Conclusion: Syllogism • Federated searching = meta-searching = one-stop searching is coming up. • This federated searching hinders exploitation of subject descriptions. • Therefore: Forget about subject descriptions!
“Finding information” Conclusion: Trick • Offer independent, external, horizontal, general thesaurus systems to users, so that they can find relevant terms. • It is then desirable to link the thesaurus terms into the local catalogue for searching.
Horizontal thesaurus systems for natural human language • Furthermore, Google Web Search also offers a more direct, automatic expansion of query words, at least for English. • The user must request this explicitly through the Google command language, by preceding a query word with a tilde, as in “~queryword”. • However, most users probably do not know this. A more user-friendly implementation would be welcome.
“Finding information” Conclusion: Trick • Offer users a view of words that are related to a word used in a first query, so that they can find additional or more relevant words to search with.
System based on words present in the context of the first query • For instance, AquaBrowser Library software shows the query words of a user in the context of a selection of other words that occur in the document collection. • More information is available from their WWW site http://www.medialab.nl/ • We can read there: “When you type in a word, you get a 'word cloud' that contains different associations and shades of meaning of that word. You click on the ones that most closely match your interest, and it will help you find the library resources you need. It’s a lot of fun to use, too.”
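One simple way to find words “in the context of” a query word is to count which words co-occur with it in the same documents. The sketch below illustrates that general idea only; it is not the method used by AquaBrowser, and the function name `related_words` and the sample titles are hypothetical.

```python
from collections import Counter


def related_words(query_word, documents, stopwords=frozenset(), top_n=5):
    """Rank words by the number of documents they share with the query word."""
    query_word = query_word.lower()
    counts = Counter()
    for text in documents:
        words = set(text.lower().split())
        if query_word in words:
            # Count every other word once per document.
            counts.update(words - {query_word} - stopwords)
    return counts.most_common(top_n)


# Hypothetical titles from a document collection.
titles = [
    "Pascal programming language",
    "Pascal programming compiler",
    "Pascal pressure unit",
    "Library thesaurus design",
]
print(related_words("pascal", titles, top_n=1))
# [('programming', 2)]
```

A user who sees “programming” and “pressure” next to “pascal” can pick the intended sense and add that word to the query.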
“Finding information” Conclusion: Trick • Advanced tool for retrieval of items about a subject: Clustering
Automatic topical clustering today • The ambiguity of words and terms from natural languages lowers the precision of searches executed with relatively classical, simple retrieval software, as mentioned above. • This problem can be tackled by topical clustering of search results on the basis of the words included in those results, hoping that this will result in clusters of documents about similar, semantically related concepts/topics/subjects.
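Topical clustering on the basis of shared words can be sketched as follows. This is a deliberately naive, greedy approach using Jaccard similarity between word sets, chosen only for illustration; it is not the algorithm of any engine named in this presentation, and the snippets, the stopword list and the threshold value are assumptions.

```python
STOPWORDS = frozenset({"the", "of", "for", "and", "by"})


def tokens(text, ignore):
    """Lower-cased set of words, minus stopwords and the query word itself."""
    return {word for word in text.lower().split() if word not in ignore}


def jaccard(a, b):
    """Share of words that two sets have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_results(snippets, ignore=STOPWORDS, threshold=0.2):
    """Assign each snippet to the most similar existing cluster, or start a new one."""
    clusters = []  # each cluster: {"words": set of words, "members": list of indices}
    for index, text in enumerate(snippets):
        words = tokens(text, ignore)
        best, best_score = None, threshold
        for cluster in clusters:
            score = jaccard(words, cluster["words"])
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append({"words": set(words), "members": [index]})
        else:
            best["members"].append(index)
            best["words"] |= words
    return [cluster["members"] for cluster in clusters]


# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "pascal programming language compiler",
    "blaise pascal philosopher and mathematician",
    "turbo pascal compiler for the programming language",
    "pascal unit of pressure",
    "pensees by the philosopher blaise pascal",
]
print(cluster_results(snippets, ignore=STOPWORDS | {"pascal"}))
# [[0, 2], [1, 4], [3]]
```

The three clusters correspond to the programming language, the philosopher and the unit of pressure, so the user can discard the unwanted senses at a glance.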
Clusty • http://clusty.com/ • This is an internet meta-search engine that offers not only a conventional ranked list of search results but also search results clustered by topics or sources or URLs. • The system is produced by the same company that produces the Vivisimo WWW meta-search system that is also mentioned further below. Both use the ‘Vivisimo Clustering Engine’.
Grokker • http://www.grokker.com/ • A public access implementation of Grokker software offers federated searching free of charge through • the Yahoo! WWW search engine database, • the Amazon Book database, and • the ACM Digital Library • The results are offered in an outline, a list of categories (and, if wanted, also in the more graphical form of an interactive map).
Vivisimo • http://vivisimo.com/ • A public access implementation of Vivisimo software offers federated searching free of charge through many WWW search engine databases. It then clusters the results in an outline, a list of categories. (Clusty, mentioned above, uses the same ‘Vivisimo Clustering Engine’.)
Wisenut • http://www.wisenut.com/ • Wisenut offers searching free of charge through WWW pages and clusters the results in an outline, a list of categories.
“Finding information” Conclusion: Trick • Advanced tool for retrieval of items about a subject: Visualization
Information visualization: introduction • Visualization can help users to interpret complex data sets so that better decisions can be made faster. • On the one hand, the maps created by the system should help users to interpret and analyse a data set, but on the other hand they bring their own cognitive load. In other words, before the user can interpret the data set, the type of visualization must first be understood. • Some mapping technique will probably prove to be useful and widely accepted in the near future.
Visualization of the information source available • It may be useful to visualize some aspects of information sources to a user, to give the user a better idea of what is available. • Visualization of what is available can already be applied in the case of the hard disk on personal computers. Obviously it is interesting to get a clear view on the contents of a hard disk. Some utility programs are available that can be installed and applied for this purpose.
Visualization in a system that helps the user to formulate a query • For instance the Thinkmap Visual Thesaurus can show relations among words in English in a graphical map on the computer display that is obviously 2-dimensional. • Furthermore the map is dynamic: it moves to reveal and show the underlying 3-dimensional, spatial map of the related words and phrases. • The software exploits the open access WordNet thesaurus (which is mentioned also above). • http://www.visualthesaurus.com/
Visualization in a system that helps the user to formulate a query • Another example: As mentioned and illustrated above, the AquaBrowser Library software visualizes relations between a user’s query and other words that are present in the information items that a library makes available and that may be relevant in the context of the query.
Visualization of the characteristics of query result sets • In a next phase, when a user has formulated a query and has executed the search, the set of search results is in most cases presented as a simple list of references, ordered or ranked in some way. • Some systems go further and offer results in clusters (as outlined above). • Moreover, some programs do not present the results as text only, but can visualize them in the form of a map.
Visualization of the characteristics of query result sets • For instance: Kartoo software can be applied • to search, • to cluster/categorize the search results, • and furthermore, to visualize these clusters in a map. • A public access site offers meta-searching in several WWW search engines, free of charge, through http://www.kartoo.com/
Visualization of the characteristics of query result sets • Another example: Grokker software • can execute federated searches through several databases in one action, • can cluster/categorize results from search actions, and • can then visualize these in a map. • A public access implementation allows anyone to perform a WWW search based on the Yahoo! database of WWW pages: http://www.grokker.com/ • Grokker has already been implemented in a university library.
“Finding information” Conclusion: Trick • First use an external, more advanced or more specialised database; then check local availability using a catalogue database. Or: use a catalogue database, find a document and then find more information in another external, more advanced or more specialised database. • In this procedure, it is desirable to link/integrate both databases. • This is feasible. • For example: OpenURL linking from a local library catalogue deep into the Amazon book database.
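As a rough illustration of such linking, the sketch below builds an OpenURL in the key/encoded-value style of the Z39.88-2004 standard from a catalogue record's ISBN and title. The resolver address, the ISBN and the title are placeholders, and the function name `build_openurl` is hypothetical; how a given resolver forwards the request to a target such as a bookseller's database depends on local configuration, which is not shown here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_openurl(resolver_base, isbn, title=None):
    """Build a Z39.88-2004 key/encoded-value OpenURL that describes a book."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
        ("rft.genre", "book"),
        ("rft.isbn", isbn),
    ]
    if title:
        params.append(("rft.btitle", title))
    # urlencode percent-encodes reserved characters in the values.
    return resolver_base + "?" + urlencode(params)


# Placeholder resolver address and placeholder record (assumptions, not real data).
url = build_openurl(
    "https://resolver.example.org/openurl",
    isbn="9780000000002",
    title="An example title",
)
print(url)

# Round trip: the receiving side can recover the metadata from the link.
fields = parse_qs(urlsplit(url).query)
print(fields["rft.isbn"], fields["rft.btitle"])
```

Because the link carries metadata rather than a fixed address, the same catalogue record can be sent to different targets by changing only the resolver configuration.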
Conclusion of this conclusion: “The last word has not yet been said.” “We will be stuck with this problem for a while yet.”
You are free to copy, distribute, display this work under the following conditions: • Attribution: You must mention the author. • Noncommercial: You may not use this work for commercial purposes. • No Derivative Works: You may not change, modify, alter, transform, or build upon this work. • For any reuse or distribution, you must make clear to others the license terms of this work.