
Internet and the WWW for finding information

Internet and the WWW for finding information. Paul.Nieuwenhuysen@vub.ac.be Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel. February 2004 VUB-IDLO. The slides are available from http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/ (note: BIBLIO and not biblio).


Presentation Transcript


  1. Internet and the WWW for finding information Paul.Nieuwenhuysen@vub.ac.be Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel. February 2004 VUB-IDLO

  2. The slides are available from http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/ (note: BIBLIO and not biblio)

  3. Schedule for the day: morning • About “information” • Information market • Information retrieval • Thesauri (+ practice in query formulation) • Networks and the Internet in particular • World-Wide Web (+ practice in “browsing” + “saving”) • LUNCH

  4. Schedule for the day: afternoon (part 1) • Online accessible information sources! • Global Internet directories (+ practice) • Internet indexes (+ practice) • Book databases (+ practice) • Fee-based databases • Databases with titles of journal articles • Finding illustrations/images/photos (+ practice)

  5. Schedule for the day: afternoon (part 2) • Evaluation of information sources • Free searching according to your own interests, with assistance

  6. -Interruptions -Questions -Remarks -Discussions are welcome

  7. About “information” Information concepts

  8. The flow of documentary information with primary and secondary sources Author / Creator / Sender Primary sources / systems: mainly Journal articles / Books / Electronic mail / Online sources /... Reader / User / Receiver Secondary sources / systems: mainly Reference works (printed, CD-ROM, online) Library catalogues, including OPACs...

  9. The role of secondary information sources • The secondary information flow is generated on the basis of the primary flow, mainly because the great amounts of primary information lower the chance to retrieve and use the appropriate information item. • Secondary information tries to bring some order in the great chaos.

  10. Various categorisations of documentary information sources Information sources can be categorised in various ways. For instance: • Books • Serials • Primary • Secondary • Text • Image • Sound • Animation/video • Software • Data • Interactive • Hard copy /not digital • Digital • Offline • Online

  11. Retrospective searching versus current awareness: scheme Past Now Future Retrospective searching Current awareness

  12. Information retrieval: evolution of storage and distribution media • 1450 printing with reusable characters/fonts • 1975 + online access databases; from the 1970s, growing Internet • 1985 + CD-ROM • 1990 + World-Wide Web (based on the Internet)

  13. Information retrieval: end user or information intermediaries End-user / Information intermediary (Broker or library or ...) / Information

  14. End user versus information intermediary • People can retrieve information themselves, directly as so-called “end-users”. • However, • the information landscape is complex, • it may cost a lot of time to find the right information, • it may be costly to search for information. • Therefore it may be wise to obtain the assistance of an expert information intermediary, such as a reference librarian or an information broker.

  15. About “information” Computer- and network-based information

  16. Information: from bits to meaningful information Digital computer data = bits Information = “documents”, meaningful for and to be interpreted by human beings 01 or Program code, meaningful for and to be interpreted / executed by a suitable / compatible computer

  17. Information: digitally stored and managed information Categories of digital, computer readable information / data, forming electronic “documents”, understandable by human beings. 01 text numbers images video sounds multimedia +

  18. Information: types of digital information • Digital information • Multimedia / Hypermedia Sound Linear text Hypertext Static images Video Programs for computers 01

  19. Some publication media compared Online / Networked Update speed CD-ROM Printed Volume

  20. Scientific publishing in Utopia: an ideal scheme Many authors author = reader in science Many editors / publishers Online remote access multimedia database server Many database search clients and user interfaces one global, international computer data communication network Many readers / users

  21. ?? Question ?? Indicate the differences between reality and that simplified, ideal scheme of the information flow.

  22. ?? Question ?? Which basic problems/difficulties hinder people from finding / accessing / using information?

  23. Information retrieval: basic difficulties (Part 1) • In many cases it is not completely clear to the user of an information retrieval system which information is in fact needed, required. • In many cases the need for information cannot be expressed completely in the form of a query. One of the reasons is that the complete context of the information need should ideally be expressed, including the knowledge and background of the searcher.

  24. Information retrieval: basic difficulties (Part 2) • Computer systems are artificial, but nevertheless most use human language in their interface with the human users, for instance in database search systems. This may cause difficulties related to language and vocabulary in particular. Some examples: • People use different languages and different terms (vocabularies) to describe a similar concept. • Concepts, vocabularies and meanings of words and terms may change over time. • Meanings of words / terms may depend on their context.

  25. Information retrieval: basic difficulties (Part 3) • Many different and imperfect retrieval systems should or must be used. • To retrieve and access the information that is in principle available, many different retrieval systems must be available and be mastered. • Furthermore, perfect information retrieval software does not (yet) exist; scientific and technological evolution in the domain of information retrieval software has been fast since about 1970.

  26. Information retrieval: basic difficulties (Part 4) • Information overload Users are often overwhelmed by the amount of available information and by the large influx of new information.

  27. Information retrieval: basic difficulties (Part 5) • The price (or inaccessibility) of particular information A lot of information cannot be obtained or at least not free of charge.

  28. The information industry and the information market The components of the information industry

  29. The components of the information industry • Authors • Publishers • Distributors • Users • Related organizations

  30. The information industry and the information market Overview and evolution

  31. Increase in the number of scientific and technical serial publications

  32. The information market: growth in the database industry Source: Williams, in: Gale Directory of Databases, 1998.

  33. The information industry / market: future trends (Part 1) • Growth in the production of databases. • Less analogue / hard-copy production = more digital production, storage, and distribution of information. • More integration of information types into multimedia and hypermedia.

  34. The information industry / market: future trends (Part 2) • Growth in the number of • producers and distributors, • end-users searching databases due to easier use and lower costs of information technology

  35. Databases and computerized information retrieval Introduction

  36. What is a database? A database is a collection of similar data records stored in a common file (or collection of files).
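As a minimal sketch of that definition, a database can be pictured as a list of records that all share the same fields. The catalogue below, its field names, and the `find` helper are hypothetical and invented purely for illustration; they are not taken from the slides.

```python
# Hypothetical catalogue: every record has the same fields (title, author, year).
records = [
    {"title": "Title A", "author": "Author X", "year": 1998},
    {"title": "Title B", "author": "Author Y", "year": 2001},
    {"title": "Title C", "author": "Author X", "year": 2003},
]


def find(records, field, value):
    """Return all records whose given field equals the given value."""
    return [r for r in records if r.get(field) == value]


by_author_x = find(records, "author", "Author X")
print([r["title"] for r in by_author_x])
```

Because every record has the same structure, one generic lookup function can serve any field, which is the practical benefit of storing "similar" records together.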

  37. Types of databases: examples Examples: The databases that form the basis for • catalogues of books or other types of documents • computerized bibliographies • address directories • a full text newspaper, newsletter, magazine, journal + collections of these • WWW and Internet search engines • intranet search engines • ...

  38. Information retrieval: the basic processes in search systems Information problem Text documents Representation Representation Query Indexed documents Evaluation and feedback Comparison Retrieved, sorted documents
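The processes named in that scheme (representation, indexing, comparison, sorting of retrieved documents) can be sketched roughly as follows. This is only an illustration under simple assumptions: the three documents are invented, the "representation" is plain lower-case word splitting, and the ranking simply counts matching query words; real search systems use far more refined methods.

```python
import re
from collections import defaultdict

# Hypothetical mini-collection, invented for illustration.
documents = {
    1: "Library catalogues describe books",
    2: "Search engines index web pages",
    3: "Catalogues and search engines help users find books",
}


def tokenize(text):
    """Representation: turn a text into a list of lower-case words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Indexing: map each word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Comparison: count matching query words per document,
    then return document ids sorted by descending count."""
    scores = defaultdict(int)
    for word in set(tokenize(query)):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


index = build_index(documents)
results = search(index, "search books")
print(results)
```

The user would then evaluate the sorted list and, if needed, reformulate the query, which corresponds to the "evaluation and feedback" step in the scheme.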

  39. Databases and computerized information retrieval Text retrieval and language

  40. Text retrieval and language: a word is not a concept (a) Problem: A word or phrase or term is not the same as a concept or subject or topic. Word Word Concept

  41. Text retrieval and language: a word is not a concept (a’) So, to ‘cover’ a concept in a search, to increase the recall of a search, the user of a retrieval system should consider an expansion of the query; that is: the user should also include other words in the query to ‘cover’ the concept.

  42. Text retrieval and language: a word is not a concept (a’’) • synonyms! (such as: Latin names of species in biology besides the common names, scientific names besides common names of substances in chemistry…)

  43. Text retrieval and language: a word is not a concept (a’’’) • narrower terms, more specific terms (such as particular brand names); including terms with prefixes (for instance: viruses, retroviruses, rotaviruses,...) • spelling variations (such as UK English versus US English); possible variations after transliteration

  44. Text retrieval and language: a word is not a concept (a’’’’) • singular or plural forms of a noun (when this is used as a search term) • (relevant) related terms • various forms of a verb (when this is used in the query) • broader terms (perhaps)
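The query expansion described in the slides above can be sketched as follows. The mini-thesaurus, its category names, and the `OR` query syntax are assumptions made for illustration only; the example terms echo the slides (viruses, UK versus US spelling), but a real thesaurus and a real search system will differ.

```python
# Hypothetical mini-thesaurus, invented for illustration.
THESAURUS = {
    "virus": {
        "narrower": ["retrovirus", "rotavirus"],
        "plural": ["viruses", "retroviruses", "rotaviruses"],
    },
    "colour": {
        "spelling": ["color"],
        "plural": ["colours", "colors"],
    },
}


def expand(term, thesaurus=THESAURUS):
    """Return the term followed by all variants listed for it, without duplicates."""
    variants = [term]
    for group in thesaurus.get(term, {}).values():
        for variant in group:
            if variant not in variants:
                variants.append(variant)
    return variants


def to_boolean_query(term):
    """Join the expanded terms with OR (assumed syntax; systems differ)."""
    return " OR ".join(expand(term))


print(to_boolean_query("virus"))
```

Expanding a query this way tends to raise recall, since more documents match, at the possible cost of lower precision.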

  45. ?? Question ?? Which problems in text retrieval are illustrated by the following sentences?

  46. Examples Time flies like an arrow. Fruit flies like a banana. ?

  47. Examples Timeflies like an arrow. Fruitflies like a banana.

  48. Examples Timeflies like an arrow. Fruitflieslike a banana. OK!

  49. Text retrieval and language: ambiguity of meaning (a) • Problem: A word or phrase can have more than one meaning. Ambiguity of the meaning of a word is a problem for retrieval. This decreases the precision of many searches. The meaning can depend on the context. The meaning may depend on the region where the term is used.

  50. Example Text retrieval and language: ambiguity of meaning (a’) • Example of a word: • Pascal the philosopher • Pascal the computer language
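To see how such ambiguity lowers precision, consider a small worked example. All document ids and labels below are hypothetical and invented for illustration: a search for "pascal" retrieves four documents, but the searcher only wants those about the computer language.

```python
# Hypothetical result set for the query "pascal": document id -> sense of the word.
retrieved = {
    "doc1": "philosopher",
    "doc2": "computer language",
    "doc3": "computer language",
    "doc4": "philosopher",
}

# Assumed ground truth: all documents in the collection about the computer language.
# "doc5" is relevant but was not retrieved (for example, it uses another term).
relevant_in_collection = {"doc2", "doc3", "doc5"}

retrieved_ids = set(retrieved)
hits = retrieved_ids & relevant_in_collection

# Precision: share of retrieved documents that are relevant.
precision = len(hits) / len(retrieved_ids)
# Recall: share of relevant documents that were retrieved.
recall = len(hits) / len(relevant_in_collection)

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this invented case half of the retrieved documents concern the philosopher, so precision drops to 0.50 even though the query word matched every retrieved document.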
