
Web sources and library & information services

Explore the similarities and differences between web searching, information retrieval, and reference services. Learn about search strategies, the organization of sources, search engine coverage, and more.



Presentation Transcript


  1. Web sources and library & information services
     Finding, evaluating and using a variety of Web sources for searching and reference
     © Tefko Saracevic, Rutgers University

  2. Similarities between Web searching & IR & reference
     • Basic principles of approach are the same
        • human-human interaction (interview)
        • social, organizational, cognitive, affective aspects to explore, including task, need …
        • preparation of search concepts, terms, logic
        • determination of range, restrictions
        • estimation of relevance

  3. Differences
     • Vastly different sources
        • as to contents, authority, reliability, persistence
        • variation in amounts, depth, breadth
     • Very different organization
        • little standardization, few if any fields
     • Quite different search engines & capabilities (basic & advanced)
        • also different from engine to engine
     • Differing search strategies needed

  4. Also: invisible Web
     • Materials that general search engines cannot or WILL not include in their collections of Web pages (indexes)
     • You cannot find them through general search engines
     • Contains a vast amount of information
        • much of it authoritative and of high quality

  5. Why do search engines miss pages?
     • Size: the Web is huge, engines cannot cover all of it
     • Economics: associated costs are high
        • also pay per crawl & rank
     • Technical: still limited capabilities
     • Spam: eliminating the bad also loses the good
     • Restrictions: some sites do not let crawlers in
     • Deep structure: some sites are complex
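
The "Restrictions" point can be illustrated with the robots exclusion convention, one common way a site keeps crawlers out. The sketch below is only an illustration: the robots.txt content, the crawler name `ExampleBot`, and the example.org URLs are all hypothetical, and real engines apply many further rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of an illustrative site (not taken from the slides).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""

def allowed_urls(robots_txt, agent, urls):
    """Return the URLs that the given crawler is permitted to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(agent, u)]

urls = [
    "http://example.org/index.html",
    "http://example.org/private/report.html",
    "http://example.org/search?q=ir",
]
# Pages under /private/ and the site's own search results stay out of the index.
crawlable = allowed_urls(ROBOTS_TXT, "ExampleBot", urls)
```

Everything the crawler is told to skip becomes part of the invisible Web from that engine's point of view.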

  6. Needed for Web searching
     • Knowledge & competencies
        • variety of Web sources
        • their organization
        • search engines
        • Web search strategies
        • search dynamics, feedback
     • Keeping up & up & up
        • constant updates, changes, innovations
        • many domain/subject specific

  7. Web size - who knows?
     • Estimated over 16 million web servers (Lawrence & Giles, 1999)
     • But only a fraction of direct search relevance
     • Domains of sites
        • 83% commercial; 6% scientific or educational; 3% health
        • 2.5% personal; 2% societies; 1.5% government
        • about 1% each community, religion
        • 1.5% pornographic
     • Web Characterization Project - OCLC
        • statistics, trends, reports, links …; reports 8.5 million Web sites for 2001
        • http://wcp.oclc.org/

  8. Organization of sources
     • No standardization across sources
     • Major approaches in search engines
        • classification: many directory types used
        • statistical analyses of terms, links
     • Metatags in sources
        • to enable retrieval by fields
        • HTML “keywords”, “description”: 34% of sites use them
        • Dublin Core: 0.3% of sites use it
     • Organization: a hindrance to retrieval
        • also faked contents to force retrieval
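
To make the metatag point concrete, here is a minimal sketch of how an indexer might pull the "keywords", "description", and Dublin Core fields out of a page header. The page content is invented for illustration, and the class name `MetaTagParser` is hypothetical; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

class MetaTagParser(HTMLParser):
    """Collects <meta name=... content=...> fields usable for field retrieval."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        # Keep the two common HTML fields plus any Dublin Core ("DC.") element.
        if name in ("keywords", "description") or name.startswith("dc."):
            self.fields[name] = attr.get("content") or ""

# Hypothetical page, written only for this example.
PAGE = """<html><head>
<title>Sample</title>
<meta name="keywords" content="information retrieval, reference services">
<meta name="description" content="A hypothetical page about Web searching.">
<meta name="DC.creator" content="Example Author">
</head><body>...</body></html>"""

parser = MetaTagParser()
parser.feed(PAGE)
fields = parser.fields
```

Because the page author supplies these fields, they are also where the "faked contents" mentioned above tend to appear, which is one reason engines may discount them.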

  9. Sources & search engines
     • Indexed by search engines (publicly indexed)
        • by terms, selection, links, registration
     • Not publicly indexed
        • many domain sources will not be found, e.g. digital libraries, online journals, reference
        • many commercial sites will hardly be found
     • Differing approaches to inclusion/selection
        • mostly automatic; also generic source providers
        • increasingly added human evaluation & selection

  10. Search engine coverage
     • No engine covers more than 16% of the WWW
     • Share of the combined coverage of the top 11 engines:
        • Northern Light 38.3%; Snap 37.1%; AltaVista 37.1%; HotBot 27.1%; MS 20.3%; Infoseek 19.2%; Google 18.6%; Yahoo 17.6%; Excite 13.5%; Lycos 5.9%; EuroSeek 5.2%
     • HotBot, MS, Snap & Yahoo use Inktomi as search provider, but have different filtering & Inktomi databases
     • Northern Light has a ‘special collection’: documents not part of the publicly indexable Web
     • Hard to discern & compare coverage
     • Many national search engines with their own coverage
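
The two kinds of figures on this slide (a 16% ceiling on absolute coverage, and shares of the combined coverage) can be related with a little arithmetic. The sketch below assumes that the 16% ceiling belongs to the engine with the largest share, Northern Light; the slide does not say so explicitly, so the derived numbers are rough estimates only.

```python
# Shares of the combined coverage of the 11 engines, in percent (from the slide).
RELATIVE = {
    "Northern Light": 38.3, "Snap": 37.1, "AltaVista": 37.1, "HotBot": 27.1,
    "MS": 20.3, "Infoseek": 19.2, "Google": 18.6, "Yahoo": 17.6,
    "Excite": 13.5, "Lycos": 5.9, "EuroSeek": 5.2,
}
MAX_ABSOLUTE = 16.0  # "No engine covers more than 16%" (percent of the Web)

# Assumption: the top-share engine is the one sitting at the 16% ceiling.
top_share = max(RELATIVE.values())
combined = MAX_ABSOLUTE / top_share * 100  # implied combined coverage, percent

# Implied absolute coverage of each engine, as a percent of the Web.
absolute = {name: round(share / 100 * combined, 1)
            for name, share in RELATIVE.items()}

for name, pct in absolute.items():
    print(f"{name:15s} {pct:5.1f}% of the Web")
```

Under that assumption the 11 engines together reach well under half of the Web, which is the practical argument for searching more than one engine.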

  11. Search features among engines
     • Some search features are the same across all, but details differ, particularly in advanced search
     • Boolean available
        • but sometimes AND, sometimes OR is the default
     • Differences may be found in:
        • phrases, proximity, truncation, case sensitivity, relevance feedback, field searching, special features
        • term expansion to concepts (latent semantic indexing)
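
Why the default operator matters can be seen on a toy collection. The following is a minimal sketch with three invented documents and a hypothetical `search` function; it is not how any particular engine works, only a demonstration that the same two-term query returns different sets under AND and OR.

```python
# Three invented documents standing in for Web pages.
DOCS = {
    1: "web searching strategies",
    2: "information retrieval systems",
    3: "web information retrieval",
}

def build_index(docs):
    """Build an inverted index: term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query, default_op="AND"):
    """Combine the postings of all query terms with the default operator."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    if default_op == "AND":
        return set.intersection(*postings)
    return set.union(*postings)

index = build_index(DOCS)
and_hits = search(index, "web retrieval", default_op="AND")  # narrow
or_hits = search(index, "web retrieval", default_op="OR")    # broad
```

With AND as the default, only the document containing both terms is returned; with OR, all three are, so the same query needs a different formulation on different engines.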

  12. Search strategies & outputs
     • Geared toward very short searches
        • big majority of searches 2-3 terms (av. 2.5)
        • in IR av. 7-14, making a big difference
     • Directory browsing a big component, unlike in IR
     • Geared toward limited top outputs
     • Ranking output by relevance predominates
        • relevance calculations differ & are proprietary (secret)
        • except Google: they published their method
        • affects search strategy: you have to guess how it is done
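
The method Google published is link-based ranking (PageRank). The sketch below is a simplified power-iteration version on an invented four-page link graph; the damping value 0.85, the iteration count, and the function name are illustrative assumptions, and a production ranking combines this with many other signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank; `links` maps each page to the pages it links to.

    Assumes every link target also appears as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Invented link graph: three pages link to C, nobody links to D.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(LINKS)
best = max(rank, key=rank.get)
```

In this graph the page with the most incoming links ranks first, while the page nobody links to keeps only the baseline share, which hints at why little-linked sources rarely surface among the top outputs.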

  13. Meta search engines
     • Search engines that cover search engines; many around, e.g.
     • All4one http://all4one.com/
        • four windows - good for comparison
     • CNET Search.com http://www.search.com/
        • meta engine of meta engines - customization
     • Search Engines Worldwide http://www.twics.com/~takakuwa/search/search.html
        • 174 countries, over 1300 engines
     • More on the horizon & differing
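
One simple way a meta search engine could combine the outputs of several engines is to interleave their ranked lists and drop duplicates. This is only a sketch under that assumption: the engine names and URLs are invented, and actual meta engines use their own (often undisclosed) merging rules.

```python
from itertools import zip_longest

def merge_results(result_lists):
    """Interleave ranked result lists round-robin, keeping each URL once."""
    merged, seen = [], set()
    # Take the first hit of every engine, then the second of every engine, etc.
    for tier in zip_longest(*result_lists.values()):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical ranked outputs of two engines for the same query.
RESULTS = {
    "engine_a": ["http://example.org/a", "http://example.org/b"],
    "engine_b": ["http://example.org/b", "http://example.org/c",
                 "http://example.org/d"],
}
merged = merge_results(RESULTS)
```

The merged list is longer than either engine's own output, which reflects the gain in coverage, though the combined ranking no longer corresponds to any single engine's relevance calculation.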

  14. Specialized meta engines
     • Selective, with directories & a large number of databases & search engines
     • Complete Planet http://completeplanet.com
     • Invisible Web http://invisibleweb.com
     • U.S. federal information via Government Printing Office Access http://www.gpo.gov/gpoaccess
     • Federal Bulletin Board (file libraries for download from many agencies) http://fedbbs.access.gpo.gov

  15. Reference (expert) services
     • Reference services - several models
        • Q&A, directories, email answers etc., e.g.
     • Martindale’s Reference Desk - comprehensive http://www-sci.lib.uci.edu/~martindale/Ref.html
     • Ask Jeeves! - most popular http://www.ask.com/
     • Ask ERIC - education questions, email answers http://www.askeric.org/Qa/
     • Information Please - almanac type questions http://www.infoplease.com/
     • Academic libraries developing reference models - new service area

  16. Libraries as Web sources
     • Academic libraries providing open collections & services; models vary
     • Rutgers libraries - big long term effort http://www.libraries.rutgers.edu/
        • various sources & links involved
        • for domain information & sources go to: Electronic Reference Sources; Subject Research Guides: Social Sciences & Law; Library & Information Science
     • University of California, Berkeley - a most elaborate effort together with Sun Microsystems http://sunsite.berkeley.edu/

  17. Virtual libraries on the Web
     • Libraries emerging only on the Web
     • More & more libraries & organizations involved
     • Examples of academic & public libraries:
        • Virtual Library - Switzerland, US, UK & other countries - ‘oldest virtual library on the Web’ http://vlib.org
        • Toronto Public Library http://vrl.tpl.toronto.on.ca/
        • Internet Public Library, Michigan http://www.ipl.org/

  18. Domain sites
     • Many domain/issue specific sites
        • rich & often unique coverage & services
        • different approaches & requirements
     • Examples in health related domains:
        • Medscape - registration required http://www.medscape.com/
        • Rxlist - The Internet Drug Index http://www.rxlist.com/
        • Mayo Clinic HealthOasis http://www.mayohealth.org/

  19. Societies, organizations, publishers
     • Great many rich sources for searching
        • differences in requirements, depth, richness
     • Examples from a variety of organizations:
        • Assoc. for Computing Machinery http://www.acm.org/ (Digital Library; subscription or registration)
        • State Department http://www.state.gov/ (about the U.S. & other countries)
        • R.R. Bowker http://www.bowker.com/ (Free Resources from Bowker; Library Resource Guide)
        • Genealogy: http://www.familysearch.org/

  20. Language barriers on the Web
     • English still the major language
        • but declining, now slightly over 50%
     • Multilingual retrieval search engines
        • Euroseek - searches 40 languages http://www.euroseek.com/
        • All the Web - 45 languages http://www.alltheweb.com/
        • in both, a search in a given language covers primarily sources in that language

  21. Language barriers: translations
     • A number of translation sites
        • machine aided, i.e. plug in terms, phrases, sentences in one language & review them in the other, but effectiveness is questionable
     • Free Translations http://www.freetranslations.com
     • Babel Fish http://babelfish.altavista.com/tr
     • Travlang - great for travelers - phrases http://www.travlang.com

  22. Key professional competencies
     • Knowledge of SOURCES in area of interest
        • search engines not enough
        • not too helpful in finding these other sources; structure hard to discern
     • Evaluation of sources
        • a key professional skill!
        • standard criteria: quality, veracity, coverage etc.
        • plus Web criteria: authority; accuracy; currency (timeliness); objectivity; coverage; persistence; usability
        • http://www.otterbein.edu/learning/libpages/subeval.htm

  23. Competencies …
     • Knowledge of users & use
     • Knowledge of searching
     • Use of technology
     • Adaptability, flexibility
     • Integration with other resources
     • Teaching others
     • Constant learning & update

  24. Web is still a mystery!
