Explore the similarities and differences between web searching, information retrieval, and reference services. Learn about search strategies, sources organization, search engine coverage, and more.
Web sources and library & information services Finding, evaluating and using a variety of Web sources for searching and reference © Tefko Saracevic, Rutgers University
Similarities between Web searching & IR & reference • The basic principles of approach are the same • human-human interaction - interview • social, organizational, cognitive, affective aspects to explore, including task, need … • preparation of search concepts, terms, logic • determination of range, restrictions • estimation of relevance
Differences • Vastly different sources • as to contents, authority, reliability, persistence • variation in amounts, depth, breadth • Very different organization • little standardization, few if any fields • Quite different search engines & capabilities - basic & advanced • also different from engine to engine • Differing search strategies needed
Also: invisible Web • Materials that general search engines cannot or WILL not include in their collections of Web pages (indexes) • You cannot find them through general search engines • Contains a vast amount of information • much of it authoritative, of high quality
Why do search engines miss material? • Size: Web is huge, cannot cover all • Economics: associated costs are high • also pay per crawl & rank • Technical: still limited capabilities • Spam: eliminating the bad also loses the good • Restrictions: some sites do not let crawlers in • Deep structure: some sites are complex
Needed for Web searching • Knowledge & competencies • variety of Web sources • their organization • search engines • Web search strategies • search dynamics, feedback • Keeping up & up & up • constant updates, changes, innovations • many domain/subject specific
Web size - who knows? • Estimated over 16 million Web servers (Lawrence & Giles, 1999) • But only a fraction of direct search relevance • Domains of sites • 83% commercial; 6% scientific or educational; 3% health • 2.5% personal; 2% societies; 1.5% government • about 1% each community, religion • 1.5% pornographic • Web Characterization Project - OCLC • statistics, trends, reports, links … for 2001 it reports 8.5 million Web sites • http://wcp.oclc.org/
Organization of sources • No standardization across sources • Major approaches in search engines • classification: many directory types used • statistical analyses of terms, links • Metatags in sources • to enable retrieval by fields • HTML “keywords”, “description” • 34% of sites use them • Dublin Core - 0.3% of sites use it • Organization: hindrance to retrieval • also faked contents to force retrieval
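The HTML “keywords” and “description” metatags mentioned above can be read with a few lines of code. The following is a minimal sketch using only Python's standard library; the sample page, the class name `MetaTagParser` and the function name `extract_meta_fields` are made up for illustration and do not describe how any particular search engine processes metatags.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect the content of <meta name="keywords"> and <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").lower()
        if name in ("keywords", "description"):
            self.meta[name] = attr_map.get("content", "").strip()


def extract_meta_fields(html_text):
    """Return a dict with whichever of the two metatag fields the page provides."""
    parser = MetaTagParser()
    parser.feed(html_text)
    return parser.meta


# Hypothetical page, used only for illustration.
SAMPLE_PAGE = """
<html><head>
<title>Drug index</title>
<meta name="keywords" content="drugs, pharmacology, interactions">
<meta name="description" content="A searchable index of prescription drugs.">
</head><body>...</body></html>
"""

fields = extract_meta_fields(SAMPLE_PAGE)
keywords = [term.strip() for term in fields.get("keywords", "").split(",") if term.strip()]
print(keywords)
print(fields.get("description"))
```

Because the page author writes these fields, nothing stops them from listing terms unrelated to the page, which is the "faked contents" problem noted on the slide.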
Sources & search engines • Indexed by search engines (publicly indexed) • by terms, selection, links, registration • Not publicly indexed • many domain sources will not be found, e.g. digital libraries, online journals, reference • many commercial sites will hardly be found • Differing approaches to inclusion/selection • mostly automatic; also generic source providers • increasingly added human evaluation & selection
Search engine coverage • No engine covers more than 16% of WWW • Relative to the combined coverage of the top 11 engines: • Northern Light 38.3%; Snap 37.1; AltaVista 37.1; HotBot 27.1; MS 20.3; Infoseek 19.2; Google 18.6; Yahoo 17.6; Excite 13.5; Lycos 5.9; EuroSeek 5.2 • HotBot, MS, Snap & Yahoo use Inktomi as search provider, but have different filtering & Inktomi databases • Northern Light has ‘special collection’ - documents not part of publicly indexable Web • Hard to discern & compare coverage • Many national search engines - own coverage
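The two kinds of figures on this slide (a 16% ceiling on absolute coverage, and shares relative to the combined coverage of 11 engines) can be related with simple arithmetic. The sketch below ASSUMES that the 16% ceiling belongs to the engine with the largest relative share; the slide does not state this, so the derived numbers are rough estimates only. The function name is hypothetical.

```python
# Relative figures from the slide: each engine's share (in %) of the combined
# coverage of the 11 engines.
RELATIVE_COVERAGE = {
    "Northern Light": 38.3, "Snap": 37.1, "AltaVista": 37.1, "HotBot": 27.1,
    "MS": 20.3, "Infoseek": 19.2, "Google": 18.6, "Yahoo": 17.6,
    "Excite": 13.5, "Lycos": 5.9, "EuroSeek": 5.2,
}

# ASSUMPTION (not stated on the slide): the 16% ceiling is the absolute
# coverage of the engine with the largest relative share.
MAX_ABSOLUTE_COVERAGE = 16.0


def estimate_absolute_coverage(relative, max_absolute):
    """Estimate combined and per-engine coverage as % of the whole Web."""
    top_share = max(relative.values())
    # If top_share % of the combined set equals max_absolute % of the Web,
    # the combined set is max_absolute / top_share of the Web.
    combined = max_absolute / top_share * 100.0
    per_engine = {name: share * combined / 100.0 for name, share in relative.items()}
    return combined, per_engine


combined, absolute = estimate_absolute_coverage(RELATIVE_COVERAGE, MAX_ABSOLUTE_COVERAGE)
print(f"Estimated combined coverage: {combined:.1f}% of the Web")
for name, pct in sorted(absolute.items(), key=lambda item: -item[1]):
    print(f"{name:15s} {pct:5.1f}%")
```

Under that assumption, even all 11 engines together would cover well under half of the Web, which is why searching with one engine alone misses so much.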
Search features among engines • Some search features are the same across all, but details differ - particularly in advanced search • Boolean available • but sometimes AND, sometimes OR is the default • Differences may be found in: • phrases, proximity, truncation, case sensitivity, relevance feedback, field searching, special features • term expansion to concepts (latent semantic indexing)
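The effect of the default Boolean operator is easy to see on a toy example. The following is a minimal sketch, assuming a tiny in-memory collection and a simple inverted index; the documents and function names are invented for illustration and do not reflect any real engine's implementation.

```python
# Hypothetical four-document collection.
DOCUMENTS = {
    1: "digital library reference services",
    2: "web search engine coverage",
    3: "library web sources",
    4: "search strategies for reference",
}


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, default_operator="AND"):
    """Combine the postings of all query terms with the given default operator."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    if default_operator == "AND":
        return set.intersection(*postings)
    if default_operator == "OR":
        return set.union(*postings)
    raise ValueError("default_operator must be 'AND' or 'OR'")


INDEX = build_index(DOCUMENTS)
and_hits = search(INDEX, "library web", "AND")
or_hits = search(INDEX, "library web", "OR")
print(sorted(and_hits))  # documents containing both terms
print(sorted(or_hits))   # documents containing either term
```

The same two-word query returns one document under an AND default and three under an OR default, so a searcher needs to know which default an engine applies before judging the size of the output.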
Search strategies & outputs • Geared toward very short searches • big majority of searches 2-3 terms (av. 2.5) • in IR av. 7-14 - making a big difference • Directory browsing a big component - not in IR • Geared toward limited top outputs • Ranking output by relevance predominates • relevance calculations differ & are proprietary (secret) • except Google - they published their method • affects search strategy - you have to guess how it is done
Meta search engines • Search engines that cover search engines – many around e.g. • All4one http://all4one.com/ • four windows - good for comparison • CNET Search.com http://www.search.com/ • meta engine of meta engines - customization • Search Engines Worldwide • 174 countries, over 1300 engines http://www.twics.com/~takakuwa/search/search.html • More on the horizon & differing
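One way a meta search engine might combine output from several engines is to merge their ranked lists and remove duplicates. The sketch below uses reciprocal-rank scoring, which is an illustrative choice and not the documented method of any engine named on the slide; the engine names, URLs and the function `merge_results` are hypothetical.

```python
# Hypothetical ranked result lists from three engines (best result first).
ENGINE_RESULTS = {
    "engine_a": ["http://example.org/a", "http://example.org/b", "http://example.org/c"],
    "engine_b": ["http://example.org/b", "http://example.org/d"],
    "engine_c": ["http://example.org/c", "http://example.org/b", "http://example.org/e"],
}


def merge_results(engine_results):
    """Merge ranked lists: each URL earns 1/rank per engine, duplicates are combined."""
    scores = {}
    sources = {}
    for engine, urls in engine_results.items():
        for rank, url in enumerate(urls, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
            sources.setdefault(url, []).append(engine)
    # Highest combined score first; ties broken alphabetically by URL.
    ranked = sorted(scores, key=lambda url: (-scores[url], url))
    return [(url, sources[url]) for url in ranked]


merged = merge_results(ENGINE_RESULTS)
for url, engines in merged:
    print(url, "found by", ", ".join(engines))
```

A URL returned by several engines rises to the top of the merged list, while results unique to one engine are still kept, which reflects the appeal of meta engines given the limited coverage of each single engine.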
Specialized meta engines • Selective with directories & large number of databases & search engines • Complete Planet http://completeplanet.com • Invisible Web http://invisibleweb.com • U.S. federal information via Government Printing Office Access http://www.gpo.gov/gpoaccess • Federal Bulletin Board (file libraries for download from many agencies): http://fedbbs.access.gpo.gov
Reference (expert) services • Reference services - several models • Q&A, directories, email answers etc. – e.g. • Martindale’s Reference Desk - comprehensive http://www-sci.lib.uci.edu/~martindale/Ref.html • Ask Jeeves! – most popular http://www.ask.com/ • Ask ERIC – education questions - email answers http://www.askeric.org/Qa/ • Information Please - almanac type questions http://www.infoplease.com/ • Academic libraries developing reference models - new service area
Libraries as Web sources • Academic libraries providing open collections & services; models vary • Rutgers libraries - big long term effort http://www.libraries.rutgers.edu/ • various sources & links involved • for domain information & sources go to: • Electronic Reference Sources; Subject Research Guides: Social Sciences & Law; Library & Information Science • University of California, Berkeley - a most elaborate effort together with Sun Microsystems http://sunsite.berkeley.edu/
Virtual libraries on the Web • Libraries emerging only on the Web • More & more libraries & organizations involved • Examples of academic & public libraries: • Virtual Library - Switzerland, US, UK & other countries – ‘oldest virtual library on the Web’ • http://vlib.org • Toronto Public Library • http://vrl.tpl.toronto.on.ca/ • Internet Public Library, Michigan • http://www.ipl.org/
Domain sites • Many domain/issue specific sites • rich & often unique coverage & services • different approaches & requirements • Examples in health related domains: • Medscape - registration required http://www.medscape.com/ • Rxlist - The Internet Drug Index http://www.rxlist.com/ • Mayo Clinic HealthOasis http://www.mayohealth.org/
Societies, organizations, publishers • Great many rich sources for searching • differences in requirements, depth, richness • Examples from a variety of organizations: • Assoc. for Computing Machinery http://www.acm.org/ • Digital Library; subscription or registration • State Department http://www.state.gov/ • about the U.S. & other countries • R.R. Bowker http://www.bowker.com/ • Free Resources from Bowker; Library Resource Guide • Genealogy: http://www.familysearch.org/
Language barriers on the Web • English still the major language • but declining, now slightly over 50% • Multilingual retrieval search engines • Euroseek – searches 40 languages http://www.euroseek.com/ • All the Web – 45 languages http://www.alltheweb.com/ • in both, search in different languages covers primarily their language sources
Language barriers: translations • A number of translation sites • machine aided – i.e. plug in terms, phrases, sentences in one language & review in the other, but effectiveness is questionable • Free Translations http://www.freetranslations.com • Babel Fish http://babelfish.altavista.com/tr • Travlang – great for travelers – phrases http://www.travlang.com
Key professional competencies • Knowledge of SOURCES in area of interest • search engines not enough • not too helpful in finding these other sources; structure hard to discern • Evaluation of sources • a key professional skill! • standard criteria: quality, veracity, coverage etc. • plus Web criteria: authority; accuracy; currency (timeliness); objectivity; coverage; persistence; usability • http://www.otterbein.edu/learning/libpages/subeval.htm
Competencies … • Knowledge of users & use • Knowledge of searching • Use of technology • Adaptability, flexibility • Integration with other resources • Teaching others • Constant learning & update
Web is still a mystery!