Effective Web Searching Dr. I.R.N. Goudar, Visiting Professor-cum-Library Adviser, University of Mysore. Refresher Course, UGC-Academic Staff College, University of Mysore
Organization of the Web... • Web is the totality of web pages stored on web servers • Spectacular growth in web-based information sources and services: • Education and research • Entertainment • Business and commerce • Personal home pages • Estimated to contain over 1 billion indexable web pages • Doubling each year • Over 80 million web sites
Finding relevant documents on the Web • Informal: • Browsing (and book marking for later use) • Friends • Print sources • Discussion forums (mailing lists) • Current awareness services (e.g. Scout Report) • Guessing web site addresses! • Formal (using information finding tools) • Web directories/ guides • Web search engines • Meta-search tools • Specialty search engines
Three Types of Internet Searching Tools • Subject Directories or Subject Trees such as Yahoo. • Search Engines such as Google, Teoma, and AltaVista. • Metasearch Engines such as Dogpile, Mamma, and Ixquick
Limitations • Anyone can put up a web page • Many pages are not updated • No quality control • Most sites are not “peer-reviewed” • Less trustworthy than scholarly publications
Web Directories/ Guides • Also called ‘virtual libraries’ and ‘Internet resource catalogues’ • Organised collection of descriptions and links to Internet sources • Organisation: by subject categories (hierarchical); by resource type (patents, e-journals, institutes, etc.) • Most use human experts for source selection, indexing and classification • Some include reviews/ ratings of listed sites
Web Directories/ Guides... • Examples of general web directories: • Internet Public Library (http://www.ipl.org/) • Britannica’s “Web’s best sites” (www.britannica.com) • Infomine (infomine.ucr.edu) • Scout Report Signpost (www.signpost.org) • BUBL link (bubl.ac.uk/link) • Yahoo (www.yahoo.com) • Magellan (www.mckinley.com) • Galaxy (www.galaxy.com) • Looksmart (www.looksmart.com) • Snap (www.snap.com)
Web Directories/ Guides... • Guides to directories: • WWW Virtual Library (www.vlib.org) • Subject-specific guides (subject gateways): • Intute (http://www.intute.ac.uk/) • IOP Physicsworld.com (http://physicsworld.com/) • Chemcenter (www.acs.com) • Programmers Heaven (www.programmersheaven.com) • Resource type guides: • Patents (www.european-patent-office.org) • Electronic journals (www.publist.com)
Web Search Engines • Web search engines build a full-text index to web pages gathered from web sites and provide a keyword search interface to this index • Spider programs periodically visit web sites and gather the web pages for indexing • Also index web sites submitted by site developers • A brief summary of the indexed web page is also prepared • The index usually contains URLs, titles, headings, and other words from the HTML document
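The full-text index a search engine builds is usually described as an inverted index: a map from each word to the pages that contain it. The sketch below is a minimal illustration under stated assumptions, not how any particular engine works. The pages are a hypothetical in-memory stand-in for what a spider gathered, words are lower-cased alphanumeric runs, and a multi-word query is treated as an implicit AND.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def keyword_search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every query word (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages standing in for what a spider might have gathered.
pages = {
    "http://example.org/a": "Polymer chemistry for beginners",
    "http://example.org/b": "Web search engines and polymer databases",
    "http://example.org/c": "Searching the web effectively",
}
index = build_index(pages)
print(sorted(keyword_search(index, "polymer web")))  # ['http://example.org/b']
```

Only page b contains both "polymer" and "web", so it is the single hit; a real index would also record titles, headings and word positions, which this sketch leaves out.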
Web Search Engines... • Examples: • Fastsearch (alltheweb.com) • Altavista (www.altavista.com) • Google (www.google.com) • Northernlight (www.northernlight.com) • HotBot (www.hotbot.com) • Excite (www.excite.com) • Teoma (http://www.teoma.com/)
Web Search Engines... • Specialty search engines: • Country-specific search engines • www.khoj.com • www.123india.com • Subject-specific search engines • Chemfinder (www.chemfinder.com) • Engineering Resources Online (www.er-online.co.uk) • MathSearch (www.maths.usyd.edu.au:8000/MathSearch.html) • World Trade Locator (www.intl-tradenet.com) • Resource-specific search engines: • Patents (www.uspto.gov) • Journal articles (www.findarticles.com)
Meta Search Tools • Also known as multi-threaded search engines • Allow the user to search multiple databases simultaneously via a single interface, and return results in a uniform format • Present a summary of the collected results from other search engines and directories • Do not gather web pages, build indexes, accept URL additions, classify or review web sites • Some features supported: • Duplicate hits removal • Rank results • Selection of search engine(s) to be used
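Duplicate removal and result ranking can be sketched as follows. This is an illustration only: the URL normalisation rule is hypothetical, and the merging uses reciprocal rank fusion (RRF), a well-known rank-merging technique chosen here for simplicity; actual meta search tools use their own, often undisclosed, methods. The constant `k = 60` is a commonly used RRF value, not something taken from any of the tools named above.

```python
from collections import defaultdict


def normalize(url: str) -> str:
    """Treat trivially different URLs as duplicates (hypothetical rule)."""
    url = url.lower().rstrip("/")
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url


def merge_results(result_lists: dict[str, list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists from several engines with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for urls in result_lists.values():
        seen: set[str] = set()
        for rank, url in enumerate(urls, start=1):
            key = normalize(url)
            if key in seen:  # ignore an engine's own repeated hits
                continue
            seen.add(key)
            scores[key] += 1.0 / (k + rank)  # higher ranks contribute more
    # Highest fused score first; ties broken alphabetically for stability.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical result lists from two engines.
results = {
    "engine_a": ["http://www.example.org/polymers/", "example.net/resins"],
    "engine_b": ["https://example.org/polymers", "example.com/plastics"],
}
print(merge_results(results))
# ['example.org/polymers', 'example.com/plastics', 'example.net/resins']
```

The page returned by both engines collapses into one entry and rises to the top, which is the intended effect of "duplicate hits removal" plus ranking.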
Meta Search Tools... [Figure: searching with multiple individual search engines vs. searching with a single meta search tool]
Meta Search Tools... • Meta search tools (remote sites): • MetaCrawler (www.metacrawler.com) • Ixquick (www.ixquick.com) • Dogpile (www.dogpile.com) • Meta search tools (local, installable software): • Copernic (www.copernic.com) • LexiBot (www.completeplanet.com)
People Finding Tools • Register names and addresses and find e-mail addresses • Examples: • Bigfoot (www.bigfoot.com) • Peoplesearch (www.peoplesearch.net) • Ahoy (ahoy.cs.washington.edu:6060/) • Four11 (www.four11.com) • Switchboard (www.switchboard.com) • Whowhere (www.whowhere.lycos.com/) • Most search engines also support people searches (e.g. Altavista, Google, Yahoo!)
Web Search Strategies • Search steps: • Analyze the search topic and identify the search terms, their synonyms (if any), phrases and Boolean relations (if any) • Select the search tool(s) to be used (meta search engine, directory, general search engine, specialty search engine) • Translate the search terms into search statements of the selected search engine • Perform search • Refine the search based on results • Visit the actual site(s) and save the information (using File-Save option of the browser)
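The step "translate the search terms into search statements" can be sketched as a small function: synonyms of one concept are joined with OR, separate concepts with AND, and multi-word terms are quoted as phrases. This assumes a generic Boolean syntax; as noted elsewhere in this deck, search engines vary in their syntax, so the output may need adjusting for a particular engine. The example topic is hypothetical.

```python
def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_statement(concepts: list[list[str]]) -> str:
    """Join synonyms of one concept with OR and separate concepts with AND."""
    groups = []
    for synonyms in concepts:
        terms = [quote(t) for t in synonyms]
        group = " OR ".join(terms)
        # Parentheses keep each OR-group together when concepts are ANDed.
        groups.append(f"({group})" if len(terms) > 1 else group)
    return " AND ".join(groups)


# Hypothetical topic: "pollution caused by plastic waste"
concepts = [["plastic waste", "plastic debris"], ["pollution", "contamination"]]
print(build_statement(concepts))
# ("plastic waste" OR "plastic debris") AND (pollution OR contamination)
```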
Google (www.google.com) • Enables users to search the Web, images, etc. • Features: PageRank, caching and translation, an option to find similar pages. • The focus is developing search technology. • Ranked #1 in the world
Google • Largest & Most Popular Search Engine • 8+ Billion Pages Indexed • Very Effective Advanced Search Features • Limit searches by domain, e.g., site:edu • Limit searches by format, e.g., filetype:pdf • Specialized Search Tools • Images, Directory, Videos, Books, Scholar, News, Blogger
How Google works • BEFORE you search: • “Crawls” pages on the public web • Copies text & images, builds database • WHEN you search: • Automatically ranks pages in your results • Word occurrence and location on page • Popularity - a link to a page is a vote for it • ~200 factors in all!
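The "a link to a page is a vote for it" idea is the intuition behind PageRank. The sketch below is the textbook power-iteration form with the usual damping factor of 0.85; it shows only this one popularity signal, not Google's actual ranking, which the slide notes combines about 200 factors. The four-page link graph is hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank by power iteration over a small link graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small base share, as if reached at random.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link passes an equal share of the page's rank.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical four-page web: C receives the most links ("votes").
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # C
```

Page A ends up second even though it has only one incoming link, because that link comes from the highly ranked C; votes from popular pages count for more.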
Limit your search to … • Web page title: intitle:hybrid, allintitle:hybrid mileage • Website or domain: site:whitehouse.gov "global warming", site:edu "global warming" • File type: filetype:ppt site:edu "global warming" • Definitions: define:pixel, define:"due diligence"
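These limits are plain text typed into the search box, so a query can be assembled programmatically. The sketch below combines the operators from the slide and URL-encodes the result as the `q` parameter of a `https://www.google.com/search` address. The helper name `google_query` is hypothetical, and Google controls how its operators behave, which can change over time.

```python
from urllib.parse import urlencode


def google_query(terms: str, site: str = "", filetype: str = "",
                 intitle: str = "") -> str:
    """Combine free-text terms with intitle:, filetype: and site: limits."""
    parts = []
    if intitle:
        parts.append(f"intitle:{intitle}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    if terms:
        parts.append(terms)
    return " ".join(parts)


q = google_query('"global warming"', site="edu", filetype="ppt")
print(q)  # filetype:ppt site:edu "global warming"
# urlencode escapes spaces, colons and quotes so the query is URL-safe.
print("https://www.google.com/search?" + urlencode({"q": q}))
```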
On the results page • Search box (use to modify) • “Cache” • “Related pages” • “Translate this page”
Searching for Pictures • Searching for images is easy! • From the main page of the search engine, select images or pictures before entering your search term.
Google Scholar (scholarly literature: articles, books) • Google Books (books) • Google Directory (handpicked specific topical sites)
Beyond Google • Take advantage of human selectivity: • Librarians’ Internet Index • InfoMine • Google Custom Search Engines (CSE)
Web Directories/ Guides... • Most web directories support searching within categories and descriptions, in addition to browsing • Advantages: • Access to high quality sources • Do not contain redundant links • Faster access to sources • Disadvantages: • One needs to be aware of such directories/ guides • May not be up-to-date • May not be exhaustive • Categories (subject hierarchy) vary across directories
Web Directories/ Guides... • When to use web directories/ guides? • For broad/ general topics where keyword searching on search engines retrieves too many irrelevant sites • When you want a few highly relevant sites and the intention is not an exhaustive/ comprehensive search • When not to use web directories/ guides? • For concept/ keyword searches • When search terms are distinctive • Effective directory/ guide usage: • Take advantage of the sub-search within categories, supported by most directories/ guides • Join their mailing lists for automatic updates on new sites
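A directory is a subject hierarchy, and a "sub-search within a category" looks only at the sites filed under that category and its subcategories. The sketch below models this under stated assumptions: the `Category` class is hypothetical, the category path is borrowed from the polymer sample search in this deck, and the listed sites and descriptions are invented placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a hypothetical subject hierarchy."""
    name: str
    sites: dict[str, str] = field(default_factory=dict)  # URL -> description
    children: dict[str, "Category"] = field(default_factory=dict)

    def child(self, name: str) -> "Category":
        """Return (creating if needed) the named subcategory."""
        return self.children.setdefault(name, Category(name))

    def sub_search(self, word: str) -> list[str]:
        """Search descriptions in this category and all categories below it."""
        word = word.lower()
        hits = [url for url, desc in self.sites.items() if word in desc.lower()]
        for sub in self.children.values():
            hits.extend(sub.sub_search(word))
        return hits


root = Category("Top")
chemicals = (root.child("Business and Economy")
             .child("Business-to-Business").child("Chemicals"))
chemicals.sites["http://example.com/resins"] = "Supplier of polymers and resins"
chemicals.child("Solvents").sites["http://example.com/solv"] = "Industrial solvents"
root.child("Science").sites["http://example.org/poly"] = "Polymers research group"

print(chemicals.sub_search("polymers"))  # ['http://example.com/resins']
```

Searching from the Chemicals node skips the research group filed under Science, which is exactly why a category sub-search returns fewer, more relevant hits than a search of the whole directory.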
Web Search Engines... • The search engines provide a forms-based search interface for entering the queries • Support simple and advanced search interfaces • Search results are returned in the form of a list of web sites matching the query • Some key features supported: • Phrase searching (“…” double quotes) • Boolean searching (AND, OR, NOT) • Implied Boolean: Term inclusion (+), term exclusion (-)
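Phrase searching and implied Boolean (+/-) can be illustrated with a small matcher. This is a sketch, not any engine's real parser. It assumes that `+` marks a required term, `-` an excluded term, quoted text a phrase whose words must appear in order, and that, when no `+` term is given, at least one plain term must match (engines differ on this rule). The documents are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def contains(doc_tokens: list[str], term: str) -> bool:
    """True if the term (a word or a multi-word phrase) occurs in order."""
    words = tokenize(term)
    n = len(words)
    return n > 0 and any(doc_tokens[i:i + n] == words
                         for i in range(len(doc_tokens) - n + 1))


def parse(query: str) -> list[tuple[str, str]]:
    """Split a query into (prefix, term) pairs; prefix is '+', '-' or ''."""
    pairs = []
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]*)"|([^\s"]+))', query):
        pairs.append((sign, phrase or word))
    return pairs


def matches(doc: str, query: str) -> bool:
    tokens = tokenize(doc)
    required, optional = [], []
    for sign, term in parse(query):
        if sign == "-":
            if contains(tokens, term):
                return False  # an excluded term is present
        elif sign == "+":
            required.append(term)
        else:
            optional.append(term)
    if not all(contains(tokens, t) for t in required):
        return False
    # Assumed rule: with no '+' terms, at least one plain term must match.
    return bool(required) or any(contains(tokens, t) for t in optional)


docs = {
    "d1": "Buy a stuffed animal for your child",
    "d2": "Stuffed peppers: an animal-free recipe",
    "d3": "Stuffed animal toy catalogue",
}
print([n for n, t in docs.items() if matches(t, '"stuffed animal" -toy')])  # ['d1']
print([n for n, t in docs.items() if matches(t, '+stuffed +animal -toy')])  # ['d1', 'd2']
```

The phrase query rejects the recipe page, while the Boolean AND of the two words accepts it; this is the reason behind the later tip to prefer phrase searching over Boolean AND.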
Web Search Engines… • Key features… • Proximity searches (NEAR, ADJ, BEFORE, AFTER) • Use of parentheses to group search terms • Truncation searches (‘industr*’) • Field-specific searching (Title, URL, Text) • Natural language queries (‘Why is the sky blue?’) • Relevance ranking of search results • Number of search terms • Number of times each search term occurs • Proximity of search terms • Location of search terms (title, text)
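Truncation and the four relevance factors on this slide can be combined into a toy scoring function. The weights below (3 for a title hit, 2 per matched query term, and a proximity bonus of 1/(1 + distance)) are hypothetical choices made only for illustration; real engines do not publish their formulas, and the two sample pages are invented.

```python
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def term_matches(term: str, word: str) -> bool:
    """Exact match, or prefix match when the term ends with '*' (truncation)."""
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term


def score(title: str, body: str, terms: list[str],
          title_weight: float = 3.0) -> float:
    """Toy relevance score built from the four factors on the slide."""
    title_toks, body_toks = tokenize(title), tokenize(body)
    total = 0.0
    positions = []  # first body position of each matched term
    matched = 0
    for term in terms:
        term = term.lower()
        in_title = sum(term_matches(term, w) for w in title_toks)
        hits = [i for i, w in enumerate(body_toks) if term_matches(term, w)]
        if in_title or hits:
            matched += 1
        total += title_weight * in_title + len(hits)  # location + frequency
        if hits:
            positions.append(hits[0])
    total += 2.0 * matched  # number of query terms present
    if len(positions) > 1:
        spread = max(positions) - min(positions)
        total += 1.0 / (1 + spread)  # closer terms give a larger bonus
    return total


terms = ["industr*", "pollution"]
p1 = score("Industrial pollution",
           "Industrial pollution from factories and industries", terms)
p2 = score("River life",
           "Pollution of rivers by industry is discussed later in this industrial report",
           terms)
print(p1, p2)  # 13.5 and roughly 7.2
```

`industr*` matches "industrial", "industries" and "industry"; the first page ranks higher because its terms appear in the title and sit next to each other in the text.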
Web Search Engines… • Key features… • Sub-searching (searching within retrieved records) • Case sensitivity • Limit by language • Limit by age of documents • Limit by audio, video and image type • Translation of search results (title and description) • Limit by domain, host
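Sub-searching and the "limit by" options act as filters on an already-retrieved result list. The sketch below assumes each hit carries a title, a language code and a publication date; the `Hit` record, the `refine` helper and the sample hits are all hypothetical, since engines expose these limits through their own forms.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional
from urllib.parse import urlparse


@dataclass
class Hit:
    url: str
    title: str
    language: str
    published: date


def refine(hits: list[Hit], within: str = "", language: str = "",
           since: Optional[date] = None, domain: str = "") -> list[Hit]:
    """Apply a sub-search and limits to an already-retrieved result list."""
    kept = []
    for h in hits:
        host = urlparse(h.url).hostname or ""
        if within and within.lower() not in h.title.lower():
            continue  # sub-search: the keyword must appear in the record
        if language and h.language != language:
            continue  # limit by language
        if since and h.published < since:
            continue  # limit by age of document
        if domain and not (host == domain or host.endswith("." + domain)):
            continue  # limit by domain or host
        kept.append(h)
    return kept


hits = [
    Hit("http://www.example.edu/cars", "Hybrid cars and mileage", "en", date(2008, 5, 1)),
    Hit("http://www.example.com/hybrid", "Hybrid car sales", "en", date(2003, 1, 1)),
    Hit("http://www.example.de/hybrid", "Hybridautos", "de", date(2009, 2, 1)),
]
print([h.url for h in refine(hits, within="mileage", domain="edu")])
# ['http://www.example.edu/cars']
```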
Web Search Engines... • Example tutorials • Finding Information on the Internet: A tutorial (www.lib.berkeley.edu/TeachingLib/Guides/Internet/FindInfo.html) • How to search the world wide web: A tutorial for beginners and non-experts. David P. Habib and Robert L. Balliot. September, 1999 (204.17.98.73/midlib/tutor.htm)
Web Search Engines... • Advantages of search engines: • Best suited for complex keyword/ concept searches • Control over search: search terms can be combined as required • Searches can be limited to a period of time, fields, source type, etc. • Currency of information, made possible by regular addition by web spiders • Exhaustive information can be retrieved (with lots of patience!) • Disadvantages: • Time consuming • False positives • Search engines vary in terms of search techniques/ syntax • Dead links, redundant links (same document gets displayed) • Spamming (‘salting’ of pages) • Higher ranking of paying sites
Web Search Engines... • Limitations of web search engines: • Poor retrieval effectiveness (relevance) as little vocabulary control is exercised by web site developers and the index engines • Different search engines return different search results due to the variation in indexing and search process (40% non-overlap) • None of the search engines come close to indexing the entire web, much less the entire Internet. Content not indexed: • PDF documents • Content that requires log in • Databases searched using CGI programs • Web content on intranets behind firewalls
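How much two engines' results differ can be quantified in several ways. The slide does not say how its 40% figure was measured, so the definition below is an assumption: the share of distinct results returned by only one of the two engines, which equals 1 minus the Jaccard similarity of the two result sets. The result lists are hypothetical.

```python
def non_overlap(a: list[str], b: list[str]) -> float:
    """Fraction of distinct results returned by only one of two engines."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    # Symmetric difference = results found by exactly one engine.
    return len(set_a ^ set_b) / len(union)


engine_a = ["u1", "u2", "u3", "u4", "u5"]
engine_b = ["u1", "u2", "u6", "u7", "u8"]
print(f"{non_overlap(engine_a, engine_b):.0%}")  # 75%
```

Here 8 distinct pages are found in total, 6 of them by only one engine, giving 6/8 = 75% non-overlap.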
Top Sites The top sites on the web, ordered by Alexa Traffic Rank. • 1. Google • 2. Facebook • 3. YouTube • 4. Yahoo • 5. Live • 6. Baidu • 7. Wikipedia • 8. Blogger • 9. MSN • 10. Tencent • 11. Twitter
Google • Enables users to search the Web, images, etc. • Features: PageRank, caching and translation, an option to find similar pages. • The focus is developing search technology. • Ranked #1 in the world according to the three-month Alexa traffic rankings.
Yahoo! • yahoo.com • Personalized content and search options. Chatrooms, free e-mail, clubs, and pager. • Ranked #4 in the world • The site is in the “Web Portals” category.
Wikipedia • wikipedia.org • An online collaborative encyclopedia. • Wikipedia is ranked #7 in the world • It has been online for at least nine years. • The site's audience tends to be users who browse from school and work
Meta Search Tools... • When to use meta search tools? • Need to be used cautiously • Good for simple searches, particularly if search terms are distinctive or unique • Good for testing with a few keywords to find which individual search engine returns good results • Good for ‘quick and dirty searching’ if you are in a hurry and want to find a few relevant sites quickly • For complex searches, involving many search terms, Boolean logic, etc., it is better to use individual search engines
Meta Search Tools... • Advantages: • Query can be run across multiple search engines • User needs to learn only the search interface of the meta search tool • Better results: retrieves top-ranking pages from individual search engines • Disadvantages: • Unique features of individual search engines are lost • Not exhaustive: uses only the top results returned by search engines
People Finding Tools • Using people finding tools: • Person should have registered in the tool(s) • Searcher should know both surname and first name, otherwise too many names will be retrieved • Bias toward U.S.-based people • Often, the required e-mail address cannot be retrieved through these tools • Alternatively, any search engine may be used (phrase search using the person’s name) • If the person’s affiliation is known, the Yahoo! Directory may be used to locate the institution and e-mail
Web Search Strategies • Tips for effective web searching: • Broad or general concept searches: start with directory-based services (want a few highly relevant sites for a broad topic) • Highly specific topics, or topics with unique terms/ many concepts: use search engines • Go through the ‘help’ pages of search tools carefully • Gather sufficient information about the search topic before searching • Spelling variations, synonyms, broader and narrower terms • Use specific keywords; rare/unusual words are better than common ones
Web Search Strategies... • Tips for effective web searching… • Prefer phrase & adjacency searching to Boolean (search ‘stuffed animal’ rather than ‘stuffed’ AND ‘animal’) • Use as many synonyms as possible - search engines use statistical retrieval methods and produce better results with more query words • Avoid use of very common words (e.g., ‘computer’) • Enter search terms in lower case. Use upper case to force exact match (e.g. ‘Light Combat Aircraft’, ‘LCA’) • Use ‘More like this’ option, if supported by the search engine (e.g. Excite, Google)
Web Search Strategies... • Tips for effective web searching… • Repeat the search by varying search terms and their combinations; try this on different search tools • Enter most important terms first - some search tools are sensitive to word order • Use the NOT operator to exclude unwanted pages (e.g.: bio-data, resumes, courses) • Go through at least 5 pages of search results before giving up • Select 2 or 3 search tools and master their search techniques
Sample Web Searches • “Companies dealing with polymers” • Do not use search engines (too many irrelevant hits) • Use directory sources (e.g. www.yahoo.com) • Follow the categories: • Business and Economy • Business-to-Business • Chemicals • Do a sub-search on ‘Polymers’ • Use specialty search engines (e.g. www.bizweb.com)
Guides to Search Tools • www.beaucoup.com (guide to 2,000+ search engines, indices and directories) • www.searchpower.com (a very comprehensive search engine directory - claims over 16,000 search engine listings!) • www.123go.com/drw/search/search.htm (Dr. Webster’s Big Page of Search Engines ) • www.finderseeker.com (The search engine of search engines) • www.virtualfreesites.com (Over 1,000 specialised search engines)
Keeping Current • AskScott (www.askscott.com): Provides a very comprehensive tutorial on search engines • SearchEngineWatch (www.searchenginewatch.com): The site offers information about new developments in search engines and provides reviews and tutorials. • Botspot (www.botspot.com): Collection and guide to a variety of bots (intelligent agents)
Web Search Engines... • Demonstration of search engines: • Fastsearch (www.alltheweb.com) • Altavista (www.altavista.com) • Google (www.google.com) • Northernlight (www.northernlight.com)