Welcome! • While you are waiting, please… • find in your packet: Exercise 6 - Questions for the Final Exercise “What Do You Want Google to Tell You?” • begin writing down your questions in three or more categories
Instructor: Joe Barker jbarker@library.berkeley.edu An Infopeople Workshop 2005
This Workshop is Brought to You By the Infopeople Project Infopeople is a federally-funded grant project supported by the California State Library. It provides a wide variety of training to California libraries. Infopeople workshops are offered around the state and are open registration on a first-come, first-served basis. For a complete list of workshops, and for other information about the Project, go to the Infopeople Web site at infopeople.org.
Introductions • Name • Library • Position • How do you use Google?
Workshop Overview • Google’s way of “thinking” • Taking charge of the driving • Using limits to find the hard-to-get • Finding information on a subject • Special Google databases and tools • What to do when Google doesn’t work
Go to: bookmarks.infopeople.org • Click on extreme_googling_bk.htm • Make a bookmark of this page • Add to Favorites
Exercise 1 How does Google “think” about your searches? Please pause and wait for discussion when you reach a
A Close Look at Google Search Results • Excerpt of page with your terms • Matched terms in bold • Which Google database used • Approx. # of hits • Terms actually searched on, as Dictionary links • 2nd page from same site • All Google pages from this site • URL, size, date last crawled • Link to Cached copy • Pages supposedly like this one • Don’t believe the number of Results: they are approximate, changing, and not comprehensive
Default Matching on Search Terms • Default AND between terms • Google takes a FUZZY approach • only some of the words if a page is “important” • words may occur only in pages that link to the page • words occur somewhere on the site a page belongs to • Cached reveals the page as Google found it • may differ from the current page • Cached exists if a page is full-text indexed • About 1 billion pages in Google are not cached • Not fully searchable • no Cached if a page owner requests not to be cached
How Can You Know Why Google Found a Page? • Click Cached link toward end of results • top area often explains what was matched
Stemming • Google stems “when appropriate” • automatically detects word stem or root • retrieves with various endings: kite flying gets kite, kites, kiting, fly, flying, flyers, flyer’s, flyers’ • to turn off: +kite +flying or “kite flying” • single word searches not stemmed
Words Google Does Not Search • Common or “stop” words ignored to be or not to be • no list of “common” terms • Google tells you below search box in results • to turn off • +to +be +or not +to +be • “to be or not to be” • single word searches possible on common words
Ranking of Results • Word order matters • favoring phrases (words together) • looks for phrases with something in place of stop words • word repetition and proximity also count • Google ranking is a great mystery • PageRank combines many factors • popularity - links to a page and their importance • “importance” - a value of 0 (low) to 10 (high) • term placement - phrases, proximity, repetition See Cheat Sheet #1
Google Preferences • Interface language • Selected languages for pages • SafeSearch filtering • “moderate” is default • Number of results returned • 20 or 30 is best • Open new browser window for search results Back of Cheat Sheet #1
The Google Toolbar • Search any Google databases • Search within a site • Pop-up blocker • Search history list • Set Google preferences quickly • Customizable in Options • download from toolbar.google.com • Toolbar for other browsers • download from googlebar.mozdev.org
Exercise 2 • Installing the Google Toolbar • Customizing Preferences
Taking Charge of Driving Google OR Getting the Most from Google’s FUZZY Thinking
Improving Google’s “FUZZY” Default AND • Problems with AND default: • words can occur anywhere in results pages • may have different meanings or contexts • some pages may not contain all of your words • some may not have any of your words • Use quotation marks to require words together • turns common words into unique search terms • “working mothers” 145,000 = 5% of working mothers 2,680,000 • “dry cells” 11,500 = 1% of dry cells 1,010,000 • Hyphen makes phrases and searches with and without hyphens: bite-sized retrieves bite-sized, bite sized, bitesized
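Checking the percentages: a minimal Python sketch of the arithmetic behind the phrase-search comparison on this slide. The hit counts are the slide's own; the function name `phrase_share` is made up for illustration.

```python
def phrase_share(phrase_hits, unquoted_hits):
    """Percentage of the unquoted result count that the quoted phrase returns."""
    return 100 * phrase_hits / unquoted_hits


# Hit counts copied from the slide; Google's counts are approximate and change.
examples = {
    "working mothers": (145_000, 2_680_000),
    "dry cells": (11_500, 1_010_000),
}

for phrase, (quoted, unquoted) in examples.items():
    share = phrase_share(quoted, unquoted)
    print(f'"{phrase}": {share:.1f}% of the unquoted count')
```

Rounded to whole numbers, the two shares come out at about 5% and 1%, matching the slide.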
Force “FUZZY” with OR Searches • Singulars and plurals not covered by stemming: parent OR parents • Equivalent or synonymous terms: parent OR guardian • Misspellings: libarian OR librarian • Apostrophes and their misuse: april's OR aprils OR april "fools day"
Ask Google to be “FUZZY” • Take advantage of stemming • Let stemming handle variant endings: “wild flowers” OR wildflowers hike “point reyes” april OR may OR spring • hike gets hike, hikers, hiking, hikes • Synonym search: ~ immediately before a word • sometimes “thinks” of very broad, related terms • ~food gets recipes, nutrition, cooking • ~facts gets information, statistics • ~help gets guide, tutorial, FAQ, manual • Often: terms appear in links pointing to a retrieved page
Ask for “FUZZY” Number Ranges • Numrange search uses .. (no spaces) • babe ruth 1921..1935 • results have highlighted dates within this range • 3..6 megapixels digital camera • most numbers will be associated with megapixels • DVD player $250.. • can be open-ended: any number above starting number
The Whole-Word Wildcard: Allowing FUZZY within “ ” • Can’t remember the exact wording in a phrase? Who wrote something like, “The stag at night drank his fill”? • Try searching: “the stag * * * his fill” OR “the stag * * * * his fill” • ANSWER: “The stag at eve had drunk his fill” (in most sources), Sir Walter Scott, “Lady of the Lake” • Construct proximity searches: "george bush" "george * bush" "george * * bush" "bush george" "bush * george" • Or try GAPS • www.staggernation.com/cgi-bin/gaps.cgi
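Generating the proximity searches: a small Python sketch that writes out the quoted phrases with 0, 1, 2... whole-word wildcards between two words, in both orders, as in the george/bush example. The function name and its parameters are illustrative, not part of Google.

```python
def proximity_phrases(first, second, max_gap=2, both_orders=True):
    """Return quoted phrases with 0..max_gap '*' wildcards between two words."""
    orders = [(first, second)]
    if both_orders:
        orders.append((second, first))

    phrases = []
    for a, b in orders:
        for gap in range(max_gap + 1):
            # Each * stands for exactly one whole word inside the quotes.
            words = [a] + ["*"] * gap + [b]
            phrases.append('"' + " ".join(words) + '"')
    return phrases


# Run each phrase as its own search, or OR a few of them together.
for phrase in proximity_phrases("george", "bush", max_gap=2):
    print(phrase)
```

Each wildcard counts as a search word, so long OR chains of these phrases can run into a search engine's word limit; running them one at a time avoids that.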
Excluding to Control “FUZZIness” • You want: Medical info about a pancreatitis diet • Start with: pancreatitis diet 172,000 • Eliminate undesirable words in results: pancreatitis diet -cat -dog 132,000 • pancreatitis -cat -dog -"support group" 128,000 • Select exclusions carefully
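Putting phrases, OR, and exclusions together: a minimal Python sketch of a query builder for the techniques covered so far. The helper names `quote` and `build_query` are hypothetical; the output is simply the text you would type into the search box.

```python
def quote(term):
    """Wrap multi-word terms in quotation marks so they search as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(required, any_of=(), exclude=()):
    """Combine required terms, one OR group, and excluded terms."""
    parts = [quote(t) for t in required]
    if any_of:
        # OR must be capitalized to work as an operator.
        parts.append(" OR ".join(quote(t) for t in any_of))
    # A minus sign directly before a term (no space) excludes it.
    parts.extend("-" + quote(t) for t in exclude)
    return " ".join(parts)


print(build_query(["pancreatitis", "diet"], exclude=["cat", "dog"]))
print(build_query(["pancreatitis"], exclude=["cat", "dog", "support group"]))
print(build_query(["custody"], any_of=["parent", "guardian"]))
```

The first two calls reproduce the exclusion searches from this slide; the third is an invented example of an OR group.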
Ask Google to be Very “FUZZY”: Related & Similar • Two commands for the same function • click Similar at end of result • search related:www.infopeople.org • Sometimes hard to see how related • links to and from the target page • major words in and ranking of related pages • Possible uses • comparison shopping • find more sites like a site: related:www.econsumer.gov • use to evaluate a suspect page
Exercise 3 • Taking Charge of Driving Google
Limiting: Words in <Title> • intitle: • finds pages concentrated on your term • hybrid cars intitle:mileage 7,060 • hybrid cars mileage 296,000 • with quotes: intitle:“cuban embargo” 581 • “cuban embargo” 28,000 • with OR: intitle:“global warming” OR intitle:“greenhouse effect” • Use allintitle: to require all words in title • allintitle: hybrid cars mileage 86 • can combine only with site: • allintitle: hybrid cars mileage -site:com 11
Exploiting a Page’s URL • Limiting to domain (edu, gov, etc): site:edu OR site:gov OR site:ca.us • complete list at: http://en.wikipedia.org/wiki/List_of_Internet_TLDs • Searching within a Site • site: • site:memory.loc.gov lincoln “sheet music” • works only in top/first part of URL • omit http:// and final / • makes Google into a search engine for pages that are indexed in Google • inurl: less specific • term may be anywhere in URLs • inurl:lincoln “sheet music” • finds “lincoln” anywhere in any URL and “sheet music” somewhere in the pages
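Turning an address into a site: limit: a short Python sketch, assuming the rules on this slide (use only the first part of the URL, omit http:// and the final /). The function name `site_limit` is illustrative.

```python
from urllib.parse import urlsplit


def site_limit(address, *terms):
    """Build a site: search from a pasted address plus optional search terms."""
    # urlsplit only recognizes the host when a scheme is present.
    if "://" not in address:
        address = "http://" + address
    host = urlsplit(address).hostname  # drops scheme, path, and trailing slash
    return " ".join([f"site:{host}", *terms])


print(site_limit("http://memory.loc.gov/", "lincoln", '"sheet music"'))
print(site_limit("memory.loc.gov/ammem/index.html"))
```

Both calls reduce the address to memory.loc.gov before adding the site: prefix, so a pasted deep link still produces a limit in the form the slide describes.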
Limiting to Types of Documents • filetype: • OR to find more than one • form 1040 filetype:pdf - finds forms • -filetype: • exclude certain filetypes • form 1040 -filetype:pdf - finds help with forms • View as HTML link can be useful • avoids viruses a document might carry if opened • allows viewing without the software or reader
Caveats for Limit Commands • Cannot always be combined • link: and related: must stand alone • allintitle: allintext: allinanchor: allinurl: combine with site: only • You can mix all other limit commands, usually: • inurl:ucla intitle:admissions statistics • intitle:“thyroid disease” site:edu OR site:com • Be careful not to ask for the impossible: • site:ucla.edu -inurl:edu • site:com site:edu site:gov • Some require understanding HTML hypertext links: • inanchor:links looks for text in link tags in the HTML code: <a href="http://www.pancreasweb.com">Pancreatitis links</a> <a href="www.pancreaticdisease.com/links/links.htm">Links</a> See Cheat Sheet #3
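Spotting an impossible search before you run it: a rough Python sketch that applies the caveats on this slide to a query string. It is an illustration only, not how Google itself parses searches; the function name, the warning wording, and the choice of related: as the command form of Similar are assumptions, and it expects straight quotation marks.

```python
import re

STAND_ALONE = ("link:", "related:")
ALLIN = ("allintitle:", "allintext:", "allinanchor:", "allinurl:")
OTHER_LIMITS = ("intitle:", "inurl:", "intext:", "inanchor:", "filetype:")


def check_query(query):
    """Return a list of warnings based on the caveats for limit commands."""
    # Keep a quoted phrase (and any prefix such as intitle:) as one token.
    tokens = re.findall(r'[^\s"]*"[^"]*"|\S+', query)
    bare = [t.lstrip("+-").lower() for t in tokens]
    warnings = []

    if len(tokens) > 1 and any(t.startswith(STAND_ALONE) for t in bare):
        warnings.append("link: and related: must stand alone")

    if any(t.startswith(ALLIN) for t in bare) and any(
        t.startswith(OTHER_LIMITS) for t in bare
    ):
        warnings.append("allin... commands combine only with site:")

    # Different site: limits without OR between them can never all match.
    sites = [i for i, t in enumerate(tokens) if t.lower().startswith("site:")]
    for a, b in zip(sites, sites[1:]):
        joined_by_or = b - a == 2 and tokens[a + 1] == "OR"
        if not joined_by_or and tokens[a].lower() != tokens[b].lower():
            warnings.append("several site: limits without OR match nothing")
            break

    # Excluding a word from URLs that the site: limit itself contains.
    site_values = [tokens[i][len("site:"):].lower() for i in sites]
    for t in tokens:
        if t.lower().startswith("-inurl:"):
            word = t[len("-inurl:"):].lower()
            if any(word in s for s in site_values):
                warnings.append(f"-inurl:{word} excludes every page in the site: limit")
    return warnings


print(check_query("site:com site:edu site:gov"))
print(check_query("site:ucla.edu -inurl:edu"))
print(check_query("inurl:ucla intitle:admissions statistics"))
```

The first two searches are the slide's "impossible" examples and produce warnings; the third is one of the slide's valid mixes and returns an empty list.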
Advanced Web Search Page: Restricted Opportunities • Useful if you want to: • Try limiting to pages updated in 3 mos, 6 mos, year • Change language of results pages • Select from list of filetype formats • Change content filtering (also in Preferences) • Not useful if you want to: • Construct complex searches • OR with phrases • multiple phrases • Use OR for more than one limiter: site: filetype: inurl: • Use intitle: or inurl: (only the allin... commands are in Advanced Search) • I almost never use it
Exercise 4 • Limiting
Finding Directories & Link Lists • EXAMPLE - looking for links or directories about: “women’s history” “middle east” • Use words likely to occur in link-list or directory pages • links OR "directory of" OR guide “women’s history” “middle east” • “what’s new” OR “what’s cool” “women’s history” “middle east” • <Title> field limit to focus pages you want • intitle:links OR intitle:“directory of” OR intitle:“encyclopedia of” “women’s history” “middle east” • intitle:“women’s history” intitle:directory “middle east” • Are there agencies or organizations with links on this topic? • inanchor:links society OR association "middle east" "women's studies" • Be creative. Substitute database for “directory” to find searchable databases
Google’s Directory • 1.5+ million pages (compare with 8+ billion in web search) • DMOZ Open Directory • Google “importance” ranking within directory • EXAMPLE: women's history middle east OR eastern • Click on useful subject categories for more: Science > Social Sciences > Area Studies > Middle Eastern Studies Society > People > Women > Women's Studies > By Topic Society > Issues > Human Rights and Liberties > Regional > Middle East
Search Google for Weblogs • Current commentary, opinions, misc. musings • Google indexes “important” blogs frequently • more than most web pages • Thorough search impossible • blog OR weblog OR “web log” your subject words • inurl:blog OR inurl:weblog your subject words • If you know the software a blog is using: • “powered by blogger” your subject words • site:blogspot.com your subject words • “powered by geeklog” your subject words • Try searching the Google Directory
Search Google Groups for Info • Usenet news groups back to 1981 • archive of UNevaluated public thoughts, advice & opinions • some not found elsewhere • select threads with more than one article for context • Search differences: • search for a group by name • search within a group • + required for common words even in “ “ “hair loss” OR "loss+of hair" OR balding group:alt.support.thyroid • use Advanced Search to limit by group or date posted • Create new mailing lists with registration
Google as Encyclopedic Glossary • Use the command define:[no space] • Google finds and ranks Web pages with definitions define:internet define:due diligence • Or build searches for pages with definitions: internet “what is” “what is the internet” “internet stands +for” internet ~beginners internet ~FAQ • Also many common facts available: population of japan currency in algeria birthplace of hitler
Exercise 5 Finding Info on a Subject • Brainstorming • How would you approach Google to solve each of the following problems? 1. How can I find some good collections of links and information on migraine headaches? 2. I want to find websites directing me to good places for bird watching in Northern California. 3. Where can I find blogs about California and the use of blogs in libraries, particularly blogs to keep in touch with other librarians and libraries in the state and how they’re using blogs? 4. Where can I find debates, from a wide range of perspectives, about what constitutes a near-death experience? I'm interested in proofs that what people report can be believed. 5. What is the birthplace of Teddy Roosevelt? 6. What is the size of California? 7. What is the currency of Nepal, and how much of it could $100 US buy as of January 15, 2004?
Shortcuts and Services • Shortcuts: • dictionaries and other definitions • phonebooks - white and yellow • movie showtimes • stocks with recent news • maps, weather • converters, math problem calculators, physical constants • number searches • UPS, FedEx, USPS, VIN, UPC codes, area codes, airplane reg. #, patents, more http://www.googleguide.com/shortcuts.html • Translate • click [Translate this page] or URL or enter text at www.google.com/language_tools • Page Info - better to enter a URL @ alexa.com Many search engines offer useful shortcuts & similar tools: See Search Cheat Sheet #4 & Supplement
“Hacking” Google URLs • Structure of a Google search result URL • Your search is for: “web searching” tutorial • http://www.google.com/search? - Google URL; ? indicates query • num=20& - number of results per page • hl=en& - interface language • lr=& - search language; blank means ALL • safe=off& - SafeSearch off • q=%22web+searching%22+tutorial - query search terms; %22 means quote mark, + joins terms • Will vary according to your Preferences setting • You can modify results by changing values
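Taking the URL apart in code: a minimal Python sketch using the standard library's urllib.parse to read the parameters from the example URL on this slide and change one value. The parameter names (num, hl, lr, safe, q) are the ones shown on the slide; whether Google still honors each of them is not guaranteed. The helper names `read_params` and `change_param` are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

URL = (
    "http://www.google.com/search?"
    "num=20&hl=en&lr=&safe=off&q=%22web+searching%22+tutorial"
)


def read_params(url):
    """Return the query parameters as a dict of lists, keeping blank values."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


def change_param(url, name, value):
    """Return a copy of the URL with one query parameter set to a new value."""
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    params[name] = [value]
    # urlencode turns spaces back into + and quote marks back into %22.
    return urlunsplit(parts._replace(query=urlencode(params, doseq=True)))


for name, values in read_params(URL).items():
    print(name, "=", values[0])

print(change_param(URL, "num", "50"))
```

Decoding shows the q value as the search you typed, "web searching" tutorial, and re-encoding restores %22 and + so the edited URL can be pasted back into the address box.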
A “Hack” for Country Searches • Type the search: egypt history 1950..1970 • http://www.google.com/search?num=20&hl=en&lr=&safe=off&q=egypt+history+1950..1970 • Append in Address/URL box (no spaces): &restrict=countryEG • General format - capitalized country code: &restrict=countryXX • Complete country codes list: http://en.wikipedia.org/wiki/List_of_Internet_TLDs • More countries and pages than in Language Tools search page • www.google.com/language_tools
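Appending the country restriction in code: a short Python sketch of the "append to the URL" step. The &restrict=countryXX parameter is taken from this slide as-is; the function name and the two-letter check are illustrative assumptions.

```python
def restrict_to_country(results_url, country_code):
    """Append &restrict=countryXX to a Google results URL, with no spaces."""
    code = country_code.strip().upper()  # the slide calls for a capitalized code
    if len(code) != 2 or not code.isalpha():
        raise ValueError("expected a two-letter country code such as EG")
    return f"{results_url}&restrict=country{code}"


url = (
    "http://www.google.com/search?"
    "num=20&hl=en&lr=&safe=off&q=egypt+history+1950..1970"
)
print(restrict_to_country(url, "eg"))
```

The function assumes the URL already contains a query (a ? followed by parameters), as a results-page URL does.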
Google’s Other Proprietary Databases Besides Web, Directory, and Groups • Images • 1.3+ billion • SafeSearch filter only works in English language • News • 4,500 news sources • 30 days • international versions - other news slants • Froogle for shopping • shopping sites from Google - a subset • + merchant uploads of catalogs not on the web • no fees, no pay for position • Catalogs (Google Labs still) • scanned mail-order catalogs (not web), text searchable • to navigate within a catalog, click an image and use the special catalogs navigation bar • Use Advanced Search forms: useful, specific limit settings
Local Information • local.google.com • “businesses & services” from Google web database + several yellow pages • topic box • address/location box • restrict to 1, 5, 15, 45 miles away • geographic proximity, maps • EXAMPLE: vegetarian restaurants 100 Larkin St, San Francisco, CA • maps.google.com • draggable images, satellite view • local (yellow pages), driving directions • earth.google.com • requires download, 200 MB memory • exotic toy or useful tool?
Google Labs • More upcoming Google services (beta) • Sets - create and explore sequences of things • Suggest - browse possible search terms • video.google.com – some TV programs • My search history – registration and privacy considerations • Print.google.com – search only in Print database • project to make full text books available online • Scholar.google.com – special page to search from • scholarly articles (mostly) on the web • abstracts if full text not available • integrated with OCLC for library holdings • integrated with some college campuses See Cheat Sheet #5
Exercise 6 Where would you look? • Choose ONE or TWO questions to answer • Write down what you did & learned • It’s O.K. to talk, ask questions, and help each other as needed
Other Effective Search Engines • Yahoo Search (3+ billion) • no 10-word limit • accepts ( ) around Boolean OR (“global warming” OR “greenhouse effect”) (site:edu OR site:gov OR site:uk) • pay-for-position sites not identified • Teoma (1+ billion) • popularity within subjects • sometimes finds link collections as Resources
Bookmarklets for Searching • JavaScript applications that reside in your Bookmarks or Favorites (Favlets) • Search engine tools: • run a search in another search engine @Teoma @Yahoo! • search highlighted text in a search engine • Information and more about them at searchengineshowdown.com/bmlets