Slide 1:Search Engines
Slide 2:WWW
The “Public Web” Currently estimated at over ______ pages Private or “hidden” web: ____ that! No search engine can search it all… Google: covers less than ___ of public web
Slide 3:Search Engines
How do they work? Software programs called “______” or “web ______” Follow all links Record info about pages they visit in enormous ________
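The link-following idea on this slide can be sketched in a few lines of Python. This is only an illustration, not how any real search engine is built: the tiny in-memory `TOY_WEB`, its `example` URLs, and the `crawl` function are all made up for the example, and no real network requests are made.

```python
from collections import deque

# Hypothetical miniature "web": each URL maps to (page text, outgoing links).
TOY_WEB = {
    "a.example/home": ("welcome to calvin college", ["a.example/about", "b.example/news"]),
    "a.example/about": ("about calvin college", ["a.example/home"]),
    "b.example/news": ("college news", []),
    "c.example/hidden": ("nobody links here", []),
}

def crawl(start_url, web):
    """Follow every link reachable from start_url and record each page's text."""
    database = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in database or url not in web:
            continue  # already recorded, or the link leads nowhere
        text, links = web[url]
        database[url] = text   # record info about the page
        queue.extend(links)    # follow all of its links later
    return database

database = crawl("a.example/home", TOY_WEB)
print(sorted(database))
```

Note that `c.example/hidden` never enters the database because no page links to it, which is one reason no search engine can cover everything.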
Slide 4:Thus, you are NOT searching the web ________
You are searching a search engine’s database. Every result is at least a little bit “stale.” Check _________ Settings…
Slide 5:Your browser can contribute to “staleness,” too. Keeps a database of web pages, too: History ___________ Internet Files “_______” Faster to load a saved copy of a web page from _____________ than to retrieve a new copy of a web page from _____________
Check Browser Settings…
Slide 6:Coverage
60% of a search engine’s data appears in ___________ search engine’s database. Only about 50% of a search engine’s data appears in “all” of the other search engines’ databases… Thus, 40% of a search engine’s data does not appear in ___________ search engine’s database… MORAL? Use more than one search engine…
Slide 7:e.g. Dogpile www.dogpile.com Takes your search terms and searches ________________ for you…
_______-search engines
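A rough sketch of what a site like Dogpile does with your search terms: pass them to several engines and merge what comes back. The two engine functions and their result URLs below are invented stand-ins, not real APIs.

```python
def search_all(query, engines):
    """Send the query to every engine and merge the results, dropping duplicates."""
    merged = []
    seen = set()
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical engines: each returns its own (partly overlapping) result list.
def engine_one(query):
    return ["x.example/1", "x.example/2"]

def engine_two(query):
    return ["x.example/2", "x.example/3"]

merged = search_all("calvin college", [engine_one, engine_two])
print(merged)
```

Because the engines' databases only partly overlap, the merged list is longer than either engine's list alone.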
Slide 8:Ranking
Q: When you search and multiple (!) pages result, what determines which pages get listed first? A: That search engine’s _________.
Slide 9:Common Ranking Schemes
____________ Many factors and algorithms Number of times keywords of the search appear on the page Word appears in title? Word near top of page? _____________ How many OTHER pages link to the page containing the desired search term(s)? A combination of these two
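The two kinds of ranking factors on this slide, and their combination, might look something like the following. Everything here is assumed for illustration: the three toy pages, the title bonus of 2, and the simple addition of the two scores. Real engines use many more factors and do not publish their formulas.

```python
# Hypothetical pages: a title, body text, and the pages each one links to.
PAGES = {
    "p1": {"title": "calvin college", "text": "calvin college is in michigan", "links": ["p2"]},
    "p2": {"title": "colleges", "text": "a list of every college", "links": ["p1"]},
    "p3": {"title": "hobbes", "text": "calvin and hobbes comic", "links": ["p1"]},
}

def keyword_score(page, terms):
    """Count keyword occurrences in the text, plus a bonus for title matches."""
    words = page["text"].split()
    score = sum(words.count(term) for term in terms)
    title_words = page["title"].split()
    score += sum(2 for term in terms if term in title_words)  # bonus weight is assumed
    return score

def inlink_count(page_id, pages):
    """Count how many OTHER pages link to this page."""
    return sum(
        1 for other_id, other in pages.items()
        if other_id != page_id and page_id in other["links"]
    )

def rank(terms, pages):
    """Order pages by the sum of both scores, best first."""
    scored = {
        pid: keyword_score(page, terms) + inlink_count(pid, pages)
        for pid, page in pages.items()
    }
    return sorted(scored, key=lambda pid: scored[pid], reverse=True)

print(rank(["calvin", "college"], PAGES))
```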
Slide 10:Other Ranking Factors
“_____ for Rank” Some search engines let you ___ for prominence… “__________” Tricking the ranking scheme into favoring your web pages… Politics? Google has begun censoring sites for Internet users in China who are searching the Web…
Slide 11:Key Searching Techniques
Slide 12:Phrases
Phrases in quotes e.g. “calvin college” Search on unique phrases Proper nouns “john calvin”
Slide 13:Boolean Operators
AND Usually _____ when you list multiple terms calvin college = calvin AND college
Slide 14:Boolean Operators
OR Usually must be _______ Sometimes also must be in ________ Shakespeare AND (theatre OR theater) for synonyms
Slide 15:Boolean Operators
_______! NOT Calvin NOT hobbes Google _______ sign: - Calvin -hobbes
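One way to picture the three Boolean operators is as set operations on the pages that contain each word: AND is intersection, OR is union, NOT is difference. The four documents below are made up, and `having` is just a helper name for this sketch.

```python
# Hypothetical documents, keyed by an ID number.
DOCS = {
    1: "calvin college theatre",
    2: "calvin and hobbes",
    3: "shakespeare at the theater",
    4: "shakespeare theatre company",
}

def having(term, docs):
    """Return the set of document IDs whose text contains the term as a word."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}

# shakespeare AND (theatre OR theater)
shakespeare_query = having("shakespeare", DOCS) & (
    having("theatre", DOCS) | having("theater", DOCS)
)

# calvin NOT hobbes
calvin_not_hobbes = having("calvin", DOCS) - having("hobbes", DOCS)

print(shakespeare_query, calvin_not_hobbes)
```

The parentheses matter: without them, the OR would apply to the whole query and document 3 or 1 could slip in or out depending on how the engine groups the terms.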
Slide 16:“_____ words”
Common words ______ by search engine Google has _____ of them… Solution? Put in quotes – sometimes works Better: precede with a plus: +who +are +you
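A small sketch of the behavior described on this slide: common words get dropped from the query unless a plus sign is put in front of them. The `IGNORED` list is a made-up sample, not Google's actual list, and `effective_terms` is a hypothetical helper.

```python
# Hypothetical sample of common words an engine might drop from a query.
IGNORED = {"who", "are", "you", "the", "a", "of"}

def effective_terms(query):
    """Return the terms that would actually be searched."""
    terms = []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            terms.append(token[1:])   # a leading plus forces the word to be kept
        elif token not in IGNORED:
            terms.append(token)       # ordinary words pass through
    return terms

print(effective_terms("who are you"))
print(effective_terms("+who +are +you"))
print(effective_terms("the history of calvin"))
```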
Slide 17:Field Limiting - ____
Limit results to certain web page domain ____: site:calvin.edu site:.edu
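Limiting by domain amounts to checking the host name of each result. Here is a minimal sketch using Python's standard `urllib.parse`; the `matches_site` helper is hypothetical, and a real engine's matching rules may differ.

```python
from urllib.parse import urlparse

def matches_site(url, site):
    """True if the URL's host is the given domain or falls under it."""
    host = urlparse(url).hostname or ""
    suffix = site.lstrip(".")          # treat ".edu" and "edu" the same
    return host == suffix or host.endswith("." + suffix)

print(matches_site("http://www.calvin.edu/academics", "calvin.edu"))
print(matches_site("http://www.calvin.edu/", ".edu"))
print(matches_site("http://www.teoma.com", ".edu"))
```

Matching on a leading dot keeps a host such as `notcalvin.edu` from being counted as part of `calvin.edu`.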
Slide 18:Google
10 word limit Indexes only first 101KB of a Web page
Slide 19:Yahoo!
Must __________ OR, AND, or AND NOT. Also accepts parentheses. No 10-word limit.
Slide 20:Teoma
http://www.teoma.com “Subject-Specific Popularity” ranking scheme How many __________ pages reference the page Optional follow-up steps after search results: REFINE: suggested terms to add to your search RESOURCES: communities, possibly directories, related to your search results ___________ database than Yahoo! and Google
Slide 21:__________ Pages
Search Engine Databases created by _________, not by “web crawling” software Thus, smaller databases, but _______ _______. Examples: LIBRARIANS' INDEX http://www.lii.org INFOMINE http://infomine.ucr.edu MEL (Michigan eLibrary) http://www.mel.lib.mi.us Academic Info http://www.academicinfo.net