
SearchMagic

Tips for Searching the World Wide Web. SearchMagic. 21 January 2014 Slides at: http://www.colket.org/genealogy/USF/. Syllabus. Tips for Searching the Internet Instructor: Currie Colket Phone : Google Search for: colket 941


Presentation Transcript


  1. Tips for Searching the World Wide Web: SearchMagic. 21 January 2014. Slides at: http://www.colket.org/genealogy/USF/

  2. Syllabus: Tips for Searching the Internet
     Instructor: Currie Colket
     Phone: Google Search for: colket 941
     Classes 1:00 PM to 2:20 PM, Lifelong Learning Academy, University of South Florida
     • 7 January – Overview of Internet; Static & Dynamic Searches
     • 14 January – Search Shortcuts; Advanced Static Searches
     • 21 January – Google Magic; Google Books; Google Scholar
     • 28 January – Searching Images (Photos); Videos; Maps; Google Earth
     • 4 February – Virtual Travel; Downloading; Safety & Security
     • 11 February – No Class
     • 18 February – Dynamic Databases; Archive Grid
     • 25 February – Searching Translations; Researching in other languages
     Slides at: http://www.colket.org/genealogy/USF/

  3. Google Magic: "Any sufficiently advanced technology is indistinguishable from magic." Arthur C. Clarke, "Profiles of the Future" (Clarke's third law, added in the 1973 revised edition); English physicist & science fiction author (1917–2008)

  4. Magic Explained: Google searches have:
     • Crawler (or Spider)
        • Crawls the web so pages can be analyzed for content
        • To keep up to date, the entire web is crawled every month or so
     • Indexer (Analyzer)
        • If a word is not in the index, puts the word in the index
        • Provides a link to URLs containing the word
        • Enters the page into the cache
        • Orders the URLs for each term by relevancy
     • Search Engine
        • Searches only the indexes and cache
        • Results are presented in order of relevancy to the search words
     You enter search terms.
     Does the crawling/indexing about once a month; indexes frequently updated pages more often (e.g., news, weather, UPS).
     Disclaimer: Actual process is proprietary; this explanation is likely a simplified process.

  5. Magic Explained: Crawler (Spider) - 1. Spider webs have web crawlers that visit each node.

  6. Magic Explained: Crawler (Spider) - 2. Start with a list of URLs. Network Solutions maintains domain name lists of registered names.

  7. Magic Explained: Crawler (Spider) - 3
     Registered URLs: 1st URL, 2nd URL, www.colket.org, last-1 URL, last URL
     For each registered URL: Index Home Page, then Index Sub Nodes
     • Each Home Page is a node
     • Each Directory is a sub node
     • Visible nodes in green
     • Invisible nodes in red
     Directories shown: aspnet_client, cgi-bin, doreen, genealogy, gif, log, Paige, MGS, MGS1, MGS2, MGS3, MGS4, MGS5, USF, Book, GSS
     http://www.colket.org/genealogy/USF/

  8. Magic Explained: Crawler (Spider) – 4. View of Colket Home Page Directory: www.colket.org nodes; actual home page on the WWW; actual home page on the local computer.

  9. Magic Explained: Crawler (Spider) - 5. Worldwide crawler: crawls through 172 country domains, each with its own registered URLs, e.g., United Arab Emirates, Australia, Norway, Japan, United Kingdom, United States, Andorra, Zambia.
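
The crawl described on the crawler slides can be sketched as a breadth-first walk that starts from a list of seed URLs and follows links. This is only an illustration under stated assumptions, not Google's proprietary crawler: the small `WEB` dictionary is a hypothetical stand-in for the real web, and the links between the pages are invented for the example.

```python
from collections import deque

# Hypothetical stand-in for the web: each URL maps to the URLs it links to.
WEB = {
    "www.colket.org": ["www.colket.org/genealogy", "www.colket.org/doreen"],
    "www.colket.org/genealogy": ["www.colket.org/genealogy/USF"],
    "www.colket.org/genealogy/USF": [],
    "www.colket.org/doreen": ["www.colket.org"],
}

def crawl(seed_urls):
    """Visit every page reachable from the seeds, each page once, breadth first."""
    visited = []          # pages in the order they were crawled
    seen = set()          # pages already queued, so no page is crawled twice
    queue = deque()
    for url in seed_urls:
        if url not in seen:
            seen.add(url)
            queue.append(url)
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in WEB.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

pages = crawl(["www.colket.org"])
print(pages)
```

The home page is visited first, then its directories (sub nodes), then the pages below them.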

  10. Magic Explained: Google searches have a Crawler (or Spider), an Indexer (Analyzer), and a Search Engine (same overview as slide 4). Disclaimer: Actual process is proprietary; this explanation is likely a simplified process.

  11. Magic Explained: Indexer - 1
      • Scans each page for words
         • Adds/creates an index entry with a reference to the URL for each word
      • Scans each page for hidden words
         • Title, keywords, etc.
      • Scans each page for external links
         • Increments the counter associated with each linked URL
      • Copies each page to a cache
      • Amongst many other things
      Does the crawling/indexing about once a month. Google indexes ~8 billion Web pages each month.
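
The indexer steps on this slide (scan words, add index entries, count links, copy to cache) can be sketched roughly as follows. This is a simplified illustration, not the actual indexer: the function name `index_page`, the sample page text, and the `example.org/a` page are assumptions made up for the example.

```python
import re
from collections import defaultdict

def index_page(url, text, links, index, link_counts, cache):
    """Index one page: words -> URLs, count outgoing links, keep a cached copy."""
    for word in set(re.findall(r"[a-z0-9']+", text.lower())):
        index[word].add(url)       # creates the entry if the word is new
    for target in links:
        link_counts[target] += 1   # one more page links to `target`
    cache[url] = text              # cached copy used later for snippets

index = defaultdict(set)           # word -> set of URLs containing it
link_counts = defaultdict(int)     # URL -> number of links pointing at it
cache = {}                         # URL -> page text

index_page("www.colket.org", "Currie's Hot List",
           ["mail.google.com", "www.aol.com"], index, link_counts, cache)
index_page("example.org/a", "hot news",
           ["www.aol.com"], index, link_counts, cache)

print(sorted(index["hot"]))        # ['example.org/a', 'www.colket.org']
print(link_counts["www.aol.com"])  # 2
```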

  12. Magic Explained: Indexer – 2. Example with Currie’s Home Page
      Snippet of Index (each word points to its own list of URLs): colket, condo, currie’s, email, guide, hot, june, last, list, netflick, updated, user, webmail, www.colket.org, yahoo

  13. Magic Explained: Indexer – 3. Example with Source of Currie’s Home Page
      Snippet of Title & Keyword Index: Currie Colket’s Home Page: list of URLs
      Snippet of Image Index: flashing.gif: list of URLs (these are local links)

  14. Magic Explained: Indexer – 4. Example with Source of Currie’s Home Page
      Snippet of Link Counters:
      • Increment Counter for mail.colket.org (internal link)
      • Increment Counter for mail.riversidecondo.com (external link)
      • Increment Counter for login.yahoo.com/config/mail (external link)
      • Increment Counter for mail.google.com (external link)
      • Increment Counter for www.aol.com (external link)
      • Increment Counter for www.verizon.net/central/ (external link)
      • Increment Counter for www.emailuserguide.com (external link)
      • …

  15. Magic Explained: Indexer – 5. Almost the End of Indexer Process
      Index (each word with its list of URLs): colket, condo, currie’s, email, guide, hot, june, last, list, netflick, updated, user, webmail, www.colket.org, yahoo, 2013
      Title & Keyword Index: Currie Colket’s Home Page: list of URLs
      Cache (http://): www.colket.org, www.colket.org/doreen, www.colket.org/genealogy, www.colket.org/genealogy/book, www.colket.org/Paige
      Image Index: flashing.gif: list of URLs
      Link Counters:
      • Increment Counter for mail.colket.org
      • Increment Counter for mail.riversidecondo.com
      • Increment Counter for login.yahoo.com/config/mail
      • Increment Counter for mail.google.com
      • Increment Counter for www.aol.com
      • Increment Counter for www.verizon.net/central/
      • Increment Counter for www.emailuserguide.com
      Plus many other artifacts

  16. Magic Explained: Indexer – 6. End of Indexer Process
      ***Magic***: Order URLs based on number of counters for each indexed term
      Index (each word with its list of URLs): colket, condo, currie’s, email, guide, hot, june, last, list, netflick, updated, user, webmail, www.colket.org, yahoo, 2013
      Each list of URLs is no longer in alphabetical order, but in order of greatest number of hits.
      Link Counters for each Indexed Term:
      • Increment Counter for “Colket email”: mail.colket.org
      • Increment Counter for “Riverdance”: mail.riversidecondo.com
      • Increment Counter for “Yahoo”: www.yahoo.com
      • Increment Counter for “gmail”: mail.google.com
      • Increment Counter for “AOL”: www.aol.com
      • Increment Counter for “Verizon”: www.verizon.net/central/
      • Increment Counter for “User Guide”: www.emailuserguide.com
      Now each list of URLs is ordered by “Relevancy”
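
Ordering each term's list of URLs by its link counters, as this slide describes, might look like the sketch below. The counter values are invented for illustration, and real relevancy ranking uses many more signals than a simple link count.

```python
def order_by_link_count(urls, counts):
    """Return urls sorted so the most linked-to URL comes first (ties: alphabetical)."""
    return sorted(urls, key=lambda u: (-counts.get(u, 0), u))

# Hypothetical link counters collected by the indexer.
counts = {"www.aol.com": 3, "mail.google.com": 5, "www.colket.org": 1}

# A term's list of URLs, still in the order the pages were indexed.
postings = ["www.aol.com", "www.colket.org", "mail.google.com"]

ranked = order_by_link_count(postings, counts)
print(ranked)  # ['mail.google.com', 'www.aol.com', 'www.colket.org']
```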

  17. Magic Explained: Indexer – 7. Links: Actually, my home page has 174 links. I am helping to increase the counts for each of these organizations, to make their home pages relevant.

  18. Magic Explained: Google searches have a Crawler (or Spider), an Indexer (Analyzer), and a Search Engine (same overview as slide 4). Disclaimer: Actual process is proprietary; this explanation is likely a simplified process.

  19. Magic Explained: You Activate Search Engine
      • You go to http://www.google.com
      • You enter up to 10 search terms, and
      • Hit Enter
      Example search: Currie’s Hot List

  20. Magic Explained: Google searches have a Crawler (or Spider), an Indexer (Analyzer), and a Search Engine (same overview as slide 4). Disclaimer: Actual process is proprietary; this explanation is likely a simplified process.

  21. Magic Explained: Search Engine – 1
      • Uses primarily the index and cache to search terms
      • Finds the intersection of index items for the 1st two search terms
      Index: colket, condo, currie’s, email, guide, hot, june, last, list, netflick, updated, user, webmail, www.colket.org, yahoo, 2013
      Note: Each number represents a unique URL; each list is ordered by Relevancy
      1st term: 14 34 46 12 19 22 78
      2nd term: 17 35 34 28 22 95 24
      New list for 1st & 2nd (intersection of index items for 1st two search terms): 34 22 10 74 93

  22. Magic Explained: Search Engine – 2
      • 3. Finds the intersection of index items for the 1st two search terms with the third search term
      Index: colket, condo, currie’s, email, guide, hot, june, last, list, netflick, updated, user, webmail, www.colket.org, yahoo, 2013
      New list for 1st & 2nd: 34 22 10 74 93
      3rd term: 7 76 10 93 86 71 69
      New list for 1st & 2nd & 3rd (intersection of index items for 1st three search terms): 10 93 6 29 11
      Resulting in a list ordered by Relevancy
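
The intersection step can be sketched as keeping only the URL numbers that appear in every term's list, in the relevancy order of the first list. The first two lists below are the 1st and 2nd lists shown on slide 21 (which appear to be abbreviated there); the third list is a hypothetical example, and this is an assumption about how such a step could work, not the actual search engine code.

```python
def intersect_ordered(first, second):
    """Keep the items of `first` that also appear in `second`, preserving order."""
    allowed = set(second)
    return [item for item in first if item in allowed]

first_term = [14, 34, 46, 12, 19, 22, 78]    # URL numbers for the 1st search term
second_term = [17, 35, 34, 28, 22, 95, 24]   # URL numbers for the 2nd search term

both = intersect_ordered(first_term, second_term)
print(both)  # [34, 22]

third_term = [22, 90]                        # hypothetical list for a 3rd term
all_three = intersect_ordered(both, third_term)
print(all_three)  # [22]
```

Each additional search term is handled the same way: the running result is intersected with the next term's list.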

  23. Magic Explained: Search Engine – 3
      • Ditto for up to ten search terms
      • Results are ordered by URLs with search terms in proximity, best order of terms, with highest link count
      • Top 10 results are presented as Search Results, along with sponsored links associated with each search term
      • Cached page is analyzed for snippets of search terms
      Index: colket, condo, currie’s, email, guide, hot, june, last, list, …
      List for 1st & 2nd & 3rd: 10 93 6 29 11, resulting in a list ordered by Relevancy and presented with sponsored links associated with the search terms
      Steps 1-7 can be done in milliseconds with a very high speed computer
      Warning: Sponsored links might only match 1 search term
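
Analyzing a cached page for a snippet around a search term could be done along these lines. This is a minimal sketch: the function name `snippet`, the `width` parameter, and the sample page text are assumptions for illustration only.

```python
def snippet(cached_text, term, width=20):
    """Return the text around the first occurrence of term, or None if absent."""
    pos = cached_text.lower().find(term.lower())   # case-insensitive match
    if pos == -1:
        return None
    start = max(0, pos - width)
    end = min(len(cached_text), pos + len(term) + width)
    return cached_text[start:end]

# Hypothetical cached copy of a page.
page = "Welcome to Currie Colket's Home Page"

print(snippet(page, "colket", width=7))  # Currie Colket's Home
print(snippet(page, "zebra"))            # None
```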

  24. Magic Explained Search Results – 1 Arrrrrrrrrrrrrggggggggggggghhhhhhhhhhhh Over 1 Million Results

  25. Magic Explained Search Results – 2 Aaaaahhhhhhh Much Better Only 8 hits with Quotes
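
The drop from over a million results to a handful comes from the quotes: without them a page only has to contain the words somewhere, while with them the words must appear together and in order. The sketch below illustrates that difference on three hypothetical page texts; it is not how Google implements phrase search.

```python
def contains_all_words(text, words):
    """True if every word appears somewhere in the text (search without quotes)."""
    tokens = text.lower().split()
    return all(w.lower() in tokens for w in words)

def contains_phrase(text, words):
    """True if the words appear next to each other, in order (search with quotes)."""
    tokens = text.lower().split()
    target = [w.lower() for w in words]
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))

# Hypothetical pages, keyed by a short name.
pages = {
    "a": "currie colket home page",
    "b": "colket family history by currie",
    "c": "hot list for currie colket",
}
query = ["currie", "colket"]

loose = [name for name, text in pages.items() if contains_all_words(text, query)]
exact = [name for name, text in pages.items() if contains_phrase(text, query)]
print(loose)  # ['a', 'b', 'c']
print(exact)  # ['a', 'c']
```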

  26. Magic Explained: Ramifications of Magic
      • Information presented in Search Results could be out of date
         • Although key web pages are crawled more frequently, e.g., Weather, News, UPS Tracking Numbers
      • Relevancy is slanted towards the first search term; then the second …
         • Very Important: Hence, if doing a genealogy search, consider putting the Surname first
         • John Smith => 104,000,000 hits
         • Smith John => 73,000,000 hits
      • Index is humongous – unless your Search Results are on the 1st page, you need to find a way to limit the search
         • Use Advanced Search Filters
      • Information searched on the WWW is ONLY information the web designer is making explicitly available to Web Crawlers
         • Very Important: Many important databases are not searched, as they are proprietary or simply not made available
      • Algorithm used for Relevancy may not be useful to your search
         • Still, Google is an extremely powerful Search Tool
         • FYI, Bing.com should operate in a similar manner

  27. Relevancy can be increased with more backlinks. Many firms will help you become more relevant.

  28. Questions: Please ask questions if you do not understand anything.
