
Using Google for Genealogical Searches

Learn how to effectively use Google for genealogical searches to find relevant information. This article discusses the history of browsing, the problem of searching, and the solution offered by Google. Discover Google search basics and tips for getting more relevant search results.


Presentation Transcript


  1. Manatee Genealogical Society • MGS Computer Special Interest Group (SIG) • Using Google for Genealogical Searches • 3 March 2015 • http://www.colket.org/genealogy/MGS/

  2. Overview • History of Browsing • Problem of Searching • Solution to Search Problem • Google Search Basics • Search Results

  3. Internet Search • Static Searches – Indexable Nodes: Use Google, Bing, or other Search Engine; every word on the page is indexed with a web crawler • Dynamic Searches – Non-Indexable Nodes: • Private Databases – Fee/membership (e.g., Ancestry, Professional, News); many available with Library membership • Commercial Databases – Shopping, or limited to employees and customers only • Public Databases – City, County, State, Federal Records • Dark Web

  4. Static Searches Have Web Crawlers Visit Each Node For “Public Domains”

  5. Who Invented the Internet?

  6. History of Browsing • Early on, very cumbersome • Generally you logged in to a desired computer and searched based on its directory • Every computer had its own directory structure and search application(s) • In 1980, Tim Berners-Lee proposed and prototyped ENQUIRE, a system to share documents • In 1990, he collaborated with Robert Cailliau on a joint proposal for the World Wide Web (WWW) or W3 project, a protocol to share information using hypertext • This became HyperText Markup Language (HTML) – defined using text • This allowed people to organize information they wanted to share, with links to the information or files, which could then be downloaded • Requires a browser that can read these HTML files using a protocol called HyperText Transfer Protocol (HTTP) • Many commercial browsers available today • Internet Explorer (IE), Safari, Netscape, Mozilla Firefox, etc. • Even Google has its own browser called “Google Chrome” • You need a current browser to access the latest information

  7. Problem with Searching • Many search applications were developed based on HTML, BUT • Search on Coke – 117,000,000 hits • Many of these are menu items at restaurants – much useless information • You get hits from every restaurant that has Coke on its menu • If you are interested in Coca-Cola headquarters in Atlanta, it may not appear until item 23,672,344 • How do you get RELEVANT hits? • How do you get hits ordered in a way that facilitates use? • Google found a way to “solve” this problem

  8. What’s a Google?

  9. Solution to Search Problem - 1 • In 1995, Sergey Brin and Larry Page, while students at Stanford, came up with a concept of using the strength of the Internet community. • Their technology evaluated a site primarily on how many other sites linked to it and ranked search results accordingly. • The technology was called PageRank (named for Larry Page), although it does rank pages as to which page is most important. • PageRank tended to return results that people found useful, resulting in a surprisingly valuable system • PageRank was patented by Stanford University. • In 1997, BackRub was a PageRank application, so called because the technology analyzed what was going on behind the scenes. • Fall 1997: BackRub became Google • http://infolab.stanford.edu/~backrub/google.html • Sergey Brin and Larry Page purchased the exclusive licensing rights to PageRank for 1,800,000 shares of Google from Stanford – $1.56B
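
The link-counting idea on this slide can be illustrated with a small sketch. This is a simplified power-iteration version of PageRank, not Google's production algorithm. The toy site names, the damping value of 0.85, and the iteration count are assumptions chosen for illustration only.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by incoming links using simple power iteration.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly across its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: two restaurant menus and a news site link to one page.
toy_web = {
    "coca-cola.example": ["news.example"],
    "news.example": ["coca-cola.example"],
    "diner-a.example": ["coca-cola.example"],
    "diner-b.example": ["coca-cola.example"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

In this toy web the page that most other pages link to ends up ranked first, which is the behavior the slide describes.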

  10. Solution to Search Problem - 2 • Google is an adaptation of googol. A googol is the number 1 followed by 100 zeros (10^100). (from Hitchhikers Guide to the Galaxy). This reflects the number of WWW pages it searches. • In 1998, they dropped out of Stanford to develop Google. • Set up shop in the Menlo Park garage of Susan Wojcicki • 1998, 50 employees. 7 million searches a day. • By 2005, Google was handling 250 million web searches per day. • Sergey Brin’s Net Worth is 29.9 Billion Dollars (17th richest in the world in 2014) • Larry Page’s Net Worth is 29.8 Billion Dollars (18th richest in the world in 2014) • Google headquarters, the Googleplex, is located in Mountain View, California. As of March 31, 2009, the company had 19,786 full-time employees; 46,170 by May 2014, with 68 worldwide locations

  11. Solution to Search Problem - 3 • Most Relevant Results First

  12. Google Search Basics - 0 • Ready to do some Google Searching • Still a Big Problem: a simple surname search yields millions of results • Colket => 89,600 results • Pelot => 477,000 results • Reger => 7,650,000 results • Sparrow => 63,900,000 results • Johnson => 978,000,000 results • Smith => 1,500,000,000 results • Need to find a way to reduce results • Google Basics discusses ways to do this in the Search Query • Google Results discusses ways to do this on the Results Page

  13. Google Search Basics - 1 • Google cares about: • Singular versus Plural – “apple” versus “apples” • Order Of Words is Important for Ranking • “brown bear” – things named “Brown Bear” first – 20,800,000 Hits • “bear brown” – emphasis on bears – 87,000,000 Hits • Spelling is Important • Names originating in another alphabet have many valid transliterations • Mohamed, Mohammed • Pelot, Pelote, Pelotte • Google does not care about: • Case Sensitivity – Hence “Samuel Pelot” = “samuel pelot” • Little Words Ignored – such as I, where, how, the, of, an, for, from, it, in, is, single digits, single letters. If desired, use quotes. • Example: “The Who” is a band • Punctuation – MOST PUNCTUATION IS IGNORED … Exceptions to These Rules • Suggest putting Surnames first – Pelot Samuel • Sometimes Get Spelling Suggestions • Sometimes Use Misspelled Queries

  14. Google Search Basics - 2 • – Apostrophes are meaningful • Hence Pauls, Paul’s, and Pauls’ require 3 different searches. • – A “-” before a word excludes terms – later • – A “-” between 2 or more words strongly connects the words: • Example: twelve-year-old dog, almost like “twelve year old” • – A “-” by itself is ignored • – A “_” between 2 or more words also strongly connects the words • Underscore between 2 words as a formal name: Quick_Sort, Mary_Beth • Underscore treated as a search for MaryBeth | Mary Beth | Mary_Beth • – Quotes require exact match – later • Exceptions: • Punctuation in proper names: Google+, AB+, C++, A# • $ is understood to be dollars: “Nikon $400” ≠ “Nikon 400” • Ditto for ¢, £, ¥, etc. • @ is understood to be an email address, e.g., colket@colket.org • Hashtags are understood to be trending topics • #newenglandpatriots

  15. Google Search Basics - 3 • Exact Order; Exact Phrase – Use quotation marks. • This technique is especially useful for genealogy – very different results for • Samuel George Pelot (11,000 Hits) versus “Samuel George Pelot” (37 Hits) • George Samuel Pelot (8,670 Hits) versus “George Samuel Pelot” (0 Hits – does not exist) • Huh??? Should get the same number – Why??? • What about the middle name? • Some sources report as initial or no middle initial (nmi) • “Samuel Pelot” – 231 Hits • “Samuel G Pelot” – 24 Hits • “Samuel G. Pelot” – 24 Hits (most punctuation is ignored) • “Samuel nmi Pelot” – 0 Hits • 87,200 Hits with G. 3,390,000 Hits with Graham 410,200 Hits • Remember, a search for “Alexander Bell” will miss hits for “Alexander G Bell”
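
Because an exact-phrase search for one form of a name misses the others, it can help to generate every form up front. The following is a minimal sketch, and the function name `name_variants` is hypothetical. The dotted initial is left out because the slide reports the same hits with and without the period.

```python
def name_variants(first, middle, last):
    """Return exact-phrase queries covering common middle-name forms."""
    initial = middle[0]
    forms = [
        f"{first} {middle} {last}",   # full middle name
        f"{first} {initial} {last}",  # middle initial only
        f"{first} {last}",            # no middle name at all
    ]
    # Wrap each form in quotation marks to request an exact phrase.
    return [f'"{form}"' for form in forms]


for query in name_variants("Samuel", "George", "Pelot"):
    print(query)
```

Each printed line is meant to be run as its own search, since one quoted phrase does not match the other forms.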

  16. Google Search Basics - 4 • Search Within Site/Domain – Identify site in query: • iraq site:nytimes.com – returns hits on “Iraq” in NY Times only • iraq site:.gov – returns hits only from a .gov domain • iraq site:.iq – returns hits only from an Iraq domain • Good for genealogy research: • Pelot site:nytimes.com – NY Times only – 157 Hits • Pelot – Worldwide – 394,000 Hits • Pelot site:.fr – French Domain – 14,700 Hits • Pelot site:.ch – Swiss Domain – 1,070 Hits • Pelot site:.ca – Canadian Domain – 2,900 Hits • Pelot site:.us – US Domain (not null) – 2,410 Hits • Pelot site:.mil – US Military Domain – 89 Hits • Pelot site:.gov – US Government Domain – 947 Hits • Pelot site:.biz – US Business Domain – 5,480 Hits
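
Running the same surname against several domains is repetitive, so it can be scripted. This sketch builds the query strings and turns each into a search link using Python's standard `urllib.parse`. The helper names are hypothetical, and the `https://www.google.com/search?q=` address form is an assumption about how the search page accepts a query.

```python
from urllib.parse import urlencode


def site_query(term, site=None):
    """Return the term, optionally restricted with the site: operator."""
    return f"{term} site:{site}" if site else term


def search_url(query):
    """Encode a query into a search link (URL form is an assumption)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


for domain in [None, "nytimes.com", ".fr", ".ch", ".ca", ".gov"]:
    query = site_query("Pelot", domain)
    print(query, "->", search_url(query))
```

Note that there is a space before `site:` and none after it, matching the queries on the slide.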

  17. Google Search Basics - 5 • Exclude Terms – Use “-” preceded by a blank • Say searching for anti-virus stuff for humans: • anti-virus – 132,000,000 Hits • includes “antivirus”, “anti virus”, and “anti-virus” • Note: “-” is part of the word for “anti-virus” (Strongly Connected) • anti-virus -software – 79,100,000 Hits • jaguar -cars -football – Can use multiple negations • and for the poor fellow with the surname of “Sparrow” • Sparrow – 63,400,000 Hits • Sparrow -bird – 60,400,000 Hits • Sparrow -bird -book – 45,500,000 Hits • Note: Combinations of Search Terms can be effective

  18. Google Search Basics - 6 • OR Operator – Sometimes you want hits for either/or • Use capitalized “OR” or the “|” operator • Tampa Bay Buccaneers – 2,620,000 Hits • Tampa Bay Buccaneers 2004 – 298,000 Hits • Tampa Bay Buccaneers 2005 – 409,000 Hits • Tampa Bay Buccaneers 2004 2005 – 206,000 Hits • Tampa Bay Buccaneers 2004 OR 2005 – 726,000 Hits • Tampa Bay Buccaneers 2004 | 2005 – 726,000 Hits • Exceptions: Phrases such as “FOR BETTER OR FOR WORSE”
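
The exclusion and OR operators combine naturally in one query string. As a rough sketch, the hypothetical helper `build_query` below assembles required terms, an either/or group, and excluded words in the form the slides describe.

```python
def build_query(terms, any_of=(), exclude=()):
    """Join required terms, an OR group, and excluded words into one query."""
    parts = list(terms)
    if any_of:
        # OR must be capitalized to act as an operator.
        parts.append(" OR ".join(any_of))
    # Each excluded word gets a leading "-" and is preceded by a blank.
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


print(build_query(["Tampa", "Bay", "Buccaneers"], any_of=["2004", "2005"]))
print(build_query(["Sparrow"], exclude=["bird", "book"]))
```

The two calls reproduce the slide queries `Tampa Bay Buccaneers 2004 OR 2005` and `Sparrow -bird -book`.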

  19. Google Search Basics - 7 • Feeling Lucky – Takes you straight to the first result's page. • Wild Cards • – Use a “*” – Works on words, not parts of words • – Use a “?” – Single characters (Officially not in Google) • For Questions: “How often does Halley's comet appear?” • Pose as: Halley’s Comet appears every * years – it’s 76 years • Also for unknown middle names: Samuel * Pelot – 10,700,000 Hits • Difference for “Samuel * Pelot” – 7,910,000 Hits • Difference for “Samuel ? Pelot” – 624 Hits • Note: For Samuel Pelot – 801,000 Hits • and for “Samuel Pelot” – 616 Hits • Ten Word Limit – Search terms over 10 are ignored
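
Two points from this slide lend themselves to a quick check before searching: the wildcard for an unknown middle name, and the word limit. In this sketch the helper names are hypothetical, and the limit of 10 is simply the figure quoted on the slide, so treat it as an assumption.

```python
WORD_LIMIT = 10  # figure quoted on the slide; treated here as an assumption


def wildcard_name(first, last):
    """Build an exact phrase with * standing in for one unknown word."""
    return f'"{first} * {last}"'


def split_at_limit(query, limit=WORD_LIMIT):
    """Split a query into the words that count and the words past the limit."""
    words = query.split()
    return words[:limit], words[limit:]


print(wildcard_name("Samuel", "Pelot"))

kept, ignored = split_at_limit(
    "Samuel George Pelot born Charleston South Carolina married Savannah Georgia died 1850"
)
print("counted:", kept)
print("past the limit:", ignored)
```

Putting the most distinctive words first keeps them inside the limit.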

  20. Google Search Basics - 8 • Misspellings – Try alternative spellings • Thousands of Web sites mention Arnold Schwarznegger – 70,000 Hits • though the governator spells his name “Schwarzenegger” – 34,500,000 Hits • Google recognizes some misspellings and provides alternatives (new since Mar 2010)

  21. Google Search Basics - 9 • Proximity Search – Not an advertised Google tool, but a common search tool (e.g., Archive Grid); seems to be useful with Google • Example: “Samuel Pelot”~3 • Hits for: • Samuel Pelot – 801,000 Hits • “Samuel Pelot” – 616 Hits • “Samuel George Pelot” – 27 Hits • “Samuel G Pelot” – 73 Hits • “Samuel Pelot”~2 – 351 Hits (catch initial) • “Samuel Pelot”~3 – 190 Hits • “Samuel Pelot”~4 – 158 Hits • “Samuel Pelot”~7 – 126 Hits • “Samuel Pelot”~10 – 173 Hits

  22. Google Search Basics - 10 • Keep Search Terms Simple • Most Queries do not require advanced operators or unusual syntax • Simply enter name, place, product, or concept, • Simple is good • Think of terms likely to be on result pages • Don’t use My Head Hurts • Instead use Headache {term likely found on medical page} • Describe what you want in as few words as possible • Use Weather Cancun • Instead of Weather Report for Cancun Mexico • Choose Descriptive Terms • Use Celebrity Ringtones • Instead of Celebrity Sounds

  23. Google Results - 1 Start Search Search Term(s) Advanced Search (Controls For Advanced Search Options) Filters Result Statistics Link Sponsored Links Uniform Resource Locator (URL) Sometimes Similar Pages Cached Pages Result Links Snippet

  24. Google Results - 2 • Ordered By Relevance [Indented same site, less relevant] • Also sponsored links, links to news stories, Ads • True, unpaid results are on the lower left • Ads are on the right (no more than 10 per page) • Sponsored Links on top (Ads, at a higher rate; colored background) • True Unpaid Search Results => • Title • Text from site with Snippets of your search terms (in bold) • URL => Uniform Resource Locator • Size • Date – NOT created/updated, but when last crawled • Dataset in the July 2014 crawl is over 266TB containing 4.05 billion webpages • Indication if Cached – Good place to go if Page Removed • URL goes to current page • Cached link goes to cached page – handy if page deleted or link broken • Cached version is used to highlight key words • File Format • .html – use browser • .pdf – read with Adobe’s free reader at www.adobe.com • .doc – read with Microsoft’s free reader at www.microsoft.com • .ppt – read with Microsoft’s free reader at www.microsoft.com • Similar Results

  25. Google Results - 3 • Location Feature – Sets default for searches • Location auto-detected by IP Address, or entered into Google Toolbar • Can be changed if you are looking for stuff in a different location • **Only works in your selected country** • Manually set location is stored in a “Cookie” • Can also be turned off • Type of Content (called Filters) – Limit results to a particular type of web content: Images, Videos, News, Shopping, Books, Discussions, Places, Blogs, Real-time (e.g., updates from Twitter), or select the default – Everything • This is a big recent change: five years ago one had to search each database separately because the databases were not integrated. They are now.

  26. Note on URLs • Results of Google Search provided as a Uniform Resource Locator (URL) • URL Format: • http://www.google.com.uk • Parts labeled on the slide: HyperText Transfer Protocol (http), World Wide Web (www), Domain Name, Domain Name Extension, Domain Name Country Extension • Domain Names: http://www.networksolutions.com/whois/index.jsp • URL for my domain name is: http://www.colket.org • Domain name extensions include: • .com .mobi .mil .gov .edu .net .info .org .biz .bz .tv • Domain Name Extensions (including Country): http://www.networksolutions.com/glossary/glossary-d.jsp#domainnameextensions • Domain Name Country Extensions – • .be .ca .cn .de .es ru.com se.com .us
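
The parts of a URL named on this slide can be pulled apart with Python's standard `urllib.parse`. This is a small sketch, and the function name `url_parts` and the dictionary keys are hypothetical labels that loosely follow the slide's terms.

```python
from urllib.parse import urlparse


def url_parts(url):
    """Split a URL into its protocol, host name, and dot-separated labels."""
    parsed = urlparse(url)
    labels = parsed.hostname.split(".")
    return {
        "protocol": parsed.scheme,  # e.g. http
        "host": parsed.hostname,    # e.g. www.colket.org
        "labels": labels,           # e.g. ['www', 'colket', 'org']
        "extension": labels[-1],    # last label, e.g. org
    }


for address in ["http://www.colket.org", "http://www.colket.org/genealogy/MGS/"]:
    print(url_parts(address))
```

Both addresses share the same host name, and only the path after it differs.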

  27. Note on IP Addresses • Every host name in a URL maps to a number called an IP (Internet Protocol) Address • http://www.google.com => 216.239.51.99 • IPv4 in format of xxx.xxx.xxx.xxx (e.g., 208.77.188.166) • 2^32 can handle 4,294,967,296 addresses • Expected to run out in early 2000s • IPv6, introduced in the late 1990s, in format of x:x:x:x:x:x:x:x • (e.g., 2001:db8:0:1234:0:567:1:1) • 2^128 (or 340,282,366,920,938,463,463,374,607,431,768,211,456) addresses • IPv4 addresses still work because all IPv4 addresses map to IPv6 • Operating systems are migrating to IPv6 • (e.g., Vista uses IPv6; XP uses IPv4) • Go to help/support on your computer and search for IPv6 • Google crawls over 8,000,000,000 pages each month • Need Current Browser
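
The address counts on this slide are plain powers of two, which Python's standard `ipaddress` module makes easy to confirm. The sketch below also shows one way an IPv4 address can be written inside IPv6, the IPv4-mapped form. Treat that as one illustrative mechanism, since the slide does not say which mapping it has in mind.

```python
import ipaddress

# Address space sizes quoted on the slide.
ipv4_total = 2 ** 32
ipv6_total = 2 ** 128
print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:,}")

# An IPv4 address is a 32-bit number written as four decimal parts.
addr4 = ipaddress.ip_address("208.77.188.166")
as_number = int(addr4)
print(addr4, "as a single number:", as_number)

# The same address embedded in IPv6 using the IPv4-mapped form.
mapped = ipaddress.ip_address("::ffff:208.77.188.166")
print("IPv6 version:", mapped.version, "| embedded IPv4:", mapped.ipv4_mapped)
```

The slide's own example, 2001:db8:0:1234:0:567:1:1, parses the same way with `ipaddress.ip_address`.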

  28. Static versus Dynamic Searches - 1 “Relevancy” might not be relevant to Researchers and Genealogists. Google’s use of Relevancy is not useful for doing many types of searches: • Dynamic Databases • Genealogy Searches on family surnames • Obscure information • Much non-business oriented information • Rather unique information

  29. Static Searches • Indexable Nodes: Use Google, Bing, or other Search Engine; every word on the page is indexed with a web crawler • Dynamic Searches • Non-Indexable Nodes: • Private Databases – Fee/membership (e.g., Ancestry, Professional, News); many available with Library membership • Commercial Databases – Shopping, or limited to employees and customers only • Public Databases – City, County, State, Federal Records • Dark Web

  30. Static Versus Dynamic Searches - 2 • Desired Information is in a Separate Database • Auction Sites: eBay | Craigslist | UBid | Bid Start | Ebid | US Seek • Web Pages are Private and Not Available for Google • Most businesses have a public web site and a private web site • Only data companies want to share is available via Google • Limited Access Web Sites – Typically for-profit sites, e.g., • ACM’s Digital Library – No Google access at all • Ancestry.com – Google provides “Teaser” results to entice membership • Chicago Tribune – Get “Teaser” hits on Google, but have to pay to access data • Many Models • Later we will discuss: • The dark web • Archive Grid • New York Times Database

  31. Future Plans Future Plans for Computer SIGs: • Finding Pictures of Your Ancestor on the Internet – 3 Feb 2015 • Using Google for Genealogical Searches – Scheduled for 3 March 2015 • Manipulating Photos for Genealogy – Scheduled for April 2015 • Using Ancestry.com requested by Dunham Swift – Maybe November 2015 • Need Inputs see sheet What else would you like to have addressed at future Computer SIG Meetings?????

  32. MGS Computer Special Interest Group (SIG) WHAT IS IT?       A meeting of genealogists interested in using their personal computers to enhance their research. WHEN?                  Monthly -- On the first Tuesday of the month (October through May) following main topic speaker. TIME:                     About 11:15 AM to 12:15 PM, following the meeting break period after the main MGS speaker. PLACE:                  The Central Library Auditorium, Bradenton, FL (same location as our MGS monthly meeting) WHO:                     Open to all those interested in using their personal computers to enhance their genealogical research. PROGRAM:         Each month we will discuss and view what's new in genealogy on the Internet.  We'll have demonstrations of software and hardware that will facilitate our research.  Tips and techniques will be shared by and among those attending each meeting.  Genealogically related computer, Internet, digital photography and research questions will be fielded during the sessions.  We'll look at the newest technology but will keep the discussions as low tech as possible. What topics would you like to hear??????
