Search Engines: Exploring Google. By Habib.ur.Rehman, Assistant Librarian & Coordinator, Lincoln Corner, Central Library, University of Peshawar.
What is a search engine? • A computer program that retrieves documents, files, or data from a database or from a computer network (especially from the Internet). (worldweb dictionary) • A software program that searches a database and gathers and reports information that contains or is related to specified terms. • A website that searches the Internet for pages and documents relevant to the search terms given. Search engines use robots known as spiders to 'crawl' the web for new content to add to the possibilities for search results.
A search engine is software designed specifically to help you find what you want on the Internet. There are many search engines available. All you need to do is pick the one that you would like to use, enter the search string (what you are looking for) and start the search. You will get back a list of entries matching your search string; click on the entries that interest you and you will be taken to that page. (Source: http://www.Ask.com)
Three Types of Search Engines • Crawler-based search engines • Human-powered directories • Hybrid search engines
Crawler-based search engines, such as Google (http://www.google.com), create their listings automatically. They "crawl" or "spider" the web, then people search through what they have found. If web pages are changed, crawler-based search engines eventually find these changes, and that can affect how those pages are listed. Page titles, body copy and other elements all play a role. A typical web query normally takes less than half a second, yet involves a number of different steps that must be completed before results can be delivered to the person seeking information. The following graphic (Figure 1) illustrates this life span (from http://www.google.com/corporate/tech.html):
Crawler-based search engines: the life span of a typical web query 1. The web server sends the query to the index servers. The content inside the index servers is similar to the index in the back of a book - it tells which pages contain the words that match the query. 2. The query travels to the doc servers, which actually retrieve the stored documents. Snippets are generated to describe each search result. 3. The search results are returned to the user in a fraction of a second.
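The three steps above can be sketched in a few lines of Python. This is only a toy model, not Google's implementation: the three "pages", the function names and the snippet width are all invented for illustration. An inverted index plays the role of the index servers, and a dictionary of stored text plays the role of the doc servers.

```python
from collections import defaultdict

# Hypothetical three-page "web"; the page names and text are made up.
DOCS = {
    "page1": "search engines crawl the web for new content",
    "page2": "a directory depends on humans for its listings",
    "page3": "crawlers add new pages to the search index",
}

def build_index(docs):
    """Map each word to the set of pages that contain it (like a book index)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def lookup(index, query):
    """Step 1: ask the index which pages contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

def snippet(docs, doc_id, query, width=3):
    """Step 2: fetch the stored document and cut a snippet around the first query word."""
    words = docs[doc_id].split()
    lowered = [w.lower() for w in words]
    pos = lowered.index(query.lower().split()[0])
    start = max(0, pos - width)
    return " ".join(words[start:pos + width + 1])

def search(docs, index, query):
    """Step 3: return (page, snippet) pairs to the user."""
    return [(doc_id, snippet(docs, doc_id, query)) for doc_id in lookup(index, query)]

INDEX = build_index(DOCS)
RESULTS = search(DOCS, INDEX, "new search")
```

Here the query "new search" matches page1 and page3, because only those two pages contain both words. Building the index once and reusing it for every query is what lets the lookup step stay fast.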
Human-powered directories • A human-powered directory, such as the Open Directory Project (http://www.dmoz.org/about.html), depends on humans for its listings. (Yahoo!, which used to be a directory, now gets its information from crawlers.) A directory gets its information from submissions, in which a site owner sends the directory a short description of the entire site, or from editors who write a description for the sites they review. A search looks for matches only in these descriptions. Changing web pages, therefore, has no effect on how they are listed. Techniques that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.
Hybrid search engines • Today, it is extremely common for crawler-type and human-powered results to be combined when conducting a search. Usually, a hybrid search engine will favor one type of listing over another. For example, MSN Search (http://www.imagine-msn.com/search/tour/moreprecise.aspx) is more likely to present human-powered listings from LookSmart (http://search.looksmart.com/). However, it also presents crawler-based results, especially for more obscure queries.
Top Search Engines for 2010 (By Volume)
Date         Google   Yahoo!   Bing    Ask     Total
2010-03-06   71.07%   14.46%   9.55%   3.0%    98.09%
2010-02-06   71.35%   14.60%   9.56%   2.55%   98.06%
2010 01      71.61%   14.76%   9.13%   2.66%   98.16%
Source: http://www.seoconsultants.com/search-engines/
Top Search Engines for 2010 (By Visit) [chart not reproduced] Source: http://www.hitwise.com
20 SEARCH The Web's Best Search Engine List! http://www.20search.com/
Recommended Search Strategy • 1. Analyse your topic to decide where to begin • 2. Select key words and phrases that are relevant to your topic • 3. Pick the right starting place to begin your research (search engine, directory, or invisible web) • 4. Use the “advanced search” screen and “search tips” advice available on most good websites • 5. Learn as you go and vary your approach with what you learn. Don’t persist with any strategy that doesn’t work
Why use the ‘Recommended Internet Links’ page • All websites have been selected for their quality and authority from reliable sources • The directory focuses heavily on websites that are of particular relevance to the Pacific Islands and Vanuatu • The directory focuses heavily on websites that are of particular relevance to the USP teaching programme.
Google Facts You Don’t Know • Google started in January 1996 as a research project at Stanford University by Ph.D. candidates Larry Page and Sergey Brin, when they were 24 and 23 years old respectively. • The name “Google” was an accident: a spelling mistake by the founders, who were going for “Googol”. • Google is one of the largest American companies (by market capitalization). • The Google search engine receives about a billion search requests per day. • Google has the largest network of translators in the world. • Google consists of over 450,000 servers. • Number of languages in which you can set up the Google home page, including Urdu and Latin: 88 • The “I’m Feeling Lucky” button is almost never used. However, it costs Google $110 million a year.
Recommended general search engines Popular search engines • Google / Google Scholar • Ask.com • Yahoo! Search • *Google alone is not sufficient. Even though it is probably the biggest search engine, studies show that less than half of the searchable web is fully searchable in Google. • Tip 1: Use the ‘advanced search’ option with ALL search engines to refine your search • Tip 2: Become familiar with your search engine by looking at the ‘Advanced Search Tips’ page
Googling to the max • Google is the biggest search engine database in the world • Google ranks pages on three criteria: • Popularity – based on the number of links to a page and the importance of the pages that link • Importance – traffic, quality of links • Word proximity and occurrence in results
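The "popularity" criterion above, where a link counts for more when the linking page is itself important, can be illustrated with a small iterative calculation. This is a simplified sketch in the spirit of published link-analysis methods, not Google's actual ranking code; the link graph, the function name and the damping value of 0.85 are assumptions chosen for illustration.

```python
def link_popularity(links, damping=0.85, iterations=100):
    """Score pages by incoming links, weighting each link by the linker's own score."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its score equally among the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * score[page] / n
        score = new
    return score

# Hypothetical link graph: three pages link to "hub", and "hub" links to "a".
LINKS = {"a": ["hub"], "b": ["hub"], "c": ["hub", "a"], "hub": ["a"]}
SCORES = link_popularity(LINKS)
RANKING = sorted(SCORES, key=SCORES.get, reverse=True)
```

In this made-up graph, "hub" ranks first because it has the most incoming links, and "a" ranks second because its main incoming link comes from the highly scored "hub". Pages "b" and "c" have no incoming links and end up at the bottom.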
Google tips • Use the advanced search screen • Use quotation marks when searching for a phrase • Type your search terms as a statement, not as a question. (Google will try to match the words that you have entered.) • Google can search more than just documents. Explore the Google site to search for maps, news, images and photographs, books, music, videos, blogs etc.
Google “special” searches and shortcuts • " " (quotation marks): always use quotation marks to search for a phrase • - (hyphen): always hyphenate a word that is sometimes hyphenated, e.g. same-sex searches same-sex, samesex and same sex • ~ (synonyms): let Google “think” of synonyms, e.g. ~youth finds youth, juvenile, adolescent • intitle: requires the term to appear in the title of the document, e.g. intitle:"global warming" • allintitle: requires all terms to appear in the title of the document, e.g. allintitle:traditional knowledge intellectual property pacific
More Google shortcuts • site: used to search within a particular site, e.g. site:un.org "discrimination against women" • inurl: requires the term to be in the URL, e.g. inurl:usp forsyth will find all references to Forsyth on websites with usp in the URL • filetype: only searches particular types of documents, e.g. filetype:ppt "legal research" will locate PowerPoint presentations on legal research • movie: only searches movie reviews
And more Google shortcuts! • Use Google as a calculator, e.g. 6*2 • Use Google to find maps, e.g. map:"port vila" • Use Google as a dictionary, e.g. define:"mens rea"
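The shortcuts on the last three slides are all plain text typed into the search box, so a query can also be assembled by a small program. The sketch below is one possible way to do it; the helper names (phrase, build_query, search_url) are invented for this example, and it only builds the query string and the search URL without sending anything to Google.

```python
from urllib.parse import urlencode

def phrase(text):
    """Wrap text in double quotes so it is searched as an exact phrase."""
    return '"' + text + '"'

def build_query(terms="", site=None, filetype=None, intitle=None, inurl=None):
    """Combine plain terms with the site:, filetype:, intitle: and inurl: shortcuts."""
    parts = []
    if site:
        parts.append("site:" + site)
    if filetype:
        parts.append("filetype:" + filetype)
    if intitle:
        parts.append("intitle:" + intitle)
    if inurl:
        parts.append("inurl:" + inurl)
    if terms:
        parts.append(terms)
    return " ".join(parts)

def search_url(query):
    """URL-encode the query as the q parameter of the Google search page."""
    return "https://www.google.com/search?" + urlencode({"q": query})

QUERY = build_query(phrase("discrimination against women"), site="un.org")
URL = search_url(QUERY)
```

QUERY holds the same text as the site: example above, and URL is the address a browser would open for it. Note that there is no space between a shortcut and its value (site:un.org, not site: un.org).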
Searching the “invisible” web • The “invisible” web is estimated to be two to three times bigger than the “visible” web • The invisible web consists of a vast number of documents held in searchable specialised databases, which are not themselves linked web pages • You need to identify these databases and search them directly rather than via Google. These databases include subscription-only databases (e.g. Lexis) but also freely available databases (e.g. the Emalus Library’s ‘Pacific Law Journal Index’) • Identify databases on the invisible web by using specialist subject directories (such as the Emalus ‘Recommended Internet Links’ page, Sosig, Weblaw etc.), by studying major internet sites in your area of interest, or by including the word ‘database’ or ‘index’ in your search, e.g. “human rights” database
Evaluating web pages • What can the URL tell you? – is this a personal website? • What type of domain does it come from? e.g. .com indicates a commercial site whilst .edu indicates an educational site • Does the webpage say who published the material, or give information about the authors of the website itself? – can you trust the information on the website? • Is the page dated? Is it up to date? • What are the author’s credentials on the subject? – are they an expert? • Are sources documented with footnotes and verifiable references? • Are there well-annotated links to other sources on the topic? • Do other reputable sites link to this webpage?
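The first two questions in the checklist can be answered partly by taking the URL apart. The following sketch does this with Python's standard library. The function name and the example addresses are invented, and the "~" test is an assumption (a rough convention for personal pages, not a reliable rule). The remaining questions still need human judgement.

```python
from urllib.parse import urlparse

# Illustrative labels only, taken from the two examples in the checklist.
DOMAIN_HINTS = {"com": "commercial", "edu": "educational"}

def describe_url(url):
    """Pull out the parts of a URL that hint at who is behind a page."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    return {
        "host": host,
        "top_level_domain": tld,
        "hint": DOMAIN_HINTS.get(tld, "unknown"),
        # Assumption: a "~" in the path is a common, but not reliable,
        # sign of a personal page on a shared server.
        "maybe_personal": "~" in parsed.path,
    }

INFO = describe_url("http://www.example.edu/~jsmith/notes.html")
```

For the made-up address above, INFO reports an educational domain and flags the page as possibly personal, which is a cue to look more closely at the author's credentials.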
Final comments • “Garbage in, garbage out” – computers do not think, so how you structure your search will determine the effectiveness of your search results • Be critical of what you find on the internet and verify the authority, reliability and currency of all materials • Do not rely solely on one search tool such as Google. Make use of two or more search engines and relevant internet subject directories, and explore the invisible web • Search the databases in the invisible web that you have access to via your USP library ‘Law Resources’ website (Encyclopedia Britannica, Oxford Reference Library, Westlaw, Lexis, Pacific Law Journal Index etc.)
What has Google ever done for us? • Google Scholar • Google Finance • Google News • Google Calendar • Google Docs • Google Drive • Google Checkout • Google Mobile • Google Gmail • G-phone • Google knol - wiki
• Don’t ever store user information: 30% • Give users access to and editing permission over the data they keep: 18% • Empower users to manage and improve the relevancy of their own search results: 15% • Be transparent about filtering they use to display results, capture information and disclose biases: 14% • Give users the opportunity to opt out at will: 10% • Have regular open conversations with users on the use of user data: 6% • None of the above: 5% • Give users the tools to curate and prune search history: 4%
Advanced Features of Google • Fill in the Blanks – “*” • Diacritics • Query modifiers • GAPS (proximity search)
Fill in the Blanks – “*” You can replace unknown words with an asterisk - “*”. Google returns results substituting the “*” with words most frequently used in the context of the query.
Busted! Possible Uses for “*” Searching out suspected plagiarism.
Possible Uses for “*” Including results with common misspellings. All spellings of a word will be found.
Possible Uses for “*” Finding common variations.
Fun with “*” Finding parodies.
Fill in the Blanks – “*” Replacing a character. If you try to use “*” to fill in a letter, number, or symbol, Google will treat the characters on either side of the “*” like separate keywords.
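The fill-in-the-blanks behaviour can be imitated on a small piece of text. In this sketch each standalone "*" stands for exactly one word, which is a simplifying assumption, and the function names and sample sentence are invented. The second helper mimics the last point above: a "*" placed inside a word does not fill in a character but splits the word into separate keywords.

```python
import re

def wildcard_to_regex(phrase):
    """Turn a phrase such as 'a * saved is a * earned' into a regular expression.

    Assumption: each standalone "*" matches exactly one whole word.
    """
    tokens = [r"\w+" if tok == "*" else re.escape(tok) for tok in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(tokens) + r"\b", re.IGNORECASE)

def find_matches(phrase, text):
    """Return every stretch of text that fits the phrase with its blanks filled in."""
    return [m.group(0) for m in wildcard_to_regex(phrase).finditer(text)]

def keywords(query):
    """Split a query into keywords, breaking words apart at any "*" inside them."""
    return [part for tok in query.split() for part in tok.split("*") if part]

TEXT = "A penny saved is a penny earned. A dollar saved is a dollar earned."
MATCHES = find_matches("a * saved is a * earned", TEXT)
```

The same pattern finds both the original proverb and its variation, which is how the "*" helps with spotting variations, parodies and suspected plagiarism. By contrast, keywords("wom*n") gives the two separate keywords "wom" and "n".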
Diacritics If you search for a word with a diacritic (distinguishing mark) in it… …will Google return results with or without the diacritic?
Diacritics The answer is: both! A search for unité produces results for both unite and unité.
Diacritics To limit your search to only unité, add a “-” followed by unite.
Diacritics Notice how the number of results has decreased.
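One way to picture why a search for unité also finds unite is to imagine the search engine removing the diacritic from both the query and the page before comparing them. The sketch below does this with Python's standard unicodedata module. It is an illustration of the idea only, since how Google handles diacritics internally is not described here, and the word list and function names are invented.

```python
import unicodedata

def strip_diacritics(word):
    """Remove accents by decomposing characters and dropping the combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def matches(query, word, exclude=()):
    """Compare query and word with diacritics ignored, skipping excluded spellings."""
    if word.lower() in {e.lower() for e in exclude}:
        return False
    return strip_diacritics(word).lower() == strip_diacritics(query).lower()

WORDS = ["unité", "unite", "Unité", "unity"]
BOTH = [w for w in WORDS if matches("unité", w)]
ONLY_ACCENTED = [w for w in WORDS if matches("unité", w, exclude=["unite"])]
```

BOTH contains the accented and the unaccented spellings, while ONLY_ACCENTED, which corresponds to adding -unite to the search, keeps only the accented ones. This is why the number of results goes down.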
Query Modifiers • Use these commands in the search window. • intitle: Find sites with one search term in the title. • allintitle: Find sites with all search terms in the title. • inurl: Find sites with one search term in the URL. • allinurl: Find sites with all search terms in the URL. • site: Limit your search to a specific web site. • filetype: Specify a type of document to search.
Query Modifiers – intitle: Find sites with one search term in the title.
Query Modifiers – intitle: Find sites with one search term in the title. This search returns sites with the word shampoo in the title and ingredient anywhere in the document.
Query Modifiers – allintitle: Find sites with ALL search terms in the title.
Query Modifiers – allintitle: Notice fewer “hits” when shampoo AND ingredient must be found in the title of the page. Find sites with all search terms in the title.
Query Modifiers – inurl: Find sites with one search term in the URL.
Query Modifiers – inurl: Find sites with one search term in the URL. This search returns sites with the word shampoo in the URL and ingredient anywhere in the document.
Query Modifiers – allinurl: Find sites with ALL search terms in the URL.
Query Modifiers – allinurl: Find sites with all search terms in the URL. Notice fewer “hits” when shampoo AND ingredient must be found in the URL of the page.
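The difference between the "one term" and "all terms" modifiers can be tried out on a handful of made-up pages. In the sketch below the three pages, the Page class and the helper names are all invented, and matching is a crude substring test rather than anything a real search engine does. It only shows why the all-terms versions return fewer hits.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    title: str
    body: str

# Hypothetical pages, invented for this example.
PAGES = [
    Page("http://example.com/shampoo", "Shampoo guide", "every ingredient explained"),
    Page("http://example.com/shampoo-ingredient-list", "Shampoo ingredient list", "what is inside"),
    Page("http://example.com/soap", "Soap basics", "shampoo and soap share an ingredient"),
]

def has(word, text):
    """Crude case-insensitive substring test."""
    return word.lower() in text.lower()

def one_in_field(pages, field, first, *rest):
    """Like intitle:/inurl: - first term in the field, other terms anywhere."""
    hits = []
    for page in pages:
        everything = " ".join((page.url, page.title, page.body))
        if has(first, getattr(page, field)) and all(has(w, everything) for w in rest):
            hits.append(page)
    return hits

def all_in_field(pages, field, *terms):
    """Like allintitle:/allinurl: - every term must be in the field."""
    return [p for p in pages if all(has(w, getattr(p, field)) for w in terms)]

INTITLE = one_in_field(PAGES, "title", "shampoo", "ingredient")
ALLINTITLE = all_in_field(PAGES, "title", "shampoo", "ingredient")
INURL = one_in_field(PAGES, "url", "shampoo", "ingredient")
ALLINURL = all_in_field(PAGES, "url", "shampoo", "ingredient")
```

With these pages, the intitle: and inurl: style searches each return two hits, while the allintitle: and allinurl: style searches return only the one page that has both words in its title or URL.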
Query Modifiers – site: Limit your search to a specific web site.