Google Session • About MIT’s Google Search Appliance (GSA) • Adding Google search to your web site • Customizing search results • Tips on improving a site’s rankings • Q&A – actually, ask questions anytime!
MIT's Google Configuration • MIT license is for 3M documents • Two collections of 1.5M documents each • MIT has over 1M web pages on 1,000 web servers • Google follows links from the MIT Home Page • web.mit.edu – crawled three times a week • Other MIT web servers – crawled twice a week
MIT Google does • Perform twice as well as Inktomi in a “blind test” • Index 220 different file formats • Provide control over our own crawling schedule • Allow user customization of the search results format • Index certificate-restricted content (not implemented yet)
MIT Google does NOT • Cache old pages • Index image files (our decision) • Index image ALT tags (Google’s decision) • Allow us to fiddle with the relevancy algorithm • Tell you “who’s linking to my page” because the GSA does not share that information across collections. When your pages move, we recommend using a 301 redirect.
MIT Google does NOT index • Java, Perl, Python documentation • Debian, GNU/Linux mirrors • URLs containing these strings: sipb.mit.edu, dev.mit.edu, net.mit.edu, lees.mit.edu, ops.mit.edu, classics.mit.edu, hypermail, pipermail • Certificate protected pages • No robots sites, no index pages • Dynamically generated pages containing ‘?’ except by request • URLs containing cgi-bin • URLs containing /afs/
Telling Google not to index • No robots in server • No robots in locker/directory • No robots in html file • No index, follow
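The last two bullets can be illustrated with the standard robots meta tag. This is a minimal sketch, not MIT's own template: the page title and body are placeholders, and it assumes the crawler honors the usual robots meta conventions.

```html
<!DOCTYPE html>
<html>
<head>
  <title>Example page (placeholder)</title>
  <!-- "No index, follow": keep this page out of the index,
       but still follow the links it contains. -->
  <meta name="robots" content="noindex, follow">
  <!-- To keep the page out of the index and also stop link following,
       the value would be "noindex, nofollow" instead. -->
</head>
<body>
  <p>Page content (placeholder)</p>
</body>
</html>
```

The tag works per page, so it suits a single file; for a whole server or directory, the robots exclusion mechanisms named in the first two bullets are the better fit.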
Avg. daily views - January 2005 • Total queries, Jan 1 - 26: 340,656
Sample search code
<form method='get' action='http://gb-server.mit.edu/search'>
  <input type='text' name='q' size='32' maxlength='255' value=''/>
  <input type='submit' name='btnG' value='Search'/>
  <input type='hidden' name='site' value='mit'/>
  <input type='hidden' name='client' value='mit'/>
  <input type='hidden' name='proxystylesheet' value='http://web.mit.edu/xsl/google-mit.xsl'/>
  <input type='hidden' name='output' value='xml_no_dtd'/>
  <input type='hidden' name='as_dt' value='i'/>
  <input type='hidden' name='as_sitesearch' value='web.mit.edu/newsoffice'/>
</form>
Restrict to one directory tree • name='as_sitesearch' value='<yoururl>' • Use web.mit.edu/newsoffice, not web/newsoffice • The trailing slash / matters: web.mit.edu/newsoffice includes sub-directories; web.mit.edu/newsoffice/ excludes sub-directories • as_sitesearch lets you specify one directory (and all its sub-directories) as the domain to be searched; you cannot specify multiple disparate directories using this option • If you want the search feature on your site to search the entire MIT web site, delete this parameter.
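The trailing-slash rule is easier to see side by side. The two forms below are a sketch that reuses the parameter names and server address from the sample search code; only the as_sitesearch value differs, and the described behavior is taken from the bullets above rather than verified independently.

```html
<!-- Searches web.mit.edu/newsoffice AND its sub-directories (no trailing slash) -->
<form method='get' action='http://gb-server.mit.edu/search'>
  <input type='text' name='q' size='32' maxlength='255' value=''/>
  <input type='submit' name='btnG' value='Search'/>
  <input type='hidden' name='site' value='mit'/>
  <input type='hidden' name='client' value='mit'/>
  <input type='hidden' name='as_sitesearch' value='web.mit.edu/newsoffice'/>
</form>

<!-- Searches web.mit.edu/newsoffice/ only, excluding sub-directories (trailing slash) -->
<form method='get' action='http://gb-server.mit.edu/search'>
  <input type='text' name='q' size='32' maxlength='255' value=''/>
  <input type='submit' name='btnG' value='Search'/>
  <input type='hidden' name='site' value='mit'/>
  <input type='hidden' name='client' value='mit'/>
  <input type='hidden' name='as_sitesearch' value='web.mit.edu/newsoffice/'/>
</form>
```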
Restrict to multiple directories or servers • Contact google@mit.edu and we will create a subcollection for you. • A subcollection is a list of URL patterns that can be referred to by a single name, such as "Library".
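Once a subcollection exists, a search form would refer to it by name. The sketch below assumes the name is passed in the site parameter, which is how the Google Search Appliance normally selects a collection. The slides do not state this, and "Library" is only the example name from the bullet above, so confirm the exact value with google@mit.edu.

```html
<form method='get' action='http://gb-server.mit.edu/search'>
  <input type='text' name='q' size='32' maxlength='255' value=''/>
  <input type='submit' name='btnG' value='Search'/>
  <!-- Assumption: the subcollection name replaces the default 'mit' collection -->
  <input type='hidden' name='site' value='Library'/>
  <input type='hidden' name='client' value='mit'/>
  <input type='hidden' name='proxystylesheet' value='http://web.mit.edu/xsl/google-mit.xsl'/>
  <input type='hidden' name='output' value='xml_no_dtd'/>
</form>
```

No as_sitesearch parameter appears here, since the subcollection's URL patterns already define what is searched.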
Google Custom Results You can customize the look and feel of Google’s search results by providing a stylesheet.
Customizing results [Diagram: your HTML header/footer wrapped around the Google results data] • You provide the header and footer (HTML) wrapper, and any desired content formatting • Google provides the raw data (XML)
How customization works [Diagram: a search query goes to the MIT-Google Index; the XML search results plus the XSLT stylesheet yield the HTML results] • The form points to an XSLT stylesheet • Google returns the results of the query in XML • An XSLT document translates the XML into your custom HTML
Notes • It is not necessary to customize the results. • You can place a search form on your site, and Google will use the site-wide MIT XSLT stylesheet. • Updates to the Google service may require you to make changes in your stylesheet. • Subscribe to google-partners@mit.edu • WCS will provide fee-based production services for custom search results.
How to customize the results • Plan how you want the results to look • Copy the MIT Google XSLT stylesheet http://web.mit.edu/xsl/google-mit.xsl • Save it to web readable space, naming it google-mysite.xsl
Point to your XSL • Update your search form to point the MIT-Google server to your custom XSLT stylesheet.
<form method='get' action='http://gb-server.mit.edu/search'>
  <input type='text' name='q' size='32' maxlength='255' value=''/>
  <input type='submit' name='btnG' value='Search'/>
  <input type='hidden' name='site' value='mit'/>
  <input type='hidden' name='client' value='mit'/>
  <input type='hidden' name='proxystylesheet' value='http://web.mit.edu/my_dept/google-mydept.xsl'/>
  <input type='hidden' name='output' value='xml_no_dtd'/>
</form>
Step-by-step customization See http://web.mit.edu/ist/google/stylesheets.html
Documentation • http://web.mit.edu/ist/google/ (Includes the “official” Google documentation, including their XML specification; also XSLT tips.) • Search Engine Submission Tips http://searchenginewatch.com/webmasters/ • Using CSS for an Effective SEO Campaign http://www.alistapart.com/articles/seo/
Support • The MIT Google team will help you create a Google search form and will answer questions sent to google@mit.edu • WCS offers fee-based production services for custom search results