Site-wide Search: Upgrade and new features
Jon Warbrick, University of Cambridge Computing Service, jw35@cam.ac.uk
Site-wide search
• web-search.cam.ac.uk
• Ultraseek, from Infoseek -> Inktomi -> Verity -> Autonomy
• Currently indexing
  • ~600 servers
  • ~1.2 million documents
  • ~2.5 million URLs
Site-wide search
• Indexes 'more-or-less official' servers
• Maintains two indexes
  • 'internal' and 'external'
  • automatically routes queries
• Services for University Webmasters
  • Add/delete/re-index
  • Packaged searches
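The slides do not say how queries are routed between the 'internal' and 'external' indexes. As a rough sketch only, assuming the decision is made on the client's network address, it might look like the following. The function name `choose_index` and the network `192.0.2.0/24` (a reserved documentation range) are placeholders, not the service's real configuration.

```python
import ipaddress

# ASSUMPTION: placeholder network from the RFC 5737 documentation range,
# standing in for whatever address ranges count as "inside" the University.
INTERNAL_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def choose_index(client_ip: str) -> str:
    """Pick the index to search, based on where the query comes from."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in INTERNAL_NETWORKS):
        return "internal"
    return "external"
```

The real service may well use other signals; this only illustrates the idea of one search box sitting in front of two indexes.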
2006 Upgrade
• Improved resilience
• Case-inSenSITIVE matching
• Quick Links
• Passage-based summaries
• Grouping by location
• [ All terms matching ]
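To make "case-insensitive" and "all terms matching" concrete, here is a toy illustration. It is not how Ultraseek is implemented, and `matches_all_terms` is a hypothetical helper: a document matches only if every query term appears in it, ignoring case.

```python
def matches_all_terms(query: str, text: str) -> bool:
    """True if every whitespace-separated query term occurs as a word in text.

    Both sides are case-folded first, so 'CAMBRIDGE' matches 'Cambridge'.
    """
    words = set(text.casefold().split())
    return all(term in words for term in query.casefold().split())
```

A real engine would also tokenise punctuation, stem words and rank results; this sketch covers only the two behaviours named on the slide.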
2006 Upgrade
• More indexing (dynamic pages + https + JavaScript)
• Sources of indexing requests
  • s1.web-search.cam.ac.uk - s6.web-search.cam.ac.uk
  • an address in the range 192.153.213.0-255
• Backup search engines
  • Add URL, Revisit Site, etc.
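A webmaster who wants to recognise indexing requests in server logs could test client addresses against the range quoted above. The sketch below assumes the range 192.153.213.0-255 is equivalent to the network 192.153.213.0/24; the names `CRAWLER_NETWORK`, `CRAWLER_HOSTS` and `is_search_crawler` are illustrative only.

```python
import ipaddress

# Range quoted on the slide: 192.153.213.0-255, written here as a /24 network.
CRAWLER_NETWORK = ipaddress.ip_network("192.153.213.0/24")

# Hostnames quoted on the slide: s1 to s6 under web-search.cam.ac.uk.
CRAWLER_HOSTS = {f"s{n}.web-search.cam.ac.uk" for n in range(1, 7)}


def is_search_crawler(client_ip: str) -> bool:
    """Return True if client_ip lies in the search engine's address range."""
    try:
        return ipaddress.ip_address(client_ip) in CRAWLER_NETWORK
    except ValueError:
        # Not a valid IP address at all (e.g. a malformed log field).
        return False
```

For example, `is_search_crawler("192.153.213.17")` is true, while an address outside the range, or a malformed one, is false.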
• Frames hiding real URL
• Junk path info
• 'Success' error pages
• Lack of Last Modification time stamp
• Inconsistent URLs
• Problems with dynamic content
  • Randomly permuted query arguments
  • Gratuitously-varying detail
  • Variant pages
  • Calendars linking to other pages
  • Cache-busting headers
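One of the problems above, randomly permuted query arguments, means the same page is reachable under several URLs that differ only in argument order, so an indexer sees them as different documents. A site can avoid this by always emitting links in one canonical order. The following is a minimal sketch using Python's standard library; `canonical_url` is a hypothetical helper and the example host is a placeholder.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return url with its query arguments sorted into a stable order."""
    parts = urlsplit(url)
    # keep_blank_values=True so arguments like "?flag=" are not dropped.
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(pairs), parts.fragment)
    )


# Two permutations of the same arguments collapse to a single URL.
a = canonical_url("http://www.example.org/list?b=2&a=1")
b = canonical_url("http://www.example.org/list?a=1&b=2")
```

Here `a` and `b` are both `http://www.example.org/list?a=1&b=2`. Sorting is only one possible convention; what matters is that every link to a page uses the same one.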
Further information
• Notes for webmasters: http://www.cam.ac.uk/cs/web-search/
• Details of recent changes: http://www.cam.ac.uk/cs/web-search/changes-200608.html
• Help and advice: web-support@ucs.cam.ac.uk
“Why don't you use Google?” I wonder if anyone will ask...