Jon Warbrick, University of Cambridge Computing Service, jon.warbrick@ucs.cam.ac.uk. Search Engine Developments
web-search.cam.ac.uk • Indexes 400 more-or-less "official" web servers • Provides 'packaged searches' for Ucam websites • Uses 'Verity Ultraseek' • Currently a single search engine accessed by internal and external users
Current architecture • [Diagram; surviving labels: 'CUDN & cam.ac.uk' and 'Rest of the world']
The Problem • The search engine is inside the CUDN boundary • ... and so can see 'ucam-only' material • ... which it will tell external users about • ... so we advise webmasters not to let it index that material • ... but that means that internal users can't find it • It's also tricky to get the restrictions right
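The dilemma above can be shown with a toy model. This is only an illustrative sketch, not a description of how Ultraseek works: the `Page` class, its two flags, and the example.org URLs are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    ucam_only: bool         # the web server only serves it to clients inside the CUDN
    indexing_allowed: bool  # the webmaster lets the search engine index it


def build_index(pages):
    """The crawler runs inside the CUDN, so it can fetch every page.

    The only thing keeping a page out of the index is the webmaster's
    own exclusion.
    """
    return [p for p in pages if p.indexing_allowed]


def search(index, user_is_internal):
    """A single shared engine: results do not depend on who is asking.

    `user_is_internal` is deliberately unused, which is the whole problem.
    """
    return [p.url for p in index]


# Hypothetical pages, for illustration only.
public = Page("http://example.org/public.html", ucam_only=False, indexing_allowed=True)
leaky = Page("http://example.org/internal.html", ucam_only=True, indexing_allowed=True)
hidden = Page("http://example.org/internal.html", ucam_only=True, indexing_allowed=False)

# Indexing the restricted page tells external users that it exists ...
external_hits = search(build_index([public, leaky]), user_is_internal=False)

# ... and excluding it means internal users cannot find it either.
internal_hits = search(build_index([public, hidden]), user_is_internal=True)
```

In this model neither setting of `indexing_allowed` gives the desired result for both audiences, which is the trade-off the slide describes.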
Things for you to do • Consider giving the internal search engine access to material currently excluded from indexing • User Agent is Ultraseek (internal search; webmaster@ucs.cam.ac.uk) • Set up 192.153.213.0 – 192.153.213.255 as being 'outside the University' – the external search engine will use one of these addresses and will NOT have a 'cam.ac.uk' name • Details at http://www.cam.ac.uk/cs/web-search/developments.html
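As a rough illustration of the address rule above, a webmaster's access check might look something like the following sketch. It uses only Python's standard `ipaddress` module. The function names are hypothetical, and the list of internal networks must be supplied by the caller; the value shown in the usage lines is a PLACEHOLDER, not the University's real address allocation. The User Agent prefix is taken from the slide and assumed to be the start of the string the internal engine sends.

```python
import ipaddress

# From the slide: 192.153.213.0 - 192.153.213.255 must count as 'outside'.
EXTERNAL_CRAWLER_RANGE = ipaddress.ip_network("192.153.213.0/24")

# Assumed to be the start of the internal engine's User Agent string.
INTERNAL_CRAWLER_UA_PREFIX = "Ultraseek (internal search"


def is_outside_university(client_ip, internal_networks):
    """Return True if the client should be treated as external.

    The external search engine's range is checked first, so it counts
    as outside even if it falls within an otherwise internal network.
    """
    addr = ipaddress.ip_address(client_ip)
    if addr in EXTERNAL_CRAWLER_RANGE:
        return True
    return not any(addr in ipaddress.ip_network(net) for net in internal_networks)


def is_internal_crawler(client_ip, user_agent, internal_networks):
    """Return True for a request that looks like the internal search engine.

    The User Agent header is easy to forge, so it is only trusted when
    the request also comes from an internal address.
    """
    return (user_agent.startswith(INTERNAL_CRAWLER_UA_PREFIX)
            and not is_outside_university(client_ip, internal_networks))


# PLACEHOLDER internal network, for illustration only.
example_internal = ["192.153.212.0/22"]
external_engine_is_outside = is_outside_university("192.153.213.10", example_internal)
```

Checking the excluded range before the internal list matters here: in the placeholder data the external engine's addresses sit inside an internal block, and a plain "is it in an internal network" test would wrongly let it see restricted material.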
Other possible enhancements • Spell checking • Passage-based summaries • Wild card queries • Thesaurus and smart 'no hits' page • Quick links • Following links in JavaScript • Page expert
The Raven question • Should the search engine be able to index Raven-protected pages?