
Search Engine Developments

Jon Warbrick, University of Cambridge Computing Service (jon.warbrick@ucs.cam.ac.uk). web-search.cam.ac.uk indexes 400 more-or-less "official" web servers, provides 'packaged searches' for Ucam websites, and uses 'Verity Ultraseek'.

Presentation Transcript


  1. Jon Warbrick, University of Cambridge Computing Service (jon.warbrick@ucs.cam.ac.uk): Search Engine Developments

  2. web-search.cam.ac.uk • Indexes 400 more-or-less "official" web servers • Provides 'packaged searches' for Ucam websites • Uses 'Verity Ultraseek' • Currently a single search engine accessed by internal and external users

  3. Current architecture CUDN & cam.ac.uk Rest of the world

  4-7. Current architecture (diagram, continued)

  8. The Problem • The search engine is inside the CUDN boundary • ... and so can see 'ucam-only' material • ... which it will tell external users about • ... so we advise webmasters not to index it • ... but that means that internal users can't find it • It's also tricky to get the restrictions right
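The trade-off on this slide can be sketched as a toy model. This is only an illustration, not how Ultraseek works: the `Page` class, the `build_index` and `search` helpers, and the example.org URLs are all hypothetical names invented for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    ucam_only: bool          # readable only from inside the CUDN
    indexable: bool = True   # False once the webmaster excludes it from indexing


def build_index(pages):
    """A crawler inside the boundary can fetch every page that is not excluded."""
    return [p for p in pages if p.indexable]


def search(index):
    """A single shared engine returns the same hits to internal and external users."""
    return [p.url for p in index]


# Hypothetical pages, for illustration only.
PUBLIC = Page("https://example.org/public", ucam_only=False)
RESTRICTED = Page("https://example.org/ucam-only", ucam_only=True)
EXCLUDED = Page("https://example.org/ucam-only", ucam_only=True, indexable=False)

# Option 1: index the restricted page. External users are told it exists.
hits_when_indexed = search(build_index([PUBLIC, RESTRICTED]))

# Option 2: exclude it from indexing. Internal users can no longer find it.
hits_when_excluded = search(build_index([PUBLIC, EXCLUDED]))
```

With one shared index there is no setting that satisfies both audiences: either the restricted URL appears in everyone's results or in no one's.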

  9-16. Future architecture (diagram, continued; slide 13 is marked 'X')

  17. Things for you to do • Consider giving the internal search engine access to material currently excluded from indexing • User Agent is Ultraseek (internal search; webmaster@ucs.cam.ac.uk) • Set up 192.153.213.0 – 192.153.213.255 as being 'outside the University' – the external search engine will use one of these addresses and will NOT have a 'cam.ac.uk' name • Details at http://www.cam.ac.uk/cs/web-search/developments.html
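The address rule on this slide could be checked along the following lines. This is a minimal sketch, not the Computing Service's configuration: the address range and the User-Agent string come from the slide, but `EXAMPLE_UNIVERSITY_NETS` is a placeholder assumption (substitute whatever list of University networks a server already uses), and the function names are hypothetical.

```python
import ipaddress

# From the slide: the external engine crawls from 192.153.213.0 - 192.153.213.255.
EXTERNAL_ENGINE_NET = ipaddress.ip_network("192.153.213.0/24")

# From the slide: the User-Agent announced by the internal engine.
INTERNAL_ENGINE_AGENT = "Ultraseek (internal search; webmaster@ucs.cam.ac.uk)"

# ASSUMPTION: placeholder value, not the real list of University networks.
EXAMPLE_UNIVERSITY_NETS = [ipaddress.ip_network("192.153.0.0/16")]


def is_inside_university(addr, university_nets=EXAMPLE_UNIVERSITY_NETS):
    """Return True if addr should be allowed to see 'ucam-only' material."""
    ip = ipaddress.ip_address(addr)
    # The external engine's range is carved out first, so it is treated as
    # outside even when a broader University network would contain it.
    if ip in EXTERNAL_ENGINE_NET:
        return False
    return any(ip in net for net in university_nets)


def is_internal_engine(user_agent):
    """Return True if the User-Agent header names the internal search engine."""
    return INTERNAL_ENGINE_AGENT in user_agent
```

The carve-out is tested before the general list because the exclusion has to win when ranges overlap. A User-Agent header can be forged by any client, so `is_internal_engine` is better suited to relaxing indexing exclusions than to granting access on its own.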

  18. Other possible enhancements • Spell checking • Passage-based summaries • Wild card queries • Thesaurus and smart 'no hits' page • Quick links • Following links in JavaScript • Page expert

  19. The Raven question • Should the search engine be able to index Raven-protected pages?

  20. Any more questions?
