
Searching uPortal with a third party Search Engine

Katya Sadovsky, University of California, Irvine, Administrative Computing Services (katya@uci.edu)

  1. Searching uPortal with a third party Search Engine Katya Sadovsky University of California, Irvine Administrative Computing Services katya@uci.edu

  2. Agenda • Our goals • Our current setup • Built-in vs. Third Party Search Engine • Dynamic vs. Static Content • Issues in combining uPortal with a search engine • Demonstration • Questions & Answers

  3. Our goals • Use the portal as a “gateway” to information • Allow users to search for pertinent portal content • Present users with integrated search results (portal and non-portal content) • Aid the search engine in weighing the results (meaningful page title, metadata, etc.)

  4. Our current setup • uPortal 2.0.3 • Verity Ultraseek Search Engine (formerly Inktomi) • Tomcat 4.0.6

  5. Built-in vs. Third Party Search Engine • Pros to using a built-in search engine: • Ensure generation of correct links to content • Present users with customized (user-specific) result sets • Ability to fully utilize channel metadata • Employ portal’s authorization infrastructure

  6. Built-in vs. Third Party Search Engine • Pros to using a third party search engine • Well tested mature functionality • Well developed dictionary and thesaurus • Ability to search content beyond uPortal and present users with integrated search results • URL filtering capabilities • Useful but optional: nice administrative GUI, quick link definitions

  7. Dynamic vs. Static Content • uPortal generates dynamic content that depends on the user's preferences, security level, browser and operating system • Most search engines are designed to work with static content: • Search engines index content periodically and use a cached/stored index to present users with search results • Search results are not user-specific • Only public content is indexed

  8. Issues/Areas of difficulty • User Agent setting • Filtering out certain URLs • Deciding what to search: • Search index/start page • Searchable vs. non-searchable content • Generating links to channels using: • global (published) vs. instance (subscribed) ID • functional names • Page title used in search results

  9. User Agent • Issues: • uPortal needs to know the mapping between a user agent and a MIME type/output type • When user agent is not recognized, uPortal will display a screen allowing users to choose a profile to use • Solutions: • If you know the user agent reported by the search engine – add a mapping to the UP_USER_UA_MAP table • Choose a search engine that allows you to specify a user agent
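The lookup described on this slide can be sketched in a few lines. This is a minimal in-memory illustration, not uPortal's implementation: the class `UserAgentProfiles`, substring matching, and the profile name `html-browser` are all assumptions made for the example. An empty result stands in for the case where uPortal would show the profile selection screen.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for a user-agent-to-profile mapping table. */
final class UserAgentProfiles {
    // Key: lower-cased fragment expected in the User-Agent header; value: profile name.
    private final Map<String, String> profilesByFragment = new LinkedHashMap<>();

    void addMapping(String agentFragment, String profileName) {
        profilesByFragment.put(agentFragment.toLowerCase(Locale.ROOT), profileName);
    }

    /** Returns the first matching profile, or empty when the agent is not recognized. */
    Optional<String> resolve(String userAgentHeader) {
        if (userAgentHeader == null) {
            return Optional.empty();
        }
        String agent = userAgentHeader.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : profilesByFragment.entrySet()) {
            if (agent.contains(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }
}
```

Registering the crawler's agent string (for example a made-up `ExampleCrawler`) alongside the browser entries lets the crawler receive the same HTML rendering as a desktop browser.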

  10. Example: setting a search engine user agent

  11. Filtering out certain URLs • Issues: • A search engine may follow a link that includes a channel option or command • uPortal URL tags: • Dynamically generated for each URL hit • Tags other than 'idempotent' make search results meaningless • While indexing content, a search engine may enter a loop, referencing the same page with different tags

  12. Filtering out certain URLs (cont’d) • Solutions: • Use a search engine that allows URL filtering and filter out all “offending” URLs • If the search engine supports it, use advanced URL “de-duping”
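To make the filtering and de-duping ideas concrete, here is a minimal sketch. It assumes portal URLs carry the tag as a path segment of the form `/tag.<value>.` (for example `/tag.idempotent.render...`); that URL shape, the host name, and the class `PortalUrlFilter` are assumptions for illustration. In practice these rules would be expressed in the search engine's own filter configuration rather than in Java.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical crawler-side filter for tagged portal URLs. */
final class PortalUrlFilter {
    // Assumed URL shape: a path segment that starts with "tag.<value>."
    private static final Pattern TAG = Pattern.compile("/tag\\.([^./]+)\\.");

    /** Accepts URLs that have no tag at all or that carry the 'idempotent' tag. */
    static boolean isIndexable(String url) {
        Matcher m = TAG.matcher(url);
        return !m.find() || "idempotent".equals(m.group(1));
    }

    /** Builds a comparison key in which any per-request tag is replaced by 'idempotent'. */
    static String dedupeKey(String url) {
        return TAG.matcher(url).replaceFirst("/tag.idempotent.");
    }

    /** Collapses URLs that differ only in their tag, keeping first-seen order. */
    static Set<String> dedupe(List<String> urls) {
        Set<String> unique = new LinkedHashSet<>();
        for (String url : urls) {
            unique.add(dedupeKey(url));
        }
        return unique;
    }
}
```

Rejecting non-idempotent URLs outright prevents the crawl loop described above; the de-dupe key is a fallback for engines that have already collected several tagged variants of the same page.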

  13. Example: Filtering out certain URLs

  14. Example: using URL filters

  15. What to search: index/start page • Issues: • A user layout may not be used as a starting point for a search engine: a typical layout doesn't contain all the channels • Need a page with 'idempotent' links to all the searchable channels • Solutions: • Searchable Channel Index channel

  16. What to search: searchable vs. non-searchable content • Issue: • Not all channels need to be included in the search • Solution: • Added a 'searchable' attribute to all the channels
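The two slides above (a start page of stable links, restricted to channels flagged as searchable) can be sketched together. This is an illustration under stated assumptions, not the CSearchRegistry channel itself: the `Channel` record, the `linkPrefix` argument, and the example URL are hypothetical, and channel IDs are assumed to be URL-safe.

```java
import java.util.List;

/** Sketch of a crawler start page; all names here are hypothetical. */
final class SearchableChannelIndex {

    /** Minimal channel record for illustration, not uPortal's channel definition. */
    static final class Channel {
        final String id;          // stable identifier, assumed URL-safe
        final String title;
        final boolean searchable; // mirrors the 'searchable' attribute

        Channel(String id, String title, boolean searchable) {
            this.id = id;
            this.title = title;
            this.searchable = searchable;
        }
    }

    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Renders one link per searchable channel; non-searchable channels are skipped. */
    static String render(String linkPrefix, List<Channel> channels) {
        StringBuilder html = new StringBuilder("<ul>\n");
        for (Channel channel : channels) {
            if (!channel.searchable) {
                continue;
            }
            html.append("  <li><a href=\"")
                .append(escapeHtml(linkPrefix + channel.id))
                .append("\">")
                .append(escapeHtml(channel.title))
                .append("</a></li>\n");
        }
        return html.append("</ul>\n").toString();
    }
}
```

Pointing the search engine at a page like this gives it one entry point that reaches every searchable channel, regardless of what any individual user's layout contains.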

  17. CSearchRegistry channel

  18. CSearchRegistry: stylesheet

  19. Generating links to channels • Problem: channel instance (subscribed) IDs vary from user to user, so the search result links are inconsistent • Solutions: link to channels using • global (published) IDs -- involves code changes • functional names (fname) -- this is new functionality, available in CVS (Concurrent Versions System)

  20. Linking to channels via their published IDs: implementation plan • Modify org/jasig/portal/UserInstance.java to recognize that the user is asking for a published channel that may not be in the user’s layout • Create a temporary hidden folder in the user’s layout to store “temporary” channels (make sure to delete this folder before the layout is saved to the database) • Add XML channel definitions to this hidden folder • Proceed to render as usual
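The plan above can be illustrated with a toy layout model. This sketch does not reflect the actual changes to UserInstance.java: the class `TemporaryChannelLayout`, the folder name `__temporary__`, and the representation of a layout as a map of folder names to published channel IDs are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy layout model showing the temporary hidden folder idea; names are hypothetical. */
final class TemporaryChannelLayout {
    static final String TEMP_FOLDER = "__temporary__";

    // Folder name -> published channel IDs placed in that folder.
    private final Map<String, List<String>> folders = new LinkedHashMap<>();

    void subscribe(String folder, String publishedId) {
        folders.computeIfAbsent(folder, k -> new ArrayList<>()).add(publishedId);
    }

    boolean contains(String publishedId) {
        for (List<String> ids : folders.values()) {
            if (ids.contains(publishedId)) {
                return true;
            }
        }
        return false;
    }

    /** Makes a published channel renderable even if the user never subscribed to it. */
    void ensureRenderable(String publishedId) {
        if (!contains(publishedId)) {
            subscribe(TEMP_FOLDER, publishedId);
        }
    }

    /** Returns the layout as it should be persisted: without the temporary folder. */
    Map<String, List<String>> snapshotForSave() {
        Map<String, List<String>> copy = new LinkedHashMap<>(folders);
        copy.remove(TEMP_FOLDER);
        return copy;
    }
}
```

Keeping the temporary channels in a single dedicated folder makes the cleanup step a one-line removal, which is the safeguard the slide calls out before the layout is written to the database.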

  21. Page titles used in search results • Issues: • Out of the box, uPortal has a statically set page title (no matter what channel is viewed) • Search engines generally use page titles (or other metadata) for: • search result titles • result ranking • de-duping • Users have to be trained to enter meaningful page titles when creating documents/channels (e.g. do not start each page title with UCIrvine)

  22. Page titles used in search results • Solution: • when channels are rendered in 'focused' or 'detached' mode, add the channel title to the default page title (following is a fragment of webpages/stylesheets/org/jasig/portal/layout/tab-column/nested-tables/nested-tables.xsl):

         <xsl:template match="layout_fragment">
           ...
           <title>
             <xsl:value-of select="$windowTitle"/>
             <xsl:value-of select="concat(': ',content//channel/@description)"/>
           </title>
           ...
         </xsl:template>
         <xsl:template match="layout">
           ...
           <title>
             <xsl:value-of select="$windowTitle"/>
             <xsl:if test="//focused">
               <xsl:value-of select="concat(': ',//focused/channel/@description)"/>
             </xsl:if>
           </title>
           ...
         </xsl:template>

  23. Example: page titles

  24. Conclusions • There are tradeoffs when using either a built-in or a third-party search engine • We have yet to address the following issues: • searching restricted content • creating metadata (META) tags to help the search engine with content ranking • Overall, our portal project could not succeed without a search function

  25. Links • UC Irvine’s uPortal installation (SNAP): http://snap.uci.edu • This presentation: http://snap.uci.edu/PortalDocs/uPortal_Search.ppt

  26. Demo

  27. Questions?
