Searching uPortal with a third party Search Engine Katya Sadovsky University of California, Irvine Administrative Computing Services katya@uci.edu
Agenda • Our goals • Our current setup • Built-in vs. Third Party Search Engine • Dynamic vs. Static Content • Issues in combining uPortal with a search engine • Demonstration • Questions & Answers
Our goals • Use the portal as a “gateway” to information • Allow users to search for pertinent portal content • Present users with integrated search results (portal and non-portal content) • Help the search engine rank the results (meaningful page titles, metadata, etc.)
Our current setup • uPortal 2.0.3 • Verity Ultraseek Search Engine (formerly Inktomi) • Tomcat 4.0.6
Built-in vs. Third Party Search Engine • Advantages of a built-in search engine: • Guaranteed generation of correct links to content • Customized (user-specific) result sets • Full use of channel metadata • Use of the portal’s authorization infrastructure
Built-in vs. Third Party Search Engine • Advantages of a third-party search engine: • Well-tested, mature functionality • Well-developed dictionary and thesaurus • Ability to search content beyond uPortal and present users with integrated search results • URL filtering capabilities • Useful but optional: a convenient administrative GUI, quick link definitions
Dynamic vs. Static Content • uPortal generates dynamic content that depends on the user's preferences, security level, browser and operating system • Most search engines are designed to work with static content: • They index content periodically and answer queries from that cached/stored index • Search results are not user-specific • Only public content is indexed
Issues/Areas of difficulty • User Agent setting • Filtering out certain URLs • Deciding what to search: • Search index/start page • Searchable vs. non-searchable content • Generating links to channels using: • global (published) vs. instance (subscribed) IDs • functional names • Page titles used in search results
User Agent • Issues: • uPortal needs to know the mapping between a user agent and a MIME type/output type • When user agent is not recognized, uPortal will display a screen allowing users to choose a profile to use • Solutions: • If you know the user agent reported by the search engine – add a mapping to the UP_USER_UA_MAP table • Choose a search engine that allows you to specify a user agent
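The fallback behavior described on this slide can be sketched as a lookup from user-agent string to profile ID, where a miss corresponds to uPortal showing the profile-selection screen. This is a hypothetical in-memory stand-in for the UP_USER_UA_MAP table, not uPortal's actual code; the class name, the exact-match rule, the crawler string "Ultraseek" and profile ID 1 are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical in-memory stand-in for the UP_USER_UA_MAP table.
 * Assumes exact (case-sensitive) matching of the User-Agent string.
 */
public class UserAgentProfileMap {
    private final Map<String, Integer> uaToProfile = new HashMap<>();

    /** Analogous to adding a row to UP_USER_UA_MAP. */
    public void addMapping(String userAgent, int profileId) {
        uaToProfile.put(userAgent, profileId);
    }

    /**
     * Returns the profile ID for a user agent, or an empty Optional if the
     * agent is unknown. The empty case is where uPortal would show the
     * "choose a profile" screen, which a crawler cannot get past.
     */
    public Optional<Integer> lookup(String userAgent) {
        return Optional.ofNullable(uaToProfile.get(userAgent));
    }

    public static void main(String[] args) {
        UserAgentProfileMap map = new UserAgentProfileMap();
        // Hypothetical crawler UA and profile ID: use the string your search
        // engine actually sends and a profile that exists in your portal.
        map.addMapping("Ultraseek", 1);
        System.out.println(map.lookup("Ultraseek"));      // Optional[1]
        System.out.println(map.lookup("UnknownBot/1.0")); // Optional.empty
    }
}
```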
Filtering out certain URLs • Issues: • A search engine may follow a link that includes a channel option or command • uPortal URL tags: • are dynamically generated for each request • Tags other than 'idempotent' make search results meaningless • While indexing content, a search engine may enter a loop, referencing the same page with different tags
Filtering out certain URLs (cont’d) • Solutions: • acquire a search engine that allows URL filtering and filter out all “offending” URLs • If available with the search engine, use advanced URL “de-duping”
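The two solutions above (URL filtering and de-duping) can be sketched as a crawler-side filter. This is a minimal sketch under stated assumptions: it assumes uPortal 2.x-style URLs whose last path segment starts with "tag.", treats only "tag.idempotent." URLs as stable, and uses a hypothetical deny-list of command parameter names that should be checked against your uPortal version. The de-dupe key simply sorts query parameters, which is one common normalization, not necessarily what any particular search engine does.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical crawler-side URL filter for uPortal 2.x-style URLs. */
public class PortalUrlFilter {
    // Assumed channel-command parameters that change portal state;
    // verify these names against your uPortal version.
    private static final List<String> COMMAND_PARAMS = Arrays.asList(
            "uP_detach_target", "uP_remove_target", "uP_edit_target",
            "uP_help_target", "uP_about_target");

    /** True if the crawler should follow and index this URL. */
    public static boolean isIndexable(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath() == null ? "" : uri.getRawPath();
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        // Non-idempotent tags change on every request and can trap the crawler in a loop.
        if (lastSegment.startsWith("tag.") && !lastSegment.startsWith("tag.idempotent.")) {
            return false;
        }
        String query = uri.getRawQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                String name = pair.split("=", 2)[0];
                if (COMMAND_PARAMS.contains(name)) {
                    return false; // following this link would execute a channel command
                }
            }
        }
        return true;
    }

    /** Normalizes a URL for de-duping by sorting its query parameters. */
    public static String dedupeKey(String url) {
        URI uri = URI.create(url);
        List<String> params = new ArrayList<>();
        String query = uri.getRawQuery();
        if (query != null && !query.isEmpty()) {
            params.addAll(Arrays.asList(query.split("&")));
        }
        Collections.sort(params);
        String base = uri.getScheme() + "://" + uri.getRawAuthority() + uri.getRawPath();
        return params.isEmpty() ? base : base + "?" + String.join("&", params);
    }

    public static void main(String[] args) {
        String host = "http://snap.uci.edu/";
        System.out.println(isIndexable(host + "tag.idempotent.render.userLayoutRootNode.uP"));
        System.out.println(isIndexable(host + "tag.4f2a.render.userLayoutRootNode.uP"));
        System.out.println(dedupeKey(host + "p?b=2&a=1"));
    }
}
```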
What to search: index/start page • Issues: • A user's layout cannot serve as the search engine's starting point: a typical layout does not contain all the channels • We need a page with 'idempotent' links to all the searchable channels • Solution: • a Searchable Channel Index channel
What to search: searchable vs. non-searchable content • Issue: • Not all channels need to be included in the search • Solution: • We added a 'searchable' attribute to all the channels
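The output of a Searchable Channel Index channel can be sketched as a plain HTML list of idempotent links, one per searchable channel, which the search engine then uses as its start page. This is a hypothetical sketch: the Channel fields, the link format and in particular the "uP_pubId" parameter (standing in for whatever parameter a modified UserInstance would accept for a published channel ID) are assumptions, not uPortal's API.

```java
/** Hypothetical sketch of the HTML a "Searchable Channel Index" channel could emit. */
public class SearchableChannelIndex {

    /** Minimal channel description; fields are illustrative. */
    public static final class Channel {
        final String publishId;
        final String title;

        public Channel(String publishId, String title) {
            this.publishId = publishId;
            this.title = title;
        }
    }

    // Hypothetical link format: idempotent tag plus an assumed "uP_pubId" parameter.
    static String linkFor(String baseUrl, Channel c) {
        return baseUrl + "/tag.idempotent.render.userLayoutRootNode.uP?uP_pubId=" + c.publishId;
    }

    /** Escapes text for use in HTML element content and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Renders one link per channel; the crawler starts from this page. */
    public static String render(String baseUrl, java.util.List<Channel> channels) {
        StringBuilder html = new StringBuilder("<ul>\n");
        for (Channel c : channels) {
            html.append("  <li><a href=\"").append(escape(linkFor(baseUrl, c)))
                .append("\">").append(escape(c.title)).append("</a></li>\n");
        }
        return html.append("</ul>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(render("http://snap.uci.edu", java.util.Arrays.asList(
                new Channel("42", "Campus News"),
                new Channel("43", "Library Hours & Locations"))));
    }
}
```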
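Selecting channels by the added 'searchable' attribute can be sketched with the JDK's DOM parser. The XML shape below (channel elements carrying chanID and searchable attributes) is an assumption for illustration, and treating a missing attribute as "not searchable" is a design choice of this sketch, not necessarily what UC Irvine implemented.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical filter returning the IDs of channels marked searchable="true". */
public class SearchableChannels {

    public static List<String> searchableChannelIds(String channelXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(channelXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse channel XML", e);
        }
        NodeList channels = doc.getElementsByTagName("channel");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < channels.getLength(); i++) {
            Element ch = (Element) channels.item(i);
            // getAttribute returns "" when the attribute is absent, so
            // channels without the attribute are treated as not searchable.
            if ("true".equals(ch.getAttribute("searchable"))) {
                ids.add(ch.getAttribute("chanID"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<channels>"
                + "<channel chanID=\"1\" searchable=\"true\"/>"
                + "<channel chanID=\"2\" searchable=\"false\"/>"
                + "<channel chanID=\"3\"/>"
                + "</channels>";
        System.out.println(searchableChannelIds(xml)); // [1]
    }
}
```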
Generating links to channels • Problem: channel instance (subscribed) IDs vary from user to user, so links in search results are inconsistent • Solutions: link to channels using • global (published) IDs (requires code changes) • functional names (fname): new functionality, available in CVS (Concurrent Versions System)
Linking to channels via their published IDs: implementation plan • Modify org/jasig/portal/UserInstance.java to recognize that the user is requesting a published channel that may not be in the user’s layout • Create a temporary hidden folder in the user’s layout to store “temporary” channels (make sure to delete this folder before the layout is saved to the database) • Add XML channel definitions to this hidden folder • Proceed to render as usual
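The difference between the three linking strategies can be sketched as three URL builders. All parameter names here are assumptions for illustration: "uP_root" for an instance ID and "uP_fname" for a functional name are believed to match uPortal 2.x conventions but should be verified, and "uP_pubId" is purely hypothetical, standing in for whatever a modified UserInstance would accept.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical link builders contrasting the three channel-linking strategies. */
public class ChannelLinks {
    private static final String RENDER = "/tag.idempotent.render.userLayoutRootNode.uP";

    /** Instance (subscribed) ID: differs per user, so unsuitable for a shared index. */
    public static String byInstanceId(String base, String instanceId) {
        return base + RENDER + "?uP_root=" + instanceId; // assumed parameter name
    }

    /** Published (global) ID: stable across users; "uP_pubId" is hypothetical. */
    public static String byPublishedId(String base, int chanId) {
        return base + RENDER + "?uP_pubId=" + chanId;
    }

    /** Functional name: stable and human-readable; assumes a "uP_fname" parameter. */
    public static String byFname(String base, String fname) {
        try {
            return base + RENDER + "?uP_fname=" + URLEncoder.encode(fname, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String base = "http://snap.uci.edu";
        // Two users subscribed to the same published channel may see different instance IDs.
        System.out.println(byInstanceId(base, "n12"));
        System.out.println(byInstanceId(base, "n37"));
        // Both users resolve to the same link with a published ID or fname.
        System.out.println(byPublishedId(base, 42));
        System.out.println(byFname(base, "campus news"));
    }
}
```

The published-ID and fname variants produce one link per channel regardless of who is logged in, which is what a shared search index needs.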
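Steps two and three of the plan, plus the cleanup that must happen before saving, can be sketched as DOM operations on the layout document. This is a hypothetical sketch, not the actual UserInstance change: the folder ID "uci_temp_channels", the use of hidden="true", and the minimal channel attributes are assumptions, and a real channel definition would carry the full set of published attributes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper for holding published channels in a hidden layout folder. */
public class TemporaryChannelFolder {
    // Hypothetical ID for the hidden folder.
    static final String TEMP_FOLDER_ID = "uci_temp_channels";

    public static Document parse(String layoutXml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(layoutXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse layout XML", e);
        }
    }

    /** Finds the hidden folder, creating it under the layout root if needed. */
    static Element tempFolder(Document layout) {
        NodeList folders = layout.getElementsByTagName("folder");
        for (int i = 0; i < folders.getLength(); i++) {
            Element f = (Element) folders.item(i);
            if (TEMP_FOLDER_ID.equals(f.getAttribute("ID"))) {
                return f;
            }
        }
        Element f = layout.createElement("folder");
        f.setAttribute("ID", TEMP_FOLDER_ID);
        f.setAttribute("hidden", "true"); // not shown as a tab/column
        layout.getDocumentElement().appendChild(f);
        return f;
    }

    /** Adds a channel definition for a published channel not in the user's layout. */
    public static void addTemporaryChannel(Document layout, String chanId, String title) {
        Element ch = layout.createElement("channel");
        ch.setAttribute("chanID", chanId);
        ch.setAttribute("title", title);
        tempFolder(layout).appendChild(ch);
    }

    /** Must run before the layout is saved to the database. */
    public static void removeTemporaryChannels(Document layout) {
        NodeList folders = layout.getElementsByTagName("folder");
        // The NodeList is live, so iterate backwards while removing.
        for (int i = folders.getLength() - 1; i >= 0; i--) {
            Element f = (Element) folders.item(i);
            if (TEMP_FOLDER_ID.equals(f.getAttribute("ID"))) {
                f.getParentNode().removeChild(f);
            }
        }
    }

    public static void main(String[] args) {
        Document layout = parse("<layout><folder ID=\"s1\"/></layout>");
        addTemporaryChannel(layout, "42", "Campus News");
        System.out.println(layout.getElementsByTagName("channel").getLength()); // 1
        removeTemporaryChannels(layout);
        System.out.println(layout.getElementsByTagName("channel").getLength()); // 0
    }
}
```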
Page titles used in search results • Issues: • Out of the box, uPortal has a statically set page title (no matter what channel is viewed) • Search engines generally use page titles (or other metadata) for: • search result titles • result ranking • de-duping • Users have to be trained to enter meaningful page titles when creating documents/channels (e.g. do not start each page title with UCIrvine)
Page titles used in search results • Solution: • when channels are rendered in 'focused' or 'detached' mode, add the channel title to the default page title (the following is a fragment of webpages/stylesheets/org/jasig/portal/layout/tab-column/nested-tables/nested-tables.xsl):
    <xsl:template match="layout_fragment">
      ...
      <title>
        <xsl:value-of select="$windowTitle"/>
        <xsl:value-of select="concat(': ', content//channel/@description)"/>
      </title>
      ...
    </xsl:template>

    <xsl:template match="layout">
      ...
      <title>
        <xsl:value-of select="$windowTitle"/>
        <xsl:if test="//focused">
          <xsl:value-of select="concat(': ', //focused/channel/@description)"/>
        </xsl:if>
      </title>
      ...
    </xsl:template>
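To check the title logic outside the portal, the "layout" half of the fragment can be run through the JDK's built-in XSLT processor. This is a minimal sketch: the embedded stylesheet keeps only the title logic, and the default windowTitle value 'SNAP' and the sample layout documents are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Runs a trimmed-down copy of the "layout" title template against sample layouts. */
public class PageTitleDemo {
    // Only the <title> logic from the "layout" template; windowTitle default is assumed.
    private static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:param name=\"windowTitle\" select=\"'SNAP'\"/>"
      + "<xsl:template match=\"layout\">"
      + "<title><xsl:value-of select=\"$windowTitle\"/>"
      + "<xsl:if test=\"//focused\">"
      + "<xsl:value-of select=\"concat(': ', //focused/channel/@description)\"/>"
      + "</xsl:if></title>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms a layout document and returns the serialized title element. */
    public static String title(String layoutXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(layoutXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Focused channel: its description is appended to the default title.
        System.out.println(title(
            "<layout><focused><channel description=\"Campus News\"/></focused></layout>"));
        // No focused channel: only the default title is emitted.
        System.out.println(title("<layout><folder/></layout>"));
    }
}
```

Because each focused channel now yields a distinct title, the search engine gets a usable result title and a better signal for ranking and de-duping.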
Conclusions • There are tradeoffs when using either a built-in or a third-party search engine • We have yet to address the following issues: • searching restricted content • creating META tags to help the search engine rank content • Overall, our portal project could not succeed without a search function
Links • UC Irvine’s uPortal installation (SNAP): http://snap.uci.edu • This presentation: http://snap.uci.edu/PortalDocs/uPortal_Search.ppt