EAS313 Content Capture Technology Suite: EAI for the Web
Scott McReynolds, Sr Manager, scottmc@sybase.com / 925 236 4558
Prashanth Ponnachath, Software Engineer, pponnach@sybase.com / 925 236 6286
Date: 08/07/2003
Information Management Challenges
• The quantity of information within and outside enterprises has grown exponentially
• Relevant information must be extracted from a multitude of sources
• Extracted content may be in different formats and must be integrated (EAI issues)
Information Management Challenges
• Task-specific customization or personalization
• Combining data from several different sources into a new data source
• Data aggregation for mining and analysis
• Data bottled up by artificial network or security barriers
Existing Capture Methodologies by Other Vendors
Static data stored in databases
• Not equivalent to storing dynamic data
• Needs to be refreshed at regular intervals
• Legal problems
• More infrastructure investment
Existing Capture Methodologies by Other Vendors
Screen scraping
• Snooping the contents of a smart terminal's display memory through its auxiliary port
• Parsing the HTML with programs designed to mine out patterns of content
• Ugly and ad hoc; very likely to break on even minor changes to the format of the data being snooped
Content Capture Technology Suite (CCTS)
What does it do?
• Provides a set of APIs that capture dynamic content from a variety of sources into individual elements
• Deploys and replays captured elements in any portal framework
• Aggregates data from multiple sources into XML
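The last bullet, aggregating data from multiple sources into XML, can be illustrated with a minimal sketch. The slides do not show the XML that CCTS produces, so the element names (`aggregate`, `source`, `element`), the function name, and the sample values below are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def aggregate_to_xml(captured):
    """Build one XML document from {source_name: {element_name: value}}.

    The tag and attribute names are hypothetical, not a CCTS schema.
    """
    root = ET.Element("aggregate")
    for source_name, elements in captured.items():
        source = ET.SubElement(root, "source", name=source_name)
        for element_name, value in elements.items():
            item = ET.SubElement(source, "element", name=element_name)
            item.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Placeholder values standing in for elements captured from two sources.
captured = {
    "weather": {"wind_speed": "12 kt"},
    "tides": {"next_high": "06:40"},
}
xml_text = aggregate_to_xml(captured)
print(xml_text)
```

Once the individual elements sit in a single XML document, any consumer that understands XML can read them without knowing where each element originally came from.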
Technology Driving CCTS – Feature Extraction
Traditional extraction methodology
• Outside in, based on HTML tags
• Content feed breaks if the page changes slightly
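To make the fragility of outside-in extraction concrete, here is a small sketch that selects content purely by its position in the tag layout. It is not taken from any vendor's product; the helper names and the sample HTML are invented for illustration.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of every <td> cell in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data.strip()


def extract_by_position(html, index):
    """Outside-in extraction: trust the tag layout and return the Nth <td>."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.cells[index]


ORIGINAL = "<table><tr><td>Wind</td><td>12 kt</td></tr></table>"
# The publisher adds one decorative cell; the content itself is unchanged.
REDESIGNED = "<table><tr><td>*</td><td>Wind</td><td>12 kt</td></tr></table>"

print(extract_by_position(ORIGINAL, 1))    # the reading
print(extract_by_position(REDESIGNED, 1))  # the label, not the reading
```

The extractor does not fail loudly after the redesign. It silently returns the wrong cell, which is why a feed built this way breaks when the page changes slightly.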
Technology Driving CCTS – Feature Extraction
CCTS extraction methodology
• Inside out, based on features of the desired content
Technology Driving CCTS – Feature Extraction
Feature Extraction (FE) ensures reliability of content aggregation
• Parses the information on a page and breaks it down into specific components
• A fuzzy-logic “digital signature”, or symbolic reference, rather than a static link ensures persistent extraction of the desired content
• Pattern recognition through “object-specific” parsers enables an extensible set of aggregated objects
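The slides do not describe how the fuzzy "digital signature" is computed. As a rough illustration of the inside-out idea only, the sketch below locates content by a feature of the content itself (a nearby label, matched approximately) instead of by tag position. The use of `difflib.SequenceMatcher`, the 0.6 threshold, and every name in the code are assumptions, not the CCTS algorithm.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TextBlocks(HTMLParser):
    """Collects non-empty text fragments in document order, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.blocks.append(text)


def extract_by_feature(html, label, threshold=0.6):
    """Return the fragment that follows the fragment most similar to `label`.

    Returns None when no fragment is similar enough to the label.
    """
    parser = TextBlocks()
    parser.feed(html)
    parser.close()
    best_score, best_index = 0.0, None
    # The last fragment has no successor, so it cannot serve as a label.
    for i, block in enumerate(parser.blocks[:-1]):
        score = SequenceMatcher(None, label.lower(), block.lower()).ratio()
        if score > best_score:
            best_score, best_index = score, i
    if best_index is None or best_score < threshold:
        return None
    return parser.blocks[best_index + 1]


ORIGINAL = "<table><tr><td>Wind speed</td><td>12 kt</td></tr></table>"
# Different tags, an extra element, and a slightly reworded label.
REDESIGNED = "<div><span>*</span><b>Wind Speed:</b> <i>12 kt</i></div>"

print(extract_by_feature(ORIGINAL, "wind speed"))
print(extract_by_feature(REDESIGNED, "wind speed"))
```

Because the match is approximate and ignores markup, the same request keeps returning the reading after the page is restructured, which is the persistence property the bullets describe.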
Technology Driving CCTS – CCL
Content Collection Language (CCL)
• A ‘content bundle’ of everything needed to collect and play back the desired content
• Designed to be programmed through a user interface instead of by hand
• As simple as a URL, but as powerful as a web scripting language
Technology Driving CCTS – Navigation
• Tightly coupled with the Content Collection Language
• Written in Java
• Servlet-based and easily tied to a GUI
Technology Driving CCTS – CCL (continued)
• New commands are easily added; CCL is not a keyword-based language
• Can reside on the client or the server
• Parsing and error management are shared by all commands
• Fast execution
• Used to eliminate session/calls to the DB
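The slides do not show CCL syntax, so the following is only a sketch of the general design the bullets describe: commands are looked up in a registry instead of being keywords in a grammar, and one dispatcher handles parsing and errors for all of them. The `name(arg, arg)` statement form, the `set`/`get` commands, and all identifiers are hypothetical.

```python
import re

COMMANDS = {}


class CommandError(Exception):
    """Raised by the shared dispatcher for malformed or unknown statements."""


def command(name):
    """Register a function as a command; the parser itself never changes."""
    def register(func):
        COMMANDS[name] = func
        return func
    return register


# Hypothetical statement form: name(arg, arg, ...)
STATEMENT = re.compile(r"^\s*(\w+)\((.*)\)\s*$")


def run(statement, state):
    """Shared parsing and error handling used by every registered command."""
    match = STATEMENT.match(statement)
    if match is None:
        raise CommandError(f"cannot parse: {statement!r}")
    name, raw_args = match.groups()
    if name not in COMMANDS:
        raise CommandError(f"unknown command: {name}")
    args = [a.strip() for a in raw_args.split(",")] if raw_args.strip() else []
    try:
        return COMMANDS[name](state, *args)
    except TypeError as exc:
        # Most commonly a wrong number of arguments for the command.
        raise CommandError(f"bad arguments for {name}: {exc}") from exc


@command("set")
def set_value(state, key, value):
    state[key] = value
    return value


@command("get")
def get_value(state, key):
    return state[key]
```

Adding a command here means writing one decorated function. Nothing in `run` or in the statement pattern has to change, which is the practical meaning of "not a keyword-based language" in this sketch.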
CCTS Components
Content Capture Engine
• Takes in user input via a navigation GUI and generates the CCL or XML
Playback Engine
• Translates CCL statements into content
Content Repository Interface
• Deploys captured content into any portal repository
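How the three components could fit together is sketched below. None of these classes or method names come from the CCTS API, and the `source#feature` statement format is invented. The sketch only shows the division of labor: capture produces a statement, the repository stores the statement, and playback resolves it against live data each time.

```python
from dataclasses import dataclass, field


class CaptureEngine:
    """Turns a user's selection into a capture statement (hypothetical format)."""

    def capture(self, source, feature):
        return f"{source}#{feature}"


@dataclass
class PlaybackEngine:
    """Resolves a capture statement against the current source data."""
    sources: dict

    def play(self, statement):
        source, feature = statement.split("#", 1)
        return self.sources[source][feature]


@dataclass
class InMemoryRepository:
    """Stand-in for a portal repository; it stores statements, not content."""
    portlets: dict = field(default_factory=dict)

    def deploy(self, name, statement):
        self.portlets[name] = statement


# Placeholder data standing in for a live source.
live_sources = {"weather": {"wind_speed": "12 kt"}}

repository = InMemoryRepository()
repository.deploy("wind", CaptureEngine().capture("weather", "wind_speed"))

player = PlaybackEngine(live_sources)
print(player.play(repository.portlets["wind"]))

# The source changes; the deployed statement is replayed, not a stored copy.
live_sources["weather"]["wind_speed"] = "18 kt"
print(player.play(repository.portlets["wind"]))
```

Storing the statement instead of the captured value is what keeps the content dynamic: nothing has to be refreshed on a schedule, because each playback reads the source as it is now.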
CCTS Components
Content Capture Workbench
• Eclipse-based GUI that lets users capture and deploy content
• Reference implementation of the Capture and CRI APIs
• Design pattern that can be used as a reference for integrating any custom GUI with the CCTS API
Suite of Powerful Content Aggregation Tools
DataParts reduces the number of data tasks that require a programmer and makes the remaining tasks easy to accomplish.
EAI Tools
• Grid Charts
• Messaging Portlets
• Integrated Scripting Environment
• DataParts
Demo: Sailing Event Web Application
Scenario
• You are a portal developer for a company managing sailing events
• You are assigned the task of creating a portal containing the following information:
  • Race sites
  • Live weather information
  • Wind speed for the last 12 hours as a graph
  • Tide information as a graph
  • Marine weather