Explore the latest advancements in DirectInfo Documents, a robust document searching tool based on Oracle Text. Learn about speed enhancements, improved robustness, manageability, and future development. Experience a live demo of the enhanced features.
Googalize your Search with DirectInfo Documents
DirectInfo Documents - New Features
Author: Kiril Rusev, Software Architect, Semantec Bulgaria OOD
Semantec GmbH, Benzstr. 32, D-71083 Herrenberg, Germany
www.semantec.de
Agenda
• Motivation
• What is DirectInfo Documents?
• What's new?
• Live Demo
• Future development
Motivation - The Need
[Slide graphic: question marks only]
Motivation - The Challenge
[Diagram labels: Database Data, Local Files, Intranet, Email, Internet]
Motivation - The Answer
[Diagram labels: Document Files, Database Data, Web Contents, DirectInfo, Oracle Text Index, Structured Search Results]
What is DirectInfo?
• A framework based on Oracle Text
• Can index and search various data sources
• Can be extended
• Can be adjusted to the customer's needs
DirectInfo and Oracle Text
[Diagram with two groups, Oracle Text and DirectInfo; the assignment of labels to groups is not recoverable from the transcript]
• Custom defined document grouping
• Context indexes with USER_DATASTORE
• Fast and flexible searching
• Full control over the indexing
• A lot of context information
• Flexible and extensible filtering
• Summarizing capabilities
• Regular index management
• Effective caching mechanism
What is DirectInfo Documents?
• Based on the DirectInfo platform
• A powerful document searching tool
• A web-based "Google-like" application
• Easily managed and deployed
What's new?
• Speed improvement
• Robustness
• Manageability
• Functional improvements
• Look and feel (LF) and search results presentation improved
Speed improvement – Document Cache
[Diagram labels: User Datastore, PL/SQL Procedure, Document Cache (Store/Retrieve), HTML Filtering, NullFilter, PDF, HTML]
• Filtering is done only once
• The HTML version of the document is cached
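The caching idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the DirectInfo implementation: the class `DocumentCache`, the `to_html` callback and the stand-in converter `fake_pdf_to_html` are hypothetical names, and the cache is a plain in-memory dictionary rather than a database table.

```python
from typing import Callable, Dict


class DocumentCache:
    """Keeps the HTML rendition of each document so filtering runs only once.

    Hypothetical sketch: the real cache is presumably stored in the database.
    """

    def __init__(self, to_html: Callable[[bytes], str]) -> None:
        self._to_html = to_html              # the (expensive) filtering step
        self._html_by_id: Dict[str, str] = {}
        self.filter_calls = 0                # counts how often filtering ran

    def get_html(self, doc_id: str, raw: bytes) -> str:
        # Filter on the first request only; later requests hit the cache.
        if doc_id not in self._html_by_id:
            self.filter_calls += 1
            self._html_by_id[doc_id] = self._to_html(raw)
        return self._html_by_id[doc_id]


def fake_pdf_to_html(raw: bytes) -> str:
    """Stand-in for a real PDF-to-HTML filter (assumption for the example)."""
    return "<html><body>" + raw.decode("utf-8") + "</body></html>"


cache = DocumentCache(fake_pdf_to_html)
first = cache.get_html("doc-1", b"hello")
second = cache.get_html("doc-1", b"hello")   # served from the cache
```

Once the HTML version is cached, an indexing pass can read it back without invoking a filter again, which matches the NullFilter label in the diagram.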
Speed improvement – Faster Crawling
[Diagram labels: DirectInfo, Crawler Interface, File Crawler, Web Crawler, Other…, Local Files, Internet, Email]
Crawlers are adjusted according to the target document sources
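A common crawler interface with source-specific implementations might look roughly like the following. The names `Crawler`, `FileCrawler`, `ListCrawler` and `collect` are assumptions made for this sketch; the slide only shows that a crawler interface sits between DirectInfo and the individual crawlers.

```python
import os
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List


class Crawler(ABC):
    """Hypothetical common interface: every crawler yields document locations."""

    @abstractmethod
    def crawl(self) -> Iterator[str]:
        ...


class FileCrawler(Crawler):
    """Walks a local directory tree and yields file paths."""

    def __init__(self, root: str) -> None:
        self.root = root

    def crawl(self) -> Iterator[str]:
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in sorted(filenames):
                yield os.path.join(dirpath, name)


class ListCrawler(Crawler):
    """Stand-in for a web or email crawler: yields a fixed list of locations."""

    def __init__(self, locations: Iterable[str]) -> None:
        self.locations = list(locations)

    def crawl(self) -> Iterator[str]:
        yield from self.locations


def collect(crawlers: Iterable[Crawler]) -> List[str]:
    """Gathers locations from all crawlers through the shared interface."""
    found: List[str] = []
    for crawler in crawlers:
        found.extend(crawler.crawl())
    return found
```

Because the caller only depends on `crawl()`, a crawler tuned for one kind of source can be swapped in without touching the rest of the pipeline.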
Robustness – Better Filtering
[Diagram labels: Before: Datastore, INSO Filter, PDF, HTML. After: Datastore, XFilter (Filter 1, Filter 2, … Filter N), NULL Filter, PDF, HTML]
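The slide does not say how XFilter chooses among Filter 1 … Filter N. One plausible reading, shown here purely as an assumption, is a fallback chain in which each filter is tried in turn until one succeeds. All names (`FilterError`, `run_filter_chain`, `pdf_filter`, `text_filter`) are hypothetical, and `pdf_filter` does not really parse PDF.

```python
import html
from typing import Callable, List, Sequence

Filter = Callable[[bytes], str]


class FilterError(Exception):
    """Raised by a filter that cannot handle the given document."""


def run_filter_chain(raw: bytes, filters: Sequence[Filter]) -> str:
    """Tries each filter in order and returns the first successful HTML."""
    errors: List[str] = []
    for convert in filters:
        try:
            return convert(raw)
        except FilterError as exc:
            errors.append(str(exc))      # remember why this filter gave up
    raise FilterError("all filters failed: " + "; ".join(errors))


def pdf_filter(raw: bytes) -> str:
    # Placeholder only: checks the PDF magic bytes, does no real conversion.
    if not raw.startswith(b"%PDF"):
        raise FilterError("not a PDF")
    return "<html><body>PDF content</body></html>"


def text_filter(raw: bytes) -> str:
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise FilterError("not UTF-8 text") from exc
    return "<html><body>" + html.escape(text) + "</body></html>"
```

With a chain like this, one filter failing on a document does not abort processing, since the next filter gets a chance.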
Manageability - Indexing in Chunks
[Diagram labels: Before: Index, one Dtx_Ddl.Sync_Index call. After: Index, several Dtx_Ddl.Sync_Index calls (…). Callout: "Unstoppable !!!"]
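Indexing in chunks can be illustrated with a small loop that synchronizes a bounded batch at a time and checks a stop condition between batches. This is a sketch under assumptions: it does not call `Dtx_Ddl.Sync_Index` or any database API, and `sync_in_chunks`, `sync_chunk` and `should_stop` are hypothetical names.

```python
from typing import Callable, List, Sequence


def sync_in_chunks(
    pending: Sequence[str],
    chunk_size: int,
    sync_chunk: Callable[[Sequence[str]], None],
    should_stop: Callable[[], bool] = lambda: False,
) -> int:
    """Synchronizes pending documents batch by batch.

    Returns the number of documents processed. The stop condition is checked
    before each batch, so a long run can be interrupted at a batch boundary.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    done = 0
    for start in range(0, len(pending), chunk_size):
        if should_stop():
            break
        chunk = pending[start:start + chunk_size]
        sync_chunk(chunk)                # hypothetical per-chunk sync call
        done += len(chunk)
    return done


batches: List[Sequence[str]] = []
processed = sync_in_chunks(list("abcdefg"), 3, batches.append)
```

Smaller batches keep each synchronization step short, at the cost of more calls overall.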
Functional improvements - Duplicated Files Detection
[Diagram labels: Before: Found Files, Indexed Files. After: Found Files, Indexed Files]
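The slide does not state how duplicates are recognized. A common approach, used here only as an assumption, is to compare content hashes so that identical files found at different locations are indexed once. The function name `split_duplicates` and the sample paths are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def split_duplicates(
    files: Iterable[Tuple[str, bytes]],
) -> Tuple[List[str], Dict[str, str]]:
    """Separates found files into those to index and detected duplicates.

    Returns (paths_to_index, duplicates), where duplicates maps each
    duplicate path to the first path seen with identical content.
    """
    first_seen: Dict[str, str] = {}      # content digest -> first path
    to_index: List[str] = []
    duplicates: Dict[str, str] = {}
    for path, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest in first_seen:
            duplicates[path] = first_seen[digest]
        else:
            first_seen[digest] = path
            to_index.append(path)
    return to_index, duplicates


found = [("a.pdf", b"x"), ("copy/a.pdf", b"x"), ("b.pdf", b"y")]
to_index, duplicates = split_duplicates(found)
```

Here three files are found but only two distinct contents need indexing, and the duplicate keeps a pointer to the copy that was indexed.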
Look and feel (LF) and search results presentation improved
• Deferred fragments loading
• Skins support, XP look and feel
• Visual and functional redesign - HTML Frames
• Searching made simpler
Future development
• Defining and searching metadata
• Search results clustering
• Improved flexibility
• Improved administration
• Improved caching
• Better summarizing