Trends in Knowledge Retrieval Technology David Barton, Managing Director – North American Operations AIIM/ARMA Joint Breakfast Meeting March 20, 2002 Radnor Hotel, St. Davids, PA. Agenda. Background to document management and knowledge retrieval issues and technologies
Trends in Knowledge Retrieval Technology • David Barton, Managing Director – North American Operations • AIIM/ARMA Joint Breakfast Meeting • March 20, 2002 • Radnor Hotel, St. Davids, PA
Agenda • Background to document management and knowledge retrieval issues and technologies • The limitations of simple keyword searching • Alternative knowledge retrieval techniques and tools • How to incorporate these tools into EDM strategy & implementation • Demonstrations of searching, agent creation and other knowledge retrieval tools
Background to EDM and Knowledge Retrieval Issues and Technologies
More Data than Ever Before • Electronic Document/Knowledge Management applications were a focus of the 1990s • Evolving power of the Web and its importance to organizational and commercial success • Seven million pages added to the internet daily
Information Fatigue Syndrome • Reuters Business Information re: the malady of information overload and knowledge starvation • Information sharing is better but harnessing knowledge is still a problem • More data distributed than ever before • Proliferation of unstructured information • New challenges for knowledge workers
Valuable Unstructured Information • Structured Information: SQL, Oracle, dBase databases • Unstructured Information: paper, Word, e-mail, fax, pictures, spreadsheets, the Web
Electronic Document Management Systems Strengths • Take control of electronic and paper information • Authorization of access and document level security • Version control and revision rules • Effective naming and storage methods • Full audit trail facilities
Searching Limitations • Electronic Document Management systems typically offer limited search technology • [Chart: database growth (documents) over time, compared with the number of documents a user is willing to browse]
Technology: Current Approaches • Searching • Manual Tagging/XML • Collaborative • Filtering / Forms • [Slide labels: Costly, Defocused, Inaccurate]
Alternative Knowledge Retrieval Tools: Adaptive Pattern Recognition Processing (APRP)
Knowledge Workers’ Objectives • To be able to search the way they speak • Not to end up with unfocused queries that return thousands of hits, or with queries so narrow that they miss important results
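The trade-off between unfocused and overly narrow queries can be shown with a minimal keyword-matching sketch. The three-document collection, the document ids and the `keyword_search` helper are all invented for illustration; they are not part of any product named in this deck.

```python
# Hypothetical three-document collection, invented for illustration.
DOCS = {
    "memo-1": "quarterly report on retention of records",
    "memo-2": "records retention schedule for e-mail archives",
    "memo-3": "archive policy for electronic mail and retained documents",
}


def tokens(text):
    """Split text into a set of lowercase whitespace-separated words."""
    return set(text.lower().split())


def keyword_search(query, docs, require_all):
    """Return ids of documents containing all (AND) or any (OR) query words."""
    words = tokens(query)
    match = all if require_all else any
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if match(word in tokens(text) for word in words)
    )


# OR matching is unfocused: every document shares at least one word.
broad = keyword_search("records archive e-mail", DOCS, require_all=False)

# AND matching is too narrow: no document contains all three exact words,
# even though memo-2 and memo-3 are clearly about e-mail archiving.
narrow = keyword_search("records archive e-mail", DOCS, require_all=True)
```

Exact matching treats "archive" and "archives", or "e-mail" and "electronic mail", as unrelated strings, which is the gap that the pattern and concept techniques on the following slides aim to close.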
Adaptive Pattern Recognition Processing • Convera (Excalibur) pioneered knowledge retrieval solutions based on APRP more than 20 years ago • Advanced searching based on concepts, patterns and Boolean clues • Natural language querying: users are free to search the way they think, confident that the best documents will be at the top of the results list
Concept Searching • People naturally think in terms of concepts, not keywords, and their searches are often exploratory in nature • Concept Searching + Pattern Searching = fast, accurate and relevant results
The Semantic Network • Pinpoints all relevant meanings, with its 500,000-word dictionary and a database of 1.6 million word relationships • Can deliver cross-lingual search results • Recognizes that misspellings and alternative spellings are common
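Tolerance of misspellings can be illustrated with a small fuzzy-matching sketch. APRP itself is not described in enough detail here to reproduce, so the example swaps in a different, well-known technique: character-trigram overlap scored with the Dice coefficient. The function names, the vocabulary and the 0.5 threshold are assumptions chosen for the example.

```python
def trigrams(word):
    """Return the set of 3-character windows of a padded, lowercased word."""
    padded = f"  {word.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Dice coefficient over character trigrams (1.0 means identical sets)."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def fuzzy_find(query, vocabulary, threshold=0.5):
    """Return vocabulary words similar enough to the query, best match first."""
    scored = [(similarity(query, word), word) for word in vocabulary]
    return [word for score, word in sorted(scored, reverse=True) if score >= threshold]


# Hypothetical index vocabulary; the query is deliberately misspelled.
VOCABULARY = ["retrieval", "retrieve", "retention", "archive"]
matches = fuzzy_find("retreival", VOCABULARY)
```

Here the misspelled query "retreival" still finds "retrieval", which an exact keyword match would miss. Lowering the threshold trades precision for recall.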
Searching... • Pattern • Concept (or Semantic) • Boolean • Combination of the above!
Concept Searching... • [Diagram: semantic network centred on the word “Stock”, linked to terms such as Broth, Soup, Shares, Certificates, Breed and Herd, with link weights between 30% and 95%] • In English, “STOCK” has 26 different meanings • Semantic Network contains 500,000 terms with 1.6 million links
Enterprise Benefits • Increase overall productivity by eliminating costly duplicate efforts and unifying your worldwide workforce • Return complete and relevant search results • Find relevant answers to questions even when they are buried in various unrelated sources, and regardless of whether they are stored in structured or unstructured formats • Turn Intranet information into shared knowledge
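A weighted semantic network of this kind can be sketched as a small graph used for query expansion. This is an illustrative sketch only, not the vendor's implementation: the link weights below are invented (the slide's percentages cannot be mapped to specific term pairs), and `expand`, `concept_search` and the sample documents are hypothetical names and data.

```python
# Hypothetical link weights, invented for illustration.
SEMANTIC_LINKS = {
    "stock": {"shares": 0.9, "broth": 0.8, "herd": 0.7},
    "shares": {"certificates": 0.9},
    "broth": {"soup": 0.95},
    "herd": {"breed": 0.85},
}


def expand(term, min_weight=0.6, max_hops=2):
    """Map related terms to the product of link weights along the best path."""
    scores = {term: 1.0}
    frontier = [term]
    for _ in range(max_hops):
        next_frontier = []
        for current in frontier:
            for neighbour, weight in SEMANTIC_LINKS.get(current, {}).items():
                score = scores[current] * weight
                if score >= min_weight and score > scores.get(neighbour, 0.0):
                    scores[neighbour] = score
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return scores


def concept_search(term, docs):
    """Rank documents by the strongest expanded term they contain."""
    weights = expand(term)
    ranked = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        score = max((w for t, w in weights.items() if t in words), default=0.0)
        if score > 0:
            ranked.append((score, doc_id))
    return [doc_id for score, doc_id in sorted(ranked, reverse=True)]


# Hypothetical documents.
DOCS = {
    "a": "chicken soup recipe",
    "b": "shares and certificates issued",
    "c": "stock levels in the warehouse",
    "d": "weather report",
}
results = concept_search("stock", DOCS)
```

A one-word query for "stock" pulls in both the financial and the culinary senses, which shows why such a network needs the rest of the query, or other clues, to pick the intended meaning.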
Alternative Knowledge Retrieval Tools: Proprietary Pattern Matching
Knowledge Workers’ Objective A software infrastructure that automates the processing of unstructured information to power e-business applications across all digital domains
Autonomy A fundamental technological breakthrough, based on unique pattern-matching algorithms, to make sense of the vast amount of unstructured information on the Internet and on corporate intranets
The Solution: Proprietary Pattern Matching Technology
Proprietary Pattern Matching Technology • Based on research from Cambridge University • Algorithm to extract “concepts” from text and learn • Language independent • Significant intellectual property content
Technology • Central component is the Autonomy Dynamic Reasoning Engine (DRE) • Searching based on Adaptive Probabilistic Concept Modeling (APCM) • User Profiling • Personalization of information
User Profiling & Concept Agents • Autonomy learns the way that users interact with individual pieces of information to build an understanding of their interests and skills • Using Concept Agents to analyze ideas in the documents users read or produce, it then finds similar concepts in, for example, a set of websites, a newsfeed or an e-mail archive
[Diagram labels: Integration, Summarization, Visualization, Hyperlinking, Retrieval, Suggest, Aggregation, Categorization, Personalization, Alerting, Targeting, Clustering, Taxonomy, Visualization, Profiling, Community, Collaboration, Expertise Location]
Personalized Information • The user profiles are used to: deliver personalized information; create communities of interest; identify colleagues with useful expertise • Autonomy’s understanding of a user’s interests deepens and focuses with experience • As users’ needs and interests change, Autonomy automatically tracks those shifts
Personalization & Profiling • Explicit Agents, Personalization: TRAIN BY EXAMPLE/ NATURAL LANGUAGE/ KEYWORD; REFINE BY EXAMPLE; CUSTOMIZING; ALERTING/ COLLABORATION; CONTEXTUAL; INDIVIDUAL; CROSS-DEVICE; CROSS-DATA SOURCE/FORMAT • Implicit Agents, Profile: AUTOMATIC; MULTI-FACETED; CURRENT; TARGETING/ RECOMMENDING; ALERTING/ COLLABORATION
Search & Retrieval • Unique pattern matching that conceptually ‘profiles’ content • Derives an understanding of the context and meaning of information to analyze documents, extract ideas in the text and determine which are the most important
Retrieval • KEYWORD • BOOLEAN • OPERATORS • NATURAL LANGUAGE • CONCEPTUAL • REFINABLE • Legacy support • CROSS-DATA SOURCE/FORMAT • [Diagram label: Data DRE]
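The idea of a profile that is built from what a user reads and then used to select new material can be sketched as follows. Autonomy's APCM algorithm is proprietary and is not reproduced here; this sketch substitutes plain term-frequency vectors compared by cosine similarity. The `UserProfile` class, the stopword list and the sample texts are assumptions made for illustration.

```python
import math
from collections import Counter

# Small hypothetical stopword list.
STOPWORDS = {"the", "a", "of", "for", "and", "in", "to", "on"}


def term_vector(text):
    """Count the non-stopword terms in a piece of text."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class UserProfile:
    """Accumulates terms from documents a user has read or written."""

    def __init__(self):
        self.interests = Counter()

    def observe(self, text):
        """Fold one more document into the profile."""
        self.interests.update(term_vector(text))

    def recommend(self, candidates, top_n=1):
        """Return the candidates most similar to the accumulated interests."""
        ranked = sorted(
            candidates,
            key=lambda text: cosine(self.interests, term_vector(text)),
            reverse=True,
        )
        return ranked[:top_n]


profile = UserProfile()
profile.observe("records retention policy for e-mail")
profile.observe("e-mail archive retention schedule")

NEWS = [
    "cafeteria menu for the week",
    "new retention schedule for e-mail records",
]
suggested = profile.recommend(NEWS)
```

Because every call to `observe` updates the same counter, the profile shifts as the user's reading changes. The same similarity score, computed between two users' profiles, would be one simple way to suggest colleagues with overlapping interests.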
Automatic Hyperlinking • DOCUMENT, AGENT, PEOPLE, PRODUCT • CONTEXTUAL • AUTOMATIC • REALTIME • LIVE • CROSS-DATA SOURCE/FORMAT
Automatic Categorization • CONTEXTUAL • AUTOMATIC • REALTIME • SCALABLE • TAGGING/ ROUTING • CHANNELS: WEB INTERFACE • TRAIN BY EXAMPLE/ BOOLEAN • 7000 WEB CATEGORIES • 500 NEWS CATEGORIES • FTSE World Global CATEGORIES • CROSS-DATA SOURCE/FORMAT • [Diagram: Data DRE and Category DRE, with example category match scores of 90%, 60%, 30% and 10%]
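Train-by-example categorization with per-category match scores can be sketched in a few lines. This is a deliberately simple word-overlap scorer, not the DRE's method; the class name, the training examples and the 0.5 routing threshold are hypothetical.

```python
from collections import Counter


class ExampleTrainedCategorizer:
    """Scores text against categories learned from example documents."""

    def __init__(self):
        self.categories = {}

    def train(self, category, example_text):
        """Add the words of one example document to a category."""
        seen = self.categories.setdefault(category, Counter())
        seen.update(example_text.lower().split())

    def score(self, text):
        """Return, per category, the fraction of the text's words seen in training."""
        words = text.lower().split()
        if not words:
            return {}
        return {
            name: sum(w in seen for w in words) / len(words)
            for name, seen in self.categories.items()
        }

    def route(self, text, threshold=0.5):
        """Return the categories whose score reaches the threshold."""
        return sorted(name for name, s in self.score(text).items() if s >= threshold)


categorizer = ExampleTrainedCategorizer()
categorizer.train("finance", "stock shares dividend market")
categorizer.train("cooking", "stock soup broth recipe")

scores = categorizer.score("stock market shares rally")
channels = categorizer.route("stock market shares rally")
```

The scores play the role of the percentages in the diagram: a document can match several categories to different degrees, and a threshold decides which channels it is tagged for or routed to.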
Enterprise Benefits • Cost reduction & management through the automation of: • Content access, receipt, analysis & delivery • User profiling & content request profiling • Increased user productivity through searching & ease of use • Increased competitiveness through enhanced responsiveness, highly personalized targeting of products & content
How to incorporate knowledge retrieval tools into overall EDM strategy & implementation
Telemach Info International • The successor company of Thermo Info International, a division of Thermo Electron Corporation, a major US corporation based in Waltham, MA • Thermo Electron is a major investor in and shareholder of Telem@ch Info International • Telem@ch Info provides software solutions, backed by professional consultancy and support services, which encompass the following web-based technologies: Electronic Document/Content Management; Workflow; Knowledge Retrieval & Discovery; and Records Management.
Contact Information • David Barton – Managing Director, North American Operations (e-mail: david.barton@telemachinfo.com) • Telemach Info International Inc, 76 South Hamilton Street, Doylestown, PA 18901-4125, USA • Tel: 1-215-489-8500 • www.telemachinfo.com