
Avoiding Content Predicaments! Automatic 3Cs - Classification, Categorizing and Clustering

Avoiding Content Predicaments! Automatic 3Cs - Classification, Categorizing and Clustering. Thursday, 8 November 2007. Agenda. Introduction Content Management Automation: existing technologies The power of Expert System’s Semantic Technology: benefits, how it works


Presentation Transcript


  1. Avoiding Content Predicaments! Automatic 3Cs - Classification, Categorizing and Clustering Thursday, 8 November 2007

  2. Agenda • Introduction • Content Management Automation: existing technologies • The power of Expert System’s Semantic Technology: benefits, how it works • RCS MediaGroup: case study • Examples Luca Scagliarini - Expert System - 11-08-2007

  3. Knowledge Intensive Environment: Managing the Unmanageable The information that needs to be managed inside an organization continues to grow. • More sources (documents, news feeds, RSS, wikis, collaboration software, user-created content, etc.) • More content • Relevant information is buried in this mass of text Information is nothing if it's not managed.

  4. Content Predicaments Plague You? Consequences of information bottlenecks: • Outdated information • Information black holes • Lack of access to remote projects • Wasted time • Increased costs • Delayed projects

  5. The Cost of Content Predicaments The cost (direct and hidden) of manually organizing content continues to grow. The content management process is time-consuming, resource-intensive and mired in challenges that often don’t bring any benefits.

  6. Content Management Automation Effectively managed content for an organization means: • Information must be accessible to the right people when they need it. • The process of storing the content should not become a source of ineffectiveness. • Knowledge workers should be able to rapidly access the information that is relevant to them. Content management automation is mainly about effectiveness (not cost reduction). Rapid classification and quick access to or retrieval of documents are strategic for any organization.

  7. Content Management Automation: Examples • Automatic classification and filtering of news feeds, email, online information, etc. • Automatic classification of newly created documents entered in the document management system • Faceted search • Automatic management of customer or internal FAQs • Automatic tagging of user-created content • Etc.

  8. Approaches to Content Management Automation: Statistical Algorithms • Use the relative frequencies and distribution of keywords (or character sequences) to infer similarities among documents. • Require a training phase in which sample documents are classified manually so that they can drive the algorithm. • The selection of these sample documents is critical to success. • Plus: • Language independent • Minus: • Unable to distinguish between similar nodes in the taxonomy • Work well only on a limited taxonomy • Quality deteriorates quickly as the taxonomy grows more complex
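
The statistical approach on this slide can be sketched in a few lines. This is a minimal illustration, not any vendor's algorithm: it assumes a keyword-frequency profile per category and cosine similarity as the measure, and the function names and the tiny training set are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic keywords."""
    return re.findall(r"[a-z]+", text.lower())


def train(samples):
    """Build one keyword-frequency profile per category from manually labeled samples."""
    profiles = {}
    for category, text in samples:
        profiles.setdefault(category, Counter()).update(tokenize(text))
    return profiles


def cosine(a, b):
    """Cosine similarity between two keyword-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(profiles, text):
    """Return the category whose profile is most similar to the document."""
    doc = Counter(tokenize(text))
    return max(profiles, key=lambda category: cosine(profiles[category], doc))


# Hypothetical training samples, classified by hand (the "training phase").
TRAINING = [
    ("sports", "the team won the match after a late goal"),
    ("sports", "the coach praised the players after the match"),
    ("finance", "the bank raised interest rates and markets fell"),
    ("finance", "shares and bonds fell as the bank cut its forecast"),
]

profiles = train(TRAINING)
label = classify(profiles, "markets fell after the bank announcement")
print(label)
```

Because the result depends entirely on which keywords the samples happen to contain, the sketch also shows why sample selection matters and why categories with overlapping vocabulary are hard to tell apart.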

  9. Approaches to Content Management Automation: Automatic Clustering • A very simple approach that again uses a statistical algorithm to cluster documents automatically. • It doesn’t require initial training. • After the clustering is done, the user manually labels each cluster. • Plus: • Language independent • Doesn’t require training • Minus: • Very low quality • Clusters are often irrelevant • Taxonomy is decided only after clustering is done
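
A rough sketch of training-free clustering follows. It assumes a greedy single-pass grouping by keyword overlap (Jaccard similarity) with an arbitrary threshold of 0.2; these choices, the function names, and the sample documents are illustrative assumptions rather than a description of any specific product.

```python
import re


def keywords(text):
    """Return the set of lowercase alphabetic keywords in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of keywords the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(documents, threshold=0.2):
    """Assign each document to the most similar existing cluster, or start a new one."""
    clusters = []  # each entry: {"keywords": set of words, "members": list of indexes}
    for index, text in enumerate(documents):
        words = keywords(text)
        best = max(clusters, key=lambda c: jaccard(c["keywords"], words), default=None)
        if best is not None and jaccard(best["keywords"], words) >= threshold:
            best["members"].append(index)
            best["keywords"] |= words
        else:
            clusters.append({"keywords": set(words), "members": [index]})
    return [c["members"] for c in clusters]


# Hypothetical documents; no labels and no training samples are supplied.
DOCS = [
    "oil prices rise as supply falls",
    "football club signs new striker",
    "oil supply falls and prices rise again",
    "striker scores for football club",
]

groups = cluster(DOCS)
print(groups)
```

The output is only a list of unnamed groups, so someone still has to inspect each cluster and give it a label, which is the "taxonomy decided after clustering" drawback listed on the slide.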

  10. Approaches to Content Management Automation: Semantic Intelligence • Expert System’s semantic technology is based on an in-depth linguistic approach to unstructured information management. It analyzes textual information in a way that is similar to what people do when they read a document. • Since textual information is usually created to be read and understood by other people, an analytical approach similar to the one used by humans is the best option for obtaining high-quality results in the automatic processing of unstructured textual information. • The organization defines the taxonomy, and linguistic rules are defined to perform the categorization. • Minus: • Language dependent • Plus: • High precision and recall with simple or complex multilevel taxonomies • Overall quality similar to results obtained from manual classification
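
To make "the organization defines the taxonomy and rules perform the categorization" concrete, here is a deliberately simplified sketch. It is not Expert System's engine, which the slides describe as a deep linguistic analysis: the rules below are plain keyword conditions, and the taxonomy paths, rule format, and function name are all hypothetical.

```python
import re

# Hypothetical two-level taxonomy. Each node carries a hand-written rule:
# the document must contain at least one "any" term and none of the "none" terms.
RULES = {
    "Economy/Banking": {"any": {"bank", "loan", "interest"}, "none": {"river"}},
    "Economy/Energy": {"any": {"oil", "gas", "pipeline"}, "none": set()},
    "Sport/Football": {"any": {"football", "striker", "goal"}, "none": set()},
}


def categorize(text, rules=RULES):
    """Return every taxonomy path whose rule matches the document, in sorted order."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        path
        for path, rule in rules.items()
        if words & rule["any"] and not words & rule["none"]
    )


print(categorize("The bank raised its interest rate"))
print(categorize("They walked along the river bank"))
print(categorize("The oil pipeline deal was financed by a bank loan"))
```

The exclusion term on the banking node hints at why keyword rules alone are fragile: the word "bank" needs context to be interpreted, which is the gap a linguistic approach is meant to close.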

  11. How it works [Diagram labels: INPUT, PROCESSING, OUTPUT; PDF, folder, web, fax; recognition and comprehension; clustering; quick visualization and retrieval of contents] Semantic analysis of any text coming from different internal or external sources and created in different formats, resulting in: • Comprehension of the meaning of all the concepts based on the context • Identification of the relationships among the concepts and the semantic relevance of the concepts expressed in the text • Classification of the content based on concepts and relevancy (same as when we read a document) • Search and/or distribution of information to the different working areas and/or to different media (web, SMS, WAP, etc.)

  12. How can Expert System’s technology do that? Expert System’s proprietary semantic net provides a rich representation of the language: each concept is linked to the others through different semantic relationships. The semantic engine thoroughly analyzes sentences or whole documents and identifies the right meaning for each element found, eliminating possible ambiguities.
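
The idea of resolving ambiguity through linked concepts can be illustrated with a toy example. This sketch assumes a tiny hand-made net and a simple overlap count between a sense's related concepts and the surrounding words; the real semantic net and engine are proprietary and far richer, so every name and entry below is hypothetical.

```python
import re

# Toy semantic net: each sense of an ambiguous word is linked to related concepts.
SEMANTIC_NET = {
    "jaguar": {
        "jaguar (animal)": {"cat", "jungle", "prey", "predator"},
        "jaguar (car)": {"engine", "driver", "road", "dealer"},
    },
}


def disambiguate(word, sentence, net=SEMANTIC_NET):
    """Pick the sense whose related concepts overlap most with the sentence context.

    On a tie, the first sense listed in the net is returned.
    """
    context = set(re.findall(r"[a-z]+", sentence.lower())) - {word}
    senses = net[word]
    return max(senses, key=lambda sense: len(senses[sense] & context))


print(disambiguate("jaguar", "The jaguar stalked its prey in the jungle"))
print(disambiguate("jaguar", "The driver parked the Jaguar by the road"))
```

The same word receives a different meaning in each sentence because the context activates different neighbors in the net.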

  13. Demo

  14. Q & A

  15. RCS MediaGroup success story

  16. Expert System for RCS MediaGroup RCS is the leading group of associated publishers in Italy and one of the key groups in the European information and communication industry. It is active in all fields of press, book and newspaper publishing, holding a leading position in each. The group also operates successfully in multimedia publishing. RCS has chosen a solution for automatic categorization of incoming news.

  17. Document Management for a Complex Editorial Network OBJECTIVES • Optimize document management • Improve quality and reduce cost of article classification BENEFITS • Automatic categorization of all incoming news based on an international standard • Increased consistency in the categorization process • Reduction in operating costs

  18. Real Time Categorization Every day, hundreds of articles are processed automatically to identify their subject and to extract data related to places, companies, and people. The system chooses, instantly and with great precision, from more than one thousand categories to match the subject of the text, producing an appropriately structured template for the search engine's database. BENEFITS: Compared to manual cataloguing, Expert System’s solution is more consistent, works 24x7x365, operates steadily, integrates easily into the document pipeline, and produces superior final results more quickly and precisely than any manual system.

  19. Results in Categorization Before: • 15 people doing manual categorization • 84% quality in categorization After: • 2 people overseeing automatic categorization • Yearly savings of more than $650,000 • Automatic categorization effectiveness surpassed manual categorization (85% vs. 84%)

  20. Examples

  21. Examples Documents are automatically and instantaneously classified into pre-defined categories with a high level of precision. The user can find articles about one or more specific topics. Example: articles about the “Commercial aircraft 737-600”

  22. Examples Articles about “Defence Budget” AND “Australia”
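
Once documents carry categories, a combined query such as the one on this slide reduces to a set intersection. The sketch below assumes an inverted index from category to article identifiers; the article identifiers, the category assignments, and the function names are invented for illustration and do not come from the presentation.

```python
from collections import defaultdict

# Hypothetical articles, each already tagged with categories by the classifier.
ARTICLES = {
    1: {"Defence Budget", "Australia"},
    2: {"Defence Budget", "Canada"},
    3: {"Commercial aircraft 737-600"},
    4: {"Australia", "Tourism"},
}


def build_index(articles):
    """Map each category to the set of article ids tagged with it."""
    index = defaultdict(set)
    for article_id, categories in articles.items():
        for category in categories:
            index[category].add(article_id)
    return index


def search_all(index, *categories):
    """Return the ids of articles tagged with every requested category (logical AND)."""
    if not categories:
        return []
    matches = [index.get(category, set()) for category in categories]
    return sorted(set.intersection(*matches))


index = build_index(ARTICLES)
print(search_all(index, "Defence Budget", "Australia"))
print(search_all(index, "Commercial aircraft 737-600"))
```

A single-topic query is just the one-category case of the same function, and a category that was never assigned yields an empty result rather than an error.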

  23. Q & A

  24. Expert System • Privately held, profitable business with a staff of more than 130 employees, 65% of whom work in engineering and R&D • Undisputed market leader in language technology in Italy and supplier of language technology to Microsoft • Pioneer in applying semantic technologies to language applications, with hundreds of man-years in engineering and development • Customers in different segments, including enterprises, homeland security and government organizations • Offices in Italy, Germany and Northern California [Image labels: MQ 2007, 2006; Information Access Technology; “Quality and value in the management of unstructured information: tools based on semantic analysis”, 2007]

  25. Selected Private Sector Clients • Media & Publishing • Finance • IT & Software • Oil & Gas / Manufacturing • Telcos • Logistics

  26. Thank you Luca Scagliarini Cell +39 348 9043903 508 471 4981 lscagliarini@expertsystem.net www.expertsystem.net
