Building a Foundation for Info Apps

Tom Reamy, Chief Knowledge Architect, KAPS Group (Knowledge Architecture Professional Services); Program Chair, Text Analytics World. http://www.kapsgroup.com

Presentation Transcript


  1. Building a Foundation for Info Apps
     Tom Reamy
     Chief Knowledge Architect
     KAPS Group
     Program Chair – Text Analytics World
     Knowledge Architecture Professional Services
     http://www.kapsgroup.com

  2. Agenda
     • Introduction
     • A Semantic Platform – What and Why
     • Text Analytics – What and Why
     • Getting Started with Text Analytics
     • Building on the Platform:
       • Search
       • Range of Apps
     • Conclusion

  3. Introduction: KAPS Group
     • Knowledge Architecture Professional Services – Network of Consultants
     • Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies
     • Services:
       • Strategy – IM & KM – Text Analytics, Social Media, Integration
       • Taxonomy/Text Analytics development, consulting, customization
       • Text Analytics Quick Start – Audit, Evaluation, Pilot
       • Social Media: Text-based applications – design & development
     • Partners – Smart Logic, Expert Systems, SAS, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics
     • Clients:
       • Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, etc.
     • Presentations, Articles, White Papers – www.kapsgroup.com

  4. Building a Foundation for Info Apps: What is a Semantic Platform?
     • Semantic Layer = Taxonomies, Metadata, Vocabularies + Text Analytics – adding cognitive science
     • Technology Layer
       • Search, Content Management, SharePoint, Intranets
     • Publishing process, multiple users & info needs
       • Hybrid human-automatic structure (tagging)
     • Infrastructure – Not an Application
       • Business / Library / KM / EA and IT
     • Building on the Foundation
       • Info Apps (Search-based Applications)
     • Foundation of foundation – Text Analytics

  5. Building a Foundation for Info Apps: Why a Semantic Platform?
     • Search failed – lack of semantics
       • Results of Findwise survey – deep dissatisfaction
       • Ten years of development = ?
     • Content Management under-performing – lack of semantics
     • Taxonomy and Metadata – a solution, but failed
       • Taxonomy – formal model of a domain
       • Library science is good for some things – indexing, etc.
     • Semantics is about language, meaning, information
       • And structure = taxonomy Plus
       • Need cognitive science – how people think – Text Analytics
     • Solution = Strategic Vision + Quick Start

  6. Building a Foundation for Info Apps: Text Analytics Features
     • Noun Phrase Extraction / Fact Extraction
       • Catalogs with variants, rule-based dynamic
       • Relationships of entities – Ontologies of people-organizations, etc.
     • Sentiment Analysis – Products and Phrases
       • Statistics, Dictionaries, & rules – Positive and Negative
     • Summarization – replace snippets
     • Auto-categorization – built on a taxonomy
       • Training sets, Terms, Semantic Networks
       • Rules: AND, OR, NOT, DIST, PARAGRAPH, SENTENCE
       • Foundation – subjects, disambiguation, add intelligence to all
     • Ontologies – fact extraction + reasoning about relationships
     • Text Mining – NLP, machine learning, predictive analytics
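
The rule operators named on this slide (AND, OR, NOT, DIST, SENTENCE) differ in syntax and exact semantics from vendor to vendor. The following is only a minimal Python sketch of how such categorization rules might compose. The function names, the token-distance meaning of DIST, the naive sentence splitter, and the "merger" rule are all assumptions for illustration, not any product's actual rule language.

```python
import re

def tokens(text):
    """Lowercase word tokens, in document order."""
    return re.findall(r"[a-z0-9']+", text.lower())

def sentences(text):
    """Naive sentence split on . ! ? (an assumption, not real NLP)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def has(term):
    """Rule: the term occurs anywhere in the text."""
    return lambda text: term.lower() in tokens(text)

def AND(*rules):
    return lambda text: all(r(text) for r in rules)

def OR(*rules):
    return lambda text: any(r(text) for r in rules)

def NOT(rule):
    return lambda text: not rule(text)

def DIST(n, a, b):
    """Rule: terms a and b occur within n tokens of each other."""
    def rule(text):
        toks = tokens(text)
        pos_a = [i for i, t in enumerate(toks) if t == a.lower()]
        pos_b = [i for i, t in enumerate(toks) if t == b.lower()]
        return any(abs(i - j) <= n for i in pos_a for j in pos_b)
    return rule

def SENTENCE(a, b):
    """Rule: both terms occur in the same sentence."""
    def rule(text):
        return any(has(a)(s) and has(b)(s) for s in sentences(text))
    return rule

# Hypothetical rule for a "Mergers" taxonomy node.
merger_rule = AND(
    OR(has("merger"), has("acquisition")),
    DIST(5, "acquisition", "company"),
    NOT(has("rumor")),
)

doc = "The acquisition of the smaller company closed today. Shares rose."
print(merger_rule(doc))
```

A PARAGRAPH operator could follow the same pattern as SENTENCE, splitting on blank lines instead of sentence punctuation.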

  7. Building a Foundation for Info Apps: Adding Structure to Unstructured Content
     • How do you bridge the gap – taxonomy to documents?
     • Tagging documents with taxonomy nodes is tough
       • And expensive – central or distributed
     • Library staff – experts in categorization, not subject matter
       • Too limited, narrow bottleneck
       • Often don’t understand business processes and business uses
     • Authors – experts in the subject matter, terrible at categorization
       • Intra and Inter inconsistency, “intertwingleness”
       • Choosing tags from taxonomy – complex task
       • Folksonomy – almost as complex, wildly inconsistent
       • Resistance – not their job, cognitively difficult = non-compliance
     • Text Analytics is the answer(s)!

  8. Building a Foundation for Info Apps: Adding Structure to Unstructured Content
     • Text Analytics and Taxonomy Together – Platform
       • Text Analytics provides the power to apply the taxonomy
       • And metadata of all kinds
       • Consistent in every dimension, powerful and economic
     • Hybrid Model
       • Publish Document -> Text Analytics analysis -> suggestions for categorization, entities, metadata -> present to author
       • Cognitive task is simple -> react to a suggestion instead of select from head or a complex taxonomy
       • Feedback – if author overrides -> suggestion for new category
     • Facets – Requires a lot of Metadata – Entity Extraction feeds facets
     • Hybrid – Automatic is really a spectrum – depends on context
     • Automatic – adding structure at search results
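
The hybrid model on this slide (publish, analyze, suggest, let the author react, feed overrides back) can be sketched roughly as below. This is an illustrative Python outline only: the tiny `TAXONOMY`, the keyword-matching `suggest_categories` stand-in for a real text analytics engine, and the `review` function are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical taxonomy: category -> indicative terms.
TAXONOMY = {
    "Finance": {"budget", "invoice", "revenue"},
    "Legal": {"contract", "liability", "compliance"},
}

@dataclass
class TaggingResult:
    suggested: list                # what the software proposed
    final: list                    # what the author accepted or entered
    new_category_candidates: list = field(default_factory=list)

def suggest_categories(text):
    """Stand-in for the text analytics step: suggest every category
    that has at least one indicative term in the document."""
    words = set(text.lower().split())
    return sorted(c for c, terms in TAXONOMY.items() if words & terms)

def review(text, author_decision):
    """The author reacts to suggestions instead of tagging from scratch.
    Any tag the author supplies that is not in the taxonomy is kept as
    feedback: a candidate for a new category."""
    suggested = suggest_categories(text)
    final = author_decision(suggested)
    candidates = [c for c in final if c not in TAXONOMY]
    return TaggingResult(suggested, final, candidates)

doc = "Signed contract and first invoice attached"

accepted = review(doc, lambda suggestions: suggestions)          # author agrees
overridden = review(doc, lambda suggestions: ["Procurement"])    # author overrides

print(accepted.final)
print(overridden.new_category_candidates)
```

The author's task here is a single accept-or-override decision, which is the point the slide makes about reducing the cognitive load of tagging.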

  9. Quick Start for Text Analytics, Step 1: Start with Self Knowledge
     • Ideas – Content and Content Structure
       • Map of Content – Tribal language silos
       • Structure – articulate and integrate
       • Taxonomic resources
     • People – Producers & Consumers
       • Communities, Users, Central Team
     • Activities – Business processes and procedures
       • Semantics, information needs and behaviors
       • Information Governance Policy
     • Technology
       • CMS, Search, portals, text analytics
       • Applications – BI, CI, Semantic Web, Text Mining

  10. Quick Start for Text Analytics, Step 2: Software Evaluation – A Different Type of Evaluation
     • Traditional Software Evaluation – Start
       • Filter One – Ask Experts – reputation, research – Gartner, etc.
         • Market strength of vendor, platforms, etc.
         • Feature scorecard – minimum, must have, filter to top 6
       • Filter Two – Technology Filter – match to your overall scope and capabilities – Filter, not a focus
       • Filter Three – In-Depth Demo – 3-6 vendors
     • Reduce to 1-3 vendors
     • Vendors have different strengths in multiple environments
       • Millions of short, badly typed documents, Build application
       • Library 200-page PDF, enterprise & public search
     • Essential Step – POC or Pilot – search or first Info App

  11. Quick Start for Text Analytics, Step 3: Proof of Concept / Quick Start
     • POC – understand how text analytics can work in your environment
     • Learn the software – internal resources trained by doing
     • Learn the language – syntax (Advanced Boolean)
     • Learn categorization and extraction
     • Good categorization rules
       • Balance of general and specific
       • Balance of recall and precision
     • Develop or refine taxonomies for categorization
     • POC – can be the Quick Start or the First Application
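
The "balance of recall and precision" for a categorization rule can be measured during a POC by comparing the documents a rule tags against a hand-judged set. The sketch below uses the standard definitions (precision = correct tags / all tags, recall = correct tags / all relevant documents); the document IDs and the two example rules are made up for illustration.

```python
def precision_recall(tagged, relevant):
    """Return (precision, recall) for one category.

    tagged   -- IDs of documents the rule assigned to the category
    relevant -- IDs of documents a human judged to belong to it
    """
    tagged, relevant = set(tagged), set(relevant)
    hits = len(tagged & relevant)
    precision = hits / len(tagged) if tagged else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical test set: four documents truly belong to the category.
relevant = {"d1", "d2", "d3", "d4"}

# A general rule catches everything relevant but also a lot of noise.
broad_rule = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}

# A specific rule is always right but misses half the relevant documents.
narrow_rule = {"d1", "d2"}

print(precision_recall(broad_rule, relevant))
print(precision_recall(narrow_rule, relevant))
```

Here the general rule trades precision for recall and the specific rule does the reverse, which is the balance a rule author tunes against a test set.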

  12. Development, Implementation: Quick Start – First Application: Search and TA
     • Simple Subject Taxonomy structure
       • Easy to develop and maintain
     • Combined with categorization capabilities
       • Added power and intelligence
     • Combined with people tagging, refining tags
     • Combined with Faceted Metadata
       • Dynamic selection of simple categories
       • Allow multiple user perspectives
         • Can’t predict all the ways people think
         • Monkey, Banana, Panda
     • Combined with ontologies and semantic data
       • Multiple applications – Text mining to Search
     • Combine search and browse
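
As a rough illustration of "dynamic selection of simple categories" with faceted metadata, the Python sketch below filters a result set by the facet values a user has picked and recounts the remaining values for each facet. The documents, facet names, and helper functions are hypothetical; in practice the facet values would come from the categorization and entity extraction steps.

```python
from collections import Counter

# Hypothetical documents, already tagged with facet values.
DOCS = [
    {"title": "Q3 supplier review", "subject": "Procurement",
     "organization": "Acme", "doc_type": "Report"},
    {"title": "Acme master agreement", "subject": "Legal",
     "organization": "Acme", "doc_type": "Contract"},
    {"title": "Vendor onboarding guide", "subject": "Procurement",
     "organization": "Globex", "doc_type": "Guide"},
]

def filter_docs(docs, **selected):
    """Keep documents matching every facet value the user selected."""
    return [d for d in docs if all(d.get(f) == v for f, v in selected.items())]

def facet_counts(docs, facets):
    """Count the values of each facet across the current result set."""
    return {f: Counter(d[f] for d in docs if f in d) for f in facets}

# One user starts from the organization, another could start from the subject.
hits = filter_docs(DOCS, organization="Acme")
print([d["title"] for d in hits])
print(facet_counts(hits, ["subject", "doc_type"]))
```

Because any facet can be the starting point, the same small set of simple categories supports several user perspectives without a single deep hierarchy.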

  13. Building a Foundation for Info Apps: What are Info Apps?
     • Search-based Applications Plus
       • E-Discovery, Behavior Prediction, document duplication, BI & CI, etc.
     • Legal Review
       • Significant trend – computer-assisted review (manual = too many)
       • TA – categorize and filter to smaller, more relevant set
       • Payoff is big – One firm with 1.6M docs – saved $2M
     • Expertise Location
       • Data (HR, project) plus text – authored documents – subject & level
     • Financial Services
       • Combine unstructured text (why) and structured data (what)
       • Anti-Money Laundering

  14. Building a Foundation for Info Apps: Pronoun Analysis – Fraud Detection – Enron Emails
     • Patterns of “Function” words reveal wide range of insights
       • Function words = pronouns, articles, prepositions, conjunctions, etc.
       • Used at a high rate, short and hard to detect, very social, processed in the brain differently than content words
     • Areas: sex, age, power-status, personality – individuals and groups
     • Lying / Fraud detection: Documents with lies have
       • Fewer and shorter words, fewer conjunctions, more positive emotion words
       • More use of “if, any, those, he, she, they, you”, less “I”
       • More social and causal words, more discrepancy words
     • Current research – 76% accuracy in some contexts
     • Text Analytics can improve accuracy and utilize new sources
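
The function-word patterns described on this slide start from simple per-document rates. The Python sketch below only computes those rates; it does not detect lying, and the short `FUNCTION_WORDS` list and the sample sentence are assumptions chosen for illustration (a real study would use a full function-word dictionary and a trained model on top of these features).

```python
import re
from collections import Counter

# Tiny illustrative list, not a complete function-word dictionary.
FUNCTION_WORDS = {
    "i", "you", "he", "she", "they",          # pronouns
    "if", "any", "those",                     # words named on the slide
    "and", "but", "or",                       # conjunctions
    "the", "a", "of", "in",                   # articles, prepositions
}

def function_word_rates(text):
    """Return each function word's share of all tokens in the text."""
    toks = re.findall(r"[a-z']+", text.lower())
    if not toks:
        return {}
    counts = Counter(t for t in toks if t in FUNCTION_WORDS)
    return {word: n / len(toks) for word, n in counts.items()}

email = "I think they knew. If any of those reports were wrong, they would say so."
rates = function_word_rates(email)

# Print the most frequent function words first.
for word, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {rate:.3f}")
```

Rates like these, compared across authors or over time, are the kind of feature the slide suggests feeding into a fraud-detection model.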

  15. Conclusions
     • Info Apps are based on search, and search needs help
     • Text analytics with taxonomy & metadata = semantic platform
       • Formal and informal language and cognition
     • Semantic Infrastructure
       • Knowledge Audit -> Content, People, Technology, Processes
     • Strategic Vision
       • Integration of text analytics, search, content management
       • Hybrid Model of tagging – best of human & machine
     • Build integrated Info Apps
       • Platform vs. Apps = Yes
     • Think Big (Semantics), Build Small, Build Integrated

  16. Questions?
     Tom Reamy
     tomr@kapsgroup.com
     KAPS Group
     Knowledge Architecture Professional Services
     http://www.kapsgroup.com
