Building a Foundation for Info Apps. Tom Reamy, Chief Knowledge Architect, KAPS Group. Program Chair – Text Analytics World. Knowledge Architecture Professional Services. http://www.kapsgroup.com
Agenda • Introduction • A Semantic Platform – What and Why • Text Analytics – What and Why • Getting Started with Text Analytics • Building on the Platform: • Search • Range of Apps • Conclusion
Introduction: KAPS Group • Knowledge Architecture Professional Services – Network of Consultants • Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies • Services: • Strategy – IM & KM - Text Analytics, Social Media, Integration • Taxonomy/Text Analytics development, consulting, customization • Text Analytics Quick Start – Audit, Evaluation, Pilot • Social Media: Text based applications – design & development • Partners – Smart Logic, Expert Systems, SAS, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics • Clients: • Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, etc. • Presentations, Articles, White Papers – www.kapsgroup.com
Building a Foundation for Info Apps – What is a Semantic Platform? • Semantic Layer = Taxonomies, Metadata, Vocabularies + Text Analytics – adding cognitive science • Technology Layer • Search, Content Management, SharePoint, Intranets • Publishing process, multiple users & info needs • Hybrid human-automatic structure (tagging) • Infrastructure – Not an Application • Business / Library / KM / EA and IT • Building on the Foundation • Info Apps (Search-based Applications) • Foundation of foundation – Text Analytics
Building a Foundation for Info Apps – Why a Semantic Platform • Search Failed – lack of semantics • Results of Findwise survey – deep dissatisfaction • Ten years of development = ? • Content Management under-performing – lack of semantics • Taxonomy and Metadata – a solution, but it failed • Taxonomy – formal model of a domain • Library science good for some things – indexing, etc. • Semantics is about language, meaning, information • And structure = taxonomy Plus • Need cognitive science – how people think – Text Analytics • Solution = Strategic Vision + Quick Start
Building a Foundation for Info Apps – Text Analytics Features • Noun Phrase Extraction / Fact Extraction • Catalogs with variants, rule-based dynamic • Relationships of entities – Ontologies of people-organizations, etc. • Sentiment Analysis – Products and Phrases • Statistics, Dictionaries, & rules – Positive and Negative • Summarization – replace snippets • Auto-categorization – built on a taxonomy • Training sets, Terms, Semantic Networks • Rules: AND, OR, NOT, DIST, PARAGRAPH, SENTENCE • Foundation – subjects, disambiguation, add intelligence to all • Ontologies – fact extraction + reasoning about relationships • Text Mining – NLP, machine learning, predictive analytics
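The rule operators listed on this slide (AND, OR, NOT, DIST, SENTENCE) can be illustrated with a minimal Python sketch. This is not the syntax of any vendor's rule language; the helper names (`tokenize`, `has`, `dist`, `sentence_match`) and the example rule `is_merger_news` are hypothetical and exist only to show how Boolean and proximity conditions might combine into a categorization rule.

```python
import re

def tokenize(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())

def has(tokens, term):
    """True if the term occurs anywhere in the token list."""
    return term in tokens

def dist(tokens, a, b, n):
    """DIST: True if terms a and b occur within n tokens of each other."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

def sentence_match(text, a, b):
    """SENTENCE: True if a and b share a sentence (naive split on . ! ?)."""
    for sentence in re.split(r"[.!?]+", text):
        toks = tokenize(sentence)
        if a in toks and b in toks:
            return True
    return False

def is_merger_news(text):
    """Hypothetical rule:
    (merger OR acquisition) AND DIST(company, deal, 5) AND NOT rumor
    """
    toks = tokenize(text)
    return ((has(toks, "merger") or has(toks, "acquisition"))
            and dist(toks, "company", "deal", 5)
            and not has(toks, "rumor"))
```

A PARAGRAPH operator would follow the same pattern as `sentence_match`, splitting on blank lines instead of sentence punctuation. Real products typically add stemming, phrase matching, and weighting on top of this kind of logic.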
Building a Foundation for Info Apps – Adding Structure to Unstructured Content • How do you bridge the gap – taxonomy to documents? • Tagging documents with taxonomy nodes is tough • And expensive – central or distributed • Library staff – experts in categorization, not subject matter • Too limited, narrow bottleneck • Often don’t understand business processes and business uses • Authors – Experts in the subject matter, terrible at categorization • Intra and Inter inconsistency, “intertwingleness” • Choosing tags from taxonomy – complex task • Folksonomy – almost as complex, wildly inconsistent • Resistance – not their job, cognitively difficult = non-compliance • Text Analytics is the answer(s)!
Building a Foundation for Info Apps – Adding Structure to Unstructured Content • Text Analytics and Taxonomy Together – Platform • Text Analytics provides the power to apply the taxonomy • And metadata of all kinds • Consistent in every dimension, powerful and economic • Hybrid Model • Publish Document -> Text Analytics analysis -> suggestions for categorization, entities, metadata -> present to author • Cognitive task is simple -> react to a suggestion instead of selecting from one's head or from a complex taxonomy • Feedback – if author overrides -> suggestion for new category • Facets – Require a lot of Metadata - Entity Extraction feeds facets • Hybrid – Automatic is really a spectrum – depends on context • Automatic – adding structure at search results
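The hybrid model on this slide (analyze on publish, suggest, let the author react, feed overrides back) might be sketched as follows. Everything here is an illustrative assumption: the `TAXONOMY` contents, the keyword-overlap scoring in `suggest`, and the `TaggingLog` structure are hypothetical stand-ins for a real text analytics engine and content management workflow.

```python
from dataclasses import dataclass, field

# Hypothetical taxonomy: category -> indicative terms (illustrative only).
TAXONOMY = {
    "Finance": {"budget", "invoice", "revenue"},
    "Legal": {"contract", "liability", "compliance"},
}

@dataclass
class TaggingLog:
    """Records accepted suggestions and author overrides."""
    accepted: list = field(default_factory=list)
    new_category_candidates: list = field(default_factory=list)

def suggest(text, taxonomy=TAXONOMY):
    """Rank categories by how many of their terms appear in the text."""
    words = set(text.lower().split())
    scores = {cat: len(terms & words) for cat, terms in taxonomy.items()}
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [cat for cat, score in ranked if score > 0]

def review(doc_id, suggestions, author_choice, log):
    """Author reacts to suggestions; an override is logged as a
    candidate for a new category, as the slide describes."""
    if author_choice in suggestions:
        log.accepted.append((doc_id, author_choice))
    else:
        log.new_category_candidates.append((doc_id, author_choice))
    return author_choice
```

For example, `suggest("The contract limits liability and the budget")` ranks "Legal" ahead of "Finance"; if the author instead enters a tag that was not suggested, `review` records it in `new_category_candidates` for the taxonomy team to examine.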
Quick Start for Text Analytics – Step 1: Start with Self Knowledge • Ideas – Content and Content Structure • Map of Content – Tribal language silos • Structure – articulate and integrate • Taxonomic resources • People – Producers & Consumers • Communities, Users, Central Team • Activities – Business processes and procedures • Semantics, information needs and behaviors • Information Governance Policy • Technology • CMS, Search, portals, text analytics • Applications – BI, CI, Semantic Web, Text Mining
Quick Start for Text Analytics – Step 2: Software Evaluation: A Different Type of Evaluation • Traditional Software Evaluation - Start • Filter One – Ask Experts - reputation, research – Gartner, etc. • Market strength of vendor, platforms, etc. • Feature scorecard – minimum, must have, filter to top 6 • Filter Two – Technology Filter – match to your overall scope and capabilities – Filter not a focus • Filter Three – In-Depth Demo – 3-6 vendors • Reduce to 1-3 vendors • Vendors have different strengths in multiple environments • Millions of short, badly typed documents, Build application • Library 200-page PDF, enterprise & public search • Essential Step – POC or Pilot – search or first Info App
Quick Start for Text Analytics – Step 3: Proof of Concept / Quick Start • POC – understand how text analytics can work in your environment • Learn the software – internal resources trained by doing • Learn the language – syntax (Advanced Boolean) • Learn categorization and extraction • Good categorization rules • Balance of general and specific • Balance of recall and precision • Develop or refine taxonomies for categorization • POC – can be the Quick Start or the First Application
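The "balance of recall and precision" point can be made concrete with a small Python sketch that scores a categorization rule against a hand-tagged test set. The function name `precision_recall` and the document IDs are hypothetical; the definitions are the standard ones (precision = correct hits / all hits, recall = correct hits / all relevant documents).

```python
def precision_recall(predicted, relevant):
    """Score a rule's output against hand-tagged relevant documents.

    predicted: IDs of documents the rule assigned to the category
    relevant:  IDs of documents a human tagged with the category
    """
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical test set: documents 1-4 truly belong to the category.
relevant_docs = {1, 2, 3, 4}
broad_rule_hits = {1, 2, 3, 4, 5, 6, 7, 8}   # general rule: finds all, plus noise
narrow_rule_hits = {1, 2}                    # specific rule: clean but misses half
```

Under these assumed sets, the broad rule scores precision 0.5 and recall 1.0, while the narrow rule scores precision 1.0 and recall 0.5. Tuning rules during a POC is largely a matter of moving between those two extremes for each category.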
Development, Implementation – Quick Start – First Application: Search and TA • Simple Subject Taxonomy structure • Easy to develop and maintain • Combined with categorization capabilities • Added power and intelligence • Combined with people tagging, refining tags • Combined with Faceted Metadata • Dynamic selection of simple categories • Allow multiple user perspectives • Can’t predict all the ways people think • Monkey, Banana, Panda • Combined with ontologies and semantic data • Multiple applications – Text mining to Search • Combine search and browse
Building a Foundation for Info Apps – What are Info Apps? • Search-based Applications Plus • E-Discovery, Behavior Prediction, document duplication, BI & CI, etc. • Legal Review • Significant trend – computer-assisted review (manual = too many) • TA – categorize and filter to smaller, more relevant set • Payoff is big – One firm with 1.6 M docs – saved $2M • Expertise Location • Data (HR, project) plus text – authored documents – subject & level • Financial Services • Combine unstructured text (why) and structured data (what) • Anti-Money Laundering
Building a Foundation for Info Apps – Pronoun Analysis: Fraud Detection - Enron Emails • Patterns of “Function” words reveal wide range of insights • Function words = pronouns, articles, prepositions, conjunctions, etc. • Used at a high rate, short and hard to detect, very social, processed in the brain differently than content words • Areas: sex, age, power-status, personality – individuals and groups • Lying / Fraud detection: Documents with lies have • Fewer and shorter words, fewer conjunctions, more positive emotion words • More use of “if, any, those, he, she, they, you”, less “I” • More social and causal words, more discrepancy words • Current research – 76% accuracy in some contexts • Text Analytics can improve accuracy and utilize new sources
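As a rough illustration of the function-word signals listed on this slide, the Python sketch below computes three simple features from a text: the rate of first-person singular words, the rate of the marker words the slide names ("if, any, those, he, she, they, you"), and mean word length. The word lists and the function name `function_word_profile` are assumptions for illustration, not a validated lexicon, and the sketch only extracts features; it is not a lie detector and does not reproduce the accuracy figure cited above.

```python
import re
from collections import Counter

# Small illustrative word lists, not a validated lexicon.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
OTHER_MARKERS = {"if", "any", "those", "he", "she", "they", "you"}

def function_word_profile(text):
    """Return per-token rates of selected function words and mean word length."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {"first_person_rate": 0.0,
                "other_marker_rate": 0.0,
                "mean_word_length": 0.0}
    counts = Counter(tokens)
    first = sum(counts[w] for w in FIRST_PERSON)
    other = sum(counts[w] for w in OTHER_MARKERS)
    return {
        "first_person_rate": first / total,
        "other_marker_rate": other / total,
        "mean_word_length": sum(len(t) for t in tokens) / total,
    }
```

Features like these would normally be compared across many documents or authors and combined with other text analytics output before drawing any conclusion about a single message.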
Conclusions • Info Apps are based on search, and search needs help • Text analytics with taxonomy & metadata = semantic platform • Formal and informal language and cognition • Semantic Infrastructure • Knowledge Audit -> Content, People, Technology, Processes • Strategic Vision • Integration of text analytics, search, content management • Hybrid Model of tagging – best of human & machine • Build integrated Info Apps • Platform vs. Apps = Yes • Think Big (Semantics), Build Small, Build Integrated
Questions? Tom Reamy, tomr@kapsgroup.com. KAPS Group, Knowledge Architecture Professional Services. http://www.kapsgroup.com