Best of All Worlds Text Analytics and Text Mining and Taxonomy. Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com. Agenda. Text Analytics Introduction Text Analytics Text Mining Case Study – Taxonomy Development
Best of All Worlds Text Analytics and Text Mining and Taxonomy Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com
Agenda • Text Analytics Introduction • Text Analytics • Text Mining • Case Study – Taxonomy Development • Text Analytics, Text Mining, and Taxonomy • Text Analytics Applications – New Directions • Search & Info Apps • Expertise Analysis, Behavior Prediction, More • Conclusions
KAPS Group: General • Knowledge Architecture Professional Services – Network of Consultants • Partners – SAS, SAP, IBM, FAST, Smart Logic, Concept Searching • Attensity, Clarabridge, Lexalytics • Strategy – IM & KM – Text Analytics, Social Media, Integration • Services: • Taxonomy/Text Analytics development, consulting, customization • Text Analytics Quick Start – Audit, Evaluation, Pilot • Social Media: Text based applications – design & development • Clients: • Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, etc. • Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies • Presentations, Articles, White Papers – http://www.kapsgroup.com
Taxonomy, Text Mining, and Text Analytics – Text Analytics Features • Noun Phrase Extraction • Catalogs with variants, rule-based dynamic • Multiple types, custom classes – entities, concepts, events • Feeds facets • Summarization • Customizable rules, map to different content • Fact Extraction • Relationships of entities – people-organizations-activities • Ontologies – triples, RDF, etc. • Sentiment Analysis • Rules – Objects and phrases – positive and negative
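The "catalogs with variants" idea above can be illustrated with a minimal sketch. It assumes a small hand-built catalog that maps each canonical entity to a facet type and a list of surface variants; the catalog contents, the facet names, and the function `extract_facets` are all hypothetical and do not come from any particular text analytics product.

```python
import re
from collections import defaultdict

# Hypothetical catalog: canonical entity -> (facet type, known surface variants).
CATALOG = {
    "IBM": ("organization", ["IBM", "International Business Machines"]),
    "SAS": ("organization", ["SAS", "SAS Institute"]),
    "Python": ("programming_language", ["Python"]),
}

def extract_facets(text):
    """Return {facet type: sorted canonical entity names found in text}."""
    facets = defaultdict(set)
    for canonical, (facet, variants) in CATALOG.items():
        for variant in variants:
            # Whole-word, case-insensitive match on each variant.
            pattern = r"\b" + re.escape(variant) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                facets[facet].add(canonical)
                break  # one matching variant is enough for this entity
    return {facet: sorted(names) for facet, names in facets.items()}

doc = "International Business Machines and SAS Institute both ship Python APIs."
print(extract_facets(doc))
```

Each variant resolves to a single canonical name, so the facet values stay consistent no matter how an author wrote the entity.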
Taxonomy, Text Mining, and Text Analytics – Text Analytics Features • Auto-categorization • Training sets – Bayesian, Vector space • Terms – literal strings, stemming, dictionary of related terms • Rules – simple – position in text (Title, body, url) • Semantic Network – Predefined relationships, sets of rules • Boolean – Full search syntax – AND, OR, NOT • Advanced – DIST (#), PARAGRAPH, SENTENCE • This is the most difficult to develop • Build on a Taxonomy • Combine with Extraction • If any of list of entities and other words
Taxonomy, Text Mining, and Text Analytics – Case Study – Taxonomy Development • Problem – 200,000 new uncategorized documents • Old taxonomy – need one that reflects change in corpus • Text mining, entity extraction, categorization • Content – 250,000 large documents, search logs, etc. • Bottom Up – terms in documents – frequency, date • Clustering – suggested categories • Clustering – chunking for editors • Entity Extraction – people, organizations, programming languages • Time savings – only feasible way to scan documents • Quality – important terms, co-occurring terms
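A rough sketch of the bottom-up step, assuming "important terms" means terms that occur in many documents and "co-occurring terms" means pairs that share a document. The stopword list, the tokenizer, and the three sample documents are invented for illustration and are far simpler than what a corpus of this size would need.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "for", "is", "on", "with"}

def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def term_stats(docs):
    """Count in how many documents each term, and each term pair, appears."""
    term_df = Counter()
    pair_df = Counter()
    for doc in docs:
        terms = sorted(set(tokenize(doc)))  # unique terms, alphabetical
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))  # pairs come out alphabetically ordered
    return term_df, pair_df

docs = [
    "Taxonomy development for enterprise search",
    "Entity extraction and taxonomy development",
    "Clustering of search logs",
]
term_df, pair_df = term_stats(docs)
print(term_df.most_common(3))
print(pair_df[("development", "taxonomy")])
```

Frequent terms and frequent pairs are only candidates for taxonomy nodes; editors still decide which ones become categories.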
New Directions in Social Media – Text Analytics, Text Mining, and Predictive Analytics • Two Systems of the Brain • Fast, System 1, Immediate patterns (TM) • Slow, System 2, Conceptual, reasoning (TA) • Text Analytics – pre-processing for TM • Discover additional structure in unstructured text • Behavior Prediction – adding depth in individual documents • New variables for Predictive Analytics, Social Media Analytics • New dimensions – 90% of information • Text Mining for TA – Semi-automated taxonomy development • Bottom Up – terms in documents – frequency, date, clustering • Improve speed and quality – semi-automatic
Text Analytics and Taxonomy – Complementary Information Platform • Taxonomy provides a consistent and common vocabulary • Enterprise resource – integrated not centralized • Text Analytics provides consistent tagging • Human indexing is subject to inter- and intra-individual variation • Taxonomy provides the basic structure for categorization • And candidate terms • Text Analytics provides the power to apply the taxonomy • And metadata of all kinds • Text Analytics and Taxonomy Together – Platform • Consistent in every dimension • Powerful and economic
Taxonomy, Text Mining, and Text Analytics – Metadata – Tagging – the Problem • How do you bridge the gap – taxonomy to documents? • Tagging documents with taxonomy nodes is tough • And expensive – central or distributed • Library staff – experts in categorization, not subject matter • Too limited, narrow bottleneck • Often don’t understand business processes and business uses • Authors – Experts in the subject matter, terrible at categorization • Intra and Inter inconsistency, “intertwingleness” • Choosing tags from taxonomy – complex task • Folksonomy – almost as complex, wildly inconsistent • Resistance – not their job, cognitively difficult = non-compliance • Text Analytics is the answer(s)!
Taxonomy, Text Mining, and Text Analytics – Metadata Tagging – the Solution • Mind the Gap – Manual, Automatic, Hybrid • All require human effort – issue of where and how effective • Manual – human effort is tagging (difficult, inconsistent) • Automatic and Hybrid – human effort is prior to tagging • Build on expertise – librarians on categorization, SMEs on subject terms • Hybrid Model • Publish Document -> Text Analytics analysis -> suggestions for categorization, entities, metadata -> present to author • Cognitive task is simple -> react to a suggestion instead of selecting from memory or a complex taxonomy • Feedback – if author overrides -> suggestion for new category • Facets – Requires a lot of Metadata – Entity Extraction feeds facets • Hybrid – Automatic is really a spectrum – depends on context
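The hybrid loop (publish, suggest, author reacts, overrides feed back) might be sketched as follows. The function `suggest_category` is a hypothetical stand-in for the text analytics step and uses naive substring rules; the rule table, category names, and feedback log format are assumptions made only for this illustration.

```python
# Hypothetical rule table: category -> trigger terms (illustrative only).
RULES = {
    "Billing": ["invoice", "charge"],
    "Contracts": ["contract", "renewal"],
}

def suggest_category(text, rules=RULES):
    """Stand-in for the text analytics step: first category with a matching term."""
    lowered = text.lower()
    for category, terms in rules.items():
        if any(term in lowered for term in terms):
            return category
    return "Uncategorized"

def review(suggested, author_choice=None, feedback_log=None):
    """The author either accepts the suggestion (None) or overrides it.

    Overrides are appended to feedback_log as (suggested, chosen) pairs so
    taxonomy editors can treat them as candidates for new categories.
    """
    if author_choice is None or author_choice == suggested:
        return suggested
    if feedback_log is not None:
        feedback_log.append((suggested, author_choice))
    return author_choice

log = []
suggested = suggest_category("Customer asked about the contract expiration date")
accepted = review(suggested)                          # author accepts the suggestion
overridden = review(suggested, "Cancellations", log)  # author overrides it
print(suggested, accepted, overridden, log)
```

The author only reacts to one suggestion, and every override is kept as evidence for revising the rules or the taxonomy.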
Taxonomy, Text Mining, and Text Analytics – Applications: Search • Multiple Knowledge Structures • Facet – orthogonal dimension of metadata • Taxonomy – Subject matter / aboutness • Ontology – Relationships / Facts • Subject – Verb – Object • Software – Search, ECM, auto-categorization, entity extraction, Text Analytics and Text Mining • People – tagging, evaluating tags, fine-tuning rules and taxonomy • People – Users, social tagging, suggestions • Rich Search Results – context and conversation
Taxonomy, Text Mining, and Text Analytics – Applications: Search-Based Applications • Platform for Information Applications • Content Aggregation • Duplicate Documents – save millions! • Text Mining – BI, CI – sentiment analysis • Combine with Data Mining – disease symptoms, new • Predictive Analytics • Social – Hybrid folksonomy / taxonomy / auto-metadata • Social – expertise, categorize tweets and blogs, reputation • Ontology – travel assistant – SIRI • Use your Imagination!
Taxonomy, Text Mining, and Text Analytics – Applications: Expertise Analysis • Sentiment Analysis to Expertise Analysis (KnowHow) • Know How, skills, “tacit” knowledge • Experts write and think differently • Basic level is lower, more specific • Levels: Superordinate – Basic – Subordinate • Mammal – Dog – Golden Retriever • Furniture – chair – kitchen chair • Experts organize information around processes, not subjects • Build expertise categorization rules
Taxonomy, Text Mining, and Text Analytics – Expertise – application areas • Taxonomy / Ontology development / design – audience focus • Card sorting – non-experts use superficial similarities • Business & Customer intelligence – add expertise to sentiment • Deeper research into communities, customers • Text Mining – Expertise characterization of writer, corpus • eCommerce – Organization/Presentation of information – expert, novice • Expertise location – Generate automatic expertise characterization based on documents • Experiments – Pronoun Analysis – personality types • Essay Evaluation Software – Apply to expertise characterization • Model levels of chunking, procedure words over content
Beyond Sentiment: Behavior Prediction – Case Study – Telecom Customer Service • Problem – distinguish customers likely to cancel from mere threats • Analyze customer support notes • General issues – creative spelling, second hand reports • Develop categorization rules • First – distinguish cancellation calls – not simple • Second – distinguish cancel what – one line or all • Third – distinguish real threats
Beyond Sentiment – Behavior Prediction – Case Study • Basic Rule • (START_20, (AND, (DIST_7, "[cancel]", "[cancel-what-cust]"), (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]"))))) • Examples: • customer called to say he will cancell his account if the does not stop receiving a call from the ad agency. • cci and is upset that he has the asl charge and wants it offor her is going to cancel his act • ask about the contract expiration date as she wanted to cxltehacct • Combine sophisticated rules with sentiment statistical training and Predictive Analytics
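To make the basic rule concrete, here is a minimal sketch of how such a rule might be evaluated. It rests on assumed operator semantics: START_20 restricts matching to the first 20 tokens of the note, and DIST_n requires two concepts to occur within n tokens of each other. The term lists behind each bracketed concept are hypothetical; the real lists and the vendor's exact operator definitions are not given in the slides.

```python
import re

# Hypothetical term lists standing in for the bracketed concepts in the rule.
TERMS = {
    "cancel": {"cancel", "cancell", "cxl"},
    "cancel-what-cust": {"account", "acct", "act", "service"},
    "one-line": {"line"},
    "restore": {"restore", "reconnect"},
    "if": {"if"},
}

def positions(tokens, concept):
    """Token indexes at which any term of the concept occurs."""
    return [i for i, tok in enumerate(tokens) if tok in TERMS[concept]]

def dist(tokens, n, concept_a, concept_b):
    """True if some term of concept_a lies within n tokens of some term of concept_b."""
    return any(
        abs(i - j) <= n
        for i in positions(tokens, concept_a)
        for j in positions(tokens, concept_b)
    )

def likely_cancel(note):
    """Evaluate the basic rule under the assumed semantics."""
    # START_20: only the first 20 tokens are considered (assumed meaning).
    tokens = re.findall(r"[a-z]+", note.lower())[:20]
    near_target = dist(tokens, 7, "cancel", "cancel-what-cust")
    # NOT (DIST_10 "[cancel]" (OR one-line, restore, if)): exclude hedged mentions.
    hedged = any(dist(tokens, 10, "cancel", c) for c in ("one-line", "restore", "if"))
    return near_target and not hedged

print(likely_cancel("customer wants to cancel his account today"))
print(likely_cancel("customer called to say he will cancell his account if "
                    "the does not stop receiving a call from the ad agency."))
```

Under these assumptions the first note satisfies the rule, while the second is excluded because "if" falls within ten tokens of "cancell". A fused token such as "cxltehacct" would not match exact term lists like these, which is one reason the slides pair rules with statistical training.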
Beyond Sentiment – Wisdom of Crowds – Crowd Sourcing Technical Support • Example – Android User Forum • Develop a taxonomy of products, features, problem areas • Develop Categorization Rules: • “I use the SDK method and it isn't to bad a all. I'll get some pics up later, I am still trying to get the time to update from fresh 1.0 to 1.1.” • Find product & feature – forum structure • Find problem areas in response, nearby text for solution • Automatic – simply expose lists of “solutions” • Search Based application • Human mediated – experts scan and clean up solutions
Taxonomy, Text Mining, and Text Analytics – Conclusions • Text Analytics is an essential platform for multiple applications • Text Analytics and Text Mining and Taxonomy are mutually enriching approaches • Sentiment Analysis, Beyond Positive & Negative • New emotion taxonomies, context around terms • New applications – Expertise, behavior prediction, etc. • Future – new kinds of applications: • Enterprise Search – Hybrid ECM model with text analytics • Expertise Analysis, Behavior Prediction, and more • Social Media and Big Data built from TM & TA • NeuroAnalytics – cognitive science meets taxonomy and more • Watson is just the start
Questions? Tom Reamy tomr@kapsgroup.com KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com
Resources • Books • Women, Fire, and Dangerous Things • George Lakoff • Knowledge, Concepts, and Categories • Koen Lamberts and David Shanks • Formal Approaches in Categorization • Ed. Emmanuel Pothos and Andy Wills • The Mind • Ed John Brockman • Good introduction to a variety of cognitive science theories, issues, and new ideas • Any cognitive science book written after 2009
Resources • Conferences – Web Sites • Text Analytics World • http://www.textanalyticsworld.com • Text Analytics Summit • http://www.textanalyticsnews.com • Semtech • http://www.semanticweb.com
Resources • Blogs • SAS- http://blogs.sas.com/text-mining/ • LinkedIn Groups: • Text Analytics World • Text Analytics Group • Data and Text Professionals • Sentiment Analysis • Metadata Management • Semantic Technologies
Resources • Web Sites • Taxonomy Community of Practice: http://finance.groups.yahoo.com/group/TaxoCoP/ • Whitepaper – CM and Text Analytics - http://www.textanalyticsnews.com/usa/contentmanagementmeetstextanalytics.pdf • Whitepaper – Enterprise Content Categorization strategy and development – http://www.kapsgroup.com
Resources • Articles • Malt, B. C. 1995. Category coherence in cross-cultural perspective. Cognitive Psychology 29, 85-148 • Rifkin, A. 1985. Evidence for a basic level in event taxonomies. Memory & Cognition 13, 538-56 • Shaver, P., J. Schwartz, D. Kirson, C. O’Connor 1987. Emotion knowledge: further exploration of a prototype approach. Journal of Personality and Social Psychology 52, 1061-1086 • Tanaka, J. W. & M. E. Taylor 1991. Object categories and expertise: is the basic level in the eye of the beholder? Cognitive Psychology 23, 457-82