Taxonomy Boot Camp Panel Text Analytics. Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com. Agenda. Taxonomy and Text Analytics Search, Taxonomy, and Text Analytics Case Study – Taxonomy Development
Taxonomy Boot Camp Panel: Text Analytics. Tom Reamy, Chief Knowledge Architect, KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com
Agenda • Taxonomy and Text Analytics • Search, Taxonomy, and Text Analytics • Case Study – Taxonomy Development • Text Analytics as a Taxonomy tool • Case Studies – Expertise & Sentiment & Beyond • Future of Text Analytics and Taxonomy • Beyond Indexing - Categorization • Sentiment, Expertise, Ontologies
Taxonomy and Text Analytics: Text Analytics Features • Noun Phrase Extraction • Catalogs with variants, rule-based dynamic • Multiple types, custom classes – entities, concepts, events • Feeds facets • Summarization • Customizable rules, map to different content • Fact Extraction • Relationships of entities – people-organizations-activities • Ontologies – triples, RDF, etc. • Sentiment Analysis • Rules – Objects and phrases – positive and negative
Taxonomy and Text Analytics: Text Analytics Features • Auto-categorization • Training sets – Bayesian, Vector space • Terms – literal strings, stemming, dictionary of related terms • Rules – simple – position in text (Title, body, url) • Semantic Network – Predefined relationships, sets of rules • Boolean – Full search syntax – AND, OR, NOT • Advanced – DIST (#), PARAGRAPH, SENTENCE • This is the most difficult to develop • Build on a Taxonomy • Combine with Extraction • If any of a list of entities and other words
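The "Training sets – Bayesian" option above can be illustrated with a minimal sketch. It is a generic multinomial naive Bayes with add-one smoothing, not the implementation of any particular text analytics product; the category names and training snippets are hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Build per-category word counts from (category, text) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for category, text in labeled_docs:
        doc_counts[category] += 1
        word_counts[category].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the category with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)  # prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                count = word_counts[category][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best

# Hypothetical training set: two categories, two documents each.
TRAINING = [
    ("billing", "invoice charge refund payment"),
    ("billing", "late fee charge on invoice"),
    ("network", "dropped call signal outage"),
    ("network", "no signal tower outage"),
]
MODEL = train(TRAINING)
print(classify(MODEL, "refund the charge"))
print(classify(MODEL, "signal outage"))
```

A classifier like this needs no hand-written rules, but it only knows what the training set shows it, which is one reason the slides treat rule-based categorization as the more powerful (and more difficult) option.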
Search, Taxonomy, and Text Analytics: Elements • Multiple Knowledge Structures • Facet – orthogonal dimension of metadata • Taxonomy – Subject matter / aboutness • Categorization, clusters, entity extraction into facets • A Hybrid Model of ECM and Metadata • Authors, editors-librarians, Text Analytics • Submit a document -> TA generates metadata, extracts concepts, suggests categorization (keywords) -> author OKs (easy task) -> librarian monitors for issues • Use results as input into analytics • And/or Dynamic categorization-extraction at results time
Search, Taxonomy and Text Analytics Multiple Applications • Platform for Information Applications • Content Aggregation • Duplicate Documents – save millions! • Text Mining – BI, CI – sentiment analysis • Combine with Data Mining – disease symptoms, new • Predictive Analytics • Social – Hybrid folksonomy / taxonomy / auto-metadata • Social – expertise, categorize tweets and blogs, reputation • Ontology – travel assistant – SIRI • Use your Imagination!
Taxonomy and Text Analytics: Case Study – Taxonomy Development • Problem – 200,000 new uncategorized documents • Old taxonomy – need one that reflects change in corpus • Text mining, entity extraction, categorization • Content – 250,000 large documents, search logs, etc. • Bottom Up – terms in documents – frequency, date • Clustering – suggested categories • Clustering – chunking for editors • Entity Extraction – people, organizations, programming languages • Time savings – only feasible way to scan documents • Quality – important terms, co-occurring terms
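The bottom-up step in this case study (term frequency and co-occurring terms) can be sketched in a few lines. This is an illustrative sketch only, not the tooling used in the case study; the stopword list and the sample documents are hypothetical, and co-occurrence is assumed here to mean "appears in the same document".

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "on"}

def terms(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def term_frequencies(docs):
    """Count how often each term occurs across the corpus."""
    counts = Counter()
    for doc in docs:
        counts.update(terms(doc))
    return counts

def cooccurrences(docs):
    """Count unordered term pairs that appear in the same document."""
    pairs = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(terms(doc))), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical sample corpus.
DOCS = [
    "Java and Python for text mining",
    "Python text analytics",
    "Java search logs",
]
print(term_frequencies(DOCS).most_common(3))
print(cooccurrences(DOCS).most_common(3))
```

Frequent terms and strongly co-occurring pairs are candidate taxonomy nodes for an editor to review; the output is a starting point for human judgment, not a finished taxonomy.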
Taxonomy and Text Analytics Applications: Expertise Analysis • Sentiment Analysis to Expertise Analysis (KnowHow) • Know How, skills, “tacit” knowledge • Experts write and think differently • Basic level is lower, more specific • Levels: Superordinate – Basic – Subordinate • Mammal – Dog – Golden Retriever • Furniture – chair – kitchen chair • Experts organize information around processes, not subjects • Build expertise categorization rules
Expertise Analysis: Expertise – application areas • Taxonomy / Ontology development / design – audience focus • Card sorting – non-experts use superficial similarities • Business & Customer intelligence – add expertise to sentiment • Deeper research into communities, customers • Text Mining – Expertise characterization of writer, corpus • eCommerce – Organization/Presentation of information – expert, novice • Expertise location – Generate automatic expertise characterization based on documents • Experiments – Pronoun Analysis – personality types • Essay Evaluation Software – Apply to expertise characterization • Model levels of chunking, procedure words over content
Beyond Sentiment: Behavior Prediction. Case Study – Telecom Customer Service • Problem – distinguish customers likely to cancel from mere threats • Analyze customer support notes • General issues – creative spelling, second-hand reports • Develop categorization rules • First – distinguish cancellation calls – not simple • Second – distinguish cancel what – one line or all • Third – distinguish real threats
Beyond Sentiment: Behavior Prediction – Case Study • Basic Rule: (START_20, (AND, (DIST_7, "[cancel]", "[cancel-what-cust]"), (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]"))))) • Examples: • customer called to say he will cancell his account if the does not stop receiving a call from the ad agency. • cci and is upset that he has the asl charge and wants it offor her is going to cancel his act • ask about the contract expiration date as she wanted to cxltehacct • Combine sophisticated rules with sentiment statistical training and Predictive Analytics
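To make the structure of the basic rule above concrete, here is a minimal evaluator sketch. Several things are assumptions rather than anything stated on the slide: START_n is read as "look only at the first n tokens", DIST_n as "the two operands occur within n tokens of each other", and the bracketed names as concept lists. The slide does not give the members of those lists, so the vocabularies below are hypothetical, and the operator semantics of the actual categorization product may differ.

```python
import re

# Hypothetical concept vocabularies; the slide names the lists, not their members.
CONCEPTS = {
    "cancel": {"cancel", "cancell", "cxl", "disconnect"},
    "cancel-what-cust": {"account", "acct", "act", "service"},
    "one-line": {"line"},
    "restore": {"restore", "reinstate"},
    "if": {"if", "unless"},
}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def positions(expr, tokens):
    """Token indexes where a concept, or an OR of concepts, occurs."""
    if isinstance(expr, str):
        vocab = CONCEPTS[expr]
        return {i for i, tok in enumerate(tokens) if tok in vocab}
    op, *args = expr
    if op == "OR":
        return set().union(*(positions(a, tokens) for a in args))
    raise ValueError(f"{op} is not supported as a DIST operand in this sketch")

def matches(rule, tokens):
    """Evaluate a nested-tuple rule against a token list."""
    if isinstance(rule, str):
        return bool(positions(rule, tokens))
    op, *args = rule
    if op == "START":  # assumed: restrict to the first n tokens
        limit, inner = args
        return matches(inner, tokens[:limit])
    if op == "DIST":   # assumed: operands within n tokens of each other
        limit, left, right = args
        lpos, rpos = positions(left, tokens), positions(right, tokens)
        return any(abs(i - j) <= limit for i in lpos for j in rpos)
    if op == "AND":
        return all(matches(a, tokens) for a in args)
    if op == "OR":
        return any(matches(a, tokens) for a in args)
    if op == "NOT":
        return not matches(args[0], tokens)
    raise ValueError(f"unknown operator: {op}")

# The basic rule from the slide, transcribed as nested tuples.
BASIC_RULE = ("START", 20, ("AND",
    ("DIST", 7, "cancel", "cancel-what-cust"),
    ("NOT", ("DIST", 10, "cancel", ("OR", "one-line", "restore", "if")))))

def is_cancellation(note):
    return matches(BASIC_RULE, tokenize(note))

print(is_cancellation("cust wants to cancel his acct today"))
print(is_cancellation("customer will cancel his account if the calls do not stop"))
```

Under these assumptions the NOT clause is what separates a plain cancellation from a conditional threat or a single-line cancellation. Exact token matching like this would miss fused spellings such as "cxltehacct", which shows why creative spelling is listed as a general issue for this kind of content.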
Beyond Sentiment - Wisdom of Crowds: Crowd Sourcing Technical Support • Example – Android User Forum • Develop a taxonomy of products, features, problem areas • Develop Categorization Rules: • “I use the SDK method and it isn't to bad a all. I'll get some pics up later, I am still trying to get the time to update from fresh 1.0 to 1.1.” • Find product & feature – forum structure • Find problem areas in response, nearby text for solution • Automatic – simply expose lists of “solutions” • Search Based application • Human mediated – experts scan and clean up solutions
Text Analytics Development: Best Practices - Principles • Categorization taxonomy structure • Tradeoff of depth and complexity of rules • Multiple avenues – facets, terms, rules, etc. • No right balance • Recall-precision balance is application specific • Training sets as starting points, rules rule • Need for custom development • Different kinds of taxonomies • Sentiment – products and features • Expertise – process • Categorization – smaller – power in categorization rules • Facets – combine – more orthogonal categories
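The recall-precision tradeoff mentioned above can be made concrete with a short sketch. The document IDs and the two rule outputs are hypothetical; the formulas are the standard definitions (precision = correct hits / all hits, recall = correct hits / all relevant documents).

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) for a set of predicted document IDs."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: documents 1-4 truly belong to the category.
RELEVANT = {1, 2, 3, 4}
BROAD_RULE_HITS = {1, 2, 3, 4, 5, 6}   # loose rule: finds everything, plus noise
STRICT_RULE_HITS = {1, 2}              # tight rule: no noise, but misses half

print(precision_recall(BROAD_RULE_HITS, RELEVANT))
print(precision_recall(STRICT_RULE_HITS, RELEVANT))
```

Which rule is "better" depends on the application: a discovery or compliance use case may prefer the broad rule's recall, while an automated tagging pipeline with no human review may prefer the strict rule's precision.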
Taxonomy and Text Analytics: Conclusions • Text Analytics (entity extraction, auto-categorization, sentiment analysis) is an essential platform • Text Analytics adds a new dimension to taxonomy • Taxonomists are an essential resource – understand information structure • Enterprise Search – Hybrid ECM model with text analytics • Future – new kinds of applications: • Text Mining and Data mining, research tools, sentiment • Social Media – multiple sources for multiple applications • Beyond Sentiment – expertise applications, behavior • NeuroAnalytics – cognitive science meets taxonomy and more • Watson is just the start
Questions? Tom Reamy, tomr@kapsgroup.com, KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com