Text Analytics And Text Mining: Best of Text and Data. Tom Reamy, Chief Knowledge Architect, KAPS Group, Knowledge Architecture Professional Services. http://www.kapsgroup.com
Agenda • Text Analytics Capabilities • Text Analytics Applications • Text Mining and Text Analytics • Data and Unstructured Content • Case Study – Text Mining for Taxonomy Development • Conclusion
KAPS Group: General • Knowledge Architecture Professional Services • Virtual Company: Network of consultants – 8-10 • Partners – SAS, Smart Logic, Microsoft-FAST, Concept Searching, etc. • Consulting, Strategy, Knowledge architecture audit • Services: • Text Analytics evaluation, development, consulting, customization • Knowledge Representation – taxonomy, ontology, Prototype • Metadata standards and implementation • Knowledge Management: Collaboration, Expertise, e-learning • Applied Theory – Faceted taxonomies, complexity theory, natural categories
Introduction to Text Analytics: Text Analytics Features • Noun Phrase Extraction • Catalogs with variants, rule-based dynamic • Multiple types, custom classes – entities, concepts, events • Feeds facets • Summarization • Customizable rules, map to different content • Fact Extraction • Relationships of entities – people-organizations-activities • Ontologies – triples, RDF, etc. • Sentiment Analysis • Statistical, rules – full categorization set of operators
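The "catalogs with variants" idea can be sketched in a few lines. This is only an illustration, not how any vendor platform implements it: the catalog contents, the entity type label, and the function name `extract_entities` are all assumptions made for the example.

```python
import re

# Hypothetical catalog: canonical entity -> (entity type, known variants).
CATALOG = {
    "International Business Machines": ("organization", ["IBM", "I.B.M."]),
    "SAS Institute": ("organization", ["SAS"]),
}

def extract_entities(text, catalog=CATALOG):
    """Return sorted (canonical name, type, matched variant) tuples found in text."""
    found = set()
    for canonical, (etype, variants) in catalog.items():
        for variant in [canonical] + variants:
            # Require non-word characters on both sides so "SAS" does not
            # match inside a longer word such as "SASKATOON".
            pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
            if re.search(pattern, text):
                found.add((canonical, etype, variant))
    return sorted(found)

print(extract_entities("IBM and SAS both sell text analytics platforms."))
```

Because every variant maps back to one canonical name and type, the output can feed a facet (for example, "organization") without the variants fragmenting the counts.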
Introduction to Text Analytics: Text Analytics Features • Auto-categorization • Training sets – Bayesian, Vector space • Terms – literal strings, stemming, dictionary of related terms • Rules – simple – position in text (Title, body, url) • Semantic Network – Predefined relationships, sets of rules • Boolean – Full search syntax – AND, OR, NOT • Advanced – NEAR (#), PARAGRAPH, SENTENCE • This is the most difficult to develop • Build on a Taxonomy • Combine with Extraction, Sentiment • Foundation for best text analytics & combination
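A Boolean categorization rule with a NEAR operator might look roughly like the sketch below. The rule itself (vaccine NEAR/5 reaction, AND NOT veterinary), the tokenizer, and the helper names are hypothetical; commercial rule languages have their own syntax and richer operators such as SENTENCE and PARAGRAPH.

```python
import re

def tokens(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())

def near(toks, a, b, distance):
    """True if tokens a and b occur within `distance` positions of each other."""
    pos_a = [i for i, t in enumerate(toks) if t == a]
    pos_b = [i for i, t in enumerate(toks) if t == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

def matches_rule(text):
    """Hypothetical rule: (vaccine NEAR/5 reaction) AND NOT veterinary."""
    toks = tokens(text)
    return near(toks, "vaccine", "reaction", 5) and "veterinary" not in toks

print(matches_rule("The vaccine caused a mild reaction in two patients."))
```

Proximity operators are what separate this from plain keyword search: two terms that merely appear somewhere in the same long document do not trigger the rule.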
Varieties of Taxonomy / Text Analytics Software • Taxonomy Management • Synaptica, SchemaLogic • Full Platform • SAS-Teragram, SAP-Inxight, Smart Logic, Data Harmony, Concept Searching, Expert System, IBM, GATE • Content Management – embedded • Embedded – Search • FAST, Autonomy, Endeca, Exalead, etc. • Specialty • Sentiment Analysis, VOC – Lexalytics, Attensity / Reports • Ontology – extraction, plus ontology
Text Analytics Applications: Platform for Multiple Applications • Content Aggregation, Duplicate Documents – save millions! • Business intelligence, Customer Intelligence • Social Media – sentiment analysis, Voice of the Customer • Social – Hybrid folksonomy / taxonomy / auto-metadata • Social – expertise, categorize tweets and blogs, reputation • Ontology – travel assistant, semantic web, etc. • eDiscovery, Reputation management, Customer Experience • Expertise Location, Crowdsourcing technical support
Text Analytics Applications: Enterprise Search – Elements • Text Analytics can “solve” enterprise search • Multiple Knowledge Structures • Facet – orthogonal dimension of metadata • Taxonomy – Subject matter / aboutness • Software – Search, ECM, auto-categorization, entity extraction, Text Analytics and Text Mining • People – tagging, evaluating tags, fine-tune rules and taxonomy • Rich Search Results – context and conversation • Platform for search-based applications
Text Analytics and Text Mining: Data and Unstructured Content • 80% of content is unstructured – adding to semantic web is major • Text Analytics – content into data • Big Data meets Big Content • Real integration of text and ontology • Beyond “hasDescription” • Improve accuracy of extracted entities, facts – disambiguation • Pipeline – oil & gas OR research / Ford • Add Concepts, not just “Things” – 68% want this • Semantic Web + Text Analytics = real world value • Linked Data + Text Analytics – best of both worlds • Build superior foundation elements – taxonomies, categorization
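The "Pipeline" bullet appears to point at word-sense disambiguation: the same extracted term means different things in an oil and gas document and in a research document. A minimal sketch of context-based disambiguation follows, assuming that reading; the sense labels and context vocabularies are invented for illustration.

```python
import re

# Hypothetical context vocabularies for two senses of "pipeline".
SENSES = {
    "oil_and_gas": {"oil", "gas", "crude", "drilling", "refinery"},
    "research": {"drug", "clinical", "trial", "candidate", "discovery"},
}

def disambiguate(text, senses=SENSES):
    """Pick the sense whose context terms overlap the text most.

    Returns None when there is no evidence or the top score is tied.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(words & vocab) for sense, vocab in senses.items()}
    best = max(scores.values())
    winners = [s for s, v in scores.items() if v == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]

print(disambiguate("The pipeline moves crude oil from the refinery."))
```

In practice the context vocabularies would more plausibly come from taxonomy nodes or categorization rules than from a hand-written dictionary, which is one way categorization can improve extraction accuracy.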
Text Analytics and Text Mining and Data Mining: Vaccine Adverse Reaction • Combine with Data Mining • New sources of information • News stories, medical records • Blogs, social • Find new connections, sources of knowledge • Vaccine Adverse Effects – disease, symptoms, variables • Unstructured text into a data source • Some preliminary analysis, content structure • Find unknown adverse effects and prevalence • Drug Discovery + search / research – 5 year story
Text Analytics Applications: Example – Vaccine Adverse Effects
Text Analytics and Text Mining: Case Study – Taxonomy Development • Problem – 200,000 new uncategorized documents • Old taxonomy – need one that reflects change in corpus • Text mining, entity extraction, categorization • Bottom Up – terms in documents – frequency, date • Clustering – suggested categories • Clustering – chunking for editors • Time savings – only feasible way to scan documents • Quality – important terms, co-occurring terms
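The bottom-up step (term frequency plus co-occurring terms) can be approximated as below. This is a toy sketch, not the tooling used in the case study: the stopword list, the sample documents, and the function names are assumptions, and counts are per document (document frequency) rather than per occurrence.

```python
from collections import Counter
from itertools import combinations
import re

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "is"}

def doc_terms(text):
    """Distinct non-stopword terms in one document."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}

def term_stats(docs):
    """Return (document frequency per term, document frequency per term pair)."""
    term_freq = Counter()
    pair_freq = Counter()
    for text in docs:
        terms = doc_terms(text)
        term_freq.update(terms)
        # Sorting makes each pair's key order-independent.
        pair_freq.update(combinations(sorted(terms), 2))
    return term_freq, pair_freq

docs = ["Taxonomy of text mining", "Text mining for taxonomy", "Entity extraction"]
tf, pf = term_stats(docs)
print(tf.most_common(3))
print(pf.most_common(3))
```

Frequent pairs are candidate signals for editors: terms that repeatedly appear together may belong under the same taxonomy node, or may suggest a new one.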
Text Analytics and Text Mining: Case Study – Taxonomy Development • Text into Data: Article, Abstract, Title, Subtitle – fields & source of terms • Add Data: PubDate, journalTitle, Taxonomy Node • Terms – Map to frequency, date, date ranges, Taxonomy Node • New Terms, Trends • Relevance – frequency, Abstract, Title, human judgment • Entity Extraction – Authors, Organizations, Products • Categorization – build on clusters & taxonomy • Combination – reports, visualizations, interactive explorations
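Mapping terms to date and taxonomy node, as described above, amounts to turning each article into a record and grouping term hits by its metadata. The sketch below assumes a simplified record (`Article` with a publication year instead of a full PubDate) and invented sample data; none of it comes from the actual case study.

```python
from collections import defaultdict
from dataclasses import dataclass
import re

@dataclass
class Article:
    # Simplified, hypothetical record; field names loosely follow the slide.
    title: str
    abstract: str
    pub_year: int
    taxonomy_node: str

def term_trends(articles, term):
    """Count articles mentioning `term`, keyed by (year, taxonomy node)."""
    counts = defaultdict(int)
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    for a in articles:
        if pattern.search(a.title) or pattern.search(a.abstract):
            counts[(a.pub_year, a.taxonomy_node)] += 1
    return dict(counts)

articles = [
    Article("Ontology design", "Notes on ontology reuse.", 2009, "Knowledge"),
    Article("Search tuning", "Ontology-driven ranking.", 2010, "Search"),
    Article("Facets", "Faceted navigation basics.", 2010, "Search"),
]
print(term_trends(articles, "ontology"))
```

A term whose counts rise in recent years under a node that lacks a matching category is the kind of "new term, trend" signal the slide mentions; weighting title hits above abstract hits would be a natural extension for relevance.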
Conclusion • Text Analytics impact is huge – solve information overload • Enterprise Search and Search Based Applications: Save millions and enhance productivity • Combination of Text Analytics & Text Mining – unlimited range of applications • Mutual Enrichment – more data, add structure to unstructured • Add Ontology = Richer Text Analytics – smarter, more useful • Text Analytics + Text Mining + Semantic Web • Move from theory to new practical applications • The best is yet to come!
Questions? Tom Reamy, tomr@kapsgroup.com. KAPS Group, Knowledge Architecture Professional Services. http://www.kapsgroup.com