Text Analytics And Text Mining Best of Text and Data

Text Analytics And Text Mining Best of Text and Data. Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com. Agenda. Text Analytics Capabilities Text Analytics Applications Text Mining and Text Analytics


Presentation Transcript


  1. Text Analytics And Text Mining: Best of Text and Data • Tom Reamy, Chief Knowledge Architect • KAPS Group: Knowledge Architecture Professional Services • http://www.kapsgroup.com

  2. Agenda • Text Analytics Capabilities • Text Analytics Applications • Text Mining and Text Analytics • Data and Unstructured Content • Case Study – Text Mining for Taxonomy Development • Conclusion

  3. KAPS Group: General • Knowledge Architecture Professional Services • Virtual company: network of 8–10 consultants • Partners – SAS, Smart Logic, Microsoft-FAST, Concept Searching, etc. • Consulting, strategy, knowledge architecture audit • Services: • Text Analytics evaluation, development, consulting, customization • Knowledge Representation – taxonomy, ontology, prototype • Metadata standards and implementation • Knowledge Management: collaboration, expertise, e-learning • Applied Theory – faceted taxonomies, complexity theory, natural categories

  4. Introduction to Text Analytics: Text Analytics Features • Noun Phrase Extraction • Catalogs with variants, rule-based, dynamic • Multiple types, custom classes – entities, concepts, events • Feeds facets • Summarization • Customizable rules, mapped to different content • Fact Extraction • Relationships of entities – people-organizations-activities • Ontologies – triples, RDF, etc. • Sentiment Analysis • Statistical and rule-based – full set of categorization operators
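
A catalog with variants, one of the extraction features listed above, can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs, not how any product named in these slides implements it: the `CATALOG` entries and the functions `build_patterns` and `extract_entities` are hypothetical. Each canonical entry maps to its surface variants, and every match is normalized back to the canonical name.

```python
import re

# Hypothetical catalog: canonical name -> surface variants to match.
CATALOG = {
    "KAPS Group": ["KAPS Group", "KAPS", "Knowledge Architecture Professional Services"],
    "SAS": ["SAS", "SAS Institute"],
}


def build_patterns(catalog):
    """Compile one case-insensitive regex per catalog entry.

    Variants are tried longest-first so "SAS Institute" wins over "SAS".
    """
    patterns = {}
    for canonical, variants in catalog.items():
        alternatives = sorted(variants, key=len, reverse=True)
        body = "|".join(re.escape(v) for v in alternatives)
        patterns[canonical] = re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)
    return patterns


def extract_entities(text, patterns):
    """Return (canonical, matched_text, offset) tuples in document order."""
    hits = []
    for canonical, regex in patterns.items():
        for m in regex.finditer(text):
            hits.append((canonical, m.group(0), m.start()))
    return sorted(hits, key=lambda hit: hit[2])


patterns = build_patterns(CATALOG)
text = "Knowledge Architecture Professional Services (KAPS) partners with SAS Institute."
for canonical, surface, offset in extract_entities(text, patterns):
    print(canonical, "<-", surface, "@", offset)
```

Both "Knowledge Architecture Professional Services" and "KAPS" resolve to the same canonical entity, which is what lets extracted entities feed a facet cleanly.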

  5. Introduction to Text Analytics: Text Analytics Features • Auto-categorization • Training sets – Bayesian, vector space • Terms – literal strings, stemming, dictionary of related terms • Rules – simple – position in text (title, body, URL) • Semantic network – predefined relationships, sets of rules • Boolean – full search syntax – AND, OR, NOT • Advanced – NEAR (#), PARAGRAPH, SENTENCE • The most difficult to develop • Build on a taxonomy • Combine with extraction, sentiment • Foundation for the best text analytics and for combined applications
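
The advanced rule operators above (NEAR, SENTENCE, combined with Boolean AND, OR, NOT) can be illustrated with a small rule evaluator. This is a hedged sketch, assuming that NEAR(n) means "within n tokens" and that sentences can be split naively on `.`, `!` and `?`; vendor rule languages define these operators in their own ways. The functions `near`, `same_sentence`, `categorize` and the `RULES` table are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(text, a, b, n):
    """NEAR(n): true if terms a and b occur within n tokens of each other."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


def same_sentence(text, a, b):
    """SENTENCE: true if a and b co-occur in one sentence (naive split on . ! ?)."""
    for sentence in re.split(r"[.!?]+", text):
        words = set(tokenize(sentence))
        if a in words and b in words:
            return True
    return False


def categorize(text, rules):
    """Return every category whose rule evaluates to True for the text."""
    return [category for category, rule in rules.items() if rule(text)]


# Hypothetical rules combining NEAR, SENTENCE and Boolean AND/OR/NOT.
RULES = {
    "Energy": lambda t: (near(t, "pipeline", "gas", 3) or near(t, "pipeline", "oil", 3))
    and not same_sentence(t, "pipeline", "drug"),
    "Pharma": lambda t: same_sentence(t, "pipeline", "drug"),
}

print(categorize("The new gas pipeline opened. Drug research continues.", RULES))
print(categorize("Our drug pipeline includes three candidates.", RULES))
```

In the first text "drug" appears, but not in the same sentence as "pipeline", so only the Energy rule fires; this is the kind of distinction that makes rule-based categorization more precise, and also harder to develop, than bag-of-words matching.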

  6. Varieties of Taxonomy / Text Analytics Software • Taxonomy Management • Synaptica, SchemaLogic • Full Platform • SAS-Teragram, SAP-Inxight, Smart Logic, Data Harmony, Concept Searching, Expert System, IBM, GATE • Content Management – embedded • Embedded – Search • FAST, Autonomy, Endeca, Exalead, etc. • Specialty • Sentiment Analysis, VOC – Lexalytics, Attensity / Reports • Ontology – extraction, plus ontology

  7. Text Analytics Applications: Platform for Multiple Applications • Content aggregation, duplicate documents – save millions! • Business intelligence, customer intelligence • Social media – sentiment analysis, Voice of the Customer • Social – hybrid folksonomy / taxonomy / auto-metadata • Social – expertise, categorize tweets and blogs, reputation • Ontology – travel assistant, semantic web, etc. • eDiscovery, reputation management, customer experience • Expertise location, crowdsourced technical support
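
Duplicate detection during content aggregation is often approached by comparing overlapping word sequences. The sketch below uses word shingles and Jaccard similarity, a common textbook technique chosen here for illustration; the slides do not say which method is meant. The function names, the shingle size `k=3` and the `threshold=0.5` default are assumptions.

```python
def shingles(text, k=3):
    """Set of overlapping k-word sequences ("shingles") from the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.5, k=3):
    """Return (id1, id2, similarity) for every pair at or above the threshold.

    Compares all pairs, so it is O(n^2); large collections would need
    an indexing scheme such as MinHash/LSH instead.
    """
    ids = list(docs)
    sets = {doc_id: shingles(docs[doc_id], k) for doc_id in ids}
    pairs = []
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            score = jaccard(sets[ids[i]], sets[ids[j]])
            if score >= threshold:
                pairs.append((ids[i], ids[j], score))
    return pairs


docs = {
    "a": "text analytics turns content into data for search",
    "b": "text analytics turns content into data for enterprise search",
    "c": "vaccine adverse effects found in medical records",
}
print(near_duplicates(docs))
```

Documents "a" and "b" share 5 of 8 distinct shingles (similarity 0.625) and are flagged; "c" shares none with either.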

  8. Text Analytics Applications: Enterprise Search – Elements • Text Analytics can “solve” enterprise search • Multiple Knowledge Structures • Facet – orthogonal dimension of metadata • Taxonomy – subject matter / aboutness • Software – search, ECM, auto-categorization, entity extraction, Text Analytics and Text Mining • People – tagging, evaluating tags, fine-tuning rules and taxonomy • Rich Search Results – context and conversation • Platform for search-based applications

  9. Text Analytics and Text Mining: Data and Unstructured Content • 80% of content is unstructured – adding it to the semantic web is major • Text Analytics – content into data • Big Data meets Big Content • Real integration of text and ontology • Beyond “hasDescription” • Improve accuracy of extracted entities, facts – disambiguation • Pipeline – oil & gas OR research / Ford • Add concepts, not just “things” – 68% want this • Semantic Web + Text Analytics = real-world value • Linked Data + Text Analytics – best of both worlds • Build superior foundation elements – taxonomies, categorization
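
The "Pipeline" example above (oil and gas versus a research or product pipeline) is a word-sense disambiguation problem. A minimal context-window approach is sketched below, assuming hand-built cue-word lists per sense; the `SENSE_CUES` vocabulary, the 10-token window and the `disambiguate` function are hypothetical, and production systems typically rely on richer rules or ontology relations.

```python
import re

# Hypothetical cue words for two senses of "pipeline" named on the slide.
SENSE_CUES = {
    "oil & gas": {"oil", "gas", "crude", "barrels", "drilling"},
    "research/product": {"drug", "research", "product", "models", "vehicle"},
}


def disambiguate(text, target="pipeline", window=10, cues=SENSE_CUES):
    """Pick the sense whose cue words appear most often near the target.

    Returns None when the target is absent or no cue word is found.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = {sense: 0 for sense in cues}
    for i, token in enumerate(tokens):
        if token != target:
            continue
        context = tokens[max(0, i - window): i + window + 1]
        for sense, words in cues.items():
            scores[sense] += sum(1 for w in context if w in words)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("Crude oil flows through the pipeline to the gas terminal."))
print(disambiguate("Ford has three new vehicle models in its product pipeline."))
```

Returning `None` when no cue fires keeps an uncertain mention out of the extracted data rather than guessing a sense.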

  10. Text Analytics and Text Mining and Data Mining: Vaccine Adverse Reaction • Combine with Data Mining • New sources of information • News stories, medical records • Blogs, social • Find new connections, sources of knowledge • Vaccine Adverse Effects – disease, symptoms, variables • Unstructured text into a data source • Some preliminary analysis, content structure • Find unknown adverse effects and prevalence • Drug Discovery + search / research – 5-year story
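
Turning unstructured text into a data source for this kind of analysis can start with sentence-level co-occurrence of vaccine and symptom terms, counted per document to estimate prevalence. This is a simplified sketch under stated assumptions: the `VACCINES` and `SYMPTOMS` lists are hypothetical stand-ins for curated vocabularies, and co-occurrence in a sentence is not evidence of a causal adverse effect.

```python
import re
from collections import Counter

# Hypothetical term lists; a real system would use curated vocabularies.
VACCINES = ["flu vaccine", "mmr"]
SYMPTOMS = ["fever", "headache", "rash"]


def mentions(terms, sentence):
    """Terms that occur as whole words in the sentence."""
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", sentence)]


def extract_pairs(doc):
    """(vaccine, symptom) pairs that co-occur within a sentence."""
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", doc.lower()):
        for v in mentions(VACCINES, sentence):
            for s in mentions(SYMPTOMS, sentence):
                pairs.append((v, s))
    return pairs


def prevalence(docs):
    """Number of documents in which each (vaccine, symptom) pair appears."""
    counts = Counter()
    for doc in docs:
        counts.update(set(extract_pairs(doc)))
    return counts


docs = [
    "Patient got the flu vaccine and reported fever and headache.",
    "After MMR she developed a rash and fever.",
    "The flu vaccine caused fever.",
]
for pair, n in prevalence(docs).most_common():
    print(pair, n)
```

The resulting pair counts form a structured table that ordinary data-mining tools can take as input.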

  11. Text Analytics Applications: Example – Vaccine Adverse Effects

  12. Text Analytics Applications: Example – Vaccine Adverse Effects

  13. Text Analytics Applications: Example – Vaccine Adverse Effects

  14. Text Analytics and Text Mining: Case Study – Taxonomy Development • Problem – 200,000 new uncategorized documents • Old taxonomy – need one that reflects the change in the corpus • Text mining, entity extraction, categorization • Bottom up – terms in documents: frequency, date • Clustering – suggested categories • Clustering – chunking for editors • Time savings – the only feasible way to scan the documents • Quality – important terms, co-occurring terms
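
The bottom-up step (terms in documents mapped to frequency and date) can be illustrated by counting terms per publication year and flagging terms that only appear recently, which are candidates for new taxonomy nodes. A minimal sketch, assuming documents arrive as `(year, text)` pairs; the `STOPWORDS` list, `min_count=2` and the function names are hypothetical choices, and clustering is not shown.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; real pipelines use larger lists or noun-phrase extraction.
STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "on"}


def terms(text):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def frequency_by_year(docs):
    """docs: iterable of (year, text). Returns {year: Counter of terms}."""
    by_year = defaultdict(Counter)
    for year, text in docs:
        by_year[year].update(terms(text))
    return by_year


def new_terms(by_year, year, min_count=2):
    """Terms seen at least min_count times in `year` but in no earlier year."""
    earlier = set()
    for y, counts in by_year.items():
        if y < year:
            earlier.update(counts)
    return sorted(t for t, c in by_year[year].items() if c >= min_count and t not in earlier)


docs = [
    (2009, "taxonomy design for search"),
    (2010, "sentiment analysis of social media"),
    (2010, "social media sentiment and search"),
]
print(new_terms(frequency_by_year(docs), 2010))
```

"search" is excluded because it already appeared in 2009, so only terms that reflect the change in the corpus surface for editors to review.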

  15. Text Analytics and Text Mining: Case Study – Taxonomy Development • Text into Data: Article, Abstract, Title, Subtitle – fields & source of terms • Add Data: PubDate, journalTitle, Taxonomy Node • Terms – map to frequency, date, date ranges, taxonomy node • New terms, trends • Relevance – frequency, Abstract, Title, human judgment • Entity Extraction – authors, organizations, products • Categorization – build on clusters & taxonomy • Combination – reports, visualizations, interactive explorations
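
One way to read "Text into Data" plus "Relevance – frequency, Abstract, Title" is to score terms by weighted frequency across fields and emit rows that carry the added data (PubDate, journalTitle, taxonomy node). The sketch below assumes field weights of 3 for Title, 2 for Abstract and 1 for Article; these weights, the record keys and the functions `weighted_terms` and `to_rows` are hypothetical, and human judgment would still review the output.

```python
import re
from collections import Counter

# Hypothetical field weights: a term in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "abstract": 2.0, "article": 1.0}


def weighted_terms(record, weights=FIELD_WEIGHTS):
    """Sum the field weight for every occurrence of each term."""
    scores = Counter()
    for field, weight in weights.items():
        for token in re.findall(r"[a-z]+", record.get(field, "").lower()):
            scores[token] += weight
    return scores


def to_rows(record, top_n=3):
    """Turn one document into data rows: top terms plus document metadata."""
    return [
        {
            "term": term,
            "score": score,
            "pubDate": record.get("pubDate"),
            "journalTitle": record.get("journalTitle"),
            "node": record.get("node"),
        }
        for term, score in weighted_terms(record).most_common(top_n)
    ]


record = {
    "title": "Vaccine safety",
    "abstract": "Vaccine adverse effects",
    "article": "Adverse effects were rare",
    "pubDate": "2011-05-01",
    "journalTitle": "Example Journal",
    "node": "Health",
}
for row in to_rows(record):
    print(row)
```

Once documents are rows, they can be grouped by date range, journal or taxonomy node for the reports and visualizations listed above.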

  16. Case Study – Taxonomy Development

  17. Case Study – Taxonomy Development

  18. Case Study – Taxonomy Development

  19. Conclusion • Text Analytics impact is huge – solve information overload • Enterprise Search and Search-Based Applications: save millions and enhance productivity • Combination of Text Analytics & Text Mining – unlimited range of applications • Mutual enrichment – more data, add structure to unstructured content • Add Ontology = Richer Text Analytics – smarter, more useful • Text Analytics + Text Mining + Semantic Web • Move from theory to new practical applications • The best is yet to come!

  20. Questions? Tom Reamy • tomr@kapsgroup.com • KAPS Group: Knowledge Architecture Professional Services • http://www.kapsgroup.com