
Text Analytics Workshop

Text Analytics Workshop. Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com. Agenda. Introduction – Elements & Infrastructure Platform Semantics not technology Infrastructure not project Value of Text Analytics




Presentation Transcript


  1. Text Analytics Workshop • Tom Reamy, Chief Knowledge Architect • KAPS Group • Knowledge Architecture Professional Services • http://www.kapsgroup.com

  2. Agenda • Introduction – Elements & Infrastructure Platform • Semantics not technology • Infrastructure not project • Value of Text Analytics • Evaluating Software • Two Phase Process • Designing the Team and Content Structures • Development – Taxonomy, Categorization, Faceted Metadata • Text Analytics Applications • Integration with Search and ECM • Platform for Information Applications

  3. KAPS Group: General • Knowledge Architecture Professional Services • Virtual Company: Network of consultants – 8-10 • Partners – SAS, SAP, Microsoft-FAST, Concept Searching, etc. • Consulting, Strategy, Knowledge architecture audit • Services: • Taxonomy/Text Analytics development, consulting, customization • Technology Consulting – Search, CMS, Portals, etc. • Evaluation of Enterprise Search, Text Analytics • Metadata standards and implementation • Knowledge Management: Collaboration, Expertise, e-learning • Applied Theory – Faceted taxonomies, complexity theory, natural categories

  4. Introduction to Text Analytics: Semantic Infrastructure - Elements • Taxonomy – Thesauri, Controlled Vocabulary • Metadata – Standard (Dublin Core) and Facets • Basic Text Analytics • Categorization – Document Topics – Aboutness • Entity Extraction – noun phrases, feed facets • Summarization – beyond snippets • Advanced Text Analytics • Fact extraction – ontologies • Sentiment Analysis – good, bad, and ugly • What is in a Name – text analytics or ?

  5. Introduction to Text Analytics: Taxonomy • Thesauri, Controlled Vocabulary • Resources to build on • Indexing not categorization • Taxonomy • Foundation for Categorization • Browse – classification scheme • Formal – Is-Child-Of, Is-Part-Of • Large taxonomies - MeSH – indexing all topics • Small is better – for categorization and faceted navigation
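
The formal Is-Child-Of relation on this slide can be pictured as a small tree. The following is a minimal sketch, not part of the original presentation; the node names and the helper functions `path_to_root` and `children` are hypothetical and chosen only for illustration.

```python
# Hypothetical taxonomy: each node maps to its parent (the Is-Child-Of relation).
# A parent of None marks the root.
PARENT = {
    "Knowledge Management": None,
    "Search": "Knowledge Management",
    "Enterprise Search": "Search",
    "Taxonomy": "Knowledge Management",
}

def path_to_root(node):
    """Return the browse path from the root down to the given node."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]

def children(node):
    """Return the direct children of a node, sorted by name."""
    return sorted(name for name, parent in PARENT.items() if parent == node)
```

For example, `path_to_root("Enterprise Search")` walks the parent links upward and returns the path a user would browse from the root. A small, shallow structure like this is the kind the slide recommends for categorization and faceted navigation.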

  6. Introduction to Text Analytics: Metadata • Metadata standards – Dublin Core - Mostly syntactic not semantic • Description – static or dynamic (summarization) • Semantic – keywords – very poor performance • Best Bets – high level categorization-search • Human judgments • Audience – mixed results • Role, function, expertise, information behaviors • Facets – classes of metadata • Standard - People, Organization, Document type-purpose • Specialized – methods, materials, products

  7. Introduction to Text Analytics: Text Analytics • Categorization • Multiple techniques – examples, terms, Boolean • Built on a taxonomy • Entity Extraction • Catalogs with variants, rule based dynamic • Summarization • Rules – find sentences in a document • Fact Extraction • Relationships of entities – people-organizations-activities • Sentiment Analysis • Rules – adjectives & adverbs not nouns
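
Two of the techniques named on this slide, Boolean categorization and catalog-based entity extraction with variants, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the method of any particular product: the rule format (`all` / `any` / `none`), the category names, and the catalog entries are all hypothetical.

```python
import re

# Hypothetical Boolean rules: a category matches when every "all" term and
# at least one "any" term occur in the text, and no "none" term occurs.
RULES = {
    "Enterprise Search": {"all": {"search"}, "any": {"enterprise", "intranet"}, "none": {"job"}},
    "Content Management": {"all": {"content"}, "any": {"cms", "ecm", "management"}, "none": set()},
}

# Hypothetical entity catalog: canonical name -> known surface variants.
CATALOG = {
    "KAPS Group": ["KAPS Group", "KAPS"],
    "SAS": ["SAS", "SAS Institute"],
}

def tokens(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def categorize(text):
    """Return the categories whose Boolean rule is satisfied by the text."""
    words = tokens(text)
    hits = []
    for category, rule in RULES.items():
        if rule["all"] <= words and (rule["any"] & words) and not (rule["none"] & words):
            hits.append(category)
    return hits

def extract_entities(text):
    """Return canonical names for catalog entities whose variants occur in the text."""
    found = []
    for canonical, variants in CATALOG.items():
        pattern = r"\b(?:" + "|".join(re.escape(v) for v in variants) + r")\b"
        if re.search(pattern, text):  # case-sensitive match on whole words
            found.append(canonical)
    return found
```

Calling `categorize` and `extract_entities` on a sentence such as "KAPS evaluates enterprise search and ECM content tools with SAS Institute." assigns both categories and resolves the variants "KAPS" and "SAS Institute" to their canonical names. Real rule sets are usually attached to taxonomy nodes, which is why the slide says categorization is built on a taxonomy.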

  8. Introduction to Text Analytics: Text Analytics • Why Text Analytics? • Enterprise search has failed to live up to its potential • Enterprise content management has failed to live up to its potential • Taxonomy has failed to live up to its potential • Adding metadata, especially keywords, has not worked • What is missing? • Intelligence – human level categorization, conceptualization • Infrastructure – Integrated solutions not technology, software • Text Analytics can be the foundation that (finally) drives success – search, content management, and much more

  9. Text Analytics Platform: 4 Basic Contexts • Ideas – Content Structure • Language and Mind of your organization • Applications - exchange meaning, not data • People – Company Structure • Communities, Users • Central team - establish standards, facilitate • Activities – Business processes and procedures • Technology • CMS, Search, portals, taxonomy tools • Applications – BI, CI, Text Mining

  10. Text Analytics Platform: The Start and Foundation - Knowledge Architecture Audit • Knowledge Map - Understand what you have, what you are, what you want • The foundation of the foundation • Contextual interviews, content analysis, surveys, focus groups, ethnographic studies • Category modeling – “Intertwingledness” - learning new categories influenced by other, related categories • Natural level categories mapped to communities, activities • Novices prefer higher levels • Balance of informativeness and distinctiveness • Living, breathing, evolving foundation is the goal

  11. Text Analytics Platform – Benefits: IDC White Paper • Time Wasted • Reformatting information - $5.7 million per 1,000 per year • Not finding information - $5.3 million per 1,000 • Recreating content - $4.5 million per 1,000 • Small Percent Gain = large savings • 1% - $10 million • 5% - $50 million • 10% - $100 million
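
The arithmetic behind these figures can be made explicit. The sketch below only uses the numbers quoted on the slide; it assumes "per 1,000" means per 1,000 workers per year for all three items, and the helper `savings` is hypothetical. The slide does not state the cost base behind the 1% / 5% / 10% gains, so the code derives the base those figures imply rather than asserting one.

```python
# Figures quoted on the slide: cost in USD per 1,000 (assumed: workers) per year.
COSTS_PER_1000 = {
    "reformatting information": 5_700_000,
    "not finding information": 5_300_000,
    "recreating content": 4_500_000,
}

# Combined cost of the three kinds of wasted time per 1,000 per year.
total_per_1000 = sum(COSTS_PER_1000.values())

def savings(base_cost, gain_percent):
    """Savings in USD from a whole-number percentage gain on a yearly cost base."""
    return base_cost * gain_percent // 100

# The slide's "1% - $10 million" implies a yearly cost base of $1 billion.
implied_base = 10_000_000 * 100

# Head count at which the three quoted costs would add up to that base
# (an inference from the slide's numbers, not a figure the slide states).
implied_headcount = round(implied_base * 1000 / total_per_1000)
```

The three quoted costs sum to $15.5 million per 1,000 per year, so a 1% gain at that scale is $155,000. The slide's larger figures ($10, $50, and $100 million) are consistent with a cost base of $1 billion, which at the quoted rates corresponds to an organization of roughly 64,500 people.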

  12. Text Analytics Platform – Benefits • Findability within and outside the enterprise • Savings per year - $millions • Rescue enterprise search and ECM projects • Add semantics to search • Clean up enterprise content • Duplication and accurate categorization • Improve the quality of information access • Finding the right information can save millions • Build smarter applications • Social networking, locate expertise within the enterprise

  13. Text Analytics Platform – Benefits • Understand your customers • What they are talking about and how they feel about it • Empower your employees • Not only more time, but they work smarter • Understand your competitors • What they are working on, talking about • Combine unstructured content and rich data sources – more intelligent analysis

  14. Text Analytics Platform – Dangers • Text Analytics as a software project • Not enough resources – to develop, to maintain-refine • Wrong resources – SMEs, IT, Library • Need all of the above and taxonomists+ • Bad Design: • Start with bad taxonomy • Wrong taxonomy – too big or too flat • Bad Categorization / Entity Extraction • Right kind of experience

  15. Resources • Books • Women, Fire, and Dangerous Things – George Lakoff • Knowledge, Concepts, and Categories – Koen Lamberts and David Shanks • The Stuff of Thought – Steven Pinker • Web Sites • Text Analytics News - http://social.textanalyticsnews.com/index.php • Text Analytics Wiki - http://textanalytics.wikidot.com/

  16. Resources • Blogs • SAS – Manya Mayes – Chief Strategist - http://blogs.sas.com/text-mining/ • Web Sites • Taxonomy Community of Practice - http://finance.groups.yahoo.com/group/TaxoCoP/ • Whitepaper – CM and Text Analytics - http://www.textanalyticsnews.com/usa/contentmanagementmeetstextanalytics.pdf

  17. Questions? • Tom Reamy • tomr@kapsgroup.com • KAPS Group • Knowledge Architecture Professional Services • http://www.kapsgroup.com
