Text Analytics Workshop • Tom Reamy, Chief Knowledge Architect • KAPS Group, Knowledge Architecture Professional Services • http://www.kapsgroup.com
Agenda • Introduction – Elements & Infrastructure Platform • Semantics not technology • Infrastructure not project • Value of Text Analytics • Evaluating Software • Two Phase Process • Designing the Team and Content Structures • Development – Taxonomy, Categorization, Faceted Metadata • Text Analytics Applications • Integration with Search and ECM • Platform for Information Applications
KAPS Group: General • Knowledge Architecture Professional Services • Virtual Company: Network of consultants – 8-10 • Partners – SAS, SAP, Microsoft-FAST, Concept Searching, etc. • Consulting, Strategy, Knowledge architecture audit • Services: • Taxonomy/Text Analytics development, consulting, customization • Technology Consulting – Search, CMS, Portals, etc. • Evaluation of Enterprise Search, Text Analytics • Metadata standards and implementation • Knowledge Management: Collaboration, Expertise, e-learning • Applied Theory – Faceted taxonomies, complexity theory, natural categories
Introduction to Text Analytics: Semantic Infrastructure - Elements • Taxonomy – Thesauri, Controlled Vocabulary • Metadata – Standard (Dublin Core) and Facets • Basic Text Analytics • Categorization – Document Topics – Aboutness • Entity Extraction – noun phrases, feed facets • Summarization – beyond snippets • Advanced Text Analytics • Fact extraction – ontologies • Sentiment Analysis – good, bad, and ugly • What is in a Name – text analytics or ?
Introduction to Text Analytics: Taxonomy • Thesauri, Controlled Vocabulary • Resources to build on • Indexing not categorization • Taxonomy • Foundation for Categorization • Browse – classification scheme • Formal – Is-Child-Of, Is-Part-Of • Large taxonomies – MeSH – indexing all topics • Small is better – for categorization and faceted navigation
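The formal Is-Child-Of relation on this slide can be sketched as a simple parent map. This is a minimal illustration, not the workshop's own tooling: the node names and the helper functions `ancestors` and `children` are hypothetical.

```python
# Hypothetical mini-taxonomy: each node maps to its parent (Is-Child-Of).
# A root node has no parent, represented here by None.
PARENT = {
    "Diseases": None,
    "Cardiovascular Diseases": "Diseases",
    "Heart Diseases": "Cardiovascular Diseases",
    "Arrhythmias": "Heart Diseases",
}


def ancestors(node, parent=PARENT):
    """Return the chain of broader terms, from the node's parent up to the root."""
    path = []
    current = parent[node]
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def children(node, parent=PARENT):
    """Return the direct children of a node, sorted for stable browsing."""
    return sorted(child for child, p in parent.items() if p == node)


if __name__ == "__main__":
    print(ancestors("Arrhythmias"))
    print(children("Diseases"))
```

A browse interface would walk `children` downward, while a categorization rule attached to a narrow node can roll up to broader nodes through `ancestors`.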
Introduction to Text Analytics: Metadata • Metadata standards – Dublin Core – Mostly syntactic not semantic • Description – static or dynamic (summarization) • Semantic – keywords – very poor performance • Best Bets – high level categorization-search • Human judgments • Audience – mixed results • Role, function, expertise, information behaviors • Facets – classes of metadata • Standard – People, Organization, Document type-purpose • Specialized – methods, materials, products
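To make the split between syntactic metadata and facets concrete, here is a small sketch. The top-level keys use Dublin Core element names (title, creator, date, type); the sample records, the facet names, and the helpers `facet_counts` and `filter_by_facet` are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical records. Top-level keys follow Dublin Core element names;
# the "facets" dictionary holds the semantic classes of metadata.
DOCS = [
    {"title": "Search Evaluation Report", "creator": "A. Analyst",
     "date": "2010-03-01", "type": "Text",
     "facets": {"organization": ["SAS"], "doc_type": ["evaluation"]}},
    {"title": "Taxonomy Style Guide", "creator": "B. Librarian",
     "date": "2010-04-15", "type": "Text",
     "facets": {"organization": ["KAPS Group"], "doc_type": ["guideline"]}},
    {"title": "Vendor Comparison", "creator": "A. Analyst",
     "date": "2010-05-20", "type": "Text",
     "facets": {"organization": ["SAS", "SAP"], "doc_type": ["evaluation"]}},
]


def facet_counts(docs, facet):
    """Count how many documents carry each value of the given facet."""
    counts = Counter()
    for doc in docs:
        counts.update(doc["facets"].get(facet, []))
    return counts


def filter_by_facet(docs, facet, value):
    """Keep only the documents tagged with the given facet value."""
    return [doc for doc in docs if value in doc["facets"].get(facet, [])]


if __name__ == "__main__":
    print(facet_counts(DOCS, "organization"))
    print([d["title"] for d in filter_by_facet(DOCS, "doc_type", "evaluation")])
```

The counts are what a faceted navigation panel would display next to each value; the filter is what runs when a user clicks one.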
Introduction to Text Analytics: Text Analytics • Categorization • Multiple techniques – examples, terms, Boolean • Built on a taxonomy • Entity Extraction • Catalogs with variants, rule based dynamic • Summarization • Rules – find sentences in a document • Fact Extraction • Relationships of entities – people-organizations-activities • Sentiment Analysis • Rules – adjectives & adverbs not nouns
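Two of the techniques named on this slide, Boolean categorization and catalog-based entity extraction with variants, can be sketched in a few lines. This is a toy illustration under stated assumptions, not any vendor's rule language: the rule format (`all`, `any`, `not`), the sample rules, the catalog entries, and the function names are all hypothetical.

```python
import re

# Hypothetical Boolean rules: a category matches when every "all" term is
# present, at least one "any" term is present (if any are listed), and no
# "not" term is present.
RULES = {
    "Enterprise Search": {"all": ["search"], "any": ["enterprise", "intranet"], "not": ["web search"]},
    "Content Management": {"all": ["content"], "any": ["management", "ECM"], "not": []},
}

# Hypothetical entity catalog: canonical name -> known variants.
CATALOG = {
    "KAPS Group": ["KAPS Group", "KAPS"],
    "SAS": ["SAS", "SAS Institute"],
}


def has_term(text, term):
    """True if the term occurs as a whole word or phrase (case-insensitive)."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


def categorize(text, rules=RULES):
    """Return every category whose Boolean rule matches the text."""
    matched = []
    for category, rule in rules.items():
        all_ok = all(has_term(text, t) for t in rule["all"])
        any_ok = not rule["any"] or any(has_term(text, t) for t in rule["any"])
        not_ok = not any(has_term(text, t) for t in rule["not"])
        if all_ok and any_ok and not_ok:
            matched.append(category)
    return matched


def extract_entities(text, catalog=CATALOG):
    """Return canonical names for every catalog entry with a variant in the text."""
    found = {name for name, variants in catalog.items()
             if any(has_term(text, v) for v in variants)}
    return sorted(found)


if __name__ == "__main__":
    sample = "KAPS evaluated enterprise search tools with SAS Institute."
    print(categorize(sample))
    print(extract_entities(sample))
```

Mapping variants to one canonical name is what lets extracted entities feed a facet cleanly; real rule sets would add proximity operators, weighting, and disambiguation that this sketch leaves out.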
Introduction to Text Analytics: Text Analytics • Why Text Analytics? • Enterprise search has failed to live up to its potential • Enterprise Content Management has failed to live up to its potential • Taxonomy has failed to live up to its potential • Adding metadata, especially keywords, has not worked • What is missing? • Intelligence – human level categorization, conceptualization • Infrastructure – Integrated solutions not technology, software • Text Analytics can be the foundation that (finally) drives success – search, content management, and much more
Text Analytics Platform: 4 Basic Contexts • Ideas – Content Structure • Language and Mind of your organization • Applications – exchange meaning, not data • People – Company Structure • Communities, Users • Central team – establish standards, facilitate • Activities – Business processes and procedures • Technology • CMS, Search, portals, taxonomy tools • Applications – BI, CI, Text Mining
Text Analytics Platform: The start and foundation – Knowledge Architecture Audit • Knowledge Map – Understand what you have, what you are, what you want • The foundation of the foundation • Contextual interviews, content analysis, surveys, focus groups, ethnographic studies • Category modeling – “Intertwingledness” – learning new categories is influenced by other, related categories • Natural level categories mapped to communities, activities • Novices prefer higher levels • Balance of informativeness and distinctiveness • Living, breathing, evolving foundation is the goal
Text Analytics Platform – Benefits: IDC White Paper • Time Wasted • Reformat information – $5.7 million per 1,000 per year • Not finding information – $5.3 million per 1,000 • Recreating content – $4.5 million per 1,000 • Small Percent Gain = large savings • 1% – $10 million • 5% – $50 million • 10% – $100 million
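The arithmetic behind this slide can be worked through as a rough sketch. The three cost figures come from the slide; reading "per 1,000" as per 1,000 workers is an assumption, and the function name and the headcounts are hypothetical. The slide does not state the base behind its $10 / $50 / $100 million figures; they correspond to a total annual cost of $1 billion, which is far more than the roughly $15.5 million per 1,000 listed here.

```python
# Annual cost of wasted time per 1,000 workers, in dollars (from the slide).
# Treating "per 1,000" as per 1,000 workers is an assumption.
COSTS_PER_1000 = {
    "reformatting information": 5_700_000,
    "not finding information": 5_300_000,
    "recreating content": 4_500_000,
}

TOTAL_PER_1000 = sum(COSTS_PER_1000.values())


def annual_savings(workers, percent_gain):
    """Estimated yearly savings if wasted time drops by percent_gain percent."""
    return TOTAL_PER_1000 * workers * percent_gain / (1000 * 100)


if __name__ == "__main__":
    print(f"Total per 1,000 workers: ${TOTAL_PER_1000:,}")
    for pct in (1, 5, 10):
        # 10,000 workers is a hypothetical headcount chosen for illustration.
        print(f"{pct}% gain, 10,000 workers: ${annual_savings(10_000, pct):,.0f}")
```

Under these assumptions a 1% gain is worth about $155,000 per 1,000 workers per year, so the slide's larger figures would apply only to a very large organization.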
Text Analytics Platform – Benefits • Findability within and outside the enterprise • Savings per year - $millions • Rescue enterprise search and ECM projects • Add semantics to search • Clean up enterprise content • Duplication and accurate categorization • Improve the quality of information access • Finding the right information can save millions • Build smarter applications • Social networking, locate expertise within the enterprise
Text Analytics Platform – Benefits • Understand your customers • What they are talking about and how they feel about it • Empower your employees • Not only more time, but they work smarter • Understand your competitors • What they are working on, talking about • Combine unstructured content and rich data sources – more intelligent analysis
Text Analytics Platform – Dangers • Text Analytics as a software project • Not enough resources – to develop, to maintain-refine • Wrong resources – SMEs, IT, Library • Need all of the above and taxonomists+ • Bad Design: • Start with bad taxonomy • Wrong taxonomy – too big or too flat • Bad Categorization / Entity Extraction • Right kind of experience
Resources • Books • Women, Fire, and Dangerous Things – George Lakoff • Knowledge, Concepts, and Categories – Koen Lamberts and David Shanks • The Stuff of Thought – Steven Pinker • Web Sites • Text Analytics News – http://social.textanalyticsnews.com/index.php • Text Analytics Wiki – http://textanalytics.wikidot.com/
Resources • Blogs • SAS – Manya Mayes, Chief Strategist – http://blogs.sas.com/text-mining/ • Web Sites • Taxonomy Community of Practice – http://finance.groups.yahoo.com/group/TaxoCoP/ • Whitepaper – CM and Text Analytics – http://www.textanalyticsnews.com/usa/contentmanagementmeetstextanalytics.pdf
Questions? • Tom Reamy • tomr@kapsgroup.com • KAPS Group, Knowledge Architecture Professional Services • http://www.kapsgroup.com