Semantic Infrastructure Workshop Applications. Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com. Agenda. Search and Semantic Infrastructure Elements /Rich Dynamic Results Different Environments Design Issues
Semantic Infrastructure Workshop Applications Tom Reamy, Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com
Agenda • Search and Semantic Infrastructure • Elements /Rich Dynamic Results • Different Environments • Design Issues • Platform for Information Applications • Multiple Applications • Case Study – Categorization & Sentiment • Case Study – Taxonomy Development • Case Study – Expertise & Sentiment & Beyond • Conclusions
A Semantic Infrastructure Approach to Search: Elements • Multiple Knowledge Structures • Facet – orthogonal dimension of metadata • Taxonomy - Subject matter / aboutness • Ontology – Relationships / Facts • Subject – Verb - Object • Software - Search, ECM, auto-categorization, entity extraction, Text Analytics and Text Mining • People – tagging, evaluating tags, fine tune rules and taxonomy • People – Users, social tagging, suggestions • Rich Search Results – context and conversation
A Semantic Infrastructure Approach to Search: Rich Results • Elements • Faceted Navigation • Categorization – metadata and/or dynamic • Tag Clouds – clustering • User Tags, personalization • Related topics – discovery • Supports all manner of search behaviors and needs • Find known items – zero in with facets • Discovery – Tag clouds, user tags, related topics • Deep dive - categorization
A Semantic Infrastructure Approach to Search: Three Environments • E-Commerce • Catalogs, small uniform collections of entities • Conflict of information and Selling • Uniform behavior – buy this • Enterprise • More content, more types of content • Enterprise Tools – Search, ECM • Publishing Process – tagging, metadata standards • Internet • Wildly different amount and type of content, no taggers • General Purpose – Flickr, Yahoo • Vertical Portal – selected content, no taggers
A Semantic Infrastructure Approach to Search: Enterprise Environment – Taxonomy, 7 facets • Taxonomy of Subjects / Disciplines: • Science > Marine Science > Marine microbiology > Marine toxins • Facets: • Organization > Division > Group • Clients > Federal > EPA • Instruments > Environmental Testing > Ocean Analysis > Vehicle • Facilities > Division > Location > Building X • Methods > Social > Population Study • Materials > Compounds > Chemicals • Content Type – Knowledge Asset > Proposals
A Semantic Infrastructure Approach to Search: Internet Design • Subject Matter taxonomy – Business Topics • Finance > Currency > Exchange Rates • Facets • Location > Western World > United States • People – Alphabetical and/or Topical - Organization • Organization > Corporation > Car Manufacturing > Ford • Date – Absolute or range (1-1-01 to 1-1-08, last 30 days) • Publisher – Alphabetical and/or Topical – Organization • Content Type – list – newspapers, financial reports, etc.
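The combination of a subject taxonomy with independent facets can be sketched in a few lines. This is only an illustration, assuming each document stores its subject and facet values as hierarchical paths; the `Doc` class, the `under` and `search` helpers, and the sample documents are hypothetical and not from the workshop.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    title: str
    subject: tuple                               # path in the subject taxonomy
    facets: dict = field(default_factory=dict)   # facet name -> path tuple


def under(path, prefix):
    """True if `path` equals `prefix` or lies below it in the hierarchy."""
    return path[:len(prefix)] == prefix


def search(docs, subject=(), **facet_filters):
    """Return docs under the subject node that also satisfy every facet filter."""
    hits = []
    for doc in docs:
        if not under(doc.subject, subject):
            continue
        if all(under(doc.facets.get(name, ()), prefix)
               for name, prefix in facet_filters.items()):
            hits.append(doc)
    return hits


# Hypothetical sample data using the slide's example paths.
docs = [
    Doc("Rates brief", ("Finance", "Currency", "Exchange Rates"),
        {"location": ("Western World", "United States"),
         "organization": ("Corporation", "Car Manufacturing", "Ford")}),
    Doc("Bond note", ("Finance", "Bonds"),
        {"location": ("Western World", "Canada")}),
]

# Browse a topic, then filter by a facet.
results = search(docs, subject=("Finance", "Currency"),
                 location=("Western World",))
print([d.title for d in results])
```

Because each facet is an orthogonal dimension, filters can be applied in any order and any combination, which is what lets users "zero in" on known items.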
Rich Search Results: Design Issues - General • What is the right combination of elements? • Faceted navigation, metadata, browse, search, categorized search results, file plan • What is the right balance of elements? • Dominant dimension or equal facets • Browse topics and filter by facet • When to combine search, topics, and facets? • Search first and then filter by topics / facet • Browse/facet front end with a search box
Rich Search Results: Design Issues - General • Homogeneity of Audience and Content • Model of the Domain – broad • How many facets do you need? • More facets and let users decide • Allow for customization – can’t define a single set • User Analysis – tasks, labeling, communities • Issue – labels that people use to describe their business vs. labels that they use to find information • Match the structure to domain and task • Users can understand different structures
Rich Search Results: Automatic Facets – Special Issues • Scale requires more automated solutions • More sophisticated rules • Rules to find and populate existing metadata • Variety of types of existing metadata – Publisher, title, date • Multiple implementation Standards – Last Name, First / First Name, Last • Issue of disambiguation: • Same person, different name – Henry Ford, Mr. Ford, Henry X. Ford • Same word, different entity – Ford and Ford • Number of entities and thresholds per results set / document • Usability, audience needs • Relevance Ranking – number of entities, rank of facets
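One way to picture the name-standard and disambiguation issues is a small normalization rule. The sketch below is an assumption-laden illustration, not the workshop's method: `parse_name`, `may_be_same_person`, and the `TITLES` list are hypothetical, and real systems would need far richer rules and context.

```python
# Hypothetical list of honorifics to ignore when comparing names.
TITLES = {"mr", "mrs", "ms", "dr"}


def parse_name(raw):
    """Return (first, last) in lowercase; first is None if only a surname is given."""
    raw = raw.strip()
    if "," in raw:                       # "Last, First" convention
        last, first = [part.strip() for part in raw.split(",", 1)]
        tokens = first.split() + [last]
    else:                                # "First Last" convention
        tokens = raw.split()
    tokens = [t.strip(".").lower() for t in tokens]
    tokens = [t for t in tokens if t and t not in TITLES]
    if not tokens:
        return (None, None)
    if len(tokens) == 1:
        return (None, tokens[0])
    return (tokens[0], tokens[-1])


def may_be_same_person(a, b):
    """Surnames must match; first names must match when both are present."""
    first_a, last_a = parse_name(a)
    first_b, last_b = parse_name(b)
    if last_a is None or last_a != last_b:
        return False
    return first_a is None or first_b is None or first_a == first_b


for variant in ["Mr. Ford", "Henry X. Ford", "Ford, Henry", "Edsel Ford"]:
    print(variant, may_be_same_person("Henry Ford", variant))
```

String rules like these only address "same person, different name". Telling Ford the person from Ford the company ("same word, different entity") needs surrounding context, which is where categorization rules come in.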
Semantic Infrastructure for Search Based Apps: Multiple Applications • Platform for Information Applications • Content Aggregation • Duplicate Documents – save millions! • Text Mining – BI, CI – sentiment analysis • Combine with Data Mining – disease symptoms, new • Predictive Analytics • Social – Hybrid folksonomy / taxonomy / auto-metadata • Social – expertise, categorize tweets and blogs, reputation • Ontology – travel assistant – SIRI • Use your Imagination!
Semantic Infrastructure for Search Apps: Multiple Applications • SIRI – Travel Assistant
Semantic Infrastructure for Search Apps Case Study – Categorization & Sentiment • Call Motivation • Categorization – Motivation Taxonomy • Purpose of previous calls to understand current call • Issues of scale, small size of documents, jargon, spelling • Customer Sentiment • Telecom Forums • Feature level – not just products • Issue of context - sarcasm, jargon • Knowledge Base • Categorization, Product extraction, expertise-sentiment analysis • Social Media as source for solutions
Sentiment Analysis: Development Process • Combination of Statistical and categorization rules • Start with Training sets – examples of positive, negative, neutral documents • Develop a Statistical Model • Generate domain positive and negative words and phrases • Develop a taxonomy of Products & Features • Develop rules for positive and negative statements • Test and Refine • Test and Refine again
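The rule-based half of this process (domain sentiment words, a product/feature taxonomy, and rules for positive and negative statements) might look roughly like the sketch below. Everything here is a hypothetical stand-in: the word lists, the `FEATURES` mapping, and `feature_sentiment` are illustrative assumptions, and the statistical model trained on labeled documents is not shown.

```python
import re

# Hypothetical domain lexicons; in practice these would be generated from training sets.
POSITIVE = {"great", "love", "fast", "reliable"}
NEGATIVE = {"slow", "dropped", "terrible", "broken"}
NEGATORS = {"not", "never", "no"}

# Hypothetical slice of a Products & Features taxonomy.
FEATURES = {
    "battery": "Phone > Battery",
    "screen": "Phone > Screen",
    "coverage": "Network > Coverage",
}


def feature_sentiment(text):
    """Score each feature by the sentiment words found in the same clause."""
    scores = {}
    for clause in re.split(r"[.;,!?]|\bbut\b", text.lower()):
        tokens = re.findall(r"[a-z']+", clause)
        clause_score = 0
        for i, word in enumerate(tokens):
            polarity = (word in POSITIVE) - (word in NEGATIVE)
            # Simple negation rule: flip polarity when preceded by a negator.
            if polarity and i > 0 and tokens[i - 1] in NEGATORS:
                polarity = -polarity
            clause_score += polarity
        for word in tokens:
            if word in FEATURES:
                node = FEATURES[word]
                scores[node] = scores.get(node, 0) + clause_score
    return scores


print(feature_sentiment("The battery is great but coverage is not reliable"))
```

Scoring at the clause level is what makes the result feature-level rather than document-level. The "test and refine" steps would then consist of running such rules against held-out examples and adjusting the lexicons and rules.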
Semantic Infrastructure for Search Apps Case Study – Taxonomy Development • Problem – 200,000 new uncategorized documents • Old taxonomy – need one that reflects change in corpus • Text mining, entity extraction, categorization • Content – 250,000 large documents, search logs, etc. • Bottom Up – terms in documents – frequency, date, • Clustering – suggested categories • Clustering – chunking for editors • Entity Extraction – people, organizations, Programming languages • Time savings – only feasible way to scan documents • Quality – important terms, co-occurring terms
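The bottom-up step (important terms and co-occurring terms) can be illustrated with plain counting. This is a toy sketch under stated assumptions: `term_stats`, the `STOPWORDS` set, and the three sample documents are hypothetical, and a real project on 250,000 large documents would use a text mining tool rather than this loop.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; real lists are much longer.
STOPWORDS = {"and", "the", "for", "with"}


def term_stats(documents, min_len=3):
    """Count document frequency per term and per co-occurring term pair."""
    term_freq = Counter()
    pair_freq = Counter()
    for text in documents:
        terms = {t for t in re.findall(r"[a-z]+", text.lower())
                 if len(t) >= min_len and t not in STOPWORDS}
        term_freq.update(terms)
        # Sorting makes each pair appear in one canonical order.
        pair_freq.update(combinations(sorted(terms), 2))
    return term_freq, pair_freq


sample = [
    "Python taxonomy tools",
    "Taxonomy and Python scripts",
    "Java taxonomy",
]
term_freq, pair_freq = term_stats(sample)
print(term_freq.most_common(2))
print(pair_freq[("python", "taxonomy")])
```

Frequent terms suggest candidate category labels, and frequently co-occurring pairs suggest which terms belong under the same node, giving editors chunks to review instead of raw documents.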
Semantic Infrastructure Applications: Expertise Analysis • Sentiment Analysis to Expertise Analysis (KnowHow) • Know How, skills, “tacit” knowledge • No single correct categorization • Women, Fire, and Dangerous Things • Types of Animals • Those that belong to the Emperor • Embalmed Ones • Suckling Pigs • Fabulous Ones • Those that are included in this classification • Those that tremble as if they were mad • Other
Semantic Infrastructure Applications: Expertise Analysis – Basic Level Categories • Mid-level in a taxonomy / hierarchy • Short and easy words • Maximum distinctness and expressiveness • First level named and understood by children • Level at which most of our knowledge is organized • Levels: Superordinate – Basic – Subordinate • Mammal – Dog – Golden Retriever • Furniture – chair – kitchen chair
Semantic Infrastructure Applications: Expertise Analysis • Experts prefer lower, subordinate levels • In their domain, (almost) never use superordinate • Novices prefer higher, superordinate levels • General Populace prefers basic level • Not just individuals but whole societies / communities differ in their preferred levels • Issue – artificial languages – ex. Science discipline • Issue – difference of child and adult learning – adults start with high level
Semantic Infrastructure Applications: Expertise Analysis • What counts as basic level depends on context(s) • Document/author expert in news health care, not research • Hybrid – simple high level taxonomy (superordinate), short words – basic, longer words – expert Plus • Develop expertise rules – similar to categorization rules • Use basic level for subject • Superordinate for general, subordinate for expert • Also contextual rules • “Tests” is general, high level • “Predictive value of tests” is lower, more expert • If terms appear in same sentence - expert
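The expertise rules above can be sketched as a small classifier. This is one possible reading, not the actual rule set: the term lists, `sentence_level`, and `expert_share` are hypothetical, and the interpretation of "if terms appear in same sentence" as a co-occurrence rule is an assumption.

```python
import re
from collections import Counter

# Hypothetical term lists built from the slide's single example.
GENERAL_TERMS = {"tests"}                        # high-level, general vocabulary
EXPERT_PHRASES = {"predictive value of tests"}   # lower-level, expert phrasing
EXPERT_COOCCUR = [("predictive", "tests")]       # assumed: both in one sentence -> expert


def sentence_level(sentence):
    """Label one sentence as 'expert', 'general', or 'unknown'."""
    text = sentence.lower()
    words = set(re.findall(r"[a-z]+", text))
    if any(phrase in text for phrase in EXPERT_PHRASES):
        return "expert"
    if any(a in words and b in words for a, b in EXPERT_COOCCUR):
        return "expert"
    if words & GENERAL_TERMS:
        return "general"
    return "unknown"


def expert_share(document):
    """Fraction of labeled sentences that read as expert (0.0 if none are labeled)."""
    sentences = [s for s in re.split(r"[.!?]+", document) if s.strip()]
    counts = Counter(sentence_level(s) for s in sentences)
    labeled = counts["expert"] + counts["general"]
    return counts["expert"] / labeled if labeled else 0.0


print(sentence_level("The tests came back."))
print(sentence_level("The predictive value of tests varies by prevalence."))
print(expert_share("The tests came back. The predictive value of tests is low."))
```

Expert rules are checked before general ones because an expert phrase usually contains a general term, so the more specific rule has to win.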
Expertise Analysis: Expertise – application areas • Taxonomy / Ontology development / design – audience focus • Card sorting – non-experts use superficial similarities • Business & Customer intelligence – add expertise to sentiment • Deeper research into communities, customers • Text Mining - Expertise characterization of writer, corpus • eCommerce – Organization/Presentation of information – expert, novice • Expertise location – Generate automatic expertise characterization based on documents • Experiments - Pronoun Analysis – personality types • Essay Evaluation Software - Apply to expertise characterization • Model levels of chunking, procedure words over content
Beyond Sentiment: Behavior Prediction, Case Study – Telecom Customer Service • Problem – distinguish customers likely to cancel from mere threats • Analyze customer support notes • General issues – creative spelling, second hand reports • Develop categorization rules • First – distinguish cancellation calls – not simple • Second - distinguish cancel what – one line or all • Third – distinguish real threats
Beyond Sentiment: Behavior Prediction – Case Study • Basic Rule • (START_20, (AND, (DIST_7, "[cancel]", "[cancel-what-cust]"), (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]"))))) • Examples: • customer called to say he will cancell his account if the does not stop receiving a call from the ad agency. • cci and is upset that he has the asl charge and wants it offor her is going to cancel his act • ask about the contract expiration date as she wanted to cxltehacct • Combine sophisticated rules with sentiment statistical training and Predictive Analytics
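To make the rule above concrete, here is a rough Python reading of it. The operator semantics are assumptions, since the slide does not define them: `START_20` is taken to mean the match begins within the first 20 words, and `DIST_n` to mean two terms occur within n words of each other. The term lists behind each bracketed category, and the names `CATEGORIES`, `near_pairs`, and `likely_cancel`, are hypothetical.

```python
import re

# Hypothetical term lists standing in for the bracketed categories in the rule.
CATEGORIES = {
    "cancel": {"cancel", "cancell", "cxl"},
    "cancel-what-cust": {"account", "acct", "act", "service"},
    "one-line": {"line"},
    "restore": {"restore"},
    "if": {"if"},
}


def positions(tokens, category):
    """Token indexes where any term of the category occurs."""
    terms = CATEGORIES[category]
    return [i for i, tok in enumerate(tokens) if tok in terms]


def near_pairs(tokens, max_dist, cat_a, cats_b):
    """(a, b) index pairs where a cat_a term is within max_dist tokens of a cats_b term."""
    pos_a = positions(tokens, cat_a)
    pos_b = [p for cat in cats_b for p in positions(tokens, cat)]
    return [(a, b) for a in pos_a for b in pos_b if abs(a - b) <= max_dist]


def likely_cancel(note, start=20):
    """Assumed reading of (START_20, (AND, DIST_7 ..., (NOT, DIST_10 ...)))."""
    tokens = re.findall(r"[a-z]+", note.lower())
    pairs = near_pairs(tokens, 7, "cancel", ["cancel-what-cust"])
    begins_early = any(min(a, b) < start for a, b in pairs)
    qualified = bool(near_pairs(tokens, 10, "cancel",
                                ["one-line", "restore", "if"]))
    return begins_early and not qualified


notes = [
    "customer called to say he will cancell his account if the does not "
    "stop receiving a call from the ad agency.",
    "cci and is upset that he has the asl charge and wants it offor her "
    "is going to cancel his act",
    "ask about the contract expiration date as she wanted to cxltehacct",
]
for note in notes:
    print(likely_cancel(note))
```

Under these assumptions the first note is excluded because "if" makes the cancellation conditional, the second matches, and the third is missed because "cxltehacct" runs three words together. That last case shows why creative spelling is listed as a general issue: exact term lists need to be backed by fuzzy matching or statistical training.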
Beyond Sentiment - Wisdom of Crowds: Crowd Sourcing Technical Support • Example – Android User Forum • Develop a taxonomy of products, features, problem areas • Develop Categorization Rules: • “I use the SDK method and it isn't to bad a all. I'll get some pics up later, I am still trying to get the time to update from fresh 1.0 to 1.1.” • Find product & feature – forum structure • Find problem areas in response, nearby text for solution • Automatic – simply expose lists of “solutions” • Search Based application • Human mediated – experts scan and clean up solutions
Semantic Infrastructure: A Platform for KM Applications • Expertise Location – Individuals and Communities • Knowledge Sharing – Communities of Practice • Find the right person more easily • Knowledge representation to support better sharing • Enhance sharing as well as substitute for a person • Knowledge Base / Portal • Greatly improved – find what you are looking for • New kinds of presentations – rich search to dynamic graphs • Process – deliver rich knowledge representation in work flow – SIRI+
Text Analytics: Future Directions • Start with the 80% of significant content that is not data • Enterprise search, content management, Search based applications • Text Analytics and Text Mining • Text Analytics turns text into data – Build better TM Apps • Better extraction and add Subject / Concepts • Sentiment and Beyond – Behavior, Expertise • Text Mining and Text Analytics • TM enriching TA • Taxonomy development • New Content Structures, ensemble models • Text Analytics and Predictive Analytics • More content, New content – social, interactive – CSR • New sources of content/data = new & better apps
Semantic Infrastructure Approach: Conclusions • Semantic Infrastructure solution (people, policy, technology, semantics) and feedback is best approach • Foundation – Hybrid ECM model with text analytics, Search • Integrated information, knowledge, and semantics • Semantic Infrastructure as a platform for multiple applications • Build on infrastructure for economy and quality • Text Analytics (Entity extraction and auto-categorization, sentiment analysis) are essential • Future – new kinds of applications: • Text Mining and Data mining, research tools, sentiment • Beyond Sentiment – expertise applications • NeuroAnalytics – cognitive science meets search and more • Watson is just the start
Questions? Tom Reamy tomr@kapsgroup.com KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com
Resources • Books • Women, Fire, and Dangerous Things • George Lakoff • Knowledge, Concepts, and Categories • Koen Lamberts and David Shanks • Formal Approaches in Categorization • Ed. Emmanuel Pothos and Andy Wills • The Mind • Ed John Brockman • Good introduction to a variety of cognitive science theories, issues, and new ideas • Any cognitive science book written after 2009
Resources • Conferences – Web Sites • Text Analytics World • http://www.textanalyticsworld.com • Text Analytics Summit • http://www.textanalyticsnews.com • Semtech • http://www.semanticweb.com
Resources • Blogs • SAS - http://blogs.sas.com/text-mining/ • Web Sites • Taxonomy Community of Practice: http://finance.groups.yahoo.com/group/TaxoCoP/ • LinkedIn – Text Analytics Summit Group • http://www.LinkedIn.com • Whitepaper – CM and Text Analytics - http://www.textanalyticsnews.com/usa/contentmanagementmeetstextanalytics.pdf • Whitepaper – Enterprise Content Categorization strategy and development – http://www.kapsgroup.com
Resources • Articles • Malt, B. C. 1995. Category coherence in cross-cultural perspective. Cognitive Psychology 29, 85-148 • Rifkin, A. 1985. Evidence for a basic level in event taxonomies. Memory & Cognition 13, 538-56 • Shaver, P., J. Schwartz, D. Kirson, C. O’Connor 1987. Emotion knowledge: further exploration of a prototype approach. Journal of Personality and Social Psychology 52, 1061-1086 • Tanaka, J. W. & M. E. Taylor 1991. Object categories and expertise: is the basic level in the eye of the beholder? Cognitive Psychology 23, 457-82