LanguageWare Introduction

LanguageWare Introduction. Marie Wallace, IBM LanguageWare. Strategy. There are many challenges in maximizing the total ROI of natural language processing, which LanguageWare addresses through its comprehensive “big picture” design & implementation. Morphological Analysis, Lexical Analysis, …

Presentation Transcript


  1. LanguageWare Introduction. Marie Wallace, IBM LanguageWare

  2. Strategy

  3. There are many challenges in maximizing the total ROI of natural language processing, which LanguageWare addresses through its comprehensive “big picture” design & implementation
     • Morphological Analysis
     • Lexical Analysis
     • Parsing & Grammars
     • Semantic Analysis & Disambiguation
     • Statistical Processing
     • Knowledge Integration (mental → semantic models)
     • Social Computing (tagging, relationship extraction, ontology derivation, …)
     • Social Semantic Search & Discovery (w/ disambiguation)
     • Knowledge Integration (converting documents into business objects)
     • On-the-fly analysis (form filling, semi/automatic tagging, disambiguation, …)
     Explicit knowledge in documents, emails and other written forms.
     Tacit knowledge located in the experience of individuals, networks and communities.
     Embedded knowledge in work routines, practices and norms.

  4. Find, Create, Store, Understand. IBM launched the LanguageWare project in 2001 with the vision of creating common NLP componentry that could be flexibly applied to a wide range of challenges across IBM’s entire product portfolio (the entire information lifecycle). Text analytics can improve the consistency of content, and of its associated meta-data, through semi-automatic analysis at content creation

  5. Text analytics can generate valuable meta-data which can be leveraged for subsequent analysis, integrating multiple sources of (un)structured information

  6. Text analytics can help enhance the search experience by leveraging semantics, social networks, taxonomies & folksonomies, … to uncover knowledge hidden in unstructured content

  7. Text analytics can support BI techniques and algorithms to extract actionable knowledge and insight from vast quantities of available information (structured & unstructured)

  8. To achieve a solution that addressed the diverse and conflicting requirements across divisions, brands, products, industries, and platforms, we needed to create something that was…
     • Ubiquitous & Flexible: leveraged comprehensively across all IBM divisions through simple-to-use integration packages satisfying any type of application
     • Enterprise-transforming & Enterprise-ready: combining strong engineering principles with the latest research techniques
     • Highly Extensible & Customizable: applying an open, extensible, data-driven model which delivers a highly optimized industrial-strength runtime, with simple yet powerful customization tools for developing domain resources
     • Standards-based & Easily Accessible: leveraging open source technologies & standards, such as UIMA, and made freely available for evaluation & prototyping through Alphaworks

  9. Our philosophy was to create light-weight technology that could be easily embedded into any existing solution to provide natural language understanding transparently and unobtrusively for the end user
     • “Language Understanding” is personal & specific to industry sector, company, organization, function, person, and time …
     • The data that drives the discovery process is YOUR competitive advantage
     • You need technology that can be enhanced with your data models to allow you to capture insights that your competitors can’t
     • Discovery is an integral part of all our lives and you want it integrated seamlessly into your entire information life-cycle, from creation to obsolescence (and beyond)
     • You need technology that can seamlessly integrate the knowledge of your people, mapping personal models to analytics models
     • The closer you move the analytics to the knowledge worker (allowing a feedback loop to harness knowledge), the higher the quality of the analytics

  10. As a result of the successful execution of this strategy, LanguageWare is the most broadly used NLP technology across IBM…
      • Embedded into Lotus, WebSphere, DB2, and Rational products
      • Integrated into IBM’s hardware
      • Used internally by IBM’s CIO Office
      • Deployed in GBS and SWG services
      • Used within IBM Research
      • Used as part of European Research projects
      • And now licensed to end-customers

  11. Technology

  12. Language Identification, Segmentation, Classification, Normalization, Disambiguation, Relationship Extraction
      • Language Identification: English
      • Classification: Pharma Patent
      • Normalization: utilize, describe, modulate, …
      • Disambiguation: running = noun (not verb); tank = vehicle (not container)
      • Fuzzy matching: spelling correction, approx. lookup, hyphenation
      • Rules: regular expressions, parsing, grammars
      • Concept types: Disease, PharmaAction, PharmaEffect (codes C18.452.339.500.396, C18.654.726.500)
      • Relationship Extraction: Compound X <addresses> Disease Y; Combination Therapy: Compound X <can be combined with> Compound Y (lipase inhibitors, such as tetrahydrolipstatin; Obesity-related; Orlistat)
      Sample text (shown several times on the slide): “In the context of the present invention, a compound as described herein or pharmaceutical composition thereof can be utilized for modulating the activity of RUP3 receptor mediated diseases, conditions and/or disorders as described herein. Examples of modulating the activity of RUP3 receptor mediated diseases include the prophylaxis or treatment of metabolic related disorders such as, but not limited to, type I diabetes, type II diabetes, inadequate glucose tolerance, insulin resistance, hyperglycemia, hyperlipidemia, hypertriglyceridemia, hypercholesterolemia, dyslipidemia and syndrome X. Other examples of modulating the activity of RUP3 receptor mediated diseases include the prophylaxis or treatment of obesity and/or overweight by decreasing food intake, inducing satiation (i.e., the feeling of fullness), controlling weight gain, decreasing body weight and/or affecting metabolism such that the recipient loses weight and/or maintains weight.”
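
The fuzzy matching named on this slide (spelling correction, approximate lookup) can be illustrated with a small sketch. It uses Python's standard `difflib` module as a stand-in and is not LanguageWare's actual matcher; the `LEXICON` contents, the `approx_lookup` name and the 0.8 cutoff are all assumptions made for illustration.

```python
import difflib

# Hypothetical mini-lexicon; a real system would load a full dictionary resource.
LEXICON = ["described", "compound", "receptor", "treatment", "diabetes", "obesity"]


def approx_lookup(token, lexicon=LEXICON, cutoff=0.8):
    """Return the closest lexicon entry for `token`, or None if nothing is close enough."""
    lowered = token.lower()
    if lowered in lexicon:
        return lowered  # exact hit, no fuzzy matching needed
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for word in ["desribed", "diabetis", "xyzzy"]:
        print(word, "->", approx_lookup(word))
```

A misspelling such as "desribed" resolves to the lexicon entry "described", while a token with no near neighbour returns `None` so the caller can decide how to treat unknown words.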

  13. Automatic tagging based on concept mentions
      • Mapping mentions to concepts
      • Finding “focus” concept
      [Diagram: mentions in the TEXT linked to a NETWORK OF CONCEPTS]
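
The two steps on this slide, mapping mentions to concepts and finding a “focus” concept, can be sketched as follows. The slide does not say how the focus is computed, so the scoring rule here (a concept's own mentions plus mentions of concepts that point to it) is an assumption, as are the `LEXICON` and `NETWORK` contents and the function names.

```python
from collections import Counter

# Hypothetical lexicon: surface form (lowercase) -> concept id.
LEXICON = {
    "type ii diabetes": "Diabetes",
    "insulin resistance": "InsulinResistance",
    "hyperglycemia": "Hyperglycemia",
    "obesity": "Obesity",
}

# Hypothetical concept network: concept -> concepts it points to.
NETWORK = {
    "Diabetes": ["MetabolicDisorder"],
    "InsulinResistance": ["MetabolicDisorder"],
    "Hyperglycemia": ["MetabolicDisorder"],
    "Obesity": ["NutritionDisorder"],
}


def find_mentions(text):
    """Count concept mentions using naive case-insensitive substring matching."""
    lowered = text.lower()
    mentions = Counter()
    for surface, concept in LEXICON.items():
        hits = lowered.count(surface)
        if hits:
            mentions[concept] += hits
    return mentions


def find_focus(mentions):
    """Pick the concept supported by the most mentions, directly or via one edge."""
    scores = Counter(mentions)
    for concept, count in mentions.items():
        for neighbour in NETWORK.get(concept, []):
            scores[neighbour] += count
    if not scores:
        return None
    # Highest score wins; ties are broken by concept name for determinism.
    return max(scores, key=lambda c: (scores[c], c))


if __name__ == "__main__":
    text = "type II diabetes, insulin resistance, hyperglycemia"
    print(find_focus(find_mentions(text)))
```

With this toy data the three disease mentions each score once, but all three point to the same parent concept, so the parent becomes the focus even though it is never mentioned in the text.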

  14. Multidimensional disambiguation
      [Diagram: mentions in the TEXT]
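
The slide names multidimensional disambiguation without describing it. As a much simpler illustration of the general idea, the sketch below picks a word sense by counting how many cue words for each sense appear in the surrounding context (a plain context-overlap heuristic along a single dimension, not LanguageWare's method). The `SENSES` table and its cue words are invented for the example; the "tank = vehicle (not container)" case comes from the earlier slide.

```python
# Hypothetical sense inventory: word -> sense -> cue words that suggest that sense.
SENSES = {
    "tank": {
        "vehicle": {"army", "armour", "gun", "battle", "soldier"},
        "container": {"water", "fuel", "litre", "storage", "fill"},
    },
}


def disambiguate(word, context_tokens):
    """Return the sense whose cue words overlap most with the context, or None."""
    senses = SENSES.get(word.lower())
    if not senses:
        return None
    context = {token.lower() for token in context_tokens}
    best_sense, best_overlap = None, 0
    for sense, cues in senses.items():
        overlap = len(cues & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense  # None when no cue word appears in the context


if __name__ == "__main__":
    print(disambiguate("tank", "The army moved the tank into battle".split()))
    print(disambiguate("tank", "Fill the water tank".split()))
```

Returning `None` when no cue matches keeps the ambiguity visible instead of guessing, which leaves room for further evidence (other dimensions) to be combined later.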

  15. LanguageWare comprises a number of building blocks which combine to deliver its capabilities
      Runtime
      • The heart of the solution, providing the foundation on which most other capabilities are built
      • Provides a language-agnostic text analyzer
      • Analysis is driven by data and logic encoded in our resources
      • Allows us to rapidly analyze text, identify lexical units, and normalize and classify those units
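
A data-driven, language-agnostic analyzer of the kind described for the Runtime can be pictured with the following minimal sketch: the code contains no language-specific rules, and all linguistic knowledge lives in a resource table. The `RESOURCE` entries, the `analyze` function and the output fields are assumptions for illustration, not the LanguageWare API.

```python
import re

# Hypothetical resource data: inflected form -> (normalized form, class).
# Swapping this table for another language or domain changes the analysis
# without touching the code below.
RESOURCE = {
    "utilized": ("utilize", "verb"),
    "described": ("describe", "verb"),
    "modulating": ("modulate", "verb"),
    "diseases": ("disease", "noun"),
    "compound": ("compound", "noun"),
}

TOKEN_RE = re.compile(r"\w+")


def analyze(text, resource=RESOURCE):
    """Segment text into lexical units, then normalize and classify each unit."""
    units = []
    for match in TOKEN_RE.finditer(text):
        surface = match.group()
        key = surface.lower()
        # Unknown words keep their lowercased form and get the class "unknown".
        lemma, word_class = resource.get(key, (key, "unknown"))
        units.append({
            "surface": surface,
            "begin": match.start(),
            "end": match.end(),
            "lemma": lemma,
            "class": word_class,
        })
    return units


if __name__ == "__main__":
    for unit in analyze("Compound utilized for modulating diseases"):
        print(unit)
```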

  16. Building blocks: Resources, Runtime
      Resources: lexico-semantic resources that drive the behavior of the system
      • Optimized for size and performance
      • Customizable for different domains
      • A semantic layer modeled by a directed graph to represent the knowledge (network of concepts)
      • Graph mining techniques for analysis of the semantic network
      Resource layers: concepts (with navigation); lexical entries to concepts; morphological description of words
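
A semantic layer modeled as a directed graph can be navigated with ordinary graph algorithms. The sketch below uses breadth-first search to find the shortest chain of relations between two concepts; the `EDGES` data and the `shortest_path` name are illustrative assumptions, and the slide does not say which graph mining techniques LanguageWare actually uses.

```python
from collections import deque

# Hypothetical concept network as a directed graph: concept -> related concepts.
EDGES = {
    "Orlistat": ["LipaseInhibitor"],
    "LipaseInhibitor": ["AntiObesityAgent"],
    "AntiObesityAgent": ["Obesity"],
    "Obesity": ["MetabolicDisorder"],
    "MetabolicDisorder": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest directed path as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


if __name__ == "__main__":
    print(shortest_path(EDGES, "Orlistat", "Obesity"))
```

Because the edges are directed, a path from "Orlistat" to "Obesity" exists in this toy graph while the reverse lookup returns `None`.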

  17. Building blocks: UIMA, UIMA Annotators, Resources, Runtime
      UIMA Annotators: UIMA provides a framework for building text analytics applications, with annotators acting as the plug-ins which encode the processing logic.

  18. Building blocks: UIMA, Workbench, UIMA Annotators, Resources, Runtime
      Workbench
      • Eclipse-based tooling for manipulation of language resources.
      • Allows customers to easily develop new, or customize existing, language resources, thereby modifying the behaviour of the annotators.
      • Targets the non-developer: terminologist, taxonomist, domain specialist, …
      • Supports several representations of domain knowledge: thesauri, taxonomies, ontologies, …

  19. Building blocks: UIMA, Workbench, UIMA Annotators, Resources, Runtime

  20. LanguageWare Architecture / Processing Model
      • UIMA (CAS): Text, Annotations
      • Annotators: Language Classifier, Lexical Analyzer, Semantic Analyzer, Parser, POS Tagger
      • LanguageWare Resource Workbench: Customizable Domain Resources (Resources, Rules & Seed list, Rules)
      • Software Libraries: DLTLS, uTagger, aFST; Core NLP (DJTJ); Char handling, regex, … (ICU4J)

  21. UIMA Pipeline
      • Flow: Crawled Documents → Collection Reader → Annotator → Annotator → Annotator → CAS Consumer → DB / Index (a CAS is passed from stage to stage)
      • Annotators: Language Identification, Document Classification, Lexical Analyzer, POS Tagger, Parser, Semantic Analyzer, Named-Entity Extraction, Relationship Extraction
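
The pipeline shape on this slide (collection reader, chain of annotators sharing a CAS, CAS consumer) can be mimicked in a few lines. This is a toy model only: it does not use the Apache UIMA API, and `ToyCas`, the three annotators and `make_index_consumer` are hypothetical names invented for the sketch.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToyCas:
    """Stand-in for a UIMA CAS: the document text plus accumulated annotations."""
    text: str
    annotations: list = field(default_factory=list)  # (type, begin, end, value)


def language_identifier(cas):
    # Assumption: every document is English; a real annotator would inspect the text.
    cas.annotations.append(("language", 0, len(cas.text), "en"))


def tokenizer(cas):
    for match in re.finditer(r"\w+", cas.text):
        cas.annotations.append(("token", match.start(), match.end(), match.group()))


def disease_spotter(cas):
    diseases = {"diabetes", "obesity"}  # hypothetical dictionary resource
    # Builds on the token annotations left in the CAS by the previous annotator.
    for kind, begin, end, value in list(cas.annotations):
        if kind == "token" and value.lower() in diseases:
            cas.annotations.append(("disease", begin, end, value.lower()))


def make_index_consumer(index):
    """CAS consumer that files each document under the diseases found in it."""
    def consume(cas):
        for kind, _begin, _end, value in cas.annotations:
            if kind == "disease":
                index.setdefault(value, []).append(cas.text)
    return consume


def run_pipeline(documents, annotators, consumer):
    for text in documents:          # plays the collection reader role
        cas = ToyCas(text)
        for annotate in annotators: # each annotator enriches the same CAS
            annotate(cas)
        consumer(cas)


if __name__ == "__main__":
    index = {}
    docs = ["Treatment of obesity", "No match here"]
    run_pipeline(docs, [language_identifier, tokenizer, disease_spotter],
                 make_index_consumer(index))
    print(index)
```

The point of the design is that annotators only communicate through the shared CAS, so stages can be added, removed or reordered without changing each other's code.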

  22. LanguageWare Resource Workbench
      • Input from Any Application: XML (taxonomy definition, dictionary data, pointers to training data, rules, …)
      • Search Engine Interface: data collection, and analysis verification & test
      • Create/Modify: develop / import domain resources, rules, models, …; develop rules, regular expressions, templates, …
      • Analyse: Machine Learning and Results Visualization, Manipulation, and Verification; Measurements & Statistics
      • Validate: review and modify annotation results
      • Generate Reports: XML (updated taxonomy definition, dictionary data, analysis results, change reports, recommendations, …)
      • Build UIMA-compliant application: Annotator (Descriptor, Class(es), language resource(s))
      • Common Building Blocks, i.e. Template Annotators: Parsing, Lexical Analysis, Named Entity Recognition, POS Tagging, Semantic Analysis & Disambiguation, Relationship Extraction, Document Classification
      • Deploy

  23. Sample Processing Model
      • Application: spot term mentions, analyze distribution of concepts, disambiguate, find focus, …
      • Lexical Analyzer: to spot concept mentions (Text → Concepts)
      • Concept Navigation: API for navigation through the network of concepts
      • Language Resources: layered lexico-semantic resources, linking morphology, lexical entries, concepts & relationships

  24. Contacts
      • LanguageWare External Download on Alphaworks: http://www.alphaworks.ibm.com/tech/lrw
      • LanguageWare Wikipedia: http://en.wikipedia.org/wiki/Languageware
      • LanguageWare Senior Research & Development Manager: marie.wallace@ie.ibm.com
