National Centre for Text Mining

carsyn

Presentation Transcript


  1. National Centre for Text Mining
     • Mission
         • To provide TM tools for users, in particular, scientists and researchers
         • To coordinate activities in the TM community
     • Core Partners
         • University of Manchester: NLP and DM
         • Salford University: Terminology
         • Liverpool University: IR and Digital Archive
     • External Partners
         • San Diego SC, UC Berkeley, University of Geneva, University of Tokyo

  2. Biomedical domain

  3. Strategy and Roadmap for TM in Biomedicine
     • Vast number of Google/Yahoo users, satisfied
     • Huge Demand for specialized tools for TM in Bio-Medical Domains
     • Small number of users, unsatisfied
     • The current TM tools, though successful in some business applications, do not meet requirements of users in bio-medical domains.
     • More publicity and marketing
     • More demand-oriented approach
     • What are the requirements for TM for users in bio-medical domains?
     • What technologies should be integrated in future TM for science?
     • Is the nature of TM in scientific fields different from that of business applications?

  4. From technological seeds

  5. Effective management of text and knowledge is the key
     • Natural Language Processing
     • Ontology-based KMS
     • Intelligent Text Management System
     • Science: Knowledge Raw Data
     • Unstructured Information (Text)
     • Semi-structured Information (XML+Text)
     • Structured Information (Data bases)

  6. Retrieval: Intelligent Information Retrieval and Question Answering
     Integration: Integration of Text with Data and Knowledge
     Discovery: Text Mining and Knowledge Discovery
     Intelligent TM systems

  7. Ontology
     • Relationships among concepts: Metabolic Pathways, Signal Pathways, Association between Diseases and Genes, ……
     • Motivated independently of language
     From Text to Knowledge
     • Non-Trivial Mappings: Terminology, NLP, Paraphrasing
     • Language Domain, Knowledge Domain

  8. Examples of Technical Seeds
     • Term Variants
         • Terms (names of proteins, genes, diseases, symptoms, etc.) denote basic conceptual units in the knowledge domain.
     • Syntactic Variants
         • Relationships and complex conceptual units are mapped to sentences.
     • Term Acquisition from Text
         • New terms (basic conceptual units) are constantly introduced. Resource building for specialized domains is crucial.

  10. (Diagram) Types of term variation: hypernym, expanded form, acronym, synonym, spelling variation
      Variants shown: NF-kappa B; NF kappa B; NFKB factor; NF-KB; NF kB; nuclear factor-kappa B; nuclear-factor kappa B; nuclear factor kappa B; nuclear factor κB; Nuclear Factor kappa B; ………..

  11. Automatically Generated Term Variants (1)
      Score  Term variant                             Count
      1.000  NF kappa B                               128
      0.500  Transcription Factor NF kappa B          0
      0.429  NF-kappa B                               912
      0.286  NF kB                                    0
      0.286  Immunoglobulin Enhancer-Binding Protein  0
      0.286  Immunoglobulin Enhancer Binding Protein  0
      0.286  Transcription Factor NF-kB               0
      0.286  Transcription Factor NF kB               0
      0.286  Factor NF-kB, Transcription              0
      0.286  nuclear factor kappa beta                2
      0.286  NF kappaB                                1
      0.273  NF kappa B chain                         0
      0.273  NF kappa B subunit                       0
      0.214  Transcription Factor NF-kappa B          0
      0.214  NF-kB, Transcription Factor              0
      0.214  NF-kB                                    67
      0.200  Neurofibromatosis Type kappa B           0

  12. Automatically Generated Term Variants (2)
      Score  Term variant                  Count
      1.000  tumor necrosis factor A       0
      0.316  TNF A                         1
      0.200  tumor necrosis factor         1653
      0.158  TNF alpha                     358
      0.133  TNFA                          32
      0.133  TNF                           2631
      0.133  Tumour necrosis factor alpha  14
      0.133  Tumor Necrosis Factor alpha   2
      0.133  Tumor Necrosis Factor-Alpha   0
      0.133  TUMOR NECROSIS FACTOR.ALPHA   0
      0.133  Tumor necrosis factor alpha   52
      0.133  Tumor Necrosis Factor-alpha   8
      0.133  TNF-Alpha                     0
      0.133  TNF-alpha                     6899
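
Many of the variants listed for NF-kappa B and TNF-alpha differ only in case, hyphenation, spacing or Greek letters. The following is a minimal sketch of how such spelling variants could be collapsed onto one key. It is not the method used to produce the scores in the slides; the function names `normalize_term` and `group_variants` and the small Greek-letter table are assumptions made for illustration.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical, deliberately tiny table of Greek letters seen in the slides.
GREEK = {"κ": "kappa", "α": "alpha", "β": "beta"}


def normalize_term(term: str) -> str:
    """Collapse case, Greek letters, hyphens, periods and spaces."""
    text = unicodedata.normalize("NFKC", term).lower()
    for letter, name in GREEK.items():
        text = text.replace(letter, name)
    # Remove whitespace, hyphens and periods used as separators.
    return re.sub(r"[\s\-.]+", "", text)


def group_variants(terms):
    """Group surface forms that share the same normalized key."""
    groups = defaultdict(list)
    for term in terms:
        groups[normalize_term(term)].append(term)
    return dict(groups)


variants = ["NF-kappa B", "NF kappa B", "NF kappaB", "NF-kB",
            "TNF-alpha", "TNF alpha", "TNF-Alpha"]
for key, forms in group_variants(variants).items():
    print(key, forms)
```

Note that `NF-kB` ends up in its own group: abbreviations, expanded forms and synonyms are not spelling variants, so a dictionary or a learned similarity measure would be needed to link them.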

  13. Examples of Technical Seeds
      • Term Variants
          • Terms (names of proteins, genes, diseases, symptoms, etc.) denote basic conceptual units in the knowledge domain.
      • Syntactic Variants
          • Relationships and complex conceptual units in the knowledge domain are mapped to sentences in the language domain.
      • Term Acquisition from Text
          • New terms (basic conceptual units) are constantly introduced. Resource building for specialized domains is crucial.

  14. Syntactic Variants
      • [A] protein activates [B] (Pathway extraction)
          • Full-strength Staufen protein lacking this insertion is able to associate with oskar mRNA and activate its translation, but fails to …..
          • Transcription initiation by the sigma(54)-RNA polymerase holoenzyme requires an enhancer-binding protein that is thought to contact sigma(54) to activate transcription.
          • Since ……., we postulate that only phosphorylated PHO2 protein could activate the transcription of PHO5 gene.
      • Non-trivial Mapping: Spelling Variants, Synonyms, Acronyms, Same relations with different Structures
      • Language Domain, Knowledge Domain (motivated independently of language)
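
To illustrate why the mapping from sentences to the relation "[A] protein activates [B]" is non-trivial, here is a minimal sketch of a naive surface pattern. The regular expression and the function name `extract_activation` are assumptions for illustration, not part of the presented system. The pattern finds the relation only when the words appear in nearly the canonical order.

```python
import re

# Naive surface pattern: "<A> protein [modal] activate(s) <B>".
PATTERN = re.compile(
    r"(?P<agent>\S+) protein (?:could |can |may )?activates? (?P<target>[^.,;]+)"
)


def extract_activation(sentence: str):
    """Return (agent, target) if the surface pattern matches, else None."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return match.group("agent"), match.group("target").strip()


sentences = [
    "A protein activates B.",
    "we postulate that only phosphorylated PHO2 protein could activate "
    "the transcription of PHO5 gene.",
    "Full-strength Staufen protein lacking this insertion is able to "
    "associate with oskar mRNA and activate its translation",
    "Transcription initiation requires an enhancer-binding protein that "
    "is thought to contact sigma(54) to activate transcription.",
]
for sentence in sentences:
    print(extract_activation(sentence))
```

The last two sentences express the same kind of relation but are missed, because the agent and the verb are separated by other material. This is the gap that a deeper syntactic analysis is meant to close.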

  15. Predicate-argument structure
      Parser based on Probabilistic HPSG (Enju)
      Sentence with POS tags: The/DT protein/NN is/VBZ activated/VBN by/IN it/PRP
      Parse tree node labels: s, vp, np, pp, dt
      Predicate-argument labels: arg1, arg2, mod
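
A predicate-argument structure records who does what to whom regardless of surface word order. The sketch below is a toy illustration only: it does not call Enju or any real parser. It handles just the simple passive shape "NP be VBN by NP" on the tagged example from the slide. The `PredArg` class, the function name, and the convention that `arg1` is the agent and `arg2` the affected entity are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredArg:
    predicate: str
    arg1: str  # assumed: logical subject (agent)
    arg2: str  # assumed: logical object (affected entity)


def passive_to_pred_arg(tagged):
    """Map a tagged 'NP be VBN by NP' clause to a PredArg.

    tagged is a list of (word, POS tag) pairs. Raises ValueError
    for anything other than this one simple passive shape.
    """
    words = [word for word, _ in tagged]
    tags = [tag for _, tag in tagged]
    verb = tags.index("VBN")  # past participle; ValueError if absent
    aux = verb - 1
    if aux < 1 or tags[aux] not in {"VBZ", "VBP", "VBD"}:
        raise ValueError("no auxiliary before the participle")
    if verb + 2 >= len(words) or words[verb + 1].lower() != "by":
        raise ValueError("no by-phrase after the participle")
    surface_subject = " ".join(words[:aux])
    agent = " ".join(words[verb + 2:])
    # The surface subject of a passive is the logical object.
    return PredArg(predicate=words[verb], arg1=agent, arg2=surface_subject)


tagged = [("The", "DT"), ("protein", "NN"), ("is", "VBZ"),
          ("activated", "VBN"), ("by", "IN"), ("it", "PRP")]
print(passive_to_pred_arg(tagged))
```

With this normalization, an active clause such as "it activates the protein" and the passive clause above would map to the same agent and affected entity, which is what makes relation extraction less dependent on sentence structure.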

  16. Text Archive with Feature Objects
      Managing texts, data representation and their semantics
      Diagram labels: Semantics; Data representation; Text ID; Data Base Module; Copy and Unification; Start Position of the region; DB of Feature Objects; End Position of the region; Annotator; Content; Specialization by unification; Text DB
      • Fine grained units of information
      • Context dependency
      • Persistent nature of knowledge and information
      Example text: Ubiquitin E is bound with Text
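
The labels on this slide suggest that each feature object points at a region of a stored text and carries content that can be specialized by unification. The sketch below is one possible reading under that assumption. The field names, the `FeatureObject` class and the `unify` helper are hypothetical and are not taken from the actual system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureObject:
    text_id: str    # which stored text the region belongs to
    start: int      # start position of the region (inclusive)
    end: int        # end position of the region (exclusive)
    annotator: str  # who or what produced the annotation
    content: dict   # feature structure describing the region


def unify(a: dict, b: dict) -> Optional[dict]:
    """Unify two nested feature dictionaries; return None on a clash."""
    result = dict(a)
    for key, value in b.items():
        if key not in result:
            result[key] = value
        elif isinstance(result[key], dict) and isinstance(value, dict):
            merged = unify(result[key], value)
            if merged is None:
                return None
            result[key] = merged
        elif result[key] != value:
            return None
    return result


# Hypothetical text store and one annotation over it.
texts = {"doc1": "Ubiquitin E is bound"}
term = FeatureObject("doc1", 0, 11, "term-tagger", {"type": "term"})
print(texts[term.text_id][term.start:term.end])

# Specialization: a later annotator adds compatible information.
print(unify(term.content, {"type": "term", "class": "protein"}))
# Clash: incompatible values cannot be unified.
print(unify({"class": "protein"}, {"class": "gene"}))
```

Keeping annotations as stand-off objects leaves the stored text untouched, so several annotators can add or refine information about the same region independently.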

  17. Demo (The website demo is not available now.)
