BioNLP-related talks and demos at ACL and CoNLL ’05

Presented by Beatrice Alex, BioNLP meeting, 11th of July 2005

Simple Algorithms for Complex Relation Extraction with Applications to Biomedical Information Extraction (R. McDonald, F. Pereira, S. Kulick, S. Winters, Y. Jin, P. White) • Complex relations: • John Smith is the CEO at Inc. Corp. (John Smith, CEO, Inc. Corp.) • John Smith goes to his office at Inc. Corp. (John Smith, ⊥, Inc. Corp.)

Simple Algorithms for Complex Relation Extraction with Applications to Biomedical Information Extraction • Complex Relation Extraction: • Recognition of pairs of entity mentions (binary relations are edges in a graph and named entities are nodes) • Create set of positive (valid) and negative (invalid) relations using a standard maxent classifier (Berger et al. ’96, McCallum ’02) • Reconstruction of complex relations by making tuples from maximal cliques in the graph

Simple Algorithms for Complex Relation Extraction with Applications to Biomedical Information Extraction • Complex relation reconstruction methods: • Maximal cliques (MC) • Consider all cliques in graph consistent with definition of the relation and add  • For overlapping cliques, only return maximal cliques (those that are not a subset of other cliques). • Use branch and bound algorithm to find all maximal cliques (Bron and Kerbosch ’73) = very efficient
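The maximal-clique step can be illustrated with a minimal sketch of the basic Bron-Kerbosch recursion (no pivoting). The graph representation, the function names and the toy edges are assumptions made for illustration; the edges are derived from the genomic variation example used later in the talk and are not the output of the authors' classifier.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph):
    """Yield every maximal clique (basic Bron-Kerbosch, no pivoting)."""
    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: already explored
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(graph), set())


# Toy input: pretend the binary classifier accepted every pair inside
# each of these two tuples (hypothetical, for illustration only).
tuples = [
    ("point mutation", "codon 12", "G (initial)", "T"),
    ("point mutation", "codon 16", "A", "G (altered)"),
]
edges = [pair for t in tuples for pair in combinations(t, 2)]
graph = build_graph(edges)
cliques = set(maximal_cliques(graph))
```

On this toy graph the two 4-ary tuples come back as the only maximal cliques, even though they share the "point mutation" node.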

Simple Algorithms for Complex Relation Extraction with Applications to Biomedical Information Extraction • Probabilistic Cliques (PC) • Assign weight to each binary relation (taken from classifier) • Weight of a clique w(C) is the mean weight of the edges in the clique • A clique is valid if w(C) ≥ 0.5
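A minimal sketch of the clique-weight test follows. It assumes an arithmetic mean over edge weights (the slide only says "mean", so a different mean could be swapped in) and a validity threshold of 0.5; the function names and the edge weights are hypothetical.

```python
from itertools import combinations
from statistics import mean


def clique_weight(clique, edge_weight):
    """Mean weight of all edges inside the clique (needs >= 2 nodes).

    edge_weight maps frozenset({a, b}) to the classifier score of that pair.
    The arithmetic mean is an assumption of this sketch.
    """
    pairs = combinations(sorted(clique), 2)
    return mean(edge_weight[frozenset(pair)] for pair in pairs)


def is_valid(clique, edge_weight, threshold=0.5):
    """A clique counts as a valid relation if its weight reaches the threshold."""
    return clique_weight(clique, edge_weight) >= threshold


# Hypothetical classifier scores for the three pairs of a ternary relation.
weights = {
    frozenset({"A", "B"}): 0.9,
    frozenset({"A", "C"}): 0.6,
    frozenset({"B", "C"}): 0.3,
}
accepted = is_valid({"A", "B", "C"}, weights)
```

One weak edge (0.3 here) does not necessarily invalidate the clique, because it is averaged with the stronger edges.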

Simple Algorithms for Complex Relation Extraction with Applications to Biomedical Information Extraction • Extraction of genomic variation events from biomedical text (variation type, location, initial state, altered state) “At codons 12 and 16, the occurrence of point mutations from G/A to T/G were observed.” (point mutation, codon 12, G, T) (point mutation, codon 16, A, G)

Simple Algorithms for Complex Relation Extraction with Applications to Biomedical Information Extraction • 447 Medline abstracts • 4691 sentences, 4773 entities, 1218 relations (38% not binary) • 760 2-ary relations • 283 3-ary relations • 175 4-ary relations • Gold standard named entities (56% of entity pairs not related)

Simple Algorithms for Complex Relation Extraction with Applications to Biomedical Information Extraction • Results: MC and PC significantly faster and more accurate than NE (naïve enumeration)

Search Engine Statistics Beyond the n-gram: Applications to Noun Compound Bracketing (P. Nakov and M. Hearst) • Unsupervised method for noun compound bracketing [[liver cell] antibody] vs. [liver [cell line]] • Use of bigram estimates with χ² measure • Use of surface features for querying web search engines • Experiments with paraphrases • Evaluation on encyclopaedia and bioscience text
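As a rough illustration of scoring a bigram with the χ² measure, the sketch below applies the standard 2×2 contingency formula to page-hit style counts. The decision rule (compare the association of the first two words with that of the last two) is an assumption made for the example, and all counts and names are hypothetical.

```python
def chi_square(n12, n1, n2, total):
    """Chi-square association of a bigram from a 2x2 contingency table.

    n12: count of the bigram "w1 w2"; n1, n2: counts of w1 and w2;
    total: overall number of bigrams (or pages) in the collection.
    """
    o11 = n12                    # w1 followed by w2
    o12 = n1 - n12               # w1 followed by something else
    o21 = n2 - n12               # something else followed by w2
    o22 = total - n1 - n2 + n12  # neither
    numerator = total * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return numerator / denominator if denominator else 0.0


def bracket(score_w1_w2, score_w2_w3):
    """Assumed rule: the stronger pair is grouped first; ties fall to 'right'."""
    return "left" if score_w1_w2 > score_w2_w3 else "right"


# Hypothetical counts for "liver cell antibody".
liver_cell = chi_square(n12=50, n1=100, n2=100, total=1000)
cell_antibody = chi_square(n12=10, n1=100, n2=100, total=1000)
decision = bracket(liver_cell, cell_antibody)
```

With these made-up counts the first pair is strongly associated and the second is exactly at chance, so the sketch groups the first two words together.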

Search Engine Statistics Beyond the n-gram: Applications to Noun Compound Bracketing • Web-driven surface features • Dash: cell-cycle analysis, donor T-cell • Possessive marker: brain’s stem cell, brain stem’s cells • Internal capitalisation: Plasmodium vivax Malaria, brain Stem cells • Embedded slashes: leukaemia/lymphoma cell • Brackets: growth factor (beta), (brain) stem cells • Collected surface features using regular expressions in summaries of returned documents of exact NC queries
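Collecting surface features with regular expressions might look like the sketch below, which covers only the dash and possessive-marker cues from this slide. The patterns, the function name and the snippets are assumptions for illustration and are not the expressions used by the authors.

```python
import re


def surface_votes(w1, w2, w3, snippets):
    """Count dash and possessive cues for left vs. right bracketing."""
    a, b, c = (re.escape(w) for w in (w1, w2, w3))
    patterns = {
        # "w1-w2 w3" or "w1 w2's w3" suggest [[w1 w2] w3]
        "left": [rf"\b{a}-{b}\s+{c}\b", rf"\b{a}\s+{b}['’]s\s+{c}\b"],
        # "w1 w2-w3" or "w1's w2 w3" suggest [w1 [w2 w3]]
        "right": [rf"\b{a}\s+{b}-{c}\b", rf"\b{a}['’]s\s+{b}\s+{c}\b"],
    }
    votes = {"left": 0, "right": 0}
    for text in snippets:
        for side, regexes in patterns.items():
            for rx in regexes:
                votes[side] += len(re.findall(rx, text, flags=re.IGNORECASE))
    return votes


# Hypothetical result summaries returned for the exact query.
snippets = ["Cell-cycle analysis was performed.", "Results of cell cycle analysis"]
votes = surface_votes("cell", "cycle", "analysis", snippets)
```

The plain, unmarked occurrence contributes no vote; only snippets carrying an explicit cue count towards one of the two bracketings.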

Search Engine Statistics Beyond the n-gram: Applications to Noun Compound Bracketing • Other features: • Abbreviations: “tumor necrosis factor (NF)”, tumor necrosis (TN) factor • Concatenation: “health care reform” -> healthcare, carereform • Reordering • Internal inflection variability

Search Engine Statistics Beyond the n-gram: Applications to Noun Compound Bracketing • Paraphrases: “brain stem cells” “stem cells in the brain” “cells from the brain stem” • Used queries with a set of selected paraphrase patterns to see how often they occurred for bracketing prediction

Search Engine Statistics Beyond the n-gram: Applications to Noun Compound Bracketing • Evaluation • Lauer’s data set (Lauer ‘95) • 244 three noun NCs • Biomedical data set • Extracted 500 three noun NCs from Medline abstracts • 430 unambiguous (361 with left, 69 with right bracketing) • Inter-annotator agreement: 88% and 82% (kappa: .606 and .442)

Search Engine Statistics Beyond the n-gram: Applications to Noun Compound Bracketing • Results: • Surface features perform best Enc.: P=85.51% with 87.70% coverage Bio: P=88.84% with 100% coverage • Best overall scores by combining most reliable models (majority vote)
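The majority-vote combination and the precision/coverage figures can be sketched as follows. The sketch assumes that each model returns "left", "right" or no prediction, and that ties yield no prediction; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label, or None on a tie or with no votes."""
    counts = Counter(p for p in predictions if p is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: abstain rather than guess
    return ranked[0][0]


def precision_and_coverage(predicted, gold):
    """Precision over the items with a prediction, and the share covered."""
    covered = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    coverage = len(covered) / len(gold)
    precision = sum(p == g for p, g in covered) / len(covered) if covered else 0.0
    return precision, coverage


# Three hypothetical models voting on four noun compounds.
model_outputs = [
    ["left", "left", "right"],
    ["left", "right", None],
    ["right", "right", "left"],
    [None, "left", None],
]
combined = [majority_vote(votes) for votes in model_outputs]
gold = ["left", "left", "left", "left"]
precision, coverage = precision_and_coverage(combined, gold)
```

Abstaining on ties lowers coverage but keeps precision from being diluted by arbitrary guesses, which matches the way the results report precision together with coverage.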


Dynamically Generating a Protein Entity Dictionary Using Online Resources (H. Liu, Z. Hu and C. Wu) • Available at: http://biocreative.ifsm.umbc.edu/biothesaurus • 4,046,733 terms and 1,640,082 entities

Dynamically Generating a Protein Entity Dictionary Using Online Resources • Use of large biological databases incl. • 3 NCBI databases (GenPept, RefSeq, Entrez GENE) • PSD database from the Protein Information Resource (PIR) • UniProt • Model organism databases • Nomenclature databases

Dynamically Generating a Protein Entity Dictionary Using Online Resources • Automatically gathered fields containing annotation information for each iProClass record • Extracted terms associated with one or more UniProt unique identifiers => raw dictionary • Automated curation using UMLS to flag UMLS semantic types and remove high-frequency nonsensical terms
