The Use of Semantic Graphs for Modeling Biomedical Text
Laura Plaza
NIL - Natural Interaction based on Language
Universidad Complutense de Madrid
Topics: Semantic graph based representation; Text summarization; Automatic Indexing; Information Retrieval

Why semantic?
• Synonymy: "Cerebrovascular diseases during pregnancy may result from hemorrhage" = "Brain vascular disorders during gestation may result from hemorrhage"
• Polysemy: "The common cold is more common in cold weather than in summer"

Why graphs?
• "Pneumococcal infection is a lung infection caused by Streptococcus pneumoniae."
• "Mycoplasma pneumonia is another type of atypical pneumonia."
• "The patient reported feeling short of breath and was diagnosed with pneumonia."
[Diagram: a graph around the concept "Pneumonia" with nodes "Pneumococcal pneumonia" and "influenza" and edge labels "Symptom" and "Co-occurs with"]

Our Proposal
• Using concepts and relations from external knowledge sources to represent the text as a graph
• Exploiting the topology of the network to identify groups of semantically related concepts that represent different topics

Representation Process
Document pre-processing → Concept identification → Document representation → Concept clustering and topic recognition

Concept Identification
Example: "The goal of the trial was to assess cardiovascular mortality for stroke"

Concept Identification - Ambiguity
Example: "Tissues are often cold"
Word sense disambiguation (WSD) methods:
• Personalized PageRank (PPR)
• Journal Descriptor Indexing (JDI)
• Machine Readable Dictionary (MRD)
• Automatic Extracted Corpus (AEC)

Document Representation
Example sentence: "The goal of the trial was to assess cardiovascular mortality and morbidity for stroke, coronary heart disease and congestive heart failure, as an evidence-based guide for clinicians who treat hypertension."
[Figure: sentence graph linking the concepts found in the sentence to more general concepts. Node labels range from general (Activity, Disease, Personnel, Anatomic Structure) to specific (Clinical Trials, Clinicians, Cardiovascular System, Hypertensive Disease, Congestive Heart Failure, Coronary Heart Disease, Cerebrovascular Accident).]

Document Representation
• All the sentence graphs are merged into a single Document Graph
• The graph is extended with more semantic relations
• Each edge is assigned a weight in [0, 1]
• Different relations may be assigned different weights
• The more specific the concepts an edge links, the more weight is assigned to the edge

Example text: "The goal of the trial was to assess cardiovascular mortality and morbidity for stroke, coronary heart disease and congestive heart failure, as an evidence-based guide for clinicians who treat hypertension. While event rates for fatal cardiovascular disease were similar, there was a disturbing tendency for stroke to occur more often in the doxazosin group than in the group taking chlorthalidone."
[Figure: document graph for the two sentences. Disease concepts (from Disease or Disorder down to Congestive Heart Failure, Coronary Heart Disease, Cerebrovascular Accident, and Hypertensive Disease) appear together with drug concepts (Pharmaceutical Adjuvant, Cardiovascular Drug, Diuretic, Alpha-Adrenergic Blocking Agent, Thiazide Diuretics, Doxazosin, Chlorthalidone) and study concepts (Research Activity, Study, Clinical Study, Clinical Trials, Clinicians). Edges carry weights such as 1/2, 2/3, 3/4, and 1. Legend: "is a" relations, "associated with" relations, "other related" relations.]

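The slides say only that edge weights lie in [0, 1] and grow with concept specificity; they do not give the formula. The sketch below assumes one rule that is consistent with the fractions in the figure: the weight of an "is a" edge is the Jaccard overlap between the ancestor sets of its two endpoints, which for a parent at depth d and its child equals d/(d+1). The function names and the example path are hypothetical, and the path is assembled from figure labels only for illustration.

```python
from fractions import Fraction


def ancestors_inclusive(path_from_root, concept):
    """Return the concept plus all of its ancestors along a root-to-leaf path."""
    idx = path_from_root.index(concept)
    return set(path_from_root[: idx + 1])


def isa_edge_weight(path_from_root, parent, child):
    """Assumed rule: Jaccard overlap of the two endpoints' ancestor sets."""
    a = ancestors_inclusive(path_from_root, parent)
    b = ancestors_inclusive(path_from_root, child)
    return Fraction(len(a & b), len(a | b))


# Illustrative path only; the exact hierarchy is an assumption.
path = [
    "Pharmaceutical Adjuvant",
    "Cardiovascular Drug",
    "Diuretic",
    "Thiazide Diuretics",
    "Chlorthalidone",
]

# Weights grow as the edge moves toward more specific concepts.
weights = [isa_edge_weight(path, path[i], path[i + 1]) for i in range(len(path) - 1)]
print([str(w) for w in weights])  # ['1/2', '2/3', '3/4', '4/5']
```

Under this assumption, edges near the root of the hierarchy contribute little, while edges between specific concepts dominate.
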
Concept Clustering & Topic Recognition
[Figure: document graph with the hubs labeled]

Concept Clustering & Topic Recognition
• Concepts are ranked by salience
• The n vertices with the highest salience are called hub vertices

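The slides do not define salience. A minimal sketch, assuming salience is the sum of the weights of the edges incident to a vertex (weighted degree), could look as follows. The edge weights in the toy graph are hypothetical.

```python
from collections import defaultdict


def salience(edges):
    """edges: iterable of (u, v, weight). Returns vertex -> summed incident weight."""
    score = defaultdict(float)
    for u, v, w in edges:
        score[u] += w
        score[v] += w
    return dict(score)


def hub_vertices(edges, n):
    """Return the n vertices with the highest salience (ties broken by name)."""
    score = salience(edges)
    ranked = sorted(score, key=lambda v: (-score[v], v))
    return ranked[:n]


# Toy graph with hypothetical weights.
toy_edges = [
    ("Heart Disorder", "Congestive Heart Failure", 0.75),
    ("Heart Disorder", "Coronary Heart Disease", 0.75),
    ("Heart Disorder", "Cardiovascular System Finding", 0.5),
    ("Study", "Clinical Trials", 0.5),
]

print(hub_vertices(toy_edges, 1))  # ['Heart Disorder']
```
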
Concept Clustering & Topic Recognition
• The hub vertices are grouped into Hub Vertex Sets (HVSs)
• The remaining vertices are assigned to the cluster to which they are most connected
• The number and properties of the clusters depend strongly on the parameter values

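The two clustering steps above can be sketched as follows. The slides do not say how hubs are grouped or how "most connected" is measured, so this sketch assumes that hubs joined by an edge share an HVS and that a remaining vertex goes to the HVS with which it has the largest total edge weight. All names and the toy graph are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """edges: iterable of (u, v, weight) for an undirected graph."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj


def group_hubs(hubs, adj):
    """Assumption: hubs that are directly linked end up in the same HVS."""
    hubs = set(hubs)
    seen, groups = set(), []
    for h in sorted(hubs):
        if h in seen:
            continue
        stack, group = [h], set()
        while stack:
            x = stack.pop()
            if x in group:
                continue
            group.add(x)
            stack.extend(n for n in adj[x] if n in hubs and n not in group)
        seen |= group
        groups.append(group)
    return groups


def assign_rest(groups, adj):
    """Attach each non-hub vertex to the HVS it is most strongly linked to."""
    if not groups:
        return []
    clusters = [set(g) for g in groups]
    hubs = set().union(*groups)
    for v in sorted(adj):
        if v in hubs:
            continue
        link = [sum(w for n, w in adj[v].items() if n in g) for g in groups]
        best = max(range(len(groups)), key=lambda i: link[i])
        if link[best] > 0:  # vertices with no hub neighbour stay unassigned
            clusters[best].add(v)
    return clusters


toy = [("A", "B", 1.0), ("A", "x", 0.5), ("B", "y", 0.5), ("C", "z", 0.75), ("z", "x", 0.25)]
adjacency = build_adjacency(toy)
hvs = group_hubs(["A", "B", "C"], adjacency)
clusters = assign_rest(hvs, adjacency)
```

Changing the number of hubs or the grouping rule changes the clusters, which is the parameter sensitivity the last bullet refers to.
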
Concept Clustering & Topic Recognition
[Figure: example clusters. One contains Congestive heart failure, Adverse reactions, Amlodipine, Chlorthalidone, Drug pseudoallergen by function, Blood pressure finding, Cerebrovascular accident, Hepatic, ...; another contains Health personnel, Elderly, Organism, Population group, Persons, Clinicians, Patients.]

Text Summarization
Creating a compact version of one or several documents
Motivation
• Summaries as an indication of what a document is about
• Improving indexing, categorization, and IR
Types
• Extracts vs. abstracts
• Single vs. multi-document
• Generic vs. application-oriented

Text Summarization
[Figure: each sentence (Sentence 1 ... Sentence n) is scored against each cluster (Cluster 1 ... Cluster m); example similarity values shown are 35.0, 4.0, 86.0, and 12.0]

Text Summarization
Sentence selection
• H.1: Selecting the top n ranked sentences from the biggest cluster
• H.2: Selecting n_i sentences from each cluster
• H.3: Weighting the sentence-to-cluster similarity by the clusters' sizes
+ other traditional criteria: frequency, position, similarity with the title, etc.

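The three heuristics can be sketched over a sentence-by-cluster similarity matrix. The slides do not give formulas, so two points here are assumptions: in H.2 the quota n_i is taken proportional to cluster size, and in H.3 each similarity is divided by the cluster size and summed into one score per sentence. The matrix and sizes are toy values, not results from the talk.

```python
def h1_largest_cluster(sim, sizes, n):
    """H.1: top-n sentences by similarity to the largest cluster."""
    big = max(range(len(sizes)), key=lambda c: sizes[c])
    order = sorted(range(len(sim)), key=lambda s: -sim[s][big])
    return order[:n]


def h2_per_cluster(sim, sizes, n):
    """H.2: n_i sentences per cluster; n_i proportional to size (assumption).

    Rounding the quotas may return slightly fewer than n sentences.
    """
    total = sum(sizes)
    chosen = []
    for c in sorted(range(len(sizes)), key=lambda c: -sizes[c]):
        quota = round(n * sizes[c] / total)
        for s in sorted(range(len(sim)), key=lambda s: -sim[s][c]):
            if quota == 0 or len(chosen) == n:
                break
            if s not in chosen:
                chosen.append(s)
                quota -= 1
    return chosen


def h3_size_weighted(sim, sizes, n):
    """H.3: one score per sentence, similarities divided by cluster size (assumption)."""
    score = [sum(row[c] / sizes[c] for c in range(len(sizes))) for row in sim]
    return sorted(range(len(sim)), key=lambda s: -score[s])[:n]


# Rows are sentences, columns are clusters (toy values).
sim = [[8.0, 1.0], [6.0, 0.0], [1.0, 5.0], [2.0, 4.0], [0.5, 3.0]]
sizes = [30, 10]  # number of concepts per cluster (toy values)

print(h1_largest_cluster(sim, sizes, 2))  # [0, 1]
print(h2_per_cluster(sim, sizes, 4))      # [0, 1, 3, 2]
print(h3_size_weighted(sim, sizes, 2))    # [2, 3]
```

On this toy input H.1 covers only the dominant topic, whereas H.2 and H.3 also let sentences about the smaller cluster into the summary.
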
Text Summarization
• Evaluation: How is the important content preserved in the summary?
• ROUGE automatic evaluation metrics
• Comparison with the abstracts of the articles

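As a reminder of what ROUGE measures, here is a simplified sketch of ROUGE-N recall: the fraction of the reference's n-grams that also appear in the candidate summary, with clipped counts. It uses plain whitespace tokenization and is not the official ROUGE toolkit; the two example sentences are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "stroke occurred more often in the doxazosin group"
candidate = "stroke was more frequent in the doxazosin group"

print(rouge_n_recall(candidate, reference, n=1))  # 0.75
```

Here the reference would be the article's abstract and the candidate the automatic summary.
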
Text Summarization
• Evaluation: How does ambiguity affect summarization?

Summarization of Biological Entity-related Information
Multi-document, application-oriented summarization
Given a list of genes (or proteins):
• Retrieving documents related to the genes
• Building a semantic graph-based representation of the corpus
• Identifying groups of genes/proteins
• Generating a summary for each group that describes the functionality of the entities

Automatic Indexing of Biomedical Literature using Summaries
[Diagram labels: Title + Abstract; Full text; MTI; Ordered list of MeSH main headings; Refined list of MeSH headings]

Automatic Indexing of Biomedical Literature using Summaries
What about using the full texts?
• Recall increases but precision decreases
What about using automatic summaries of different lengths?
• As the length increases, recall improves but precision worsens
• There is a summary length which maximizes F-measure

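The trade-off can be made concrete with the standard definitions of precision, recall, and the balanced F-measure. The heading lists below are hypothetical toy data chosen to show the pattern, not results from the study.

```python
def precision_recall_f(predicted, gold):
    """Set-based precision, recall, and balanced F-measure (F1)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    p = true_positives / len(predicted) if predicted else 0.0
    r = true_positives / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical gold-standard headings for one article.
gold = {"Hypertension", "Stroke", "Doxazosin", "Chlorthalidone"}

# Hypothetical indexer output for inputs of growing length.
by_length = {
    "short": ["Hypertension", "Stroke"],
    "medium": ["Hypertension", "Stroke", "Doxazosin", "Humans"],
    "long": ["Hypertension", "Stroke", "Doxazosin", "Chlorthalidone",
             "Humans", "Adult", "Aged", "Risk"],
}

scores = {name: precision_recall_f(pred, gold) for name, pred in by_length.items()}
best = max(scores, key=lambda name: scores[name][2])
print(best)  # medium
```

In this toy case recall rises from 0.5 to 1.0 as the input grows while precision falls from 1.0 to 0.5, so F-measure peaks at the intermediate length.
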
Retrieval of Similar Patient Cases
Motivation: Facilitating access to previous cases
Problem: Given a reference patient record, retrieve other records from the clinical database that are similar to it

Retrieval of Similar Patient Cases
When can we consider that two patient records are similar?
• Same symptom or sign (e.g., fever)
• Same diagnosis (e.g., bacterial pneumonia)
• Same test or procedure (e.g., endoscopy biopsy)
• Same medication (e.g., clopidogrel)
• But absent criteria are not relevant!

Retrieval of Similar Patient Cases
• The records are represented using UMLS graphs
• Concepts are filtered by semantic types
• Negated concepts are ignored

Retrieval of Similar Patient Cases
• We compute the similarity between the reference record and every record in the database
[Figure: two concept graphs, Graph A and Graph B, that share general concepts (Clinical finding, Finding by site, Disorder by body site, ...) and differ in more specific ones (e.g. Bacterial pneumonia, Coughing, Virus Diseases, Pneumococcal pneumonia, Mycoplasma pneumonia, Pneumonia due to anaerobic bacteria). Nodes are annotated with fractions such as 3/5, 4/5, 5/5 and 1/11, 2/11, ..., 11/11.]

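The slides do not state the similarity formula. As a rough sketch under that caveat, each record can be treated as a map from concept to a weight in (0, 1] and compared with a weighted Jaccard coefficient, so that shared specific concepts count more than shared general ones. The measure, the weights, and the record identifiers are all assumptions made for illustration.

```python
def weighted_jaccard(a, b):
    """Sum of per-concept minimum weights over sum of per-concept maximum weights."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


def rank_similar(reference, database):
    """Return record ids ordered from most to least similar to the reference."""
    scored = [(weighted_jaccard(reference, rec), rid) for rid, rec in database.items()]
    return [rid for score, rid in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Hypothetical records: concept -> weight (general concepts weigh less).
reference = {"Clinical finding": 0.25, "Bacterial pneumonia": 1.0, "Coughing": 1.0}
database = {
    "record-1": {"Clinical finding": 0.25, "Bacterial pneumonia": 1.0},
    "record-2": {"Clinical finding": 0.25, "Virus Diseases": 1.0},
}

print(rank_similar(reference, database))  # ['record-1', 'record-2']
```

Negated concepts would be dropped from each record before this comparison, in line with the rule that absent criteria are not relevant.
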
Automatic Indexing of EHR
• Discovering relevant SNOMED-CT concepts in health records
4 steps:
• Spell checking
• Acronym expansion and WSD
• Negation detection
• Concept identification

Automatic Indexing of EHR
• Spell checking
• Hunspell + Levenshtein + keyboard + phonetic distance

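Of the four components listed, the Levenshtein distance is the easiest to show in isolation. The sketch below ranks correction candidates from a small word list by edit distance only; it does not use Hunspell and ignores the keyboard and phonetic distances. The lexicon and the `max_distance` threshold are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def suggest(word, lexicon, max_distance=2):
    """Return lexicon entries within max_distance edits, closest first."""
    scored = sorted((levenshtein(word, w), w) for w in lexicon)
    return [w for d, w in scored if d <= max_distance]


lexicon = ["neumonia", "anemia", "fiebre"]  # hypothetical Spanish word list
print(suggest("neumnia", lexicon)[0])  # neumonia
```
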
Automatic Indexing of EHR
• Acronym expansion and WSD
• A list of abbreviations + Machine Learning + expert rules

Automatic Indexing of EHR
• Negation detection
• NegEx algorithm, Spanish adaptation
• Negation cue + Negation scope

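The cue-plus-scope idea can be illustrated with a heavily simplified, NegEx-style sketch: a cue opens a scope that covers the following tokens until a terminator or a maximum window is reached. This is not the NegEx algorithm itself nor the Spanish adaptation used in the talk; the cue list, terminators, and window size are hypothetical.

```python
import re

NEGATION_CUES = {"no", "sin", "niega"}            # hypothetical, minimal cue list
TERMINATORS = {"pero", "aunque", ".", ",", ";"}   # hypothetical scope terminators


def negated_tokens(sentence, max_scope=5):
    """Return (token, is_negated) pairs for a sentence."""
    tokens = re.findall(r"\w+|[.,;]", sentence.lower())
    negated = set()
    remaining = 0  # tokens left in the currently open negation scope
    for i, tok in enumerate(tokens):
        if tok in TERMINATORS:
            remaining = 0
        elif tok in NEGATION_CUES:
            remaining = max_scope
        elif remaining > 0:
            negated.add(i)
            remaining -= 1
    return [(tok, i in negated) for i, tok in enumerate(tokens)]


# "Patient without fever, but with cough."
flags = dict(negated_tokens("Paciente sin fiebre, pero con tos."))
print(flags["fiebre"], flags["tos"])  # True False
```

Concepts found inside a negation scope would then be skipped during indexing.
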
Automatic Indexing of EHR
• Concept identification
Query: "El recién nacido fue ingresado" ("The newborn was admitted")
• Candidate mappings, matched against SNOMED-CT concept descriptions: Recién nacido (newborn); Recién nacido prematuro (premature newborn); Ingreso del paciente (patient admission)
• Scoring function
• Final mappings: Recién nacido; Ingreso del paciente

Automatic Indexing of EHR
Future work
• Representing the EHR as a graph using different relations from SNOMED-CT
• Computing the salience of the concepts to obtain the most representative ones
• Using such a representation in different NLP tasks (e.g., categorization, IR, etc.)

Further Readings
Summarization
• Plaza, L., Díaz, A., Gervás, P. (2011). A semantic graph-based approach to biomedical summarization. Artificial Intelligence in Medicine, 53.
• Plaza, L. (2012). Evaluating the importance of sentence position for automatic summarization of biomedical literature. Submitted to Bioinformatics.
Word Sense Disambiguation
• Plaza, L., Stevenson, M., Díaz, A. (2012). Resolving Ambiguity in Biomedical Text to Improve Summarization. Information Processing & Management, 48(4).
• Plaza, L., Jimeno-Yepes, A., Díaz, A., Aronson, A. (2011). Studying correlation between different word sense disambiguation methods and summarization effectiveness in biomedical texts. BMC Bioinformatics, 12.
Automatic Indexing
• Jimeno-Yepes, A., Plaza, L., Mork, J., Díaz, A., Aronson, A. (2012). Using automatic summaries to improve automatic indexing. To appear in BMC Bioinformatics.
Retrieval of Similar Cases
• Plaza, L., Díaz, A. (2010). Retrieval of Similar Electronic Health Records using UMLS Concept Graphs. 15th International Conf. on Applications of Natural Language to Information Systems.