Annotation of 311 Admission Summaries of the ICU Corpus Yefeng Wang
Aim • Create evaluation data for measuring SNOMED CT concept matching performance. • Create training data for machine learning systems. • Rule-based systems have low recall • Difficult to tune parameters and build the rules • Machine learning systems are the state of the art • No such annotated data is available yet.
Existing Corpora • Most existing corpora are in the biomedical domain • GENIA (2000 abstracts from MEDLINE) • PennBioIE (2300 MEDLINE abstracts) • Only a few are from the clinical domain • Ogren et al. (clinical conditions only) • Chapman et al. (clinical conditions only) • CLEF (semantic annotation, formal reports)
Selection of Data • Clinical notes were drawn from 311 patients’ admission summaries • One note per patient • Admission notes were used for annotation • Semi-structured, with a variety of information • Chief Complaint • Background • History of Present Illness • Medication • Examination • Observations in Nursing Notes • Social • Other summaries (echo reports, surgical reports, etc.)
The Annotation Task • Concept Annotation • Annotate semantic category of medical concepts • Categories were based on SNOMED CT • Relation Annotation • Relationships between concepts. • Inter-term relation • Relationship between two separate concepts • Intra-term relation • Relationship between atomic concepts within a composite concept (Post-coordination).
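The two annotation layers above can be sketched as a small in-memory schema. This is a hypothetical representation for illustration only; the class and field names (`Concept`, `Relation`, `finding_site`) are assumptions, not the corpus's actual annotation format.

```python
# Hypothetical schema for the two annotation layers: concept spans with a
# SNOMED CT-based semantic category, and typed relations that are either
# inter-concept (two separate concepts) or intra-concept (atomic parts of
# a composite, post-coordinated concept).
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    start: int      # character offset into the note
    end: int        # end offset (exclusive)
    category: str   # SNOMED CT-based semantic category

@dataclass(frozen=True)
class Relation:
    source: Concept
    target: Concept
    rel_type: str   # e.g. "finding_site" (illustrative label)
    kind: str       # "inter" or "intra"

# "R groin abscess": the composite concept and its nested body site.
abscess = Concept(0, 15, "disorder")        # "R groin abscess"
site = Concept(0, 7, "body_structure")      # "R groin"
rel = Relation(abscess, site, "finding_site", "intra")
print(rel.kind)  # -> intra
```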
Development of Guidelines • Iterative approach • 10 reports were annotated jointly by two annotators • Discussion led to the development of the initial guidelines • 25 reports were used for iterative refinement of the guidelines • Annotated separately • 5 documents per iteration • New examples and rules were added to the guidelines when necessary
Annotation Agreement • Inter-annotator agreement was calculated during each development cycle • F1 is used for the calculation • Harmonic mean of recall and precision • Precision = # correct annotations / # annotations • Recall = # correct annotations / # existing concepts • The development process is repeated until the agreement reaches a threshold of 90% • The guidelines are then finalised; no more new rules are added • Differences are resolved by a third annotator to produce a gold-standard corpus
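The F1 agreement above can be worked through on toy data. A minimal sketch, assuming one annotator's spans serve as the "existing concepts" side and that a match requires identical span and category (the slides do not state the exact matching criterion):

```python
# F1 inter-annotator agreement: harmonic mean of precision and recall,
# with Precision = # correct / # annotations (annotator B)
# and  Recall    = # correct / # existing concepts (annotator A).
def agreement_f1(annots_a, annots_b):
    """F1 between two sets of (start, end, category) annotations."""
    correct = len(set(annots_a) & set(annots_b))
    precision = correct / len(annots_b)
    recall = correct / len(annots_a)
    return 2 * precision * recall / (precision + recall)

# Toy example: annotators agree on 2 of 3 spans (one category differs).
a = {(0, 3, "finding"), (5, 9, "procedure"), (12, 14, "substance")}
b = {(0, 3, "finding"), (5, 9, "finding"), (12, 14, "substance")}
print(round(agreement_f1(a, b), 2))  # P = R = 2/3, so F1 = 0.67
```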
Comparison to Other Corpora • Comparison to corpora in the newswire, biomedical, and scientific (astronomy) domains • Available corpora: MUC, GENIA, ASTRO
Concept Identification Results • 279 documents for training • 32 documents for testing • 4656 tokens, 1218 concepts • Rule-based system (TTSCT) • Conditional Random Fields (CRF++) used as the learner • Evaluated using the CoNLL 2000 evaluation script
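CRF++ and the CoNLL 2000 evaluation script both consume token-per-line data with IOB2 (BIO) chunk tags. A minimal sketch of that encoding, with illustrative tokens and category labels that are not taken from the corpus:

```python
# Encode concept spans as IOB2 tags: B- marks the first token of a
# concept, I- its continuation, and O a token outside any concept.
def to_bio(tokens, concepts):
    """concepts: list of (start_idx, end_idx_exclusive, category)."""
    tags = ["O"] * len(tokens)
    for start, end, cat in concepts:
        tags[start] = "B-" + cat
        for i in range(start + 1, end):
            tags[i] = "I-" + cat
    return list(zip(tokens, tags))

tokens = ["R", "groin", "abscess", "drained", "in", "ED"]
concepts = [(0, 3, "disorder"), (3, 4, "procedure")]
for tok, tag in to_bio(tokens, concepts):
    print(f"{tok}\t{tag}")  # e.g. "abscess  I-disorder"
```

The CoNLL 2000 script then scores whole chunks, so a concept counts as correct only if every one of its B-/I- tags is predicted correctly.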
Inter-Concept Relation Annotation • Annotate relationships between concepts • Inter-concept relations • Relationship between two outermost concepts • Example: "CXR in ED", "bilateral mid-lower zone opacification"
Intra-Concept Relations • Relations between inner concepts and the outermost concept • Term decomposition • Example: "R groin abscess"
Inter + Intra Concept Relationships • Example: "Hemicolectomy and formation of ileostomy for bowel obstruction"