Shock: Progress & Direction
MetaMap
• Tokenized words for Mohammed
• Enables him to test his new models for the pattern matcher.
• Mallet training data for Laura
• Enables her to work on creating a better version of MetaMap.
• MetaMap currently has many concept annotation issues because the dictionary it uses is so large. Concepts are frequently tagged incorrectly.
Mishaps & Solutions
• Mapping concepts to the phrases
• Maddening XML Schema
• Makes it difficult to understand how the words and their POS tags in each phrase map to the concepts.
• Created a model of the utterances, phrases, and concepts to solve this problem.
• Used a DOM parser at first. It took about 15 minutes per XML file, 18 hours in total.
• Replaced it with a SAX parser, which sped up progress.
• Lesson learned: Do not use a DOM parser for a large document.
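The DOM-to-SAX switch and the utterance/phrase/concept model can be illustrated with a minimal sketch. This is not the project's actual code: the element and attribute names (`Utterance`, `Phrase`, `Token`, `Concept`, `text`, `word`, `pos`, `id`) are a simplified, hypothetical schema standing in for the real MetaMap XML output, and Python's standard `xml.sax` module is used only for illustration. The point it shows is that a SAX handler fills the model one event at a time, so the whole document tree never has to be held in memory.

```python
import io
import xml.sax
from dataclasses import dataclass, field


@dataclass
class Phrase:
    text: str
    tokens: list = field(default_factory=list)    # (word, POS tag) pairs
    concepts: list = field(default_factory=list)  # concept identifiers


@dataclass
class Utterance:
    phrases: list = field(default_factory=list)


class UtteranceHandler(xml.sax.ContentHandler):
    """Builds the utterance/phrase/concept model from SAX start-element events.

    All element and attribute names here are hypothetical placeholders,
    not the real MetaMap schema.
    """

    def __init__(self):
        super().__init__()
        self.utterances = []

    def startElement(self, name, attrs):
        if name == "Utterance":
            self.utterances.append(Utterance())
        elif name == "Phrase":
            # A phrase belongs to the most recently opened utterance.
            self.utterances[-1].phrases.append(Phrase(attrs.get("text", "")))
        elif name == "Token":
            # A token (word plus POS tag) belongs to the most recent phrase.
            phrase = self.utterances[-1].phrases[-1]
            phrase.tokens.append((attrs.get("word"), attrs.get("pos")))
        elif name == "Concept":
            # A concept is attached to the phrase it was mapped from.
            self.utterances[-1].phrases[-1].concepts.append(attrs.get("id"))


def parse_utterances(source):
    """Stream-parse a file name or file-like object into a list of Utterance."""
    handler = UtteranceHandler()
    xml.sax.parse(source, handler)
    return handler.utterances


# Made-up sample input in the hypothetical schema; the concept id is a placeholder.
SAMPLE = b"""<Utterances>
  <Utterance>
    <Phrase text="heart attack">
      <Token word="heart" pos="noun"/>
      <Token word="attack" pos="noun"/>
      <Concept id="CUI-EXAMPLE-1"/>
    </Phrase>
  </Utterance>
</Utterances>"""

utterances = parse_utterances(io.BytesIO(SAMPLE))
```

Because each phrase keeps its own tokens and concepts together, the word-to-concept mapping can be read directly from the model instead of being traced through the XML by hand.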
Next Task: Exploration of New Methods for Extracting Named Entities
• The XConc Suite
• Corpus developer and annotator
• Explore XConc as a MetaMap replacement for extracting named entities using event-based annotation.