15 Sep 2009 Medical text extraction
Objective
• There are a great many biomedical articles
• They are tedious to read through when all you want to know is:
  • Author, Institution, Research Database, Analysis Tools used, etc.
• Use information extraction to pull the relevant information out of the articles
Approach
• CRF++ (an open-source conditional random field toolkit)
• Training files
  • XML-tagged medical articles
  • Tagging done by doctors from Duke-NUS
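CRF++ reads training data as one token per line, with whitespace-separated columns whose last column is the answer label, and a blank line between sequences. The following is a minimal sketch of turning inline XML tags (such as the `<database_name>` tags shown on the difficulties slide) into that format. It assumes non-nested tags, plain whitespace tokenization, and a BIO labelling scheme (`B-` for the first token of a tagged span, `I-` for the rest, `O` outside); none of these choices are stated in the slides, and the function names are hypothetical.

```python
import re

# Matches either a complete <tag>...</tag> span or a run of untagged text.
# Assumption: tags are not nested inside each other.
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>|([^<]+)", re.S)


def xml_to_crf_rows(text):
    """Convert inline-tagged text into (token, label) pairs using BIO labels."""
    rows = []
    for m in TAG_RE.finditer(text):
        tag, inner, plain = m.group(1), m.group(2), m.group(3)
        if tag:
            # First token of a tagged span gets B-, the rest get I-.
            for i, tok in enumerate(inner.split()):
                prefix = "B" if i == 0 else "I"
                rows.append((tok, f"{prefix}-{tag.upper()}"))
        else:
            # Untagged text is labelled O (outside any field).
            rows.extend((tok, "O") for tok in plain.split())
    return rows


def to_crfpp(rows):
    """Render one sequence in CRF++ format: token<TAB>label per line, then a blank line."""
    return "\n".join(f"{tok}\t{label}" for tok, label in rows) + "\n\n"


if __name__ == "__main__":
    sample = "We used the <database_name> national registry </database_name> ."
    print(to_crfpp(xml_to_crf_rows(sample)), end="")
```

Whitespace tokenization keeps punctuation glued to words (e.g. `Tan,` stays one token); a real pipeline would likely split punctuation off first.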
Tags of importance
• Author
• Institution
• Email
• Database Name
• Data Analysis Name
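Several of these tags have surface cues a CRF can exploit: emails contain `@`, institution names often contain words like "University" or "Hospital", author names are capitalized, and database names such as "mfold 3.2" may contain digits. The sketch that follows adds such cues as extra CRF++ columns between the token and the label, plus a matching CRF++ template. The four features and the cue-word list are hypothetical illustrations, not the 12 features the project actually used.

```python
import re

# Hypothetical cue words for institution names (illustrative only).
INSTITUTION_WORDS = {"university", "hospital", "institute", "school", "college"}


def token_features(tok):
    """Return four illustrative feature values for one token."""
    return [
        "CAP" if tok[:1].isupper() else "nocap",
        "EMAIL" if re.fullmatch(r"[^@\s]+@[^@\s]+\.\w+", tok) else "noemail",
        "DIGIT" if any(c.isdigit() for c in tok) else "nodigit",
        "INST" if tok.lower().strip(",.") in INSTITUTION_WORDS else "noinst",
    ]


def feature_rows(rows):
    """Insert feature columns between token and label; CRF++ treats the last column as the label."""
    return [(tok, *token_features(tok), label) for tok, label in rows]


# CRF++ template: %x[row_offset,column] reads a column relative to the current token.
# Column 0 is the token, columns 1-4 are the features above; "B" adds label-bigram features.
TEMPLATE = "\n".join([
    "U00:%x[-1,0]",
    "U01:%x[0,0]",
    "U02:%x[1,0]",
    "U03:%x[0,1]",
    "U04:%x[0,2]",
    "U05:%x[0,3]",
    "U06:%x[0,4]",
    "B",
]) + "\n"

if __name__ == "__main__":
    for row in feature_rows([("Smith", "B-AUTHOR"), ("someone@example.org", "B-EMAIL")]):
        print("\t".join(row))
```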
Result (1/2)
• 3-fold cross-validation
• 50 articles used
  • More articles are available but were not used because they are noisy (to be cleaned up)
• 12 features used
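One way to set up 3-fold cross-validation over the 50 articles is to split by article rather than by sentence, so that text from one article never appears in both the training and test data of the same fold. A minimal sketch, assuming hypothetical file names and a fixed shuffle seed; each (train, test) pair would then be written out and passed to CRF++'s `crf_learn` and `crf_test`.

```python
import random


def k_fold_splits(items, k=3, seed=0):
    """Yield (train, test) lists; every item lands in exactly one test fold."""
    items = list(items)
    random.Random(seed).shuffle(items)       # reproducible shuffle
    folds = [items[i::k] for i in range(k)]  # round-robin assignment to k folds
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Hypothetical article file names standing in for the 50 tagged articles.
articles = [f"article_{n:02d}.xml" for n in range(50)]

for fold_no, (train, test) in enumerate(k_fold_splits(articles), start=1):
    print(fold_no, len(train), len(test))
```

With 50 articles the folds hold 17, 17 and 16 test articles, so the loop prints `1 33 17`, `2 33 17` and `3 34 16`.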
Some difficulties
• Some unusual Asian names
• Database names are unpredictable, e.g.:
  • <database_name> mfold 3.2 online software </database_name>
  • <database_name> regional mailing list of the Institute of General Practice, University Hospital Schleswig-Holstein </database_name>
  • <database_name> hospital and population data set </database_name>
  • <database_name> national registry </database_name>
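Because database names like these span many tokens, the per-token labels that `crf_test` prints (the input columns followed by the predicted label) have to be regrouped into whole names before they answer "which database was used?". A minimal sketch, assuming the hypothetical BIO labels (`B-`, `I-`, `O`) and that the prediction is the last column of each output line; a stray `I-` tag that does not continue the current span is leniently treated as the start of a new span.

```python
def bio_spans(lines):
    """Group CRF++ output lines into (label, text) spans using the last column as the tag."""
    spans = []
    label, tokens = None, []
    for line in list(lines) + [""]:  # sentinel blank line flushes the final span
        cols = line.split()
        tag = cols[-1] if cols else "O"  # blank line ends a sequence, treat as O
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        # Close the open span on O, or when a different span begins.
        if label is not None and (tag == "O" or starts_new):
            spans.append((label, " ".join(tokens)))
            label, tokens = None, []
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, tokens = tag[2:], [cols[0]]
        elif tag.startswith("I-"):
            tokens.append(cols[0])  # continue the current span
    return spans


if __name__ == "__main__":
    output = [
        "mfold\tB-DATABASE_NAME",
        "3.2\tI-DATABASE_NAME",
        "online\tI-DATABASE_NAME",
        "software\tI-DATABASE_NAME",
    ]
    print(bio_spans(output))
```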