Datamining MEDLINE for Topics and Trends in Dental and Craniofacial Research William C. Bartling, D.D.S. NIDCR/NLM Fellow in Dental Informatics Center for Biomedical Informatics University of Pittsburgh Titus K. L. Schleyer, D.M.D., Ph.D. Director, Center for Dental Informatics University of Pittsburgh School of Dental Medicine
Overview • Goals of project • Retrieving the entire corpus of dental and craniofacial research literature from MEDLINE • Determining the characteristics of a dental research article • Machine learning to extract articles from any body of literature • Methods to categorize dental research literature to study temporal trends • Summary
Goals of project • To use computerized methods to determine topics and trends in dental and craniofacial research since 1966. • Determining the structure of such research can help to identify which research areas are emerging and which are waning. • Identify research funding opportunities?
Retrieving the dental literature • MEDLINE chosen as the database • MeSH tree searched manually for dental and craniofacial terms • Many MeSH terms were found in unusual locations in the hierarchy. • Decision to keep or discard each term • Search limited to: • English language • Journal article • Abstract present
Results of search • ~450,000 English language articles in: • DENTISTRY • STOMATOGNATHIC SYSTEM (not PHARYNX) • STOMATOGNATHIC DISEASES (not PHARYNGEAL DISEASES) • ~61,000 articles indexed with dental MeSH terms not in above set • ~134,000 articles remaining after limiting to journal articles containing abstracts
What is a dental research article? • Currently at this phase of project • 1000 abstracts randomly chosen, 5 groups of 200 each • 15 expert judges • 3 judges assigned to each group • Judges categorize each article as: • Dental or craniofacial research • Dental or craniofacial, non-research • Non-dental • Not sure • Web interface for judging: PHP with MySQL
Differentiation of article categories • Acceptable reliability in each group (> 0.70) • Use results of each category to develop training set • Identify Patient Sets (IPS) software • Developed by Dr. Greg Cooper at University of Pittsburgh CBMI • Natural language processing used to find patient records of a certain type from free text documents, e.g., hospital admission records
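The slide reports reliability above 0.70 but does not name the statistic. As a minimal sketch, assuming Fleiss' kappa (a common choice when a fixed number of judges assign items to nominal categories), agreement within one group of 3 judges could be computed as follows. The category labels and the function name `fleiss_kappa` are illustrative and not taken from the project.

```python
from collections import Counter

# Hypothetical short labels for the four judging categories on the slide.
CATEGORIES = ["research", "non_research", "non_dental", "not_sure"]

def fleiss_kappa(ratings, categories=CATEGORIES):
    """Fleiss' kappa for a list of per-article label lists.

    Every article must be rated by the same number of judges.
    Undefined (division by zero) if all ratings fall in one category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][c] = number of judges who put article i in category c
    counts = [Counter(item) for item in ratings]
    # Observed agreement for each article, then averaged
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance from the overall category proportions
    p_j = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p ** 2 for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

A value of 1 indicates perfect agreement, and values near 0 indicate agreement no better than chance.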
IPS creates a document vector for each document or set of documents • Figure: vector for Document i, pairing each word with a value (Word 1: p1, Word 2: p2, Word 3: p3, ..., Word n: pn)
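The slide does not say how IPS computes the values p1 ... pn. As a rough sketch, assuming each value is simply the relative frequency of a word in the document, a document vector could be built like this. The tokenizer and the function name `document_vector` are illustrative assumptions, not the IPS implementation.

```python
import re
from collections import Counter

def document_vector(text):
    """Map each word in the text to its relative frequency (assumed weighting)."""
    # Crude tokenizer: lowercase runs of ASCII letters only
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    # An empty document yields an empty vector
    return {word: n / total for word, n in counts.items()}
```

For a set of documents, the same function could be applied to the concatenated text, which is one simple way to obtain a single vector for the whole set.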
IDENTIFY PATIENT SETS (IPS) • Uses machine learning technique of “text classification” • All articles fed into the program • Select fields (title, abstract, MeSH terms) • Training set: • 2/3 of validated “dental research” articles • Add remaining 1/3 to original set, less the training set • Calculate success of retrieval using model created from training set • Adjust IPS and iterate, or train with more or fewer documents until successful
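The split and evaluation steps above can be sketched independently of IPS itself. The following assumes that "success of retrieval" is measured with precision and recall against the held-out third; the slide does not specify the measure, and both function names are hypothetical.

```python
import random

def split_validated(article_ids, train_fraction=2 / 3, seed=0):
    """Shuffle validated article IDs and split them into training and held-out parts."""
    ids = sorted(article_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]

def retrieval_scores(retrieved, relevant):
    """Return (precision, recall) of the retrieved IDs against the relevant IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

In this sketch the held-out IDs play the role of `relevant`, and `retrieved` would be whatever the trained model returns when run over the original set minus the training articles.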
Determining trends and topics in dental and craniofacial research • Entire set of dental research articles used • Knowledge visualization and bibliometric methods • Based on the assumption that articles in a given field are similar to one another (Hearst & Pedersen, 1996) • Similar articles and topics tend to cluster together
Bibliometric examples from other fields • Co-word analysis • Software engineering (Coulter, Monarch, and Konda, 1998) • Co-descriptor analysis • Information science (McCain, 1995) • Co-author analysis • Information retrieval literature (Ding et al., 1999) • Co-citation analysis • Medical informatics literature (Morris & McCain, 1998)
Visual methods to categorize literature • Co-occurrence vectors or weights • Weights based on co-occurrence of terms • Multidimensional scaling • Display of points in two or three dimensions • Points closer together on matrix when articles are more similar • Clustering • Groups of points in close proximity to each other are bounded to provide an intellectual grouping
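As a small illustration of the first step above, the sketch below counts how often pairs of terms co-occur across articles and computes a cosine similarity between two articles treated as binary term sets. The choice of cosine similarity and both function names are assumptions; the slides do not state which weighting was planned. Multidimensional scaling and clustering would then operate on the resulting similarities, typically with a numerical library.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def cooccurrence_counts(articles):
    """Count, for each pair of terms, the number of articles containing both.

    articles: iterable of term collections (e.g. MeSH terms per article).
    Keys are (term_a, term_b) tuples in alphabetical order.
    """
    pairs = Counter()
    for terms in articles:
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs

def cosine_similarity(terms_a, terms_b):
    """Cosine similarity of two articles represented as binary term sets."""
    a, b = set(terms_a), set(terms_b)
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))
```

Articles sharing many terms score close to 1 and would be placed near each other in a two- or three-dimensional display.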
How do we cluster dental research? • Entire text of abstracts • MeSH terms only • Major headings • Subheadings • All MeSH headings • Journal titles • Combinations of the above
Once clustering is done: • Cluster dental research within certain time periods (5 years) • Determine quantities of articles published for each cluster within each time period • Cluster including only journals with a given impact factor threshold • Study changes over time of different categories of research
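The counting step above can be sketched as a simple tally of articles per cluster and 5-year period. The start year 1966 comes from the project goals; the assumption that periods are aligned to that year, the record layout and the function names are illustrative only.

```python
from collections import Counter

def period_start(year, first_year=1966, width=5):
    """Return the first year of the fixed-width period containing `year`."""
    return first_year + ((year - first_year) // width) * width

def counts_by_cluster_and_period(records, first_year=1966, width=5):
    """Tally articles per (cluster, period start).

    records: iterable of (cluster_label, publication_year) pairs.
    """
    table = Counter()
    for cluster, year in records:
        table[(cluster, period_start(year, first_year, width))] += 1
    return table
```

Comparing the counts for one cluster across successive periods gives a first view of whether that category of research is growing or declining.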
Summary • A comprehensive content analysis of the dental and craniofacial research literature has not been done. • Computerized methods can help to retrieve and categorize this literature. • Study of trends in dental research can help researchers to identify relevance of current studies and possibly reveal future research opportunities.
Many thanks to the following: • Amy Gregg, MLIS-Dental Reference Librarian • Falk Library for the Health Sciences • University of Pittsburgh • Shyam Visweswaran, MD- NLM Fellow in Intelligent Systems • Center for Biomedical Informatics • University of Pittsburgh • All of my expert raters! • This research is supported with a training grant from the National Institute of Dental and Craniofacial Research and the National Library of Medicine