MedSearch: A Specialized Search Engine for Medical Information Retrieval

Presented by Navyasri Canumalla. Paper by Gang Luo, Chunqiang Tang, Hao Yang, and Xing Wei.

Overview • Motivation • MedSearch Objectives • Challenges • Approach • Implementation • Experimental Setup • Conclusion

Motivation • A medical information searcher is often uncertain about the exact question to ask. • The searcher prefers to pose long queries that describe the symptoms in plain English. • The searcher is unfamiliar with medical terminology and uses Web search afterwards to better digest information obtained from doctors.

Limitations of existing medical search engines: - They impose limits on query length, e.g., 32 words for Google and 20 words for Healthline. - They cannot suggest diversified, related medical phrases when the query is written in plain English. • A medical information searcher prefers a search engine that suggests diversified, related medical phrases, which help the searcher quickly digest search results and refine the query.

MedSearch Objectives • A patient can use MedSearch to facilitate preliminary self-diagnosis. • A patient can use MedSearch to better prepare for doctor’s appointments. • A patient can use MedSearch to digest information that was not fully understood after consulting the doctor. • A patient can use MedSearch to find more information and clarify the symptom description.

Challenges • Long queries must be rewritten without losing information. • When ranking the suggested medical phrases, MedSearch has to resolve the terminological discrepancy between medical phrases and queries written in plain English.

Approach • MedSearch crawls Web pages from a few selected, high-quality medical Web sites. • It ranks documents by their relevance score. • To generate medical phrases, MedSearch uses the Medical Subject Headings (MeSH) ontology, a standard vocabulary edited by the National Library of Medicine.

Implementation • MedSearch processes a medical query Q in the following steps: • Step 1: Remove stopwords from Q. • Step 2: Rewrite Q to a moderate length if it is too long. • Step 3: Produce diversified search result pages. • Step 4: Generate snippets. • Step 5: Suggest related medical phrases.

Step 2: Rewriting Queries • MedSearch uses a length threshold l_T = 10. • If ||Q|| > l_T, MedSearch treats Q as a long query and rewrites it into a moderate-length query Q’ by selectively dropping unimportant terms. • Terms in the query are ranked by their tf×idf values, and those with the largest tf×idf values are kept. • An upper bound U = 80 is placed on the length of the rewritten query.
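The rewriting rule in Step 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `rewrite_query`, the `doc_freq` and `num_docs` inputs, and the smoothed idf formula are all assumptions made for the example. Only the length threshold (10) and the upper bound U (80) come from the slide.

```python
import math
from collections import Counter

LENGTH_THRESHOLD = 10  # length threshold from the slide
UPPER_BOUND = 80       # upper bound U from the slide


def rewrite_query(terms, doc_freq, num_docs,
                  length_threshold=LENGTH_THRESHOLD,
                  upper_bound=UPPER_BOUND):
    """Keep the highest tf-idf terms of a long query (hypothetical helper).

    terms     -- query terms, stopwords already removed
    doc_freq  -- dict: term -> number of documents containing it (assumed input)
    num_docs  -- total number of documents in the collection (assumed input)
    """
    # Short queries are passed through unchanged.
    if len(terms) <= length_threshold:
        return list(terms)

    tf = Counter(terms)

    def tfidf(term):
        # Smoothed idf; the exact weighting formula is an assumption.
        idf = math.log((1 + num_docs) / (1 + doc_freq.get(term, 0)))
        return tf[term] * idf

    # Distinct terms in order of first occurrence.
    distinct = list(dict.fromkeys(terms))
    keep = set(sorted(distinct, key=tfidf, reverse=True)[:upper_bound])
    # Preserve the original term order among the kept terms.
    return [t for t in distinct if t in keep]
```

Passing smaller values for `length_threshold` and `upper_bound` makes the effect easy to see on a toy query: rare terms survive, while terms that appear in most documents are dropped first.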

Step 3: Diversifying Search Results • Results in the collection C are clustered into k clusters using K-means clustering. • A constant j = 20 is chosen, and each cluster contributes its result with the highest relevance score to the top-j results. • Both relevance and diversity are judged using a single metric, usefulness: score_u(P) = 1 if Web page P is useful, and 0 otherwise.

For the returned top-20 Web pages, their weighted average usefulness score is defined as

• When k is too small, relevant Web pages tend to gather in the same clusters. • When k is too large, the clustering effect is not significant. • k = 1500
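The cluster-based selection in Step 3 can be sketched as below. The sketch assumes cluster assignments have already been computed (for example by K-means) and are passed in as plain tuples. The function name `diversify` and the tie-breaking rule (when there are more clusters than j, take the clusters whose best pages score highest) are assumptions for illustration, not details given in the slides.

```python
def diversify(results, j=20):
    """Pick at most one page per cluster, best-scoring first (hypothetical helper).

    results -- iterable of (page_id, cluster_id, relevance) tuples, where
               cluster_id is assumed to come from a prior clustering step
    j       -- number of results to return (j = 20 on the slide)
    """
    best_per_cluster = {}
    for page_id, cluster_id, relevance in results:
        current = best_per_cluster.get(cluster_id)
        # Each cluster contributes only its highest-relevance page.
        if current is None or relevance > current[1]:
            best_per_cluster[cluster_id] = (page_id, relevance)

    # Assumption: order the cluster representatives by relevance and cut at j.
    ranked = sorted(best_per_cluster.values(),
                    key=lambda pair: pair[1], reverse=True)
    return [page_id for page_id, _ in ranked[:j]]
```

Because a cluster can contribute only one page, two near-duplicate pages that land in the same cluster cannot both appear in the top-j list, which is the diversification effect the step aims for.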

Step 4: Generating Snippets • For each snippet sn, MedSearch highlights the medical phrases and the top-3 common terms between sn and the query Q.

Step 5: Suggesting Related Medical Phrases • For each query, MedSearch suggests V related medical phrases, where V = 60.

Sub-step 1: Generating the Candidate Set • MedSearch selects the V distinct medical phrases with the largest tf×idf values from the returned top-j Web pages to form a candidate set S.

Sub-step 2: Ranking Medical Phrases • For each medical phrase M, retrieve the r top-ranked Web pages in C and use them as M’s representative Web pages. • Compute the relevance score between M and Q as a weighted average of the relevance scores between Q and M’s representative Web pages.

To achieve good performance, it is best to set r = 1, so that each medical phrase is represented by its single top-ranked Web page.
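The ranking of candidate medical phrases can be sketched as follows. The slides do not state which weights the weighted average uses, so uniform weights are assumed by default here. The function names and the input layout (for each phrase, a list of relevance scores between Q and that phrase's representative pages, best-ranked page first) are also hypothetical.

```python
def phrase_relevance(query_scores, weights=None):
    """Weighted average of query-to-page relevance scores (hypothetical helper).

    query_scores -- relevance between Q and each representative page of a phrase
    weights      -- one weight per page; uniform weights are an assumption
    """
    if not query_scores:
        raise ValueError("a phrase needs at least one representative page")
    if weights is None:
        weights = [1.0] * len(query_scores)
    return sum(w * s for w, s in zip(weights, query_scores)) / sum(weights)


def rank_phrases(candidates, r=1):
    """Order candidate phrases by their relevance to the query.

    candidates -- dict: phrase -> list of relevance scores between Q and the
                  pages top-ranked for that phrase (assumed precomputed)
    r          -- number of representative pages per phrase (r = 1 on the slide)
    """
    scored = {phrase: phrase_relevance(scores[:r])
              for phrase, scores in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

With r = 1 the weighted average reduces to the single score of the phrase's top-ranked page, so the choice of weights no longer matters.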

Experimental Setup • Crawled 20 GB of Web pages from WebMD, one of the most popular medical Web sites. • Fed MedSearch natural medical queries extracted from the Med Help International Medical and Health Forum.

Conclusion • MedSearch is a specialized Web search engine for medical information retrieval. • MedSearch supports queries written in plain English, accepts long queries, provides diversified search results, and suggests related medical phrases with proper ranking and annotation. • These features are attractive to ordinary Internet users who have little medical knowledge and are unfamiliar with medical terminology.

Thank You
