
Kai Zheng, PhD, Qiaozhu Mei, PhD, David A. Hanauer, MD University of Michigan

Developing an Intelligent and Socially Oriented Search Query Recommendation Service for Facilitating Information Retrieval in Electronic Health Records. Kai Zheng, PhD, Qiaozhu Mei, PhD, David A. Hanauer, MD University of Michigan.


Presentation Transcript


  1. Developing an Intelligent and Socially Oriented Search Query Recommendation Service for Facilitating Information Retrieval in Electronic Health Records. Kai Zheng, PhD, Qiaozhu Mei, PhD, David A. Hanauer, MD, University of Michigan. On behalf of William Wilcox, Danny Wu, and Lei Yang

  2. Information Retrieval in EHR? • Millions of patient records • Specialized language • Rich, implicit intra/inter document structures • Deep NLP/Text Mining is necessary • Complicated information needs • Privacy is a big concern

  3. Problem Statement • Electronic health records (EHRs), through their capability of acquiring and storing vast volumes of data, provide great potential to help create a “rapid learning” healthcare system • However, retrieving information from narrative documents stored in EHRs is extraordinarily challenging, e.g., due to frequent use of non-standard terminologies and acronyms

  4. Problem Statement (Cont.) Similar to how Google has changed the way people find information on the web, a Google-like, full-text search engine can be a viable solution to increasing the value of unstructured clinical narratives stored in EHRs However, average users are often unable to construct effective and inclusive search queries due to their lack of search expertise and/or domain knowledge

  5. Proposed Solution An intelligent query recommendation service that can be used by any EHR search engine to • Artificial Intelligence: augment human cognition so that average users can quickly construct high quality queries in their EHR search  • Collective (social) Intelligence: engender a collaborative and participatory culture among users so that search queries can be socially formulated and refined, and search expertise can be preserved and diffused across people and domains

  6. A Typical IR System Architecture [Diagram with components labeled: Documents, INDEXING, Doc Rep, Query Rep, query, INTERFACE, Ranking, SEARCHING, results, Feedback, Users, QUERY MODIFICATION]

  7. EMERSE • EMERSE: Electronic Medical Record Search Engine • Full-text search engine • Created by David Hanauer • Widely used in UMHS since 2005 (and VA) • Boolean keyword queries • Routinely utilized by frontline clinicians, medical coding personnel, quality officers, and researchers at the University of Michigan Health System • The test platform for the solutions being built through this project

  8. Specific Aims of the Project Aim #1: Developing AI-based Query Recommendation Algorithms Aim #2: Leveraging Social Intelligence to Enhance EHR Search Aim #3: Defining a Flexible Service Architecture

  9. Aim #1: Developing AI-based Query Recommendation Algorithms • Clinicians have great difficulty formulating queries that express their information needs • EMERSE provides “semi-automatic” query suggestions (synonyms, spelling, etc.) • Example: uti → uti "urinary tract infection" • 25% adoption rate! • Text mining/machine learning methods to automatically select alternative query terms • Technical details are left for later in the talk
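The "uti" example on this slide can be illustrated with a minimal sketch of dictionary-based query expansion. This is not EMERSE's implementation: the `SYNONYMS` table and the `expand_query` function are hypothetical names introduced here, and the only behavior taken from the slide is that the original term is kept and a quoted synonym phrase is appended.

```python
# Hypothetical synonym table; a real system would draw on curated
# terminologies or mined data rather than a hand-written dictionary.
SYNONYMS = {
    "uti": ["urinary tract infection"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Keep each typed term and append its known synonyms as quoted phrases."""
    expanded = []
    for term in query.split():
        expanded.append(term)
        for alternative in synonyms.get(term.lower(), []):
            expanded.append('"%s"' % alternative)
    return " ".join(expanded)


print(expand_query("uti"))
```

In a semi-automatic setting the expanded terms would be offered to the user, who decides which ones to adopt.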

  10. Aim #2: Leveraging Social Intelligence to Enhance EHR Search Enhancing AI-based algorithms with social intelligence: • Allow users to bundle search terms and share • Social appraisal • Classifying search terms bundles for easy retrieval • Other community features • Enhancing collaboration among user communities across institutions

  11. Aim #3: Defining a Flexible Service Architecture A service-oriented architecture serving general search knowledge Locally implementable APIs Implementation of the community features

  12. System Architecture

  13. To Challenge Us – Why Bother? Q1: Is this different from PubMed? • EHRs have very different properties Q2: Is this different from Google? • Very different information needs in EHR search Q3: Could “social search” even work?

  14. Dictated Notes vs Typed Notes • Hypothesis: there are considerable lexical and structural differences between dictated and typed notes. Such differences could have a significant impact on the performance of natural language processing tools, so the two types of documents may need to be treated differently • Data: 30,000 dictated notes and 30,000 typed notes of deceased patients, randomly sampled • Same genre: encounter notes that physicians composed to describe an outpatient encounter or to communicate with other clinicians regarding patient conditions

  15. Comparison: Vocabulary 64,487 > 80% OHSUMED: 172 UMLS+: English dictionaries + commonly used medical terminologies + all concepts/terms in UMLS

  16. Comparison: Acronym Usage

  17. Comparative Analysis: Perplexity • Fewer occurrences • Sparser information! • Fewer function words • Words repeat less • Higher perplexity/randomness * Typed notes have higher variance in almost all document measures
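The slide does not say how perplexity was computed. As a hedged illustration of the measure itself, the sketch below assumes a unigram language model with add-one smoothing; the function name `unigram_perplexity` and the toy token lists are invented for this example.

```python
import math
from collections import Counter


def unigram_perplexity(train_tokens, test_tokens):
    """Perplexity of test_tokens under an add-one smoothed unigram model."""
    counts = Counter(train_tokens)
    vocabulary = set(train_tokens) | set(test_tokens)
    denominator = len(train_tokens) + len(vocabulary)  # add-one smoothing
    log_prob = 0.0
    for token in test_tokens:
        log_prob += math.log2((counts[token] + 1) / denominator)
    return 2 ** (-log_prob / len(test_tokens))


# Toy data (assumption): text that repeats words is more predictable.
repetitive = ["pain"] * 9 + ["fever"]
varied = ["pain", "fever", "cough", "rash"]
print(unigram_perplexity(repetitive, ["pain"] * 5))
print(unigram_perplexity(varied, varied))
```

Text whose words repeat less spreads probability over more word types, which is why sparser notes score a higher perplexity under a model of this kind.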

  18. Lessons Learned • Clinical notes are much noisier than biomedical literature • Among them, notes typed by physicians are much noisier and sparser than dictated notes. What about different genres of notes? • These differences in linguistic properties imply potential difficulty in natural language processing

  19. Analysis of EMERSE Query Log • 202,905 queries collected over 4 years • 533 users (medical professionals in UMHS) • 35,928 user sessions (sequences of queries) [Figures: query volume by hour of day and by day of week (Mon - Sun)]

  20. Query Distribution – Not a Power Law! Long tail – but no fat head

  21. A Categorization of EHR Search Queries Almost no navigational queries; most queries are informational/transactional Using the top-level concepts of SNOMED CT

  22. Comparison to Web Search • Almost no navigational queries (Web: ~30%) • Average query length (Web: 2.3): user typed in: 1.7; all together (typed in + query suggestions + bundles): 5.0 • Queries with acronyms: 18.9% (Web: ~5%) • Dictionary coverage: 68% (Web: 85%-90%) • Average length of session: 5.64 queries (Web: 2.8) • Query suggestions adopted: 25.9% (Web: < 10%)

  23. Lessons Learned • Medical search is much more challenging than Web search: more complicated information needs; longer queries, more noise • Users have substantial difficulty formulating their queries: longer search sessions; high adoption rate of system-generated suggestions • Question: can the users help each other to formulate queries?

  24. “Social” (Collaborative) Search in EMERSE (Zheng, Mei, Hanauer. Collaborative search in electronic health records. JAMIA 2011) • Changing a search experience into a social experience • Users create search bundles (bundled queries): a collection of keywords that are found effective as a query; reuse search bundles; share them with other users • Public sharing vs. private sharing • Search knowledge diffuses from bundle creators to bundle users
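A search bundle as described on this slide can be pictured as a small data structure. The sketch below is only an illustration: the `SearchBundle` class, its field names, and the `OR`-joined query string are assumptions made here, not EMERSE's actual schema or query syntax.

```python
from dataclasses import dataclass, field


@dataclass
class SearchBundle:
    """Hypothetical model of a named, shareable collection of search terms."""
    name: str
    owner: str
    terms: list
    public: bool = False                             # public sharing
    shared_with: set = field(default_factory=set)    # private sharing

    def visible_to(self, user):
        """A bundle is usable by its owner, by named sharees, or by anyone if public."""
        return self.public or user == self.owner or user in self.shared_with

    def to_query(self):
        """Join terms into one Boolean query, quoting multi-word phrases."""
        quoted = ['"%s"' % t if " " in t else t for t in self.terms]
        return " OR ".join(quoted)


bundle = SearchBundle("GVHD", "alice", ["GVHD", "Graft Versus Host Disease"])
bundle.shared_with.add("bob")  # private share with one colleague
print(bundle.to_query())
```

Keeping public and private sharing as separate fields mirrors the distinction the slide draws between the two sharing modes.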

  25. Example: a Search Bundle

  26. Share a Bundle Publicly/Privately

  27. The Effectiveness of Collaborative Search Search bundles (as of Dec. 2009): • 702 bundles • 58.7% of active users • Almost half of the pageviews • 19.3% of all queries (as of Dec. 2010) • 27.7% search sessions ended with a search bundle (as of Dec. 2010) • Bundle creator: 188 • Bundle sharers: 91 • Bundle leechers: 77

  28. Example Bundles GVHD: "GVHD" "GVH" "Graft-Versus-Host-Disease" "Graft-Versus-Host Disease" "Graft Versus Host Disease" "Graft Versus Host" "Graft-Versus-Host" "Graft vs. Host Disease" "Graft vs Host Disease" "Graft vs. Host" "Graft vs Host"

  29. Example Bundle (cont.) Myocardial infarction: NSTEMI STEMI ~AMI "non-stelevation" "non stelevation" "st elevation MI" "stelevation" "acute myocardial infarction" "myocardial infarction" "myocardial infarct" "anterior infarction" "anterolateral infarction" "inferior infarction" "lateral infarction" "anteroseptal infarction" "anterior MI" "anterolateral MI" "inferior MI" "lateral MI" "anteroseptal MI" infarcted infarction infarct infract "Q wave MI" "Q-wave MI" "Q wave" "Q-wave" "st segment depression" "t wave inversion" "t-wave inversion" "acute coronary syndrome" "non-specific ST wave abnormality" "non specific ST wave abnormality" "ST wave abnormality" "ST-wave abnormality" "CPK-MB" "CPK MB" "troponin" ~^MI -$"MI \s*\d{5}" -systemic

  30. Bundle Sharing Across Departments

  31. Bundle Sharing Across Individual Users Red links: cross department links

  32. Bundle Sharing Facilitated Diffusion of Information • Quantitative network analysis of search knowledge diffusion networks • A giant component exists • Small world (high clustering coefficient & short paths) • Publicly shared bundles better facilitate knowledge diffusion; privately shared bundles add on top of public bundles • Users tend to share bundles with people in the same department, but specialty is a more natural representation of communities (based on modularity)
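Two of the network measures named on this slide, the giant component and the clustering coefficient, can be computed on a small sharing graph as sketched below. The edge list is made-up toy data, the graph is treated as undirected for simplicity, and the function names are introduced here for illustration only.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from (creator, recipient) sharing pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def largest_component_fraction(adjacency):
    """Share of nodes in the largest connected component (breadth-first search)."""
    seen, best = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best / len(adjacency)


def average_clustering(adjacency):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for neighbors in adjacency.values():
        k = len(neighbors)
        if k < 2:
            continue
        links = sum(1 for a in neighbors for b in neighbors
                    if a < b and b in adjacency[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency)


# Toy sharing network (assumption): four connected users plus one isolated pair.
toy = build_graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "f")])
print(largest_component_fraction(toy), average_clustering(toy))
```

A component that covers most users, combined with a high clustering coefficient and short paths, is the pattern the slide summarizes as a small world.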

  33. Lessons Learned • Medical search is much more challenging than Web search • Users have substantial difficulty formulating their queries: longer search sessions; high adoption rate of system-generated suggestions; high usage of search bundles • Collaborative search has facilitated the sharing/diffusion of search knowledge: public bundles are more effective than private ones; 30% of bundle users are leechers; half of the bundle creators don’t share

  34. Automatic Query Recommendation: Methods Similarity based (kNN) Pseudo-feedback Semantic term expansion Network-based ranking Learning to rank (much labeled training data needed)

  35. Automatic Query Recommendation: Available Information Information to leverage: • Co-occurrence within queries • Transition in query sessions • Co-occurrence within clinical documents • Annotation by ontological concepts • Ontology structures • Morphological closeness • Clickthrough
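Two of the signals listed here, co-occurrence within queries and transitions in query sessions, can be extracted from a query log roughly as follows. This is a sketch under assumptions: the log format (lists of terms, lists of queries per session) and both function names are invented for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(queries):
    """Count how often two distinct terms appear in the same query."""
    counts = Counter()
    for terms in queries:
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


def session_transitions(sessions):
    """Estimate P(next query | current query) from consecutive queries in sessions."""
    raw = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            raw[current][following] += 1
    probabilities = {}
    for current, followers in raw.items():
        total = sum(followers.values())
        probabilities[current] = {q: c / total for q, c in followers.items()}
    return probabilities


# Toy log (assumption), using terms that appear elsewhere in the talk.
queries = [["uti", "fever"], ["uti", "fever", "dysuria"]]
sessions = [["uti", "urinary tract infection"],
            ["uti", "cystitis"],
            ["uti", "urinary tract infection"]]
print(cooccurrence_counts(queries))
print(session_transitions(sessions))
```

Either table can serve as edge weights in a term or query graph.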

  36. A Network View

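  37. Random Walk and Hitting Time [Figure: a random walk leaves vertex i for vertex k with probability 0.3 and for vertex j with probability 0.7; A is the target vertex set] Hitting Time • $T_A$: the first time that the random walk is at a vertex in A Mean Hitting Time • $h_i^A$: expectation of $T_A$ given that the walk starts from vertex i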

  38. Computing Hitting Time • $T_A$: the first time that the random walk is at a vertex in A • $h_i^A$: expectation of $T_A$ given that the walk starts from vertex i • $h_i^A = 0.7\,h_j^A + 0.3\,h_k^A + 1$ • $h_i^A = 0$ for every vertex i in A • Iterative computation [Figure: the same example graph, with edges of probability 0.7 and 0.3 leaving vertex i and h = 0 marked on A]
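The iterative computation mentioned on this slide can be sketched as repeated application of the recurrence: the mean hitting time of a non-target vertex is one step plus the probability-weighted hitting times of its neighbors, and it is zero inside A. The function name, the stopping rule, and the edges out of j and k in the example graph are assumptions added here; only the 0.7/0.3 split out of vertex i comes from the slide.

```python
def mean_hitting_times(transitions, targets, iterations=1000, tolerance=1e-10):
    """Iteratively solve h_i = 1 + sum_j p_ij * h_j, with h_i = 0 for i in targets.

    transitions: {vertex: {neighbor: probability}}.
    Assumes every non-target vertex can reach the target set.
    """
    vertices = set(transitions)
    for neighbors in transitions.values():
        vertices.update(neighbors)
    h = {v: 0.0 for v in vertices}
    for _ in range(iterations):
        updated, change = {}, 0.0
        for v in vertices:
            if v in targets:
                updated[v] = 0.0
            else:
                updated[v] = 1.0 + sum(p * h[u]
                                       for u, p in transitions.get(v, {}).items())
            change = max(change, abs(updated[v] - h[v]))
        h = updated
        if change < tolerance:
            break
    return h


# Example graph: i -> j (0.7) and i -> k (0.3) as on the slide;
# the outgoing edges of j and k are invented to complete the example.
example = {
    "i": {"j": 0.7, "k": 0.3},
    "j": {"A": 1.0},
    "k": {"A": 0.5, "i": 0.5},
}
hitting = mean_hitting_times(example, {"A"})
print(hitting["i"], hitting["j"], hitting["k"])
```

For this toy graph the recurrence can also be solved by hand, giving 1 for j and 2/0.85 for i, which is a useful check on the iteration.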

  39. Generate Query Suggestion • Construct a (kNN) subgraph centered on the query term(s); it could be bipartite (queries linked to notes/concepts/sessions…) • Compute transition probabilities (based on co-occurrence/similarity) • Compute hitting time $h_i^A$ • Rank candidate queries using $h_i^A$ [Figure: bipartite example linking the queries "uti", "bacterial", and "urinary tract infection" to documents D1, D2, D3…, with edge weights such as 4 and 2]
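The steps on this slide can be sketched end to end on a toy bipartite graph. Everything concrete below is an assumption: the co-occurrence counts are invented (loosely echoing the figure's "uti", "bacterial", and "urinary tract infection"), the kNN subgraph step is skipped because the toy graph is already small, and a fixed number of iterations stands in for a convergence test.

```python
from collections import defaultdict


def build_transitions(query_doc_counts):
    """Row-normalize a bipartite query/document co-occurrence graph."""
    weights = defaultdict(dict)
    for (query, doc), count in query_doc_counts.items():
        weights[("q", query)][("d", doc)] = count
        weights[("d", doc)][("q", query)] = count
    return {node: {m: w / sum(nbrs.values()) for m, w in nbrs.items()}
            for node, nbrs in weights.items()}


def hitting_times(transitions, targets, iterations=500):
    """Fixed-iteration solution of h_i = 1 + sum_j p_ij * h_j, h = 0 on targets."""
    h = {node: 0.0 for node in transitions}
    for _ in range(iterations):
        h = {node: 0.0 if node in targets else
             1.0 + sum(p * h[m] for m, p in transitions[node].items())
             for node in transitions}
    return h


def suggest(query, query_doc_counts, top_k=3):
    """Rank other queries by mean hitting time to the given query (smaller is closer)."""
    transitions = build_transitions(query_doc_counts)
    h = hitting_times(transitions, {("q", query)})
    candidates = [(time, node[1]) for node, time in h.items()
                  if node[0] == "q" and node[1] != query]
    return [name for _, name in sorted(candidates)[:top_k]]


# Invented co-occurrence counts between queries and documents.
counts = {
    ("uti", "D1"): 4, ("uti", "D2"): 2,
    ("urinary tract infection", "D1"): 3, ("urinary tract infection", "D2"): 1,
    ("bacterial", "D2"): 1, ("bacterial", "D3"): 5,
    ("pneumonia", "D3"): 4,
}
print(suggest("uti", counts))
```

Queries that share many documents with the input reach it in fewer expected steps and therefore rank first.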

  40. Other Network-based Methods Stationary distribution Absorbing probability Commute time Other measures More general: network regularization

  41. Ranking with Multiple Networks [Figure: several networks over the same nodes A, B, C, D, built from query transitions, distributional similarity, and ontology structures, combined to produce suggested queries] Ranking/Transductive Learning with Multiple Views (e.g., Zhou et al. 2007, Muthukrishnan et al. 2010)

  42. Evaluation • Cranfield evaluation (adopted by TREC): sample information needs → queries; fixed test document collection; pool results of multiple candidate systems; human annotation of relevance judgments; IR evaluation measures (e.g., MAP, NDCG) • Direct rating by users (bucket testing)
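The two measures named on this slide have standard definitions, sketched below for reference. The function names and toy rankings are illustrative, and the NDCG variant shown uses raw gains with a log2 discount, which is one common convention among several.

```python
import math


def average_precision(ranked, relevant):
    """Mean of precision values at the rank of each relevant document retrieved."""
    hits, score = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over (ranked list, relevant set) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def ndcg(ranked, gains, k=10):
    """Normalized discounted cumulative gain at cutoff k with graded relevance."""
    def dcg(values):
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))
    actual = dcg([gains.get(doc, 0) for doc in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


# Toy judgments (assumption): d1 is highly relevant, d2 marginally relevant.
print(average_precision(["d1", "d2", "d3"], {"d1", "d3"}))
print(ndcg(["d2", "d1"], {"d1": 3, "d2": 1}))
```

Both measures need only the pooled relevance judgments and each system's ranked output, which is what makes a fixed Cranfield-style test collection reusable across candidate systems.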

  43. Towards the Next Generation EHR Search Engine Better understanding of information needs by medical professionals • frontline clinicians, administrative personnel, and clinical/translational researchers Better natural language processing for patient records Better mechanisms of automatic query recommendation in the medical context Better ways to facilitate collaborative search and preserve search knowledge Better ways to improve the comprehensibility of medical data by patients and families (future)

  44. Publications to Date Kai Zheng, Qiaozhu Mei, David A. Hanauer. Collaborative search in electronic health records. JAMIA. 2011;18(3):282–91. Lei Yang, Qiaozhu Mei, Kai Zheng, David A. Hanauer. Query log analysis of an electronic health record search engine. AMIA Annual Symposium Proc. 2011. (forthcoming) Kai Zheng, Qiaozhu Mei, Lei Yang, Frank J. Manion, Balis UJ, David A. Hanauer. Voice-dictated versus typed-in clinician notes: Linguistic properties and the potential implications on natural language processing. AMIA Annual Symposium Proc. 2011. (forthcoming)

  45. Thanks!
