
Data Integration in Healthcare



Presentation Transcript


  1. Data Integration in Healthcare Tutorial Project by Bill Davis Designed for CS 511 Spring 2005

  2. About this Presentation • Project + Homework #3 = min( prepare time ) • Be kind! Ask questions! :-) • = Fun = Zzzzz….. • Length = (5, 35] minutes

  3. Tutorial Background • Spoke to individuals in different areas of healthcare: • Brianne Davis: Registered Nurse at Provena Covenant in Urbana, Illinois • Justin Roberts: Resident Medical Student in Youngstown, Ohio • Julie Bachman: Receptionist for small town family practice doctor in Reinbeck, Iowa

  4. Tutorial Background • What did I find out? • Wow…Healthcare is way behind! • There are many barriers to successful data integration • Which methods are currently used to store information, and which software is in use • What they anticipate for the future of healthcare data

  5. Tutorial Background Now to the actual presentation…

  6. Planning for the Future • The healthcare system is shifting from individual treatments to a continuous process involving a variety of professionals and institutions • Researchers have difficulty tracing a patient’s history and long-term response to specific treatments because the information is disconnected • Data mining and analysis are not possible while records are stored in paper format • It remains difficult for medical staff to share situational knowledge to benefit patient care

  7. Current vs. Integrated Approach [Figure: two diagrams, "Current Method" and "Integrated Approach"]

  8. Current Sources of Information • Small networks of clinics • Relatively few in number, close in proximity • Vast majority of patient information comes from own clinics • Have custom built databases to suit specific needs of the care facility • Additional patient information collected by contacting other clinics with patient’s authorization • Interaction with 3rd party affiliates such as insurance and collection agencies is slow due to the restricted nature of medical records

  9. Current Sources of Information • General Practitioner coordinates the flow of a patient’s record

  10. Current Sources of Information • Medical Encyclopedias • Medline Plus, Healthlinks, etc. • Payment Collection Information • SearchAmerica • Research Datasets • CDC, AHRQ, Pan American Health Org. • Datasets cover only a small subset of the entire population and are usually provided only by those who volunteer their personal data

  11. Technical Obstacles • Many custom built databases that are difficult to merge into common format • Deciding on a single format that will perfectly fit every healthcare need is unrealistic, so determining the universal format is hard • Maintaining security and preventing unauthorized access to records while allowing easier access to those who truly need the data

  12. Technical Obstacles • Hard distributed database problem: where do you look to find a patient’s previous records?

  13. Social Resistance • Who owns the patient records if stored in a central database? • Who is authorized to add or change a patient’s record? • Large target for hackers; difficult to prevent all unauthorized access since the user base is so large • Difficult to guarantee patient privacy on the same level as paper records • Clinics are reluctant to accept proposed standards, fearing a difficult migration to the eventually accepted standard

  14. Social Resistance • Hospitals reluctant to integrate • Lose competitive edge as patients have more freedom to move among providers • Universal storage format may not fit their needs as well as a custom solution • Data entry problems • Which entry method: natural language, structured data, or form driven? • Establish a common vocabulary to ensure consistency across all patient records

  15. Primary Data Integration Issues • Merging thousands of custom databases into a format that can easily transfer data to other systems • Connecting thousands of custom databases to accurately and efficiently combine information access • Where will the records be located: in a central database, or will they remain distributed? • What will be the standard messaging format for data exchange?

  16. Goal of Healthcare Integration Connect the vital areas of healthcare to a universal Electronic Data Interchange (EDI) system

  17. Proposed Solutions and Standards *Dozens more not listed here

  18. What’s in a Standard? • Much like HTML is the language of the Web, allowing computers with different applications and configurations to communicate, healthcare has developed message passing standards that regulate how information should be transferred between computers. • Many different standards are in development, the most popular of which is Health Level 7 (HL7)

  19. Health Level 7 Healthcare Standard Mission: “To provide standards for the exchange, management and integration of data that support clinical patient care and the management, delivery and evaluation of healthcare services.” • HL7 version 1.0 was established in 1987; the standard is currently on version 3.0, which is XML based • HL7 is primarily focused on developing a standard for exchanging messages among information systems that implement healthcare applications

  20. Health Level 7 Healthcare Standard • HL7 is built on a Reference Information Model (RIM) which is a large representation of the clinical domain • RIM identifies the life cycle of events that a message or groups of related messages will carry

  21. Health Level 7 Healthcare Standard • Provides recommendations on database layout, application data flow, and message formatting • Standard developed and maintained by professionals in the healthcare field to accurately represent the integration needs of healthcare • Takes all aspects of the healthcare process into consideration to ensure that HL7 can facilitate true domain wide integration

  22. HL7 Message Format • An HL7 message consists of data fields that are of variable length separated by a field separator character • Data fields are combined into logical groupings called segments • Structure hierarchy: • Message • Segment (Some are Repeatable) • Fields (Some are Repeatable) • Components • Subcomponents

  23. HL7 Message Format Segments Used: • MSH – Message Header • PID – Patient Identification • FT1 – Financial Transaction • OBR – Observation Request • OBX – Observation Results Example Message Encoding:
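A minimal sketch of how a message in the pipe-delimited HL7 version 2 style could be split along the hierarchy described on the previous slide (message, segments, fields, components, subcomponents). The sample message, its application and facility names, and every value in it are invented for illustration, and `parse_hl7` is a hypothetical helper rather than part of any HL7 toolkit. The sketch ignores escape sequences and the special numbering of MSH fields.

```python
# Illustrative message only: every name and value below is made up.
SAMPLE = "\r".join([
    "MSH|^~\\&|SENDING_APP|FACILITY_1|RECEIVING_APP|FACILITY_2|200504011200||ORU^R01|MSG0001|P|2.3",
    "PID|1||12345||DOE^JANE||19700101|F",
    "OBR|1||ORDER42|GLU^Glucose",
    "OBX|1|NM|GLU^Glucose||95|mg/dL|70-110|N",
])


def parse_hl7(message):
    """Return [(segment_name, fields)], where each field is a list of
    repetitions, each repetition a list of components, and each component
    a list of subcomponents."""
    segments = [s for s in message.split("\r") if s]
    if not segments or not segments[0].startswith("MSH"):
        raise ValueError("message must start with an MSH segment")
    # The header declares the delimiters: field separator, then the
    # component, repetition, escape, and subcomponent characters.
    field_sep = segments[0][3]
    comp_sep, rep_sep, _escape, sub_sep = segments[0][4:8]
    parsed = []
    for seg in segments:
        raw_fields = seg.split(field_sep)
        name = raw_fields[0]
        fields = []
        for i, raw in enumerate(raw_fields[1:], start=1):
            if name == "MSH" and i == 1:
                fields.append(raw)  # the delimiter declaration itself; keep raw
                continue
            fields.append([
                [comp.split(sub_sep) for comp in rep.split(comp_sep)]
                for rep in raw.split(rep_sep)
            ])
        parsed.append((name, fields))
    return parsed


parsed = parse_hl7(SAMPLE)
segment_names = [name for name, _ in parsed]
pid_fields = parsed[1][1]
family_name = pid_fields[4][0][0][0]  # 5th PID field, 1st repetition, 1st component
```

Reading the delimiters from the header instead of hard-coding them keeps the parser independent of the sender's choice of separator characters.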

  24. jEngine Application Interface • Application interface built on the HL7 message passing standard • HL7 does not provide methods by which data will be shared or accessed in a distributed environment; jEngine provides an application to transfer data among databases • Goal is to ensure freedom, flexibility, and cost-savings by providing an open-source integration engine to the healthcare industry

  25. Message Transmission [Diagram: Facility 1 to Facility 2] • Facility 1: application on Computer 1 encodes the patient record into an HL7 message • Message encrypted for transmission • jEngine transmits the message • Facility 2: jEngine receives and decrypts the message • HL7 message parsed into record components for the application on Computer 2 * The applications are independent of each other thanks to HL7
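The encode and parse steps of this flow can be sketched as a round trip between two applications that use different local field names. This is only an illustration under stated assumptions: the record layout, the function names `encode_patient` and `decode_patient`, and all values are hypothetical, the encryption and network steps are left out, and nothing here reflects jEngine's actual interface.

```python
def encode_patient(record):
    """Facility 1: turn a local record (dict) into a small HL7 v2-style message."""
    msh = "MSH|^~\\&|APP_1|FACILITY_1|APP_2|FACILITY_2"
    pid = "PID|1||{id}||{family}^{given}||{dob}|{sex}".format(**record)
    return "\r".join([msh, pid])


def decode_patient(message):
    """Facility 2: read the same values back into its own column names."""
    pid = next(s for s in message.split("\r") if s.startswith("PID|"))
    fields = pid.split("|")
    family, given = fields[5].split("^")
    return {
        "patient_no": fields[3],
        "last_name": family,
        "first_name": given,
        "birth_date": fields[7],
        "gender": fields[8],
    }


# Invented record in Facility 1's local layout.
sent = {"id": "12345", "family": "DOE", "given": "JANE", "dob": "19700101", "sex": "F"}
wire = encode_patient(sent)        # encryption and transport omitted in this sketch
received = decode_patient(wire)    # Facility 2's local layout
```

Neither side needs to know the other's schema; both only agree on the positions of the fields in the message, which is the independence the diagram's footnote refers to.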

  26. jEngine Application Interface jEngine translates incoming messages defined by the HL7 standard and passes the information on to external systems (databases, applications, etc.)

  27. Solutions Overview • HL7 is not the only standard available; however, it currently has the largest backing in the healthcare community • jEngine is one of several dozen applications building upon the standards set forth by organizations like HL7 • So many systems currently exist that it is difficult to determine which system or standard will emerge as the market standard; this uncertainty causes hesitation in those looking to prepare their systems for integration

  28. Message Security • Australian care facilities experimented with using smartcard technology to protect patient records • Patients and physicians must insert their cards into the computer at the same time to unlock the patient records • Patients liked this method since they felt they had more control over their personal records

  29. Progress • The Medicare Modernization Act was enacted in December 2003 with the aim of creating a strategic plan for the nation’s health IT infrastructure • President George W. Bush is currently pushing to have electronic health records available to all U.S. residents by 2014 • Progress is slow as healthcare facilities are slow to adapt; however, many groups recognize the need for integration and are constantly proposing new methods of improving healthcare integration

  30. Outlook – Proposed Items • In order to begin the move toward healthcare integration, a common data representation must be selected and agreed upon by the entire healthcare community • The entire healthcare domain must be taken into consideration • The model will need to accurately represent the data flow that exists and make healthcare personnel’s jobs easier in order to be accepted • Patients will need assurance that their information is safe from hackers and also have some control over their own privacy (such as a smart card)

  31. Outlook – Proposed Items • Systems must agree on some form of RIM to track message life cycles • Without common life cycles, some messages transmitted between companies with different database setups cannot be stored accurately by the receiver • For this reason HL7 has developed process life cycles as well as suggested database designs as guidelines for applications to increase compatibility

  32. Outlook – Moving Forward • Privacy concerns continue to be the primary social resistance factor • Without the government stepping in to promote the acceptance of certain standards, the ‘weeding out’ period could take a long time • Many applications are provided as open source to help promote their growth; this will likely be essential to overall acceptance

  33. Research Direction • Develop or modify a standard that not only embodies current healthcare needs, but is also built for easy expansion to accommodate future needs • Examine the current model of record processing to determine how a fully integrated system can eliminate many unnecessary steps • Determine what representation of data is easiest for healthcare personnel to enter while at the same time allowing data mining applications to use the data most effectively

  34. End of Talk

  35. Information Integration in Life Science Govind Kabra gkabra2@uiuc.edu

  36. Reference publications [1] Bioinformatics Resources from the National Center for Biotechnology Information: An Integrated Foundation for Discovery. Barbara A. Rapp and David L. Wheeler. Journal of the American Society for Information Science and Technology, 2005. [2] Integration of Biological Sources: Current Systems and Challenges Ahead. Thomas Hernandez and Subbarao Kambhampati. SIGMOD Record, September 2004. [3] Integration of Biological Data from Web Resources: Management of Multiple Answers through Metadata Retrieval. Marie-Dominique Devignes and Malika Smail. ISMB-ECCB 2004.

  37. Reference Talks [1] Information Management for Genome Level Bioinformatics. Norman Paton and Carole Goble. Tutorial in VLDB 2001. [2] Data Integration in Life Sciences. Kenneth Griffiths and Richard Resnick. Invited Talk to AAAI.

  38. Outline of the talk • Current focus of Life Science: Genomics • Management of genomic data • Need for information integration • Key challenges • Alternative approaches, design choices and existing Solutions • An Ideal System

  39. Life Science Data Recent focus on genetic data “genomics: the study of genes and their function. Recent advances in genomics are bringing about a revolution in our understanding of the molecular mechanisms of disease, including the complex interplay of genetic and environmental factors. Genomics is also stimulating the discovery of breakthrough healthcare products by revealing thousands of new biological targets for the development of drugs, and by giving scientists innovative ways to design new drugs, vaccines and DNA diagnostics. Genomics-based therapeutics include "traditional" small chemical drugs, protein drugs, and potentially gene therapy.” The Pharmaceutical Research and Manufacturers of America - http://www.phrma.org/genomics/lexicon/g.html

  40. What is the use of genomic data? • Study of genes and their function • Chromosomal location, sequence, protein structure • Homology, motifs, expression • Understanding molecular mechanisms of disease • Development of drugs, vaccines, and diagnostics

  41. What are the sources of genomic data? • Primary Databases • Data generated by experiments • Role: standards, quality thresholds, dissemination • Sequence databases: EMBL, GenBank • Increasingly other data types: micro-array • Secondary Databases • Data obtained from analysis and expertise • Role: accumulated specialist knowledge; Annotations • Swiss-Prot, PRINTS, CATH, PAX6, Enzyme, dbSNP • Role: Warehouses to support analysis • GIMS, aMAZE, InterPro

  42. Data Model • For Gene Sequences: good data abstraction • Sequence data • For Functional genomic data: no obvious abstraction • Descriptive models • Unstable Schemas • Retain all results in primary database (e.g., microarray images)

  43. How to interact with genomic databases? • Web browser • Point and click • Query by navigation • Results in flat file or graphical formats • Screen-scraping • Perl scripts over downloaded flat files • The most popular form • XML formats taking hold • Beginnings of APIs in CORBA • But still limited to call interfaces rather than queries

  44. Example Queries • Retrieve the motifs of proteins from S. cerevisiae • Retrieve proteins from A. fumigatus that are homologous to those in S. cerevisiae. • Retrieve the motifs of proteins from A. fumigatus that are homologous to those in S. cerevisiae.
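The third query is a composition of the first two, which is why it needs data from more than one kind of source. A minimal sketch, assuming three tiny in-memory "sources" (organism membership, homology pairs, motif annotations); every protein and motif identifier is an invented placeholder, not a real accession, and the function names are hypothetical.

```python
# Source A: protein -> organism (placeholder identifiers)
proteins = {
    "P_sc_1": "S. cerevisiae",
    "P_sc_2": "S. cerevisiae",
    "P_af_1": "A. fumigatus",
    "P_af_2": "A. fumigatus",
}
# Source B: unordered homologous pairs
homologs = {("P_af_1", "P_sc_1")}
# Source C: protein -> motif annotations
motifs = {
    "P_sc_1": ["motif_x"],
    "P_af_1": ["motif_x", "motif_y"],
    "P_af_2": ["motif_z"],
}


def proteins_of(organism):
    """All proteins that source A assigns to the organism."""
    return {p for p, org in proteins.items() if org == organism}


def motifs_of(protein_ids):
    """Motif annotations from source C for each protein (empty if none)."""
    return {p: motifs.get(p, []) for p in sorted(protein_ids)}


def homologous_to(source_org, target_org):
    """Proteins of source_org with at least one homolog in target_org."""
    src, tgt = proteins_of(source_org), proteins_of(target_org)
    found = set()
    for a, b in homologs:
        if a in src and b in tgt:
            found.add(a)
        if b in src and a in tgt:
            found.add(b)
    return found


q1 = motifs_of(proteins_of("S. cerevisiae"))            # first query
q2 = homologous_to("A. fumigatus", "S. cerevisiae")     # second query
q3 = motifs_of(q2)                                      # third query
```

In a real setting each dictionary would be a separate remote database with its own interface, so the join in `q3` is exactly the step an integration system has to perform on the user's behalf.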

  45. Need for Integration of multiple sources • Quantity of biological resources: • Large number of independent databases • Several analysis tools developed • Meaningful answers to many requests require access to data from multiple sources

  46. Characteristics of biological sources that make integration difficult • Highly diverse nature of data stored • Representational heterogeneity of data • Autonomous and web-based interaction • Various querying capabilities and interfaces

  47. Variety of Data • Stored data includes gene expression and sequences, disease characteristics, molecular structures, micro-array data, protein interaction, etc. • Relationships between objects and concepts are difficult to formalize • Not only large amounts of data, but each datum or record can itself be extremely large.

  48. Representational heterogeneity • Similar data can be present in several sources in different representations • Structural differences • Semantic discrepancies due to naming and semantic differences • Content differences for the same object due to missing values • Makes entity identification and consistency difficult
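A small sketch of what entity identification involves when two sources describe the same object differently. The two source layouts, the gene name `XYZ1`, the field names, and the matching rule (normalized name plus organism) are assumptions made up for illustration; real systems need far more careful matching.

```python
# Two hypothetical sources describing the same gene with different
# field names, naming conventions, and missing values.
source_a = [{"gene_name": "XYZ1", "organism": "S. cerevisiae", "length": 1200}]
source_b = [{"symbol": "xyz-1", "species": "Saccharomyces cerevisiae",
             "len_bp": None, "function": "transport"}]

ORGANISM_SYNONYMS = {"s. cerevisiae": "saccharomyces cerevisiae"}


def norm_name(name):
    """Naming differences: ignore case and punctuation."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def norm_organism(organism):
    """Semantic differences: map abbreviations to one canonical form."""
    key = organism.strip().lower()
    return ORGANISM_SYNONYMS.get(key, key)


def to_common_a(rec):
    """Structural differences: map source A's fields to a common schema."""
    return {"name": rec["gene_name"], "organism": rec["organism"],
            "length": rec["length"], "function": None}


def to_common_b(rec):
    """Structural differences: map source B's fields to a common schema."""
    return {"name": rec["symbol"], "organism": rec["species"],
            "length": rec["len_bp"], "function": rec["function"]}


def merge(records):
    """Group records by normalized identity; fill missing values from later records."""
    merged = {}
    for rec in records:
        key = (norm_name(rec["name"]), norm_organism(rec["organism"]))
        target = merged.setdefault(key, {})
        for field, value in rec.items():
            if target.get(field) is None:
                target[field] = value
    return merged


common = [to_common_a(r) for r in source_a] + [to_common_b(r) for r in source_b]
entities = merge(common)
```

Here the two records collapse into one entity whose length comes from the first source and whose function comes from the second, which illustrates both the benefit of integration and how much depends on the normalization rules being right.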

  49. Autonomous and web-based sources • Sources can modify their design or data, or block access • Sources are not aware of the integration systems accessing them • All are web-based, so access is dependent on network traffic • Their dynamic nature makes it difficult to keep the integration system consistent with the sources

  50. Differing querying capabilities • User-access interfaces vary across sources, and users need to learn each of them • Sources often prevent direct access to their data • Sometimes useful information cannot be accessed because of query restrictions
