CCEGA Informatics Project: Developing Shared Infrastructure and Data Models Project Leader: Brad Hemminger bmh@ils.unc.edu School of Information and Library Science University of North Carolina at Chapel Hill
Participants • Brad Hemminger bmh at ils.unc.edu • Kaye Balke balke at ils.unc.edu • Kirk Wilhelmsen kirk at neurology.unc.edu • David Threadgill dwt at med.unc.edu • Dong Xiang dxiang at email.unc.edu • Min Xu xumin at med.unc.edu • Joel Kingsolver jgking at bio.unc.edu • Paul Brown paul.brown at unc.edu • Lavanya Ramakrishnan lavanya at renci.org • Roger Akers akers at unc.edu • Peter DeSaix pdesaix at email.unc.edu • Clark Jeffries clark_jeffries at med.unc.edu • Xiaojun Guan xguan at renci.org • Kevin Gamiel kgamiel at renci.org • Erik Scott escott at renci.org • Barrie Hayes bhayes at email.unc.edu
Project Aims Goal: Develop a common data model and informatics infrastructure for UNC • Determine the needs of research labs on campus • Determine applicable global standards that can be used • Determine the issues that affect whether research labs would use a common infrastructure and common data model • Understand and address security issues • Based on this information, develop the model
Lab Surveys • Bioinformatics research labs at UNC were invited to provide details of their data infrastructure, in particular their data models (and example data). • PIs and database administrators from the projects met with our full committee for interviews, and afterwards we followed up to obtain dumps of their data schemas.
Labs that provided in-depth interviews and complete data models • Kirk Wilhelmsen (alcoholism and addiction projects) • Paul Brown (Cell Biology, multiple projects) • Roger Akers (Epidemiology Specimen Tracking) • Lineberger (multiple cancer projects) • Mike Knowles (Pulmonary and Cystic Fibrosis) • Kari North (case-control and family-based studies of cardiovascular disease) • Proteomics Center (earlier project)
Global Standards • While no overarching standard provides common definitions for all the necessary data elements, standards exist in many individual domains (microarrays, genetic sequences, proteins, etc.). Additionally, larger-scale efforts are under way, such as CDISC (clinical trials) and caBIG (cancer). caBIG has a whole workgroup devoted to vocabularies and common data elements (VCDE).
Issues affecting user acceptance • Almost all research projects prefer to have their own database • Specific projects: no need to tie into other researchers' data; no need to preserve data generated by the study; easier to build themselves; more control when managed themselves • Core facilities: require specific control and privacy of data • Clinical facilities: rigorous requirements regarding sharing of data (ELSI, HIPAA)
Reasons for Sharing • A growing number of studies are required to share data between projects (larger studies, multicenter studies) • More projects depend on outside resources (databanks) • Free or inexpensive disk space • Dependable archiving of data • Assistance in designing data models for a study
Security Possible security design requirements: • Identification tables of entities (as in Trusted Broker doc) • Translation tables among entities • Authentication (two-way) between broker and entities • Authorization of entities by broker • Encrypted channels (SSL, IPSec, other) • Protection against various denial of service attack types (limiting multiple accesses or very frequent access requests from any one researcher, etc.) • Multiple types of access requirements for the human trusted broker (something you have, you know, or you are) • Other requirements on trusted broker (bonded staff, permission to modify databases requiring at least two separate trusted brokers cooperating, etc.) • Remote backup system...
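One of the requirements above, limiting very frequent access requests from any one researcher, can be illustrated with a sliding-window counter. This is only a minimal sketch and not the project's design: the class name `RequestLimiter`, its parameters, and the researcher identifiers are all hypothetical.

```python
from collections import defaultdict, deque


class RequestLimiter:
    """Hypothetical sliding-window limiter: at most `max_requests`
    per researcher within any `window_seconds` interval."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of recently accepted requests, per researcher.
        self._history = defaultdict(deque)

    def allow(self, researcher_id, now):
        """Return True and record the request if it is within the limit."""
        history = self._history[researcher_id]
        # Discard timestamps that have fallen out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return False
        history.append(now)
        return True
```

Passing the timestamp in explicitly (rather than reading the clock inside `allow`) keeps the sketch easy to test; a real broker would more likely use a monotonic clock and combine this with the authentication and authorization steps listed above.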
Common Data Model • Had a general framework from previous work • Built a new model from the ground up • Took all data elements from all the research labs and pooled them to define the overall set of elements, including which elements from different labs mapped to the same “common” elements. • Produced a set of core elements that were common to many projects and important for sharing. • Integrated the new model with the overall design principles from the general framework to develop the final “common data model”.
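The pooling and mapping step described above can be sketched as follows. This is an illustration under assumptions, not the project's actual mapping: the lab names, field names, common element names, and the `min_labs` threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping: lab-specific field name -> "common" element name.
LAB_TO_COMMON = {
    "lab_a": {"sample_id": "sample_identifier",
              "coll_date": "collection_date",
              "plate_no": "plate_number"},
    "lab_b": {"specimen": "sample_identifier",
              "date_collected": "collection_date"},
    "lab_c": {"SampleID": "sample_identifier",
              "tissue": "tissue_type"},
}


def pool_elements(lab_to_common):
    """Pool all labs' elements: common element -> set of labs that use it."""
    pooled = defaultdict(set)
    for lab, mapping in lab_to_common.items():
        for common_name in mapping.values():
            pooled[common_name].add(lab)
    return dict(pooled)


def core_elements(pooled, min_labs=2):
    """Common elements that appear in at least `min_labs` labs."""
    return sorted(name for name, labs in pooled.items() if len(labs) >= min_labs)


def to_common(lab, record, lab_to_common):
    """Rename a lab record's mapped fields to common names; drop unmapped ones."""
    mapping = lab_to_common[lab]
    return {mapping[field]: value for field, value in record.items() if field in mapping}
```

With these made-up mappings, `core_elements(pool_elements(LAB_TO_COMMON))` keeps only the elements shared by two or more labs. The sketch captures only the "common to many projects" criterion; which elements are "important for sharing" is a judgment a count cannot make.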
Example of integrating data • View integration spreadsheet, look at example (samples) of before and after.
Final Common Model • In development: taking the common data elements and putting them into a database system for testing. • Database schema design (see printout) • Integrate standards into the definition of data elements • Incorporate into an actual database • Test the model database by incorporating actual data from volunteer labs (Kirk, Roger)
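As a rough illustration of the testing step, the sketch below loads a few records into an in-memory SQLite database and runs one sanity query. The schema shown is invented for the example and is not the schema on the printout; the table names, column names, and sample records are all hypothetical.

```python
import sqlite3

# Hypothetical two-table schema: labs and the samples they contribute.
SCHEMA = """
CREATE TABLE lab (
    lab_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE
);
CREATE TABLE sample (
    sample_id        INTEGER PRIMARY KEY,
    lab_id           INTEGER NOT NULL REFERENCES lab(lab_id),
    local_identifier TEXT NOT NULL,
    collection_date  TEXT,
    UNIQUE (lab_id, local_identifier)
);
"""


def build_test_database(records):
    """Create an in-memory database and load (lab, local_id, date) records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    for lab_name, local_id, date in records:
        # Register the lab once; later records for the same lab reuse it.
        conn.execute("INSERT OR IGNORE INTO lab (name) VALUES (?)", (lab_name,))
        (lab_id,) = conn.execute(
            "SELECT lab_id FROM lab WHERE name = ?", (lab_name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO sample (lab_id, local_identifier, collection_date) "
            "VALUES (?, ?, ?)",
            (lab_id, local_id, date),
        )
    conn.commit()
    return conn


def samples_per_lab(conn):
    """Return {lab name: number of samples} as a simple load check."""
    rows = conn.execute(
        "SELECT lab.name, COUNT(*) FROM sample "
        "JOIN lab USING (lab_id) GROUP BY lab.name ORDER BY lab.name"
    )
    return dict(rows.fetchall())
```

The `UNIQUE (lab_id, local_identifier)` constraint lets two labs reuse the same local identifier while rejecting duplicates within one lab, which is one plausible way to keep lab-assigned identifiers intact in a shared database.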
Next Steps • The aim of this P20 planning project is to prepare for further grants in this area, and to help lay the groundwork for building a common biomedical informatics infrastructure at UNC. • In Jan 2007, we submitted a CTSA (Clinical and Translational Science Award) grant proposal. This grant aims to integrate all biomedical informatics infrastructure on campus.
CTSA--overview • The TraCS Biomedical Informatics Core will unite the silos of biomedical informatics research excellence at UNC and across North Carolina to maximize re-use of data, knowledge and processes. With the establishment of the North Carolina Collaboratory for Biomedical Informatics (NCCBI), TraCS will support research, patient care, education and policy-making while building upon, leveraging and extending the current biomedical informatics infrastructure at UNC-CH. This core involves several external partners with a strong presence in NC and world-wide: Red Hat, IBM, SAS, Allscripts, Quintiles and NCHICA. We are committed to achieving a national leadership role in the design and development of best practices for the inclusion of clinical data into shared repositories of biomedical data.
CTSA—tie in clinical data • To support the goals of the TraCS Institute, the Biomedical Informatics Core will create a statewide interdisciplinary and inter-institutional collaboratory (collaborative laboratory): the North Carolina Collaboratory for Biomedical Informatics (NCCBI). It will build on the transformative technology used by the NIH to create Entrez for the NCBI. The long-term goal is to create a shared biomedical informatics data repository connecting clinical enterprises across the State of North Carolina to create a demonstration project for clinical data that will be a model for sharing and re-use of clinical data. This repository will contain appropriately de-identified data from clinical trials and clinical care. With the establishment of the NCCBI, the TraCS Biomedical Informatics Core will transform the excellent but fragmented biomedical informatics capabilities at UNC-CH into a coherent and connected system that facilitates routine re-use of research knowledge, data and processes throughout UNC and North Carolina, serving as a prototype for the nation.
Example Centers Included • General Clinical Research Center, the Collaborative Studies Coordinating Center, the Lineberger Comprehensive Cancer Center, the Carolina Center for Exploratory Genetic Analysis, the Carolina Center for Genome Sciences, the Carolina Exploratory Center for Cheminformatics Research, the Biomedical Imaging Research Center, the Carolina Environmental Bioinformatics Center, the Center for Bioinformatics, the Renaissance Computing Institute, and the Odum Institute for Research in Social Science
CTSA • In short, the CTSA proposal builds on the work of the P20, and offers us the potential to truly transform the way scientists and clinicians work at UNC, and bring about unprecedented integration and data sharing.
Summary--Timeline • Initial workshop to begin the project (spring 2005) • Analysis of data requirements, policies, and existing infrastructure at UNC. Internal interviews with labs (spring through fall 2005) • Development of a complete list of data elements; review with labs and finalize elements for the common model (fall 2005-spring 2006) • Development of draft model (fall 2006-spring 2007) • Testing of draft model using example labs' data (fall 2007) • Review by labs and researchers at UNC. Share with outside experts to solicit critiques. (fall 2007) • Use this work to develop new grants to fund actual deployment of common data models, policies and infrastructure at UNC. (spring 2007-current)