Pragmatic translational informatics: Supporting collaborative science. Nicholas Anderson, PhD NeuroDevNet, Toronto, Canada. Overview. Motivations and Challenges Definitions and requirements The ambiguity of effective translational data sharing
Pragmatic translational informatics: Supporting collaborative science Nicholas Anderson, PhD NeuroDevNet, Toronto, Canada
Overview • Motivations and Challenges • Definitions and requirements • The ambiguity of effective translational data sharing • Digital data infrastructures are their own worst enemies • Emerging Trends • Learning healthcare research enterprises • Precision medicine through data sharing • Big Data research computing ecosystems • Integrating Genomic data into EHRs • Futures
Defining pragmatic, translational, informatics Pragmatic - Advocating behavior that is dictated more by practical consequences than by theory or dogma • Pragmatic trials are primarily designed to determine the effects of an intervention under the usual conditions in which it will be applied, whereas explanatory trials are primarily designed to determine the effects of an intervention under ideal circumstances (Thorpe, CMAJ 2009) Translational - The rendering of something into another language or into one's own from another language. • Translational Research - the process of applying discoveries generated during research in the laboratory, and in preclinical studies, to the development of trials and studies in humans. Research aimed at enhancing the adoption of best practices in the community. (NIH CTSA RFA 2009) Informatics - The study of information processing; computer science. • Translational Informatics - integrated solutions to manage the: (i) logistics, (ii) data integration, (iii) collaboration and (iv) knowledge generation required by translational investigators and their supporting institutions. (AMIA 2012)
Some requirements for sustained translational research environments • Ongoing testing and evaluation of new models of collaborative science, data and resource sharing • Cyclical positive benefit between learning healthcare systems and research environments • Trustworthy approaches to supporting regulatory, ethics and data provenance policy within and across informatics systems
The ambiguity of “translation” Lost in Translation, 2003 “Poetry is what gets lost in translation.” Robert Frost
Translational data sharing: intersection of policy, technology and politics • Requirements to increase data sharing (funding agency, publications, translational research designs, community pressure) • Limited broad or adopted data persistence, auditing or provenance measures • Poorly stewarded research results can significantly affect community trust1,2 • Privacy breaches have significant professional, personal and punitive risks • Data does not (usually) share itself
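One way to make the "auditing or provenance measures" bullet concrete is a tamper-evident access log. The following is a minimal sketch, not a method described in the talk: the function names (`append_event`, `verify`), the event fields, and the dataset label are all hypothetical. Each entry stores the hash of the previous entry, so a later edit to any entry breaks the chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    """Hash an event body deterministically (keys sorted before encoding)."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_event(log, actor, action, dataset):
    """Append a provenance event that is chained to the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "action": action, "dataset": dataset, "prev": prev}
    log.append({**body, "hash": _digest(body)})
    return log


def verify(log):
    """Return True only if every entry still matches its stored hash and link."""
    prev = GENESIS
    for entry in log:
        body = {k: entry[k] for k in ("actor", "action", "dataset", "prev")}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


# Illustrative usage with made-up actors and a made-up dataset name.
log = []
append_event(log, "analyst-a", "export", "cohort-demo")
append_event(log, "analyst-b", "read", "cohort-demo")
print(verify(log))
```

A chain like this only detects after-the-fact edits; it does not by itself decide who may access data, which remains a policy question.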
Addressing (presently) low expectations • Data's shameful neglect (Nature 2009) • Empty archives (Nature 2009) • Desperately seeking cures (Newsweek 2010) • Rare sharing of data led to results on Alzheimer's (NYT, Aug 12, 2010) • Dumped on by Data: Scientists Say a Deluge is Drowning Research (Chronicle of Higher Ed 2011) • Troves of Personal Data, Forbidden to Researchers (NYT 2012) • Retraction record rocks community (Nature 2012)
Building digital data infrastructures • Biology/medicine trails other domains in collaborative research, but has higher expectations • “A wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it” • Herbert Simon, 1971 • “It must be so nice to be in biomedicine where all the data is so homogenous” • Personal comment by the Chair of a major Computer Science and Engineering department
(Over) availability of biological databases • 1380 databases online as of 2012 (1) • 92 new databases/100 new papers on databases since 2011 • GenBank - 143,081,765,233 base pairs from 156,424,033 sequences (2) • (1) NAR Online Molecular Biology Database Collection annual survey, Galperin 2012 • (2) NCBI, August 15, 2012
Biomedical data resources are often their own worst enemies of collaboration • Most data warehouses generated from emphasis on reductionism: • Study and distill knowledge “up” • Reassemble data as necessary • Effective for foundational blocks, less so in the support of systems thinking, data collaborations • Do not necessarily lend themselves to systems modeling • Silos can’t entirely be avoided: in science, domain knowledge is often embedded in design and use
Data sharing and collaboration policy approaches evolving in opposite directions • Greater de-identification/ security • + Large scale efforts • - Limited understanding of secondary consequences • - Reduction of specificity/utility • Enhanced interaction with patients • - More constrained population sizes • + Patient managed consents • +/- Participatory governance
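The trade-off named above ("Greater de-identification" against "Reduction of specificity/utility") can be illustrated with a toy generalization step. This is a sketch under stated assumptions, not a technique from the talk: the records are invented (age, postal code) pairs, and `generalize` and `group_sizes` are hypothetical helper names. Coarser bins make every record share its quasi-identifiers with more people, while leaving fewer distinct values to analyze.

```python
from collections import Counter


def generalize(record, age_bin, code_digits):
    """Coarsen one (age, postal_code) record into an age range and code prefix."""
    age, code = record
    low = (age // age_bin) * age_bin
    age_range = f"{low}-{low + age_bin - 1}"
    masked = code[:code_digits] + "*" * (len(code) - code_digits)
    return (age_range, masked)


def group_sizes(records, age_bin, code_digits):
    """Count how many records share each generalized (age range, code) pair."""
    return Counter(generalize(r, age_bin, code_digits) for r in records)


# Invented toy records: (age, postal code).
records = [
    (34, "98101"), (36, "98105"), (38, "98109"),
    (52, "98195"), (57, "98199"), (55, "98191"),
]

exact = group_sizes(records, age_bin=1, code_digits=5)
coarse = group_sizes(records, age_bin=10, code_digits=3)

# Smallest group = how few people any record could be confused with.
print(min(exact.values()), len(exact))
print(min(coarse.values()), len(coarse))
```

With exact values every record stands alone; with ten-year bins and three-digit prefixes the six records collapse into two groups of three, which is safer to share but tells an analyst far less.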
Clinical integrated data warehouses for research • “A clinical data repository that utilizes a non-transactional structure optimized for research purposes rather than clinical care” (1) • Wide range of maturities, standards, and approaches to access • (1) MacKenzie, Anderson et al., Practices and Perspectives on Building Integrated Data Repositories, JAMIA, 2011 (N=35)
Cost of computing cheaper, price of knowledge more expensive • Expect unexpected obstacles in deriving knowledge • Availability is sometimes not what it seems • Quantity does not necessarily mean quality
Reflecting on the complexity Masys, IOM 2008
Emerging (meta) trends • Persistence, transparency and reproducibility • Systems thinking • Implementation Science • Fault tolerant systems • Linkage to compute, storage resources and eScience • Moving from deidentification to trust • Patients as data generators and participants
“The design of systems determines the kinds of politics that can take place in them, and designing a system is itself a political act” Mitch Kapor – Electronic Frontier Foundation www.eff.org
The Learning Health Care System “Progress in computational science, information technology (IT), and biomedical and health research methods have made it possible to foresee the emergence of a learning health system that enables both the seamless and efficient delivery of best care practices and the real-time generation and application of new knowledge.” Digital Infrastructure for the Learning Health System, IOM (2011)
Developing a learning healthcare research system • Capture, represent and manage high-throughput, multi-dimensional phenotypic data • Provide routes to safe and novel hypothesis discovery • Support rapid clinical study design and execution • Support multi-scale computation and analytics • Ensure that multiple complementary socio-cultural frameworks and human factors are available • Bridge public research and private industry
The 4 P’s of Precision or Personalized Medicine • Predictive: capabilities through usable analytics • Preventive: population profiles for designing care • Personalized: design and delivery of therapies • Participatory: health through actively involved patients and reported outcomes
Enabling P4 Medicine • Increase access to patient provided samples, surveys, outcomes • Automate annotation from the clinical record, public health data, family history • Lower barriers to analytics across these data, genome data, test data, epidemiology data • Ensure privacy and effective relationships for patients, families, communities • Expedite delivery of knowledge to health care, outcomes to patients (repeat)
Big Data: the capture and management of multi-dimensional • Vaguely: “Data sets whose size is beyond the ability of commonly used tools to process it within tolerable time.” (Wikipedia) • Recognizing and supporting the three “V’s” of Big Data • Volume – that large dimensionality/size data increasingly will need to stay “in place” • Velocity – That data is rarely at rest, and can accumulate faster than can be analyzed or used • Variability – That expecting change is practical – through supporting both loose and hard standards of use
Integrating Genomic Data into Health Records • Maintain separation of primary molecular observations from the clinical interpretations of those data • Support lossless data compression from primary molecular observations to clinically manageable subsets • Maintain linkage of molecular observations to the laboratory methods used to generate them • Support compact representation of clinically actionable subsets for optimal performance • Simultaneously support human-viewable formats and machine-readable formats in order to facilitate implementation of decision support rules • Anticipate fundamental changes in the understanding of human molecular variation • Support both individual clinical care and discovery science Masys et al., Technical desiderata for the integration of genomic data into Electronic Health Records, JAMIA 2011
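Several of these desiderata describe a data model, so a small sketch may help. The one below is an illustration only and not the design from the cited paper: every class, field, and value (`LabMethod`, `MolecularObservation`, `ClinicalInterpretation`, "GENE1", "kb-2011") is a hypothetical placeholder. It keeps the primary observation separate from its interpretation, links each observation to the laboratory method, allows re-interpretation as knowledge changes, and renders both machine-readable and human-viewable output.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LabMethod:
    """How an observation was produced (kept linked to the observation)."""
    assay: str
    platform: str
    reference_build: str


@dataclass(frozen=True)
class MolecularObservation:
    """Primary molecular observation; immutable once recorded."""
    observation_id: str
    gene: str
    variant: str
    method: LabMethod


@dataclass(frozen=True)
class ClinicalInterpretation:
    """Interpretation stored apart from the observation it refers to."""
    observation_id: str
    significance: str
    knowledge_version: str


def reinterpret(old, significance, knowledge_version):
    """Issue a new interpretation without touching the primary observation."""
    return ClinicalInterpretation(old.observation_id, significance, knowledge_version)


def to_machine_readable(obs, interp):
    """JSON form, e.g. as input to decision support rules."""
    payload = {"observation": asdict(obs), "interpretation": asdict(interp)}
    return json.dumps(payload, sort_keys=True)


def to_human_readable(obs, interp):
    """One-line summary for display to a clinician."""
    return (f"{obs.gene} {obs.variant}: {interp.significance} "
            f"(knowledge base {interp.knowledge_version}, assay {obs.method.assay})")


# Placeholder values only; none of these are real findings.
method = LabMethod("hypothetical-panel", "hypothetical-sequencer", "build-X")
obs = MolecularObservation("obs-1", "GENE1", "c.123A>G", method)
first = ClinicalInterpretation("obs-1", "uncertain significance", "kb-2011")
second = reinterpret(first, "likely benign", "kb-2012")
print(to_human_readable(obs, second))
```

Because the interpretation is a separate record keyed by `observation_id`, a change in the understanding of a variant produces a new interpretation while the original observation and its method linkage stay untouched.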
It’s just a framework Dilbert, Sept 5, 2012
Integration is beginning to see results • Cracking Your Genetic Code (Nova, 3/2012) • Attempted repeat of an earlier successful “n of 1” therapy: results inconclusive • Sage Bionetworks: open source computational biology public/private institution, with “portable consent” (sagebase.org) • Advocacy-driven science (Genetic Alliance rare disease network, geneticalliance.org) • DTC consumer participation (23andme.com/research)
Sergey Brin’s search for a Parkinson’s cure Adapted from Wired 2010, borrowed from Phil Payne
The Future: Open and Share "Somebody has to share. If we all hold on it, we all lose it." George Church, Harvard Personal Genome Project, May 12, 2009, Chicago Public Lecture
Increasing engagement of patients (and the healthy): From Patient-Centric to Participant Centric • Patient-centric • Supporting the individual as a participant in their own health management • Patient health portals • Data controlled with health organization • Participant-centric • Supporting the individual as a participant in their own health management through involvement/partnership in research • Study volunteer registries • Direct to Consumer genetic testing • Citizen science • Data controlled by individual, companies, collaboratively Anderson, et al Participant-Centric Initiatives: Tools to Facilitate Engagement In Research, Applied and Translational Genomics 2012
Pragmatic systems development approaches • Nonstandard lessons about standards. • Do not place a higher importance on the use of standards than on providing functionality that is directed to researchers’ immediate needs. • Make adoption of standards easier rather than harder • Focus efforts designed to facilitate data sharing on those scientific problems in which no single lab or institution can solve the problem without the collaboration and data provided by other labs and organizations • Scope and complexity • One framework will probably not rule them • Nimble software development teams solve one challenge at a time. • Expect use and impact in months, not years • Chart path to widespread uptake and use • Identify realistic incentives to use: financial is not sustainable • Collaborate with organizations that demonstrate their willingness to invest their own assets in implementation of the software. Masys et al., Designing a Public Square for Research Computing, Science Translational Medicine, 2012
Implementation is adoption • Phases of e-Science (1) • Generation 1: Isolated Adoption. Characterized by researchers using tools within their particular problem area, with some reuse of tools, data and methods within the discipline. • Generation 2: Investing in re-use. Facilitated reuse of the increasing pool of tools, data and methods across areas/disciplines. “Research Objects” created • Generation 3: Radical Sharing. Characterized by global reuse of tools, data and methods across any discipline, and surfacing the right levels of complexity for the researcher. “Long tail” research enablement. (2) • (1) De Roure, Machines, Methods and Music: On the evolution of e-Research, IEEE Xplore, 2011 • (2) Howe et al., Database as Service for long-tail science, SSDBM 2011
Transformed Research Practices • Develop review, oversight, and accountability processes (long term investments) • Cultivate ethical and effective research collaborations (in and out of the research institutions) • Configure processes for transparency and traceability (set up communication channels) • Take seriously the need to aim research toward community benefit (and think creatively about benefits) • Consider interdisciplinary team configurations to meet short-term needs, or other processes that would build capacity (consider benefits from participant perspective) • Caulfield et al. 2008; Goering et al. 2008
Acknowledgments • Collaborators • Kelly Edwards, PhD • Peter Tarczy-Hornoch, MD • Jim Brinkley, MD, PhD • Dan Masys, PhD • Isaac Kohane, MD, PhD • Peggy Porter, MD • Ed Lazowska, PhD • Phil Payne, PhD • Bill Howe, PhD • William Reiter, MD • Jesse Tenenbaum, PhD • Jerry Jarvik, MD • Funding • NIH NCATS UL1TR000423 • NIH DHHS 90HT0047/01 • University of Washington eScience Institute • Washington Life Science Discovery Fund
Thank you for your time and attention! • nicka@uw.edu • @nick_r_anderson • http://escience.washington.edu • http://www.iths.org • http://www.mebi.washington.edu