BioKnowledge Expert Community A Proposal to Expand Disease Network Reconstruction Efforts to Support Novel Target and Biomarker Discovery and Development
The scale of data generation in life science and biomedical research is beyond the reach of most researchers
Massive amounts of data are available to integrate and build models capable of predicting disease and drug response • Biology is an information-driven science • Large-scale genomic, physiological, population genetic, and imaging data sets are now generated on massive scales: • > 100,000 publicly available DNA profiles with clinical traits from genome-wide association scans • Imaging data available on 100,000s and growing rapidly • > 300,000 publicly available gene expression profiles spanning many disease areas; many of these sets are coherent and high quality • > 10,000,000 EMRs available for mining • Metabolite and proteomic data being generated more routinely • Protein interaction, DNA-protein binding, histone modification, and other molecular data being generated at unprecedented rates • Sequencing technologies on the verge of exploding, with the possibility of generating > 1TB of data per patient! • The scale of data being generated is leading to an unprecedented rate of discovery: • Hundreds of genes now identified in humans that are causal for disease • Networks demonstrated to predict disease and drug response • EMRs leveraged to flag adverse events in response to drug treatment
Embrace data, complexity, and uncertainty or continue to fail • Those who master the information will develop a superior understanding of the biology of disease and drug response • Those who fail to adapt will ultimately go extinct: • High rate of failure in phase 2 trials (chart value: 17%) • Reasons for failure: inefficacy (70%), safety (16%), other (14%) • Bottom line is that we do not understand the biology well enough to design effective treatments • Even drugs that make it out of phase 3 can present significant problems once they are on the market
The BEC will address the failure of research models in pharma and academia • The pharma research model is struggling to deliver • Poor phase 2 success rate despite massive spending • Marketed drugs are often found to cause significant harm or achieve only marginal efficacy • Complex dynamics involving market forces make it difficult to push to fully understand the biology of disease • The academic research model is struggling to impact human health • Lack of infrastructure to translate discoveries • No coherent focus • Reward system that promotes building out in silos • The BEC aims to take the best of both models and address the deficiencies of each • Primary focus on integrating and mining data to achieve better understanding of disease • Structure that rewards working together as a team • Develops the appropriate infrastructure and functional groups to master information
Common forms of human disease are the result of complex interactions between molecular networks operating within and between tissues • Figure labels: Heart Disease, Cancer, Diabetes, Obesity; Physiological Circuits; Molecular Circuits
Classic Drug Discovery • Find gene that impacts interesting phenotype: HMGCR • Design small molecule to inhibit gene activity to favorably impact first order phenotype (e.g., LDL) and subsequently second order phenotype • Diagram labels: HMGCR, Statin
Disease and drug response are complex • Networks drive common forms of human disease, and drugs hit these networks by hitting intended and unintended targets that move multiple networks within and between tissues • Figure labels: Heart Disease, Cancer, Diabetes, Obesity
Only if we master the available information in building models that predict disease and drug response will we increase the probability of success (POS) of drug development
Expanding Efforts to Master Biological Information via the Formation of the Bioknowledge Expert Colony (BEC) • Mission: Master biological information via the development of an open source bioknowledge platform to deliver models that predict complex system behavior and that, as a result, deliver value to Merck in the form of novel targets and biomarkers specific to a given subtype of disease • Two Dimensional Strategy: • Dimension one: Develop an open source platform to construct the most predictive networks • Best bioknowledge assembled and integrated to build most predictive networks • Open source platform crucial to win scientifically (hub and spoke model) • Viewing the platform as a BEC is important (starting with a group with similar vision/interests) • BEC initially aimed at enabling a few to translate to the many in need • BEC provides the coherence and resources needed to enable the visionary doers • Platform value achieved by being open source, attracting all collaborators, attracting the best minds, publishing, and creating scientific excellence, all of this driven by a coherent team able to tie (aggregate) data together in ways that enable the pharma mission • Dimension two: Deliver value to Merck via application of the platform • Design “Applications” for clients where BEC is the vendor for proprietary problems • Clients span all levels of health care: biotech, pharmaceutical, health care delivery, patients/consumers (e.g., Navigenics, 23andMe, etc.) • Knowledge experts from BEC enable clients (help colonize?) • Because BEC builds the most predictive networks, it can best inform on *all* problems where best to place bets
A bioknowledge expert colony • Commercial Application of the Bioknowledge Platform Delivers Value to Merck • Cube representing complete knowledge • BEC begins with substantial knowledge component (public + Merck) • Open Source Platform Promotes Knowledge Base Growth
Genetics strategy is to maintain openness regarding development of an information platform and then bring value to Merck via application of the platform to high interest Merck problems • Merck Pipeline App • Disease Biomarkers • Response Biomarkers • Disease Risk Models • Rules on who gets what drug • Targets • Building Interfaces to the Precompetitive Space • Partially enabled • Partially enabled • Fully enabled • Fully enabled • Fully enabled • Collaborator 3 • Collaborator 2 • Collaborator 1 • Difficult or impossible for Merck to own this
Motivation for the open source component of the BEC • Success of the integrative approaches developed and applied to date has been driven by a strong open component • Merck has supported our ability to collaborate openly and extensively with external partners to generate and understand complex data sets • This openness has in no way jeopardized Merck’s competitive advantage; it has in fact enabled it • Why a bigger effort that formalizes the open source component? • Drug discovery is rapidly becoming an information game • Mastery of biological information will define one’s ability to effectively compete in this game • We have mastered some dimensions here (genetics, integrative genomics, etc.) • We have no expertise in other key dimensions required to win here (epigenomics) • The field is moving fast, with an unprecedented rate of biological discovery • Demands increased flexibility to link with who we want, when we want, or risk becoming ineffective • Demands flexibility to bring in technologies that are critical (epigenomics to flesh out environmental contributions to disease) • To master the information, significantly more effort is needed: • Coherent group of PIs driven by a common vision • Core expertise in key areas of genetics, genomics, and epigenomics • Build and maintain a more formal bioknowledge platform • Significantly increased FTE resources in statistical genetics, network reconstruction, database architecture, software engineering, and high-end PIs capable of applying the platform for bio discovery • Computing resources ultimately need to grow to Google-scale computing
With such a strong open source component, how does the BEC deliver high value back to Merck? • Genetics effort has succeeded largely due to open, collaborative efforts • Increasing scale of integration provides clearer picture of biology of disease • Demands large-scale clinical/EMR data • Demands greater flexibility and efficiency in forming collaborations • Given rate of growth and short term pressures, difficult for Merck to know how much to invest in this precompetitive space • What Merck needs to make the most informed decisions regarding drug discovery and development: • Access to all available data • Models based on these data that actually predict disease and drug response • An understanding of the data, how to integrate them, and how to use models to drive decisions • Merck’s greatest needs will be enabled by an open source platform • Merck derives value by application of the platform to problems specific to Merck’s business • Impossible for others to effectively apply the platform in ways that would compete with Merck efforts • Application of the platform is harder than building it and requires extensive resources and scientists who understand how to apply it • Application requires integration and iteration on many levels, and that is where true value is derived
Partner-Specific IP • Partner Iteration with platform miners • Disease Biomarkers • Response Biomarkers • Disease Risk Models • Rules on who gets what drug • Targets • Value to Partners comes from application of the platform to Merck-specific problems • Bioknowledge Platform Application
Four areas are ripe for a highly integrated, network approach • The right types of data are required to make a go of this approach in a disease area: • Well characterized clinical information • Extensive, high throughput molecular profiling data • DNA variation data and lots of genotypes • Single gene and chemical perturbation data • Building a platform to inform all of biology will fail, so it initially must be focused • The structure of the platform will define the questions that can be asked of it • The structure must be driven by the problems that will be tackled • Several areas are ripe for this approach • Area considered ripe if it can deliver targets and biomarkers in first two years, and can deliver highly predictive disease and drug response models in next 5 years • These areas include: • Obesity • Diabetes • Cardiovascular Disease • Oncology
OBESITY • Between Merck and the external research community: • Greater than 100,000 genotyped individuals with obesity trait information readily available • Greater than 20 mouse crosses that inform on obesity, with multiple tissues profiled readily available • Greater than 10 human tissue-specific cohorts that inform on obesity have been assembled and profiled • We have identified and extensively validated networks in humans and mouse that are causal for this disease • Almost all data is readily available in the public domain • We could almost immediately deliver value in terms of novel targets and biomarkers for this disease • This area demands a systems level approach to succeed • Success unlikely to be achieved without this approach
DIABETES • Between Merck and the external research community: • Greater than 100,000 genotyped individuals with diabetes trait information readily available • Greater than 20 mouse crosses that inform on obesity, with multiple tissues profiled readily available • Greater than 10 human tissue-specific cohorts that inform on obesity have been assembled and profiled • We have identified and extensively validated networks in humans and mouse that are causal for this disease • Almost all data is readily available in the public domain • We could almost immediately deliver value in terms of novel targets and biomarkers for this disease
Entire networks associated with T2D and supported by publicly available GWAS • T2D GWAS genes: Hhex, Cdkn2b, Slc30a8, Igf2bp2 (hypergeometric enrichment p-value = 0.018) • Turquoise • GPR119, GPR44, GPR120, and other GPRs • Insulin (r>0.7) and LDL (r>0.5)
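The enrichment p-value quoted on this slide is presumably from a hypergeometric test: how surprising is it that a network module contains this many GWAS genes? A minimal sketch of that calculation follows. The function name and all the counts in the usage example are hypothetical, since the slide gives neither the module size nor the gene universe.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, drawn, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    population: genes in the universe (e.g. all genes profiled)
    annotated:  genes in the universe carrying the annotation (e.g. GWAS hits)
    drawn:      genes in the module being tested
    overlap:    annotated genes observed inside the module
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only (not taken from the slides):
# 20,000 genes profiled, 20 GWAS genes, a 1,000-gene module holding 4 of them.
p = hypergeom_enrichment_pvalue(20_000, 20, 1_000, 4)
print(f"enrichment p-value: {p:.4f}")
```

The sum runs over the upper tail because enrichment asks for the probability of seeing at least the observed overlap by chance.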
Similar situation for cardiovascular disease as exists for obesity and diabetes • Through integration of data, networks underlying cardiovascular traits have been identified and are being validated in humans (table below) • Models to predict cardiovascular disease will follow • All data to carry this out will exist in next two years • In Press, Nature Genetics
Cardiovascular Disease • Between Merck and the external research community: • Greater than 100,000 genotyped individuals with diabetes trait information readily available • Greater than 20 mouse crosses that inform on obesity, with multiple tissues profiled readily available • Greater than 10 human tissue-specific cohorts that inform on obesity have been assembled and profiled • We have identified and extensively validated networks in humans and mouse that are causal for this disease • Almost all data is readily available in the public domain • We could almost immediately deliver value in terms of novel targets and biomarkers for this disease
Similar data sets now exist for oncology as we tease apart evolution of networks that drive disease risk and progression • Largest AN (green) module is broken into multiple TU (red) modules • Largest TU (red) module is formed from multiple AN (green) modules
Networks that drive disease are associated with tumor DNA variation • Largest AN (green) module is broken into multiple TU (red) modules • Largest TU (red) module is formed from multiple AN (green) modules • …and many of these changes are associated with mRNA:CNV hotspots (by chromosome)
Establishing the BEC • Focusing a group and IT infrastructure to build the bioknowledge platform is a primary objective • The structure of the platform itself is driven by high-value applications of the platform to problems of high interest to Merck • New targets prioritized against all others (pharma/biotech) • Biomarkers to predict disease risk and drug response (pharma/biotech) • Models to predict disease risk, progression and AEs to drive healthcare decisions • Data models required to effectively mine large-scale data for pharmacovigilance/surveillance • To make the problem tractable in early stages it will be necessary to focus on 3 areas: • Diabetes/obesity • Cardiovascular • Oncology • The platform will be tuned to best inform these disease areas • Other areas will be represented and available for mining • Primary effort for these areas will involve: • Construction of networks predictive of traits associated with these diseases • Mining of those networks to get at targets, biomarkers and models predicting risk, etc.
Core Functional Groups Research Programs BEC Functional Groups and PIs Computational Biology Human Genetics Genomics (Epigenomics) Bio/medical Informatics Computational Biology Statistical Genetics DNA Seq. and Expr Profiling High-Performance Computing Vivarium Systems Biology
Strategy for BEC Structure • BEC driven by scientists, not administrators • Principal Investigators leading the institute • “smaller” number of core PIs that have “permanent” membership • Number of PIs small (e.g., on the order of 8) • Performance reviewed every 5 years • Build out small teams • Set strategy/vision for the BEC • Partial PIs having a more temporary membership • Best experts in the world (Altshuler, Mootha, etc.) • Secures significant access to experts you would not get otherwise • Higher turnover rate to keep flow of strong talent that can keep BEC energized • On the order of 10 to 15 carrying a 20 or 25% load • Functional group leads • Build groups to support functional areas of the institute • Developing overall approach and ability to do systems biology • Strong computation and informatics focus to integrate groups and to build the scale of database required to drive this type of effort • Less emphasis on novel technology development, but significant effort on novel applications of new technologies (e.g., we will not invent the next generation sequencer, but we may develop a methylation detection application superior to existing ones) • Leadership team • SAB, PIs
PI Role in the BEC • PIs will lead efforts to: • Design overall platform strategy and implement • Drive overall vision of the BEC • Form strategic collaborations to enable the platform: • Leverage external expertise for network reconstruction methods • Leverage external expertise for carrying out key experiments to inform disease areas • Link with larger-scale efforts in the systems space (e.g., NIH 1000 Genomes Project, etc.) when appropriate • Build highly interactive functional groups to carry out core computational and experimental research needed to build, evolve and apply platform • Develop necessary disease area expertise to optimize application of the platform in specific areas
Application of the Bioknowledge Platform • PIs again will lead efforts to apply platform in high-interest areas defined by Merck • Oversee partnerships and lead groups that will interface with the different Merck areas to provide services • Critically important that the development of the platform and its application are tightly coupled • The platform should evolve to maximize its ability to inform in the applications space • Provide novel target identification, validation and prioritization services in the 3 core disease areas • Deliverables to Merck include high-confidence, prioritized lists of targets for a given area, complete with all information supporting targets in the list • Targets in the list will have varying levels of support • Provide biomarker identification, validation, and prioritization services in the 3 core disease areas • Provide model building services in the core areas to predict risk of disease, risk of disease progression, and drug response • Move towards providing pharmaco surveillance services in focused areas • Base technology of the BEC will be the responsibility of the PIs with input from partners
Budget Assumptions • The analysis is Rigor 1 level and is not meant to be precise, but to give a feel for the incremental differences between current spend in Genetics vs. requirements to build an open source platform • To show the full incremental impact, all changes are assumed to take place on January 1 of year 1. The year 1 impact may be less than what is shown and is dependent upon the ability to build the capabilities • P&B = $350 K per person for executive level, $150 K for scientific and technical level and $80 K for support level employees • Direct Costs = Range from 30% - 60% of P&B depending on function • The analysis assumes that all growth can be accommodated in existing facilities
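The arithmetic behind these assumptions can be sketched in a few lines. The P&B rates and the 30% to 60% direct-cost range come from the slide; the function name and the headcounts in the usage example are hypothetical and are not figures from the proposal.

```python
# Per-person P&B rates from the budget assumptions (USD per year).
PB_RATES = {"executive": 350_000, "scientific": 150_000, "support": 80_000}


def annual_cost(headcount, direct_cost_pct):
    """Return (P&B, direct costs, total) in USD for one year.

    headcount:       mapping of level -> number of FTEs
    direct_cost_pct: direct costs as a whole-number percent of P&B (30 to 60)
    """
    if not 30 <= direct_cost_pct <= 60:
        raise ValueError("direct costs are assumed to be 30% to 60% of P&B")
    pb = sum(PB_RATES[level] * n for level, n in headcount.items())
    direct = pb * direct_cost_pct // 100  # integer math avoids float rounding
    return pb, direct, pb + direct


# Hypothetical team, for illustration only (not a figure from the proposal).
team = {"executive": 1, "scientific": 10, "support": 2}
for pct in (30, 60):
    pb, direct, total = annual_cost(team, pct)
    print(f"{pct}% direct costs: P&B ${pb:,}, direct ${direct:,}, total ${total:,}")
```

Running both ends of the direct-cost range gives a low and a high estimate, which suits an analysis meant to give a feel for the spend and not a precise figure.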
FTEs and Pretax Cash Requirements to Build and Maintain Platform
Incremental FTEs and Pretax Cash Requirements to Build and Maintain Platform
Considerations for funding the open source component of the BEC • The bioknowledge platform, given it is open source, should necessarily be considered pre-competitive • Critical to becoming the standard, ensuring it evolves rapidly, becoming best of breed • The platform itself would be available to all, but application of the platform is so highly specialized that few would be able to leverage it as the BEC could • The BEC provides the critical interface between the platform and Merck to realize value; without this the platform is only marginally useful to Merck discovery efforts • Merck may not always need access to the full platform in all areas, but will want to turn the platform crank when it needs to • Given all of this, perhaps it doesn’t make sense for Merck to fund the open source component of the BEC on its own • Given it is open source, Merck funds would go to enabling a resource for competitors • What Merck will care most about is maintaining and evolving the expertise required to apply the platform to drug development problems of interest
Alternative Models for the BEC (not wholly owned and contained within Merck) • Private non-profit component for open source piece, and then application piece maintained within Merck • Philanthropic donors, federal agencies, universities, etc. pay for the open source component • Merck pays for team to apply platform to problems specific to its business • Independent institute with strong connections to Merck and other partners • Open source component funded as indicated above • Partners pay for application • Private non-profit component for open source with a commercial arm for application • Open source component funded as indicated above • More general application strategy covering biotech/pharmaceutical companies, EMR companies, healthcare providers, patient/consumer