Genomics:GTL Information and Data Sharing Policy Susan K. Gregurick U.S. Department of Energy Office of Science Office of Biological and Environmental Research BERAC Briefing May 19th, 2008 Susan.Gregurick@science.doe.gov Genomicsgtl.energy.gov
“As long as the attention to data policies and data management by funding agencies does not catch up with the rapidly changing research environment, we will continue the systemic loss and underutilization of valuable data derived from public investments.” (excerpted from Uhlir and Schröder, Data Science Journal Vol. 6, 2007).
UPSIDE: Uniform Principle for Sharing Integral Data and Materials Expeditiously
• Community standards for sharing publication-related data and materials state that an author is obligated not only to release the data and materials but also to provide them in a format that allows other scientists to build on them in future research.
Source: Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences. National Research Council, 2003.
Principles of data sharing: A community-led approach
• General:
• Information and data arising from public research investment should be publicly available
• Scientific:
• Data and information sharing is essential for the highly focused Genomics:GTL research program
• New technologies produce increasingly large amounts of research data
• Ownership of data generated through GTL-sponsored research lies with researchers and institutions, but the data need to be shared across the program
• Our role is to provide guidance and mechanisms to facilitate and support data and information sharing within the GTL program
A checklist for developing the GTL data policy
• Identify science driver(s) necessitating a formal policy
• Create a working group to bring the policy to fruition
• Poll GTL researchers with respect to data policy needs and developments
• Research current policies and data sharing opinions/practices from the literature
• Draft a strawman document and define key aspects of the policy:
  • Scope of policy (data types covered), applicability (which data fall under the policy), rules of data sharing, compliance with standards, submission to appropriate repositories
  • Expectation for compliance
  • Consequences of non-compliance
• Subject the strawman draft policy to internal and then external round(s) of consultation, followed by iterative improvements
• Post the final draft on a public website and publicize it at the GTL Awardees Meeting
• Set into motion support for the policy
  • Could include: creation/extension of data centers, physical archives, facilities, institutions, ring-fenced funds for competitive award programs, education, outreach
• Monitor compliance and enforce the policy
• Extend the policy to cover sub-areas of science/data as required
• Revise the policy and any implementation as required
GTL Data and Information Sharing Policy • The Office of Biological and Environmental Research (OBER) will require that all publishable information resulting from GTL funded research must conform to community recognized standard formats when they exist, be clearly attributable, and be deposited within a community recognized public database(s) appropriate for the research conducted. Furthermore, all experimental data obtained as a result of GTL funded research must be kept in an archive maintained by the Principal Investigator (PI) for the duration of the funded project. Any publications resulting from the use of shared experimental data must accurately acknowledge the original source or provider of the attributable data. The publication of information resulting from GTL funded research must be consistent with the Intellectual Property provisions of the contract under which the publishable information was produced.
Details of Data and Information Sharing Policy
Effective October 1, 2008
All investigators are expected to submit their publication-related information to a national or international public repository, when one exists, according to the repository’s established standards for content and timeliness, but no later than 3 months after publication. This includes:
• Experimental protocols
• Raw and/or processed data, as required by the repository
• Other relevant supporting materials
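The 3-month limit can be checked mechanically. The following is a minimal sketch, not part of the policy: it assumes "3 months" means three calendar months after the publication date, with the day clamped to the end of shorter months. The function names `add_months`, `deposit_deadline`, and `is_on_time` are hypothetical, and a repository's own timeliness standard may be stricter than this outer limit.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def deposit_deadline(publication_date: date) -> date:
    """Latest deposit date, assuming '3 months' means calendar months."""
    return add_months(publication_date, 3)


def is_on_time(publication_date: date, deposit_date: date) -> bool:
    """True if the deposit happened on or before the assumed deadline."""
    return deposit_date <= deposit_deadline(publication_date)


if __name__ == "__main__":
    published = date(2008, 11, 30)
    print(deposit_deadline(published))              # 2009-02-28
    print(is_on_time(published, date(2009, 3, 1)))  # False
```

Clamping to the month end is a design choice for dates such as November 30, where the same day does not exist three months later.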
Protection of Intellectual Property • For cases where information sharing standards or databases do not yet exist, the information sharing and data archiving plan provided by a project’s PI must state these limitations. • Data and information that are necessary elements of protected intellectual property and related to a pending or future patent application are explicitly exempt from public access until completion of the patenting process.
Nationally and Internationally Accepted Databases and Ontologies
• Sequence Data and Information:
  • Deposit and report accession number
  • GenBank/EMBL, UniProtKB/Swiss-Prot Protein Knowledgebase
• Three-Dimensional Structures:
  • Deposit and record accession code
  • PDB, NAD
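Before an accession number or code is reported, a quick format check can catch transcription errors. The sketch below is illustrative only and rests on stated assumptions: the GenBank pattern covers the common nucleotide accession shapes (one letter plus five digits, or two letters plus six or eight digits, with an optional version suffix), the UniProtKB pattern follows the accession format UniProt documents, and the PDB pattern is the classic four-character ID. The names `ACCESSION_PATTERNS` and `looks_like_accession` are hypothetical. A matching string is not proof that the record exists in the repository.

```python
import re

# Format-only patterns (assumptions stated in the text above).
ACCESSION_PATTERNS = {
    "genbank": re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?"),
    "uniprot": re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
    ),
    "pdb": re.compile(r"[1-9][A-Z0-9]{3}"),
}


def looks_like_accession(repository: str, accession: str) -> bool:
    """Return True if the string has the expected shape for the repository."""
    pattern = ACCESSION_PATTERNS[repository]
    # Normalize case and surrounding whitespace before matching the whole string.
    return pattern.fullmatch(accession.strip().upper()) is not None


if __name__ == "__main__":
    print(looks_like_accession("genbank", "U49845"))  # True
    print(looks_like_accession("pdb", "1tup"))        # True
    print(looks_like_accession("uniprot", "12345"))   # False
```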
Microarray and Gene Expression Data
• Microarray and Gene Expression Data (MGED) Society: focuses on establishing standards for microarray and other functional genomics data, including data quality, management, annotation, and exchange.
• MIAME describes the Minimum Information About a Microarray Experiment that is needed to interpret the results of the experiment unambiguously.
• A number of high-impact journals require MIAME-compliant data as a condition for publishing microarray-based papers (Nature, Science, PLoS, …).
• GTL Microarray and Gene Expression Data (recommended):
  • Deposit in MIAME-compliant format
  • Gene Expression Omnibus, ArrayExpress, Stanford Microarray Database
Proteomics and Molecular Interaction Experiments
• Proteomics Standards Initiative (PSI), a working group of the Human Proteome Organization (HUPO): defines community standards for proteomics data to facilitate data comparison, exchange, and verification.
• Minimum information about a proteomics experiment (MIAPE) and minimum information required for reporting a molecular interaction experiment (MIMIx)
• A number of databases now accept PSI Molecular Interaction standards (BIND, DIP, HPRD, Hybrigenics, IntAct, MINT, and MIPS).
• GTL Proteomics Data (recommended):
  • Deposit in MIAPE- and MIMIx-compliant format
  • Open Proteomics Database (OPD), PRIDE, and PEDRo (Proteome Experimental Data Repository)
Information Sharing Systems and Databases Under Development • Other Technologies (recommended): • In cases where there are no public repositories or community driven standard ontologies, data and information should be made publicly available by the PI
Protection of Human Subjects • Research using human subjects provides important scientific benefits but these benefits never outweigh the need to protect individual rights and interests. OBER will require that grantees and contractors follow the DOE principles and regulations for the protection of human subjects involved in DOE research. Minimally this will require an IRB review. These principles are stated clearly in the Policy and Order documents: DOE P 443.1A and DOE O 443.1A, which are available online at www.directives.doe.gov.
Computational Software • The International Society for Computational Biology (ISCB) recommends that funding agencies follow ISCB guidelines for open-source software at a “Level 0” availability. • ISCB states that research software will be made available free of charge, in binary form, on an “as is” basis for non-commercial use and without providing software users the right to redistribute. • OBER will follow ISCB recommendations at a Level 0 availability. • Research software (binary) is to be made accessible through either an open source license (www.opensource.org) or deposited to an open source software community such as SourceForge.
Laboratory Information Management Systems (LIMS) for Data Management and Archiving • Research projects that involve more than one senior investigator will be required to implement a LIMS or a similar type of system for data and information archiving and retrieval across the entire project. • The LIMS plan should balance the clear value of data availability and sharing within the project against the cost and effort of archive construction and maintenance.
Summary
• Data and information should conform to existing community-recognized standard formats wherever possible, be clearly attributable, and be deposited, in a timely manner, within a community-recognized public database(s) appropriate for the research conducted.
• OBER is committed to encouraging development of public repositories and standard ontologies for the GTL research community.
• OBER recognizes that this policy will need to be updated to incorporate new standards, data types, and other advances.
• This information and data-sharing policy and related materials can be found at genomicsgtl.energy.gov/datasharing.
GTL Knowledgebase Workshop
DOE-OBER workshop: GTL Knowledgebase for Systems Biology
Washington DC, May 28-30, 2008
Workshop Purpose
• Identify research needs and opportunities for a Systems Biology Knowledgebase to capitalize on GTL research investments.
• Provide an assessment of where the science and technology now stands and where barriers to progress might exist.
• Describe the directions for fundamental research that can be pursued to meet these goals:
  • Data and information acquisition and curation
  • Organization of information driven by scientific inquiry
  • Infrastructure and technology
Data and Information Sharing Policy Working Group
• Chair: J. Fredrickson
• Members:
  • A. P. Arkin
  • K. Andrews-Cramer
  • E. Uberbacher
  • H. Berman
  • D. Platt
  • N. Baliga
  • S. Kravitz
  • B. Davison
  • S. Salzberg
  • G. Anderson
  • J. Stanford
  • T. Critchlow
  • P. Karp
  • JBEI, GLBRC & BESC
  • D. Schmoyer