
Demystifying Data & Scholarly Communication

Presented by Aaron Collie and Hailey Mooney, in collaboration with Lucas Mak. Tuesday, April 26, 2011.


Presentation Transcript


  1. Demystifying Data & Scholarly Communication • Presented by Aaron Collie and Hailey Mooney, in collaboration with Lucas Mak • Tuesday, April 26, 2011

  2. Science Paradigms • Thousand years ago: science was empirical, describing natural phenomena • Last few hundred years: a theoretical branch, using models and generalizations • Last few decades: a computational branch, simulating complex phenomena • Today: data exploration (eScience), unifying theory, experiment, and simulation • Data captured by instruments or generated by simulator • Processed by software • Information/knowledge stored in computer • Scientist analyzes database/files using data management and statistics • Slide credit: Gray, J. & Szalay, A. (11 January 2007). eScience Talk at NRC-CSTB meeting. http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt

  3. Scholarly communication lifecycle model from Western Libraries: http://www.lib.uwo.ca/scholarship/scholarlycommunication.html

  4. What is/are data? • Definition • Examples

  5. Research Data • “A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing. Examples of data include a sequence of bits, a table of numbers, the characters on a page, the recording of sounds made by a person speaking, or a moon rock specimen.” Consultative Committee for Space Data Systems. 2002. Reference Model for an Open Archival Information System (OAIS). Washington, DC: National Aeronautics and Space Administration, p. 1-9. Available at http://public.ccsds.org/publications/archive/650x0b1.pdf • “[I]nformation used in scientific, engineering, and medical research as inputs to generate research conclusions. This usage encompasses a wide variety of information. It includes textual information, numeric information, instrumental readouts, equations, statistics, images (whether fixed or moving), diagrams, and audio recordings. It includes raw data, processed data, published data, and archived data. It includes the data generated by experiments, by models and simulations, and by observations of natural phenomena at specific times and locations. It includes data gathered specifically for research as well as information gathered for other purposes that is then used in research.” National Academy of Sciences (U.S.), National Academy of Engineering, and Institute of Medicine (U.S.). (2009). Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age. Washington, D.C.: National Academies Press.

  6. NSF Data Collection Categories • Research data collections: products of one or more focused research projects; novel data types; small user community; may not conform to standards (file formats, metadata); often no intention for preservation; small budgets • Resource or community data collections: serve a single science community; often establish community-level standards; intermediate budgets; unclear if sustained support for preservation will be maintained • Reference data collections: serve large segments of the scientific community; broad scope and diverse set of user communities; conform to or create robust universal standards; large budgets; long-term support for preservation • National Science Board (U.S.) & National Science Foundation (U.S.). (2005). Long-lived digital data collections: Enabling research and education in the 21st century. Washington, D.C.: National Science Foundation. http://www.nsf.gov/pubs/2005/nsb0540/

  7. Expanding Roles

  8. You ARE managing your data... RIGHT? • Good Science • “Sponsors of university research, federal and state oversight agencies, or journals and other colleagues in the field may need or be legally entitled to review primary research data well after publication or dissemination of results.” http://rio.msu.edu/research_data.htm • “Research Data Management” • Same tune, new requirements.

  9. Mandating Data Management

  10. NSF DMP Requirements • The types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project; • The standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies); • Policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements; • Policies and provisions for re-use, re-distribution, and the production of derivatives; and • Plans for archiving data, samples, and other research products, and for preservation of access to them. • More Info: http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_2.jsp#dmp
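The five elements on this slide can be treated as a simple checklist when drafting a plan. The following is a minimal sketch, not an NSF tool: the section keys (`products`, `standards`, and so on) and the `missing_sections` helper are hypothetical names chosen for illustration, and the descriptions are shortened from the slide text.

```python
# Hypothetical section keys for illustration; NSF does not prescribe these identifiers.
NSF_DMP_SECTIONS = {
    "products": "Types of data, samples, physical collections, software, "
                "curriculum materials, and other materials to be produced",
    "standards": "Standards for data and metadata format and content",
    "access_and_sharing": "Policies for access and sharing, including privacy, "
                          "confidentiality, security, and intellectual property",
    "reuse": "Policies and provisions for re-use, re-distribution, "
             "and the production of derivatives",
    "archiving": "Plans for archiving and for preservation of access",
}


def missing_sections(draft: dict) -> list:
    """Return the checklist keys that the draft plan leaves absent or blank."""
    return [
        key for key in NSF_DMP_SECTIONS
        if not str(draft.get(key, "")).strip()
    ]


if __name__ == "__main__":
    draft_plan = {
        "products": "Lake chemistry measurements and GIS layers",
        "standards": "Ecological Metadata Language (EML)",
    }
    for key in missing_sections(draft_plan):
        print(f"Still to write: {key} ({NSF_DMP_SECTIONS[key]})")
```

A librarian consulting on a plan could run a draft through a checklist like this to see at a glance which elements still need text; the draft values shown are invented examples.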

  11. Why should researchers care? • Secure funding specifically for research data management • Improve and standardize data management practice and policy in your lab • Improve the impact and visibility of your research • Facilitate collaboration, increase research efficiency, and make new discoveries • Assure a greater return on investment by adopting a value chain model (one analogy is the value added by journal publishers)

  12. What makes data management possible? • Policy • Carrots: Data Citations, Data Papers • Sticks: Grant Funding • Standards… a whole lot of them • Infrastructure • Short-term storage • Long-term storage • Humans • Use Cases for expanded roles for Librarians (aka Services!) • “Reference Specialists” → Reference, Evaluating, and Promoting • “Subject Specialists” → Subject Expertise, Liaising, and Awareness • “Technology Specialists” → Training, Advice, and Support

  13. Policies and Standards for MSU • Research Data Guidelines (VPRGS) • http://www.vprgs.msu.edu/dataguidelines • Data management for research: Preparing for new requirements (VPRGS) • http://www.vprgs.msu.edu/node/1439 • Research Data: Management, Control, and Access (RIO) • http://rio.msu.edu/research_data.htm • http://www.lib.msu.edu/about/diginfo/ldmp.jsp

  14. Infrastructure at MSU Adapted from: http://www.vprgs.msu.edu/files_vprgs/Data%20Management%20for%20Research.pdf

  15. Use Cases from MSU Libraries • Ecology / E.S. Professor • Present: P.I., Digital Curation Librarian, Metadata Librarian, A.D. for Digital Information • Situation: Researcher requested help writing Data Management Plan; collaborative project • Asked: Integrating disparate landscape limnology data • Advised: Retention, access/sharing, embargoes, metadata standards, disciplinary repositories, archival format • History Professor • Present: P.I., Digital Curation Librarian, Metadata Librarian, Bibliographer, Web Services, A.D. for Digital Information • Situation: Emeritus faculty exploring options for capstone project • Asked: Enhancing web presence, upgrading flat database, converting extensive code book to metadata schema • Advised: Databases, metadata schema, metadata mapping, complex queries and relational data, file formats

  16. Core Service Image From: http://www.admin.ox.ac.uk/rdm/

  17. OpenContext.org • Data Publisher (for archeology data)! • Forming an editorial board • Vetted data • Editorial process cleans data • Sends clean and bundled data off to the CDL for long-term preservation • Distributes the processes which support data management across the publication lifecycle

  18. The value chain • Registration, which allows claims of precedence for a scholarly finding. • Certification, which establishes the validity of a registered scholarly claim. • Awareness, which allows actors in the scholarly system to remain aware of new claims and findings. • Archiving, which preserves the scholarly record over time. • Rewarding, which rewards actors for their performance in the communication system based on metrics derived from that system. • Roosendaal and Geurts, 1997

  19. What roles do you see for libraries and librarians to support data management, sharing and publication?

  20. What have you heard from faculty about data management, sharing and publication? What are the norms for data management, sharing and publication in your disciplines?

  21. Data Sharing Cultures Research Information Network. (2008). To Share or not to share: Publication and quality assurance of research data outputs. Research Information Network, June 2008. as cited in Griffiths, A. (2009). The publication of research data: Researcher attitudes and behaviour. The International Journal of Digital Curation, 1(4). http://www.ijdc.net/index.php/ijdc/article/viewFile/101/76

  22. The American Journal of Human Genetics: Guide for Authors • Distribution of Materials and Data • An implicit term and condition of publishing in AJHG is that authors be willing to distribute any materials and protocols used in the published experiments to qualified researchers for their own use. Materials include but are not limited to cells, DNA, antibodies, reagents, organisms, and mouse strains, or if necessary the relevant ES cells. These materials must be made available with minimal restrictions and in a timely manner, but it is acceptable to request reasonable payment to cover the cost of provision and transport of materials. If there are restrictions to the availability of any materials, data, or information, these must be disclosed in the cover letter and the Material and Methods section of the manuscript at the time of submission. Nucleic acid and protein sequences, single-nucleotide polymorphisms (SNPs), copy number variants (CNVs), microarray data, and macromolecular structures determined by X-ray crystallography (along with structure factors) must be deposited in the appropriate public database and must be accessible without restriction from the date of publication. The URL of the databases used must be included in the Web Resources section of the manuscript. All entry names and/or accession numbers must be included in the Material and Methods section. Microarray data should be MIAME compliant (for guidelines see http://www.mged.org/Workgroups/MIAME/miame.html). Although AJHG does not require authors to deposit genotype data to a public database, we do encourage this practice. We do ask that authors include genotype data in their supplemental materials or that a website is provided at which readers would be able to gain access to such data. If such data presentations are not possible, we ask that AJHG authors accommodate legitimate requests for population-genetics data provided that there are no IRB restrictions.
Newly described SNPs should be submitted to an appropriate database such as dbSNP (http://www.ncbi.nlm.nih.gov/SNP/) prior to submission of revised manuscripts. The identification numbers should be used to describe the SNPs in the manuscript. All copy number variants (CNVs) identified in control samples should be submitted to one of two public data archives, the Database of Genomic Variants Archive (DGVa; http://www.ebi.ac.uk/dgva/page.php?page=data_submission) or the Database of Genomic Structural Variation (dbVAR; http://www.ncbi.nlm.nih.gov/dbvar/content/submission/), prior to submission of revised manuscripts. The associated identification numbers should be used to describe the CNVs in the manuscript. Please provide a figure or table that summarizes the full results of your genome-wide scan. In addition to the information that must be deposited in public databases as detailed above, authors are encouraged to contribute additional information to the appropriate databases. Authors are also encouraged to deposit materials used in their studies to the appropriate repositories for distribution to researchers. http://download.cell.com/images/edimages/AJHG/AJHG_Information_for_Authors.pdf

  23. Demography: Author Instructions • Authors of accepted manuscripts will be asked to preserve the data used in their analysis and to make the data available to others at reasonable cost from a date six months after the publication date for the paper and for a period of three years thereafter. Authors wishing to request an exemption from this requirement (e.g., because the analysis is based on a proprietary data set) should notify the editors at the time of manuscript submission or after receiving this notice; otherwise, authors will be assumed to accept the requirement. http://www.populationassociation.org/publications/demography/
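The timing in this policy is easy to misread, so here is a small worked sketch of the date arithmetic. It assumes one reading of the text: the availability window opens six months after the publication date and stays open for three years from that point. The function names `add_months` and `availability_window` are illustrative and not part of any journal system.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def availability_window(published: date) -> tuple:
    """Return (start, end) of the data-availability period under the assumed reading."""
    start = add_months(published, 6)   # six months after publication
    end = add_months(start, 36)        # three years after the window opens
    return start, end


if __name__ == "__main__":
    start, end = availability_window(date(2011, 4, 26))
    print(f"Data available from {start} through {end}")
```

For an author, the practical point is that the preservation obligation runs for roughly three and a half years after publication, which is worth writing into a data management plan.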

  24. Data Sharing • What are reasons for publishing and sharing research data? • What are reasons for NOT publishing and sharing research data?

  25. What needs to change in order for data management, sharing and publication to become a common practice? Image courtesy of http://biomat2010-8.wikispaces.com, licensed under a Creative Commons Attribution Share-Alike 3.0 License.
