Getting Your Bits in a Row: Considerations for Transportation Data Curation [& Curators]. Leighton Christiansen, Iowa DOT Library, September 25, 2013
Overview • What is Data Curation • The February 22 OSTP Memo & You • Librarians & Data Curation • Data: Sets and Sizes • Reusing data • Fragility & Preventing Loss • Librarians & Data Curation, Part 2
What is Data Curation? • Data curation is the active and ongoing management of (research) data through its lifecycle of interest and usefulness to scholarship, science, and education. • Data curation enables data discovery and retrieval, maintains data quality, adds value, and provides for re-use over time through activities including authentication, archiving, management, preservation, and representation. http://www.lis.illinois.edu/academics/programs/specializations/data_curation
What is Digital Curation? From the Digital Curation Centre (UK) • Digital curation involves maintaining, preserving and adding value to digital research data throughout its lifecycle. • The active management of research data reduces threats to their long-term research value and mitigates the risk of digital obsolescence. Meanwhile, curated data in trusted digital repositories may be shared among the wider UK research community. • As well as reducing duplication of effort in research data creation, curation enhances the long-term value of existing data by making it available for further high quality research. http://www.dcc.ac.uk/digital-curation/what-digital-curation
OK, Why Should I Care? Office of Science and Technology Policy memo, February 22, 2013: “Increasing Access to the Results of Federally Funded Scientific Research” “…the direct results of federally funded scientific research are made available to and useful for the public, industry, and the scientific community. Such results include peer-reviewed publications and digital data.” http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
More from the Memo Objectives for Public Access to Scientific Publications “…the results of unclassified research that are published in peer-reviewed publications directly arising from Federal funding should be stored for long-term preservation and publicly accessible to search, retrieve, and analyze in ways that maximize the impact and accountability of the Federal research investment.”
More from the Memo: Objectives for Public Access to Scientific Data in Digital Formats • “…digitally formatted scientific data resulting from unclassified research supported wholly or in part by Federal funding should be stored and publicly accessible to search, retrieve, and analyze. … the digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications…”
Requirements • b) Ensure that all extramural researchers receiving Federal grants and contracts for scientific research and intramural researchers develop data management plans, as appropriate, describing how they will provide for long-term preservation of, and access to, scientific data in digital formats resulting from federally funded research, • c) Allow the inclusion of appropriate costs for data management and access in proposals for Federal funding for scientific research; • e) Include mechanisms to ensure that intramural and extramural researchers comply with data management plans and policies; • f) Promote the deposit of data in publicly accessible databases, where appropriate and available; • i) In coordination with other agencies and the private sector, support training, education, and workforce development related to scientific data management, analysis, storage, preservation, and stewardship; and • j) Provide for the assessment of long-term needs for the preservation of scientific data in fields that the agency supports and outline options for developing and sustaining repositories for scientific data in digital formats, taking into account the efforts of public and private sector entities.
Librarians and Data Curation • Data Curation is: • the application of library and information science (LIS) and archival theory and tools to maximize the current and future usefulness of research data • value-added services for collection development, representation and linking, discovery, access and re-use • a theoretical perspective that partners LIS and IT professionals with scientists and scholars in their research process Allen Renear, 2009
Librarians and Data Curation • Curation is needed due to: • The creation of huge digital data sets; • The desire to use and reuse digital data sets in future research; • Government mandate for federally funded projects to manage data for reuse and to avoid duplication of basic research; • Fragility of digital data.
The 100-Car Naturalistic Driving Study • Reported in 2006 • Approximately 2,000,000 vehicle miles • Almost 43,000 hours of data • 241 primary and secondary drivers participated • 12- to 13-month data collection period for each vehicle • Five channels of video and many vehicle state and kinematic variables • Vicki L. Neale, Thomas A. Dingus, Sheila G. Klauer, and Jeremy Sudweeks (Virginia Tech Transportation Institute) • Michael Goodman (National Highway Traffic Safety Administration) http://www.nhtsa.gov/DOT/NHTSA/NRD/Multimedia/PDFs/Crash%20Avoidance/Driver%20Distraction/100Car_ESV05summary.pdf
Data Size: Video • 1 GB = 1,000,000,000 bytes • Uncompressed high-definition video ≈ 7 gigabytes per minute • 1 hour = 60 × 7 GB = 420 gigabytes • 100 hours = 42,000 gigabytes = 42 terabytes, or 42,000,000,000,000 bytes • 43,000 hours = 18,060,000 gigabytes = 18,060 terabytes ≈ 18 petabytes
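The storage arithmetic above can be reproduced with a short Python sketch. It assumes, as the slide does, decimal (SI) units and one uncompressed HD stream at roughly 7 GB per minute. The function name `uncompressed_video_bytes` and its `channels` parameter are hypothetical, added only to show how the estimate scales if each camera channel were stored separately.

```python
# Decimal (SI) units, matching the slide: 1 GB = 10**9 bytes.
GB = 10**9
TB = 10**12
PB = 10**15


def uncompressed_video_bytes(hours, gb_per_minute=7, channels=1):
    """Estimate raw storage for uncompressed video (hypothetical helper).

    gb_per_minute: assumed data rate per stream (the slide's ~7 GB/min for HD).
    channels: number of simultaneous camera streams; the slide's totals
    correspond to channels=1.
    """
    minutes = hours * 60
    return minutes * gb_per_minute * channels * GB


for hours in (1, 100, 43_000):
    size = uncompressed_video_bytes(hours)
    # 43,000 h -> 18,060,000 GB, about 18 PB, as on the slide
    print(f"{hours:>6} h: {size / GB:>12,.0f} GB = {size / TB:>10,.2f} TB = {size / PB:.2f} PB")
```

Passing `channels=5` multiplies every total by five. Because real study video is normally compressed, these figures are a rough illustration of scale, not a measurement of the study's actual storage.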
Why Reuse Data? • Information captured but not yet studied can yield new knowledge • New tools can be applied to existing data • Avoid duplicated effort and wasted resources
Fragile Digital Data • Software Obsolescence: Versioning; Proprietary software; Program discontinuation; Operating system extinction; Buyouts and Bankruptcies • Hardware Obsolescence: Format Changes; Carrier Degradation
Preventing Data Loss • Hardware and Software Preservation • Software Emulation • File Migration: on a schedule, or when moving to a new OS, a new format, or a new carrier • Archival Formats (http://www.digitalpreservation.gov/formats/) • Disaster Planning
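File migration only protects data if each copy is verified before the old carrier is retired. The Python sketch below is a minimal illustration, assuming SHA-256 checksums and a destination directory on the new carrier: it copies one file and compares digests before and after. The helper names `sha256_of` and `migrate_file` are hypothetical and not part of any preservation tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks so large
    files (e.g. video) do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(src: Path, dest_dir: Path) -> Path:
    """Copy src onto a new carrier (dest_dir) and confirm the copy is bit-identical.

    Hypothetical helper: if the checksums differ, the bad copy is deleted and
    RuntimeError is raised, so the file is never treated as safely migrated.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    before = sha256_of(src)
    # copy2 also carries over timestamps and permission bits where possible
    dest = Path(shutil.copy2(src, dest_dir / src.name))
    after = sha256_of(dest)
    if before != after:
        dest.unlink()
        raise RuntimeError(f"Checksum mismatch migrating {src}: {before} != {after}")
    return dest
```

In use, a scheduled migration job would call `migrate_file` for each file and delete the original only after every call has returned without error.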
Data Curation: What’s New for Librarians? • engaging with scientists and scholars during research production cycles • supporting data handling and management • facilitating preservation • facilitating data deposition in local or disciplinary repositories and larger federations • forming new collaborations with various offices: campus IT, research officers, archives. Allen Renear, 2009
Data Curators’ Activities • enable data discovery and retrieval • maintain data quality • add value • provide for re-use over time through archiving and preservation. Tasks • appraisal and selection • representation • authentication • data integrity • maintaining links • format conversions. Allen Renear, 2009
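For the data integrity task, one common approach is a fixity manifest: record a checksum for every file once, then re-check the collection periodically. The Python sketch below assumes SHA-256 digests and keeps the manifest as an in-memory dictionary; `build_manifest` and `audit` are hypothetical names. A real repository would store the manifest apart from the data (for example as a JSON file on another carrier) so that one failure cannot damage both.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of one file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current contents of root against a previously stored manifest."""
    current = build_manifest(root)
    shared = manifest.keys() & current.keys()
    return {
        "missing": sorted(set(manifest) - set(current)),  # lost since the manifest was made
        "added": sorted(set(current) - set(manifest)),    # not yet curated
        "changed": sorted(k for k in shared if manifest[k] != current[k]),  # altered bits
    }
```

An empty result in all three lists means the collection matches the recorded state; any entry under "changed" or "missing" is a signal to restore from another copy.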
Where do we find these librarians/data curators? • Indiana Univ. • Pratt Institute • Syracuse Univ. • UCLA • Univ. of Illinois at Urbana-Champaign • Univ. of Michigan • Univ. of North Carolina, Chapel Hill • Univ. of Arizona • Univ. of North Texas • Univ. of Tennessee • Univ. of Texas at Austin • Univ. of Toronto • Univ. of Wisconsin at Madison
What Can We Do? • Read NCHRP 754 • Educate Ourselves on Data Curation • Contact Library Schools • Contact Data Centers and practicing curators • Talk to our Researchers • Prepare for TRB 2014 • Prepare for DCC 2014 • Prepare for TRB 2015
Questions? Thank you
Resources 1
Cambridge Systematics, Inc. Improving Information Management in a Transportation Agency. National Cooperative Highway Research Program. Washington, DC: TRB, September 19, 2013. http://www.trb.org/Publications/Blurbs/169522.aspx.
Cragin, Melissa. “Growing the Curation Community in LIS: Data Curation Education Program & the Data Conservancy.” Purdue University, June 21, 2010. http://dataconservancy.org/wp-content/uploads/2012/02/Cragin_IATUL2010Conf.pdf.
Creamer, Andrew T., and Myrna E. Morales. “A Sample of Research Data Curation and Management Courses.” Journal of eScience Librarianship, October 3, 2012. http://escholarship.umassmed.edu/jeslib/vol1/iss2/4/.
Gold, Anna. “Cyberinfrastructure, Data, and Libraries, Part 1: A Cyberinfrastructure Primer for Librarians.” D-Lib Magazine, September/October 2007. http://www.dlib.org/dlib/september07/gold/09gold-pt1.html.
Gold, Anna. “Cyberinfrastructure, Data, and Libraries, Part 2: Libraries and the Data Challenge: Roles and Actions for Libraries.” D-Lib Magazine, September/October 2007. http://www.dlib.org/dlib/september07/gold/09gold-pt2.html.
Holdren, John P. “Increasing Access to the Results of Federally Funded Scientific Research,” February 22, 2013. http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf.
Howard, Jennifer. “Publishers Propose Public-Private Partnership to Support Access to Research.” The Chronicle of Higher Education, June 4, 2013. http://chronicle.com/blogs/wiredcampus/publishers-propose-public-private-partnership-to-support-access-to-research/44005.
Howard, Jennifer. “White House Delivers New Open-Access Policy That Has Activists Cheering.” The Chronicle of Higher Education, February 22, 2013. http://chronicle.com/article/White-House-Delivers-New/137549/.
Keralis, Spencer D. C. “Data Curation Education: A Snapshot.” Council on Library and Information Resources. Accessed September 25, 2013. http://www.clir.org/pubs/reports/pub154/education.
Resources 2
Klauer, Charlie. “100-Car Naturalistic Driving Study.” Virginia Tech Transportation Institute. Accessed September 25, 2013. http://www.vtti.vt.edu/research/vrus/projects/100car/100car.html.
Library of Congress. “Sustainability of Digital Formats: Planning for Library of Congress Collections,” July 24, 2013. http://www.digitalpreservation.gov/formats/.
Lord, Philip, Alison Macdonald, Liz Lyon, and David Giaretta. “From Data Deluge to Data Curation.” University of Edinburgh UK e-Science All Hands Meeting 2004: Proceedings (2004): 5. http://www.allhands.org.uk/2004/proceedings/papers/150.pdf.
“100-Car Study, Final Approved, V03.” Accessed September 25, 2013. http://www.fmcsa.dot.gov/facts-research/research-technology/report/100-car-naturalistic-study/100-car-naturalistic-study.pdf.
Neale, Vicki L., et al. 100-Car Overview, 2005. http://www.nhtsa.gov/DOT/NHTSA/NRD/Multimedia/PDFs/Crash%20Avoidance/Driver%20Distraction/100Car_ESV05summary.pdf.
NHTSA. “100-Car Naturalistic Driving Study.” National Highway Traffic Safety Administration, April 21, 2006. http://www.nhtsa.gov/About+NHTSA/Press+Releases/2006/100-Car+Naturalistic+Driving+Study.
Schmelzer, Ranit. “SPARC Applauds White House for Landmark Directive Opening up Access to Scientific Research.” SPARC, February 22, 2013. http://www.sparc.arl.org/news/sparc-applauds-white-house-landmark-directive-opening-access-scientific-research.
Silk, Kimberly. “Data 101: A Gentle Introduction,” August 14, 2013. http://www.slideshare.net/ksilk/sla-25253256.
“Specialization in Data Curation.” Graduate School of Library and Information Science. Accessed September 23, 2013. http://www.lis.illinois.edu/academics/programs/specializations/data_curation.
“What Is Digital Curation?” Digital Curation Centre. Accessed September 25, 2013. http://www.dcc.ac.uk/digital-curation/what-digital-curation.