Scientific Data: A View from the US • George O. Strawn • nitrd.gov
Caveat auditor • The opinions expressed in this talk are those of the speaker, not the U.S. government
Three faces of data • Big Data research initiative • Open access becomes the default for U.S. government data • Public access mandated for "scientific results" supported by the U.S. government
Big Data • White House, multi-agency research initiative • Basic research, Disciplinary data, Education and training, Prizes and competitions • Joint solicitation by NIH and NSF • NIH: BD2K program, Associate director for Data Science
Data.gov • Open access to U.S. government data • "Voluntary" data.gov participation has yielded ~100,000 data sets to date • A new version of data.gov to be unveiled soon utilizes CKAN, "an open source data portal"
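As a rough illustration of what a CKAN-based portal offers machine clients, the sketch below builds a search request against CKAN's Action API (`package_search`) and pulls dataset titles out of a response. The base URL is an assumption about where the data.gov catalog exposes that API, the helper names are hypothetical, and the sample response is hand-written in the shape CKAN returns rather than real data.gov output.

```python
import json
from urllib.parse import urlencode

# Assumption: the catalog exposes the standard CKAN Action API (v3) here.
BASE_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=5, base_url=BASE_URL):
    """Return a CKAN package_search URL for a free-text query."""
    return base_url + "?" + urlencode({"q": query, "rows": rows})


def dataset_titles(response_text):
    """Extract dataset titles from a CKAN package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request")
    return [pkg["title"] for pkg in payload["result"]["results"]]


# Hand-written sample in the shape CKAN returns; not real data.gov output.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "example-a", "title": "Example dataset A"},
            {"name": "example-b", "title": "Example dataset B"},
        ],
    },
})

if __name__ == "__main__":
    print(build_search_url("climate"))
    print(dataset_titles(SAMPLE_RESPONSE))
```

The network call itself is left out on purpose: fetching the built URL with any HTTP client and passing the body to `dataset_titles` would complete the round trip.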
Public Access to Scientific Results • Both journal articles and data • Public access to journal articles pioneered by Harold Varmus at NIH • Semantic access to Medline abstracts pioneered by Tom Rindflesch at NLM
Public Access to Scientific Data • Federal agencies have submitted their "initial plans" for public access to scientific data to OSTP • NITRD may host a series of talks by the agencies on their data access plans • Plans for articulating USG scientific data and USG-supported scientific results still in process
Some issues regarding data access • Disciplinary versus multi-disciplinary, agency versus multi-agency repositories • Plain (human) access versus semantic (machine) access • A general digital object architecture? • Degrees of openness
Digital Object Architecture • An "hourglass" for data? (As the Internet was an hourglass for networks: TCP/IP at the narrow point; many applications above, many implementations below)
Digital Object Architecture • Digital Object Data Model & Protocol: logical interface to heterogeneous information management and storage systems; built-in strong authentication and encryption • Digital Object Repository: implements the digital object data model and protocol; portal into multiple info and storage systems; security is at the object level and objects can be securely shared; current version successfully used by industry and government • Handle System: highly scalable identifier resolution system for digital objects; provides referential integrity as objects move and environments change; proven and in wide use • Digital Object Registry: manages metadata records about resources; assigns handles to metadata records and resources; normalizes organizational boundaries through commonly agreed APIs and metadata models
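The idea of "referential integrity as objects move" can be shown with a toy resolver: the identifier stays fixed while the location it resolves to changes. This is a minimal in-memory sketch, not the Handle System's actual protocol or API; the class name, method names, handle string, and URLs are all hypothetical.

```python
class HandleRegistry:
    """Toy in-memory stand-in for an identifier resolution service."""

    def __init__(self):
        # Maps a persistent handle to the object's current location.
        self._locations = {}

    def register(self, handle, location):
        """Bind a new handle to its first location."""
        if handle in self._locations:
            raise KeyError("handle already registered: " + handle)
        self._locations[handle] = location

    def move(self, handle, new_location):
        """Record that the object now lives somewhere else."""
        if handle not in self._locations:
            raise KeyError("unknown handle: " + handle)
        self._locations[handle] = new_location

    def resolve(self, handle):
        """Return the current location for a handle."""
        return self._locations[handle]


if __name__ == "__main__":
    registry = HandleRegistry()
    # Hypothetical handle in prefix/suffix form, and hypothetical URLs.
    handle = "12345/example-dataset"
    registry.register(handle, "https://old-repo.example.org/data/1")
    registry.move(handle, "https://new-repo.example.org/objects/1")
    # Citations that used the handle still resolve after the move.
    print(registry.resolve(handle))
```

Anything that cites the handle rather than the URL keeps working after the move, which is the property the slide attributes to the Handle System.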
Measuring openness • Ease of discovery (googleable, etc.) • Ease of use • Extent of reusability • Legal matters (e.g., CC license, derived-works friendly, etc.)
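One way to make these four criteria concrete is a simple checklist score. The sketch below is purely illustrative: the talk does not propose a scoring formula, so the equal weighting, the yes/no treatment of each criterion, and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class OpennessChecklist:
    """Yes/no answers for the four criteria listed on the slide."""
    discoverable: bool   # ease of discovery
    easy_to_use: bool    # ease of use
    reusable: bool       # extent of reusability
    open_license: bool   # legal matters, e.g. a CC license

    def score(self):
        """Return the fraction of criteria met, from 0.0 to 1.0.

        Equal weighting is an assumption made for illustration only.
        """
        checks = [
            self.discoverable,
            self.easy_to_use,
            self.reusable,
            self.open_license,
        ]
        return sum(checks) / len(checks)


if __name__ == "__main__":
    dataset = OpennessChecklist(
        discoverable=True,
        easy_to_use=True,
        reusable=False,
        open_license=True,
    )
    print(dataset.score())
```

A real assessment would likely grade each criterion on a scale rather than as a yes/no answer, which is what "degrees of openness" suggests.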
Sustainability • Could we duplicate the Internet story? • Public investments create a new activity • The new activity leads to a new industry • The new industry leads to novel use cases
In conclusion • Data Intensive Science aspirations are here • Data Intensive Science is slowly emerging • One result will be to make the scientific record into a first-class scientific object