
ICSTI/ITOC 15 October 2013 Larry Lannom



Presentation Transcript


  1. ICSTI/ITOC, 15 October 2013. Larry Lannom, Research Data Alliance / Corporation for National Research Initiatives

  2. DAITF: Enabling Technologies, 21 March 2012. Larry Lannom, Corporation for National Research Initiatives. http://www.cnri.reston.va.us/ http://www.handle.net/

  3. Enabling Technologies [Diagram: Datasets, each labeled with an ID; Scientists, Data Curators, End Users, Applications]

  4. Enabling Technologies [Diagram: Datasets Accessed via Repositories, each labeled with an ID; Scientists, Data Curators, End Users, Applications]

  5. Enabling Technologies [Diagram: Datasets Accessed via Repositories, each labeled with an ID; Discovery; Scientists, Data Curators, End Users, Applications]

  6. Discovery & Evaluation • Search • Metadata registries • Subject • Parties • Dates • Etc • Crawlers – more ad hoc • Citation • Formats • Permissions • Can I see it? • Can I use it? • Trust

  7. Enabling Technologies [Diagram: Datasets Accessed via Repositories, each labeled with an ID; Discovery; Access; Scientists, Data Curators, End Users, Applications]

  8. Access • ID / reference resolution • Go from ‘subject search’ to ‘known item’ search • Access Protocols • How to get it • Protocol registries • Bootstrapping into new protocols • Authentication & Authorization • Proof of identity (tradeoff: usability vs security) • Permissions: with the object or in some external system?
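
A minimal sketch of the ID / reference resolution and access-protocol steps on the Access slide, assuming a small in-memory table in place of a real resolution service. The identifier, the URLs, and the names `Location`, `resolve`, and `pick_location` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    protocol: str  # access protocol needed to fetch the object
    url: str       # where the object can be retrieved


# Hypothetical resolution table; a real system would query a resolver service.
RESOLUTION_TABLE: Dict[str, List[Location]] = {
    "example-prefix/dataset-1": [
        Location("https", "https://repo.example.org/dataset-1"),
        Location("ftp", "ftp://mirror.example.org/dataset-1"),
    ],
}


def resolve(identifier: str) -> List[Location]:
    """Return every known location for a persistent identifier."""
    if identifier not in RESOLUTION_TABLE:
        raise LookupError(f"unknown identifier: {identifier}")
    return RESOLUTION_TABLE[identifier]


def pick_location(identifier: str, supported: List[str]) -> Location:
    """Choose the first location whose protocol the client supports."""
    for location in resolve(identifier):
        if location.protocol in supported:
            return location
    raise LookupError(f"no supported protocol for {identifier}")
```

Keeping the identifier separate from its locations is what turns a "known item" request into a lookup: the client holds only the ID, and the table (or service) decides where and how the object is fetched.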

  9. Enabling Technologies [Diagram: Datasets Accessed via Repositories, each labeled with an ID; Discovery; Access; Interpretation; Scientists, Data Curators, End Users, Applications]

  10. Interpretation • Registries • Schemas • Vocabularies • Formats • Available services • Useful client-side tools • Trust • Who did this? • Who owns this? • Provenance • Data Source • Processing steps • Computing environment • What is needed to trust the numbers? • Domain specific?

  11. Enabling Technologies [Diagram: Datasets Accessed via Repositories, each labeled with an ID; Discovery; Access; Interpretation; Reuse; Scientists, Data Curators, End Users, Applications]

  12. Reuse • Everything from Interpretation slide + Permissions • Example from BOF: I need to understand a data set for peer review but that doesn’t give me permission to use the data • Validation • Education & Training • Integrate ‘live’ data into education and training • Repurpose data

  13. DAITF Roles? • Bring good people together on a regular basis to discuss these issues • Get agreement on vocabulary for discussing data access and interoperability? • Working groups on specific topics • Prototyping specific interoperability issues / domains • Create high-level framework, à la OAIS? Multiple frameworks? • Guides to Registries and Best Practices

  14. Research Data Alliance Plenary 2 Update. Dr. Francine Berman, Chair, RDA/US; Hamilton Distinguished Chair in Computer Science, Rensselaer Polytechnic Institute

  15. RDA Plenary 2 -- September 16-18, Washington D.C. -- 3 days of Peace, Love and Data • RDA Plenary 2 • 368 participants from 22 countries and all sectors • All-hands stakeholder talks and RDA working meeting • Data Citation Summit convened by DataCite, FORCE11, CODATA/ICSTI, ESIP, DCC, etc. to create a common agenda • ~5000 tweets over 3 days

  16. RDA Community Current Status: ~1300 participants from 50+ countries Albania Australia Austria Bangladesh Belgium Bolivia Botswana Brazil Bulgaria Canada China Congo {Democratic Rep} Costa Rica Czech Republic Denmark Estonia Finland France Germany Greece Iceland India Iran Ireland Ireland {Rep} Italy Japan Kyrgyzstan Kuwait Mexico Netherlands New Zealand Norway Palestine Poland Portugal Russian Federation Rwanda Serbia Singapore Slovenia South Africa South Korea Spain Sweden Switzerland Taiwan Turkey United Arab Emirates United Kingdom United States Vatican City Venezuela Fran Berman

  17. RDA Community Building Momentum • Growth in number and scope of Interest Groups and Working Groups • New: BOFs for groups as precursor to Interest Groups • Groups beginning to “self-monitor” to promote concrete deliverables to be used and adopted • Increasing interest in more interaction and “connective tissue” between groups • Pressing To-Dos before Plenary 3: • Develop an RDA policy for IP that comes up in Interest and Working Groups • Determine the form of RDA deliverables and what’s needed in terms of an “RDA archive”

  18. Groups that Met at the RDA Plenary BOLD = new since last Plenary • Birds-of-a-Feather • Linked Data • Chemical Safety Data • Education and Skills Development in Data Intensive Science • Libraries and Research Data • Cloud Computing and Data Analysis Training for the Developing World • Working Groups • Data Type Registries • Metadata Standards • Practical Policy • Persistent Identifier Types • Data Foundations and Terminology • Data Categories and Codes • Interest Groups • Agricultural Data • Big Data Analytics • Data Brokering • Certification of Trusted Repositories (joint with ICSU-WDS) • Long tail of Research Data • Marine Data Harmonization • Community Capability Model • Data Publishing (joint with WDS) • Toxicogenomics Interoperability • Research Data Provenance • Data Citation • Metadata • Economic Models and Infrastructure for Federated Materials Data Management • Engagement • Preservation e-Infrastructure • Legal Interoperability (joint with CODATA) • Global Registry of Trusted Data Repositories and Services • Digital Practices in History and Ethnography • Data Citation Harmonization Summit • DataCite, FORCE11, CODATA/ICSTI, ESIP, DCC, etc.

  19. RDA Organizational Partners New RDA constituencies / stakeholders • Organizational Assembly = Organizational Members (subscription) + Organizational Affiliates (MOUs). • Organizational Advisory Board will represent the Organizational Assembly. • Current Status: • Organizational Membership under discussion with Microsoft, IBM, ANDS, Australian Antarctic Data Center, Intersect, Terrestrial Ecosystems Research Network, CSC – IT Center for Science Ltd., Oracle, STFC, CNRI, STM, EUDAT, Barcelona Supercomputing Center, Columbia University Libraries / Information Services, and many more after the Plenary • Organizational Affiliation under discussion with CODATA, WDS and others • Next 6 months (before Plenary 3) • Firm up model for Affiliates (how many, how substantive should the interaction be?) • Complete creation of legal entity to host subscriptions for Organizational Members • Elect Organizational Advisory Board at Plenary 3

  20. RDA Constituent Groups Coming Together • New Position: RDA recruiting for full-time Secretary-General • RDA Colloquium (National Research Agencies and Funders) • RDA Membership • RDA Council (overarching leadership) • Technical Advisory Board (Technical oversight) • Secretary-General and Secretariat (Administration and Operations) • Organizational Advisory Boards and Organizational Assembly (Organizational partnerships and guidance) • Working Groups and Interest Groups (impact-focused infrastructure)

  21. Next Plenaries (2X a year) • Plenary 3 will be in Dublin March 26-28 in 2014, hosted by Australia and Ireland • Plenary 4 will be in the Netherlands – late September in 2014 • Plenary 5 or 6 likely back in the U.S. (west coast?)

  22. Info: enquiries@rd-alliance.org Fran Berman

  23. Data Type Registries (DTR) Co-Chairs: Larry Lannom (CNRI), Daan Broeder (MPI). September 2013, RDA Plenary 2, Washington, DC

  24. Goal: Interoperable Set of Data Type Registries • Data Types • Characterize data structures at multiple levels of granularity • Formats are just part of the story • Optimize interactions between data producers & consumers by having types defined and associated with the data they describe • Types should be standardized, discoverable, and unique • Type Registries • Each type registered with unique identifier • Common data model and expression • Associate with services, tools, format registries, etc. • Common API for machine consumption
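
A minimal sketch of the type registry idea on the DTR goal slide, assuming an in-memory store: each type gets a unique identifier on registration and can be looked up or discovered by name. The class names, the `example-type/` identifier scheme, and the fields are hypothetical and are not taken from the working group's data model.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataType:
    name: str
    description: str
    # Property name -> property description (illustrative data model only).
    properties: Dict[str, str] = field(default_factory=dict)
    # Pointers to services or tools that understand this type.
    services: List[str] = field(default_factory=list)


class TypeRegistry:
    """In-memory stand-in for a registry with a machine-usable lookup API."""

    def __init__(self) -> None:
        self._types: Dict[str, DataType] = {}

    def register(self, data_type: DataType) -> str:
        """Store a type and return its newly minted unique identifier."""
        type_id = f"example-type/{uuid.uuid4()}"  # hypothetical ID scheme
        self._types[type_id] = data_type
        return type_id

    def lookup(self, type_id: str) -> DataType:
        """Return the definition registered under an identifier."""
        return self._types[type_id]

    def find_by_name(self, name: str) -> List[str]:
        """Discover identifiers of all types registered under a name."""
        return [tid for tid, t in self._types.items() if t.name == name]
```

A producer would call `register` once and attach the returned identifier to its data; a consumer that later meets that identifier calls `lookup` to learn the structure and any associated services.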

  25. Schedule • 3/2013 – 9/2013 • Gathering use cases • Investigating other work in the area • First drafts of data model and functional specs for a type registry • 10/2013 – 12/2013 • Refine data model and functional specs • Deploy initial prototype • 1/2014 – 5/2014  • Finalize data model and functional specs • Deploy functional type registry for PID types • Release turnkey registry conforming to functional specs

  26. DTR Use Cases • Broad Functional Classification • Repos hold widely varying levels of data & metadata • High-level functional classification of the identified object needed to make sense of what is available, e.g., data object, metadata, repo description, contact info, etc. • Simple License Information via PID Resolution • Data set access conditions cannot be predicted based on ID • For DataCite DOIs, a handle/type/value triple could be used to provide access information, probably through a level of indirection, resulting in a pop-up or intervening page or open linked data • Object Types as a Short-cut for Dependent Services to Match Processing Requirements to Data Objects • Using data acquisition as an example • Determine object type you are trying to build • Consult registry to index into an ontology to dynamically define required and optional properties • Does the input data have what is needed? • Registration of PID Types (in ID/Type/Value triples) for Data Processing and Interpretation • Distinguish pointers to objects from pointers to metadata from pointers to services • Enable complex client interactions as opposed to simple one-to-one re-direction
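
A minimal sketch of ID/Type/Value triples as used in the use cases above, assuming a record is just a list of triples. The identifier, the type names (`DATA_URL`, `METADATA_URL`, `LICENSE_URL`, `SERVICE_URL`), and the URLs are hypothetical; registered PID types would take their place.

```python
from typing import List, NamedTuple


class TypedValue(NamedTuple):
    identifier: str  # the persistent identifier being described
    type: str        # what kind of thing the value points to
    value: str       # the pointer itself


# Hypothetical record for one identifier, with illustrative type names.
RECORD: List[TypedValue] = [
    TypedValue("10.EXAMPLE/abc", "DATA_URL", "https://repo.example.org/abc"),
    TypedValue("10.EXAMPLE/abc", "METADATA_URL", "https://repo.example.org/abc/meta"),
    TypedValue("10.EXAMPLE/abc", "LICENSE_URL", "https://repo.example.org/abc/terms"),
    TypedValue("10.EXAMPLE/abc", "SERVICE_URL", "https://viz.example.org/plot"),
]


def values_of_type(record: List[TypedValue], wanted_type: str) -> List[str]:
    """Return every value in the record that carries the requested type."""
    return [entry.value for entry in record if entry.type == wanted_type]


def access_terms(record: List[TypedValue]) -> str:
    """Find license information through one level of indirection, if present."""
    terms = values_of_type(record, "LICENSE_URL")
    return terms[0] if terms else "no license information registered"
```

Because each value carries a type, a client can ask for the license pointer or the metadata pointer directly instead of being redirected to a single landing page.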

  27. One Use of Type Registries [Diagram: Users; Typed Data (ID, Type, Payload); Federated Set of Type Registries; Services: Visualization, Rights (Terms:… I Agree), Data Set Dissemination, Data Processing] • (1) Client (process or people) encounters unknown type • (2) Resolved to Type Registry • (3) Response includes type definitions, relationships, properties, and possibly service pointers • (4) Response can be used locally for processing, or, optionally, typed data or reference to typed data can be sent to service provider
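
The flow on this slide can be sketched as follows, assuming a federation is simply a list of registries tried in order. The class and function names, the dictionary layout of a type definition, and the service URL are hypothetical.

```python
from typing import Callable, Dict, List, Optional


class TypeRegistry:
    """One member of a hypothetical federation of type registries."""

    def __init__(self, definitions: Dict[str, dict]) -> None:
        self._definitions = definitions

    def get(self, type_id: str) -> Optional[dict]:
        """Return the definition for a type, or None if not registered here."""
        return self._definitions.get(type_id)


def resolve_type(type_id: str, federation: List[TypeRegistry]) -> dict:
    """Ask each registry in turn until one knows the type."""
    for registry in federation:
        definition = registry.get(type_id)
        if definition is not None:
            return definition
    raise LookupError(f"type not found in any registry: {type_id}")


def handle_typed_data(
    type_id: str,
    payload: bytes,
    federation: List[TypeRegistry],
    local_tools: Dict[str, Callable[[bytes], object]],
) -> dict:
    """Process the payload locally if possible, else name a service to send it to."""
    definition = resolve_type(type_id, federation)
    tool = local_tools.get(type_id)
    if tool is not None:
        return {"processed_locally": tool(payload)}
    services = definition.get("services", [])
    if services:
        return {"send_to": services[0]}
    raise LookupError(f"no local tool or service for type: {type_id}")
```

The client needs no prior knowledge of the type: the registry response tells it either how to interpret the payload itself or which service pointer to follow.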

  28. A Few Words About CNRI • Not-for-profit organization formed in 1986 to foster research and development for the National Information Infrastructure (now internationally focused) • Major focus on management of information on networks: Digital Object Architecture • Handle System • DO Repository • DO Registry

  29. Handle System Adoption by Domain • Research Project: Early 90s • Initial US-funded digital library project (DARPA) • Library/Publishing: late 90s through 00s and continuing to grow • DSpace – turnkey digital library platform (MIT + HP) • Digital Object Identifier (DOI) for journal articles • International from the start, including Asia • Breaking out of the publisher/library ghetto: starting late 00s • Scientific data • Australian National Data Service (ANDS) • Max Planck (handles) • DataCite (DOIs) • EPIC (European Persistent Id Consortium) • EUDAT • Entertainment Industry • EIDR (DOIs) • Threshold of use and dependence brings governance and sustainability issues • Who is CNRI? How long will they be around? • Who is in charge? • Not just a standards issue due to the global service (cf. DNS)

  30. Infrastructural Governance and Sustainability • Spread Responsibility and Control from One Group to Many • Involve stakeholders • Develop financial sustainability plan • Develop an organizational model • Try to balance long-term and short-term incentives • Try to keep the organization from being captured by minority and/or moneyed interests • Build in flexibility • Independence from individual governments or industry players • DONA Foundation • Non-profit being established in Switzerland • Peer group of stakeholders will run and financially support the global infrastructure • Board of Directors will provide high-level guidance • CNRI will transfer relevant rights and technology to the Foundation and continue as 1/N stakeholders • Each stakeholder has identical responsibilities to the Foundation but otherwise independent • Governments could participate and provide their support out of general revenues • Industry could create appropriate business models • Formation in process, near term completion • Longer range objective is Digital Object Architecture approach to information system interoperability
