
Theme 3: Architecture


westj

Presentation Transcript


  1. Theme 3: Architecture

  2. Q1: Who houses stuff, both records and identifiers
     • All useful services and repositories are centralized (latency, etc.) … but centralizing content will be costly, require agreements, create liabilities re: versioning, etc., so it is problematic as a short-term goal
     • Overall specialized repositories are proliferating, not converging
     • If the content stays only in the subject-specific repositories (SSRs):
        • Provide opt-in storage services (funding model?)
        • Provide audit function re: repository compliance with standards (e.g. RLG/OCLC trusted repository guidelines)
        • Provide information/guidance on formats (risk, migration)
        • Extend JHOVE for key formats important to the community

  3. Q1: Who houses stuff, both records and identifiers (cont.)
     • Many formats in the field … most data in a small number of formats but data in the long tail is very important (engage GDFR?)
     • Metadata may be more widely replicated than data
     • External resources (SSRs): utilize OpenURL to facilitate (and distinguish between) access to data, services, metadata, etc. for a single item (link journal-hosted data with additional/ancillary data hosted by DRIADE?)
     • Service level agreements
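
The OpenURL bullet above could be sketched roughly as follows: one item identifier, with a separate link per kind of access (data, metadata, services). This is only an illustration under assumptions. The resolver address and the service identifiers are hypothetical; `url_ver`, `rft_id` and `svc_id` are key names from the OpenURL 1.0 key/encoded-value format.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical resolver address, used only for illustration.
RESOLVER = "https://resolver.example.org/openurl"

def build_openurl(item_id: str, service: str) -> str:
    """Build a KEV-style OpenURL requesting one kind of service for one item."""
    params = {
        "url_ver": "Z39.88-2004",                      # OpenURL 1.0 version tag
        "rft_id": item_id,                             # the item being referred to
        "svc_id": f"https://example.org/svc/{service}",  # hypothetical service identifier
    }
    return f"{RESOLVER}?{urlencode(params)}"

# Same item, three distinguishable requests (the DOI is made up).
item = "info:doi/10.9999/example"
links = {kind: build_openurl(item, kind) for kind in ("data", "metadata", "services")}
```

A resolver receiving these links sees the same `rft_id` each time and can route on `svc_id`, which is one way journal-hosted data and ancillary repository data might be told apart.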

  4. Q2: Is it productive to process full text for automated generation of context metadata?
     • Yes, but …
     • There are a variety of ways to do this … quantitative analysis is less costly, natural language processing requires more investment
     • More can be done if access to full text is allowed (comb full text for linkages, etc.)
     • Portal searches can also be contextualized using a ‘bag of words’ approach to describing subfields as indexes
     • Combination of statistical processing, natural language processing, and the rise of XML-based metadata can help
     • Can capture administrative/technical metadata in data flows
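
A minimal sketch of the ‘bag of words’ idea above, assuming each subfield is described by a short sample of representative text. The subfield names and vocabularies are invented for illustration, and the overlap score is a simple stand-in rather than a recommended ranking method.

```python
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lower-case the text and count each alphabetic token."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

# Invented subfield descriptions; a real index would be built from full text.
SUBFIELDS = {
    "phylogenetics": "tree clade taxon alignment sequence tree",
    "ecology": "habitat population species abundance survey",
}

def build_index(subfields: dict) -> dict:
    """Describe each subfield by the bag of words of its sample text."""
    return {name: bag_of_words(text) for name, text in subfields.items()}

def rank_subfields(query: str, index: dict) -> list:
    """Rank subfields by word overlap between the query and each bag."""
    q = bag_of_words(query)
    scores = {name: sum(q[w] * bag[w] for w in q) for name, bag in index.items()}
    return sorted(scores, key=scores.get, reverse=True)

index = build_index(SUBFIELDS)
ranking = rank_subfields("sequence alignment of taxon", index)
```

A portal could use the top-ranked subfield to contextualize a search before sending it to a domain-specific system.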

  5. Q3: Does storing a local copy make sense for SSR handshaking?
     • Helps to assure persistent access to content (as with CiteSeer) … but comes with burden and responsibility
     • Data vs. application: need to secure access to underlying data … replicating AJAX-y services very, very hard
     • Versioning is a key issue here

  6. Q4: Is everyone in agreement with the ‘don’t compete with Google’ conclusion?
     • Yes and no: develop community-specific discovery environments
     • … but also expose content to Google (expose, contextualize, refer to domain-specific systems); leverage commonly used interfaces
     • Google, Microsoft etc. now highly value highly-curated collections and are actively engaging them
     • Google’s current interface is the big thing now … be prepared to interface with the next big thing
     • WorldCat.org as an advanced discovery environment for scholarly material: including (increasingly) data

  7. Q5: What are the pros and cons of DOIs, handles, and other identifiers?
     • One of the most important issues DRIADE will face
     • Persistent, actionable identifiers vs. unique identifiers in various sub-domains and individual institutions (an item will have many IDs)
     • Question of DOI expense, connection to publishers
     • Need community understanding of a ‘canonical identifier’
     • Need a community discussion in terms of what is important about identifiers
     • Who controls/changes, software used, locally-hosted?
     • What cost? Branding? Need resolution data?
     • 3rd party assignment of persistent identifiers?

  8. Q5: What are the pros and cons of DOIs, handles, and other identifiers? (cont.)
     • Need to promote datasets to primary resources (not just subordinated to article) in references and discovery
     • For multi-file datasets: need to link to surrogate or package
     • Identifiers as “micro-billboards”, and as generators of data about contextual use of data (resolution data)
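
To make the ‘many IDs, one canonical identifier’ and ‘resolution data’ points concrete, here is a small sketch under assumptions: an in-memory registry maps every known identifier for an item to a single canonical one and counts each resolution by referrer. The class name, the identifier strings and the referrer labels are all hypothetical.

```python
from collections import Counter

class IdentifierRegistry:
    """Toy registry: many aliases per item, one canonical identifier."""

    def __init__(self):
        self._canonical = {}          # any known identifier -> canonical identifier
        self.resolutions = Counter()  # (canonical identifier, referrer) -> hit count

    def register(self, canonical, aliases=()):
        """Record the canonical identifier and any sub-domain or local aliases."""
        self._canonical[canonical] = canonical
        for alias in aliases:
            self._canonical[alias] = canonical

    def resolve(self, identifier, referrer="unknown"):
        """Return the canonical identifier and log where the request came from."""
        canonical = self._canonical[identifier]
        self.resolutions[(canonical, referrer)] += 1
        return canonical

# Made-up identifiers for a single dataset.
registry = IdentifierRegistry()
registry.register("doi:10.9999/ds1", aliases=["hdl:0000/abc", "local:42"])
resolved = registry.resolve("local:42", referrer="journal-site")
```

The `resolutions` counter is the kind of contextual-use data the last bullet refers to; who hosts and controls such a registry is exactly the open question raised on the previous slide.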

  9. Q6: Data and applications: where does the complexity live?
     • Leave it up to the community to develop best practices over time
     • Over-engineering here will make it harder to be responsive to change
     • Facilitate and let practice develop within sub-communities (testbeds for innovation)
     • Content packaging plays a role here: bundling data with services, documentation, etc.
     • Utilize (and cultivate) web services and lightweight APIs to facilitate access across and between systems
     • Some opportunities to ‘desiccate’ replications from complex applications
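
One lightweight reading of the content-packaging bullet, offered as a sketch rather than a proposed standard: a package manifest that lists each bundled file with its role (data, documentation, etc.), size and checksum. The field names and the package identifier are assumptions made for this example.

```python
import hashlib
import json

def build_manifest(package_id: str, files: dict) -> dict:
    """Describe a package; `files` maps a file name to (role, content bytes)."""
    entries = []
    for name, (role, content) in sorted(files.items()):
        entries.append({
            "name": name,
            "role": role,                                   # e.g. data, documentation
            "bytes": len(content),
            "sha256": hashlib.sha256(content).hexdigest(),  # fixity value for later checks
        })
    return {"package_id": package_id, "files": entries}

# Invented package contents.
manifest = build_manifest("pkg-1", {
    "data.csv": ("data", b"a,b\n1,2\n"),
    "README.txt": ("documentation", b"Column a is the sample id."),
})
manifest_json = json.dumps(manifest, indent=2)
```

Keeping the manifest this plain leaves room for sub-communities to add their own fields as practice develops.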

  10. Q7: How does death fit into the metadata lifecycle?
     • ‘Tombstoning’ for dead data
     • Data euthanasia?
     • Shifts in contact info (author, data custodian)
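
A sketch of what ‘tombstoning’ might look like in practice, assuming a withdrawn dataset keeps its identifier and descriptive metadata while the content itself is removed. The record fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DatasetRecord:
    identifier: str
    title: str
    custodian: str                      # current contact; may change over time
    payload: Optional[bytes] = None     # the data itself
    withdrawn: bool = False
    withdrawal_reason: Optional[str] = None

def withdraw(record: DatasetRecord, reason: str) -> None:
    """Remove the content but keep the record so the identifier still resolves."""
    record.payload = None
    record.withdrawn = True
    record.withdrawal_reason = reason

def landing_page(record: DatasetRecord) -> dict:
    """What a resolver might show: a tombstone if withdrawn, else normal metadata."""
    page = {"identifier": record.identifier, "title": record.title,
            "contact": record.custodian}
    if record.withdrawn:
        page.update(status="withdrawn", reason=record.withdrawal_reason)
    else:
        page.update(status="available")
    return page

# Invented record.
rec = DatasetRecord("doi:10.9999/ds1", "Example survey data", "A. Custodian", b"1,2,3")
rec.custodian = "B. Successor"   # shift in contact info
withdraw(rec, "superseded by a corrected version")
tombstone = landing_page(rec)
```

Citations to the withdrawn dataset then lead to an explanation and a contact rather than a dead link.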

  11. Q8: How to nurture bottom-up growth of data standards?
     • Help to foster individual sub-communities, and cultivation of best practices at the sub-community level that can be used to inform other efforts or the broader infrastructure
     • Sharing and re-use encourages consolidation of standards/best practice; cultivating mechanisms for sharing/re-use may help with achieving data consistency
     • Start from existing baseline standards; perhaps offer broad generalized standards as a starting point?

