Community Inventory of EarthCube Resources for Geoscience Interoperability (CINERGI)
Data discovery is the most often cited issue in executive summaries on the EarthCube web site.
Ilya Zaslavsky, Steve Richard and the CINERGI team
http://workspace.earthcube.org/cinergi
Goals
• Large inventory of high-quality information resources across disciplines, with traceable provenance, usable across EarthCube research scenarios:
  • datasets, catalogs, vocabularies, information models, services, process models, repositories, etc.
• Make it open to the community
• Organize it to enable search and integration across domains and linking between information objects
  • Plus links between resources, people/organizations, publications, models, workflows, software, activities, etc.
Approach
• Build on the high-level resource inventory started at http://connections.earthcube.org
• Compile metadata for as many resources as we can (collect recommendations from geoscientists, harvest existing catalogs)
• Expose through a simple search interface
• Use off-the-shelf technology: Geoportal, ISO metadata, CSW
• Make it accessible through EarthCube.org
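To make the CSW part of the approach concrete, here is a minimal sketch of how a client might build a keyword search against a CSW 2.0.2 catalog using the standard key-value GetRecords request. The endpoint URL and the function name `build_getrecords_url` are assumptions for illustration; the slides do not give the catalog's CSW address, and a given server may accept only a subset of these parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slides do not state the catalog's CSW URL.
CSW_ENDPOINT = "http://example.org/geoportal/csw"


def build_getrecords_url(endpoint, keyword, max_records=10, start=1):
    """Build a CSW 2.0.2 GetRecords URL for a full-text keyword search."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "summary",
        "outputSchema": "http://www.opengis.net/cat/csw/2.0.2",
        # Full-text filter expressed as CQL text.
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"AnyText like '%{keyword}%'",
        "maxRecords": str(max_records),
        "startPosition": str(start),
    }
    # urlencode percent-encodes the quotes, wildcards and colons in the values.
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_getrecords_url(CSW_ENDPOINT, "hydrology", max_records=5))
```

Fetching the resulting URL would return an XML response listing matching records; paging would be done by increasing `startPosition`.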
Readiness assessment 1 Also evaluated: processing services; visualization services; community consensus efforts; identifier persistence
High-level inventory and readiness assessment: viewer http://connections.earthcube.org
Ye Most Excellent EarthCube Inventory System
[Diagram labels: Resource descriptions, Harvest adapters, Staging Database, Document processing components, Public access components, Interfaces to the world]
• Harvest adapters: components that connect to information sources and import descriptions of EarthCube resources into the staging database.
• Staging Database: document database that persists the originally harvested descriptions in their native state, as well as any additional information or updates resulting from subsequent processing/curation of the description.
• Document processing components: components that pull documents from the staging database and perform various functions to upgrade content or transform presentation. The processed document may be pushed back to the staging database or out to the public access components.
• Public access components: components that connect to document processors and implement external interfaces to present content for users.
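The four component roles described on this slide can be sketched as a tiny in-memory pipeline. This is an illustration under stated assumptions, not the CINERGI implementation: every class and function name here (`StagedDocument`, `StagingDatabase`, `harvest`, `process`, `public_view`, `normalize_text`) is hypothetical, and a Python dictionary stands in for the document database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class StagedDocument:
    doc_id: str
    native: str  # description exactly as harvested; never modified
    updates: List[dict] = field(default_factory=list)  # later processing results


class StagingDatabase:
    """In-memory stand-in for the staging document database."""

    def __init__(self) -> None:
        self._docs: Dict[str, StagedDocument] = {}

    def store_native(self, doc_id: str, native: str) -> None:
        # Keep the first harvested copy; do not overwrite the native state.
        self._docs.setdefault(doc_id, StagedDocument(doc_id, native))

    def add_update(self, doc_id: str, processor: str, content: str) -> None:
        self._docs[doc_id].updates.append({"processor": processor, "content": content})

    def get(self, doc_id: str) -> StagedDocument:
        return self._docs[doc_id]

    def all_ids(self) -> List[str]:
        return list(self._docs)


def harvest(source: Iterable[Tuple[str, str]], db: StagingDatabase) -> None:
    """Harvest adapter: import (id, description) pairs from a source."""
    for doc_id, native in source:
        db.store_native(doc_id, native)


def process(db: StagingDatabase, name: str, fn: Callable[[StagedDocument], str]) -> None:
    """Document processor: pull each document and push the result back."""
    for doc_id in db.all_ids():
        db.add_update(doc_id, name, fn(db.get(doc_id)))


def normalize_text(doc: StagedDocument) -> str:
    """Example processing step: trim whitespace and title-case the text."""
    return doc.native.strip().title()


def public_view(db: StagingDatabase, doc_id: str) -> dict:
    """Public access component: present the most recently processed content."""
    doc = db.get(doc_id)
    latest = doc.updates[-1]["content"] if doc.updates else doc.native
    return {"id": doc_id, "description": latest}


if __name__ == "__main__":
    db = StagingDatabase()
    harvest([("r1", "  stream gauge data ")], db)
    process(db, "normalize_text", normalize_text)
    print(public_view(db, "r1"))
```

Keeping the native text separate from the list of updates mirrors the slide's point that the staging database persists the original description alongside whatever later processing or curation adds.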
Then add features
• Links to organizations, researchers, other systems
• Validation services
• Deep registration of datasets/databases (at feature level)
• Data search capabilities
• Quality/interop readiness assessment
• Annotation system
CINERGI Outline (without deep registration so far)
[Architecture diagram; labels recovered from the slide, grouped by area]
• Harvesting: harvesting dashboard; CSW, OAI-PMH, WAF, CKAN, other
• Staging and curation: Staging DB: MDB; MongoDB, CouchDB, Geoportal, etc.; curation UI; record editor
  1. Metadata validation per record: ISO, DC, other
  2. Triggering parsers depending on metadata and validation results
  3. Parsers: spatial parser, LOD parser, topic parser, person/org parser, keyword parser, time parser. Need a parser API so parsers can be added
  4. Finding ambiguities for manual curation: results of parsing, provenance, duplicate flags
• Validated triples; duplicate detection, tagging, grouping; reporting to sources
• Publication: Geoportal; CSW, ISO 19115; WAF w/XML ISO; ATOM, GeoRSS, etc.; linked data: RDF, RDF store, e.g. Neo4j; extra metadata, provenance, links, annotations
• Search UI: hot page; community pivots; pivot for search results; pivotDB; search in domain systems
• DISCO
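The outline calls for a parser API so that parsers can be added, with parsers triggered by metadata content and validation results. One possible shape for such an API is sketched below. All names (`Parser`, `register`, `validate`, `run_parsers`) and the record field names (`title`, `keywords`, `temporal_extent`, `abstract`) are assumptions made for illustration; the slides do not define an interface.

```python
import re
from typing import Callable, List, NamedTuple


class Parser(NamedTuple):
    name: str
    should_run: Callable[[dict, dict], bool]  # (record, validation) -> bool
    parse: Callable[[dict], dict]             # record -> extracted fields


REGISTRY: List[Parser] = []


def register(name, should_run, parse):
    """Add a parser to the registry without changing the pipeline code."""
    REGISTRY.append(Parser(name, should_run, parse))


def validate(record: dict) -> dict:
    """Step 1: per-record validation; report which expected fields are empty."""
    required = ("title", "keywords", "temporal_extent")
    return {"missing": [f for f in required if not record.get(f)]}


def parse_time(record: dict) -> dict:
    """Toy time parser: collect four-digit years found in the abstract."""
    years = re.findall(r"\b(?:19|20)\d{2}\b", record.get("abstract", ""))
    return {"temporal_extent": sorted(set(years))}


def parse_keywords(record: dict) -> dict:
    """Toy keyword parser: lower-cased title words longer than three letters."""
    words = re.findall(r"[A-Za-z]+", record.get("title", ""))
    return {"keywords": [w.lower() for w in words if len(w) > 3]}


# Step 2: each parser is triggered only when validation shows a gap it can fill.
register("time", lambda rec, val: "temporal_extent" in val["missing"], parse_time)
register("keyword", lambda rec, val: "keywords" in val["missing"], parse_keywords)


def run_parsers(record: dict) -> dict:
    """Validate the record, then run every registered parser that is triggered."""
    validation = validate(record)
    parsed = {p.name: p.parse(record) for p in REGISTRY if p.should_run(record, validation)}
    return {"validation": validation, "parsed": parsed}


if __name__ == "__main__":
    record = {
        "title": "Streamflow records",
        "keywords": ["hydrology"],
        "abstract": "Daily discharge from 1998 to 2005.",
    }
    print(run_parsers(record))
```

In this sketch a new parser (spatial, topic, person/org, and so on) would be added with one more `register` call, and the parsed output is kept separate from the record so it can be reviewed during manual curation.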
Challenges
• Scope
• Different levels of granularity
• Lack of formal information models
• Implicit domain semantics
• Multiple metadata registry platforms and standards
• Lots of data outside managed repositories
• Cross-domain governance vs domain systems
• Different expectations across domains (survey)
Initial inventory Resources from domain workshops and surveys + initial harvesting http://metadata.earthcube.org
Domain inventories: you are invited to participate!
• All sources of data mentioned at domain end-user workshops are included
• Working with funded RCNs
Step 1: Prepare an initial collection in a spreadsheet.
Step 2: CINERGI will set up your community resource viewer and editing system, seeded with your collection.
Step 3: Community editing, updates and curation.
Short questionnaire
• Potential added value by a cross-domain system
• Integration with cross-domain search
• Key characteristics for CINERGI
See CINERGI Survey at http://workspace.earthcube.org/data-facilities
Community Partners
• Anthony Aufdenkampe: Critical Zone Observatories
• Shanan Peters: stratigraphy
• Bernhard Peucker-Ehrenbrink: Global River Observatories
• RCN projects that plan to organize community resources
• Test Enterprise Governance
• Building Blocks projects working on web services, brokering solutions
• Agencies
• International
Development Team
• San Diego Supercomputer Center/UCSD
  • Ilya Zaslavsky, David Valentine, Tom Whitenack
  • Amarnath Gupta, Jeff Grethe (NIF project)
• Lamont/Columbia Univ./IEDA
  • Kerstin Lehnert, Leslie Hsu
• Arizona Geological Survey
  • Stephen Richard
• University of Chicago
  • Tanu Malik
• Open Geospatial Consortium
  • Luis Bermudez