THE US NATIONAL VIRTUAL OBSERVATORY
Metadata and Registries: Describing and Finding VO Resources
R. Hanisch (1), R. Plante (2), G. Greene (1), A.E. Linde (3), T. McGlynn (4), W. O’Mullane (5), A.M.S. Richards (6), R. Williams (7), R. Williamson (2), E. C. Auden (8), K. T. Noddle (3)
(1) Space Telescope Science Institute
(2) National Center for Supercomputing Applications
(3) University of Leicester
(4) NASA Goddard Space Flight Center
(5) The Johns Hopkins University
(6) Jodrell Bank Observatory
(7) California Institute of Technology
(8) Mullard Space Science Laboratory
Australia VO - ATNF
Resource Metadata
• A resource is any VO entity that can be described and given a name and unique identifier
  • Data collection (archive)
  • Catalog or collection of catalogs
  • Organization
  • Software packages
  • Bandpass filter functions
  • Services
• Services are VO resources that can be invoked by a user or software agent to perform some action on their behalf
• Metadata describes VO resources. This metadata generally includes information the user or a computer program needs to determine if a resource is of interest and how a service is invoked.
Resource Metadata
• Resource metadata is described by
  • A prose document that defines concepts independent of an encoding scheme
  • XML Schemas that encode metadata and metadata relationships
• Draws on Dublin Core metadata
  • An interdisciplinary standard for core resource metadata: http://dublincore.org
• Can be categorized as
  • Identity
  • Curation
  • General content
  • Collection/service content
  • Data quality
  • Service invocation
Resource Metadata Example
(On the original slide, required keywords are shown in red.)

Identity metadata
  Title: Sloan Digital Sky Survey
  ShortName: SDSS
  Identifier: ivo://stsci.edu/mast/sdss

Curation metadata
  Publisher: Space Telescope Science Institute/MAST
  PublisherID: ivo://stsci.edu/mast
  Creator: Sloan Digital Sky Survey Consortium
  Creator.Logo: http://archive.stsci.edu/images/sdss_logo.gif
  Contributor: Sloan Digital Sky Survey Consortium
  Date: 2001-06-15
  Version: SDSS EDR
  ReferenceURL: http://archive.stsci.edu/sdss/index.html
  Contact.Name: Archive Branch, Space Telescope Science Institute
  Contact.Address: 3700 San Martin Drive, Baltimore, MD 21218 USA
  Contact.Email: archive@stsci.edu
  Contact.Telephone: +1-410-338-4547

General content metadata
  Subject: galaxies, quasars, stars, CCD photometry, spectroscopy, redshift, sky surveys
  Description: The Sloan Digital Sky Survey is using a dedicated 2.5-m telescope and a large format CCD camera to obtain images of over 10,000 square degrees of high Galactic latitude sky in five broad bands (u', g', r', i' and z', centered at 3540, 4770, 6230, 7630, and 9130 Å, respectively)…
  Source: 2002AJ....123..485S
  Type: Survey, Catalog, EPOResource
  ContentLevel: Research
  Relationship: mirror-of
  RelationshipID: ivo://sdss.org/sdss/edr

Collection and service content metadata
  Facility: Apache Point Observatory, Sloan 2.5-m Telescope
  Instrument: Five-band clocked CCD camera
  Coverage.Spatial:
    polygon (FK5, 145.17, 1.25, 235.9, 1.25, 235.9, -1.25, 145.17, 1.25) or
    polygon (FK5, 250.71, 66.29, 267.0, 66.29, 267.0, 52.15, 250.71, 66.29) or
    polygon (FK5, 350.43, 1.17, 360.0, 1.17, 360.0, -1.25, 350.43, -1.25) or
    polygon (FK5, 0.0, 1.17, 56.37, 1.17, 56.37, -1.25, 0.0, -1.25)
  Coverage.RegionOfRegard: 0.0001
  Coverage.Spectral: Optical
  Coverage.Spectral.Bandpass: u’, g’, r’, i’, z’
  Coverage.Spectral.MinimumWavelength: 400.e-9
  Coverage.Spectral.MaximumWavelength: 850.e-9
  Coverage.Temporal.StartTime: 1999-12-25
  Coverage.Temporal.StopTime: 2001-07-15
  Coverage.Depth: 3.e-6
  Coverage.ObjectDensity: 6.e4
  Coverage.ObjectCount: 2.e7
  Coverage.SkyFraction: 0.01
  Resolution.Spatial: 0.00028
  Resolution.Spectral: 5000
  Resolution.Temporal: 120
  UCD: Not Provided
  Format: text/xml
  Rights: Public

Data quality metadata
  DataQuality: A
  Uncertainty.Photometric: 3.e-7
  Uncertainty.Spatial: 0.00003
  Uncertainty.Spectral: 1.e-11
  Uncertainty.Temporal: 0.1
Resource Metadata Example

Service metadata
  Service.InterfaceURL: http://archive.stsci.edu/cgi-bin/sdss/catalog.html
  Service.BaseURL: http://archive.stsci.edu/cgi-bin/sdss/catalog
  Service.HTTPResults: text/xml
  Service.StandardID: ivo://ivoa.net/Services/ConeSearch
  Service.StandardURL: ivo://www.ivoa.net/Documents/REC/ConeSearch.html
  Service.MaxSearchRadius: 0.2
  Service.MaxReturnRecords: 5000
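The service metadata above is what a client needs to invoke the service. As a minimal sketch, assuming the service follows the Cone Search convention of appending RA, DEC, and SR (all in decimal degrees) to the base URL, and assuming Service.MaxSearchRadius is given in degrees, a request URL could be built as follows. The function name `cone_search_url` is hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

# Values copied from the service metadata on the slide.
BASE_URL = "http://archive.stsci.edu/cgi-bin/sdss/catalog"
MAX_SEARCH_RADIUS = 0.2  # assumed to be in degrees


def cone_search_url(base_url, ra, dec, sr, max_sr=None):
    """Build a Cone Search request URL (hypothetical helper).

    ra, dec, and sr are in decimal degrees. If max_sr is given, the
    requested radius is clamped to it.
    """
    if not 0.0 <= ra <= 360.0:
        raise ValueError("RA must be in [0, 360] degrees")
    if not -90.0 <= dec <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    if sr < 0.0:
        raise ValueError("SR must be non-negative")
    if max_sr is not None:
        sr = min(sr, max_sr)

    # Pick the separator so the query string is well formed whether or
    # not the base URL already carries a '?' or trailing '&'.
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return base_url + sep + urlencode({"RA": ra, "DEC": dec, "SR": sr})


url = cone_search_url(BASE_URL, 180.0, 0.5, 1.0, max_sr=MAX_SEARCH_RADIUS)
print(url)
```

Clamping the radius on the client side is one possible design choice; a client could equally reject the request and tell the user about the limit.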
Resource Metadata: XML Schema
• Classes of resources: Organization, DataCollection, Service, Registry
• Specific classes inherit from generic <Resource>
• Organized into separate schemas:
  • Core resource metadata: VOResource
  • Various extension schemas containing specific types
• Capable of describing…
  • Data centers, research organizations, missions, observatories
  • Data collections, archives
  • VO standard services: Cone Search, Simple Image Access
  • Existing Browser/CGI-based services
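The inheritance idea can be illustrated outside XML. The following is only a sketch of the class relationship the slide describes, written as Python dataclasses; it is not the VOResource schema. Field names are adapted from the metadata keywords shown earlier, and the identifier `ivo://example.org/cone` is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Generic resource: metadata every resource class shares."""
    title: str
    identifier: str
    short_name: str = ""
    subject: list = field(default_factory=list)

    def resource_class(self):
        # Name of the specific class, e.g. "DataCollection".
        return type(self).__name__


@dataclass
class Organization(Resource):
    pass


@dataclass
class DataCollection(Resource):
    facility: str = ""
    instrument: str = ""


@dataclass
class Service(Resource):
    base_url: str = ""


@dataclass
class Registry(Resource):
    pass


sdss = DataCollection(
    title="Sloan Digital Sky Survey",
    identifier="ivo://stsci.edu/mast/sdss",
    short_name="SDSS",
    facility="Apache Point Observatory, Sloan 2.5-m Telescope",
)
cone = Service(
    title="SDSS catalog cone search",
    identifier="ivo://example.org/cone",  # hypothetical identifier
    base_url="http://archive.stsci.edu/cgi-bin/sdss/catalog",
)

# Code written against the generic class works for every specific class.
for res in (sdss, cone):
    print(res.resource_class(), res.identifier)
```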
The Role of Resource Registries
• Used to discover and locate resources (data and services) that can be used in a VO application
• Registry: a list of resource descriptions
  • Expressed as structured metadata to enable automated processing and searching
• Registries are themselves VO Resources
Registry Requirements
• Allow user to select resources that are likely to pertain to a scientific question
  • Select resources based on characteristics…
    • Type of resource: catalogs, image archives, EPO, services
    • Coverage in space, time, and frequency
    • Where data comes from, who curates it
• Dynamic: resources will come and go
• Distributed: should not depend on a single point of failure or a single view of the VO
• Preserve the data providers’ control over their data
  • Curators control what gets registered, content, updates
  • Allow integration with existing resource management
• Allow extension to new types of resources
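Selecting resources by characteristics amounts to filtering resource descriptions on their metadata. The sketch below assumes each description is a plain dictionary keyed by the metadata keywords from the SDSS example; the function `select_resources` and the second record (`EXAMPLE-RADIO`) are hypothetical and exist only to give the filter something to reject.

```python
from datetime import date


def overlaps(start_a, stop_a, start_b, stop_b):
    """True if the closed intervals [start_a, stop_a] and [start_b, stop_b] intersect."""
    return start_a <= stop_b and start_b <= stop_a


def select_resources(records, resource_type=None, waveband=None, time_range=None):
    """Return the records matching every criterion that is not None."""
    selected = []
    for rec in records:
        if resource_type is not None and resource_type not in rec.get("Type", []):
            continue
        if waveband is not None and rec.get("Coverage.Spectral") != waveband:
            continue
        if time_range is not None:
            start = rec.get("Coverage.Temporal.StartTime")
            stop = rec.get("Coverage.Temporal.StopTime")
            # Records without temporal coverage cannot be matched on time.
            if start is None or stop is None or not overlaps(start, stop, *time_range):
                continue
        selected.append(rec)
    return selected


records = [
    {  # values taken from the SDSS example
        "ShortName": "SDSS",
        "Type": ["Survey", "Catalog", "EPOResource"],
        "Coverage.Spectral": "Optical",
        "Coverage.Temporal.StartTime": date(1999, 12, 25),
        "Coverage.Temporal.StopTime": date(2001, 7, 15),
    },
    {  # hypothetical record, for illustration only
        "ShortName": "EXAMPLE-RADIO",
        "Type": ["Archive"],
        "Coverage.Spectral": "Radio",
        "Coverage.Temporal.StartTime": date(2002, 1, 1),
        "Coverage.Temporal.StopTime": date(2003, 1, 1),
    },
]

optical_catalogs = select_resources(records, resource_type="Catalog", waveband="Optical")
print([rec["ShortName"] for rec in optical_catalogs])
```

Spatial coverage is left out here because matching a position against the Coverage.Spatial polygons needs spherical geometry, which is beyond a short sketch.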
IVOA Registry Working Group (RWG)
• Common approach to registries
• Work packages
  • Science requirements and use cases
  • Resource metadata
  • Registry interfaces
  • Prototyping
• Distributed model for registries
Registry Model
[Diagram, built up over six slides. Boxes: two Full Searchable Registries, two Local Publishing Registries, a Local Searchable Registry, Client Applications. Group labels: "VO Projects", "Data Centers", "Specialized Portals & Services". Links, in the order they appear: "harvest (pull)", "replicate", "selective harvesting", "search queries".]
NVO Prototype Registry
• To support a Data Inventory Service (DIS): what is known about a position in the sky?
• Use a registry to locate and query standard services:
  • Cone Search Services: querying catalogs
  • Simple Image Access Services: querying image archives and cutout services
Components
• Publishing Registries
• Searchable Registry
• Resource Metadata
• Harvesting Protocol
• Populated with service descriptions
Publishing Registries: getting information into registries
• Two publishing registries established at Caltech and NCSA
• Motivation:
  • Register Simple Image Access Services
  • Develop techniques for easy registration
• Resource descriptions stored as XML documents using VOResource schema
Harvesting Interface
• Adopted Open Archives Initiative (OAI) Protocol for Metadata Harvesting
  • HTTP/CGI-based protocol for exposing metadata to harvesters (e.g. searchable registries)
• Advantages:
  • Existing, field-tested design we didn’t have to re-invent
  • Fairly easy to implement
  • Existing tools for emitting and harvesting metadata
  • Exposes our metadata to larger digital library community
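To give a feel for why the protocol is fairly easy to implement, here is a minimal sketch of the harvester side of an OAI-PMH ListRecords exchange. The verb, the `metadataPrefix`, `from`, and `resumptionToken` arguments, and the response namespace come from the OAI-PMH specification; the base URL `http://registry.example.org/oai`, the helper names, and the sample response are assumptions for illustration. The sketch parses an embedded response rather than contacting a server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # In OAI-PMH a resumptionToken is an exclusive argument: it
        # replaces metadataPrefix and the date selectors.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Made-up response fragment, trimmed to the parts the parser reads.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>ivo://stsci.edu/mast/sdss</identifier>
        <datestamp>2001-06-15</datestamp>
      </header>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

first_url = list_records_url("http://registry.example.org/oai", from_date="2003-01-01")
ids, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://registry.example.org/oai", resumption_token=token)
print(first_url)
print(ids, token)
print(next_url)
```

A harvester would repeat the request with each returned token until the response carries no token, at which point the list is complete.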
Models for Registering Resources
• Curator uses another site’s registry
  • Good for a few resources whose descriptions are fairly static
  • e.g. @NCSA: http://nvo.ncsa.uiuc.edu/nvoregistration.html
• VORegistry-in-a-box:
  • Deployable package that allows a data provider to run own registry “out of the box”: http://nvo.ncsa.uiuc.edu/VO/software
  • Good for larger number of resources that might be updated often
• Curator builds own OAI interface
  • Good for very large number of resources
  • Automate XML generation using site’s existing information management tools
Searchable Registry
• Searchable Registry was set up at JHU/STScI: http://skyserver.pha.jhu.edu/devel/registry
• OAI harvester collects resource descriptions
  • From Publishing Registries at Caltech & NCSA
  • Loads data into relational database
• SOAP Web Service interface: http://skyserver.pha.jhu.edu/devel/registry/registry.asmx
  • Searching
    • Currently provides specialized querying useful for DIS
  • Re-harvest request
    • To get updated records from publishing registries
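The harvest-then-search pattern can be sketched with an in-memory SQLite database standing in for the relational database. This is not the JHU/STScI implementation: the table layout, the function names, and the second record with its `ivo://example.org/...` identifiers are assumptions made for illustration. Only the first record uses values from the service metadata example.

```python
import sqlite3

CONE_SEARCH = "ivo://ivoa.net/Services/ConeSearch"


def build_registry(records):
    """Load harvested resource descriptions into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE resource ("
        "identifier TEXT PRIMARY KEY, title TEXT, standard_id TEXT, base_url TEXT)"
    )
    load_records(conn, records)
    return conn


def load_records(conn, records):
    """Insert or update records, so a re-harvest refreshes existing rows."""
    conn.executemany(
        "INSERT OR REPLACE INTO resource "
        "VALUES (:identifier, :title, :standard_id, :base_url)",
        records,
    )
    conn.commit()


def find_services(conn, standard_id):
    """Return (identifier, base_url) for every service of the given standard."""
    rows = conn.execute(
        "SELECT identifier, base_url FROM resource "
        "WHERE standard_id = ? ORDER BY identifier",
        (standard_id,),
    )
    return rows.fetchall()


harvested = [
    {  # values from the service metadata example
        "identifier": "ivo://stsci.edu/mast/sdss",
        "title": "Sloan Digital Sky Survey",
        "standard_id": CONE_SEARCH,
        "base_url": "http://archive.stsci.edu/cgi-bin/sdss/catalog",
    },
    {  # hypothetical image service, for illustration only
        "identifier": "ivo://example.org/images",
        "title": "Example image archive",
        "standard_id": "ivo://example.org/Services/SIA",
        "base_url": "http://images.example.org/sia",
    },
]

conn = build_registry(harvested)
print(find_services(conn, CONE_SEARCH))
```

Keying the table on the resource identifier is what lets a re-harvest overwrite stale descriptions instead of duplicating them.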
Registry Model: NVO prototype
[Diagram, shown in two builds. Labels: Local Publishing Registries at Caltech and NCSA; Full Searchable Registry at JHU/STScI, with a "harvest (pull)" link; DIS (Data Inventory Service), with a "search for services" link; Data Providers, with three Cone Search Services and three Simple Image Access services.]
Summary
• We built a working prototype registry system to support an end-user VO service
  • Distributed Publishing and Searchable components
  • Encoded descriptions using emerging VO XML standard schemas
  • OAI Harvesting Standard deployed easily
  • Used to discover Cone Search and SIA services
• What’s next: Interoperable registries IVOA-wide
  • Implement newly agreed-upon Resource Metadata standard and VOResource XML schema
  • Demonstrate harvesting and replication
  • Populate registries with broad base of VO resources
  • Standardize registry query interfaces