FRBR Applied to Scientific Data Joseph A. Hourclé 2008-Sept-22 ASIS&T PVC
Functional Requirements for Bibliographic Records (FRBR) • Reference Model for the design of bibliographic catalog systems. • Defines four different concepts of ‘book’ that might be cataloged. • Work • Expression • Manifestation • Item
FRBR Group 1 Entities • Work • A distinct intellectual or artistic creation • Expression • The intellectual or artistic realization of a work in the form of alpha-numeric, … sound, image, object, movement, etc … • Manifestation • The physical embodiment of an expression of a work • Item • A single exemplar of a manifestation
What questions can we ask of each level? • Work • Who wrote it? What is the subject? • Expression • What language is it in? • Manifestation • What size is the font or book? • Item • Is the individual copy available to me?
Why ask these questions? • Work • Who wrote it? What is the subject? • Determine interest / Applicability • Expression • What language is it in? • Usability / Accessibility (of content) • Manifestation • What size is the font or book? • Usability / Accessibility (of content within carrier) • Item • Is the individual copy available to me? • Availability / Accessibility (of the carrier)
Two Extra Entities • Sensor • Converts information about its environment to a digital signal • Observation • Data created by the sensor • Necessary to unambiguously track if two works are different interpretations of the same data
In this model … • Item • Is a logical item that might be identified via a URL. • Two items of the same manifestation would be bytewise identical copies • Manifestation • A logical embodiment, to include aspects of the carrier • How each datum is organized within the package • File format and encoding • Typically contains multiple expressions • Two manifestations of the same expression contain identical values within each datum
In this model … • Work • Calibrated state of the data • Translation of the sensor output to remove sensor issues or to physical units • Two works of the same observation would be interpretations of the same raw sensor data • Also includes catalogs and metadata • But through other expressions, not directly derived from the observation • Expression • The numeric values encoded in the file • Two expressions of the same work would have been generated from the same calibration of the observation
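The chain of entities described in these two slides (Sensor, Observation, Work, Expression, Manifestation, Item) can be sketched as a small set of linked records. This is only an illustration, not the author's implementation: every class name, field name, and sample value below is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Sensor:
    name: str  # hypothetical identifier for the instrument


@dataclass(frozen=True)
class Observation:
    sensor: Sensor
    label: str  # hypothetical id for the raw sensor output


@dataclass(frozen=True)
class Work:
    observation: Observation
    calibration: str  # which interpretation of the raw data this is


@dataclass(frozen=True)
class Expression:
    work: Work
    description: str  # e.g. "full resolution" or "2x2 binned"


@dataclass(frozen=True)
class Manifestation:
    expressions: Tuple[Expression, ...]  # typically more than one
    file_format: str


@dataclass(frozen=True)
class Item:
    manifestation: Manifestation
    url: str  # where this particular copy lives


def same_observation(a: Work, b: Work) -> bool:
    """True when two works interpret the same raw sensor data."""
    return a.observation == b.observation


# Hypothetical sample data: two calibrations of one observation.
sensor = Sensor("example-imager")
obs = Observation(sensor, "exposure-0001")
work_a = Work(obs, "calibration A")
work_b = Work(obs, "calibration B")
other_work = Work(Observation(sensor, "exposure-0002"), "calibration A")
```

With records like these, `same_observation(work_a, work_b)` answers the question the Observation entity was introduced for: whether two works are different interpretations of the same data.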
Limitations • Scientific Discipline • Each discipline has different requirements for attributes describing their data • Digital Objects • Does not deal with digitization from analog sources or generation of physical items • Non-Human Workflow • May need to model software and other aspects of the data workflow
Limitations • Data Collection vs. Data Granule • Do we model each successive data object, or the full set of aggregated objects? • Similar to tracking journals vs. articles • Individual Objects vs. Dynamic Packaging • Scientific archives are moving to packaging on distribution, rather than storing the data in files • Data Archives Without Attached Metadata • Metadata is tracked as a supplementary work that may be contained in the same manifestation to prepare for this eventuality
Sunspot on 15 July 2002 from the Swedish 1-m Solar Telescope on La Palma
http://virtualsolar.org/ joseph.a.hourcle@nasa.gov
Different Observations • 171Å • 195Å • 284Å • 304Å
Different Expressions • Downsampled data • 2x2 binned • 5-min averages • 8bit vs. 16bit pixels • Lossy compression • JPEG / JPEG2000 • Datum extrapolation to fit a different coordinate system • Any form of data loss • Any form of data ‘creation’ to fill in missing data
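A minimal sketch of why 2x2 binning yields a different expression: the binned values cannot be turned back into the originals. The function name and sample image are made up for this example, and averaging each block (rather than summing it) is an assumption.

```python
def bin_2x2(image):
    """Average each non-overlapping 2x2 block of a 2-D list of numbers."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Hypothetical 4x4 "image"; the values are illustrative only.
full = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 8, 8],
    [2, 2, 8, 8],
]
binned = bin_2x2(full)
print(binned)  # [[2.0, 6.0], [2.0, 8.0]]
```

The top-left and bottom-left blocks both bin to 2.0 even though their source pixels differ, so the numeric values encoded in the file have changed in a way that cannot be undone.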
Different Manifestations • Changes in Carrier / Packaging: • Different metadata attached • Different file formats • FITS vs. CDF vs. HDF • Different aggregation • individual images vs. an hourly collection
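The idea that two manifestations of one expression carry identical values in different packaging can be illustrated with two encodings from the Python standard library. JSON and CSV stand in here for formats such as FITS, CDF, or HDF. That substitution, and the sample values, are assumptions made to keep the sketch self-contained.

```python
import csv
import io
import json

# Hypothetical data values shared by both manifestations.
values = [[1, 2], [3, 4]]

# Manifestation 1: JSON encoding.
as_json = json.dumps(values).encode("utf-8")

# Manifestation 2: CSV encoding.
buf = io.StringIO()
csv.writer(buf).writerows(values)
as_csv = buf.getvalue().encode("utf-8")

# Decode each file back into its values.
from_json = json.loads(as_json.decode("utf-8"))
from_csv = [
    [int(cell) for cell in row]
    for row in csv.reader(io.StringIO(as_csv.decode("utf-8")))
]

print(as_json != as_csv)       # the bytes on disk differ
print(from_json == from_csv)   # the values in each datum agree
```

The two byte strings differ, yet decoding either one recovers the same values, which is the relationship between manifestations and an expression described in this model.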
Different Items • Bytewise identical • Stored in different locations
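One common way to check whether two stored copies are bytewise identical is to compare cryptographic digests of their contents. The sketch below uses `hashlib.sha256` from the Python standard library. The byte strings are placeholders for file contents, and the helper name is made up for the example.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


# Placeholder contents standing in for files held at different locations.
copy_a = b"example file contents"
copy_b = bytes(copy_a)                # a second, identical copy
other = b"example file contents\n"    # differs by one trailing byte

print(digest(copy_a) == digest(copy_b))  # matching digests
print(digest(copy_a) == digest(other))   # digests differ
```

Matching digests indicate two items of the same manifestation. A mismatch means the copies differ in at least one byte, so they are not items of the same manifestation in this model's sense.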