Architectural Directions for Distributed Geolibraries. Greg Janée gjanee@alexandria.ucsb.edu. Outline. (Previous) testbed design and experiences Vision & goals Architecture: foundation CRADDL FEDORA Architecture: additions Standard thesaurus interface Two standard metadata models
Architectural Directions for Distributed Geolibraries
Greg Janée gjanee@alexandria.ucsb.edu
Outline
• (Previous) testbed design and experiences
• Vision & goals
• Architecture: foundation
  • CRADDL
  • FEDORA
• Architecture: additions
  • Standard thesaurus interface
  • Two standard metadata models
  • Core index/search service
  • Tile-based browse/aggregation service
  • Collection discovery service
  • Gazetteer service
• Goals, revisited
Testbed: Concepts
• A distributed catalog system
• Library
  • Set of collections
  • Client (public) services
• Collection
  • Set of holdings
  • Metadata reports
  • Library (internal) services
• Holding
  • Unique identifier
• Holdings have footprints
  • Earth surface location(s)
  • Point, bounding box, polygon(s)
• Gazetteer converts names to footprints
• Services accessible via HTTP
  • Methods = URLs
• Metadata encoded in XML
  • Queries
  • Reports
Testbed: Metadata
• “Search Buckets” (generic query metadata)
  • Geographic locations
  • Dates
  • Types
  • Formats
  • Originators
  • Subject-related text
  • Assigned terms
  • Identifiers
• Reports (descriptive metadata)
  • Collection
  • Scan
  • Full
  • Browse
  • Access
Testbed: Services
Between clients and libraries:
• Configuration → {collection-id}
• Collection(collection-id) → report
• Query(query) → query-id
• Results(query-id) → {holding-id}
• Metadata(collection-id, holding-id, view) → report
Between libraries and collections:
• Collection → report
• Query(query, accumulator) → query-thread
• Metadata(holding-id, view) → report
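The client-facing operations on this slide can be read as a small interface. The following is a minimal Python sketch, assuming an in-memory library: the class name `InMemoryLibrary`, the Python method names, the report contents, and the substring-matching query are all illustrative assumptions, not the testbed's actual HTTP/XML implementation.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class InMemoryLibrary:
    """Hypothetical stand-in for a library's client (public) services."""

    # collection-id -> {holding-id -> {view -> report text}}
    collections: dict
    _queries: dict = field(default_factory=dict)
    _ids: count = field(default_factory=count)

    def configuration(self):
        """Configuration -> {collection-id}"""
        return set(self.collections)

    def collection(self, collection_id):
        """Collection(collection-id) -> report (here just a holding count)."""
        return {"id": collection_id,
                "holdings": len(self.collections[collection_id])}

    def query(self, text):
        """Query(query) -> query-id. Matching is a naive substring test."""
        hits = [
            (cid, hid)
            for cid, holdings in self.collections.items()
            for hid, views in holdings.items()
            if any(text.lower() in report.lower() for report in views.values())
        ]
        query_id = next(self._ids)
        self._queries[query_id] = hits
        return query_id

    def results(self, query_id):
        """Results(query-id) -> holding-ids, each paired with its collection-id."""
        return list(self._queries[query_id])

    def metadata(self, collection_id, holding_id, view):
        """Metadata(collection-id, holding-id, view) -> report"""
        return self.collections[collection_id][holding_id][view]


# Made-up sample data for illustration only.
library = InMemoryLibrary(
    {"maps": {"h1": {"browse": "Aerial photograph of Goleta"}}}
)
query_id = library.query("goleta")
hits = library.results(query_id)
```

Separating `query` from `results` mirrors the slide's two-step Query/Results pair: the client holds only a query-id and fetches holdings in a second call.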
Testbed: Implementation
[Architecture diagram. Recoverable labels:
• Web browser; web client intermediary (HTTP + HTML, HTML + GIF)
• Map/footprint service: client, HTTP server, map server, renderer (offline), vector map data (HTTP + GIF)
• ADL middleware server: Java + XML middleware, XML configuration scripts (HTTP + XML)
• Collection driver in a servlet engine (Java + XML), JDBC to the collection database (RDBMS): indexed search buckets + basic holding metadata, collection metadata (complete), coverage/statistics scripts
• HTTP servers (XML): metadata accessors (one per series), data accessors (one per series), collection metadata (static), Berkeley/DBM databases (multiple per series), local file cache, SRB to SDSC]
Vision
• Fundamental organization of information
  • Self-contained, georeferenced digital objects...
  • ...aggregated into collections...
  • ...networked into libraries
• The Library constitutes a “Digital Earth”
  • Static and dynamic content
  • Personal, customizable collections
  • Collaborative use of distributed resources
• Component-based approach
  • Specify interfaces and protocols
  • Build representative services
Goals
• Find
  • Find appropriate collections
  • Find items within collections
    • Using simplified, uniform methods
    • Using more refined, perhaps collection-specific methods
• Assemble, structure, publish
  • Create and populate new collections
  • Structure collections using domain-specific thesauri
  • Make available to others
• Use
  • Invoke operations on items
  • Integrate library into user application environment
CRADDL (D-Lib Magazine, Nov. 1998)
[Diagram. Components: client, collection, repository, index/search, naming. Labeled interactions: get configuration, characteristics; submit query, retrieve results; retrieve, deposit, and operate on digital objects; update summary information; update index; resolve name.]
FEDORA (apologies to Christophe Blanchi)
[Diagram. Labels on the digital object: identifier, type “image”, attachments (sample data 3.14159, 2.71828), and the operations crop(x,y,w,h), subsample(factor), getThumbnail(). Labels on the Type: signature and implementation for the same three operations.]
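The idea in the FEDORA diagram, an object whose operations are supplied by its type, can be illustrated with a short Python sketch. Everything here is hypothetical: `ObjectType`, `DigitalObject`, the `invoke` dispatcher, and the pixel-grid attachment are illustrative assumptions, and `getThumbnail()` is omitted for brevity. It is not FEDORA's actual object model or API.

```python
from dataclasses import dataclass


@dataclass
class ObjectType:
    """Hypothetical type: a name plus operations (signature + implementation)."""

    name: str
    operations: dict  # operation name -> callable(attachments, *args)

    def signature(self):
        """Names of the operations this type offers."""
        return sorted(self.operations)


@dataclass
class DigitalObject:
    """Hypothetical digital object: identifier, type, and data attachments."""

    identifier: str
    type: ObjectType
    attachments: dict  # attachment name -> raw data

    def invoke(self, operation, *args):
        """Dispatch an operation through the object's type."""
        if operation not in self.type.operations:
            raise AttributeError(
                f"{self.type.name!r} has no operation {operation!r}")
        return self.type.operations[operation](self.attachments, *args)


def crop(attachments, x, y, w, h):
    """Crop a row-major pixel grid to the w-by-h window at (x, y)."""
    return [row[x:x + w] for row in attachments["pixels"][y:y + h]]


def subsample(attachments, factor):
    """Keep every factor-th pixel in both directions."""
    return [row[::factor] for row in attachments["pixels"][::factor]]


IMAGE = ObjectType("image", {"crop": crop, "subsample": subsample})

# A made-up 4x4 "image" used only to exercise the sketch.
obj = DigitalObject(
    "obj-1",
    IMAGE,
    {"pixels": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]},
)
window = obj.invoke("crop", 1, 1, 2, 2)
```

Because operations live on the type rather than on each object, a new operation added to `IMAGE` becomes available to every object of that type.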
Architecture: additions
• Standard thesaurus interface
• Two standard metadata models
  • ADL-Basic: supports uniform description & search
  • ADL-Full: supports XML-based querying on entire metadata
• Standard index/search services
  • Core index/search service
  • Tile-based browse/aggregation service
• Collection metadata
  • Characterizes collection
  • Supports collection discovery
• Gazetteer service
Standard thesaurus interface
• Operations: getTopTerms(), getDefinition(term), getBroaderTerm(term), getNarrowerTerms(term)
• Relationships: broader, narrower, related, preferred
• Example terms: physiographic feature, mountain, cliff, arête, ridge, summit, hogback, drumlin, etc.
• Example definition (ridge): “A long and narrow upland with steep sides.”
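The four thesaurus operations named on this slide can be sketched as a small Python class. This is a minimal illustration under stated assumptions: the class `Thesaurus`, its snake_case method names, the single-broader-term storage, and the toy hierarchy in `TOY` are all hypothetical; only the operation names, the example terms, and the ridge definition come from the slide.

```python
class Thesaurus:
    """Hypothetical in-memory thesaurus exposing the four slide operations."""

    def __init__(self, broader, definitions):
        self._broader = dict(broader)          # term -> its single broader term
        self._definitions = dict(definitions)  # term -> definition text

    def get_top_terms(self):
        """Terms that are someone's broader term but have none themselves."""
        return sorted(set(self._broader.values()) - set(self._broader))

    def get_definition(self, term):
        return self._definitions.get(term)

    def get_broader_term(self, term):
        return self._broader.get(term)

    def get_narrower_terms(self, term):
        return sorted(t for t, b in self._broader.items() if b == term)


# Assumed hierarchy for illustration; the slide does not spell out these links.
TOY = Thesaurus(
    broader={
        "mountain": "physiographic feature",
        "cliff": "physiographic feature",
        "ridge": "physiographic feature",
        "hogback": "ridge",
    },
    definitions={"ridge": "A long and narrow upland with steep sides."},
)
narrower = TOY.get_narrower_terms("physiographic feature")
```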
ADL-Basic fields & subfields
• Originator
• Subject-related text
• Title
• Assigned terms
• Type
• Format
• Spatial domain
• Date
• Time period of content
• Identifier
ADL-Basic general structure
[Diagram. Components: digital object, collection metadata, client, index/search service. Example field DATE:
• Semantics: FGDC 2.5.1.4, DOQ 3.11, FGDC 1.1/8.2
• Characteristics: Repeatable, Optional, Always present
• Names: Source image date, Source DEM date, Production date
• Values: 1972-03-05, 1966-01-01, 1982-12-19]
ADL-Basic field definition: Type
• Semantics: identifies the nature, genre, meaning, or intellectual content of the item
• Content: zero or more terms drawn from identified thesauri
• XML representation: <!ELEMENT ...>
• Collection metadata implications: collection metadata lists all referenced thesauri
• Query value: single term from an identified thesaurus
• Query operator: “is a”
Example content: {(“Object Types”, “aerial photograph”), (“Geology Concepts”, “erosion”)}
Example query: Type is a (“Object Types”, “image”)
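One plausible reading of the “is a” operator is that a query term matches an item when one of the item's Type terms, from the same thesaurus, equals the query term or has it as an ancestor. The Python sketch below works through the slide's example under that assumption. The function names and the broader-term link from “aerial photograph” to “image” are hypothetical and not stated in the source.

```python
def is_a(term, query_term, broader):
    """True if term equals query_term or has it as an ancestor.

    broader maps each term to its broader term; the seen set guards
    against accidental cycles in the mapping.
    """
    seen = set()
    while term is not None and term not in seen:
        if term == query_term:
            return True
        seen.add(term)
        term = broader.get(term)
    return False


def type_matches(content, query, thesauri):
    """content: set of (thesaurus, term) pairs; query: one (thesaurus, term)."""
    query_thesaurus, query_term = query
    broader = thesauri.get(query_thesaurus, {})
    return any(
        name == query_thesaurus and is_a(term, query_term, broader)
        for name, term in content
    )


# Assumed broader-term links, for illustration only.
THESAURI = {
    "Object Types": {"aerial photograph": "image"},
    "Geology Concepts": {},
}
CONTENT = {("Object Types", "aerial photograph"), ("Geology Concepts", "erosion")}

matched = type_matches(CONTENT, ("Object Types", "image"), THESAURI)
```

Restricting the comparison to the query's own thesaurus is why collection metadata needs to list every referenced thesaurus: a client can only form a valid query term if it knows which thesauri the collection draws on.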
ADL-Full
• Encodes the full, native metadata in a standard syntactic representation
  • RDF
  • ADL’s generic encoding

<source prefix="MARC">http://lcweb.loc.gov/marc</source>
...
<group name="Data quality">
  <field name="Accuracy" source="MARC:514g">
    <value type="number" unit="meters">25</value>
  </field>
  <field name="Contour interval" source="MIL-B:06200a">
    <value type="number" unit="meters">10</value>
  </field>
</group>
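To show how a client might consume the generic encoding, here is a minimal Python sketch that reads the fragment from this slide with the standard library's `xml.etree.ElementTree`. The wrapping `<metadata>` root element and the `read_fields` helper are assumptions added so the fragment parses as a single document; the slide does not show the enclosing element.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, wrapped in an assumed <metadata> root element.
SAMPLE = """\
<metadata>
  <source prefix="MARC">http://lcweb.loc.gov/marc</source>
  <group name="Data quality">
    <field name="Accuracy" source="MARC:514g">
      <value type="number" unit="meters">25</value>
    </field>
    <field name="Contour interval" source="MIL-B:06200a">
      <value type="number" unit="meters">10</value>
    </field>
  </group>
</metadata>
"""


def read_fields(xml_text):
    """Return (group, field, source, value, unit) tuples from the encoding."""
    root = ET.fromstring(xml_text)
    rows = []
    for group in root.iter("group"):
        for fld in group.findall("field"):
            value = fld.find("value")
            text = value.text
            # Convert values declared as numbers; leave everything else as text.
            parsed = float(text) if value.get("type") == "number" else text
            rows.append((group.get("name"), fld.get("name"),
                         fld.get("source"), parsed, value.get("unit")))
    return rows


rows = read_fields(SAMPLE)
```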
Index/search services
• Core index/search service
  • Based on ADL-Basic
  • Boolean combinations of constraints on ADL-Basic fields and subfields (only)
• Support for other, more refined search services
  • Utilizing ADL-Basic metadata mappings
  • Based on ADL-Full
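“Boolean combinations of constraints” can be pictured as small predicates composed with AND, OR, and NOT. The sketch below is an assumption-laden illustration in Python: the combinator names, the record layout, and the sample records are hypothetical, and the actual service's query syntax is not shown in the source.

```python
def field_has(field, value):
    """Constraint: the record's (multi-valued) field contains the value."""
    return lambda record: value in record.get(field, ())


def all_of(*constraints):
    """Boolean AND over constraints."""
    return lambda record: all(c(record) for c in constraints)


def any_of(*constraints):
    """Boolean OR over constraints."""
    return lambda record: any(c(record) for c in constraints)


def none_of(*constraints):
    """Boolean NOT (of an OR) over constraints."""
    return lambda record: not any(c(record) for c in constraints)


def search(records, constraint):
    """Identifiers of the records that satisfy the constraint."""
    return [r["identifier"] for r in records if constraint(r)]


# Made-up records carrying a few ADL-Basic-style fields.
RECORDS = [
    {"identifier": "a1", "type": ["image"], "format": ["TIFF"]},
    {"identifier": "b2", "type": ["map"], "format": ["TIFF"]},
    {"identifier": "c3", "type": ["image"], "format": ["JPEG"]},
]

tiff_not_map = search(
    RECORDS, all_of(field_has("format", "TIFF"), none_of(field_has("type", "map")))
)
```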
Browse/aggregation service
• Aggregate statistics
• Digital objects (scale-dependent)
• By type, format, and date
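A tile-based aggregation can be sketched by binning object locations into a regular latitude/longitude grid and counting per tile. This Python sketch simplifies heavily: it treats each footprint as a single point, aggregates by type only, and uses a fixed tile size in degrees as the “scale”. The function names, the tiling scheme, and the sample objects are all assumptions, not the service's actual design.

```python
import math
from collections import Counter


def tile_of(lon, lat, tile_degrees):
    """Column/row index of the tile_degrees-wide tile containing a point."""
    return (math.floor((lon + 180.0) / tile_degrees),
            math.floor((lat + 90.0) / tile_degrees))


def aggregate(objects, tile_degrees):
    """Per-tile counts of objects by type; larger tiles give a coarser view."""
    counts = {}
    for obj in objects:
        tile = tile_of(obj["lon"], obj["lat"], tile_degrees)
        counts.setdefault(tile, Counter())[obj["type"]] += 1
    return counts


# Made-up point objects for illustration.
OBJECTS = [
    {"lon": -119.8, "lat": 34.4, "type": "image"},
    {"lon": -119.7, "lat": 34.4, "type": "image"},
    {"lon": -1.7, "lat": 52.18, "type": "map"},
]

coarse = aggregate(OBJECTS, 10.0)  # 10-degree tiles
```

Calling `aggregate` again with a smaller `tile_degrees` yields the finer-grained view a client would request after zooming in.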
Collection metadata
• Static
  • Scope and purpose, maintaining agency, etc.
• Derived
  • Referenced thesauri
  • Referenced metadata standards
  • Native ADL-Basic metadata mappings
  • Statistical summarization via browse/aggregation service
Gazetteer service (1/3)
• Stratford-upon-Avon
  • Variant names: Stratford upon Avon; Stratford
  • Location: N 52° 11’, W 1° 42’
  • Feature type: populated place (NIMA); town (local)
  • Time period: 1196–present
• 805
  • Feature type: U.S. telephone area code
• 37T
  • Feature type: UTM zone
• Mississippi
  • Feature type: drainage basin
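The entries on this slide suggest a record with a name, optional variant names, a location, one or more feature types, and a time period. The following Python sketch models that and adds a name lookup. The class names, field names, and case-insensitive matching are assumptions for illustration; the slide's coordinates are converted to decimal degrees, with west longitude as negative.

```python
from dataclasses import dataclass, field


@dataclass
class GazetteerEntry:
    """Hypothetical gazetteer record modeled on the slide's examples."""

    name: str
    feature_types: list
    variant_names: list = field(default_factory=list)
    location: tuple = None  # (latitude, longitude) in decimal degrees
    time_period: str = None


class Gazetteer:
    """Looks entries up by primary or variant name, ignoring case."""

    def __init__(self, entries):
        self._by_name = {}
        for entry in entries:
            for name in [entry.name, *entry.variant_names]:
                self._by_name.setdefault(name.casefold(), []).append(entry)

    def lookup(self, name):
        return self._by_name.get(name.casefold(), [])


GAZETTEER = Gazetteer([
    GazetteerEntry(
        name="Stratford-upon-Avon",
        variant_names=["Stratford upon Avon", "Stratford"],
        location=(52 + 11 / 60, -(1 + 42 / 60)),  # N 52° 11', W 1° 42'
        feature_types=["populated place (NIMA)", "town (local)"],
        time_period="1196-present",
    ),
    GazetteerEntry(name="805", feature_types=["U.S. telephone area code"]),
    GazetteerEntry(name="37T", feature_types=["UTM zone"]),
    GazetteerEntry(name="Mississippi", feature_types=["drainage basin"]),
])

found = GAZETTEER.lookup("stratford")
```

Returning a list rather than a single entry leaves room for ambiguous names, since the same string can name features of different types.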
Gazetteer service (2/3)
• Geographic namespace: spatial partition of a region into uniquely named subregions
[Diagram. Labels: postal codes, U.K., ZIP codes, countries, counties, California, U.S.A., states, parishes, Louisiana, time zones, 1st order administrative areas, 2nd order administrative areas, continental plates, national parks.]
Gazetteer service (3/3)
[Diagram. Components: client, map service, gazetteer, browse/aggregation service. Sample placenames: Yogi, Cabbage Patch, Barnacle Bill, Pancake, Wedge, Bamm-Bamm.]
Goals, revisited
• Find
  • Discovery service based on rich collection metadata
  • Uniform searching based on ADL-Basic
  • More refined searching based on ADL-Basic and ADL-Full
  • Browse/aggregation service
• Assemble, structure, publish
  • Uniform use of collections
  • Thesauri and inheritance of collections and digital objects support customization & structure
  • Collections designed to span the gamut from big to small
• Use
  • FEDORA-like extensible digital object model