Building a Distributed Geospatial Library: where we are now, where we’re going, what we’re facing
Greg Janée, gjanee@alexandria.ucsb.edu
Goals
• Digital library for georeferenced information
  • distributed, autonomous nodes
  • heterogeneous
  • rich services
  • scalable
  • many providers
  • collections, large and small
• Standard components, interfaces
The big picture [diagram: collections and their items connected to shared services, with many interconnections between services]
• collection registry: collection-level search
• thesaurus: shared vocabularies
• library content: item-level search, metadata management, data access
• gazetteer: maps placenames to locations
• map background: imagery, layering capability
[Diagram: ADL system architecture]
• Library server
  • client interface (XML / Java, HTTP, RMI), used by the user interface
  • middleware: access control; query fan-out; query result caching & ranking; collection referencing & registration
  • collection interface (XML / Java)
• Behind the collection interface: generic database driver, Z39.50 driver, proxy driver, collection aggregator
• Supporting components: internal collections, item tracker, metadata mapper, harvest loader
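The middleware’s query fan-out and result caching & ranking can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ADL’s implementation: the `Collection` and `Middleware` classes, the term-count relevance score, and the unbounded per-query cache are all hypothetical, and access control is omitted.

```python
from concurrent.futures import ThreadPoolExecutor


class Collection:
    """Hypothetical collection behind a driver: item id -> descriptive text."""

    def __init__(self, name, items):
        self.name = name
        self.items = items

    def search(self, term):
        # Toy relevance score: how many times the term occurs in the item text.
        term = term.lower()
        hits = []
        for item_id, text in self.items.items():
            score = text.lower().count(term)
            if score > 0:
                hits.append((score, self.name, item_id))
        return hits


class Middleware:
    """Fans a query out to all collections, merges and ranks hits, caches results."""

    def __init__(self, collections):
        self.collections = collections
        self.cache = {}          # query key -> ranked result list (never expires)
        self.backend_calls = 0   # number of real fan-outs, to show cache hits

    def search(self, term):
        key = term.lower()
        if key in self.cache:
            return self.cache[key]
        self.backend_calls += 1
        with ThreadPoolExecutor() as pool:
            partials = pool.map(lambda c: c.search(term), self.collections)
            merged = [hit for part in partials for hit in part]
        # Rank: highest score first; ties broken by collection name, then item id.
        merged.sort(key=lambda h: (-h[0], h[1], h[2]))
        self.cache[key] = merged
        return merged


maps = Collection("maps", {"m1": "Santa Barbara coast map", "m2": "Ventura county"})
photos = Collection("photos", {"p1": "coast, coast and more coast", "p2": "Goleta"})
mw = Middleware([maps, photos])
print(mw.search("coast"))                          # [(3, 'photos', 'p1'), (1, 'maps', 'm1')]
print(mw.search("COAST") is mw.search("coast"), mw.backend_calls)  # True 1
```

Ranking across collections needs comparable scores; a toy count makes that trivial here, which real heterogeneous collections do not.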
Issues
1. Finding the right participation model
   “I have a collection o’ stuff, how do I join ADL?”
2. Providing a complete solution
   “I’m a map library, I want a library-in-a-box.”
3. Gaining adoption
   “How do I add spatial searching to my DL?”
4. Simple, effective spatial searching
   “I want spatial search, but I’m cheap and lazy.”
Participation via database mapping
• Assumes a relational database of metadata
• Collection described as a view of the database
• ADL provides
  • template-based report generator
  • mapping language
  • extensible library of composable mapping components (“paradigms”)
  • offline software package to generate collection statistics
[Diagram: RDBMS (provider) → view → ADL node (config)]
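A minimal sketch of “collection described as a view of the database”, using Python’s built-in sqlite3 as a stand-in RDBMS. The table, view, and column names (`sheets`, `adl_collection`, `begin_year`, and so on) and the rows are made up for illustration; they are not ADL’s configuration format.

```python
import sqlite3

# Made-up provider table; all names and values here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sheets (sheet_id TEXT PRIMARY KEY, title TEXT, pub_year INTEGER,
                     west REAL, east REAL, south REAL, north REAL);
INSERT INTO sheets VALUES
  ('s1', 'Coastal survey sheet', 1995, -120.0, -119.5, 34.0, 34.5),
  ('s2', 'County road map', 1988, -119.5, -119.0, 34.0, 34.5);

-- The collection is only a view: ADL-facing names over the provider's columns.
-- A single publication year is exposed as both ends of a date range.
CREATE VIEW adl_collection AS
  SELECT sheet_id AS identifier,
         title    AS subject_text,
         pub_year AS begin_year,
         pub_year AS end_year,
         west, east, south, north
  FROM sheets;
""")

# Queries run against the view; the data itself stays in the provider's tables.
rows = conn.execute(
    "SELECT identifier, subject_text FROM adl_collection WHERE begin_year >= 1990"
).fetchall()
print(rows)  # [('s1', 'Coastal survey sheet')]
```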
Sample paradigms
• Spatial: Informix Geodetic blade; 4 box coordinates
• Temporal: begin, end dates; single integer year
• Hierarchical: integer codes w/ code ancestor relationships; constant
• Textual: SQL LIKE substring matching; Verity text engine; IIT SIRE
• Numeric, Identification, ...
• Field adaptors: qualification; union; concatenation; constant
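The paradigms above can be thought of as small, composable translators from a typed query to SQL. The sketch below assumes that reading; `spatial_box`, `temporal_range`, `temporal_year`, and `conjunction` are hypothetical names, not ADL’s paradigm library, and the box test ignores footprints that cross the 180° meridian. Column names come from configuration and are interpolated into the SQL; query values are passed as parameters.

```python
import sqlite3


# Hypothetical paradigm factories. Each is configured with column names from the
# provider's view and returns a translator: query value -> (SQL fragment, params).
def spatial_box(west, east, south, north):
    def translate(q):
        qw, qe, qs, qn = q  # query box; boxes crossing the 180° meridian not handled
        return (f"({west} <= ? AND {east} >= ? AND {south} <= ? AND {north} >= ?)",
                [qe, qw, qn, qs])
    return translate


def temporal_range(begin, end):
    def translate(q):
        qb, qe = q  # item range [begin, end] overlaps query range [qb, qe]
        return f"({begin} <= ? AND {end} >= ?)", [qe, qb]
    return translate


def temporal_year(year):
    def translate(q):
        qb, qe = q  # single integer year falls within the query range
        return f"({year} BETWEEN ? AND ?)", [qb, qe]
    return translate


def conjunction(*parts):
    """AND together already-translated constraints, keeping parameter order."""
    return (" AND ".join(sql for sql, _ in parts),
            [p for _, params in parts for p in params])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT, w REAL, e REAL, s REAL, n REAL, yr INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?, ?)", [
    ("a", -120.0, -119.0, 34.0, 35.0, 1995),  # made-up footprints and years
    ("b", -118.0, -117.0, 33.0, 34.0, 1995),
    ("c", -120.0, -119.0, 34.0, 35.0, 1970),
])

where, params = conjunction(
    spatial_box("w", "e", "s", "n")((-119.5, -118.5, 34.5, 35.5)),
    temporal_year("yr")((1990, 2000)),
)
ids = [row[0] for row in conn.execute(f"SELECT id FROM items WHERE {where}", params)]
print(ids)  # ['a']
```

Item `b` fails the spatial test and item `c` fails the temporal one; swapping `temporal_year` for `temporal_range` would only change how the provider’s date columns are read, not the query.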
A bucket mapping

    "subject-related-text" :
        UT.Bucket("textual", UT.standardTextualOperators,
            # the bucket's text is the concatenation of the sources below
            P.Adaptor_Concatenation({
                # substring (SQL LIKE) matching against the "subject" column
                "tag:sio.ucsd.edu:sioexplorer/nsdl_mif_dbc/subject" :
                    P.Textual_LikeSubstring(
                        "nsdl.nsdl_mif_dbc", "identifier", "subject",
                        UT.Cardinality("1"),
                        P.TextUtils.mappings.uppercaseAlphanumericOthersToWhitespace,
                        P.TextUtils.deleteLists.keepAll,
                        "UPPER"),
                # fixed keywords that apply to every item in the collection
                "tag:sio.ucsd.edu:sioexplorer/subject-keywords" :
                    P.Textual_Constant(
                        "nsdl.nsdl_mif_dbc", "identifier",
                        UT.Cardinality("1"),
                        ["oceanographic data", "Stephen’s baby"])
                ...
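The concatenation adaptor in the mapping above can be read as an OR across its sources: an item matches the bucket if any source matches. Below is a hypothetical Python analogue, not the real `P.Adaptor_Concatenation`, `P.Textual_LikeSubstring`, or `P.Textual_Constant`; `normalize` only approximates what `uppercaseAlphanumericOthersToWhitespace` might do, and the `mif` table is a made-up stand-in for `nsdl.nsdl_mif_dbc`.

```python
import re
import sqlite3


def normalize(text):
    """Uppercase and turn each run of non-alphanumeric characters into a space
    (an assumed approximation of uppercaseAlphanumericOthersToWhitespace)."""
    return re.sub(r"[^A-Z0-9]+", " ", text.upper()).strip()


def like_substring(column):
    """Hypothetical analogue of P.Textual_LikeSubstring (column is only uppercased)."""
    def translate(term):
        return f"UPPER({column}) LIKE ?", [f"%{normalize(term)}%"]
    return translate


def constant(values):
    """Hypothetical analogue of P.Textual_Constant: the same text applies to every
    item, so the match is decided once in Python rather than per row."""
    normalized = [normalize(v) for v in values]

    def translate(term):
        hit = any(normalize(term) in v for v in normalized)
        return ("1 = 1" if hit else "1 = 0"), []
    return translate


def concatenation(*sources):
    """An item matches the bucket if any source matches: OR the fragments."""
    def translate(term):
        parts = [source(term) for source in sources]
        return ("(" + " OR ".join(sql for sql, _ in parts) + ")",
                [p for _, params in parts for p in params])
    return translate


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mif (identifier TEXT, subject TEXT)")
conn.executemany("INSERT INTO mif VALUES (?, ?)",
                 [("d1", "Seafloor bathymetry"), ("d2", "Water temperature")])

bucket = concatenation(like_substring("subject"),
                       constant(["oceanographic data", "Stephen’s baby"]))


def search(term):
    where, params = bucket(term)
    sql = f"SELECT identifier FROM mif WHERE {where} ORDER BY identifier"
    return [row[0] for row in conn.execute(sql, params)]


print(search("bathymetry"))     # ['d1']        (subject column)
print(search("oceanographic"))  # ['d1', 'd2']  (constant keywords)
print(search("salinity"))       # []
```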
Database mapping: an assessment
• What’s good
  • data stays close to provider
  • collection-as-DB-view parallels real-world funding situation
    • nobody is paid to be an ADL node
• What’s bad
  • high bar
    • must have database, good metadata, reasonable data modeling, appropriate indexes
  • complex configuration
    • multiple, different representations of same info
    • requires superhuman diligence
  • complex software
    • generic query translator compiler
Participation via metadata transfer
• Database is internal to ADL
• “Universal” schema
  • supports all buckets, bucket types
  • automates all indexing, bucket mappings, collection statistics
  • enforces collection policies
• Provider supplies metadata
  • entire XML documents, via OAI or otherwise
• Mapping to ADL metadata views (bucket, browse, access) still required, but...
  • simpler, higher-level
  • no duplication
[Diagram: metadata provider → mapper → ADL node (RDBMS, config)]
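A sketch of the metadata-transfer model: a harvested XML record is mapped, through a small declarative table, into one “universal” table that holds every bucket value for every item. The record, the element-to-bucket mapping, and the `bucket_values` table are assumptions for illustration (the Dublin Core namespace URI is the real one); OAI harvesting itself is not shown.

```python
import sqlite3
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Made-up Dublin Core record, standing in for a document harvested via OAI.
RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>rec-1</dc:identifier>
  <dc:title>Coastal survey sheet</dc:title>
  <dc:subject>bathymetry</dc:subject>
  <dc:subject>tides</dc:subject>
  <dc:date>1995</dc:date>
</record>"""

# Hypothetical high-level mapping: XML element -> bucket name.
MAPPING = {
    DC + "title": "titles",
    DC + "subject": "subject-related-text",
    DC + "date": "dates",
}

# One "universal" table holds every bucket of every item, so indexing is uniform.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bucket_values (item TEXT, bucket TEXT, value TEXT);
CREATE INDEX bv_idx ON bucket_values (bucket, value);
""")


def load(xml_text):
    """Map one XML record into bucket rows; returns the number of rows loaded."""
    root = ET.fromstring(xml_text)
    item = root.findtext(DC + "identifier")
    rows = [(item, MAPPING[el.tag], el.text.strip())
            for el in root if el.tag in MAPPING]
    conn.executemany("INSERT INTO bucket_values VALUES (?, ?, ?)", rows)
    return len(rows)


print(load(RECORD))  # 4
print(conn.execute("SELECT value FROM bucket_values WHERE item = 'rec-1' AND "
                   "bucket = 'subject-related-text' ORDER BY value").fetchall())
# [('bathymetry',), ('tides',)]
```

The provider writes only the short mapping; table layout, indexes, and per-bucket statistics stay the same for every collection.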
Issue 2: providing a complete solution
• ADL provides:
  • discovery
• Missing:
  • ingest, editing tools
  • management of...
    • metadata
    • data
    • data services
  • ...and synchronization of the above
  • workflow
• A reasonable goal (?):
  • ADL provides complete map library solution
Issue 3: gaining adoption
• Adoption by other DLs has been difficult
  • features (spatial search, buckets) not separable from architecture
  • nobody understands buckets anyway
• The world speaks Dublin Core
  • we don’t
  • close doesn’t count
Adoption strategies
• New, compelling reasons to use ADL!
  • harvesting automates collection building
  • metadata mapping will support qualified Dublin Core
• Our proposal to NSDL/CI:
  • “search semantics” profile for qualified DC
  • generic search framework that supports
    • typed searches
    • over federated search services
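One possible reading of “typed searches over federated search services” is sketched below: each constraint carries a type with fixed matching semantics, and the federator sends a query only to services that declare support for every type it uses. All names (`Constraint`, `Service`, `federated_search`, the type strings) are hypothetical; the matching rules are simple stand-ins for whatever a qualified-DC search-semantics profile would specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    kind: str      # hypothetical type names: "spatial", "temporal", "textual"
    value: object


def matches(c, rec):
    """Hypothetical search semantics: one fixed matching rule per constraint type."""
    if c.kind == "spatial":    # box overlap on (west, east, south, north)
        w, e, s, n = rec["spatial"]
        qw, qe, qs, qn = c.value
        return w <= qe and e >= qw and s <= qn and n >= qs
    if c.kind == "temporal":   # year-range overlap
        b, e = rec["temporal"]
        qb, qe = c.value
        return b <= qe and e >= qb
    if c.kind == "textual":    # case-insensitive substring
        return c.value.lower() in rec["subject"].lower()
    raise ValueError(f"unknown constraint type: {c.kind!r}")


class Service:
    def __init__(self, name, supported, records):
        self.name = name
        self.supported = set(supported)  # constraint types this service understands
        self.records = records

    def search(self, constraints):
        return [r["id"] for r in self.records
                if all(matches(c, r) for c in constraints)]


def federated_search(services, constraints):
    """Send the query only to services that support every constraint type in it."""
    needed = {c.kind for c in constraints}
    return {s.name: s.search(constraints) for s in services if needed <= s.supported}


rec = {"id": "r1", "spatial": (-120.0, -119.0, 34.0, 35.0),
       "temporal": (1990, 1995), "subject": "Coastal survey"}
full = Service("full", ["spatial", "temporal", "textual"], [rec])
text_only = Service("text-only", ["textual"], [rec])
query = [Constraint("spatial", (-119.5, -118.5, 34.5, 35.5)),
         Constraint("textual", "coastal")]
print(federated_search([full, text_only], query))  # {'full': ['r1']}
```

Skipping a service that cannot evaluate a constraint type, rather than silently dropping that constraint, keeps results from being merely “close”.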
Issue 4: design philosophy
• “The right thing”
  1: interface simplicity, correctness, consistency
  2: implementation simplicity, completeness
• “Worse is better”
  1: implementation simplicity
  2: interface simplicity
  3: correctness, consistency
  4: completeness
  • exemplified by Unix, C (Richard Gabriel, early ’90s)
Our approach
• We have the “right” interfaces
  • searching based on continuous geodetic coordinates
  • complex spatial representations (polygons, polylines, ...)
  • gazetteer (content & protocol) provides mapping to names
  • simple!
• But... implementation is very difficult
  • polygons, etc. make life difficult at all levels
  • polygons require expensive ($$$) 3rd-party software
  • client integration with gazetteer is difficult
  • still don’t have a usable gazetteer
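The gazetteer’s role, mapping names to the continuous coordinates that search is actually based on, can be sketched as a two-step lookup: resolve the name to a footprint, then run an ordinary spatial query. The gazetteer entries, item footprints, and function names are made up; real footprints may be polygons rather than boxes.

```python
# Hypothetical gazetteer: placename -> footprint box (west, east, south, north).
# The numbers are made-up round values, not real gazetteer content.
GAZETTEER = {
    "coastal county": (-120.6, -119.4, 34.3, 35.1),
    "inland valley": (-119.0, -118.5, 34.6, 35.0),
}

# Hypothetical item footprints in the same (west, east, south, north) form.
ITEMS = {
    "map-1": (-120.0, -119.8, 34.4, 34.6),
    "map-2": (-118.9, -118.7, 34.7, 34.8),
    "map-3": (-117.0, -116.5, 33.0, 33.5),
}


def overlaps(a, b):
    """Box overlap on (west, east, south, north); ignores the 180° meridian."""
    return a[0] <= b[1] and a[1] >= b[0] and a[2] <= b[3] and a[3] >= b[2]


def search_by_placename(name):
    # Step 1: the gazetteer turns a name into coordinates.
    footprint = GAZETTEER.get(name.lower())
    if footprint is None:
        raise KeyError(f"placename not in gazetteer: {name!r}")
    # Step 2: an ordinary coordinate-based search against item footprints.
    return sorted(i for i, box in ITEMS.items() if overlaps(box, footprint))


print(search_by_placename("Coastal County"))  # ['map-1']
print(search_by_placename("inland valley"))   # ['map-2']
```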
Other approaches
• We pay a big price for our approach
  • spatial search was the motivator for typed metadata
  • typed metadata is responsible for much of the complexity
• Might other approaches be equally effective?
  • simplified spatial models, e.g., boxes only
  • other coordinate systems (discrete, coded, ...)
  • cataloging against a fixed gazetteer w/ topological relationships
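The price of the “boxes only” simplification is easy to demonstrate: a query point can fall inside a footprint’s bounding box but outside the footprint itself. This sketch uses planar even-odd ray casting on a made-up L-shaped footprint; it ignores geodetic effects, which is part of why real polygon support needs specialized software.

```python
def bbox(polygon):
    """Bounding box (west, east, south, north) of a list of (lon, lat) vertices."""
    lons = [p[0] for p in polygon]
    lats = [p[1] for p in polygon]
    return min(lons), max(lons), min(lats), max(lats)


def in_box(pt, box):
    return box[0] <= pt[0] <= box[1] and box[2] <= pt[1] <= box[3]


def in_polygon(pt, polygon):
    """Even-odd ray casting in the plane; only a rough stand-in for geodetic tests."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# A made-up L-shaped footprint; its bounding box also covers the empty corner.
L_SHAPE = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
probe = (3, 3)
print(in_box(probe, bbox(L_SHAPE)), in_polygon(probe, L_SHAPE))  # True False
```

A boxes-only catalog would report the probe as a hit (a false positive); whether such false positives are acceptable is exactly the trade-off the slide raises.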
Summary
• Future directions
  • simpler participation model
  • collection-level discovery
  • remote deployment
  • NSDL/CI
• Legacy
  • production-quality software
  • copiously documented
  • no known bugs, omissions, or bottlenecks
  • in step with MIL
Cast of characters
• Dave Valentine: client, databases, testing, deployment
• Catherine Masi: MIL collection development
• Rudolf Nottrott: outreach, software development
• Greg Janée: overall design, core software development
• Jim Frew: guru