“Duplicate” Entries in Gazetteers. Jordan Hastings, Department of Geography, University of California, Santa Barbara
Names & Features (1) • Naming Features in the Environment • Linguistic Necessity • Identity and Ownership • Navigation and Wayfinding • Features Cover a Large Territory • Crisp or Diffuse • Compact or Extended • Tangible or Abstract
Names & Features (2) • Locations are Numerous & Various • Multiscale • Generalized • Dis-coordinated • Time-variant
Names & Features (3) • Names are Numerous & Various • Polynymous • Misspelled • Multilingual • Time-variant
Names & Features (4) • Lake Bigler, through the 1920s • Lake Bonpland (also Bondland), through the 1890s • Da-ow-a-ga, through the 1850s
Feature Types (1) • Dependable Type System • Because Features are “Objects” • Because Human Mind Categorizes • Types present in Taxonomy • Hierarchy is Natural in Environment • Because Human Mind Categorizes
Feature Types (2) - Examples • Cultural Environment • Nations --> States --> Provinces --> Districts
Feature Types (2) - Examples • Physical Environment • Watersources: Springs --> Seeps • Watercourses: Rivers --> Streams --> Creeks • Waterbodies: Lakes --> Ponds --> Sloughs • Glaciers?
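One way to make "close in hierarchy" concrete for the physical examples above is to count the links between two types in a tree. The sketch below is an illustration only: it assumes each arrow on the slide is a parent-to-child link, and the root name `physical`, the `PARENT` table, and the function names are hypothetical, not taken from the talk.

```python
# Hypothetical encoding of the slide's physical-environment examples.
# Assumption: each "-->" is read as a parent --> child link, and the three
# categories hang off an invented root called "physical".
PARENT = {
    "watersource": "physical",
    "watercourse": "physical",
    "waterbody": "physical",
    "spring": "watersource",
    "seep": "spring",
    "river": "watercourse",
    "stream": "river",
    "creek": "stream",
    "lake": "waterbody",
    "pond": "lake",
    "slough": "pond",
}


def ancestors(feature_type):
    """Return the chain from a type up to its root, starting with the type itself."""
    chain = [feature_type]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def type_distance(a, b):
    """Count links between two types via their nearest common ancestor.

    Returns None when the types share no ancestor (e.g. a type that is
    not in the table at all).
    """
    up_a = ancestors(a)
    steps_in_b = {t: i for i, t in enumerate(ancestors(b))}
    for steps_in_a, t in enumerate(up_a):
        if t in steps_in_b:
            return steps_in_a + steps_in_b[t]
    return None
```

Under these assumptions `type_distance("stream", "creek")` is 1, while a creek and a pond are seven links apart; a type left out of the table, such as the glaciers the slide questions, yields `None`.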
Fundaments (1) • Definition: a gazetteer is a spatial dictionary of named & typed features in the environment • Implications • Features uniquely identified • Searchable by name and type • Also searchable geospatially
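The three implications can be sketched as a tiny in-memory structure. This is a minimal illustration, not the database described in the talk: the `Feature` and `Gazetteer` classes, their method names, and the sample entries are all hypothetical, and each location is reduced to a single point for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    feature_id: int     # unique identifier within the gazetteer
    feature_type: str   # e.g. "lake", "creek"
    names: tuple        # a feature may carry several names
    lon: float          # simplification: one point per feature
    lat: float


class Gazetteer:
    def __init__(self):
        self._by_id = {}

    def add(self, feature):
        # Enforce "features uniquely identified".
        if feature.feature_id in self._by_id:
            raise ValueError(f"id {feature.feature_id} already in use")
        self._by_id[feature.feature_id] = feature

    def find(self, name, feature_type=None):
        """Search by name (case-insensitive), optionally restricted by type."""
        key = name.casefold()
        return [
            f for f in self._by_id.values()
            if any(n.casefold() == key for n in f.names)
            and (feature_type is None or f.feature_type == feature_type)
        ]

    def within(self, west, south, east, north):
        """Geospatial search: features whose point lies inside a bounding box."""
        return [
            f for f in self._by_id.values()
            if west <= f.lon <= east and south <= f.lat <= north
        ]


# Invented sample entries, for illustration only.
gaz = Gazetteer()
gaz.add(Feature(1, "lake", ("Mirror Lake", "Lago Espejo"), -120.0, 39.0))
gaz.add(Feature(2, "creek", ("Mirror Creek",), -119.5, 39.2))
```

A real gazetteer would store full geometries and use a spatial index rather than scanning every feature; the linear scans here only show the three kinds of lookup.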
Fundaments (2) • Duplicates: An approximate notion • Firm types, ±close in hierarchy • Locations ±close dependent on scale • Names ±close dependent on language … or not at all • All aspects variant in time
Fundaments (3) • Database Implications / Support • Custom Datatypes • Hierarchy • Geometry • Multiple Attribution (unlimited) • Names • Locations • Efficient Geospatial Processing
Approach (1) • Independent Measures of Duplicates • 1. Type Thesaurus Metrics • Inter-feature: hierarchy, explicit linkages • 2. Geospatial Metrics • Intra-feature: size, compactness, … • Inter-feature: distance, overlap, … • 3. Geonomial Metrics • Intra-feature: NL translation [not considered yet] • Intra-feature: stemming, soundex, substitution
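Two of the measures named above, inter-feature distance and name comparison, can be sketched with the Python standard library. The talk does not say how its metrics were computed, so everything here is an assumption: great-circle distance stands in for the geospatial measure, classic American Soundex for the phonetic one, and `difflib.SequenceMatcher` for a substitution-tolerant string comparison. The function names are hypothetical.

```python
import math
from difflib import SequenceMatcher


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


_SOUNDEX_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                   ("L", "4"), ("MN", "5"), ("R", "6"))
_SOUNDEX_CODE = {ch: digit for letters, digit in _SOUNDEX_GROUPS for ch in letters}


def soundex(name):
    """Classic four-character American Soundex code of a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    previous = _SOUNDEX_CODE.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = _SOUNDEX_CODE.get(ch, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # a vowel resets the run of repeated codes
    return (code + "000")[:4]


def name_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] from difflib."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()
```

The variant spellings Bonpland and Bondland from the earlier slide show why no single name metric suffices: their Soundex codes differ (`B514` versus `B534`) because the changed letter falls in a different code group, yet `name_similarity` still rates them as close.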
Approach (2) • Unified Assessment of Duplicates • Weighted Combination of Measures • 1 Type • 2 Location(s) • 3 Name(s) • Geographic Visualization, over Maps • Final Authority of Human Cataloger
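The weighted combination might look like the sketch below. The slides give neither the weights nor the scale of each measure, so this assumes every measure has already been normalized to a similarity in [0, 1] and uses a plain weighted average; `duplicate_score` and its parameters are hypothetical names.

```python
def duplicate_score(type_sim, location_sim, name_sim, weights=(1.0, 1.0, 1.0)):
    """Weighted average of three similarities, each assumed to lie in [0, 1].

    The order of `weights` follows the slide: type, location(s), name(s).
    Equal weights are a placeholder, not a recommendation from the talk.
    """
    sims = (type_sim, location_sim, name_sim)
    if any(not 0.0 <= s <= 1.0 for s in sims):
        raise ValueError("each similarity must lie in [0, 1]")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)
```

For example, `duplicate_score(1.0, 0.5, 0.0, weights=(1, 2, 1))` gives 0.5. In keeping with the last bullet, such a score would only rank candidate pairs for the cataloger, who makes the final call.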
Gazetteer “Duplicates” Processing Cycle • [Diagram; labels only] • random features • prep • grouped features • rework • weigh • review • accepted • suspended • post • feature database • reject • trash
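Only the labels of the cycle diagram survive in this transcript, not its arrows, so the sketch below is a guess at one plausible reading rather than the author's workflow. It assumes that prep turns random features into grouped features, that weigh sorts groups into accepted or suspended, that a suspended group can be reviewed into acceptance, sent back for rework, or rejected to the trash, and that accepted groups are posted to the feature database. The `TRANSITIONS` table and `advance` function are hypothetical.

```python
# Assumed reading of the diagram: state -> action -> possible next states.
# Every arrow here is a guess; only the labels come from the slide.
TRANSITIONS = {
    "random features": {"prep": ("grouped features",)},
    "grouped features": {"weigh": ("accepted", "suspended")},
    "suspended": {
        "review": ("accepted",),
        "rework": ("grouped features",),
        "reject": ("trash",),
    },
    "accepted": {"post": ("feature database",)},
}


def advance(state, action, outcome=None):
    """Return the next state, or raise ValueError if the move is not allowed.

    `outcome` is required only when an action has more than one possible
    result (here, "weigh").
    """
    options = TRANSITIONS.get(state, {}).get(action)
    if options is None:
        raise ValueError(f"{action!r} is not allowed from {state!r}")
    if outcome is None:
        if len(options) != 1:
            raise ValueError(f"{action!r} needs an outcome, one of {options}")
        return options[0]
    if outcome not in options:
        raise ValueError(f"{outcome!r} is not a result of {action!r}")
    return outcome
```

A group that is weighed, suspended, and then rejected would pass through `advance("grouped features", "weigh", "suspended")` and `advance("suspended", "reject")`, ending in `"trash"`.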