Learn about the process of building a high-performance gazetteer database, including handling ambiguities, classification, and storage systems for geographic features. Explore database production, conversion scripts, tables used, and exporting techniques.
On building a high-performance gazetteer database
Amittai Axelrod, MetaCarta Inc
Thanks to Keith Baker, Kenneth Baker, Michael Bukatin, András Kornai
Plan of the talk
• Database background
• Relating geographic names and features
• Handling ambiguities and inconsistencies in geographic names
• Classification and storage system for geographic features
Databases
• No DB (faking it with flat files) -- clumsy
• Record-oriented -- still runs the world
• Relational -- making headway
• Object-oriented -- still very academic
• For MetaCarta GazDB, relational approach made most sense (sketched below):
  • Overlapping records (McKinley/Denali)
  • Need for frequent updates of subparts of records
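The McKinley/Denali case is the motivating example: two names, one mountain. A minimal sketch of how the relational split handles it, with hypothetical table and column names (the talk does not give the actual GazDB schema):

```sql
-- Hypothetical sketch: names live in their own table, so several
-- name rows can point at one feature row.
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    lat        DOUBLE PRECISION,
    lon        DOUBLE PRECISION
);

CREATE TABLE name (
    name_id    INTEGER PRIMARY KEY,
    feature_id INTEGER REFERENCES feature(feature_id),
    name       TEXT
);

-- Overlapping records: McKinley and Denali are one feature.
INSERT INTO feature VALUES (1, 63.0692, -151.0070);
INSERT INTO name VALUES (1, 1, 'Mount McKinley');
INSERT INTO name VALUES (2, 1, 'Denali');
```

Either name can then be added, corrected, or retired without touching the feature row, which is what makes frequent updates of subparts of records cheap.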
Conversion scripts
• Enforce uniform structure on the data
• Normalize across sources (e.g. lat/lon to decimal degrees, spelling, …) -- see the sketch below
• Configuration required once per source
• Load data into GazDB
• Combination of perl/SQL
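As one concrete instance of the normalization step, a source that delivers coordinates in degrees/minutes/seconds can be folded into decimal degrees during the load. A hedged sketch, assuming a hypothetical staging table `staging_source` with sign columns (+1/-1 for N/S and E/W); the real scripts are perl/SQL and configured per source:

```sql
-- Hypothetical normalization pass: DMS columns from a staging table
-- become signed decimal degrees in the GazDB feature table.
INSERT INTO feature (feature_id, lat, lon)
SELECT src_id,
       lat_sign * (lat_deg + lat_min / 60.0 + lat_sec / 3600.0),
       lon_sign * (lon_deg + lon_min / 60.0 + lon_sec / 3600.0)
FROM staging_source;
```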
Other tables used in GazDB
• Population (see the sketch below)
• Elevation
• Language
• Feature type
• Source/versioning info
• Temporal extent
• Hierarchical information
• Confidence
• Comments
• Change logs (full auditing)
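Keeping each attribute in its own table, keyed on the feature id, is what makes updates of subparts of records cheap: a new population estimate touches one small table and leaves the rest of the record alone. A hypothetical sketch of two such tables (illustrative names and columns, not MetaCarta's actual schema):

```sql
-- Population rows carry their own source, temporal extent, and
-- confidence, so competing or superseded values can coexist and
-- changes can be fully audited.
CREATE TABLE population (
    feature_id INTEGER,     -- the feature this row describes
    population BIGINT,
    source_id  INTEGER,     -- which source asserted the value
    valid_from DATE,        -- temporal extent
    valid_to   DATE,
    confidence REAL
);

CREATE TABLE elevation (
    feature_id  INTEGER,
    elevation_m INTEGER,    -- metres above sea level
    source_id   INTEGER
);
```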
Geographic names
• Internationalization
  • Full Unicode (UTF-8) support
  • Maintain detailed language information (SIL)
• Name resolution (see the sketch below)
  • Canonical form (16-bit)
  • Display form (8-bit)
  • Search form (6-bit)
• Authoritativeness
• Explicitness
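Expanding the earlier name sketch along these lines, each name row might carry the three parallel forms plus its language code and the two flags; column names here are hypothetical:

```sql
-- Hypothetical name table: one name in three forms, from full
-- detail down to a folded key used for matching.
CREATE TABLE name (
    name_id          INTEGER PRIMARY KEY,
    feature_id       INTEGER,   -- the feature this name denotes
    canonical        TEXT,      -- 16-bit canonical form (full Unicode)
    display          TEXT,      -- 8-bit display form
    search           TEXT,      -- 6-bit search form, folded for lookup
    lang             CHAR(3),   -- SIL language code
    is_authoritative BOOLEAN,
    is_explicit      BOOLEAN
);

-- e.g. canonical 'Москва', display 'Moskva', search 'moskva'
```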
Geographic features
• Spatial representations (see the feature-table sketch below)
  • Point, line, area, …
• Functional classes
  • Building, field, campus, city, …
• Administrative types
  • Nation, province, county, international org, …
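The feature side, again hypothetically, would carry the spatial representation together with the two classification axes:

```sql
-- Hypothetical feature table: geometry plus functional and
-- administrative classification.
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    geom_type  TEXT,              -- point, line, area, ...
    lat        DOUBLE PRECISION,  -- representative point
    lon        DOUBLE PRECISION,
    func_class TEXT,              -- building, field, campus, city, ...
    admin_type TEXT               -- nation, province, county, ...
);
```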
Export scripts
• Read GazDB
• Select which fields to include in custom output (see the sketch below)
• Create .gbdm (MetaCarta format) binaries
• Combination of perl/SQL
• Not yet general across binary output formats
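The field-selection step might reduce to a join that pulls exactly the configured columns, with a perl driver streaming the rows into the .gbdm writer; the query below is a hypothetical illustration, not the actual export code:

```sql
-- Hypothetical selection for one custom gazetteer: English search
-- forms of cities, with coordinates.
SELECT n.search, f.lat, f.lon
FROM name n
JOIN feature f ON f.feature_id = n.feature_id
WHERE n.lang = 'eng'
  AND f.func_class = 'city';
```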
Conclusions
• Accepts multiple sources (configuration needed only once per source)
• Fast loading of large datasets (1M entries per hour on a Linux desktop)
• Simple update procedure
• Outputs large custom binary gazetteers for different purposes at very high speed (1M entries per minute)