Civic Location Data eXchange Format (CLDXF). Christian Jacqz Director, MassGIS , Commonwealth of Massachusetts Member NENA Core Services / Data Structures / CLDXF Work Group. Purposes of CLDXF.
Purposes of CLDXF • Support the exchange of address data by providing a “definitive set of core civic location data elements” • Ensure portability of address data • Permit efficient design of software systems • Meet functional needs of call-routing and dispatch • Does not include all elements needed for local address data management • No address ID, no metadata, no data quality checks
Purposes of CLDXF • Map a profile between IETF PIDF-LO and NENA • PIDF: Presence Information Data Format • “hello, it’s me and I’m waiting for an answer” • LO: Location Object • “this is exactly where I am” • coordinate location or civic address • CLDXF added two (minor) elements to PIDF-LO and dropped six elements
Purposes of CLDXF • Map elements to the FGDC address standard • FGDC: Federal Geographic Data Committee • United States Thoroughfare, Landmark, & Postal Address Data Standard • Sponsored by NENA and URISA, managed by Census • Over 10 years in development • More complex than CLDXF • Provide illustrative examples of parsing • There are a lot of weird addresses out there!!
Why an address standard is so important • Data standardization per CLDXF will greatly facilitate “matching” between address records • How do you ensure that two records that refer to the same address can be matched in a database, without human intervention? • Street name match is most important • Unit matching is most difficult • Matching between datasets goes beyond the explicit goals of the standard but is (in my view) a tremendously important benefit of implementing it • However, remember that the addressing authority has final say on the name; additional content standards may be required • e.g. Fourth v. 4th
Why an address standard is so important • A standard allows for automated matching between many different address lists & mapping sources • ALI database may not provide a complete list • [Diagram: “address standard” at the center, linking geographic and tabular sources: US Census, Local Parcels, Field data collection, Local updates, Commercial data provider, Tax List, Utilities, ALI DB, Voter List]
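To make the matching argument concrete, here is a minimal Python sketch of joining two address lists on a key built from standardized elements. The field names (`number`, `RD`, `STS`, `A3`), the sample records, and the helper names are illustrative assumptions, not part of CLDXF or of any existing tool.

```python
# Hypothetical key: address number, street name, post type, municipality.
KEY_FIELDS = ("number", "RD", "STS", "A3")

def match_key(record):
    """Build a comparison key: uppercase, collapse whitespace."""
    return tuple(" ".join(str(record.get(f, "")).upper().split()) for f in KEY_FIELDS)

def match_records(left, right):
    """Pair each record in `left` with the records in `right` sharing its key."""
    index = {}
    for rec in right:
        index.setdefault(match_key(rec), []).append(rec)
    return [(rec, index.get(match_key(rec), [])) for rec in left]

# Made-up sample data for illustration only.
tax_list = [{"number": "12", "RD": "Fourth", "STS": "Street", "A3": "Belmont"}]
voter_list = [
    {"number": "12", "RD": "FOURTH", "STS": "STREET", "A3": "BELMONT", "voter": "A"},
    {"number": "12", "RD": "4TH", "STS": "STREET", "A3": "BELMONT", "voter": "B"},
]

matches = match_records(tax_list, voter_list)
for rec, found in matches:
    print(rec["RD"], "->", [m["voter"] for m in found])
```

Record B does not match because “4TH” and “FOURTH” are different strings, which is the “Fourth v. 4th” problem: element-level standardization gets you most of the way, and a content standard for the name itself closes the gap.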
How is CLDXF different from other standards? • No abbreviations • except State and Country • More levels of geography • Municipalities, communities, neighborhoods • Boundaries matter! • Complete parsing of street names • Fixes deficiencies in existing telco and USPS formats
How is CLDXF different? • Covers all possible numbering schemes • Number prefix, number, number suffix • Provides structure for subaddress information • Solves “kitchen sink” problem • Supports precision in address down to room & seat
How is CLDXF different? • XML standard • XML is an extensible markup language: documents must be “well-formed”, with properly nested tags etc. • About data, not presentation • Additionally, XML schemas validate an XML document, and namespaces keep element names unique • Example:
<note>
  <to>Ed</to>
  <from>Martha</from>
  <heading>Reminder</heading>
  <body>Take some time off!</body>
</note>
How is CLDXF different? • Data elements vs. database fields • In XML you have required or optional elements; in a database, the field layout is fixed and records can have null values • In XML, the nested hierarchy of tags is specified in a schema; in a database table, there is no hierarchy (although parent-child relationships are sometimes supported) • In XML, tags may be allowed to repeat within a “record”; in a database, one record has one value in one field
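The element-versus-field contrast can be sketched with Python's standard `xml.etree.ElementTree`. The document below is a made-up fragment: `RD` and `STS` are tags named later in this deck, while `address`, `LandmarkPart` and the `to_row` helper are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up record: PRD is absent (optional), LandmarkPart repeats.
DOC = """
<address>
  <RD>MAIN</RD>
  <STS>STREET</STS>
  <LandmarkPart>CAMPUS</LandmarkPart>
  <LandmarkPart>LIBRARY</LandmarkPart>
</address>
"""

def to_row(xml_text, fields, repeating):
    """Flatten one XML record into a fixed-layout row (a dict).

    Absent optional elements become None (a database null); repeating
    elements are joined with spaces so they fit into a single field.
    """
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    row = {}
    for field in fields:
        values = [el.text for el in root.findall(field)]
        if not values:
            row[field] = None
        elif field in repeating:
            row[field] = " ".join(values)
        else:
            row[field] = values[0]
    return row

row = to_row(DOC, ["PRD", "RD", "STS", "LandmarkPart"], {"LandmarkPart"})
print(row)
```

Joining repeated values into one field is only one possible choice; a child table keyed on the record would preserve the repetition at the cost of a join.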
About each element • CLDXF <-> PIDF-LO correspondence • What is it? (and definition source) • Examples • Data type • Does it have a domain? • Mandatory/conditional/optional • How many of this element? • Notes
CLDXF element groups • Country, State, and Place Name • Street Name • Address Number • Landmark Name • Subaddress • Address Descriptor
Country, State, and Place Names The easy ones – large geographies, well-defined legal status • Country Name / Country (Country) – mandatory • two-letter ISO code • State Name / State (A1) – mandatory • two-letter USPS code • Place Name / County (A2) - mandatory • The name of county or county-equivalent where the address is located.
Country, State, and Place Names • Where is a given street? • What place names are needed to make street names unique? • Incorporated Municipality (A3) – mandatory (“unincorporated” as default) • The general-purpose local governmental unit where the address is located • Must have legally established boundaries. • Need domain of muni names.
Addresses and boundaries • “…where the address is located.” All these structures are located in Cambridge but addressed in Belmont. You can’t list an address for Grove Street in Cambridge, because this Grove Street is not in Cambridge and there may well be another Grove Street that is. [Map labels: Grove Street, Cambridge, Belmont]
Addresses and boundaries • “…where the address is located.” In which municipality is this address located?
More Kinds of Place Names • Unincorporated Community (A4) – optional • Within an incorporated municipality, or in an unincorporated portion of a county • If not mapped, may be difficult to use • Distinguish from landmark: not single use or under single ownership and control • Neighborhood Community (A5) – optional • Neighborhood, subdivision or small commercial area • Postal Community Name and ZIP Code (PCN, PC) – optional but strongly recommended
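The place-name elements described so far could be held in a small record type like the Python sketch below. The tags (Country, A1 to A5, PCN, PC) and the mandatory/optional split come from the slides; the class name, the `problems` method, and the exact spelling of the “Unincorporated” default are assumptions for illustration. The checks only test the shape of the values, not membership in any domain.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PlaceNames:
    country: str                  # Country: two-letter ISO code (mandatory)
    a1: str                       # State: two-letter USPS code (mandatory)
    a2: str                       # County or county-equivalent (mandatory)
    a3: str = "Unincorporated"    # Incorporated municipality (assumed default text)
    a4: Optional[str] = None      # Unincorporated community (optional)
    a5: Optional[str] = None      # Neighborhood community (optional)
    pcn: Optional[str] = None     # Postal community name (recommended)
    pc: Optional[str] = None      # ZIP Code (recommended)

    def problems(self):
        """Return a list of human-readable problems; empty if none found."""
        issues = []
        for label, code in (("country", self.country), ("a1", self.a1)):
            if not (len(code) == 2 and code.isalpha() and code.isupper()):
                issues.append(f"{label}: expected a two-letter uppercase code, got {code!r}")
        for label, value in (("a2", self.a2), ("a3", self.a3)):
            if not value.strip():
                issues.append(f"{label}: mandatory element is empty")
        return issues

good = PlaceNames("US", "MA", "Middlesex", "Belmont")
bad = PlaceNames("USA", "Mass", "")
print(good.problems())
print(bad.problems())
```

A real implementation would also check each value against its domain (ISO country codes, USPS state codes, a statewide list of municipality names), which this sketch deliberately leaves out.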
National Domains for Place Names • Country, state, county, postal town and ZIP Code have domains • Local or statewide domains for incorporated municipalities • Include type of place e.g. ‘Township of North Hampden’ v. ‘Borough of North Hampden’ • Mapping the boundaries makes any place name much more useful
How MA uses place names • All of MA is incorporated municipalities (A3) • Survey-level boundaries, legally defined • “MSAG community” is the geography in MA that ensures the uniqueness of street names (A4) • A4 boundaries are mapped, and strictly nested within A3 • Distinguish from PSAP boundaries: the existing MSAG has a real problem with this • ZIP Codes are useful, but a nightmare to map
Parts of Street Name (PIDF-LO element) • Street Name Pre Modifier (PRM) • Street Name Pre Directional (PRD) • Street Name Pre Type (STP) • Street Name Pre Type Separator (STPS), added to the US profile of PIDF-LO to match FGDC • Street Name (RD) • Street Name Post Type (STS) • Street Name Post Directional (POD) • Street Name Post Modifier (POM)
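Because the eight parts have a fixed order, a complete street name can be reassembled from whichever parts are present. The Python sketch below assumes the parts arrive as a dict keyed by the PIDF-LO tags listed above; the function name is hypothetical, and the two sample parses are one reading of the quiz addresses, not answers taken from the standard.

```python
# Element order as listed on the slide (PIDF-LO tags).
ELEMENT_ORDER = ["PRM", "PRD", "STP", "STPS", "RD", "STS", "POD", "POM"]

def full_street_name(parts):
    """Join the street name parts that are present, in element order."""
    unknown = set(parts) - set(ELEMENT_ORDER)
    if unknown:
        raise ValueError(f"unknown street name elements: {sorted(unknown)}")
    return " ".join(parts[tag] for tag in ELEMENT_ORDER if parts.get(tag))

# Assumed parses, for illustration only.
old_route = {"PRM": "OLD", "STP": "STATE ROUTE", "RD": "1"}
river_road = {"RD": "RIVER", "STPS": "TO THE", "STP": "ROAD"}

print(full_street_name(old_route))
print(full_street_name(river_road))
```

Storing the parts and deriving the full name, rather than the reverse, means the display string can always be rebuilt, while the parse never has to be guessed again.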
Familiar elements • No abbreviations. IMHO, this is a very good thing. • Example: “N JOHNSON TR”. Is it “NORTH JOHNSON TRAIL” or “NEIL JOHNSON TERRACE”? • Any list of abbreviations will need constant maintenance • Domains for Pre/Post Types at http://technet.nena.org/nrs/registry/_registries.xml
A Few Twists on Familiar Elements • Two types • Multiword types • Local Knowledge Required
Not-so-familiar elements • Modifiers • Separated from name, not a type word or phrase • Separated from name, before or after directional
Not-so-familiar elements (continued) • Street Name Pre Type Separator • Added to match FGDC Separator Element • Preposition or prepositional phrase that “separates” pre type from name • ‘northbound’ and ‘southbound’ modifiers
Content standards to support matching • In CLDXF, the local addressing authority has broad discretion about what goes into the name • Domains apply to types and directionals, not modifiers • IMHO, NENA should recommend best practices for street name content, such as: • No abbreviations (maybe except often-misspelled honorifics: “LIEUTENANT” = “LT”, “MONSIGNOR” = “MSGR”) • Use the “official” name including special characters (“MARY’S WAY”; note that CLDXF supports these) • Have a rule for numbering, e.g. “First” through “Tenth”, then “11th” and up
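A numbering rule like the one suggested above is easy to automate. The Python sketch below implements that example rule only (words for 1 to 10, digits plus suffix from 11 up); the function names and the mixed-case output are assumptions, and a local addressing authority may well choose a different rule.

```python
import re

ORDINAL_WORDS = ["First", "Second", "Third", "Fourth", "Fifth",
                 "Sixth", "Seventh", "Eighth", "Ninth", "Tenth"]

def ordinal_suffix(n):
    """Return the English ordinal suffix for n (11th, 12th, 13th are special)."""
    if 10 <= n % 100 <= 20:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")

def normalize_numbered_name(name):
    """Apply the example rule: 'First'..'Tenth', then '11th' and up."""
    text = name.strip()
    m = re.fullmatch(r"(\d+)(?:st|nd|rd|th)?", text, re.IGNORECASE)
    if m:
        n = int(m.group(1))
        if n < 1:
            return text  # zero is left for a human to review
        if n <= 10:
            return ORDINAL_WORDS[n - 1]
        return f"{n}{ordinal_suffix(n)}"
    for word in ORDINAL_WORDS:
        if word.lower() == text.lower():
            return word  # already spelled out; fix the case only
    return text  # not a numbered name; leave untouched

for raw in ["4th", "FOURTH", "11", "22ND", "Main"]:
    print(raw, "->", normalize_numbered_name(raw))
```

With a rule like this applied on both sides, “Fourth” and “4th” produce the same base name and the records can match without human review.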
Issues with domains • If you are trying to support legacy systems with linear geocoding: • What do you do with feature names like “Apartments” or “Commons” that don’t properly refer to a linear, drivable feature? • All kinds of things can appear on a street sign that are legitimate street names with no type or an implicit type (e.g. “BROADWAY”): “BLUE FIN”, “SAIL-A-WAY”, “ASSINNASHAMAYAK”
Parsing Street Names using CLDXF • Open source parser to implement CLDXF • Process street names as raw inputs • Identify possible element types for each word or phrase, using a lookup (~1000 records) from abbreviations to the domains for directionals and types, plus a list of base names that could otherwise be misread as another element • Enforce ordering of elements • Score viable candidates • Annotate invalid records
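Very simplified automated parsing example • Input: “E ST” • Lookup: each word is mapped to the element types it could be • Candidates: Cartesian product of the per-word options • Element order: candidates that break the element ordering are dropped • Rules: STP or STS: must have type; RD: must have name • [Diagram residue: lookup columns listing RD, POM, PRD, POD, RD and RD, STS, joined by “X” for the Cartesian product]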
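The “E ST” walk-through can be sketched in a few lines of Python. This is not the open source parser mentioned earlier: the lookup table contents, the function name, and the strict-ordering check are assumptions chosen to mirror the steps on the slide (lookup, Cartesian product, element order, rules).

```python
from itertools import product

# Element order from the street-name slide (PIDF-LO tags).
ELEMENT_ORDER = ["PRM", "PRD", "STP", "STPS", "RD", "STS", "POD", "POM"]
RANK = {tag: i for i, tag in enumerate(ELEMENT_ORDER)}

# Hypothetical lookup: element types each token could be.
LOOKUP = {
    "E": ["PRD", "POD", "RD"],   # EAST as a directional, or the name "E"
    "ST": ["RD", "STS"],         # a name, or STREET as a post type
}

def viable_parses(tokens, require_type=True):
    """Enumerate tag assignments and keep those that pass the rules."""
    options = [LOOKUP.get(tok, ["RD"]) for tok in tokens]  # unknown word: name
    results = []
    for combo in product(*options):  # Cartesian product of per-token options
        ranks = [RANK[tag] for tag in combo]
        in_order = all(a < b for a, b in zip(ranks, ranks[1:]))
        has_name = "RD" in combo
        has_type = "STP" in combo or "STS" in combo
        if in_order and has_name and (has_type or not require_type):
            results.append(list(zip(tokens, combo)))
    return results

strict = viable_parses(["E", "ST"])
relaxed = viable_parses(["E", "ST"], require_type=False)
print(strict)
print(relaxed)
```

With both rules on, only name “E” plus post type “ST” survives. Relaxing the type rule also admits directional “E” plus name “ST”, which is why a real parser needs to score candidates rather than stop at the first valid one. The strict ordering check also means multi-word names would need to be grouped into phrases before this step.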
What was the point of that?! • CLDXF can be implemented in code to standardize street names and to deal with all aspects of parsing and matching except: • Alternate spellings of base name • MLK Blvd v. Martin Luther King Blvd • MsgrOBrien v. Monsignor Martin J. O’Brien • Msschsts Ave v. Massachusetts Ave • Concatenation of full street name and subaddress • Location, unit, building or other info • Ambiguous sequence of address number • 47 | A J Handy Drive v. 47 A | J Handy Drive
Address Number • What is an address number? • Ideally, the number part indicates a location in sequence along a road, respecting parity • At a minimum, the full address number uniquely identifies one of the following • a site or a group of structures • a single structure • a part of a structure • or some other location like an undeveloped parcel with reference to a named street
Address Number – further thoughts • Unfortunately, address numbers are often used to encode other kinds of information: • Sector • Cross street or block • Building, Floor, Unit • Decoding the pattern may be useful • Splitting the full number into prefix, number and suffix should preserve the sequence information, if any • Zero should not be used to indicate no address number
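Splitting a full address number into prefix, number and suffix can be approximated with one regular expression. The Python sketch below is an assumption about how such a split might work, not the CLDXF rule set: in particular, keeping the hyphen with the prefix and treating the first run of digits as the number are choices made here for illustration. A missing number is returned as `None` rather than zero.

```python
import re

# prefix: leading non-digits; number: first run of digits; suffix: the rest.
ADDRESS_NUMBER = re.compile(
    r"^\s*(?P<prefix>\D*?)\s*(?P<number>\d+)\s*(?P<suffix>.*?)\s*$"
)

def split_address_number(text):
    """Return prefix/number/suffix, or None when there is no number at all."""
    m = ADDRESS_NUMBER.match(text)
    if m is None:
        return None  # no digits: report "no number", never a fake zero
    return {
        "prefix": m.group("prefix"),
        "number": int(m.group("number")),  # kept numeric to preserve sequence
        "suffix": m.group("suffix"),
    }

for raw in ["123", "A-17", "22 A", "289 ½", ""]:
    print(repr(raw), "->", split_address_number(raw))
```

A split like this cannot resolve cases such as “47 A J Handy Drive”, where the “A” may belong to the number or to the street name, and it treats a range like “14-16” as number 14 with suffix “-16”; both need local knowledge or review.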
Weirdo numbering • sample number parsing – odd cases • Again, if possible, decode the assignment • Mileposts:
Parsing Quiz • 123 North Street
Parsing Quiz • Tunnel Massachusetts Bay Transit Authority Green Line Haymarket to North Station
Parsing Quiz • 289 ½ Broadway South
Parsing Quiz • 22 A West Virginia Avenue
Parsing Quiz • Interstate Highway 495 northbound
Parsing Quiz • Old State Route 1
Parsing Quiz • A-17 Warren Street Court
Parsing Quiz • Avenue C Loop
Parsing Quiz • Summit County Road 99
Parsing Quiz • 72 Road to the River
Parsing Quiz • 14-16 Main Street (trick question)
Landmarks and landmark parts • Landmark: “Name by which a prominent feature is publicly known” • Landmark part added by CLDXF as extension of PIDF-LO. Usually involves a geographic hierarchy. • Landmark part is a repeating tag, so doesn’t neatly translate to fields • Order is not specified (e.g. smallest -> largest); parts are concatenated with spaces
One way to manage landmark parts • If you are managing “sites” as a separate geographic layer, with sub-sites and named buildings mapped: • When is something “a prominent feature, publicly known”? • When is something a building and when is it a landmark? • Note: a landmark is a complete, valid address
Subaddress elements (PIDF-LO) • In USPS or ALI database, typically unstructured info • CLDXF provides hierarchy of building, floor, unit • FGDC allows for flexibility in typing subaddress components; CLDXF suggests including the type word
Subaddress issues • Not always clear what goes into “building” v. “landmark” • “Generic” identifiers, like numbers or letters, go into the building field, whereas names go into the landmark field, but what counts as “publicly known”? • Many ways “building,” “floor” or “unit” can be represented or abbreviated in inputs • Identifiers can be encoded into the unit field; this may be an area for content standards • “#5” “Apt. 5” “Unit 5” “No. 5” all refer to Unit 5 • “7B” “A-5C” “B12” all contain reference to building or floor as well as unit
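A content standard for the unit field could start with something as small as the Python sketch below, which maps the four spellings above to “Unit 5”. The list of markers and the function name are assumptions; it does not try to decode identifiers like “7B” or “A-5C” into building, floor and unit.

```python
import re

# Hypothetical list of leading markers that all mean "unit".
UNIT_MARKER = re.compile(
    r"^\s*(?:#|(?:apartment|apt|unit|number|no)\b\.?)\s*",
    re.IGNORECASE,
)

def normalize_unit(raw):
    """Strip a leading unit marker and emit 'Unit <identifier>'."""
    identifier = UNIT_MARKER.sub("", raw).strip()
    return f"Unit {identifier}" if identifier else None

for raw in ["#5", "Apt. 5", "Unit 5", "No. 5", "7B"]:
    print(repr(raw), "->", normalize_unit(raw))
```

The word-boundary check keeps a value like “North 5” from being read as “No” plus “rth 5”. Collapsing every marker to “Unit” follows the slide's example, but it discards the original type word, so a system that needs “Apartment” versus “Suite” would keep the type instead.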
One last element • Not part of the address, but an attribute • Domain is at http://www.iana.org/assignments/location-type-registry/locationtype-registry.xml