
Civic Location Data eXchange Format (CLDXF)

Christian Jacqz, Director, MassGIS, Commonwealth of Massachusetts; Member, NENA Core Services / Data Structures / CLDXF Work Group.



Presentation Transcript


  1. Civic Location Data eXchange Format (CLDXF) • Christian Jacqz, Director, MassGIS, Commonwealth of Massachusetts • Member, NENA Core Services / Data Structures / CLDXF Work Group

  2. Purposes of CLDXF • Support the exchange of address data by providing a “definitive set of core civic location data elements” • Ensure portability of address data • Permit efficient design of software systems • Meet functional needs of call-routing and dispatch • Does not include all elements needed for local address data management: no address ID, no metadata, no data quality checks

  3. Purposes of CLDXF • Map a profile between IETF PIDF-LO and NENA • PIDF: Presence Information Data Format (“hello, it’s me and I’m waiting for an answer”) • LO: Location Object (“this is exactly where I am”), either a coordinate location or a civic address • CLDXF added two (minor) elements to PIDF-LO and dropped six elements

  4. Purposes of CLDXF • Map elements to FGDC address standard • FGDC: Federal Geographic Data Committee • United States Thoroughfare, Landmark, & Postal Address Data Standard • Sponsored by NENA and URISA, managed by Census • Over 10 years in development • More complex than CLDXF • Provide illustrative examples of parsing • There are a lot of weird addresses out there!

  5. Why address standard is so important • Data standardization per CLDXF will greatly facilitate “matching” between address records • How to ensure that two records that refer to the same address can be matched in a database, without human intervention? • Street name match is most important • Unit matching is most difficult • Matching between datasets goes beyond the explicit goals of the standard but is (in my view) a tremendously important benefit of implementing the standard • However, remember that the addressing authority has final say on the name; additional content standards may be required • Fourth v. 4th
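To illustrate the matching point on slide 5, here is a minimal sketch that compares two records by a normalized street-name key. The `ORDINALS` table and the `match_key` function are hypothetical helpers invented for this example; they are not part of CLDXF and only show why a content rule such as Fourth v. 4th matters for automated matching.

```python
import re

# Hypothetical content rule: spell out small ordinals so "4TH" and "FOURTH" agree.
ORDINALS = {"1ST": "FIRST", "2ND": "SECOND", "3RD": "THIRD",
            "4TH": "FOURTH", "5TH": "FIFTH"}

def match_key(street_name: str) -> str:
    """Build a comparison key: uppercase, collapse whitespace, normalize ordinals."""
    words = re.sub(r"\s+", " ", street_name.strip().upper()).split(" ")
    return " ".join(ORDINALS.get(word, word) for word in words)

record_a = {"number": "12", "street": "Fourth Street"}
record_b = {"number": "12", "street": "4th  street"}

# Two records match when the address number and the normalized street name agree.
same = (record_a["number"] == record_b["number"]
        and match_key(record_a["street"]) == match_key(record_b["street"]))
print(same)
```

Without an agreed content rule, the two street names above would only match after human review, which is the situation the slide warns about.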

  6. Why address standard is so important • A standard allows for automated matching between many different address lists & mapping sources • ALI database may not provide a complete list • [Diagram: address standard at the center, linking geographic and tabular sources: US Census, local parcels, field data collection, local updates, commercial data provider, tax list, utilities, ALI DB, voter list]

  7. How is CLDXF different from other standards? • No abbreviations • except State and Country • More levels of geography • Municipalities, communities, neighborhoods • Boundaries matter! • Complete parsing of street names • Fixes deficiencies in existing telco and USPS formats

  8. How is CLDXF different? • Covers all possible numbering schemes • Number prefix, number, number suffix • Provides structure for subaddress information • Solves “kitchen sink” problem • Supports precision in address down to room & seat

  9. How is CLDXF different? • XML standard • XML is an extensible markup language – documents must be “well-formed” with nested tags etc. • About data, not presentation • Additionally, XML schemas validate an XML document, and namespaces ensure element names are unique • Example: <note> <to>Ed</to> <from>Martha</from> <heading>Reminder</heading> <body>Take some time off!</body> </note>
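A small sketch of what “well-formed” means in practice, using the note example from slide 9 and Python's standard `xml.etree.ElementTree` parser. The `is_well_formed` helper is a name made up for this illustration. It checks well-formedness only; validating against a schema would need a separate tool.

```python
import xml.etree.ElementTree as ET

good = ("<note><to>Ed</to><from>Martha</from>"
        "<heading>Reminder</heading><body>Take some time off!</body></note>")
bad = "<note><to>Ed</from></note>"  # closing tag does not match the opening tag

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(good), is_well_formed(bad))
```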

  10. How is CLDXF different? • Data elements vs. database fields • In XML you have required or optional elements; in a database, the field layout is fixed and records can have null values • In XML, the nested hierarchy of tags is specified in a schema; in a database table, there is no hierarchy (although parent-child relationships are sometimes supported) • In XML, tags may be allowed to repeat within a “record”; in a database, one record has one value in one field
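The difference between elements and fields on slide 10 can be sketched by flattening a small XML fragment into a fixed record. The tag names (`address`, `street`, `landmarkPart`, `unit`) are hypothetical and chosen only for illustration; they are not the actual CLDXF or PIDF-LO element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical tags: one repeating element and one optional element that is absent.
doc = """<address>
  <street>MAIN STREET</street>
  <landmarkPart>STATE UNIVERSITY</landmarkPart>
  <landmarkPart>NORTH CAMPUS</landmarkPart>
</address>"""

root = ET.fromstring(doc)

# A repeating tag has to be collapsed to fit one value in one field.
parts = [element.text for element in root.findall("landmarkPart")]

record = {
    "street": root.findtext("street"),
    "landmark": " ".join(parts),
    "unit": root.findtext("unit"),  # optional element missing -> null (None)
}
print(record)
```

Joining the repeated values with spaces is one possible design choice; a separate child table would be another.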

  11. About each element • CLDXF <-> PIDF-LO correspondence • What is it? (and definition source) • Examples • Data type • Does it have a domain? • Mandatory/conditional/optional • How many of this element? • Notes

  12. CLDXF element groups • Country, State, and Place Name • Street Name • Address Number • Landmark Name • Subaddress • Address Descriptor

  13. Country, State, and Place Names The easy ones – large geographies, well-defined legal status • Country Name / Country (Country) – mandatory • two-letter ISO code • State Name / State (A1) – mandatory • two-letter USPS code • Place Name / County (A2) - mandatory • The name of county or county-equivalent where the address is located.

  14. Country, State, and Place Names • Where is a given street? • What place names are needed to make street names unique? • Incorporated Municipality (A3) – mandatory (“unincorporated” as default) • The general-purpose local governmental unit where the address is located • Must have legally established boundaries. • Need domain of muni names.

  15. Addresses and boundaries • “…where the address is located.” • All these structures are located in Cambridge but addressed in Belmont. • You can’t list an address for Grove Street in Cambridge, because this Grove Street is not in Cambridge and there very well might be another Grove Street that is. • [Map labels: Grove Street, Cambridge, Belmont]

  16. Addresses and boundaries • “…where the address is located.” In which municipality is this address located?

  17. More Kinds of Place Names • Unincorporated Community (A4) – optional • Within an incorporated municipality, or in an unincorporated portion of a county • If not mapped, may be difficult to use • Distinguish from landmark – not single use or under single ownership and control. • Neighborhood Community (A5) – optional • Neighborhood, subdivision or small commercial area. • Postal Community Name and ZIP Code (PCN, PC) – optional but strongly recommended

  18. National Domains for Place Names • Country, state, county, postal town and zip code have domains • Local or statewide domains for incorporated municipalities • Include type of place e.g. ‘Township of North Hampden’ v. ‘Borough of North Hampden’ • Mapping of boundaries makes use of any place name much more useful

  19. How MA uses place names • All of MA is incorporated municipalities (A3) • Survey level boundaries, legally defined • “MSAG community” is the geography in MA that ensures the uniqueness of street names (A4) • A4 boundaries are mapped, and strictly nested within A3 • Distinguish from PSAP boundaries – existing MSAG has a real problem with this • Zip codes are useful, but a nightmare to map

  20. Parts of Street Name (PIDF-LO element) • Street Name Pre Modifier (PRM) • Street Name Pre Directional (PRD) • Street Name Pre Type (STP) • Street Name Pre Type Separator (STPS), added to US profile of PIDF-LO to match FGDC • Street Name (RD) • Street Name Post Type (STS) • Street Name Post Directional (POD) • Street Name Post Modifier (POM)
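As a sketch of how the eight parts on slide 20 fit together, the class below stores each part separately and reassembles the full street name in the listed order. The `StreetName` class and its `full_name` method are illustrative names, and the two example parses are assumptions made for this sketch rather than official CLDXF examples.

```python
from dataclasses import dataclass, fields

@dataclass
class StreetName:
    # Fields are declared in the order the elements appear in a full street name.
    prm: str = ""   # Street Name Pre Modifier
    prd: str = ""   # Street Name Pre Directional
    stp: str = ""   # Street Name Pre Type
    stps: str = ""  # Street Name Pre Type Separator
    rd: str = ""    # Street Name
    sts: str = ""   # Street Name Post Type
    pod: str = ""   # Street Name Post Directional
    pom: str = ""   # Street Name Post Modifier

    def full_name(self) -> str:
        """Join the non-empty parts, in declaration order, with single spaces."""
        values = (getattr(self, f.name) for f in fields(self))
        return " ".join(v for v in values if v)

print(StreetName(prd="NORTH", rd="MAIN", sts="STREET").full_name())
print(StreetName(stp="AVENUE", stps="OF THE", rd="AMERICAS").full_name())
```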

  21. Familiar elements • No abbreviations. IMHO, this is a very good thing. • Example: “N JOHNSON TR”. Is it “NORTH JOHNSON TRAIL” or “NEIL JOHNSON TERRACE”? • Any list of abbreviations will need constant maintenance • Domains for Pre/Post Types at http://technet.nena.org/nrs/registry/_registries.xml

  22. A Few Twists on Familiar Elements • Two types • Multiword types • Local Knowledge Required

  23. Not-so-familiar elements • Modifiers • Separated from name, not a type word or phrase • Separated from name, before or after directional

  24. Not-so-familiar elements (continued) • Street Name Pre Type Separator • Added to match FGDC Separator Element • Preposition or prepositional phrase that “separates” pre type from name • ‘northbound’ and ‘southbound’ modifiers

  25. Content standards to support matching • In CLDXF, local addressing authority has broad discretion about what goes into the name • Domains apply to types and directionals, not modifiers • IMHO, NENA should recommend best practices for street name content, such as: • No abbreviations (maybe except often misspelled honorifics: “LIEUTENANT” = “LT”, “MONSIGNOR” = “MSGR”) • Use “official” name including special characters (“MARY’S WAY”; note that CLDXF supports these) • Have a rule for numbering, e.g. “First” through “Tenth”, then “11th” and up
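The numbering rule suggested on slide 25 (“First” through “Tenth”, then “11th” and up) is easy to express in code. The sketch below is one possible reading of that rule; the function name `numbered_street_name` is made up for this example.

```python
WORDS = ["First", "Second", "Third", "Fourth", "Fifth",
         "Sixth", "Seventh", "Eighth", "Ninth", "Tenth"]

def numbered_street_name(n: int) -> str:
    """Spell out 1 through 10; use digits plus an ordinal suffix from 11 up."""
    if n < 1:
        raise ValueError("numbered street names start at 1")
    if n <= 10:
        return WORDS[n - 1]
    if n % 100 in (11, 12, 13):
        suffix = "th"  # 11th, 12th, 13th, 111th, ...
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

print([numbered_street_name(n) for n in (4, 10, 11, 22, 101)])
```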

  26. Issues with domains • If you are trying to support legacy systems with linear geocoding: what do you do with feature names like “Apartments” or “Commons” that don’t properly refer to a linear, drivable feature? • All kinds of things can appear on a street sign that are legitimate streets with no type or an implicit type (an example of the latter is “BROADWAY”): “BLUE FIN”, “SAIL-A-WAY”, “ASSINNASHAMAYAK”

  27. Parsing Street Names using CLDXF • Open source parser to implement CLDXF • Process street names as raw inputs • Identify possible element types for each word or phrase, using a lookup of abbreviations (~1000 records) to domains for directionals and types, plus a listing of base names which could be otherwise interpreted • Enforce ordering of elements • Score viable candidates • Annotate invalid records

  28. Very simplified automated parsing example • Input: “E ST” • Steps shown: lookup, element order, rules • [Diagram: candidate element codes RD, POM, PRD, POD, RD; Cartesian product × RD, STS] • Rules: STP or STS – must have type; RD – must have name
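The steps on slides 27 and 28 (lookup of candidate element types, Cartesian product, ordering, rules) might be sketched as follows. This is not the open source parser mentioned in the talk: the `LOOKUP` table is a tiny hypothetical stand-in for the ~1000-record lookup, unknown words are assumed to be base names, and the scoring step is omitted.

```python
from itertools import product

# Element codes in the order they may appear in a street name.
ORDER = ["PRM", "PRD", "STP", "STPS", "RD", "STS", "POD", "POM"]

# Hypothetical lookup: each word maps to (element code, expanded value) candidates.
LOOKUP = {
    "E": [("PRD", "EAST"), ("POD", "EAST"), ("RD", "E")],
    "ST": [("RD", "SAINT"), ("STS", "STREET")],
}

def candidates(raw: str) -> list:
    """Return every assignment of element codes that passes ordering and rules."""
    tokens = raw.upper().split()
    options = [LOOKUP.get(token, [("RD", token)]) for token in tokens]
    results = []
    for combo in product(*options):  # Cartesian product of per-word candidates
        positions = [ORDER.index(code) for code, _ in combo]
        in_order = all(a < b for a, b in zip(positions, positions[1:]))
        codes = {code for code, _ in combo}
        has_name = "RD" in codes                     # RD: must have name
        has_type = "STP" in codes or "STS" in codes  # STP or STS: must have type
        if in_order and has_name and has_type:
            results.append(dict(combo))
    return results

print(candidates("E ST"))
```

Under these assumptions only one of the six combinations for “E ST” survives: the name E with post type STREET. A fuller parser would also need to merge multiword names and score the remaining candidates.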

  29. What was the point of that?! • CLDXF can be implemented in code to standardize street names and to deal with all aspects of parsing and matching except: • Alternate spellings of base name • MLK Blvd v. Martin Luther King Blvd • Msgr OBrien v. Monsignor Martin J. O’Brien • Msschsts Ave v. Massachusetts Ave • Concatenation of full street name and subaddress • Location, unit, building or other info • Ambiguous sequence of address number • 47 | A J Handy Drive v. 47 A | J Handy Drive

  30. Address Number • What is an address number? • Ideally, the number part indicates a location in sequence along a road, respecting parity • At a minimum, the full address number uniquely identifies one of the following • a site or a group of structures • a single structure • a part of a structure • or some other location like an undeveloped parcel with reference to a named street

  31. Address Number – further thoughts • Unfortunately, address numbers are often used to encode other kinds of information: • Sector • Cross street or block • Building, Floor, Unit • Decoding the pattern may be useful • Splitting the full number into prefix, number and suffix should preserve the sequence information, if any • Zero should not be used to indicate no address number
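A minimal sketch of splitting a full address number into prefix, number and suffix, as slide 31 describes. The regular expression and the `split_address_number` function are assumptions made for illustration; real data will need more rules. A missing number is returned as `None`, never as zero.

```python
import re

# Prefix: leading non-digits; number: first run of digits; suffix: whatever follows.
ADDRESS_NUMBER = re.compile(r"^(?P<prefix>\D*?)(?P<number>\d+)(?P<suffix>.*)$")

def split_address_number(raw: str):
    """Split a full address number; return None when no digits are present."""
    match = ADDRESS_NUMBER.match(raw.strip())
    if match is None:
        return None  # no address number: do not substitute zero
    return {
        "prefix": match.group("prefix").strip(),
        "number": int(match.group("number")),  # kept numeric to preserve sequence
        "suffix": match.group("suffix").strip(),
    }

for sample in ("123", "A-17", "289 1/2", "22 A", ""):
    print(repr(sample), split_address_number(sample))
```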

  32. Weirdo numbering • sample number parsing – odd cases • Again, if possible, decode the assignment • Mileposts:

  33. Parsing Quiz • 123 North Street

  34. Parsing Quiz • Tunnel Massachusetts Bay Transit Authority Green Line Haymarket to North Station

  35. Parsing Quiz • 289 ½ Broadway South

  36. Parsing Quiz • 22 A West Virginia Avenue

  37. Parsing Quiz • Interstate Highway 495 northbound

  38. Parsing Quiz • Old State Route 1

  39. Parsing Quiz • A-17 Warren Street Court

  40. Parsing Quiz • Avenue C Loop

  41. Parsing Quiz • Summit County Road 99

  42. Parsing Quiz • 72 Road to the River

  43. Parsing Quiz • 14-16 Main Street (trick question)

  44. Landmarks and landmark parts • Landmark: “Name by which a prominent feature is publicly known” • Landmark part added by CLDXF as extension of PIDF-LO. Usually involves a geographic hierarchy. • Landmark part is a repeating tag, so doesn’t neatly translate to fields • Order is not specified (e.g. smallest -> largest); parts are concatenated with spaces

  45. One way to manage landmark parts • If you are managing “sites” as a separate geographic layer, with sub-sites and named buildings mapped: • When is something “a prominent feature, publicly known”? • When is something a building and when is it a landmark? • Note: a landmark is a complete, valid address

  46. Subaddress elements (PIDF-LO) • In USPS or ALI database, typically unstructured info • CLDXF provides hierarchy of building, floor, unit • FGDC allows for flexibility in typing subaddress components; CLDXF suggests type word included

  47. Subaddress examples

  48. Subaddress issues • Not always clear what goes into “building” v. “landmark” • “Generic” identifiers, like numbers or letters, go into the building field, whereas names go into the landmark field, but what counts as “publicly known”? • Many ways “building,” “floor” or “unit” can be represented or abbreviated in inputs • Identifiers can be encoded into unit field – this may be an area for content standards • “#5” “Apt. 5” “Unit 5” “No. 5” all refer to Unit 5 • “7B” “A-5C” “B12” all contain reference to building or floor as well as unit
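The unit examples on slide 48 suggest a simple content rule: strip whatever marker the input uses and emit one consistent form. The sketch below assumes “UNIT” as the target type word, as the slide does; the `UNIT_MARKERS` pattern and `normalize_unit` function are hypothetical and cover only the markers listed on the slide plus “Apartment”.

```python
import re

# Markers seen in inputs: "#", "Apt", "Apt.", "Apartment", "Unit", "No", "No."
UNIT_MARKERS = re.compile(r"^\s*(?:#|(?:APT|APARTMENT|UNIT|NO)\b\.?)\s*", re.IGNORECASE)

def normalize_unit(raw: str) -> str:
    """Drop a leading unit marker, if any, and return 'UNIT <identifier>'."""
    identifier = UNIT_MARKERS.sub("", raw).strip().upper()
    return f"UNIT {identifier}"

for sample in ("#5", "Apt. 5", "Unit 5", "No. 5", "7B"):
    print(sample, "->", normalize_unit(sample))
```

Identifiers such as “7B” pass through unchanged; decoding the building or floor embedded in them would need local knowledge of the numbering pattern.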

  49. One last element • Not part of the address, but an attribute • Domain is at http://www.iana.org/assignments/location-type-registry/locationtype-registry.xml
