1st Workshop of COST Action C21: "Ontologies for Urban Development: Interfacing Urban Information Systems" • Building an Address Gazetteer on top of an Urban Network Ontology • J.Nogueras-Iso, F.J.López, J.Lacasta, F.J.Zarazaga-Soria, P.R.Muro-Medrano • Geneva, 6-7 November 2006
Outline • 1. Introduction • 2. A typical use-case: IDEZar • 3. Ontology building using a manual mapping • 4. Ontology building using an automated approach • 5. Conclusions
1. Introduction • The increasing relevance of geographic information (GI) for decision-making and resource management in diverse areas has promoted the creation of Spatial Data Infrastructures (SDI) • SDI: a coordinated approach to the technology, policies, standards, and human resources necessary for the effective acquisition, management, distribution and utilization of GI at different organizational levels, involving both public and private institutions • Gazetteer Service • A typical component of an SDI • Directory of instances of a class or classes of features containing some information regarding position • Looks up geographic feature locations based on geographic identifiers
Address Gazetteer Service • In SDIs for local administrations such as a city council, address gazetteer services are among the most important services that councils must offer to their citizens • An Address Gazetteer Service • Specialized in Urban Network Features (addresses) • The councils are responsible for the management of urban networks, and these networks are used as reference information for other services at national level such as cadaster or census services
Creation of the contents of a gazetteer • It usually requires combining multiple repositories • The same feature (concept) is stored in different repositories, each of them contributing a different piece of attribute information • Typical problems of heterogeneity • Different data models (roles, granularity), encoding • Our proposal to deal with heterogeneity in this context: • Build an urban network ontology upon existing feature type taxonomies
2. A typical use-case: IDEZar • The IDEZar Project is the result of a collaboration agreement signed in March 2004 between the City Council and the University of Zaragoza • Zaragoza is a medium-sized city (some 650,000 inhabitants) in the northeast of Spain (capital of Aragón), growing fast in area and population. The municipality covers about 1000 km² and includes several towns • Objective: development of a local SDI for Zaragoza • To facilitate, increase and coordinate the use of spatial data by the Council • To develop applications for the citizens and to provide them with access to public sector information
IDEZar Service Architecture (http://www.zaragoza.es/idezar/) • [Figure: the GeoPortal (Street Map and Gazetteer) and the services offered by three SDIs: IDEE (National SDI), IDEAr (Aragón, Regional SDI) and IDEZar (Local SDI)] • Services shown: • <<WMS>> Base: street maps • <<WMS>> Base: orthoimages • <<WMS>> Environment-Thematic: Agenda 21, protected areas... • <<WMS>> Urban-Thematic: public services (libraries, police stations...), private services (pharmacies, parkings...) • <<WMS>> IDEE-Base: base map up to 1:25000 of Spain • <<WCAS>> Catalog • <<Gazetteer>> IDEE-Nomenclátor: toponyms • <<Gazetteer>> Street names • <<Route planner>> Arriving at Zaragoza
Address-related repositories • [Figure: address data exchanged between IDEZar (AYTO), the Zaragoza City Council offices (Statistics Office: TVIAN; Informatics Office: AYTO; Tax Office: SIGLA; Urban Planning Office: AYTO, SIGLA) and the external National Statistics Institute (TVIAN) and National Cadaster Office (SIGLA). Exchanged items: address ranges, street types, street names, addresses, maps, electoral census, inhabitant census, property census, site development updates, town planning updates, address updates, amendments (streets, addresses)] • Multiple repositories • Not very different models • Feature = name + type + additional info (location, range, …) • But different taxonomies for urban network feature types • Not especially synchronized
Address-related repositories • Statistics Office repository • Inhabitant/poll census, exchanges from/to National Statistics Institute • TVIAN (Tipo de Vía Normalizada): standardized network feature types of the National Statistics Institute
Address-related repositories • Cadaster Office repository • Land/Tax management, exchanges from/to National Cadaster Office • SIGLA: network feature types of the Cadaster office
Address-related repositories • Informatics Office repository • Central repository used for the assignment of new street names • AYTO: network feature types of the council
Gazetteer content creation • Why do we need to combine the 3 repositories? • Not all features are in the 3 repositories • Attribute information is distributed across the different repositories
Gazetteer content creation II • Problems found while combining • Matching cannot be based solely on feature names • 2 features may differ in typology but not in name (Spain square vs Spain avenue) • Which is the most appropriate feature type taxonomy for the gazetteer contents? • Solution proposed: define an urban network ontology • An ontology explicitly defines the concepts of a domain and the relations between them • This ontology will provide a unified model of the feature types that can be found in this domain • Making the necessary mappings to the particular taxonomies used in the different council offices or external organizations
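The name-matching problem above can be illustrated with a minimal sketch. The records and the helper name `ambiguous_names` are hypothetical and only mirror the "Spain square vs Spain avenue" example; they are not taken from the council repositories.

```python
from collections import defaultdict

# Hypothetical council records as (feature type, name) pairs.
features = [
    ("square", "Spain"),
    ("avenue", "Spain"),
    ("street", "Goya"),
]

def ambiguous_names(records):
    """Return names that are shared by features of different types."""
    types_by_name = defaultdict(list)
    for feature_type, name in records:
        types_by_name[name.lower()].append(feature_type)
    return {name: types for name, types in types_by_name.items() if len(types) > 1}

# A name-only key conflates two distinct features; (type, name) keeps them apart.
clashes = ambiguous_names(features)
unique_keys = {(feature_type, name.lower()) for feature_type, name in features}
```

Matching on the pair (type, name) only works if the types come from one shared taxonomy, which is the motivation for the unified ontology.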
How to build up the ontology • [Figure: the TVIAN, AYTO and SIGLA taxonomies] • The construction of ontologies upon existing vocabularies is a classical and widely used approach • The underlying problem (ontology alignment) • How to find the relationships that hold between the entities represented in different taxonomies • Two approaches for the ontology construction • Manual mapping approach • Automated approach
3. Manual Mapping approach • [Figure: example concepts and acronyms from the AYTO (City Council) and SIGLA (Cadaster) taxonomies, matched against each other and against TVIAN. Concepts: residential development, pedestrian street, country house (south of Spain), square, pedestrian street segment, minor road, street. Acronyms: “CL”, “AN”, “CN”, “CM”, “PZ”, “PL”, “CLP”, “CLTP”] • Matching of terms (names + acronyms) between the different taxonomies • Difficulties: lack of semantic descriptions • Categories of matches • Exact match • Partial match: one concept is broader or narrower • No match • Provisional match: taxonomy errors (homonyms) imply erroneous matches
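One possible way to record such manual mappings is sketched below. The class and function names are hypothetical; the code pairs reuse the AYTO/SIGLA relations reported on the Results slide (PL and PZ for square, and CL, CLP and AN as narrower than SIGLA CL), not an actual mapping table from the project.

```python
from dataclasses import dataclass
from enum import Enum

class MatchKind(Enum):
    EXACT = "exact"
    NARROWER = "partial: source is narrower than target"
    BROADER = "partial: source is broader than target"
    NONE = "no match"
    PROVISIONAL = "provisional (possible homonym, to be reviewed)"

@dataclass(frozen=True)
class Mapping:
    source: str   # hypothetical "TAXONOMY:CODE" notation
    target: str
    kind: MatchKind

mappings = [
    Mapping("AYTO:PL", "SIGLA:PZ", MatchKind.EXACT),
    Mapping("AYTO:CL", "SIGLA:CL", MatchKind.NARROWER),
    Mapping("AYTO:CLP", "SIGLA:CL", MatchKind.NARROWER),
    Mapping("AYTO:AN", "SIGLA:CL", MatchKind.NARROWER),
]

def narrower_than(target, table):
    """Source codes recorded as narrower than the given target code."""
    return sorted(m.source for m in table
                  if m.target == target and m.kind is MatchKind.NARROWER)

street_subtypes = narrower_than("SIGLA:CL", mappings)
```

Keeping the match category explicit makes provisional matches easy to list for later review.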
A more flexible approach • [Figure: TVIAN, AYTO and SIGLA mapped to URBISOC] • Previous approach • Too time-consuming and poorly scalable • Improvement • Use of a well-established shared common core ontology, with mappings between the distinct sources and this common core • New experiment: Use of the URBISOC thesaurus • a thesaurus focused on Spanish terminology for Town Planning • developed by the CINDOC/CSIC institute (Centre for Scientific Information and Documentation / Spanish National Research Council)
A more flexible approach II • Use of the Towntology ontology editor • Focused on ontology construction • Storage of concepts with several definitions that are in a process of selection and characterization • Although it improves scalability, it is still time-consuming and error-prone
4. Ontology building using an automated approach • [Figure: ontology generated from TVIAN, AYTO and SIGLA] • Why? • Manual mappings are time-consuming • Some mappings may not be successful because content creators have not assigned the correct feature type • Technique proposed • Formal Concept Analysis (1980, Wille & Ganter …) • It enables the extraction of a hierarchy of concepts from the feature instances contained in the source repositories
Basics of FCA • Definition of a formal context: a triple $(G, M, I)$ • $G$: objects • $M$: attributes • $I \subseteq G \times M$: binary relation between $G$ and $M$ (incidence matrix) • It is possible to extract formal concepts • Given $A \subseteq G$ and $B \subseteq M$, a pair $(A, B)$ is a formal concept if and only if • the set of all attributes shared by the objects in $A$ is identical to $B$ • $A$ is also the set of all objects that have in common the attributes in $B$ • Additionally, it is possible to establish a subconcept-superconcept relation • $(A_1, B_1) \le (A_2, B_2) \iff A_1 \subseteq A_2 \ (\iff B_2 \subseteq B_1)$
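The definitions above can be checked on a toy context with a short sketch. The feature identifiers `f1` to `f3` and the function names are hypothetical; the attribute names follow the AYTO_/SIGLA_ code style used later in the slides.

```python
# Toy formal context: object -> set of attributes (its row of the incidence matrix).
context = {
    "f1": {"SIGLA_CL", "AYTO_CL"},
    "f2": {"SIGLA_CL", "AYTO_CLP"},
    "f3": {"SIGLA_PZ", "AYTO_PL"},
}
all_attributes = set().union(*context.values())

def common_attributes(objects):
    """Attributes shared by every object in `objects` (all of M if empty)."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return shared

def common_objects(attributes):
    """Objects that have every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}

def is_formal_concept(extent, intent):
    return common_attributes(extent) == intent and common_objects(intent) == extent

def is_subconcept(c1, c2):
    """(A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return c1[0] <= c2[0]

street = ({"f1", "f2"}, {"SIGLA_CL"})
pedestrianized = ({"f2"}, {"SIGLA_CL", "AYTO_CLP"})
```

Here `pedestrianized` is a subconcept of `street`: its extent is smaller and its intent is larger, as the equivalence in the last bullet states.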
Applying FCA • How to obtain a unique repository of instances, i.e. the formal context required by FCA? • Traditional datalinking has been applied to the feature instances contained in the different databases • based on the analysis of the lexical and spatial similarities of feature attributes • Transform the datalinking matrix into the incidence matrix • Each checked cell (match of source features) generates an object/instance in the incidence matrix • The columns correspond to the urban network feature type codes (e.g., AYTO CODE, SIGLA CODE), each transformed into a proper attribute with boolean values
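The transformation from linked feature pairs to a boolean incidence matrix might look like the following sketch. The link records, field names and object identifiers are hypothetical stand-ins for the output of the datalinking step, which is not reproduced here.

```python
# Hypothetical output of datalinking: one record per checked cell,
# i.e. per pair of source features judged to be the same real-world feature.
links = [
    {"name": "Spain", "ayto_code": "PL", "sigla_code": "PZ"},
    {"name": "Goya", "ayto_code": "CL", "sigla_code": "CL"},
    {"name": "Alfonso I", "ayto_code": "CLP", "sigla_code": "CL"},
]

def to_context(link_records):
    """One object per link; its attributes are the prefixed type codes."""
    return {
        f"obj{i}": {f"AYTO_{link['ayto_code']}", f"SIGLA_{link['sigla_code']}"}
        for i, link in enumerate(link_records)
    }

def to_incidence_matrix(context):
    """Return (column names, boolean rows) for the given context."""
    columns = sorted(set().union(*context.values()))
    rows = [[column in attrs for column in columns] for attrs in context.values()]
    return columns, rows

columns, rows = to_incidence_matrix(to_context(links))
```

Prefixing each code with its taxonomy keeps AYTO "CL" and SIGLA "CL" as separate attributes, so FCA can discover how they relate instead of assuming they are equal.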
[Figure: the datalinking matrix (2718 features, 18 AYTO codes; 4318 features, 35 SIGLA codes) is transformed into the incidence matrix by replacing each feature with its type code]
Applying FCA: obtain the concept lattice • Incidence matrix → FCA (NEXT CLOSED SET algorithm, Ganter 87) → concept lattice • [Figure: excerpt of the concept lattice, with nodes labelled with attributes only, from the supremum (least common superconcept) down to the infimum (greatest common subconcept)] • Nodes shown in the excerpt: • AYTO_PL, SIGLA_PZ (square) • SIGLA_AV (avenue) • AYTO_AV, SIGLA_AV (traffic allowed avenue) • SIGLA_AV, AYTO_AVP (pedestrian avenue) • SIGLA_CL (street) • SIGLA_CL, AYTO_CL (traffic allowed street) • SIGLA_CL, AYTO_CLP (pedestrianized street) • SIGLA_CL, AYTO_AN (carfree designed street) • …
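A compact sketch of Ganter's Next Closure procedure is given below for a toy context. It enumerates all concept intents in lectic order; the four feature identifiers are hypothetical, and this is an illustration of the textbook algorithm rather than the implementation used in the experiment.

```python
# Toy context: object -> attributes (hypothetical features, codes as in the slides).
context = {
    "f1": {"SIGLA_CL", "AYTO_CL"},
    "f2": {"SIGLA_CL", "AYTO_CLP"},
    "f3": {"SIGLA_CL", "AYTO_AN"},
    "f4": {"SIGLA_PZ", "AYTO_PL"},
}
M = sorted(set().union(*context.values()))   # fixed linear order on attributes
position = {m: i for i, m in enumerate(M)}

def closure(attributes):
    """Attributes common to all objects that have every attribute given."""
    closed = set(M)
    for attrs in context.values():
        if attributes <= attrs:
            closed &= attrs
    return closed

def next_closure(current):
    """Lectically next closed attribute set after `current`, or None."""
    candidate = set(current)
    for i in reversed(range(len(M))):
        m = M[i]
        if m in candidate:
            candidate.discard(m)
        else:
            closed = closure(candidate | {m})
            # Accept only if no attribute smaller than m was newly added.
            if all(position[x] >= i for x in closed - candidate):
                return closed
    return None

def all_intents():
    intents, current = [], closure(set())
    while current is not None:
        intents.append(frozenset(current))
        current = next_closure(current)
    return intents

intents = all_intents()
extents = {b: {g for g, attrs in context.items() if b <= attrs} for b in intents}
```

Each intent together with its extent forms one node of the lattice; ordering the nodes by extent inclusion gives the subconcept-superconcept hierarchy.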
Results • Experiment: combining the COUNCIL_FEATURE and CADASTER_FEATURE databases • A concept lattice of 36 concepts from the original 53 concepts • Identification of equivalent concepts in both taxonomies • e.g., square (PL in AYTO and PZ in SIGLA) • And also subconcept-superconcept relations • e.g., identification of street as a broader concept in SIGLA (CL), which has narrower concepts in AYTO: • traffic-allowed streets (CL) • pedestrianized streets (CLP) • carfree-designed streets (AN)
5. Conclusions • The FCA approach seems to be more flexible • Dynamic building of the ontology (at least, a draft) • We don’t need to define the concepts; we just need to observe the data that exists • We have created a domain-specific ontology that facilitates the interoperability (synchronization, update and merge) of the separate repositories • Future lines • Improve the efficiency of the method • Enrich the generated concepts with commonalities found in other feature attributes of the instances (e.g., geometry, perimeter, area) • Apply to other domains • Hydrology: NMA vs Water Agency repositories
Advanced Information Systems Laboratory http://iaaa.cps.unizar.es