Semantic Interoperability for Geographic Information Systems
The Illinois DLI Project
Tobun Dorbin Ng
Artificial Intelligence Lab
http://ai.bpa.arizona.edu
The University of Arizona
DLI Project-wide Workshop (Berkeley)
January 5-6, 1998
Challenge for DLs
• Information Infrastructure Technology and Application (IITA) Working Group
  • workshop in May 1995
• The “Grand Challenge”
  • interoperability at a deep semantic level
  • providing DL users with a coherent view of heterogeneous autonomously managed resources
Semantic Interoperability
• “The ability of a user to access, consistently and coherently, similar (though autonomously defined and managed) classes of digital objects and services distributed across heterogeneous repositories, with federating or mediating software compensating for site-by-site variations.”
• Provides systems for cross-correlating items of information across multiple sources to solve problems
Semantic Interoperability Environment
[Figure: An Architecture for Scalable Semantic Interoperability. Labels recoverable from the diagram: Information Consultant GUI; Level of Semantic Abstraction (fine-grained Concept Spaces to coarse-grained Category Spaces); Semantic Analysis; Reasoning by Spreading Activation; Browsing; Concept Communication; Data Communication; Agent Interface; Create; Access; Collaboration; Data Management; Multimedia Structured Types (Image, Video, Voice, Text); Landsat; AVHRR; DEM; Aerial Photo; Full Text; Abstract; Distributed, Heterogeneous Database Collections.]
Semantic Components
• Structure: nodes and links
[Figure labels: Concept Spaces (fine-grained); Concepts; Categories; Category Spaces (coarse-grained)]
Geographic Information Systems Testbed
• GeoRef Information Services, American Geological Institute (AGI)
  • 350K records, 400 Mbytes, 1981-1995
  • GeoRef Thesaurus, 27K terms
  • Geo-referenced records
• Petroleum Abstracts Service, University of Tulsa
  • 500K records, 400 Mbytes, 1984-1995
• Compendex, Engineering Information, Inc.
  • 22K records, 50 Mbytes, 1992-1995
  • 42 geoscience-related domain areas
GIS Testbed (cont’d)
• Aerial Photos, UC at Santa Barbara
  • 1000 images, 32 Mbytes each
  • Geo-referenced
• Advanced Very High Resolution Radiometer (AVHRR), NASA
  • 1993 global data, 2 Gbytes
  • Geo-referenced
• Geographic Names Information System (GNIS), US Geological Survey
  • 56K place names & their geographic coordinates
Semantics in Text
• Term Phrases as Concepts
• Automatic Indexing
  • Extract term phrases from unstructured free text
  • Form term phrases from adjacent words
  • Apply stopwords
• Structural Fields
  • Pre-assigned indices
  • Author names
• Vector Space Model
  • Term & document frequencies
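The indexing steps above can be sketched as follows. This is a minimal illustration, not the project's indexer: the stopword list, the three-word phrase limit, the tf × log(N/df) weighting, and the names `extract_phrases` and `index_documents` are all assumptions made for the example, and punctuation is ignored when forming phrases.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to", "with"}


def extract_phrases(text, max_len=3):
    """Return term phrases built from runs of adjacent non-stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    phrases = []
    run = []  # current run of adjacent non-stopwords
    for word in words + [None]:  # the trailing None flushes the last run
        if word is not None and word not in STOPWORDS:
            run.append(word)
            continue
        # A stopword (or the end of the text) closes the run: emit every
        # sub-phrase of 1..max_len adjacent words inside it.
        for start in range(len(run)):
            for end in range(start + 1, min(start + max_len, len(run)) + 1):
                phrases.append(" ".join(run[start:end]))
        run = []
    return phrases


def index_documents(docs):
    """Return one {phrase: weight} vector per document.

    Weight = term frequency * log(N / document frequency), an assumed
    vector-space weighting built from term and document frequencies.
    """
    term_freqs = [Counter(extract_phrases(doc)) for doc in docs]
    doc_freq = Counter()
    for tf in term_freqs:
        doc_freq.update(tf.keys())
    n_docs = len(docs)
    return [
        {term: count * math.log(n_docs / doc_freq[term])
         for term, count in tf.items()}
        for tf in term_freqs
    ]
```

Phrases that occur in every document receive weight 0 under this weighting, so only discriminating phrases contribute to a document's vector.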
Semantics in Image
• Image Tiles and Regions as Concepts
• Create Image Tiles: 128x128 pixel subsets
• Extract Features using Gabor Filters
  • Gabor filters: scale-tunable edge & line detectors
  • Apply in 6 orientations & 5 scales
  • A tile: 30 pairs of means and standard deviations
• Segment Image using Texture Flow Analysis
  • Group adjacent tiles with similar textures
  • Determine texture flow with direction & energy
  • Define boundaries by opposite orientations
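The tiling and feature-extraction steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's implementation: the real-valued (cosine) Gabor kernel, the fixed odd kernel size, the half-octave scale spacing, and the function names are all hypothetical choices. Texture flow segmentation is not covered.

```python
import math


def split_into_tiles(image, tile_size=128):
    """Cut a 2-D list of pixel values into non-overlapping square tiles."""
    rows, cols = len(image), len(image[0])
    return [
        [row[c:c + tile_size] for row in image[r:r + tile_size]]
        for r in range(0, rows - tile_size + 1, tile_size)
        for c in range(0, cols - tile_size + 1, tile_size)
    ]


def gabor_kernel(size, wavelength, theta, sigma):
    """Real (cosine) Gabor kernel as a size x size list; size must be odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid varies along angle theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def filter_response(tile, kernel):
    """Absolute filter response at every position where the kernel fits."""
    k = len(kernel)
    rows, cols = len(tile), len(tile[0])
    out = []
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += tile[r + i][c + j] * kernel[i][j]
            out.append(abs(acc))
    return out


def tile_features(tile, n_orient=6, n_scale=5, size=9):
    """Return n_orient * n_scale (mean, std) pairs, 30 with the defaults."""
    features = []
    for s in range(n_scale):
        wavelength = 2.0 * 2 ** (s / 2)  # assumed half-octave scale spacing
        sigma = 0.5 * wavelength         # assumed envelope width
        for o in range(n_orient):
            theta = o * math.pi / n_orient
            kernel = gabor_kernel(size, wavelength, theta, sigma)
            resp = filter_response(tile, kernel)
            mean = sum(resp) / len(resp)
            var = sum((v - mean) ** 2 for v in resp) / len(resp)
            features.append((mean, math.sqrt(var)))
    return features
```

Pure-Python convolution is slow on full 128x128 tiles; an array library would be the practical choice, and the loops are kept here only to make each step explicit.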
Semantics in Satellite Numerical Data
• AVHRR Data from NASA’s Pathfinder
  • Afternoon observations over all land and coastal zones
  • Spatial resolution: 8 km
  • 5 channels of electromagnetic spectrum
• Vegetation Density as Concept
  • Normalized Difference Vegetation Index (NDVI)
  • Non-vegetation (-1.0) to green vegetation (+1.0)
• Temperature as Concept
  • Convert the radiances from channels 4 & 5
• Use GNIS to name each 8-km unit
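For reference, NDVI is conventionally computed from AVHRR channel 1 (visible) and channel 2 (near infrared) as (NIR - VIS) / (NIR + VIS). A minimal sketch follows; the function names and the choice to return 0.0 for an empty cell are assumptions made for the example.

```python
def ndvi(visible, near_infrared):
    """Normalized Difference Vegetation Index for one grid cell.

    visible: AVHRR channel 1 reflectance; near_infrared: channel 2.
    The result lies between -1.0 and +1.0 for non-negative inputs.
    """
    total = near_infrared + visible
    if total == 0:
        return 0.0  # assumed convention for cells with no signal
    return (near_infrared - visible) / total


def ndvi_grid(visible_grid, nir_grid):
    """Apply ndvi cell by cell to two equally sized 2-D lists."""
    return [
        [ndvi(v, n) for v, n in zip(v_row, n_row)]
        for v_row, n_row in zip(visible_grid, nir_grid)
    ]
```

Dense green vegetation reflects strongly in the near infrared and weakly in the visible band, which pushes the index toward +1.0.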
Semantic Analysis: Concept Space
• Co-occurrence Analysis Algorithm
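The slide does not reproduce the algorithm itself, so the sketch below uses a simplified stand-in rather than the project's formula: an asymmetric co-occurrence weight equal to the fraction of documents containing term a that also contain term b. The function names and the input format (one set of terms per document) are assumptions.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_weights(doc_terms):
    """Return {(a, b): weight} for every ordered pair of co-occurring terms.

    weight(a, b) = (documents containing both a and b) / (documents
    containing a). The measure is asymmetric: weight(a, b) generally
    differs from weight(b, a).
    """
    term_count = Counter()
    pair_count = Counter()
    for terms in doc_terms:
        unique = set(terms)
        term_count.update(unique)
        pair_count.update(permutations(sorted(unique), 2))
    return {(a, b): n / term_count[a] for (a, b), n in pair_count.items()}


def related_terms(weights, term, top_n=5):
    """Return the top_n terms most strongly linked from the given term."""
    ranked = sorted(
        ((w, b) for (a, b), w in weights.items() if a == term),
        reverse=True,
    )
    return [(b, w) for w, b in ranked[:top_n]]
```

The weighted, directed links between terms form the nodes-and-links structure of a concept space; a production version would typically also weight by term and document frequencies.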
Semantic Analysis: Category Space
• Kohonen Self-organizing Maps (SOM) Algorithm
  • Initialize input & output nodes, connection weights
  • Present record (vector of N features) in order
  • Compute the distance dj from the record to every output node j
  • Select winning node j* (minimum dj) & update weights to node j* and neighbors
  • Label regions in category space
  • Apply the above steps recursively for large regions
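The training loop described above can be sketched as follows. Several details are assumptions not stated on the slide: squared Euclidean distance for dj, random initial weights, a linearly decaying learning rate, and a square neighborhood of fixed radius. Region labeling and the recursive step for large regions are left out.

```python
import random


def best_node(weights, record):
    """Return the grid position j* whose weight vector is closest."""
    def dist(node):
        # Squared Euclidean distance between the record and the node.
        return sum((x - w) ** 2 for x, w in zip(record, weights[node]))
    return min(weights, key=dist)


def train_som(records, rows, cols, epochs=20, rate=0.5, radius=1, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    n_features = len(records[0])
    # Initialize one weight vector per output node.
    weights = {
        (r, c): [rng.random() for _ in range(n_features)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        eta = rate * (1 - epoch / epochs)  # assumed linear decay
        for record in records:             # present records in order
            winner = best_node(weights, record)
            for node, w in weights.items():
                # Update the winner and every node within `radius` of it.
                near = max(abs(node[0] - winner[0]),
                           abs(node[1] - winner[1])) <= radius
                if near:
                    for i in range(n_features):
                        w[i] += eta * (record[i] - w[i])
    return weights
```

After training, `best_node(weights, record)` gives the map cell for any record, and records that share or neighbor a cell can be labeled as one region of the category space.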
System Implementation
• Analysis using 32-node SGI Origin2000
  • Textual concept spaces: 15 hrs, 32 nodes
  • Feature Extraction: 100 images, 24 hrs, 32 nodes
  • Texture category space: 28 images, 6 hrs, 1 node
  • AVHRR category space: California, 2 hrs, 1 node
• Web Interface
  • Java front-end
  • CGI-bin servers for all information retrieval
• Server size: 7 Gbytes
  • Text (2.5 Gbytes), image (4 Gbytes), AVHRR (0.5 Gbytes)
User Study 1: Textual Concept Space
• Concept vs. Keyword Search
• 12 subjects with geoscience backgrounds
• Each subject performed 4 searches using both
  • concept search: use concepts to retrieve documents
  • keyword search: use keywords to retrieve documents
• Decisions judged by a subject expert
• Recall of concept search (53%) was significantly better than that of keyword search (37%)
• Precision of concept search (38%) was no worse than that of keyword search (36%)
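For readers unfamiliar with the two measures, recall and precision for a single search can be computed as below. This uses the standard definitions; the function name and the example document identifiers in the usage note are hypothetical, and the slide does not say how scores were averaged across subjects.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one search.

    retrieved: documents the search returned.
    relevant:  documents the expert judged relevant.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision
```

For example, a search returning d1, d2, d3, d4 when d1, d2, d5 are relevant finds 2 of 3 relevant documents (recall 2/3), and 2 of its 4 results are relevant (precision 1/2).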
User Study 2: Textual Category Space
• Browse 2-dimensional hierarchical category space
• 12 subjects with geoscience backgrounds
• Qualitative study
• Positive feedback:
  • Spatial factor & color
  • Beneficial to non-experts
  • Novelty of graphical representation
• Negative feedback:
  • No search capability
  • No systematic organization of terms
User Study 3: Image Analysis
• 3 experiments
  • Similarity Analysis: human visual perception vs. Euclidean distance on Gabor features
  • Segmentation: human vs. texture flow analysis
  • Categorization: human vs. SOM algorithm
• 10 subjects in each experiment
• 10 images used, each has 192 tiles
• Decisions judged by a remote sensing expert
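The system side of the similarity experiment, ranking tiles by Euclidean distance between feature vectors, can be sketched as follows. The function names, the tile identifiers, and the dictionary input format are assumptions for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query, tiles, top_n=3):
    """Return the ids of the top_n tiles closest to the query vector.

    tiles: {tile_id: feature vector}, e.g. a tile's Gabor means and
    standard deviations flattened into one list.
    """
    ranked = sorted(tiles, key=lambda tile_id: euclidean(query, tiles[tile_id]))
    return ranked[:top_n]
```

Plain Euclidean distance treats every feature dimension equally; features with large numeric ranges dominate unless the vectors are normalized first.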
User Study 3 (cont’d)
                       Recall         Precision
                       subj   sys     subj   sys
Similarity Analysis    78%    66%     43%    48%
Segmentation           60%    53%     67%    53%
Categorization         40%    42%     35%    34%
• Positive findings:
  • System is as good as human in retrieving images
  • Set of Gabor features is a good representative of texture
• Room for improvement:
  • Need other low-level image features (shape, contrast)
  • Need a better similarity measure