Exploring Situated Geoscience Concepts
Boyan Brodaric, Geological Survey of Canada
Mark Gahegan, GeoVISTA Center, Penn State Geography
Outline
• Description of situated concepts
• Empirical analysis of geoscience mapping data
• Implications and conclusions
Geoscience Cyberinfrastructure
• E.g. GEON, SCEC, LEAD, CUAHSI, …
• Ontologies are a means of representing concepts found in scientific theories
[Diagram labels: Theories; Knowledge representation; Information (real-time, archives, analyses); Informatics resources; People; Collaboration, visualization, education resources; Observations, measurements, experiments; Instrumentation; Models, simulations; Supercomputing; Connectivity resources]
Geoscience Ontology
• Abstraction levels for geoscience concepts
• Upper-level
  • universal
  • definitional
  • identified by logic
  • examples: endurant, geographic object
• Domain-level
  • domain specific
  • spans geospace-time: definitional (e.g. NASC, 1983); identified by 'essences'; example: geologic formation (Millikan, 2000)
  • situated in geospace-time: situational; identified by 'histories'; example: formation X
• Individuals
  • single entity
  • example: rock body #1
• Examples at each level
  • domain-level, definitional: granite; country; species; aircraft
  • domain-level, situated: granite of X; western country; human; Airbus 320
  • individual: granite of #1; Canada; Boyan; this Airbus 320
Situated Concepts
• Situation change → concept change
• Human-scientist situations
  • data collection strategy
  • prior knowledge
  • capacities
  • beliefs, dispositions, etc.
• Natural situations
  • unique natural history:
    • local physical conditions
    • processes and events
[Diagram labels: conception; concept; mind(s); kind (Millikan, 2000); artifacts of situations (human, natural) in data; ?]
Data Clusters and Concepts
• Types of data clusters and related concepts
  • definition-driven: clusters do not overlap due to separating conditions ('essences')
  • data-driven: central tendency and typical properties align (natural situations)
  • theory-driven: central tendency and ideal properties might not align (human situations)
  • situation-driven: cluster changes due to shifting context, concept evolution
  • prototypes: representative or prototypical properties that denote the concept
[Figure: cluster X and cluster Y plotted against property A and property B]
Goals and Objectives
• Empirical support for situated geospatial concepts
  • Corroborate notion of situated concepts empirically
  • Expand understanding of situated concepts
• Explore situated geoscience concept development
  • Study how such concepts are inferred from observed data in fieldwork
  • Look for situation effects in geoscience concept development
• Compare: property clusters & prototypes, thematic & geographic shifts
Analysis and Viz Methods
• Clusters analyzed with:
  • Supervised classification (LVQ)
  • Unsupervised classification (SOM)
  • Mean median distance (MMD)
• Visualization with:
  • Topology preserving SOM view
  • Distance preserving Sammon map
  • Distance-time graph
[Diagram labels: field data (x1 x2 x3 x4 … x100); LVQ; SOM; clusters; SOM; Sammon; MMD; d-t graph; visualization; this work]
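The "distance preserving" property of a Sammon map is usually judged with Sammon's stress, which compares pairwise distances before and after projection. The following is a minimal sketch of that stress measure only, not of the mapping procedure used in this work; the function name `sammon_stress` and the handling of coincident points are assumptions made for illustration.

```python
import numpy as np

def sammon_stress(high_dim, low_dim):
    """Sammon's stress between original vectors and their low-dimensional projection.

    0.0 means all pairwise distances are preserved exactly.
    """
    high = np.asarray(high_dim, dtype=float)
    low = np.asarray(low_dim, dtype=float)
    # Indices of each unordered pair (i < j), visited once.
    i, j = np.triu_indices(len(high), k=1)
    d_high = np.linalg.norm(high[i] - high[j], axis=1)
    d_low = np.linalg.norm(low[i] - low[j], axis=1)
    # Assumption: pairs that coincide in the original space are skipped
    # to avoid dividing by zero.
    keep = d_high > 0
    d_high, d_low = d_high[keep], d_low[keep]
    return float(np.sum((d_high - d_low) ** 2 / d_high) / np.sum(d_high))

# Three collinear 2-D points projected onto a line.
original = [[0, 0], [3, 4], [6, 8]]
print(sammon_stress(original, [[0], [5], [10]]))  # 0.0
print(sammon_stress(original, [[0], [5], [5]]))   # 0.375
```

A lower stress means the 2-D view distorts the original inter-point distances less.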
Analysis Method 1
• Mean Median neighbour Distance 1 (intra-cluster)
  • Mean (mmd1) of the median distances (dij) between each point (pi) and every other point within a single cluster
  • Euclidean distance metric between points
[Figure: one cluster of four points p1, p2, p3, p4 with distances d13, d23, d34 marked]
• d23 = median distance between p2 and all other points
• $\mathrm{mmd1} = \dfrac{d_{23} + d_{13} + d_{13} + d_{34}}{4}$
• mmd1 = measure of the diameter of the central tendency for a cluster
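The intra-cluster measure described on this slide can be sketched in a few lines of NumPy. This is an illustrative reading of the definition, not the authors' implementation; the function name `mmd1` as a Python callable and the use of `np.median` (which averages the two middle values when a point has an even number of neighbours) are assumptions.

```python
import numpy as np

def mmd1(points):
    """Mean, over all points, of each point's median Euclidean distance
    to every other point in the same cluster."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points")
    # Full pairwise Euclidean distance matrix, shape (n, n).
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Drop the zero self-distances on the diagonal: n-1 neighbours per row.
    neighbours = dist[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    return float(np.median(neighbours, axis=1).mean())

# Four points on a line at x = 0, 1, 3, 7:
# median neighbour distances are 3, 2, 3, 6, so the mean is 14 / 4.
print(mmd1([[0, 0], [1, 0], [3, 0], [7, 0]]))  # 3.5
```

Using the median per point makes the measure less sensitive to a few outlying observations than a plain mean of all pairwise distances would be.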
Analysis Method 2
• Mean Median neighbour Distance 2 (inter-cluster)
  • Mean (mmd2) of the median distances (dij) between points (pi) in X, Y
  • Euclidean distance metric between points
  • Asymmetry: mean (mmd3) of mmd2(X,Y) and mmd2(Y,X)
[Figure: cluster X with points p1, p2, p3 and cluster Y with points p4, p5, p6]
• d24 = median distance between p2 and all points in cluster Y
• d51 = median distance between p5 and all points in cluster X
• $\mathrm{mmd3} = \dfrac{1}{2}\left(\dfrac{d_{14} + d_{24} + d_{34}}{3} + \dfrac{d_{43} + d_{51} + d_{61}}{3}\right)$
• mmd2 = measure of distance between central tendencies of two clusters
• mmd3 = measure of the distance (similarity) between central tendencies of two clusters
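A minimal sketch of the inter-cluster measures, assuming the same reading of "median distance" as for the intra-cluster case: `mmd2(x, y)` is directional, and `mmd3` averages the two directions to remove the asymmetry. The function names and the sample coordinates are illustrative assumptions.

```python
import numpy as np

def mmd2(x, y):
    """Directional measure: mean, over points in x, of the median
    Euclidean distance from that point to all points in y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Cross-cluster distance matrix, shape (len(x), len(y)).
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    return float(np.median(dist, axis=1).mean())

def mmd3(x, y):
    """Symmetric measure: mean of mmd2(x, y) and mmd2(y, x)."""
    return 0.5 * (mmd2(x, y) + mmd2(y, x))

cluster_x = [[0, 0], [1, 0], [2, 0]]
cluster_y = [[4, 0], [5, 0], [9, 0]]
print(mmd2(cluster_x, cluster_y))  # 4.0
print(mmd2(cluster_y, cluster_x))  # 5.0
print(mmd3(cluster_x, cluster_y))  # 4.5
```

The two directional values differ here because cluster Y contains an outlying point, which is the asymmetry that mmd3 averages out.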
Viz Method
• Interpreting Distance-Time graphs
  • Indicate cluster size / similarity change in time
[Figure panels, each marking the extent and core before and after new points are added in a time interval: cluster contraction, similarity increase; cluster stability, similarity stability; cluster expansion, similarity decrease]
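One way to produce the data behind such a distance-time graph is to recompute the intra-cluster measure each time a new interval's points are added. The sketch below assumes a cumulative scheme (all points collected up to each interval edge) and uses hypothetical helper names `distance_time_series` and `label_trend`; the slides do not specify how intervals were formed.

```python
import numpy as np

def mmd1(points):
    """Mean of each point's median distance to the other points."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    neighbours = dist[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    return float(np.median(neighbours, axis=1).mean())

def distance_time_series(points, times, interval_edges):
    """mmd1 of all points observed up to each interval edge (cumulative)."""
    pts = np.asarray(points, dtype=float)
    t = np.asarray(times, dtype=float)
    series = []
    for edge in interval_edges:
        subset = pts[t <= edge]
        # mmd1 is undefined for fewer than two points.
        series.append(mmd1(subset) if len(subset) >= 2 else float("nan"))
    return series

def label_trend(previous, current, tol=1e-9):
    """Name the change between two consecutive values of the series."""
    if current > previous + tol:
        return "expansion"
    if current < previous - tol:
        return "contraction"
    return "stability"

points = [[0, 0], [2, 0], [1, 0], [10, 0]]
times = [1, 1, 2, 3]
series = distance_time_series(points, times, interval_edges=[1, 2, 3])
print([label_trend(a, b) for a, b in zip(series, series[1:])])
# ['contraction', 'expansion']
```

Applying the same loop to the inter-cluster measure would give the corresponding similarity curve between two clusters.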
Case Study
• standardized digital field data (rock composition, orientation, samples, photos, …)
• GSC et al. mapping in 1998-99
• studying region concept development
[Diagram labels: individuation; clustering; classification; Gr; Ss; cluster (properties); class (objects); Concepts (property and object prototypes); This work]
Data Preparation
• Convert site data to numeric vector
  • convert to First Normal Form
  • > 1600 observations (vectors)
  • > 80 sparse dimensions (attributes)
  • 3 map classes (concepts A, B, C)
  • 3 geologists
  • ignore data needed only for individuation
  • space dominated by rock type weighting
• Rock type codes
  20000 Plutonic …
  20420 syenogranite
  20460 monzogranite
  20480 granodiorite …
  21000 monzonite …
  21020 quartz monzonite …
  40000 Hypabyssal …
  60000 Volcanic …
  80000 Metamorphic …
  100000 Sedimentary …
• Spacing between hypercubes
  • n: mean dimensions populated per data vector (= 3.17)
  • e: length of edge of hypercube (= 10)
  • d: minimum distance between hypercubes
  • $\sqrt{n}\,e$: maximum distance in a hypercube
  • Therefore: $d > \sqrt{n}\,e = \sqrt{3.17} \times 10 \approx 17.8$, so set $d = 20$
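The spacing argument on this slide is a one-line calculation: the longest distance inside a hypercube with n populated dimensions and edge e is the diagonal, sqrt(n) * e, so codes for distinct groups should be further apart than that. A small check of the arithmetic, with the hypothetical function name `min_hypercube_separation`:

```python
import math

def min_hypercube_separation(mean_populated_dims, edge_length):
    """Diagonal of a hypercube: sqrt(n) * e. Separation d must exceed this."""
    return math.sqrt(mean_populated_dims) * edge_length

bound = min_hypercube_separation(3.17, 10)
print(round(bound, 1))  # 17.8
print(20 > bound)       # True
```

Rounding the bound up to d = 20 leaves a small margin above the diagonal.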
Previous Results (Brodaric & Gahegan 2002; 2004)
• Impact of human situations
  • Geologists develop variably distinct clusters for concepts A and B
[Figure panels: concept A; concept B. Annotations: individuals' clusters differ; difference in A > B]
New Results
• Data-driven indicators for concept C
  • Prototype and central tendency converge
[Figure annotations: clusters proximal to prototype; cluster sizes > prototype-cluster distances]
New Results
• Theory-driven indicators for concept B
  • Prototype and central tendency diverge
[Figure annotation: prototype and clusters are distant]
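The contrast between the data-driven and theory-driven indicators can be sketched as a comparison of two numbers: the cluster size and the distance from the prototype to the cluster. The sketch below is an assumption-laden illustration, not the procedure used in this work: it takes cluster size to be the mean median neighbour distance, takes the prototype-cluster distance to be the median distance from the prototype to the cluster's points, and uses the hypothetical names `cluster_size`, `prototype_distance` and `indicator`.

```python
import numpy as np

def cluster_size(points):
    """Assumed size measure: mean of each point's median distance to the others."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    neighbours = dist[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    return float(np.median(neighbours, axis=1).mean())

def prototype_distance(prototype, points):
    """Assumed measure: median distance from the prototype to the cluster's points."""
    pts = np.asarray(points, dtype=float)
    proto = np.asarray(prototype, dtype=float)
    return float(np.median(np.linalg.norm(pts - proto, axis=1)))

def indicator(prototype, points):
    """Illustrative rule: the prototype lies within the cluster's own spread
    (size > distance) for data-driven, otherwise theory-driven."""
    if cluster_size(points) > prototype_distance(prototype, points):
        return "data-driven"
    return "theory-driven"

cluster = [[0, 0], [2, 0], [0, 2], [2, 2]]
print(indicator([1, 1], cluster))    # data-driven
print(indicator([10, 10], cluster))  # theory-driven
```

A hard threshold is used only to keep the example short; in practice the two quantities would more likely be inspected side by side.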
New Results
• Response to natural situations
  • Correlation in time for shifts in thematic and geographic position
[Figure annotations: abrupt shift in thematic position; abrupt shift in geographic position; abrupt change in location]
New Results
• Frequency of data occurrence
  • Abrupt shifts exaggerated with removal of duplicates in the data
[Figure panels: data duplicates present; data duplicates removed]
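Comparing results with duplicates present and removed requires a de-duplicated copy of the observation vectors. A minimal sketch, assuming that a "duplicate" means an exactly identical vector and that the first occurrence should be kept in collection order; the function name `drop_duplicate_vectors` is hypothetical.

```python
import numpy as np

def drop_duplicate_vectors(vectors):
    """Keep the first occurrence of each distinct row, preserving row order."""
    arr = np.asarray(vectors, dtype=float)
    # np.unique sorts the rows, so recover the first-seen indices and re-sort them.
    _, first_index = np.unique(arr, axis=0, return_index=True)
    return arr[np.sort(first_index)]

observations = [[1, 2], [3, 4], [1, 2], [0, 0], [3, 4]]
print(drop_duplicate_vectors(observations).tolist())
# [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
```

Because repeated vectors contribute zero-length neighbour distances, removing them changes median-based measures, which is consistent with the shifts looking more abrupt once duplicates are dropped.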
Implications
• Ontologies
  • Delineate situated and non-situated domain concepts
  • Capture historical human-natural situations in representation of meaning for situated concepts
    • E.g. identify data used for concept discovery in data-driven concepts
[Figure label: concept discovery]
Implications
• Map quality
  • Toward metrics for measuring quality of concept development
  • In contrast with classification quality and individuation quality
  • Methods and metrics might be used for:
    • Identifying individuals with certain expertise for mapping crews
    • Monitoring conceptual consistency of mapping crews
Conclusions and Future Work
• Empirical support for geoscience concepts driven by theories, data, and structured around prototypes
• Empirical support for situated geoscience concepts
  • Good support for impact of human (socio-cognitive) situations
  • Some support for impact of natural situations on such concepts
• Potential for quantitative metrics and methods for evaluating quality of situated concepts
• Future Work
  • Trace prototype development alongside cluster development, to better differentiate data-driven and theory-driven concepts, and transitions
  • In-situ shadowing and formal (vs ad-hoc) exit interviews
  • More rigorous development of mmd