Probabilistic Latent Semantic Analysis as a Potential Method for Integrating Spatial Data Concepts
R.A. Wadsworth¹, A.J. Comber², P.F. Fisher²
• ¹ Centre for Ecology and Hydrology, Lancaster, UK
• ² Dept of Geography, Leicester University, UK
Motivation
We want to understand how the environment is changing, but natural resource inventories constantly develop new baselines. We therefore want some way to know how similar two categories are, so that we can decide whether inconsistencies between inventories represent change or error.
Earlier approaches
First we simply asked domain experts: "are 'a' and 'b' similar, dissimilar, or are you not sure?" But the expert has to make a great many judgements, experts are not always available, and you do not learn why they consider two concepts similar (or not). So we turned to (very) simple text mining: the more words two category descriptions have in common, the more similar the categories are taken to be (see the sketch below).
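A minimal sketch of that word-overlap idea; the category descriptions and the Jaccard-style score are illustrative assumptions, not the exact measure used in the proceedings paper:

```python
# Word-overlap similarity between two category descriptions
# (illustrative sketch; not necessarily the measure used in the paper).
def word_overlap_similarity(text_a, text_b):
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not (words_a or words_b):
        return 0.0
    # Jaccard-style score: shared words relative to all distinct words
    return len(words_a & words_b) / len(words_a | words_b)

# Hypothetical land-cover category descriptions
cat_a = "broadleaved deciduous woodland with a closed canopy"
cat_b = "deciduous forest and woodland with canopy cover above twenty percent"
print(word_overlap_similarity(cat_a, cat_b))
```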
Case Study
In the proceedings we use land-cover categories, but we are all here because of Andrew. So, what does his writing tell us about the underlying concepts behind his work?
Case Study – the data
We used the English-language abstracts from the papers provided on his web site. This is a biased sample: do the other papers contain concepts not covered by the English-language work? Do they contain collaborations I have missed? However, the aim here is simply to illustrate the process.
Case Study – the data
[Figure: co-authorship network. Red dots – collaborators; blue squares – papers in this analysis]
Text Mining Andrew's Abstracts
Example titles:
• "Object orientated modelling in GIS"
• "Processes in cadastre"
• "A formal model of correctness in cadastre"
• "Surveying mapping and LIS education in the USA"
• "Surveying education for the future"
• "Expert systems for GIS"
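One way to turn abstracts like these into the document-word frequencies that the analysis works on; the abstract texts below are placeholders, and the use of scikit-learn's CountVectorizer is an assumption rather than the tool actually used:

```python
# Sketch: build a document-word count matrix from a set of abstracts.
from sklearn.feature_extraction.text import CountVectorizer

abstracts = [
    "object orientated modelling in GIS ...",        # placeholder text
    "a formal model of correctness in cadastre ...",  # placeholder text
    "surveying education for the future ...",         # placeholder text
]
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(abstracts)   # shape: (documents, words)
print(counts.shape)
print(vectorizer.get_feature_names_out()[:5])
```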
Why latent analysis?
If we knew what the underlying (hidden, latent) concepts were, we might be able to understand why two categories are considered to be similar.
Probabilistic Latent Semantic Analysis
PLSA is a "generative model". It assumes that documents describe themes and that words are associated with themes; what we observe is the frequency of words in documents:
P(d, w) = P(d) Σ_{z∈Z} P(w|z) P(z|d)
We therefore try to model what latent variables (the z's) exist.
Probabilistic Latent Semantic Analysis
In practice it is similar to clustering, but:
"Documents are not assigned to clusters, they are characterized by a specific mixture of factors with weights P(z|d). These mixing weights offer more modelling power and are conceptually very different from posterior probabilities in clustering models and (unsupervised) naive Bayes models." — Thomas Hofmann, 1999
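A compact numpy sketch of the standard EM updates for the PLSA model P(d, w) = P(d) Σ_z P(w|z) P(z|d); the random initialisation, iteration count, and dense-matrix input are arbitrary assumptions for illustration:

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """EM for PLSA on a (documents x words) count matrix (dense numpy array)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_given_z = rng.random((n_topics, n_words))
    p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)
    p_z_given_d = rng.random((n_docs, n_topics))
    p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibility P(z|d,w) for every document-word pair
        joint = p_z_given_d[:, :, None] * p_w_given_z[None, :, :]   # (d, z, w)
        p_z_given_dw = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate P(w|z) and P(z|d) from expected counts
        weighted = counts[:, None, :] * p_z_given_dw                # (d, z, w)
        p_w_given_z = weighted.sum(axis=0)
        p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_given_d = weighted.sum(axis=2)
        p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_given_z, p_z_given_d
```

Applied to a count matrix like the one sketched earlier (e.g. plsa(counts.toarray(), n_topics=9)), the returned P(z|d) weights characterise each abstract as a mixture of themes, which is exactly the point Hofmann makes above, and the top-weighted words in each row of P(w|z) are what suggest labels for the themes on the following slides.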
Nine Latent Themes in Andrew's Work
[Figure: themes "A", "B" and "C" – cadastral systems, metadata and cartography?]
Latent Themes in Andrew's Work
[Figure: themes "D" and "E" – education and technology?]
Latent Themes in Andrew's Work
[Figure: themes "F" and "G" – decisions and directions?]
Latent Themes in Andrew's Work
[Figure: themes "H" and "I" – data?]
Conclusions
Simple text mining allows you to relate categories to each other, but it is not always easy to say why they are related. PLSA gives some indication of the underlying (fundamental?) themes, but how stable or useful are the results?