NaturalGeo : Final Presentation

NaturalGeo: Final Presentation. Dr Kristin Stock and Mr Javid Yousaf, University of Nottingham. Project goal: to develop methods for natural language spatial querying, i.e. how to map natural language expressions (e.g. car parks beside the river) to queries.

Presentation Transcript


  1. NaturalGeo: Final Presentation Dr Kristin Stock and Mr Javid Yousaf University of Nottingham

  2. Project Goal To develop methods for natural language spatial querying How to map natural language expressions to queries. car parks beside the river

  3. What kinds of expressions? • the car park beside the river • a field on the corner of the lane • the route follows the lane • the hall is on this quadrangle • the tramline at town x • the park contains trails. These can be used as queries generically (e.g. all car parks that are beside a river), or with a place name (e.g. the car park beside the River Trent).

  4. Scope • Containment • Collocation (same place) • Adjacency • Alignment • Object parthood • Sidedness

  5. Why? • Easier access to OS data products, compared with the current options of: • limited place name/postcode search; • advanced and complex tools. • Extraction of location from text documents. • Potentially, generation of language descriptions. • Easier access = increased potential for data use.

  6. What has already been done? • Mainly mathematical models for specific natural language terms, e.g.: • Topology, fixed formal model • Some models that include context, like near (e.g. model density etc).

  7. What does NaturalGeo add? • Takes a ‘whole of language’ approach. • Considers context.

  8. How? • Memory/instance based learning. • Use a store of expressions whose interpretation is known. • For new expressions, find the most semantically similar known expression, and use that interpretation.

  9. How do we represent interpretations? • Geometric Configuration Ontology (GCO). • 50 types of geometric configurations between pairs of objects • Each defined with a query.

  10. GCO profiles We can represent the meaning of a geospatial expression using a GCO profile

  11. GCO profiles and queries • Then we can create a query based on the GCO profile. • Query composition required. • Decision between: • conjunctive inclusion (multiple concepts to represent the relation) • eliminating some relations due to weakness in selection. Conjunctive form: GCOConcept_x ∧ GCOConcept_y ∧ GCOConcept_z
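
As a rough illustration of the two composition options on this slide, the sketch below keeps every concept in a profile whose selection weight clears a cut-off and joins the survivors conjunctively. The profile-as-dictionary representation, the weights, the 0.3 threshold and the name compose_query are all assumptions made for illustration; the slides do not say how the decision is actually made, and the output is a placeholder string rather than a real spatial query.

```python
def compose_query(profile, threshold=0.3):
    """Join the GCO concepts whose weight is at least `threshold` with AND.

    `profile` maps a concept name to a selection weight in [0, 1].
    Both the weights and the threshold are hypothetical.
    """
    # Strongest concepts first, weakly selected concepts dropped.
    ranked = sorted(profile.items(), key=lambda item: -item[1])
    kept = [concept for concept, weight in ranked if weight >= threshold]
    return " AND ".join(kept)


# Placeholder concept names mirroring the slide; the weights are invented.
profile = {"GCOConcept_x": 0.6, "GCOConcept_y": 0.35, "GCOConcept_z": 0.05}
query = compose_query(profile)
```

Setting the threshold to 0 reproduces pure conjunctive inclusion of all three concepts, so the same function covers both ends of the decision.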

  12. How do we know what the GCO profiles are for an expression? • Questionnaire of 2000 expressions. • Users selected best diagrams, diagrams depict GCO concepts. • So we have GCO profiles for 2000 expressions. • Use some as ‘known’ expressions, the rest for evaluation.

  13. Interpreting a new expression • In simple terms: • For expression x, we find the most similar known expression y. • We know the GCO profile for y. • GCO profile for x = GCO profile for y. • But we may look at the most similar group of expressions (and their GCO profiles) to try to get the best results.
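
The lookup described on this slide can be sketched as a nearest-neighbour search over the store of known expressions. Everything concrete here is an assumption: token_overlap is a crude stand-in for the comparison methods in the presentation, the relation names and weights in the toy store are invented, and averaging the top k profiles is only one possible way to combine a group of matches.

```python
def token_overlap(a, b):
    """Stand-in similarity: Jaccard overlap of the two word sets."""
    words_a, words_b = set(a.split()), set(b.split())
    return len(words_a & words_b) / len(words_a | words_b)


def interpret(expression, known, similarity, k=1):
    """Average the GCO profiles of the k known expressions most similar to `expression`."""
    ranked = sorted(known, key=lambda e: similarity(expression, e), reverse=True)[:k]
    concepts = {c for e in ranked for c in known[e]}
    return {c: sum(known[e].get(c, 0.0) for e in ranked) / len(ranked) for c in concepts}


# Hypothetical store: expression -> GCO profile (concept -> weight).
KNOWN = {
    "the station is in the city centre": {"contains": 1.0},
    "the car park beside the river": {"adjacent": 0.8, "near": 0.2},
}

single = interpret("the hall beside the river", KNOWN, token_overlap)        # best match only
group = interpret("the hall beside the river", KNOWN, token_overlap, k=2)    # group of matches
```

With k=1 the new expression simply inherits the profile of its closest neighbour; a larger k blends several profiles, which corresponds to the "most similar group" option.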

  14. The big question… • How do we find the ‘most similar’ expression?

  15. First, we parse the expression… • Identify: • Locatum (object being located) • Relatum (object used as a reference) • Verb • Preposition • Spatial adverb • Division nouns for relatum and locatum (e.g. part of) • Division noun adjective. Example: the station is right by the side of the river

  16. Then, we compare like with like… • the station is right by the side of the river • the station is located in the city centre • using 4 comparison methods

  17. Method 0: Baseline • similarity score = count of matching components / max number of components • the station is right by the side of the river • the station is located in the city centre • component match scores: 0 0 1 0 • similarity score = 1/6 = 0.16667
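
The baseline arithmetic can be reproduced with a few lines of code. The slot names and the way the two example expressions are split into components below are assumptions (the slides do not show the parser output); they are chosen so that only the locatum matches and the longer expression has six populated components, which gives the 1/6 on the slide.

```python
def baseline_similarity(a, b):
    """Count of exactly matching components / max number of populated components."""
    matches = sum(1 for slot in a if slot in b and a[slot] == b[slot])
    return matches / max(len(a), len(b))


# Hypothetical parses of the two example expressions.
expr_a = {
    "locatum": "station",
    "verb": "is",
    "adverb": "right",
    "preposition": "by",
    "division_noun": "side",
    "relatum": "river",
}
expr_b = {
    "locatum": "station",
    "verb": "is located",
    "preposition": "in",
    "relatum": "city centre",
}

score = baseline_similarity(expr_a, expr_b)  # one match out of six components
```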

  18. Method 1: Word Distribution Similarity • similarity score = ∑ word distribution similarity of element pairs / max number of populated elements • cosine method • the station is right by the side of the river • the station is located in the city centre • element pair scores: 0.3 0.5 1 0.6 • similarity score = 2.4/6 = 0.4
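
A minimal sketch of the two steps in Method 1: a cosine similarity between word distribution vectors, then the sum of element-pair scores divided by the maximum number of populated elements. The sparse context-count vectors are made up for illustration and are not output from any real distributional tool; the aggregation step reuses the four pair scores shown on the slide.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {context word: count}."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as sharing nothing
    return dot / (norm_u * norm_v)


def expression_similarity(pair_scores, max_populated):
    """Sum of element-pair similarities / max number of populated elements."""
    return sum(pair_scores) / max_populated


# Invented context counts for two prepositions.
by_contexts = {"river": 3, "road": 2}
in_contexts = {"city": 4, "road": 2}
preposition_score = cosine(by_contexts, in_contexts)

# Pair scores from the slide, six populated elements in the longer expression.
slide_score = expression_similarity([0.3, 0.5, 1, 0.6], 6)
```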

  19. Method 2: Ontology-based Similarity • similarity score = ∑ (1 - normalised semantic distance) of element pairs / max number of populated elements • dependent on ontology structure • the station is right by the side of the river • the station is located in the city centre • element pair scores: 0.3 0.5 1 0.6 • similarity score = 2.4/6 = 0.4
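
To make "1 - normalised semantic distance" concrete, the sketch below measures path length between two terms in a tiny is-a hierarchy and normalises it by the longest possible path. The hierarchy, the choice of path length as the distance, and the normalisation by twice the maximum depth are all assumptions for illustration; the slides only say that the score depends on the ontology structure.

```python
# Hypothetical is-a hierarchy: term -> parent (None marks the root).
PARENT = {
    "river": "watercourse",
    "stream": "watercourse",
    "watercourse": "feature",
    "street": "road",
    "road": "feature",
    "feature": None,
}


def ancestors(term):
    """Chain from the term itself up to the root."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def path_length(a, b):
    """Number of edges between a and b via their lowest common ancestor."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(chain_a):
        if node in chain_b:
            return steps_a + chain_b.index(node)
    raise ValueError("terms share no ancestor")


# Longest possible path: down from the root and back up, twice the max depth.
MAX_PATH = 2 * max(len(ancestors(t)) - 1 for t in PARENT)


def ontology_similarity(a, b):
    """1 - normalised semantic distance, in [0, 1]."""
    return 1 - path_length(a, b) / MAX_PATH
```

In this toy hierarchy siblings such as river and stream score 0.5, while river and street, which only meet at the root, score 0. A deeper or differently shaped ontology would change these values, which is the structural dependence the slide mentions.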

  20. Method 3: Geolinguistic Factor Similarity • Same as method 2 for all elements except relatum and locatum. • For relatum and locatum, we determine the similarity of geolinguistic factors, not of the feature types themselves. • Geolinguistic factors: factors thought to be significant in the use of language • image-schemata • geometry type • liquid/solid • scale • axial structure…

  21. the station is right by the side of the river • the station is located in the city centre • element pair scores: 0.2 0.5 1 0.6 • image schemata: 1 shared, 3 max = 1/3 • geometry: 0 shared, 1 max = 0 • axial structure: 0 shared, 1 max = 0 • scale: 2 shared, 3 max = 2/3 • liquid/solid: 0 shared, 1 max = 0 • Total = (1/3 + 0 + 0 + 2/3 + 0)/5 = 1/5 = 0.2 • c.f. street/river, could be 0.7 or 0.8
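
The total on this slide is the mean of the per-factor ratios (shared values / max values). The sketch below reproduces that arithmetic from the shared and max counts given on the slide; the dictionary layout and the name factor_similarity are illustrative assumptions, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

# (shared, max) counts per geolinguistic factor, as listed on the slide.
FACTOR_COUNTS = {
    "image schemata": (1, 3),
    "geometry": (0, 1),
    "axial structure": (0, 1),
    "scale": (2, 3),
    "liquid/solid": (0, 1),
}


def factor_similarity(counts):
    """Mean over factors of shared / max."""
    ratios = [Fraction(shared, most) for shared, most in counts.values()]
    return sum(ratios) / len(ratios)


score = factor_similarity(FACTOR_COUNTS)  # (1/3 + 0 + 0 + 2/3 + 0) / 5
```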

  22. LAGO • Geolinguistic factors contained in the Linguistically Augmented Geospatial Ontology (LAGO). • Extends OS ontologies with geolinguistic factors.

  23. Analysis (1) • Broad measures of success: • Similarity of GCO profiles for most highly matched expressions. • Most similar expressions should have most similar GCO profiles, if similarity is being measured correctly. • Using simple measures: • correlation (Pearson) between our score and GCO similarity (Spearman): should be maximised (≤ 1) • average difference between our score and GCO similarity: should be minimised.
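
For reference, the correlation measures named on this slide and the average difference can be computed as below. This is a plain-Python sketch with invented toy numbers, not project results; it shows Pearson correlation, Spearman correlation (Pearson applied to ranks, with tied values sharing their average rank) and mean absolute difference, without taking a position on which coefficient is applied at which stage.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def mean_abs_difference(xs, ys):
    """Average absolute difference between paired values."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Toy values only: our similarity scores vs. GCO profile similarity.
similarity_scores = [0.2, 0.4, 0.9, 0.6]
gco_similarity = [0.1, 0.5, 0.8, 0.7]
rho = spearman(similarity_scores, gco_similarity)
gap = mean_abs_difference(similarity_scores, gco_similarity)
```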

  24. Analysis (2) • Which method is best? • How does the size of the knowledge base (kb) affect results? • Which elements (relatum, locatum, verb) have the greatest impact on the results? • Which geolinguistic factors have the greatest impact on the results? • How does the success of the method vary with different spatial relations? • How transferable is the method to different spatial relations, and what is required of the knowledge base?

  25. To do (June) • General refinements/improvements: • Parsing of expressions. • Method 1: cosine method (DISCO) returns very low numbers. • Method 2: WordNet network distance matching; WS4J methods not very good, so implement our own method. • Matching of terms to LAGO: currently uses hyponyms, hypernyms and synonyms, with inconsistent ordering. • Improve speed. • Improve/extend overall measures of success, currently: • Spearman correlation coefficient of similarity score and GCO profile correlation (Pearson) (trying to maximise) • average of difference between similarity score and GCO profile correlation (trying to minimise)

  26. And then… • Analysis. • Query composition methods. • Methods for selecting the best GCO profile (single best match, or combined multiple matches?). • Comparison with mathematical models. • Refine methods: weightings? different measures of similarity? • More geolinguistic factors. • Richer geolinguistic factor model.

  27. Conclusions • Proof of concept so far. • Framework set up. • Now we have the opportunity to test, refine and further develop the method. • Then, the next goal: • Can we use the data we have in the kb (2000 expressions) to discover patterns and infer GCO profiles for expressions for which there are no close matches in the kb (e.g. new spatial relations etc)?
