
Linked data: Predicting missing properties

Klemen Simonic, Jan Rupnik, Primoz Skraba {klemen.simonic, jan.rupnik, primoz.skraba}@ijs.si

Presentation Transcript


  1. Klemen Simonic, Jan Rupnik, Primoz Skraba {klemen.simonic, jan.rupnik, primoz.skraba}@ijs.si • Linked data: Predicting missing properties

  2. Overview • Linked Data (motivation for the work) • Problem definition • Approaches • Results

  3. An example

  4. Linked Data • Connects related data that was not previously linked • A practice for exposing, sharing, and connecting pieces of data and information • How: • URIs (Uniform Resource Identifiers) • RDF (Resource Description Framework): a description of how to model/present the data

  5. Linked Data, tiny example

  6. Linked Data, tiny example

  7. Linked Data, one dataset • Nodes are resources • Edges are relations • Edge labels are properties
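To make this graph view concrete, here is a minimal Python sketch that turns a handful of toy (subject, property, object) triples into an adjacency map, keeping each edge's property label and its direction. The triples and the function name `build_graph` are illustrative assumptions, not DBpedia data or code from the paper.

```python
from collections import defaultdict

# Toy triples for illustration only: (subject, property, object).
TRIPLES = [
    ("Fiat", "location", "Turin"),
    ("Fiat", "industry", "Automotive"),
    ("Ferrari", "location", "Maranello"),
    ("Ferrari", "industry", "Automotive"),
]


def build_graph(triples):
    """Map each node (resource) to its labelled edges: (property, neighbour, direction)."""
    graph = defaultdict(list)
    for subj, prop, obj in triples:
        graph[subj].append((prop, obj, "out"))  # edge leaves the subject
        graph[obj].append((prop, subj, "in"))   # same edge, seen from the object
    return graph


graph = build_graph(TRIPLES)
print(graph["Fiat"])        # [('location', 'Turin', 'out'), ('industry', 'Automotive', 'out')]
print(graph["Automotive"])  # [('industry', 'Fiat', 'in'), ('industry', 'Ferrari', 'in')]
```

Storing the direction on every edge matters later, because some of the descriptors discussed below distinguish incoming from outgoing properties.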

  8. Linked Data cloud diagram

  9. DBpedia • DBpedia extracted the information from the infoboxes of Wikipedia articles • [Figure: an infobox, labelled Resource, Properties, Literal, Resource]

  10. DBpedia • DBraw: contains all the properties from all the infoboxes in English Wikipedia articles • DBmapped: the properties are unified (mapped onto the DBpedia ontology), so semantically equivalent properties are merged, e.g. PlaceOfBirth = BirthPlace. This data is much cleaner and better structured than the raw properties dataset.

  11. Freebase • An entity graph of people, places and things, built by people • Collaborative knowledge base • Property schemas • Google Knowledge Graph

  12. Scale of Datasets • DBpedia version 3.7 (additional properties and resources may have been added since) • Messy and noisy dataset (a large number of distinct properties, because they are not unified) • Largest and most structured dataset (a large number of edges and objects, and a relatively small number of properties)

  13. Missing properties • Problem: what are the missing properties for Fiat? • For a given resource, we want a ranking of its missing properties by likelihood.

  14. Approach • Similar objects • Measure of similarity • Neighborhood • Ranking function

  15. Approach • Ranking = weighted average of the property frequency vectors of the k nearest-neighbor objects • General framework (kernel smoother): d can be replaced with a normalized kernel function (more math on this topic is in the paper) • The function g(o) depends on the choice of the measure of closeness d(o, o_i)
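One way to read this slide, assuming d(o, o_i) is a non-negative closeness weight and f(o_i) is the property frequency vector of neighbor o_i, is g(o) = Σ_i d(o, o_i) f(o_i) / Σ_i d(o, o_i), summed over the k nearest neighbors, with properties the object already has filtered out of the final ranking. The sketch below implements that reading; the function name and input format are hypothetical, and the paper's exact normalization may differ.

```python
def rank_missing_properties(query_props, neighbours, k=3):
    """Rank properties the query object lacks by a weighted average over its k nearest neighbours.

    query_props: set of properties the query object already has.
    neighbours: list of (weight, property_vector) pairs; weight plays the role of d(o, o_i),
        property_vector maps property -> frequency for neighbour o_i.
    """
    # Keep the k neighbours with the largest closeness weights.
    top = sorted(neighbours, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(weight for weight, _ in top)
    if total == 0:
        return []
    # Normalised weighted average of the neighbours' property frequency vectors.
    scores = {}
    for weight, vector in top:
        for prop, freq in vector.items():
            scores[prop] = scores.get(prop, 0.0) + weight * freq / total
    missing = [p for p in scores if p not in query_props]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(missing, key=lambda p: (-scores[p], p))


neighbours = [
    (0.9, {"location": 1, "industry": 1, "foundedBy": 1}),
    (0.6, {"location": 1, "industry": 1, "revenue": 1}),
    (0.1, {"birthPlace": 1}),
]
print(rank_missing_properties({"location", "industry"}, neighbours, k=2))  # ['foundedBy', 'revenue']
```

Swapping in a different closeness measure only changes the weights passed in, which is what makes the kernel-smoother framing convenient for comparing measures.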

  16. Evaluation protocol • For a given object o, we delete one or more of its properties, denoted (o, {p1, …, pk}) • Run the recommendation algorithm for the object • Compute several evaluation metrics
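A minimal sketch of this leave-out protocol for a single object, assuming the recommender is any callable that maps the remaining property set to a ranked list of candidate properties. The names `evaluate_object`, `recommend`, and `toy_recommend` are hypothetical.

```python
import random


def evaluate_object(obj_props, recommend, n_delete=1, rng=None):
    """Delete n_delete properties from one object and report where the recommender ranks them.

    obj_props: set of the object's properties.
    recommend: callable mapping the remaining property set to a ranked list of properties.
    Returns {deleted_property: 1-based rank, or None if it was not recommended}.
    """
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    deleted = set(rng.sample(sorted(obj_props), n_delete))
    kept = obj_props - deleted
    ranking = recommend(kept)
    positions = {prop: i + 1 for i, prop in enumerate(ranking)}
    return {prop: positions.get(prop) for prop in deleted}


def toy_recommend(kept):
    """Toy recommender: a fixed candidate list minus what the object already has."""
    return [p for p in ["a", "b", "c", "d"] if p not in kept]


print(evaluate_object({"a", "b", "c"}, toy_recommend, n_delete=3))  # a -> 1, b -> 2, c -> 3
```

The per-property ranks returned here are exactly what the rank-based metrics on the next slide consume.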

  17. Evaluation metrics • Inverse rank (IRank) • Top 5 • Top 10
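The slide's formulas for these metrics were not preserved. Under common definitions (our assumption, not confirmed by the slides), IRank averages 1/rank over the deleted properties, counting a property that never appears in the ranking as 0, and Top-k is the fraction of deleted properties ranked within the first k positions. A sketch:

```python
def irank(ranks):
    """Mean inverse rank; a deleted property absent from the ranking (None) contributes 0."""
    return sum(1.0 / r if r else 0.0 for r in ranks) / len(ranks)


def top_k(ranks, k):
    """Fraction of deleted properties ranked within the first k recommendations."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


ranks = [1, 4, 12, None]  # hypothetical ranks of four deleted properties
print(irank(ranks))       # (1 + 1/4 + 1/12 + 0) / 4, about 0.333
print(top_k(ranks, 5))    # 0.5
print(top_k(ranks, 10))   # 0.5
```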

  18. Measure of Closeness • Local Measures: local graph properties • Baselines: • Random Objects • Objects with Common Properties • Property Co-occurrence • Global Measures: global graph properties • Exogenous Measures: external information (text)

  19. Local Graph Measures • We focus on a local description based on property distributions: • PropertyCount • DirPropertyCount • NeighbDirPropertyCount
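The slides only name these descriptors. A plausible reading, which is our assumption: PropertyCount counts how often each property label touches the object regardless of direction, DirPropertyCount keeps incoming and outgoing counts separate, and NeighbDirPropertyCount sums the DirPropertyCount vectors of the object's neighbors. The sketch below implements that reading on toy triples ("Person_A" is a made-up node).

```python
from collections import Counter


def property_count(obj, triples):
    """Assumed PropertyCount: occurrences of each property on edges touching obj, any direction."""
    counts = Counter()
    for subj, prop, other in triples:
        if subj == obj:
            counts[prop] += 1
        if other == obj:
            counts[prop] += 1
    return counts


def dir_property_count(obj, triples):
    """Assumed DirPropertyCount: like PropertyCount, but outgoing and incoming counted separately."""
    counts = Counter()
    for subj, prop, other in triples:
        if subj == obj:
            counts[(prop, "out")] += 1
        if other == obj:
            counts[(prop, "in")] += 1
    return counts


def neighb_dir_property_count(obj, triples):
    """Assumed NeighbDirPropertyCount: DirPropertyCount vectors summed over obj's neighbours."""
    neighbours = {o for s, _, o in triples if s == obj} | {s for s, _, o in triples if o == obj}
    total = Counter()
    for node in neighbours:
        total.update(dir_property_count(node, triples))
    return total


TRIPLES = [
    ("Fiat", "location", "Turin"),
    ("Fiat", "industry", "Automotive"),
    ("Ferrari", "industry", "Automotive"),
    ("Person_A", "worksFor", "Fiat"),
]
print(dir_property_count("Fiat", TRIPLES))
```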

  20. Random objects • Choose some number of objects uniformly at random from the network

  21. Objects with common properties • Take the objects that share at least a minimum number of properties with the query object • The number of shared properties is used as the object's weight
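A sketch of this baseline, where the threshold parameter `min_shared` and the input format (object name mapped to its property set) are illustrative assumptions:

```python
def common_property_neighbours(query_props, objects, min_shared=2):
    """Objects sharing at least min_shared properties with the query, weighted by the overlap size.

    objects: dict mapping object name -> set of its properties.
    Returns (weight, name) pairs, largest weight first.
    """
    result = []
    for name, props in objects.items():
        shared = len(query_props & props)  # number of properties in common
        if shared >= min_shared:
            result.append((shared, name))
    return sorted(result, key=lambda pair: (-pair[0], pair[1]))


objects = {
    "Ferrari": {"location", "industry", "foundedBy"},
    "Turin": {"location", "population"},
    "Lancia": {"location", "industry"},
}
print(common_property_neighbours({"location", "industry", "revenue"}, objects))
# [(2, 'Ferrari'), (2, 'Lancia')]
```

The (weight, name) pairs have the same shape as the neighbor weights a kernel-smoother ranking would consume.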

  22. Property co-occurrence • Approximate resource similarities through property co-occurrence patterns • Only pairwise co-occurrences are considered, for scalability and to keep estimation feasible
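The slide does not give the scoring rule. One simple pairwise model (our assumption) scores each candidate property p for an object with property set S as the sum over q in S of the estimated conditional probability P(p | q) = count(q, p) / count(q). Only pairs are counted, which keeps the tables small:

```python
from collections import Counter
from itertools import permutations


def cooccurrence_model(property_sets):
    """Count single properties and ordered pairs of properties that appear on the same object."""
    single, pair = Counter(), Counter()
    for props in property_sets:
        single.update(props)
        pair.update(permutations(props, 2))  # both (p, q) and (q, p); pairs only, no triples
    return single, pair


def score_by_cooccurrence(query_props, single, pair):
    """Rank unseen properties p by the sum over q in query_props of count(q, p) / count(q)."""
    scores = Counter()
    for q in query_props:
        if single[q] == 0:
            continue  # q never seen in the training objects
        for p in single:
            if p not in query_props:
                scores[p] += pair[(q, p)] / single[q]
    return sorted(scores, key=lambda p: (-scores[p], p))


single, pair = cooccurrence_model([
    {"location", "industry", "foundedBy"},
    {"location", "industry"},
    {"location", "population"},
])
print(score_by_cooccurrence({"industry"}, single, pair))  # ['location', 'foundedBy', 'population']
```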

  23. Our method • Each object is described by its DirPropertyCount vector • Similarity is computed as the dot product between DirPropertyCount vectors
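A sketch of this similarity step, assuming DirPropertyCount vectors are stored as sparse dictionaries keyed by (property, direction); the toy vectors and function names are hypothetical:

```python
def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(value * v.get(key, 0) for key, value in u.items())


def nearest_by_dot(query_vec, vectors, k=2):
    """The k objects whose DirPropertyCount vectors have the largest dot product with the query."""
    scored = [(dot(query_vec, vec), name) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


# Hypothetical DirPropertyCount vectors keyed by (property, direction)
query = {("location", "out"): 1, ("industry", "out"): 1, ("worksFor", "in"): 3}
vectors = {
    "Ferrari": {("location", "out"): 1, ("industry", "out"): 1, ("worksFor", "in"): 5},
    "Turin": {("location", "in"): 40},
    "Lancia": {("location", "out"): 1, ("industry", "out"): 1},
}
print(nearest_by_dot(query, vectors))  # [(17, 'Ferrari'), (2, 'Lancia')]
```

Note that "Turin" scores 0 despite its large count, because its properties point in the opposite direction; this is the kind of distinction a direction-aware vector is meant to capture.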

  24. Comparison

  25. Other Measures of Closeness • Local measures: local graph properties • Baselines: • Random objects • Objects with common properties • Property co-occurrence • Global measures: global graph properties • Exogenous measures: external (non-graph) information

  26. Global Graph Measures • We use two global measures of closeness, based on graph geodesics and graph diffusion (we treat the graph as a simple undirected graph, and we remove all literals and constants from the set of nodes to avoid unintuitive paths): • Shortest path length • The length of a shortest path between two objects • We compute the distances to the k nearest objects • Exponential diffusion kernel • Based on the matrix exponential of the graph adjacency matrix A • The parameter α controls how local or global the similarities are • Takes into account both the total number of paths between nodes and their lengths • A robust measure
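Both global measures can be sketched in pure Python, assuming the graph is given as an undirected adjacency dictionary with literals already removed. Shortest-path lengths come from breadth-first search; the diffusion kernel exp(αA) is approximated here by a truncated power series, which is adequate only for small toy graphs (a real implementation would use a sparse linear-algebra library). The function names and the `terms` cutoff are assumptions.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search hop counts from source in an undirected, unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def k_nearest_by_path(adj, source, k):
    """The k closest other objects as (distance, name), ties broken by name."""
    dist = shortest_path_lengths(adj, source)
    return sorted((d, n) for n, d in dist.items() if n != source)[:k]


def exp_diffusion_kernel(adj, alpha, terms=30):
    """Approximate exp(alpha * A) with the truncated series sum_{n < terms} (alpha * A)^n / n!."""
    nodes = sorted(adj)
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    scaled = [[0.0] * size for _ in range(size)]  # alpha * A
    for u in nodes:
        for v in adj[u]:
            scaled[index[u]][index[v]] = alpha
    result = [[float(i == j) for j in range(size)] for i in range(size)]  # n = 0 term: identity
    term = [row[:] for row in result]
    for n in range(1, terms):
        # term <- term @ (alpha * A) / n, so term now equals (alpha * A)^n / n!
        term = [[sum(term[i][m] * scaled[m][j] for m in range(size)) / n
                 for j in range(size)] for i in range(size)]
        for i in range(size):
            for j in range(size):
                result[i][j] += term[i][j]
    return nodes, result


# Path graph a - b - c
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(k_nearest_by_path(adj, "a", 2))  # [(1, 'b'), (2, 'c')]
nodes, K = exp_diffusion_kernel(adj, alpha=0.5)
print(K[0][1] > K[0][2])               # True: a is more similar to b than to c
```

Larger α weights long paths more heavily, which is how the kernel moves from local to global similarity.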

  27. Exogenous Measures • Independent of the graph structure • Rely on additional external information about the objects • Helpful for nodes with few connections in the graph • Textual information: • For some of the objects, we have extended abstracts describing them • TF-IDF weighting + cosine similarity
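A minimal TF-IDF plus cosine similarity sketch over tokenized abstracts, using raw term counts times log(N / df) as the weighting (one common variant; the exact weighting used in the paper is not stated on the slide). The abstracts are made up for illustration:

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """TF-IDF weights per document: raw term count times log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = {}
    for name, tokens in docs.items():
        term_freq = Counter(tokens)
        vectors[name] = {t: c * math.log(n_docs / doc_freq[t]) for t, c in term_freq.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Made-up, already tokenised "abstracts"
docs = {
    "Fiat": "italian car manufacturer based in turin".split(),
    "Lancia": "italian car manufacturer".split(),
    "Turin": "city in northern italy".split(),
}
vecs = tfidf_vectors(docs)
print(cosine(vecs["Fiat"], vecs["Lancia"]) > cosine(vecs["Fiat"], vecs["Turin"]))  # True
```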

  28. Results - IRank

  29. Results - Top10

  30. In vs. Out properties

  31. Deleting several properties • Method: DirPropertyCount vector • Dataset: DBraw • We remove a fixed fraction of the in and out properties
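A sketch of this degradation step, assuming each object's properties are listed as (property, direction) pairs and the number deleted per direction is rounded down; the function name and toy property list are hypothetical:

```python
import random


def delete_fraction(dir_props, fraction, rng=None):
    """Delete the same fraction of incoming and of outgoing properties (count rounded down).

    dir_props: list of (property, direction) pairs with direction "in" or "out".
    Returns (kept, deleted) lists.
    """
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    kept, deleted = [], []
    for direction in ("in", "out"):
        group = [pd for pd in dir_props if pd[1] == direction]
        n_delete = int(fraction * len(group))
        removed = set(rng.sample(range(len(group)), n_delete))
        for i, pd in enumerate(group):
            (deleted if i in removed else kept).append(pd)
    return kept, deleted


props = [("p1", "in"), ("p2", "in"), ("p3", "in"), ("p4", "in"), ("q1", "out"), ("q2", "out")]
kept, deleted = delete_fraction(props, 0.5)
print(len(kept), len(deleted))  # 3 3
```

Deleting per direction, rather than from the pooled list, keeps the in/out balance of each object fixed as the fraction grows.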

  32. Degradation – nodes / edges The negative effect of deleting a fraction of edges or nodes from the network

  33. Degradation – properties • The effect of deleting the K most frequent properties from the network

  34. Conclusion • A method for predicting missing properties • Uses a kernel smoother • Measures similarity in a number of different ways: • Local properties • Global graph structure • External data (text) • Extensive experimentation • Investigate combining measures further • More details about the research are in the papers: • Linked data: Predicting missing properties [machine learning] • Predicting Instance Properties in Linked Data [semantics of data]

  35. Take-home message • Big redundancy / regularity in the data • Local measures perform well • Scale changes the structure -> different methods are needed

  36. Questions?
