Using linked data to interpret tables

Varish Mulwad, Tim Finin, Zareen Syed and Anupam Joshi, University of Maryland, Baltimore County, November 8, 2010.

Presentation Transcript


  1. Using linked data to interpret tables • Varish Mulwad, Tim Finin, Zareen Syed and Anupam Joshi • University of Maryland, Baltimore County • November 8, 2010

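  2. Interpreting a table (diagram of an example table annotated with the column class http://dbpedia.org/class/yago/NationalBasketballAssociationTeams, the relation dbprop:team between two columns, and the cell entity http://dbpedia.org/resource/Allen_Iverson) • Map numbers as values of properties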

  3. Interpreting a table
     @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
     @prefix dbpedia: <http://dbpedia.org/resource/> .
     @prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
     @prefix yago: <http://dbpedia.org/class/yago/> .
     "Name"@en is rdfs:label of dbpedia-owl:BasketballPlayer .
     "Team"@en is rdfs:label of yago:NationalBasketballAssociationTeams .
     "Michael Jordan"@en is rdfs:label of dbpedia:Michael_Jordan .
     dbpedia:Michael_Jordan a dbpedia-owl:BasketballPlayer .
     "Chicago Bulls"@en is rdfs:label of dbpedia:Chicago_Bulls .
     dbpedia:Chicago_Bulls a yago:NationalBasketballAssociationTeams .

  4. Use Cases • Intelligent querying over data • Create a ‘Semantic’ knowledge base

  5. Use Cases (the RDF from slide 3, shown again) • Data integration • Search / query over tables • Convert legacy data into Semantic Web formats • Confirm / verify existing knowledge • Add new knowledge to the LOD cloud

  6. Motivation and Related Work

  7. We are laying a strong foundation for the Semantic Web … but an old problem haunts us …

  8. Chicken? Egg? … No chicken? • ~14.1 billion tables on the web, 154 million of them with high-quality relational data (Cafarella et al. 2008) • 305,632 datasets available as CSV or spreadsheets on Data.gov (US), plus 7 other nations establishing open data • Where is the structured data?

  9. Automate the process • It is not practical for humans to encode all of this into RDF manually • We need systems that can generate linked data from existing sources

  10. Related Work • Database-to-ontology mapping (Barrasa, Corcho, & Gómez-Pérez 2004), (Hu & Qu 2007), (Papapanagiotou et al. 2006), and (Lawrence 2004) • Mapping relational databases to RDF [W3C RDB2RDF Working Group]

  11. Related Work • Mapping spreadsheets to RDF [RDF123, XLWrap] • Practical and helpful systems, but they: – require significant manual work – do not generate linked data • Interpreting web tables to answer complex search queries over them (Limaye et al. 2010)

  12. T2LD Framework • Predict classes for columns • Link the table cells • Identify and discover relations

  13. T2LD Framework • Predict classes for columns • Link the table cells • Identify and discover relations

  14. Predicting class labels for a column (diagram: each cell in the column is an instance with several candidate classes, Class 1 to Class 4, from which a single class for the column is chosen)

  15. Knowledge Base • Yago • Wikitology¹: a hybrid knowledge base where structured data meets unstructured data (¹ Wikitology was created as part of Zareen Syed’s Ph.D. dissertation)

  16. Querying the knowledge base (each cell string is a query; the top three results and their types are shown)
     "Chicago": 1. Chicago Bulls 2. Chicago 3. Judy Chicago; types {dbpedia-owl:Place, dbpedia-owl:City, yago:WomenArtist, yago:LivingPeople, yago:NationalBasketballAssociationTeams}
     "Philadelphia": 1. Philadelphia 2. Philadelphia 76ers 3. Philadelphia (film); types {dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Film, yago:NationalBasketballAssociationTeams, …}
     "Houston": 1. Houston Rockets 2. Houston 3. Allan Houston; types {…}

  17. Scoring the classes
     Possible classes for the column: dbpedia-owl:Place, dbpedia-owl:City, yago:WomenArtist, yago:LivingPeople, yago:NationalBasketballAssociationTeams, dbpedia-owl:PopulatedPlace, dbpedia-owl:Film, …
     Each (cell, class) pair is scored: [Chicago, dbpedia-owl:City], [Philadelphia, dbpedia-owl:City], [Houston, dbpedia-owl:City], …, [Chicago, dbpedia-owl:Film], [Philadelphia, dbpedia-owl:Film], …
     E.g. processing the pair [Chicago, yago:NationalBasketballAssociationTeams], with these results for the string "Chicago":
       (R = 1) Chicago Bulls {yago:NationalBasketballAssociationTeams} [PR = 6]
       (R = 2) Chicago {dbpedia-owl:PopulatedPlace, dbpedia-owl:City} [PR = 5]
       (R = 3) Judy Chicago {yago:WomenArtist, yago:LivingPeople} [PR = 4]
     Score = w × (1/R) + (1 − w) × (normalized PageRank), where R and PR are the rank and PageRank of the result carrying the class
     [Chicago, yago:NationalBasketballAssociationTeams] = 0.25 × (1/1) + 0.75 × (6/7) ≈ 0.893
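The class-scoring step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: each knowledge-base result is modeled as an (entity, classes, PageRank) tuple in rank order; R and PR are taken from the highest-ranked result carrying the class; w = 0.25 and the normalizing constant 7 are copied from the worked example (the slide does not explain the 7); and ranking a column's classes by summing pair scores over its cells is an assumption the slide does not state. The names `pair_score` and `rank_column_classes` are hypothetical.

```python
from collections import defaultdict

# One knowledge-base result: (entity, set of classes, PageRank).
# Results are listed in rank order, so rank R = list position + 1.
Result = tuple[str, frozenset[str], float]


def pair_score(results: list[Result], cls: str, w: float = 0.25,
               pr_norm: float = 7.0) -> float:
    """Score a [cell, class] pair as w * (1/R) + (1 - w) * (PR / pr_norm).

    Assumption: R and PR come from the highest-ranked result that carries
    `cls`; if no result carries it, the pair scores 0.
    """
    for position, (_entity, classes, pagerank) in enumerate(results):
        if cls in classes:
            rank = position + 1
            return w * (1.0 / rank) + (1.0 - w) * (pagerank / pr_norm)
    return 0.0


def rank_column_classes(column: dict[str, list[Result]], w: float = 0.25,
                        pr_norm: float = 7.0) -> list[tuple[str, float]]:
    """Rank every candidate class for a column (assumed: sum over cells)."""
    candidates = {c for results in column.values()
                  for _, classes, _ in results for c in classes}
    totals: dict[str, float] = defaultdict(float)
    for cls in candidates:
        for results in column.values():
            totals[cls] += pair_score(results, cls, w, pr_norm)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Query results for the string "Chicago", as listed on the slide.
chicago: list[Result] = [
    ("Chicago Bulls", frozenset({"yago:NationalBasketballAssociationTeams"}), 6.0),
    ("Chicago", frozenset({"dbpedia-owl:PopulatedPlace", "dbpedia-owl:City"}), 5.0),
    ("Judy Chicago", frozenset({"yago:WomenArtist", "yago:LivingPeople"}), 4.0),
]

print(round(pair_score(chicago, "yago:NationalBasketballAssociationTeams"), 3))  # prints 0.893
```

Summing rewards classes that many cells support; a column whose cells mostly resolve to NBA teams accumulates a high total for yago:NationalBasketballAssociationTeams even if individual cells are ambiguous.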

  18. T2LD Framework • Predict classes for columns • Link the table cells • Identify and discover relations

  19. Machine-learning-based approach
     • Query: table cell + column header + row data + column type; requery the KB with the predicted class labels as additional evidence
     • Generate a feature vector for each of the top N results of the query
     • A classifier ranks the entities within the set of possible results; select the highest-ranked entity
     • A second classifier decides whether to link or not: link to the top-ranked instance, or link to “NIL”

  20. Learning to Rank • We trained an SVMrank classifier that learned to rank entities within a given set • Feature vector: – similarity measures: Levenshtein distance, Dice score – Wikitology score – popularity measures: PageRank, page length
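The two similarity measures on this slide can be sketched in Python as below. This is a minimal sketch, not the authors' code: comparing lowercased strings, and computing the Dice score over character-bigram sets, are assumptions (the slide does not say which Dice variant was used). The Wikitology score, PageRank and page length come from the knowledge base, so the hypothetical `feature_vector` helper simply takes them as inputs.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def bigrams(text: str) -> set[str]:
    """Set of lowercase character bigrams (assumed tokenization)."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over character-bigram sets."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:  # both strings have fewer than two characters
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def feature_vector(cell: str, candidate_label: str, wikitology_score: float,
                   pagerank: float, page_length: float) -> list[float]:
    """Hypothetical layout: [Levenshtein, Dice, Wikitology score, PageRank, page length]."""
    return [float(levenshtein(cell.lower(), candidate_label.lower())),
            dice(cell, candidate_label), wikitology_score, pagerank, page_length]


print(levenshtein("kitten", "sitting"))  # prints 3
```

Levenshtein is an edit distance (lower means more similar) while Dice is a similarity (higher means more similar); a learned ranker such as SVMrank can weight each feature with its own sign, so no manual inversion is needed.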

  21. “To link or not to link …” • A second SVM classifier • Its feature vector consists of the top-ranked entity’s feature vector plus two additional features: – the SVMrank score of the top-ranked entity – the difference in scores between the top two ranked entities
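The rank-then-decide flow of the last three slides can be sketched as follows. It is a sketch under assumptions, not the authors' implementation: candidates arrive already scored by a trained SVMrank model, `should_link` is a hypothetical stand-in for the second SVM classifier, and using a runner-up score of 0 when only one candidate exists is an assumption. The function names and the candidate values in the usage line are illustrative.

```python
from collections.abc import Callable, Sequence

# A candidate entity for one table cell: (entity IRI, SVMrank score, features).
Candidate = tuple[str, float, Sequence[float]]


def rank_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates by their SVMrank score, best first."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)


def link_features(ranked: list[Candidate]) -> list[float]:
    """Top entity's features + its SVMrank score + gap to the runner-up."""
    _, top_score, top_features = ranked[0]
    # Assumption: with a single candidate, the runner-up score is 0.
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return list(top_features) + [top_score, top_score - runner_up]


def link_cell(candidates: list[Candidate],
              should_link: Callable[[list[float]], bool]) -> str:
    """Return the top-ranked entity, or "NIL" if the second classifier says no."""
    if not candidates:
        return "NIL"
    ranked = rank_candidates(candidates)
    return ranked[0][0] if should_link(link_features(ranked)) else "NIL"


# Illustrative candidates and a toy decision rule standing in for the SVM.
cands = [("dbpedia:Chicago", 1.5, [0.0, 1.0]), ("dbpedia:Chicago_Bulls", 2.0, [0.1, 0.9])]
print(link_cell(cands, lambda f: f[-1] > 0.3))  # prints dbpedia:Chicago_Bulls
```

The score gap is a natural confidence signal: a clear winner suggests the correct entity is in the KB, while near-ties among weak candidates hint that the cell should be linked to NIL.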

  22. T2LD Framework • Predict classes for columns • Link the table cells • Identify and discover relations

  23. Identify relations (diagram: each row's pair of linked cells yields a set of candidate relations, e.g. {A}, {A}, {A, C}, {A, B, C}, {A, B}; relation A appears in every set)

  24. Relation between columns • Rows: Michael Jordan - Chicago, Allen Iverson - Philadelphia, Yao Ming - Houston • Candidate relations: dbprop:team, dbprop:draftTeam • Relation selected between the columns: dbprop:team

  25. Scoring the relations • Rows: Michael Jordan - Chicago, Allen Iverson - Philadelphia, Yao Ming - Houston • Candidates: dbprop:team, dbprop:draftTeam • dbprop:team: score 3 • dbprop:draftTeam: score 0, then score 1
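Relation scoring can be sketched as a vote over rows. This is a minimal sketch assuming each candidate relation earns one point per row whose linked entity pair it connects in the KB, which is consistent with dbprop:team scoring 3 over three rows; the candidate sets below are hypothetical, not data from the slides, and `score_relations` is an illustrative name.

```python
from collections import Counter


def score_relations(row_candidates: list[set[str]]) -> Counter[str]:
    """Count, for each candidate relation, how many rows support it."""
    votes: Counter[str] = Counter()
    for relations in row_candidates:
        votes.update(relations)  # each row votes once per relation it supports
    return votes


# Hypothetical candidate sets for the rows Michael Jordan - Chicago,
# Allen Iverson - Philadelphia and Yao Ming - Houston.
rows = [{"dbprop:team", "dbprop:draftTeam"}, {"dbprop:team"}, {"dbprop:team"}]
scores = score_relations(rows)
best, best_score = scores.most_common(1)[0]
print(best, best_score)  # prints dbprop:team 3
```

Voting per row favors the relation that holds across the whole column over relations that only hold for a few rows.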

  26. T2LD Framework • Predict classes for columns • Link the table cells • Identify and discover relations

  27. Annotating web tables for the Semantic Web

  28. Table as linked RDF
     @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
     @prefix dbpedia: <http://dbpedia.org/resource/> .
     @prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
     @prefix yago: <http://dbpedia.org/class/yago/> .
     "Name"@en is rdfs:label of dbpedia-owl:BasketballPlayer .
     "Team"@en is rdfs:label of yago:NationalBasketballAssociationTeams .
     "Michael Jordan"@en is rdfs:label of dbpedia:Michael_Jordan .
     dbpedia:Michael_Jordan a dbpedia-owl:BasketballPlayer .
     "Chicago Bulls"@en is rdfs:label of dbpedia:Chicago_Bulls .
     dbpedia:Chicago_Bulls a yago:NationalBasketballAssociationTeams .
     Notes: • "Team"@en is rdfs:label of dbpedia-owl:Team means "Team" is the common (human) name for the class dbpedia-owl:Team • dbpedia:Chicago_Bulls a yago:NationalBasketballAssociationTeams means dbpedia:Chicago_Bulls is an instance of (has type) yago:NationalBasketballAssociationTeams
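Emitting this output from an annotated table can be sketched with plain string formatting. This is a sketch under assumptions, not the T2LD code: the hypothetical `table_to_turtle` takes the header strings, one predicted class per column, and the linked DBpedia resource name for each cell. It writes the forward form `class rdfs:label "..."@en`, which is the Turtle equivalent of the N3 `is … of` statements on the slide, and uses full IRIs for entities so that names such as `Philadelphia_(film)` need no escaping.

```python
# Namespace IRIs as declared on the slide.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "yago": "http://dbpedia.org/class/yago/",
}
DBPEDIA = PREFIXES["dbpedia"]


def literal(text: str) -> str:
    """English-tagged Turtle string literal with quotes and backslashes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@en'


def table_to_turtle(headers: list[str], column_classes: list[str],
                    rows: list[list[tuple[str, str]]]) -> str:
    """Emit Turtle for an annotated table.

    headers:        column header strings, e.g. ["Name", "Team"]
    column_classes: predicted class per column, as prefixed names
    rows:           per row, a list of (cell text, linked DBpedia resource name)
    """
    out = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
    for header, cls in zip(headers, column_classes):
        out.append(f"{cls} rdfs:label {literal(header)} .")
    for row in rows:
        for (text, resource), cls in zip(row, column_classes):
            entity = f"<{DBPEDIA}{resource}>"  # full IRI, no escaping needed
            out.append(f"{entity} rdfs:label {literal(text)} .")
            out.append(f"{entity} a {cls} .")
    return "\n".join(out)


print(table_to_turtle(
    ["Name", "Team"],
    ["dbpedia-owl:BasketballPlayer", "yago:NationalBasketballAssociationTeams"],
    [[("Michael Jordan", "Michael_Jordan"), ("Chicago Bulls", "Chicago_Bulls")]],
))
```

A production converter would more likely use an RDF library and also emit the discovered column relations (e.g. dbprop:team); the sketch only reproduces the triples shown on the slide.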

  29. Results

  30. Dataset summary * The number in brackets is the count excluding columns that contained numbers

  31. Dataset summary

  32. Dataset summary

  33. Evaluation for class label predictions

  34. Evaluation #1 (MAP) • Compared the system’s ranked list of labels against a human-ranked list of labels • Metric: Mean Average Precision (MAP) • Commonly used in information retrieval to compare two ranked lists

  35. Evaluation #1 (MAP) • MAP: 80.76% • Example: System ranked: 1. Person 2. Politician 3. President; Evaluator ranked: 1. President 2. Politician 3. OfficeHolder
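Average precision for one column can be sketched as below. This is the textbook binary-relevance variant, assumed here because the slides do not define the exact formula: every label on the evaluator's list counts as relevant, and precision is accumulated at each rank where the system's list hits one. It is only an illustration of the metric; applying it to the single example column does not reproduce the 80.76% reported over the whole dataset. The function names are hypothetical.

```python
def average_precision(system: list[str], reference: list[str]) -> float:
    """AP of a system ranking, treating every reference label as relevant."""
    relevant = set(reference)
    hits, total = 0, 0.0
    for k, label in enumerate(system, start=1):
        if label in relevant:
            hits += 1
            total += hits / k  # precision at cut-off k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(pairs: list[tuple[list[str], list[str]]]) -> float:
    """MAP: mean of per-column AP over (system, reference) ranking pairs."""
    return sum(average_precision(s, r) for s, r in pairs) / len(pairs)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
print(round(average_precision(system, evaluator), 3))  # prints 0.389
```

Here "Person" misses, "Politician" hits at rank 2 (precision 1/2) and "President" hits at rank 3 (precision 2/3), so AP = (1/2 + 2/3) / 3 = 7/18.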

  36. Evaluation #2 (Recall) • Example: System ranked: 1. Person 2. Politician 3. President; Evaluator ranked: 1. President 2. Politician 3. OfficeHolder • Recall > 0.6 for 75% of the columns
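The recall figure can be sketched as follows, assuming "Recall > 0.6 (75 %)" means the share of columns whose recall exceeds 0.6, and that recall is the fraction of the evaluator's labels that also appear in the system's list. Both readings are interpretations, and `recall` and `share_above` are hypothetical names.

```python
def recall(system: list[str], reference: list[str]) -> float:
    """Fraction of the evaluator's labels that appear in the system's list."""
    relevant = set(reference)
    return len(relevant & set(system)) / len(relevant) if relevant else 0.0


def share_above(recalls: list[float], threshold: float = 0.6) -> float:
    """Share of columns whose recall exceeds the threshold."""
    return sum(r > threshold for r in recalls) / len(recalls)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
print(round(recall(system, evaluator), 3))  # prints 0.667
```

In the example the system recovers two of the evaluator's three labels (President and Politician, but not OfficeHolder), so this column would count toward the share above 0.6.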

  37. Evaluation #3 (Correctness) • Evaluated whether our predicted class labels were “fair and correct” • A class label may not be the most accurate one but can still be correct • E.g. dbpedia-owl:PopulatedPlace is not the most accurate label for a column of cities, but it is still correct • Three human judges evaluated our predicted class labels

  38. Evaluation #3 (Correctness) • Column “Nationality”: prediction MilitaryConflict • Column “Birth Place”: prediction PopulatedPlace • Overall accuracy: 76.92% • A category-wise breakdown of class label correctness

  39. Evaluation for linking table cells to entities

  40. Category-wise accuracy for linking table cells • Overall accuracy: 66.12%

  41. Relation between columns • Idea: ask human evaluators to identify relations between columns in a given table • Pilot experiment: three evaluators annotated five random tables from our dataset • The evaluators identified 20 relations • Our accuracy: 5 out of 20 (25%) were correct

  42. Conclusion and Future Work

  43. Conclusion • We have demonstrated that it is possible to develop an automated framework for converting tables and spreadsheets to linked data • Extending and adapting this framework for open government data • Discovery of new relations between entities

  44. References • Cafarella, M. J.; Halevy, A.; Wang, D. Z.; Wu, E.; and Zhang, Y. 2008. WebTables: Exploring the power of tables on the web. Proc. VLDB Endow. 1(1):538-549. • Barrasa, J.; Corcho, Ó.; and Gómez-Pérez, A. 2004. R2O, an extensible and semantically based database-to-ontology mapping language. In Proceedings of the 2nd Workshop on Semantic Web and Databases (SWDB 2004), volume 3372, 1069-1070. • Hu, W., and Qu, Y. 2007. Discovering simple mappings between relational database schemas and ontologies. In Aberer, K.; Choi, K.-S.; Noy, N. F.; Allemang, D.; Lee, K.-I.; Nixon, L. J. B.; Golbeck, J.; Mika, P.; Maynard, D.; Mizoguchi, R.; Schreiber, G.; and Cudré-Mauroux, P., eds., ISWC/ASWC, volume 4825 of Lecture Notes in Computer Science, 225-238. Springer. • Papapanagiotou, P.; Katsiouli, P.; Tsetsos, V.; Anagnostopoulos, C.; and Hadjiefthymiades, S. 2006. RONTO: Relational to ontology schema matching. In AIS SIGSEMIS Bulletin.

  45. References • Lawrence, E. D. R. 2004. Composing mappings between schemas using a reference ontology. In Proceedings of the International Conference on Ontologies, Databases and Applications of Semantics (ODBASE), 783-800. Springer. • Han, L.; Finin, T.; Parr, C.; Sachs, J.; and Joshi, A. 2008. RDF123: From spreadsheets to RDF. In Seventh International Semantic Web Conference. Springer. • Han, L.; Finin, T.; and Yesha, Y. 2009. Finding semantic web ontology terms from words. In Proceedings of the Eighth International Semantic Web Conference. Springer. • Limaye, G.; Sarawagi, S.; and Chakrabarti, S. 2010. Annotating and searching web tables using entities, types and relationships. In Proceedings of the 36th International Conference on Very Large Data Bases (VLDB).

  46. This work was supported by:
