Linked Data: Survey of Adoption

Linked Data: Survey of Adoption. Aidan Hogan. Day 2 Session 2. Linked Open Data …so, what’s out there? The Web of Data! August 2007, November 2007, February 2008, March 2008, September 2008, March 2009, July 2009, September 2010.

Presentation Transcript


  1. Linked Data: Survey of Adoption Aidan Hogan Day 2 Session 2

  2. Linked Open Data …so, what’s out there?

  3. The Web of Data! August 2007 • November 2007 • February 2008 • March 2008 • September 2008 • March 2009 • July 2009 • September 2010 Images from: http://richard.cyganiak.de/2007/10/lod/; Cyganiak, Jentzsch

  4. Publications Media User-generated Government Cross-Domain Geographic Life sciences

  5. Anatomy of the LOD cloud: cross-domain • Freebase • ~300 million triples • General knowledge • User contributed • Acquired by Google • OpenCalais • ~4.5 million triples • Thomson Reuters export • OpenCyc • ~2 million triples • Upper ontology concepts • DBpedia • ~1 billion triples • Exports from Wikipedia • Central hub • Yago • ~19 million triples • Smaller/more precise data from Wikipedia • WordNet • ~4.5 million triples • Synonyms, etc.

  6. User-generated Cross-Domain

  7. Anatomy of the LOD cloud: user-generated • semanticweb.org • ~50 thousand triples • SemWeb related topics • Semantic Media Wiki! • Revyu • ~20 thousand triples • User contributed reviews • FlickrWrappr • ~56 million triples • Exports from photo site • DogFood • ~200 thousand triples • SemWeb confs. and papers • RDF ohloh • ~700 thousand triples • Exports from open-source development site

  8. Publications User-generated

  9. Anatomy of the LOD cloud: publications (library exports and academic publications) • DBLP • ~28 million triples • Computer science publications • ePrints • ~8.4 million triples • ePrints exporter

  10. User-generated Life sciences

  11. Anatomy of the LOD cloud: life sciences • DrugBank • ~800 thousand triples • Detailed pharmacology for FDA-approved drugs • Sider • ~200 thousand triples • Drug side-effects • DailyMed • ~200 thousand triples • Detailed drug info from NLM • Diseasome • ~91 thousand triples • Disorders and disease • LinkedCT • ~7 million triples • Clinical trials info • UniProt • 100s of millions of triples • Info on proteins and sequences • PubMed • 800 million triples • HCLS publications

  12. Geographic Life sciences

  13. Anatomy of the LOD cloud: geographical • GeoNames • ~100 million triples • 10 million places with lat, long, population, subdivisions, post-codes, etc. • 2000 U.S. Census • ~1 billion triples • Population statistics per geographical location • Linked Sensor Data • ~1 billion triples • Sensor observations from 20 thousand weather observatories • Linked GeoData • ~3 billion triples • OpenStreetMap geolocations

  14. Government Geographic

  15. Anatomy of the LOD cloud: governmental • UK Legislation • ~2 billion triples • UK primary and secondary legislation info • NASA • ~100 thousand triples • Spacecraft, star catalogues, etc. • EuroStat • ~5 million triples • Various statistics for EU countries • UK Postcodes • ~27 million triples • Every UK postcode • GovTrack • ~13 million triples • US Congress bills, sponsorship, voting records

  16. Media Government

  17. Anatomy of the LOD cloud: media • Music (Various) • 100s of millions of triples • MySpace • AudioScrobbler • MusicBrainz • discogs • LastFM • Poképédia • ~115 thousand triples • Everything you ever wanted to know about Pokémon (but were afraid to ask) • BBC Programmes • ~60 million triples • Extensive info on BBC TV and radio programmes • New York Times • ~400 thousand triples • Extensive news vocabulary and categorisation schemes • Linked Movie Database • ~6 million triples • Movie database • Open (smaller) version of IMDb

  18. Publications Media User-generated Government Cross-Domain Geographic Life sciences

  19. Linked Open Data?

  20. Graph Structure (i): Clustering • Life sciences (esp. Bio2RDF) • Publications (esp. RKB) • Core Image from http://blog.larkc.eu/?p=1941; C. Guéret

  21. Graph Structure (ii): Interlinkage Interactive: http://gromgull.net/2010/01/swball/swball.svg; G.A. Grimnes

  22. Graph Structure (iii): owl:sameAs linkage Interactive: http://inkdroid.org/empirical-cloud/; E. Summers
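
Since owl:sameAs is symmetric and transitive, URIs connected by sameAs links fall into equivalence classes that a consumer can merge. The following is a minimal sketch of that grouping using union-find; the `ex:` identifiers are made-up placeholders, not data from the slides, and the function name is hypothetical.

```python
def same_as_classes(pairs):
    """Group URIs into the equivalence classes implied by owl:sameAs pairs."""
    parent = {}

    def find(x):
        # Return the representative of x's class, registering x if unseen.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    classes = {}
    for uri in list(parent):
        classes.setdefault(find(uri), set()).add(uri)
    return sorted(sorted(members) for members in classes.values())


# Hypothetical sameAs links between placeholder identifiers.
links = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:d", "ex:e")]
groups = same_as_classes(links)
for group in groups:
    print(group)
```

Here `ex:a` and `ex:c` end up in the same class even though no link connects them directly, which is the transitive effect the sameAs graph visualises.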

  23. Linked Open Data?

  24. Licensing • CC Attribution, non-commercial, share-alike • Public Domain (No IP) • CC Attribution, share-alike • GFDL: GNU Free Documentation License • CC Attribution • Custom (ad-hoc) • No known licence Image by L. Dodds

  25. SPARQL • 66% of the datasets have a SPARQL endpoint • 35% offer an RDF dump See http://www.w3.org/wiki/SparqlEndpoints
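
To illustrate what a SPARQL endpoint offers a client: under the SPARQL protocol, a query can be sent as an HTTP GET request with the query text in a `query` parameter. The sketch below only builds such a request URL and makes no network call; the endpoint address is a hypothetical placeholder, and the query is a generic example rather than one from the talk.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; substitute a real one from the endpoint list above.
ENDPOINT = "http://example.org/sparql"

# Generic example query: fetch any ten triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_query_url(endpoint, query):
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)

# Round-trip check: decoding the URL recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

An RDF dump, by contrast, is downloaded whole and queried locally, which is why the two access methods are counted separately.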

  26. Linked Open Data?

  27. Data Overview • 207 datasets • 68 (33%) published directly by data producers • 137 (67%) published by third parties Info from http://www4.wiwiss.fu-berlin.de/lodcloud/state/; Bizer, Jentzsch, Cyganiak

  28. Data Overview • 207 datasets • ~28 billion triples • Highest volume from large legacy producers Info from http://www4.wiwiss.fu-berlin.de/lodcloud/state/; Bizer, Jentzsch, Cyganiak

  29. Data Overview • 207 datasets • ~28 billion triples • ~395 million links • More links in more focused domains • (Or datasets by the same group) Info from http://www4.wiwiss.fu-berlin.de/lodcloud/state/; Bizer, Jentzsch, Cyganiak

  30. Linked Open Vocabularies?

  31. (Linked) Vocabularies Overview … • Formalised using RDFS and OWL standards introduced yesterday • (Typically OWL Full) … Image from http://blog.dbtune.org/public/.081005_lod_constellation_m.jpg; Giasson, Bergman

  32. (Linked) Vocabularies: Dublin Core (DC) • Dublin Core • Models terms for describing documents and other resources (title, creator, date, etc.) Table from http://dublincore.org/documents/dcmi-terms/

  33. (Linked) Vocabularies: FOAF • Friend Of A Friend • Models terms for personal information Image from http://www.deri.ie/fileadmin/images/blog/; Breslin

  34. (Linked) Vocabularies: SIOC • Semantically Interlinked Online Communities • Models terms for online communities and presence Image from http://rdfs.org/sioc/spec/; Bojārs, Breslin et al.

  35. (Linked) Vocabularies: SKOS • Simple Knowledge Organization System • Metavocabulary for concept schemes Image from http://www.w3.org/TR/swbp-skos-core-guide; Miles, Brickley

  36. (Linked) Vocabularies: FOAF+SIOC+SKOS • Example of how vocabularies can interleave Image from http://sioc-project.org/node/158; Breslin

  37. (Linked) Vocabularies: DOAP • Description Of A Project • Models terms for projects (research, software, etc.) Image from http://code.google.com/p/baetle/wiki/DoapOntology; Breslin

  38. (Linked) Vocabularies: Music Ontology • Models terms for music artists, songs, albums etc. • (very detailed) Image from http://musicontology.com/; Raimond, Giasson

  39. (Linked) Vocabularies: GoodRelations (i) • Models terms for e-commerce, products, offerings etc. Image from http://www.heppnetz.de/projects/goodrelations/primer/; Hepp

  40. (Linked) Vocabularies: GoodRelations (ii) • Models terms for e-commerce, products, offerings etc.

  41. (Linked) Vocabularies: DBpedia Ontology • Classes and properties for Wikipedia export • Cross-domain • 272 classes • 1,300 properties • (Too big to show) • Used to model structured info-boxes in Wikipedia See http://wiki.dbpedia.org/

  42. (Linked) Vocabularies Overview … …

  43. (Linked) Vocabularies: Interlinkage Interactive: http://labs.mondeca.com/dataset/lov/; Vatant, Vandenbussche

  44. LOD Vocabulary Usage “In order to make it as easy as possible for client applications to process your data, you should reuse terms from well-known vocabularies wherever possible. You should only define new terms yourself if you can not find required terms in existing vocabularies. ... It is common practice to mix terms from different vocabularies.” From “How to Publish Linked Data on the Web”; Bizer, Cyganiak, Heath http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
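
As a rough illustration of mixing terms from well-known vocabularies, the sketch below describes one resource using Dublin Core terms and FOAF together and serialises the result as N-Triples. The FOAF and DC Terms namespace URIs are the published ones; the `http://example.org/` resource URIs and the helper names are assumptions made up for this example.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # hypothetical namespace for the described resources


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def lit(text):
    """Render a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# A talk described with DC Terms, its speaker described with FOAF.
triples = [
    (uri(EX + "talk"), uri(DCT + "title"), lit("Linked Data: Survey of Adoption")),
    (uri(EX + "talk"), uri(DCT + "creator"), uri(EX + "aidan")),
    (uri(EX + "aidan"), uri(FOAF + "name"), lit("Aidan Hogan")),
]


def to_ntriples(rows):
    """Serialise (subject, predicate, object) rows, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


document = to_ntriples(triples)
print(document)
```

No new property had to be defined here: both the title and the name are covered by existing vocabularies, which is the practice the quoted tutorial recommends.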

  45. LOD Vocabulary Usage Info from http://www4.wiwiss.fu-berlin.de/lodcloud/state/; Bizer, Jentzsch, Cyganiak

  46. LOD Vocabulary Usage • Preferential attachment: more commonly used classes and properties are more likely to be used by others • Self-organising phenomenon/emergence • Causes power-law (long-tail) distributions... (Plots: property usage and class usage, log/log scale)
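
The rich-get-richer effect described above can be reproduced with a toy simulation. The slides do not specify a model, so this sketch swaps in a simple urn process: each new usage either introduces a brand-new term with probability `p_new`, or copies a term from a randomly chosen earlier usage, which favours already popular terms. The function name, the parameter values, and the seed are all arbitrary assumptions.

```python
import random
from collections import Counter


def simulate_term_usage(steps, p_new=0.1, seed=0):
    """Return per-term usage counts, most used first."""
    rng = random.Random(seed)
    uses = [0]       # one entry per usage event; the value is a term id
    next_term = 1
    for _ in range(steps):
        if rng.random() < p_new:
            uses.append(next_term)          # a publisher coins a new term
            next_term += 1
        else:
            uses.append(rng.choice(uses))   # reuse, proportional to popularity
    return sorted(Counter(uses).values(), reverse=True)


counts = simulate_term_usage(20000)
print("distinct terms:", len(counts))
print("top five usage counts:", counts[:5])
print("terms used exactly once:", sum(1 for c in counts if c == 1))
```

Picking uniformly from the list of past usages is what makes the choice proportional to popularity: a term that appears many times in the list is proportionally more likely to be drawn. The output should show a few heavily used terms and a long tail of rarely used ones.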

  47. Linked Open Challenges? • ...still many open challenges (and opportunities) • Linked Data still in its infancy (<4 years old) • Publishing Linked Data • How to generate and maintain links to other datasets • Modelling issues when decoupled from applications • Economic issues: who pays for server overheads? • Revenue streams? Incentives? • Social issues: community-driven, collaborative knowledge bases • Consuming Linked Data • Scalability (tens of billions of triples) • Dealing with low data quality (Web data) • Heterogeneous data • many vocabularies • different URI naming schemes • Getting value from Linked Data through applications!
