Metrics-Driven Approach for LOD Quality Assessment 2014-May-07
Outline
• What is the problem?
• What have others done?
• What is our solution?
• Does it work?
What is the problem?
• Linked Open Data (LOD):
  • Realizing Semantic Web by interlinking existing but dispersed data
• Main components of LOD:
  • URIs to identify things
  • RDF to describe data
  • HTTP to access data
What is the problem?
Datasets: 295
Triples: over 30,000,000,000 (30 B)
Links: over 500,000,000 (500 M)
What is the problem?
Inclusion criteria for publishing and interlinking datasets into the LOD cloud:
• Resolvable http/https URIs
• Presented in one of the standard Semantic Web formats (RDF, RDFa, RDF/XML, Turtle, N-Triples)
• Contains at least 1000 triples
• Connected via at least 50 RDF links to the existing datasets of LOD
• Accessible via RDF crawling, RDF dump, or SPARQL endpoint
Is the dataset ready to publish?
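The inclusion criteria above can be read as a simple checklist. The following is a minimal sketch of such a check, not part of the original slides: the `DatasetSummary` record, its field names, and the access-method labels are hypothetical, and only the thresholds (1000 triples, 50 links) and the format list come from the slide.

```python
from dataclasses import dataclass

# Thresholds and formats as listed on the slide.
MIN_TRIPLES = 1000
MIN_LINKS = 50
ACCEPTED_FORMATS = {"RDF", "RDFa", "RDF/XML", "Turtle", "N-Triples"}
# Hypothetical labels for the three access routes named on the slide.
ACCESS_METHODS = {"crawl", "dump", "sparql"}


@dataclass
class DatasetSummary:
    """Hypothetical summary of a dataset; field names are illustrative."""
    uris_resolvable: bool
    data_format: str
    triple_count: int
    link_count: int
    access_methods: set


def unmet_criteria(ds):
    """Return a list of the inclusion criteria that the dataset fails."""
    problems = []
    if not ds.uris_resolvable:
        problems.append("URIs are not resolvable")
    if ds.data_format not in ACCEPTED_FORMATS:
        problems.append("format is not a standard Semantic Web format")
    if ds.triple_count < MIN_TRIPLES:
        problems.append("fewer than 1000 triples")
    if ds.link_count < MIN_LINKS:
        problems.append("fewer than 50 RDF links to existing LOD datasets")
    if not (ds.access_methods & ACCESS_METHODS):
        problems.append("no crawl, dump, or SPARQL access")
    return problems


# Toy input, invented for illustration only.
candidate = DatasetSummary(True, "Turtle", 900, 60, {"dump"})
print(unmet_criteria(candidate))
```

Passing all five checks says nothing about data quality, which is the gap the rest of the talk addresses.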
What is the problem?
Idea of LOD: publishing first, improving later
Results in: quality problems in the published datasets
Missing link: data quality evaluation before release
What have others done?
Data quality in the context of LOD
Validators:
• General validators
• Parsing and syntax
• Accessibility / dereferenceability
Quality assessment of published data:
• Classifying quality problems of LOD
• Using metadata for quality assessment
• Filtering poor-quality data (WIQA)
• Semantic annotation using ontologies
What have others done?
Limitations of related work:
• Syntax validation, not quality evaluation
• Not scalable
• Not fully automated
• Evaluation after publishing
What is our solution?
Proposing a set of metrics for assessing the inherent quality of datasets before interlinking them to the LOD cloud
2. Proposing Metrics
Example:
Goal: Assessment of the consistency of a dataset in the context of LOD
Question: What is the degree of conflict in the context of data value?
Metric: The number of functional properties with inconsistent values
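The example metric above can be sketched in a few lines. This is an illustrative reading, not the authors' implementation: it assumes triples are plain `(subject, property, object)` tuples, that the set of functional properties is supplied by the caller, and that two different object strings for the same subject and property count as an inconsistency. The function name and the `ex:` identifiers are hypothetical.

```python
from collections import defaultdict


def count_inconsistent_functional_properties(triples, functional_properties):
    """Count functional properties that carry more than one distinct
    value for at least one subject."""
    values = defaultdict(set)  # (subject, property) -> distinct objects
    for subject, prop, obj in triples:
        if prop in functional_properties:
            values[(subject, prop)].add(obj)
    # A property is inconsistent if any subject has two or more values for it.
    inconsistent = {prop for (_, prop), objs in values.items() if len(objs) > 1}
    return len(inconsistent)


# Toy data, invented for illustration only.
triples = [
    ("ex:alice", "ex:birthDate", "1980-01-01"),
    ("ex:alice", "ex:birthDate", "1981-05-12"),  # conflicts with the line above
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),        # not functional, so no conflict
    ("ex:bob", "ex:birthDate", "1975-03-03"),
    ("ex:bob", "ex:height", "180"),
]
functional = {"ex:birthDate", "ex:height"}
print(count_inconsistent_functional_properties(triples, functional))  # 1
```

A real dataset would also need to account for values that differ lexically but denote the same thing, which this sketch ignores.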
3. Developing LODQM
• LODQM: Linked Open Data Quality Model
• 6 quality dimensions
• 32 metrics
5. Empirical Evaluation
[Diagram: evaluation steps 5.1 to 5.7]
5. Empirical Evaluation
• Result:
  • Three pairs of metrics are correlated:
    • {IFP, Im_DT}
    • {Im_DT, Sml_Cls}
    • {Inc_Prp_Vlu, IF}
  • The others are independent
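The slides do not say which correlation measure or threshold was used to find the correlated pairs above. As a rough sketch only, the following assumes Pearson correlation and a hypothetical cutoff of 0.8; the metric names and values are made-up toy numbers and are not the study's data.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (spread_x * spread_y)


def correlated_pairs(scores, threshold=0.8):
    """Return metric-name pairs whose absolute correlation reaches the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(scores), 2)
        if abs(pearson(scores[a], scores[b])) >= threshold
    ]


# Hypothetical metric values measured on five datasets (toy numbers).
scores = {
    "metric_a": [1, 2, 3, 4, 5],
    "metric_b": [2, 4, 6, 8, 10.5],
    "metric_c": [3, 1, 4, 1, 5],
}
print(correlated_pairs(scores))
```

Metrics that end up in a correlated pair are candidates for merging or dropping, since they carry overlapping information.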
5. Empirical Evaluation
• Result:
  • Only one pair of quality dimensions is correlated:
    • {Interlinking, Syntactic accuracy}
  • The others are independent
6. Quality Prediction
Result: 20 out of 32 metrics are selected
• Using a neural network method:
  • MultiLayerPerceptron
Thank you for your attention and comments