
K-relevance



Presentation Transcript


  1. K-relevance: Measuring source relevance in data integration queries

  2. Queries, relations and sources • K-relevance is defined for queries over one or more relations. • Every relation is based on data extracted from one or more external sources. • The data in a relation may be out of date (the data from some sources may have been extracted from previous versions of those sources).

  3. Relations and sources • Every tuple in a relation is based on exactly one source and has a column that contains a reference to that source. • Example:

  4. Relations and sources • One source may be used by more than one relation. • Example: positiveNums, negativeNums

  5. Relations and sources - example

  6. Source information is needed • If a user thinks there is a mistake in the query results, knowing which sources the results are based on may help in finding the origin of the mistake. • If a source can never contribute to the query results, there is no need to extract data from it. • If a source can contribute to the query results regardless of the other sources, it may be worth extracting data from it more frequently.

  7. Query results and sources • Every tuple in the query results is a join of tuples, one tuple from each relation. • The sources of a result tuple are the union of the sources of the joined tuples.

  8. 0-relevance - the actual data sources • The union of the sources of all the tuples in the query results is called the 0-relevant sources. • If the query result is empty, it contains no tuples, so there are no 0-relevant sources.

  9. 0-relevance - example

  10. 0-relevance - example SELECT allNums.n FROM allNums, evenNums WHERE allNums.n <= evenNums.n • The 0-relevant sources: {nums1,nums2} ∪ {nums2} = {nums1,nums2}

  11. 0-relevance via relation • For a relation R, if one of its tuples with source S has joined to create a result tuple, then S is 0-relevant via R. • Example: {nums1,nums2} are 0-relevant via allNums. {nums2} is 0-relevant via evenNums.
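The definitions of 0-relevance and 0-relevance via a relation can be sketched in a few lines of Python. The table contents below are hypothetical (the slide's tables are not part of the transcript); they were chosen only so that the result matches the sets stated in the example. The function name `zero_relevant` is likewise illustrative.

```python
# Hypothetical contents for allNums and evenNums; each tuple is (n, source).
# The real tables from the slides are not available in the transcript.
all_nums = [(1, "nums1"), (2, "nums2"), (3, "nums1"), (4, "nums2")]
even_nums = [(2, "nums2"), (4, "nums2")]


def zero_relevant(all_nums, even_nums):
    """Evaluate
        SELECT allNums.n FROM allNums, evenNums WHERE allNums.n <= evenNums.n
    and return (sources via allNums, sources via evenNums, their union).
    """
    via_all, via_even = set(), set()
    for n_a, src_a in all_nums:
        for n_e, src_e in even_nums:
            if n_a <= n_e:  # the WHERE clause
                # Both joined tuples contribute their source to the result tuple.
                via_all.add(src_a)
                via_even.add(src_e)
    return via_all, via_even, via_all | via_even


via_all, via_even, total = zero_relevant(all_nums, even_nums)
print(sorted(via_all), sorted(via_even), sorted(total))
```

With an empty relation on either side the loops never add anything, which matches the rule that an empty query result has no 0-relevant sources.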

  12. Definition: Potential tuple • A “potential tuple” for a relation is any tuple that fits the schema of the relation (it may or may not actually exist in the relation). • For example, for the relation R(string, int), every tuple of the form (string s, int i) is a potential tuple. • For a relation that contains a source column, every potential tuple that has S in this column is called a potential tuple from S. • Note: every “real” tuple in R is also a potential tuple, because it fits the schema of R.

  13. ∞-relevance via relation • If there are • a potential tuple from the source S for the relation R • and potential tuples for the other relations in the query • which can join to satisfy the query and create a result tuple, then S is called an ∞-relevant source via R.

  14. ∞-relevance • The union of the ∞-relevant sources via the relations in the query is the set of ∞-relevant sources of the query. • Note: the ∞-relevant sources are independent of the data in the relations and depend only on the query and on the sources of the queried relations.

  15. ∞-relevance • Every source of the relations is ∞-relevant unless the query has constraints on the source column. • Note: the data sources are shared by the relations: if S is a source of R1, it is also a source of R2. • Therefore, if one of the relations has no constraints on its source column, all of the sources are ∞-relevant.

  16. ∞-relevance - example • For example, if the data sources are {src1.html,src2.html}, in the query SELECT A.x FROM A,B WHERE A.source!='src1.html' AND A.x < B.x • There is no potential tuple for A from src1 that satisfies the query. • There are • a potential tuple for A from src2 (for example, {x=1,src=src2}) • and a potential tuple for B (for example, {x=2,src=src1}) • which satisfy the query and create the result tuple (1). • Therefore, src2 is ∞-relevant via A.

  17. ∞-relevance - example • The data sources are {src1.html,src2.html} SELECT A.x FROM A,B WHERE A.source!='src1.html' AND A.x < B.x • There are • a potential tuple for B from src1 (for example, {x=2,src=src1}) • and a potential tuple for A (for example, {x=1,src=src2}) • which satisfy the query and create the result tuple (1). • Therefore, src1 is ∞-relevant via B. • There are • a potential tuple for B from src2 (for example, {x=3,src=src2}) • and a potential tuple for A (for example, {x=2,src=src2}) • which satisfy the query and create the result tuple (2). • Therefore, src2 is ∞-relevant via B.

  18. ∞-relevance - example • {src2} is ∞-relevant via A • {src1,src2} are ∞-relevant via B • {src2} ∪ {src1,src2} = {src1,src2} are the ∞-relevant sources of the query
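The reasoning in this example can be checked by brute force: enumerate potential tuples for A and B and record which sources take part in a satisfying join. This is only a sketch under a loud assumption: column x is restricted to a small finite range (`X_VALUES`, chosen here for illustration), whereas real potential tuples range over the whole schema domain. A source found this way is genuinely ∞-relevant, but with a poorly chosen domain the search could miss some. All names are hypothetical.

```python
from itertools import product

SOURCES = ["src1.html", "src2.html"]
X_VALUES = range(0, 4)  # assumed finite stand-in for the domain of column x


def potential_tuples():
    """All potential tuples (x, source) over the assumed finite domain."""
    return list(product(X_VALUES, SOURCES))


def satisfies(a, b):
    """WHERE A.source != 'src1.html' AND A.x < B.x"""
    return a[1] != "src1.html" and a[0] < b[0]


def inf_relevant():
    """Return (sources via A, sources via B, all infinity-relevant sources)."""
    via_a, via_b = set(), set()
    for a, b in product(potential_tuples(), repeat=2):
        if satisfies(a, b):
            via_a.add(a[1])
            via_b.add(b[1])
    return via_a, via_b, via_a | via_b


via_a, via_b, total = inf_relevant()
print(sorted(via_a), sorted(via_b), sorted(total))
```

No real data appears anywhere in the computation, which reflects the note that ∞-relevance depends only on the query and the set of sources.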

  19. k-relevance • Assume the query is over m relations. • If there are • a potential tuple from the source S for the relation R • and at most k-1 other potential tuples for at most k-1 other relations (one tuple for each relation) • and real tuples for each of the remaining relations in the query • which can join to create a result tuple of the query, then S is called a k-relevant source via R.

  20. K-relevance • The union of the k-relevant sources via all relations in the query is called the k-relevant sources of the query. • Note: if k is greater than or equal to m (the number of queried relations), k-relevance is by definition equal to ∞-relevance, because all of the joining tuples may be potential tuples and there is no need to join with real tuples.

  21. K-relevance - notes • If S is k-relevant, then k potential tuples (one of them from S) can join with m-k real tuples to satisfy the query. • In that case, k+1 potential tuples can also join with m-k-1 real tuples, because a real tuple is also a potential tuple by definition. • Therefore, k-relevance is monotone: every k-relevant source is also a (k+1)-relevant source.

  22. K-relevance - example • The sources are {sigcomm.html,sigmetrics.html} • The query is: SELECT Papers.title FROM Authors,Papers WHERE Papers.author=Authors.name AND Authors.org='MIT' AND Papers.title LIKE '%Ubiquitous%' AND Papers.src=Authors.src

  23. K-relevance - example SELECT Papers.title FROM Authors,Papers WHERE Papers.author=Authors.name AND Authors.org='MIT' AND Papers.title LIKE '%Ubiquitous%' AND Papers.src=Authors.src • The relations are: • The query result is empty, because there is no tuple in Authors with org='MIT'. • Therefore, there are no 0-relevant sources. • Moreover, even if a source adds a tuple to Papers, the result stays empty, because the new tuple cannot join with any tuple in Authors. • Therefore, there are no 1-relevant sources via Papers.

  24. K-relevance - example SELECT Papers.title FROM Authors,Papers WHERE Papers.author=Authors.name AND Authors.org='MIT' AND Papers.title LIKE '%Ubiquitous%' AND Papers.src=Authors.src • If sigcomm.html adds the tuple (sigcomm.html, John, MIT, john@google.com) to Authors, it can join with the first tuple from Papers. Therefore, sigcomm.html is 1-relevant via Authors. • However, no tuple from sigmetrics.html, not even (sigmetrics.html, John, MIT, john@google.com), can join with any tuple from Papers, because all the tuples in Papers have 'sigcomm' in the source column. • Therefore, the 1-relevant sources for the query are {sigcomm.html}.

  25. K-relevance - example SELECT Papers.title FROM Authors,Papers WHERE Papers.author=Authors.name AND Authors.org='MIT' AND Papers.title LIKE '%Ubiquitous%' AND Papers.src=Authors.src • The potential tuples: • (sigmetrics.html, Todd, MIT, todd@msn.com) from sigmetrics.html in Authors • and (sigmetrics.html, Todd, Boost Ubiquitous Access) in Papers • can join to create the result tuple (Boost Ubiquitous Access). • Therefore, sigmetrics.html is a 2-relevant source via Authors.

  26. K-relevance - example SELECT Papers.title FROM Authors,Papers WHERE Papers.author=Authors.name AND Authors.org='MIT' AND Papers.title LIKE '%Ubiquitous%' AND Papers.src=Authors.src • sigmetrics.html is also a 2-relevant source via Papers: • The potential tuples: • (sigmetrics.html, Todd, Boost Ubiquitous Access) from sigmetrics.html in Papers • and (sigmetrics.html, Todd, MIT, todd@msn.com) in Authors • can join to create the result tuple (Boost Ubiquitous Access). • sigmetrics.html is a 2-relevant source of the query. • sigcomm.html is also a 2-relevant source of the query, because it is a 1-relevant source and k-relevance is monotone.

  27. K-relevance - example - conclusion • There are no 0-relevant sources. • The only 1-relevant source is sigcomm.html. • The 2-relevant sources are {sigcomm.html,sigmetrics.html}. • The query is over only 2 relations, therefore the ∞-relevant sources are also {sigcomm.html,sigmetrics.html}.
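The conclusions of this example can be reproduced with a small brute-force search. The sketch below makes several assumptions that are not in the slides: the real contents of Authors and Papers are invented (the tables are missing from the transcript) but respect the stated facts (no author with org MIT, every paper from sigcomm.html, the first paper joinable with John); potential tuples are drawn from small finite candidate lists; and SQL `LIKE '%Ubiquitous%'` is approximated by a substring test. All function and variable names are hypothetical.

```python
from itertools import combinations, product

SOURCES = ["sigcomm.html", "sigmetrics.html"]

# Hypothetical real tuples. Authors(src, name, org, email); Papers(src, author, title).
AUTHORS = [("sigcomm.html", "Ann", "Stanford", "ann@example.com")]
PAPERS = [
    ("sigcomm.html", "John", "Ubiquitous Computing Today"),
    ("sigcomm.html", "Ann", "Routing Basics"),
]

# Assumed finite candidate values used to build potential tuples.
NAMES = ["John", "Todd"]
ORGS = ["MIT", "Stanford"]
EMAILS = ["someone@example.com"]
TITLES = ["Boost Ubiquitous Access", "Routing Basics"]

REAL = [AUTHORS, PAPERS]
# Every real tuple is also a potential tuple, so the real tuples are appended.
POTENTIAL = [
    list(product(SOURCES, NAMES, ORGS, EMAILS)) + AUTHORS,
    list(product(SOURCES, NAMES, TITLES)) + PAPERS,
]


def joins(author, paper):
    """The WHERE clause of the example query."""
    return (
        paper[1] == author[1]
        and author[2] == "MIT"
        and "Ubiquitous" in paper[2]  # stands in for LIKE '%Ubiquitous%'
        and paper[0] == author[0]
    )


def zero_relevant():
    """Sources of real tuples that actually join into a result tuple."""
    found = set()
    for combo in product(*REAL):
        if joins(*combo):
            found.update(t[0] for t in combo)
    return found


def k_relevant_via(rel, k):
    """Sources with a potential tuple in relation `rel` that joins with at most
    k-1 potential tuples in other relations and real tuples in the rest."""
    m = len(REAL)
    others = [i for i in range(m) if i != rel]
    found = set()
    for n_extra in range(min(k - 1, len(others)) + 1):
        for extra in combinations(others, n_extra):
            pools = [
                POTENTIAL[i] if i == rel or i in extra else REAL[i]
                for i in range(m)
            ]
            for combo in product(*pools):
                if joins(*combo):
                    found.add(combo[rel][0])
    return found


def k_relevant(k):
    """k-relevant sources of the query (k = 0 uses real tuples only)."""
    if k == 0:
        return zero_relevant()
    return set().union(*(k_relevant_via(rel, k) for rel in range(len(REAL))))


for k in range(3):
    print(k, sorted(k_relevant(k)))
```

The search is exponential in the number of relations and is meant only to make the definition concrete on a two-relation query, not to suggest an efficient procedure.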

  28. K-relevance - summary • A source is 0-relevant if a tuple extracted from it into one or more of the queried relations has joined to create a tuple in the query results. • A source is ∞-relevant if a potential tuple from it, in one of the relations, can join with potential tuples in the other relations to satisfy the query and create a tuple in the results. • A source is k-relevant if a potential tuple from it, in one of the relations, can join with potential tuples in at most k-1 of the other relations and with real tuples in the remaining relations to satisfy the query and create a tuple in the results.
