Learn how to estimate selectivity for triple patterns with different predicates and objects using histograms and hash functions.
Selectivity Estimation Example
Mohammad Farhan Husain
Example Data
• R1, R2, … , R8 are resources, i.e. URIs
• P1 and P2 are predicates, also URIs
• L1, L2, … , L5 are literals
• R = Total number of unique resources = 8
• T = Total number of triples = 8
• TP1 = Total number of triples having predicate P1 = 5
• TP2 = Total number of triples having predicate P2 = 3
For any query:
• Selectivity of a bound subject s = sel(s) = 1 / R = 1 / 8 = 0.125
• Selectivity of predicate P1 = sel(P1) = TP1 / T = 5 / 8 = 0.625
• Selectivity of predicate P2 = sel(P2) = TP2 / T = 3 / 8 = 0.375
• Selectivity of an unbound subject, predicate or object = 1.0
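The basic selectivities above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic on this slide; the function names and the use of None for an unbound position are illustrative choices, not part of the original material.

```python
# Counts taken from the example data on this slide.
R = 8                                    # unique resources
T = 8                                    # total triples
TRIPLES_PER_PREDICATE = {"P1": 5, "P2": 3}


def sel_subject(subject=None):
    """Selectivity of the subject position; None means unbound (assumed convention)."""
    return 1.0 if subject is None else 1.0 / R


def sel_predicate(predicate=None):
    """Selectivity of the predicate position; None means unbound (assumed convention)."""
    if predicate is None:
        return 1.0
    return TRIPLES_PER_PREDICATE[predicate] / T


print(sel_subject("R1"))      # 0.125
print(sel_predicate("P1"))    # 0.625
print(sel_predicate("P2"))    # 0.375
print(sel_predicate())        # 1.0
```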
Example Histogram for P1
Suppose there is a hash function which assigns the object values of triples having predicate P1 to two bins in the following manner:
• Bin 1 contains: L1, L2 and R2
• Bin 2 contains: R4 and L3
Example Histogram for P2
Suppose the same hash function assigns the object values of triples having predicate P2 to two bins in the following manner:
• Bin 1 contains: L5
• Bin 2 contains: L4 and R1
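The two histograms can be built as follows. The slides do not specify the hash function, so this sketch substitutes a plain lookup table (BIN_OF) that reproduces the stated bin assignments. It also assumes each listed object value occurs in exactly one triple of its predicate, which is consistent with TP1 = 5 and TP2 = 3.

```python
from collections import Counter

# Stand-in for the unspecified hash function: object value -> bin number.
BIN_OF = {
    "L1": 1, "L2": 1, "R2": 1, "L5": 1,
    "R4": 2, "L3": 2, "L4": 2, "R1": 2,
}

# Object values per predicate (assumed: one triple per listed value).
OBJECTS = {
    "P1": ["L1", "L2", "R2", "R4", "L3"],
    "P2": ["L5", "L4", "R1"],
}


def build_histogram(objects):
    """Count how many object values fall into each bin (the bin heights)."""
    return Counter(BIN_OF[obj] for obj in objects)


histograms = {pred: build_histogram(objs) for pred, objs in OBJECTS.items()}

print(dict(histograms["P1"]))   # {1: 3, 2: 2}
print(dict(histograms["P2"]))   # {1: 1, 2: 2}
```

The bin heights of each histogram sum to the triple count of its predicate (5 for P1, 3 for P2).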
Selectivity Estimation for Triple Pattern
Example with Bound Predicate
• Triple Pattern: ?s P1 L2
• Estimated selectivity
  = sel(s) x sel(P1) x sel(L2)
  = 1.0 x 0.625 x sel(P1, L2)
  = 1.0 x 0.625 x (h1(P1, L2) / TP1)
  = 1.0 x 0.625 x (Height of Bin 1 / TP1)
  = 1.0 x 0.625 x (3 / 5)
  = 0.375
• Here, h1(P1, L2) denotes the height of the bin in the histogram of predicate P1 into which the hash function puts L2.
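The bound-predicate estimate can be checked numerically. The sketch below hard-codes the bin heights from the two example histograms and again uses a lookup table (BIN_OF) in place of the unspecified hash function; all names are illustrative.

```python
T = 8                                    # total triples
TP = {"P1": 5, "P2": 3}                  # triples per predicate
HEIGHTS = {                              # bin heights from the example histograms
    "P1": {1: 3, 2: 2},
    "P2": {1: 1, 2: 2},
}
# Stand-in for the hash function: object value -> bin number.
BIN_OF = {"L1": 1, "L2": 1, "R2": 1, "L5": 1,
          "R4": 2, "L3": 2, "L4": 2, "R1": 2}


def sel_object(predicate, obj):
    """sel(P, o) = height of the bin that o hashes to in P's histogram / TP."""
    return HEIGHTS[predicate][BIN_OF[obj]] / TP[predicate]


def estimate_bound_predicate(predicate, obj):
    """Selectivity of the pattern '?s predicate obj' (subject unbound, so 1.0)."""
    sel_s = 1.0
    sel_p = TP[predicate] / T
    return sel_s * sel_p * sel_object(predicate, obj)


print(round(estimate_bound_predicate("P1", "L2"), 3))   # 0.375
```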
Selectivity Estimation for Triple Pattern
Example with Unbound Predicate
• Triple Pattern: ?s ?p L2
• Estimated selectivity
  = sel(s) x sel(p) x sel(L2)
  = 1.0 x 1.0 x {∑_{Pi ∈ P} sel(Pi, L2)}
  = 1.0 x 1.0 x {sel(P1, L2) + sel(P2, L2)}
  = 1.0 x 1.0 x {h1(P1, L2) / TP1 + h1(P2, L2) / TP2}
  = 1.0 x 1.0 x {Height of Bin 1 of P1 Histogram / TP1 + Height of Bin 1 of P2 Histogram / TP2}
  = 1.0 x 1.0 x {3 / 5 + 1 / 3}
  = 0.933
• Note that the hash function always puts the value L2 into Bin 1. That is why we pick the height of Bin 1 of the histogram for P2, even though P2 does not have the value L2 as its object in any of the triples.
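The unbound-predicate case sums the per-predicate object selectivities. A minimal sketch, under the same assumptions as the slides' example (bin heights hard-coded, a lookup table standing in for the hash function, illustrative names):

```python
TP = {"P1": 5, "P2": 3}                  # triples per predicate
HEIGHTS = {                              # bin heights from the example histograms
    "P1": {1: 3, 2: 2},
    "P2": {1: 1, 2: 2},
}
# Stand-in for the hash function: object value -> bin number.
BIN_OF = {"L1": 1, "L2": 1, "R2": 1, "L5": 1,
          "R4": 2, "L3": 2, "L4": 2, "R1": 2}


def sel_object(predicate, obj):
    """sel(P, o) = height of the bin that o hashes to in P's histogram / TP."""
    return HEIGHTS[predicate][BIN_OF[obj]] / TP[predicate]


def estimate_unbound_predicate(obj):
    """Selectivity of '?s ?p obj': 1.0 x 1.0 x sum over all predicates of sel(Pi, obj)."""
    return 1.0 * 1.0 * sum(sel_object(pred, obj) for pred in TP)


# L2 hashes to Bin 1 for every predicate, so P2 contributes 1/3
# even though no P2 triple has L2 as its object.
print(round(estimate_unbound_predicate("L2"), 3))   # 0.933
```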
Selectivity Estimation for Triple Pattern
Example with Unbound Object
• Triple Pattern: ?s P1 ?o
• Estimated selectivity
  = sel(s) x sel(P1) x sel(o)
  = 1.0 x 0.625 x 1.0
  = 0.625