1 / 7

Result of N Categorical Variable Regional Co-location Mining

November 11, 2008. Result of N Categorical Variable Regional Co-location Mining. Dataset, 4 class lattices: 0.3,0.4,A ,B ,C ,D 0.2,0.3,A ,B,C,D. As P. As Fe Se . As SeFeF. As F. Regional Co-Location Mining Framework for q Binary Variables.

chesna
Download Presentation

Result of N Categorical Variable Regional Co-location Mining

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. November 11, 2008 Result of N Categorical VariableRegional Co-location Mining Dataset, 4 class lattices: 0.3,0.4,A,B ,C ,D 0.2,0.3,A ,B,C,D AsP AsFeSe AsSeFeF AsF

  2. Regional Co-Location Mining Frameworkfor q Binary Variables Example: Co-location set CS={A,B,C}; CoLoc-interestingness({A,B,C},r)= ((r,A) 1)((r,B) 1)((r,C) 1) Dataset O Remark: Strength is computed by comparing r with the whole Dataset O. r • Remarks: • Have to iterate over all possible co-location sets • Interestingness of the region is the interestingness of the maximum valued co-location set

  3. Region Interestingness • Region interestingness is assessed by computing the most prevalent pattern: • Region interestingness solely depends on the most interesting co-location set for the region.

  4. Example of a Result

  5. Co-Location Mining Frameworkfor q Binary Class Variables Version1 O – dataset rO – a region oO – object in the dataset O CS= {C1,…,Cr} – set of binary class variables that form base patterns; oCo.C=true th– class multiplier interestingness threshold, default-value 1   [0, ∞) – form parameter, default value 1 CCS be a single class variable BCS – a co-location set P(B) is a predicate over B that restricts the set of co-location sets considered; e.g. P(B)=|B|<5 or P(B)=AsB (“only look for patterns involving high arsenic”) (r,C)=(|{or|oC}|)/|r|)/(|{oO|oC}|/|O|) – C’s probability multiplier in r; high interestingness is associated with high multipliers z(C,r)= If (r,C)>th then ((r,C)-th) else 0 – normalized interestingness for C in r k(B,r)= CBz(C,r) – normalized interestingness of co-location set B in r i(r)=maxBS & |B|>1 and P(B)k(B,r) – region interestingness; maximum normalized interestingness observed for subsets BCS constrained by P Reward(r)= i(r)*|r|b

  6. Co-Location Mining Frameworkfor q Binary Class Variables Version2 O – dataset rO – a region oO – object in the dataset O CS= {C1,…,Cr} – set of binary class variables that form base patterns; oCo.C=true th1 – co-location set interestingness threshold   [0, ∞) – form parameter, default value 1 CCS be a single class variable BCS – a co-location set P(B) is a predicate over B that restricts the set of co-location sets considered; e.g. P(B)=|B|<5 or P(B)=AsB (“only look for patterns involving high arsenic”) (r,C)=(|{or|oC}|)/|r|)/(|{oO|oC}|/|O|) – C’s probability multiplier in r; high interestingness is associated with high multipliers k(B,r)= CB (r,C) –interestingness of co-location set B in r i’(r)=maxBS & |B|>1 and P(B)k(B,r) – region interestingness; maximum interestingness observed for subsets BCS constrained by P i(r)=IF i’(r)> th THEN (i’(r)-th) ELSE 0 –normalized region interestingness (th>=1;  form parameter) Reward(r)= i(r)*|r|b

  7. Datasets and Program Interface • Discretize the z-score normalized variable as follows: • z(A)1: A • -1 z(A) 1: A • Otherwise: A • The transformed dataset therefore have the form: <longitude, latitude, <class-variable>+ ) • Limit Co-location sets we are looking for in experiments to “” and “” class variables, to make it comparable to the continuous approach • Limit Co-location Sets to sizes 2-4 in the experiments! • Possibly conduct experiments with large sets using a single seed pattern; e.g. D • Therefore the program inputs of the categorical regional collocation mining versions should include: • k’ ---the maximum set size considered • The seed pattern, e.g. B, if we have a seed pattern; if we do not have a seed pattern is given all sets of sizes 2,…,k’ will be considered • Pattern list considered

More Related