
Spatial Proximity of Structural Data Attributes

This presentation covers data analysis concepts, including machine learning, clustering, classification, and continuity, and explores near neighbor (proximity) sets and non-isotropic vector space proximity structures. It also applies proximity structures to predicting user ratings in collaborative filtering and to analyzing remotely sensed imagery.

mcarrion

Presentation Transcript


  1. Spatial Proximity of Structural Data Attributes. Maria Canton, William Perrizo, Dept. of CS, North Dakota State University. CATA 2007 – Honolulu, Hawaii

  2. Data analysis can be broken down into two parts: Querying and Data Mining. Data Mining can be broken down into two parts: Machine Learning and Association Rule Mining. Machine Learning can be broken down into two parts: Clustering and Classification. Clustering can be broken down into two parts: Isotropic (round clusters) and Density-based.

  3. So Machine Learning begins by identifying Near Neighbor Set(s), NNS. In Isotropic Clustering, round sets are identified (disk-shaped Near Neighbor Sets about a center). In Density Clustering, cores are identified (dense NNSs) and then pieced together by overlap. Classification is always based on continuity, which is necessarily Near Neighbor Set based.

  4. Classification. We classify a sample based on its NNS class histogram (AKA k Nearest Neighbor or kNN classification), or we identify isotropic NNSs of centroids (AKA k-means), or we build decision trees whose leaves are disjoint Training Subsets whose histograms classify samples falling to that leaf, or we find class boundaries (e.g., SVM) that distinguish NNSs in one class from the rest.
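The first option on this slide, classifying from the class histogram of a near neighbor set, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `knn_classify`, the Euclidean distance, and the toy training data are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_classify(sample, training, k):
    """Classify `sample` by the class histogram of its k nearest neighbors.

    `training` is a list of (vector, label) pairs. Euclidean distance is an
    assumption here; the slides allow any similarity or pseudo-distance.
    """
    # Order training points by distance to the sample and keep the k closest.
    nearest = sorted(training, key=lambda pair: math.dist(sample, pair[0]))[:k]
    # Build the class histogram of that near neighbor set.
    histogram = Counter(label for _, label in nearest)
    # The most frequent class in the histogram is the prediction.
    return histogram.most_common(1)[0][0]

# Hypothetical toy data: three points of class "a", two of class "b".
training = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
            ((5.0, 5.0), "b"), ((5.0, 6.0), "b")]

print(knn_classify((0.5, 0.5), training, k=3))  # prints "a"
```

Ties in the histogram are broken arbitrarily in this sketch; a real classifier would need an explicit tie rule.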

  5. Continuity. Recalling the definition of continuity: $\forall \varepsilon > 0\ \exists \delta > 0 : d(x,a) < \delta \Rightarrow d(f(x),f(a)) < \varepsilon$, or, said using Near Neighbor Sets, $\forall$ NNS about $f(a)$ $\exists$ NNS about $a$ that maps inside it. In a database, class values are discrete (finite), and thus Nearest Neighbor Sets (Proximity Sets) are fundamental to Machine Learning.

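  6. Near Neighbor Sets of a set. Given a similarity, $s : R \times R \to \text{Reals}$ (e.g., $s(x,y) = s(y,x)$ and $s(x,x) \ge s(x,y)\ \forall x, y \in R$), an extension of $s$ to disjoint subsets of $R$ (e.g., single link / complete link / average link...), and $C \subseteq R$, a k-disk of C (a k Nearest Neighbor Set of C) is a set $\text{disk}(C,k) \supseteq C$ such that $|\text{disk}(C,k) \cap C'| = k$ and $s(x,C) \ge s(y,C)\ \forall x \in \text{disk}(C,k),\ y \notin \text{disk}(C,k)$, where $C'$ is the complement of $C$ in $R$.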

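  7. [Figure: disk, skin, and ring about a set C with radii r1 and r2, including the case C = {a}.]
     $\text{skin}(C,k) \equiv \text{disk}(C,k) - C$. Here skin stands for "s k immediate neighbors", and skin(C,k) is also a kNNS of C.
     $\text{cskin}(C,k) \equiv \bigcup$ of all skin(C,k)s (the closed skin), and $\text{ring}(C,k) = \text{cskin}(C,k) - \text{cskin}(C,k-1)$.
     $\text{disk}(C,r_1) \equiv \{x \in R \mid s(x,C) \ge r_1\}$, $\text{skin}(C,r_1) \equiv \text{disk}(C,r_1) - C$.
     $\text{ring}(C,r_2,r_1) \equiv \text{disk}(C,r_2) - \text{disk}(C,r_1) \equiv \text{skin}(C,r_2) - \text{skin}(C,r_1)$.
     Given a [pseudo] distance, d, rather than a similarity, just reverse all inequalities.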
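The disk, skin, and ring definitions can be tried out on a small set of points. The sketch below rests on several assumptions not taken from the slides: the points are real numbers, the similarity is negative absolute difference, the extension to sets is single link, and all function names are hypothetical. Ties at the k-th neighbor are broken arbitrarily, so it does not implement the closed skin.

```python
def sim(x, y):
    # Assumed similarity: symmetric, and s(x, x) = 0 >= s(x, y).
    return -abs(x - y)

def sim_to_set(x, C):
    # Single-link extension: similarity to the most similar member of C.
    return max(sim(x, c) for c in C)

def disk_k(R, C, k):
    # C together with the k points outside C most similar to C.
    outside = sorted((x for x in R if x not in C),
                     key=lambda x: sim_to_set(x, C), reverse=True)
    return set(C) | set(outside[:k])

def skin_k(R, C, k):
    return disk_k(R, C, k) - set(C)

def disk_r(R, C, r):
    # All points whose similarity to C is at least the threshold r.
    return {x for x in R if sim_to_set(x, C) >= r}

def skin_r(R, C, r):
    return disk_r(R, C, r) - set(C)

def ring_r(R, C, r2, r1):
    return disk_r(R, C, r2) - disk_r(R, C, r1)

R = [0, 1, 2, 4, 7, 11]
C = {0}
print(disk_k(R, C, 2))       # {0, 1, 2}
print(skin_k(R, C, 2))       # {1, 2}
print(ring_r(R, C, -7, -2))  # {4, 7}
```

Because the thresholds here are similarity values, the ring is non-empty only when r2 is smaller than r1; with a distance the inequality flips, as the slide notes.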

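  8. A useful non-isotropic vector space proximity structure. The shadow vector made by a vector x on another vector y, denoted $x_{y\text{shadow}}$ or just $x_{y\text{shad}}$, is the dot product of x with a unit vector in the y direction, times that unit vector:
     $x_{y\text{shad}} \equiv \left(x \circ \frac{y}{|y|}\right)\frac{y}{|y|} = \frac{x \circ y}{|y|^2}\,y = \frac{x \circ y}{y \circ y}\,y$
     [Figure: vectors x and y at angle $\theta$, with the shadow of x on y.]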

  9. The perp vector. The perpendicular vector made by a vector x on another vector y, denoted $x_{y\text{perpendicular}}$ or just $x_{y\text{perp}}$, is the difference of x and its y-shadow:
     $x_{y\text{perp}} \equiv x - x_{y\text{shad}}$, so $|x_{y\text{perp}}|^2 = |x|^2 - |x_{y\text{shad}}|^2$.
     Both $x_{y\text{shad}}$ and $x_{y\text{perp}}$ are linear in x.
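The shadow and perp vectors are ordinary vector projection and rejection, so they can be checked numerically. The sketch below uses plain Python lists; the function names `dot`, `shad`, and `perp` are chosen for this example and are not from the slides.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def shad(x, y):
    # Shadow of x on y: (x . y / y . y) * y. Assumes y is not the zero vector.
    scale = dot(x, y) / dot(y, y)
    return [scale * b for b in y]

def perp(x, y):
    # Perp of x on y: x minus its shadow on y.
    return [a - s for a, s in zip(x, shad(x, y))]

x = [3.0, 4.0]
y = [2.0, 0.0]
print(shad(x, y))  # [3.0, 0.0]
print(perp(x, y))  # [0.0, 4.0]

# The perp vector is orthogonal to y, and the squared lengths add up.
p, s = perp(x, y), shad(x, y)
print(dot(p, y))                         # 0.0
print(dot(p, p) == dot(x, x) - dot(s, s))  # True
```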

  10. Proximity Structures based on shad and perp. In collaborative filtering (e.g., predicting the rating, $u_m$, of a movie, m, by a user, u, from ratings given by other users, v), let's consider users as spatial vectors of ratings over movie dimensions (Netflix prize). The other users, v, provide signals for predicting $u_m$. Note that a user, v, whose ratings are $v_n = u_n + 1$ for all movies, n, that u has already rated is just as strong a prediction signal as one with exactly matching ratings, $v_n = u_n\ \forall n \in \text{Supp}(u)$. In standard collaborative filtering, such vs (I will call them +1 signals) are filtered out as not being proximal to u.

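  11. Pure Signals in Collaborative Filters proximity structures. Filter out all collaborators except exact match signals, +1 signals and -1 signals (collectively called pure signals), as non-proximal? For this we use $y = (1,1,\ldots,1) \equiv \mathbf{1}$, so that $|y|^2 = n$ and, from $x_{y\text{shad}} = \frac{x \circ y}{y \circ y}\,y$,
     $x_{\text{shad}} = \frac{x \circ \mathbf{1}}{n}\,\mathbf{1} = \frac{\sum_k x_k}{n}\,\mathbf{1} = \bar{x}\,\mathbf{1}$, $x_{\text{perp}} = x - \bar{x}\,\mathbf{1}$, with $x = v - u$.
     $\text{RatingPrediction}_v = v_m - \frac{1}{\sqrt{n}}\,\text{SignedLength}\big((v-u)_{\text{shad}}\big)$
     $\text{SignedLength}\big((v-u)_{\text{shad}}\big) = |v-u|\cos\theta = (v-u) \circ \frac{\mathbf{1}}{\sqrt{n}} = \frac{v \circ \mathbf{1}}{\sqrt{n}} - \frac{u \circ \mathbf{1}}{\sqrt{n}} = \frac{\sum_k v_k}{\sqrt{n}} - \frac{\sum_k u_k}{\sqrt{n}} = \sqrt{n}\,\overline{(v-u)}$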
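One way to read this slide is that the shadow of v - u on (1,...,1) captures the constant rating offset between the two users, while the perp part captures everything else. The sketch below follows that reading and should be taken as an assumption, not as the authors' method: it treats a collaborator as a pure signal when the perp part vanishes and the mean offset is -1, 0, or +1, and it predicts u's rating of m as v's rating of m minus the mean offset. The function names and the ratings are hypothetical.

```python
def offset_and_residual(u, v):
    """Mean of (v - u) over movies both users rated, plus the squared length
    of the perp part of (v - u) relative to the all-ones vector.

    `u` and `v` map movie ids to ratings; they must share at least one movie.
    """
    diffs = [v[n] - u[n] for n in u if n in v]
    mean = sum(diffs) / len(diffs)
    residual_sq = sum((d - mean) ** 2 for d in diffs)
    return mean, residual_sq

def is_pure_signal(u, v, tol=1e-9):
    # Pure signal: constant offset (zero perp part) equal to -1, 0, or +1.
    mean, residual_sq = offset_and_residual(u, v)
    return residual_sq <= tol and any(abs(mean - c) <= tol for c in (-1, 0, 1))

def predict(u, v, m):
    # Assumed prediction rule: v's rating of m, corrected by the mean offset.
    mean, _ = offset_and_residual(u, v)
    return v[m] - mean

u = {"A": 3, "B": 4, "C": 2}            # target user, has not rated "M"
v = {"A": 4, "B": 5, "C": 3, "M": 5}    # a +1 signal
w = {"A": 5, "B": 4, "C": 1, "M": 3}    # not a constant offset

print(is_pure_signal(u, v))  # True
print(predict(u, v, "M"))    # 4.0
print(is_pure_signal(u, w))  # False
```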

  12. Remotely Sensed Imagery domains • Spatial domain functionals, used in analyzing remotely sensed imagery, take into account pixels’ structural attributes as well as neighborhood conditions. • Using the programming utility, TM-Mine, we find the following.

  13. Execution Times for Band Functionals of Different Complexities on a Full TM Scene of 210,000,000 Pixels (P4 3.0GHz; dataset size of 2.10 × 10^8)

     Functional                      Formula                          Execution time
     VI (vegetation index)           NIR / R                          142.5 seconds
     NDVI (normalized difference)    (NIR – R) / (NIR + R)            307.5 seconds
     TVI (transformed veg index)     {[(G-B)/(G+B)+0.5]^0.5}*100      442.0 seconds
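The three band functionals can be written as per-pixel functions. This is a minimal sketch that follows the formulas exactly as they appear on the slide, including the G and B bands in the TVI expression. It is unrelated to the TM-Mine utility, the pixel values are made up, and it assumes nonzero denominators and a non-negative value under the square root.

```python
import math

def vi(nir, r):
    # Vegetation index: NIR / R.
    return nir / r

def ndvi(nir, r):
    # Normalized difference vegetation index: (NIR - R) / (NIR + R).
    return (nir - r) / (nir + r)

def tvi(g, b):
    # Transformed vegetation index as written on the slide:
    # {[(G - B) / (G + B) + 0.5] ^ 0.5} * 100.
    return math.sqrt((g - b) / (g + b) + 0.5) * 100

# Hypothetical band values for a single pixel.
nir, r, g, b = 120, 40, 60, 20
print(vi(nir, r))    # 3.0
print(ndvi(nir, r))  # 0.5
print(tvi(g, b))     # 100.0
```

Each functional adds arithmetic per pixel over the previous one (a ratio, then a difference and sum, then a square root and scaling), which matches the ordering by complexity in the slide.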

  14. Execution Times for Band Functionals of Different Complexities on a Full TM Scene of 210,000,000 Pixels

  15. Execution Times of Pixel-Matching 1 to 6 Bands

  16. Thank you
