Random projection trees and low dimensional manifolds
Yoav Freund, Sanjoy Dasgupta (University of California, San Diego, 2008)
2013.01.07 (Mon), Jeonbuk National Univ. DBLAB, 김태훈
Contents
• Introduction
• Detailed overview
• An RPTree-Max adapts to Assouad dimension
• An RPTree-Mean adapts to local covariance dimension
Introduction
• A k-d tree is a spatial data structure that partitions R^D into hyperrectangular cells.
• It is built in a recursive manner, splitting along one coordinate direction at a time.
Introduction
• The succession of splits corresponds to a binary tree whose leaves contain the individual cells in R^D.
[Figure: a k-d tree partition of the plane. The dots are points in a database; the cross is a query point q.]
Introduction
• A k-d tree requires D levels in order to halve the cell diameter, since each level splits along only one of the D coordinate directions.
• If the data lie in R^1000, it could take 1000 levels of the tree to bring the diameter of the cells down to half that of the entire data set. This would require 2^1000 data points!
Introduction • Thus k-d trees are susceptible to the same curse of dimensionality. • 그래서 k-d tree는 차원의 저주를 받을 정도로 민감. • However, a recent positive development in machine learning has been realization that a lot of data which superficially lie in a very high-dimensional space , actually have low intrinsic dimension. • 하지만 최근 machine learning에서 깨닫게 되었는데 많은 데이터들이 주어졌을 때 실제로는 매우 높은 는 낮은 고유한 차원을 가짐. • d << D • d(nonparameter실제 주어지는 데이터)보다 D차원에 더 민감함
Introduction
• In this paper, we are interested in techniques that automatically adapt to intrinsic low-dimensional structure without having to explicitly learn this structure.
Detailed overview
• Both k-d trees and RP trees are built by recursive binary splits.
• The core tree-building algorithm is called MakeTree, and takes as input a data set S ⊂ R^D.
MakeTree algorithm
procedure MakeTree(S)
    if |S| < MinSize return (Leaf)
    Rule ← ChooseRule(S)
    LeftTree ← MakeTree({x ∈ S : Rule(x) = true})
    RightTree ← MakeTree({x ∈ S : Rule(x) = false})
    return ([Rule, LeftTree, RightTree])
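A minimal Python sketch of this recursion may help; it mirrors the pseudocode directly, and the names make_tree, choose_rule, MIN_SIZE, and the (rule, left, right) node tuple are my own assumptions, not the paper's code:

import numpy as np

MIN_SIZE = 10  # assumed stand-in for MinSize in the pseudocode

def make_tree(S, choose_rule):
    # S: an n x D array of points; choose_rule(S) returns a boolean predicate
    if len(S) < MIN_SIZE:
        return ("Leaf", S)
    rule = choose_rule(S)
    mask = np.array([rule(x) for x in S])
    left = make_tree(S[mask], choose_rule)    # {x ∈ S : Rule(x) = true}
    right = make_tree(S[~mask], choose_rule)  # {x ∈ S : Rule(x) = false}
    return (rule, left, right)

The tree variant (k-d, RPTree-Max, RPTree-Mean) is determined entirely by the choose_rule passed in, exactly as in the slides that follow.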
K-d tree version
procedure MakeTree(S)
    if |S| < MinSize return (Leaf)
    Rule ← ChooseRule(S)
    LeftTree ← MakeTree({x ∈ S : Rule(x) = true})
    RightTree ← MakeTree({x ∈ S : Rule(x) = false})
    return ([Rule, LeftTree, RightTree])

procedure ChooseRule(S)
    comment: k-d tree version
    choose a coordinate direction i
    Rule(x) := x_i ≤ median({z_i : z ∈ S})
    return (Rule)
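A sketch of this split rule in the same Python style; the pseudocode leaves the coordinate choice open, so picking the coordinate with the largest spread is an assumption (one common heuristic):

import numpy as np

def choose_rule_kd(S):
    # k-d tree version: median split along one coordinate direction
    i = int(np.argmax(S.max(axis=0) - S.min(axis=0)))  # widest coordinate (heuristic)
    t = np.median(S[:, i])
    return lambda x: x[i] <= t

For example, make_tree(data, choose_rule_kd) builds the k-d tree variant.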
RP-tree version
[Figure: a split along a random projection direction, compared with the PCA direction]
• Choose a random direction and split the cell at the median of the data projected onto that direction.
RP-tree Max version
procedure MakeTree(S)
    if |S| < MinSize return (Leaf)
    Rule ← ChooseRule(S)
    LeftTree ← MakeTree({x ∈ S : Rule(x) = true})
    RightTree ← MakeTree({x ∈ S : Rule(x) = false})
    return ([Rule, LeftTree, RightTree])

procedure ChooseRule(S)
    comment: RPTree-Max version
    choose a random unit direction v ∈ R^D
    pick any x ∈ S; let y ∈ S be the farthest point from it
    choose δ uniformly at random in [−1, 1] · 6‖x − y‖/√D
    Rule(x) := x · v ≤ (median({z · v : z ∈ S}) + δ)
    return (Rule)
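A hedged Python sketch of the RPTree-Max rule above, following the jitter range [−1, 1] · 6‖x − y‖/√D from the pseudocode:

import numpy as np

def choose_rule_rp_max(S):
    # RPTree-Max: random unit direction, median split with random jitter delta
    n, D = S.shape
    v = np.random.randn(D)
    v = v / np.linalg.norm(v)                         # random unit direction
    x = S[0]                                          # pick any x in S
    y = S[np.argmax(np.linalg.norm(S - x, axis=1))]   # farthest point from x
    delta = np.random.uniform(-1, 1) * 6 * np.linalg.norm(x - y) / np.sqrt(D)
    t = np.median(S @ v) + delta
    return lambda z: z @ v <= t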
RP-tree Mean version
procedure MakeTree(S)
    if |S| < MinSize return (Leaf)
    Rule ← ChooseRule(S)
    LeftTree ← MakeTree({x ∈ S : Rule(x) = true})
    RightTree ← MakeTree({x ∈ S : Rule(x) = false})
    return ([Rule, LeftTree, RightTree])
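The slide repeats only the shared MakeTree recursion; in the paper, the RPTree-Mean ChooseRule splits along a random direction at the median when the cell's squared diameter is at most a constant c times its average squared interpoint distance, and otherwise splits by distance from the mean. A minimal Python sketch under that reading; the value of C and the brute-force O(n²) distance computation are my simplifications:

import numpy as np

C = 1.0  # stand-in for the constant c in the paper's rule

def choose_rule_rp_mean(S):
    # RPTree-Mean: projection split for "round" cells, distance-from-mean split otherwise
    n, D = S.shape
    d2 = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)  # pairwise squared distances
    if d2.max() <= C * d2.mean():      # diameter^2 vs. c * average interpoint distance^2
        v = np.random.randn(D)
        v = v / np.linalg.norm(v)
        t = np.median(S @ v)
        return lambda z: z @ v <= t
    mean = S.mean(axis=0)
    t = np.median(np.linalg.norm(S - mean, axis=1))
    return lambda z: np.linalg.norm(z - mean) <= t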