Detecting Distance-Based Outliers in Streams of Data. Fabrizio Angiulli and Fabio Fassetti, DEIS, Università della Calabria.
Problem Definition
• Definition 3.1 (Distance-Based Outlier). Let S be a set of objects, obj an object of S, k a positive integer, and R a positive real number. Then obj is a distance-based outlier (or, simply, an outlier) if fewer than k objects in S lie within distance R from obj.
• The neighbors of an object obj (the objects within distance R from obj) that precede obj in the stream and belong to the current window are called preceding neighbors of obj.
• The neighbors of an object obj that follow obj in the stream and belong to the current window are called succeeding neighbors of obj.
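The definitions above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the names `neighbors` and `is_outlier` are hypothetical, the window is assumed to be a list of numeric points ordered by arrival time, the distance is assumed to be Euclidean, "within distance R" is read as "distance at most R", and an object is assumed not to count as its own neighbor.

```python
from math import dist  # Euclidean distance, Python 3.8+


def neighbors(window, i, R):
    """Split the neighbors of window[i] into preceding and succeeding ones.

    `window` is a list of points (tuples) ordered by arrival time, so an
    index smaller than i means "arrived before window[i]".
    """
    obj = window[i]
    preceding = [j for j in range(i) if dist(window[j], obj) <= R]
    succeeding = [j for j in range(i + 1, len(window))
                  if dist(window[j], obj) <= R]
    return preceding, succeeding


def is_outlier(window, i, k, R):
    """Definition 3.1 applied to the current window: fewer than k neighbors."""
    preceding, succeeding = neighbors(window, i, R)
    return len(preceding) + len(succeeding) < k


# One-dimensional toy window, oldest object first.
window = [(0.0,), (0.5,), (5.0,), (0.8,)]
print(neighbors(window, 1, R=1.0))
print(is_outlier(window, 2, k=1, R=1.0))
```

This brute-force version rescans the whole window for every query; it is meant only to make the terminology concrete.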
Problem Definition
• If the number of succeeding neighbors of obj is less than k, obj could become an outlier depending on the stream evolution, because its preceding neighbors expire before obj does.
• Conversely, since obj will expire before its succeeding neighbors, inliers having at least k succeeding neighbors will be inliers for any stream evolution. Such inliers are called safe inliers.
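The distinction between inliers and safe inliers can be expressed as a small classification rule. The sketch below is illustrative only; the function name `classify` and the string labels are hypothetical, and the two neighbor counts are assumed to refer to the current window.

```python
def classify(num_preceding, num_succeeding, k):
    """Classify an object from its neighbor counts in the current window."""
    if num_succeeding >= k:
        # Succeeding neighbors outlive the object, so this can never change.
        return "safe inlier"
    if num_preceding + num_succeeding >= k:
        # Inlier for now, but preceding neighbors may expire first.
        return "inlier"
    return "outlier"


# An inlier that relies on preceding neighbors...
print(classify(num_preceding=2, num_succeeding=1, k=3))
# ...turns into an outlier once one of them leaves the window.
print(classify(num_preceding=1, num_succeeding=1, k=3))
```

The second call illustrates the first bullet: with fewer than k succeeding neighbors, the status of the object depends on how the stream evolves.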
Information Stored in Each ISB Node n
• n.obj: a data stream object.
• n.id: the identifier of n.obj, that is, the arrival time of n.obj.
• n.count_after: the number of succeeding neighbors of n.obj. This field is used to recognize safe inliers.
• n.nn_before: a list, of size at most k, containing the identifiers of the most recent preceding neighbors of n.obj. At query time, this list is used to determine the number of preceding neighbors of n.obj that belong to the current window.
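The node fields listed above could be represented as follows. This is a minimal sketch under stated assumptions, not the authors' data structure: the class `Node` and the helpers `make_node` and `is_outlier_at_query` are hypothetical names, identifiers are assumed to be integer arrival times, and the current window is assumed to contain exactly the objects whose identifier is at least `window_start`.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    obj: tuple                      # n.obj: the data stream object
    id: int                         # n.id: arrival time of n.obj
    count_after: int = 0            # n.count_after: succeeding neighbors seen so far
    nn_before: deque = field(default_factory=deque)  # n.nn_before: at most k ids


def make_node(obj, arrival_time, preceding_neighbor_ids, k):
    """Build a node, keeping only the k most recent preceding neighbors."""
    # A deque with maxlen=k silently drops the oldest ids when it overflows.
    recent = deque(sorted(preceding_neighbor_ids), maxlen=k)
    return Node(obj, arrival_time, 0, recent)


def is_outlier_at_query(node, window_start, k):
    """Count the preceding neighbors still in the window, add count_after."""
    alive_before = sum(1 for i in node.nn_before if i >= window_start)
    return alive_before + node.count_after < k


node = make_node((1.0,), arrival_time=10, preceding_neighbor_ids=[3, 7, 8, 9], k=3)
node.count_after = 1  # one neighbor arrived after the object
print(is_outlier_at_query(node, window_start=5, k=3))
print(is_outlier_at_query(node, window_start=9, k=3))
```

Keeping at most k identifiers is enough for the outlier test because the most recent preceding neighbors are the last to expire: if fewer than k of the stored ones are still in the window, no older neighbor can be either.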