SensorKDD 2008, Sunday, 24th August, 2008
Spatio-Temporal Outlier Detection in Precipitation Data
Elizabeth Wu, Wei Liu, Sanjay Chawla
The University of Sydney, Australia
Outline
• What is a spatio-temporal outlier?
• Motivation
• Previous Work
• Contributions
• Our Approach
• Future Work
What is a Spatio-Temporal Outlier?
• “A spatio-temporal object whose thematic attribute values are significantly different from those of other spatially and temporally referenced objects in its spatial and/or temporal neighborhoods.” – Cheng and Li (2006)
[Figure: five 5 x 5 spatial grids, one for each time step t=1 to t=5]
What is a spatio-temporal object?
• “A time-evolving spatial object whose evolution or ‘history’ is represented by a set of instances (o_id, s_i, t_i) where the spacestamp s_i is the location of object o_id at timestamp t_i.” - Theodoridis et al. (1999)
• Simply put,
  • A point becomes a line
  • A 2D region becomes a 3D region
[Figure: two panels, each with axes x co-ordinate, y co-ordinate and time]
Data
[Figure: Stations used to produce gridded precipitation fields]
• South American precipitation data (NOAA)
  • 10 years (1995-2004)
  • 2.5° x 2.5° grid cells
  • 31 latitude x 23 longitude divisions
  • 713 grid cells in total
  • 2,609,580 possible data values
  • Missing data, both spatially and temporally
• El Niño Southern Oscillation data (NOAA)
  • Southern Oscillation Index (SOI)
  • Measures the difference in sea level pressure between Tahiti and Darwin
  • The lower the score, the more intense an El Niño event
Motivation
• Why would we be interested in moving outlier regions in precipitation data?
• Knowing the location, time and duration of past extreme precipitation events helps us to understand and prepare for future events.
• We can analyse how different phenomena interact, e.g. ENSO and precipitation.
Previous Work
• Spatial Scan Statistics
  • Used to find spatial outliers
  • Cluster detection using the spatial scan statistic in spatio-temporal point data (Iyengar, 2004)
  • Exact-Grid and Approx-Grid (Agarwal et al., 2006)
    • Uses the Kulldorff Spatial Scan Statistic
    • Finds the highest discrepancy region (by location and size) in a spatial grid dataset
• Spatio-temporal outlier detection (Birant and Kut, 2006)
• Limited to finding outliers over a single time period
[Figure: axes x co-ordinate, y co-ordinate and time]
Contributions
• Extended Exact-Grid and Approx-Grid to find the top-k outliers in a single time period.
• Developed the Outstretch and RecurseNodes algorithms to find outliers that repeatedly appear over several time periods.
• Applied these algorithms to South American precipitation data.
• Analysed the behaviour of the outliers against the El Niño Southern Oscillation (ENSO).
Our Approach
• Find the top-k outliers in a spatial grid for each time period
  • Extend the Exact-Grid and Approx-Grid algorithms
• Use Outstretch to find spatial outliers which extend over several time periods.
• Use RecurseNodes to extract the sequences from the Outstretch tree.
Finding the top-k outliers
• Find every possible region size and shape in the grid.
• Get each region’s discrepancy value to determine which is a more significant outlier.
• Our extension keeps track of the top-k regions rather than just the top-1.
[Figure: grid with left, right, top and bottom lines marking a candidate region]
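The top-k bookkeeping described above can be sketched with a bounded min-heap. This is only an illustrative sketch, not the authors' implementation: the function name `top_k_regions`, the `(left, right, bottom, top)` region encoding and the input format are assumptions, and any handling of overlapping regions is ignored.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical region encoding: (left, right, bottom, top) grid indices.
Region = Tuple[int, int, int, int]


def top_k_regions(scored: Iterable[Tuple[float, Region]], k: int) -> List[Tuple[float, Region]]:
    """Return the k highest-scoring (discrepancy, region) pairs, best first."""
    heap: List[Tuple[float, Region]] = []  # min-heap holding at most k entries
    for score, region in scored:
        if len(heap) < k:
            heapq.heappush(heap, (score, region))
        elif score > heap[0][0]:
            # The new region beats the weakest of the current top-k.
            heapq.heapreplace(heap, (score, region))
    return sorted(heap, reverse=True)
```

With k = 1 this reduces to tracking only the single highest-discrepancy region; a larger k keeps the weakest retained region at the heap root so each new region costs O(log k) to process.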
Kulldorff Scan Statistic
• Uses two values:
  • Measurement – Number of incidences of an event
    • E.g. In how many cells is precipitation extreme?
    • $M$ – for the whole dataset
    • $m(p)$ – for the cell $p$
    • $m_R = \sum_{p \in R} m(p) / M$
  • Baseline – Total population at risk
    • I.e. How many cells have we recorded values for?
    • $B$ – for the whole dataset
    • $b(p)$ – for the cell $p$
    • $b_R = \sum_{p \in R} b(p) / B$
• We find the discrepancy for local region $R$ by substitution into:
  • When $m_R > b_R$: $d(m_R, b_R) = m_R \log(m_R / b_R) + (1 - m_R) \log((1 - m_R) / (1 - b_R))$
  • Otherwise: $d(m_R, b_R) = 0$
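As a minimal sketch, the discrepancy function above might be written as follows. The function name `kulldorff_discrepancy` is hypothetical, the natural logarithm is assumed, and the edge-case conventions (0·log 0 treated as 0, infinity when the baseline fraction is zero) are assumptions not stated on the slide.

```python
import math


def kulldorff_discrepancy(m_r: float, b_r: float) -> float:
    """d(m_R, b_R) for measurement fraction m_r and baseline fraction b_r."""
    if not (0.0 <= m_r <= 1.0 and 0.0 <= b_r <= 1.0):
        raise ValueError("m_r and b_r must be fractions in [0, 1]")
    if m_r <= b_r:
        return 0.0  # the region is not over-represented
    if b_r == 0.0:
        return math.inf  # assumed convention: limit of m_r * log(m_r / b_r)
    first = m_r * math.log(m_r / b_r)
    # Assumed convention: 0 * log(0) is taken as 0 when m_r == 1.
    second = 0.0 if m_r == 1.0 else (1.0 - m_r) * math.log((1.0 - m_r) / (1.0 - b_r))
    return first + second
```

Under the natural-log assumption, `kulldorff_discrepancy(4/6, 4/16)` evaluates to approximately 0.3836.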
Kulldorff Scan Statistic: Example
• $M = 6$ = total # cells with “1” in entire grid
• $\sum_{p \in R} m(p) = 4$ = total # cells with “1” in $R$
• $m_R = \sum_{p \in R} m(p) / M = 0.67$
• $B = 16$ = total # cells in entire grid
• $\sum_{p \in R} b(p) = 4$ = sum of $b$’s in region = total # cells in $R$
• $b_R = \sum_{p \in R} b(p) / B = 0.25$
• Result: $d(m_R, b_R) = 0.3836$
[Figure: two 4 x 4 grids with axes labelled 1 to 4]
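The result can be checked by substituting the unrounded fractions into the discrepancy formula. The working below assumes the natural logarithm, which is the choice that reproduces the stated value:

```latex
\begin{align*}
d\!\left(\tfrac{4}{6}, \tfrac{4}{16}\right)
  &= \tfrac{2}{3}\,\ln\!\frac{2/3}{1/4} + \tfrac{1}{3}\,\ln\!\frac{1/3}{3/4} \\
  &= \tfrac{2}{3}\,\ln\tfrac{8}{3} + \tfrac{1}{3}\,\ln\tfrac{4}{9} \\
  &\approx 0.6539 - 0.2703 \\
  &\approx 0.3836
\end{align*}
```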
Finding the top-k outliers: Exact-Grid
[Animation: left, right, top and bottom lines sweep across the grid to bound each candidate region]
Finding the top-k outliers: Exact-Grid
• The top and bottom lines keep moving until every region between the current left and right lines has been examined.
[Figure: grid with left, right, top and bottom lines marking a candidate region]
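The sweep can be illustrated with a brute-force sketch: fix a left and right line, then move the bottom and top lines over every pair of rows while accumulating counts. This is written in the spirit of the animation, not as the authors' Exact-Grid code. The names `discrepancy` and `best_region`, the row-major 0/1 grids, and the convention that "bottom" is the lower row index are all assumptions.

```python
import math
from typing import List, Optional, Tuple

Region = Tuple[int, int, int, int]  # hypothetical: (left, right, bottom, top)


def discrepancy(m_r: float, b_r: float) -> float:
    """Kulldorff discrepancy with natural log; 0 unless m_r > b_r > 0."""
    if m_r <= b_r or b_r <= 0.0:
        return 0.0
    tail = 0.0 if m_r >= 1.0 else (1 - m_r) * math.log((1 - m_r) / (1 - b_r))
    return m_r * math.log(m_r / b_r) + tail


def best_region(measure: List[List[int]], baseline: List[List[int]]) -> Tuple[float, Optional[Region]]:
    """Examine every axis-aligned rectangle and return the highest discrepancy."""
    rows, cols = len(measure), len(measure[0])
    M = sum(map(sum, measure))
    B = sum(map(sum, baseline))
    best: Tuple[float, Optional[Region]] = (0.0, None)
    if M == 0 or B == 0:
        return best
    for left in range(cols):
        row_m = [0] * rows  # per-row counts between the left and right lines
        row_b = [0] * rows
        for right in range(left, cols):
            for r in range(rows):  # widen the strip by one column
                row_m[r] += measure[r][right]
                row_b[r] += baseline[r][right]
            for bottom in range(rows):
                m = b = 0
                for top in range(bottom, rows):  # move the top line
                    m += row_m[top]
                    b += row_b[top]
                    d = discrepancy(m / M, b / B)
                    if d > best[0]:
                        best = (d, (left, right, bottom, top))
    return best
```

The four nested line positions give a running time that grows with the fourth power of the grid side length. Feeding each `(d, region)` pair into a top-k structure instead of a single `best` value is the extension described earlier.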