600 likes | 714 Views
6350 Spatio -temporal Data Processing Course Overview. Yan Huang huangyan@unt.edu. Basic Information. Instructor: Yan Huang ( huangyan at unt.edu) Meeting place and time: M 2:30-:520pm B157 Office hours: M 12:30-2:30pm. Basic Information. TA : Sasi Koneru (SasiKoneru@my.unt.edu)
E N D
6350 Spatio-temporal Data Processing Course Overview Yan Huang huangyan@unt.edu
Basic Information • Instructor: Yan Huang (huangyan at unt.edu) • Meeting place and time: M 2:30-:520pm B157 • Office hours: M 12:30-2:30pm
Basic Information • TA: Sasi Koneru (SasiKoneru@my.unt.edu) • Office hours: Monday 10:00 AM to 2:00 PM, F208
Evaluation • The evaluation scheme will be • class participation 10% • paper analysis and presentation - 25% • project - 40%. • Term paper – 30%
Classroom policy • No computers or laptops unless told so.
Paper Analysis I • Collect 5 or more papers in one sub-area • Write short summaries for 3 (100-200 words) • Make a 15 minutes presentation on what you learn on this topic • The presentation will take an integrated approach where you introduce the motivation of the three papers, give a precise problem definition, compare and contrast the ways the 3 papers approach the problem and how they validate their results, present conclusions, and point to some future directions if you can identify
Paper Analysis II • Choose and present one paper from the reading list • Collect two questions from each group • Ask two questions yourself • Lead group discussion • Detail instructions are available from: • http://www.cse.unt.edu/~huangyan/6350/paperAnalysis.txt • One paper every week
Find Related Work • Need to know the key words • May need to explore and refine during your search • Often you can find electronic version of the papers, especially for publications related to computer science • Author’s website • ACM digital library • IEEE xplore • Springer Online • Google scholar • You school typically subscribes to these publishers • Search from a computer with IP address belonging to your school
Computer Science Bibliography Collections • CiteSeer • http://citeseer.ist.psu.edu/ • DBLP • http://www.informatik.uni-trier.de/~ley/db/ • Google Scholar • http://scholar.google.com/ • ACM Digital Library • http://portal.acm.org/dl.cfm • IEEE Xplore • http://portal.acm.org/dl.cfm
Term Project • ACMGIS CUP 2014 • Team of up-to 2 person • March 03, 10 minutes presentation on algorithm design and cost analysis • Score is based on normalized grade you get from submission.
Term Paper • Two choices • Term paper • Survey paper
Term paper • Research oriented • Key components: • Problem Statement, Significance of the problem • Related Work and Our Contributions • Proposed Approach • Validation of listed contributions (experimental, analytical) • Conclusions and Future Work
Survey paper • Key components • Problem Statement, Significance of the problem • Our Contributions (usually it is the categorization/classification of the research literature) • A classification of the papers related to the problem. Use a concept hierarchy, figures, and diagrams if necessary. • Summarize, classify, contrast, and compare the research literature according to your classification scheme • A summary of the trend and future work of this line of research. • Conclusion.
Spatial Databases (SDBMS) • Traditional (non-spatial) database management systems provide: • Persistence across failures • Allows concurrent access to data • Scalability to search queries on very large datasets which do not fit inside main memories of computers • Efficient for non-spatial queries, but not for spatial queries • Non-spatial queries: • List the names of all bookstore with more than ten thousand titles. • List the names of ten customers, in terms of sales, in the year 2001 • Use an index to narrow down the search • Spatial Queries: • List the names of all bookstores with ten miles of Minneapolis • List all customers who live in Tennessee and its adjoining states • List all the customers who reside within fifty miles of the company headquarter
Value of SDBMS • Examples of non-spatial data • Names, phone numbers, email addresses of people • Examples of Spatial data • Census Data • NASA satellites imagery - terabytes of data per day • Weather and Climate Data • Rivers, Farms, ecological impact • Medical Imaging • Exercise: Identify spatial and non-spatial data items in • A phone book • A Product catalog
User, Application domains • Many important application domains have spatial data and queries. Some Examples follow: • Army Field Commander: Has there been any significant enemy troop movement since last night? • Insurance Risk Manager: Which homes are most likely to be affected in the next great flood on the Mississippi? • Medical Doctor: Based on this patient's MRI, have we treated somebody with a similar condition ? • Molecular Biologist:Is the topology of the amino acid biosynthesis gene in the genome found in any other sequence feature map in the database ? • Astronomer:Find all blue galaxies within 2 arcmin of quasars. • Exercise: List two ways you have used spatial data. Which software did you use to manipulate spatial data?
SDBMS • A SDBMS is a software module that • can work with an underlying DBMS • supports spatial data models, spatial abstract data types (ADTs) and a query language from which these ADTs are callable • supports spatial indexing, efficient algorithms for processing spatial operations, and domain specific rules for query optimization • Example: Oracle Spatial data cartridge, ESRI SDE • can work with Oracle DBMS • Has spatial data types (e.g. polygon), operations (e.g. overlap) callable from SQL3 query language • Has spatial indices, e.g. R-trees • IBM: Spatial Option • Informix: Spatial Datablade
SDDMB vs. GIS • GIS is a software to visualize and analyze spatial data using spatial analysis functions such as • Search Thematic search, search by region, (re-)classification • Location analysis Buffer, corridor, overlay • Terrain analysis Slope/aspect, catchment, drainage network • Flow analysis Connectivity, shortest path • Distribution Change detection, proximity, nearest neighbor • Spatial analysis/Statistics Pattern, centrality, autocorrelation, indices of similarity, topology: hole description • Measurements Distance, perimeter, shape, adjacency, direction • GIS uses SDBMS • to store, search, query, share large spatial data sets
SDBMS vs. GIS • SDBMS focuses on • Efficient storage, querying, sharing of large spatial datasets • Provides simpler set based query operations • Example operations: search by region, overlay, nearest neighbor, distance, adjacency, perimeter etc. • Uses spatial indices and query optimization to speedup queries over large spatial datasets. • SDBMS may be used by applications other than GIS • Astronomy, Genomics, Multimedia information systems, ...
Issues in SDBMS • Spatial data model • Query language • Query processing • File organization and indices • Query optimization, etc.
Spatio-temporal Databases • Add temporal dimension • Examples: • Trajectories • Evolving region • Moving points
Geo-stream databases • Many data are generated continuously • Transaction data • Network monitoring • Financial application • Most recent data are commonly queried in a one-pass fashion • Monitoring • Aggregation • Database system provides abstractions and declarative languages that stream processing can benefit from
Stream Application • Environmental monitoring • Patient monitoring • Finance • Network monitoring • Click-streams • Transaction monitoring • Traffic analysis • Moving object queries • Sensor network • RFID
Sample Applications • Environmental monitoring • Notify me when UV is high, temperature is low • Traffic monitoring • Traffic jam: aggregated speed much below speed limit on a road segment for extended time • Accident: vehicle on unintended space, e.g. high way for longer than expected time • Click-streams • Find the school districts of the houses that the user browses the most.
Geo-streams • Current streams systems lack native spatial support • Spatial stream queries are common in • traffic monitoring • environment monitoring • moving object databases
Route prediction • Next position • Next stop • The entire route • Application: • Mobile commerce • Save energy • Traffic notification
Location-based social networking • Social networking with location • Loopts • Google latitude • Geocache • Social dynamics • Iphone applications
Volunteer Geographic Information System • OpenStreetMap, • Wikimapia • Foursquare • Trapster
Spatio-temporal Analytics • The analysis of data with both spatial and temporal information • The data are spatially and/or temporally correlated "Everything is related to everything else, but near things are more related than distant things."
Why do we need spatio-temporal analytics • Analytics help us to describe what happened in the past, understand what is happening now, predict what will happen in the future, and make decisions. • The proliferation of sensor devices makes spatio-temporal information a fundamental component for almost every analytical applications
Types of Spatio-Temporal Analytics Methods • Visualization and exploratory analysis • Segmentation (classification and clustering) • Outlier analysis • Colocation mining • Dependency analysis • Trend discovery
Data Visualization and Exploratory Analysis • Map querying task • Static query (one-time query using map tools available on the interface) • Dynamic query[36] (setup of event alert conditions) • Spatial constraints are expressed using the map, while temporal constraints are expressed as linear time moments[37] • Map animation[38] • Focusing, linking and arranging views[39] • Map iteration[40] • Existential changes[25] • Location changes • Attribute Changes
Segmentation methods • Classification[41] • Spatial classification: decision tree, Bayesian, ANN… • Temporal classification: decision tree, Bayesian, ANN… • Temporal extensions to spatial classification/ Spatial extension to temporal classification • Clustering[42] • Spatial clustering: partitioning method, hierarchical method, density based method, and grid-based method. • Temporal clustering • Interactive spatio-temporal clustering: perform clustering spatially or temporally and then test whether the cluster exist in both dimensions (EMM Test[43]) • Simultaneous spatio-temporal clustering: space-time scan[44]
More on Spatio-Temporal Clustering • Model-based clustering[46] • define a multivariate density distribution and look for a set of fitting parameters for the model. • Distance-based method • Moving object similarity search • Density-based method • DBSCAN extensions, OPTICS[47] • Flocks and convoy • Moving clusters[47] • Applications: movement data, cellular networks, environment data…
Spatio-Temporal Outlier Analysis • Definition of outliers • “spatial-temporal object whose thematic attribute values are significantly different from those of other spatially and temporally referenced objects in its spatial or/and temporal neighborhoods”. • Methods[48] • Clustering-based approach • Distance based approach • Computational geometry based approach • Spatial scan based approach
Co-Location Mining • Colocation mining finds subset of Boolean features located in spatial proximity • Methods[50] • Data mining-based approach • Spatial statistical approach • Buffer-based model • Temporal extension: mixed-drove approach, weighted window-based model[51]
Other methods • Association rule mining • Spatial preprocessing is required to discretize spatial measurements • Methods[49] • Bayesian networks • Hieratical approach • Trend discovery • Regression • Sequence mining
List of Current Spatio-Temporal Analytics Tools • Commercial • ESRI ArcGIS series • Microsoft SQL Spatial +StreamInsight • Other commercial tools • Open source/free software • Descartes and CommonGIS • MapServer • Other free tools
ESRI ArcGIS Series • ArcGIS desktop and server provide most advanced and complete toolkit • Has many extensions for different domains • Can use APIs to develop extensions, web or desktop applications for customized needs. Many other commercial tools such as CUBE[9] are built on top of ArcGIS.
ESRI ArcGIS Desktop and Server Extensions[1] • 3D Extension (Desktop and Server) • Analyze terrain data, model subsurface features, view and analyze impact zones, determine optimum facility placement, share 3D views, create a 3D virtual city. • Geostatistical Extension (Desktop and Server) • Visualize, model, and predict spatial relationships. • Link data, graphs, and maps dynamically. • Perform deterministic and geostatistical interpolation. • Evaluate models and predictions probabilistically
ESRI ArcGIS Desktop and Server Extensions • Network Extension (Desktop and Server) • Dynamically model realistic network conditions and solve vehicle routing problems • Multipoint optimized routing, time-sensitive, turn-by-turn driving directions , allocation of service areas, determining the fastest fixed route to the closest facility • Schematics Extension (Desktop and Server) • Rapid checking of network connectivity • Automatically generate schematics
ESRI ArcGIS Desktop and Server Extensions • Spatial extension (Desktop and Server) • Comprehensive, raster-based spatial modeling and analysis. • Survey Extension (Desktop) • Capture, edit, and leverage land records using proven survey methodologies • Tracking Extension (Desktop) • Create time series visualizations so you can analyze information relative to time and location
ESRI Domain-Specific Solutions • ESRI Business Analyst Online • Web-based solution that combines GIS technology with extensive demographic, consumer spending, and business data for the entire United States to deliver on-demand, boardroom-ready reports and maps • Perform drive-time analysis • Analyze trade areas • Evaluate sites • Identify most profitable customers and reach customers