A predictive model for frequently viewed tiles in a Web map. Sterling Quinn MGIS Candidate ESRI ArcGIS Server Product Engineer Mark Gahegan Faculty Advisor. Introduction. This project presents a model for predicting high-traffic areas of a Web map
A predictive model for frequently viewed tiles in a Web map • Sterling Quinn, MGIS Candidate, ESRI ArcGIS Server Product Engineer • Mark Gahegan, Faculty Advisor
Introduction • This project presents a model for predicting high-traffic areas of a Web map • Model output indicates where server-side cache of map tiles should be created
Project objectives • Describe server-side caching of map tiles • Describe the need for selective caching • Present a predictive model for popular areas of the map • Describe ways the model could be used and evaluated
Organizing large maps in manageable “tiles” is not new • Large paper map series are indexed in organized grids • CGIS, a pioneering GIS, used “frames” to organize data (right) From Tomlinson, Calkins, & Marble, 1976, p. 56.
Other techniques for organizing maps in tiles or grid systems • Pyramid technique successively generalizes rasters in groups of four cells (right) • Quadtree structures index datasets in a hierarchy of quadrants From De Cola & Montagne, 1993, p. 1394.
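The pyramid idea can be illustrated with a minimal sketch. It assumes a square raster whose side is a power of two and uses a plain average to generalize each group of four cells; the averaging rule and the function names are illustrative assumptions, not the method of De Cola & Montagne.

```python
def coarsen(grid):
    """Average each 2x2 group of cells into one cell of the next pyramid level."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def build_pyramid(grid):
    """Return every level, from the full-resolution grid down to a single cell."""
    levels = [grid]
    while len(levels[-1]) > 1:
        levels.append(coarsen(levels[-1]))
    return levels


# Hypothetical 4x4 raster used only for illustration.
raster = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 8, 8],
    [2, 2, 8, 8],
]
pyramid = build_pyramid(raster)
for level in pyramid:
    print(level)
```

Each level holds one quarter as many cells as the level below it, which is the same 4-to-1 relationship that a quadtree expresses as a hierarchy of quadrants.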
The modern map tile • JPG or PNG image • Standard square dimensions (256 x 256 or 512 x 512) • Stored in large “caches” on the server at multiple scales
Server-side caching of map tiles is new • Traditional map servers (ArcIMS, WMS) draw the image on the fly • Can take a while if the map is complex • Cached map tiles give extremely fast performance • Tiled maps allow users to retrieve just the needed pieces of the map
Advent of tiled maps and server-side caching • Microsoft TerraServer was an early deployment of massive amounts of cached imagery tiles • Google Maps serves cached map tiles with AJAX techniques to create a “seamless” Web mapping experience
Tiles in Google Maps are quickly retrieved as you navigate • From Google Maps: http://maps.google.com
Many sites have followed Google’s pattern • Yahoo Maps: http://maps.yahoo.com • MapQuest: http://www.mapquest.com • Microsoft Virtual Earth: http://maps.live.com
Current caching options • Current GIS software allows analysts to create tile caches for their own maps • ESRI’s ArcGIS Server • Mapnik • Microsoft MapCruncher
Caching can require enormous resources on the server • Caches covering big areas at large scales can include millions of tiles • Many gigabytes, or even terabytes of storage • Days, weeks, or sometimes months to generate • Many GIS shops lack resources to maintain large caches
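A rough sketch of the arithmetic behind these figures follows. It assumes a quadtree-style scheme in which level 0 is a single tile covering the whole map and each further level has four times as many tiles. The average tile size and rendering speed are hypothetical placeholders, not measurements from any real server.

```python
def tiles_at_level(level):
    """Tile count at one scale level when each level quadruples the previous one."""
    return 4 ** level


def cache_totals(max_level, avg_tile_kb=15.0, tiles_per_second=50.0):
    """Estimate tile count, storage (GB) and build time (hours) for levels 0..max_level.

    avg_tile_kb and tiles_per_second are assumed values for illustration only.
    """
    total = sum(tiles_at_level(z) for z in range(max_level + 1))
    storage_gb = total * avg_tile_kb / 1024 ** 2
    hours = total / tiles_per_second / 3600
    return total, storage_gb, hours


for max_level in (10, 15, 18):
    total, storage_gb, hours = cache_totals(max_level)
    print(f"levels 0-{max_level}: {total:,} tiles, "
          f"{storage_gb:,.1f} GB, {hours:,.1f} hours")
```

Because the finest level always contains about three quarters of all tiles, skipping rarely viewed areas at the largest scales is where most of the savings come from.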
Selective caching as a strategy for saving resources • Administrator can cache only the areas anticipated to be most visited • Remaining areas can be: • Added to the cache “on-demand” when first user navigates there • Filled with a “Data not available” tile
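The two options for uncached areas can be sketched as a small lookup routine. The class name, the `render_tile` callback, and the placeholder value are hypothetical; they do not correspond to the API of ArcGIS Server or any other product.

```python
PLACEHOLDER = b"DATA-NOT-AVAILABLE"  # stand-in for a "Data not available" image


class SelectiveTileCache:
    """Serve pre-cached tiles; handle the rest on demand or with a placeholder."""

    def __init__(self, precached, render_tile=None):
        self.store = dict(precached)    # tiles created ahead of time
        self.render_tile = render_tile  # None means on-demand caching is disabled

    def get(self, key):
        if key in self.store:
            return self.store[key]      # fast path: tile already cached
        if self.render_tile is None:
            return PLACEHOLDER          # option 2: "Data not available" tile
        tile = self.render_tile(key)    # option 1: draw once for the first visitor
        self.store[key] = tile          # later visitors get the cached copy
        return tile


# Hypothetical usage: keys are (level, column, row) tuples.
draw_calls = []

def slow_render(key):
    draw_calls.append(key)
    return f"tile{key}".encode()

on_demand = SelectiveTileCache({(3, 1, 2): b"precached"}, render_tile=slow_render)
on_demand.get((3, 5, 5))
on_demand.get((3, 5, 5))                # second request does not redraw
closed = SelectiveTileCache({(3, 1, 2): b"precached"})
print(len(draw_calls), closed.get((3, 5, 5)))
```

With on-demand caching only the first visitor to an uncached area pays the drawing cost, while the placeholder option keeps server load predictable at the price of blank areas.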
Benefits of selective caching • Wise because some tiles (ocean, desert) will rarely, if ever, be accessed • Saves time • Saves disk space
Implications of selective caching • Requires an admission that some areas are more important than others • Poses challenge of predicting popular areas before the map is released
Project presents a predictive model for where to pre-cache tiles • “Which places are most interesting?” • Inputs are datasets readily available to a GIS analyst • Output vector features serve as a template for where to pre-cache tiles
Purpose of the model • Help majority of users see a fast Web map while minimizing cache creation time and storage space
Not a descriptive model • Descriptive model shows where users have already viewed • Microsoft Hotmap good example of a descriptive tool (right) • Descriptive models useful for deriving and validating predictive models From Microsoft Hotmap http://hotmap.msresearch.us
Advantages of a predictive model • Doesn’t require the map to be deployed already • Can include fixed and varying geographic phenomena • Has applications far beyond map caching
Study area and conditions • Model predicts frequently viewed places for a general base map • May create models for thematic maps if time allows • Study area of California
Input datasets • Populated / developed areas • Road networks • Coastlines • Points of interest
Populated / developed areas • Human Influence Index grid by the Socioeconomic Data and Applications Center (SEDAC) at Columbia University • Model selects all grid cells over a certain value
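The cell selection step can be sketched as a simple threshold test. The grid values and the threshold below are made up for illustration; the slides do not state the cutoff the model uses.

```python
def select_cells(grid, threshold):
    """Return (row, col) positions of cells whose value exceeds the threshold."""
    return {
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value > threshold
    }


# Hypothetical index values; higher means more human influence.
influence = [
    [5, 12, 40],
    [8, 35, 52],
    [2, 10, 30],
]
developed = select_cells(influence, threshold=30)
print(sorted(developed))
```

Raising the threshold shrinks the selected area, so this single value is one of the main controls on how much of the map ends up being pre-cached.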
Road networks • Major roads buffered by a given distance • All roads within national parks, monuments, historical sites, and recreation areas, buffered by a given distance
Coastlines • All coastlines buffered by a given distance (wider buffer on inland side)
Points of interest • Set of 60 interesting points chosen by model author • Mountain peaks • Theme parks • Sports arenas • Etc. • Represents a flexible layer that could be tailored to local needs
Deriving the output • Merge all layers together • Clip to California outline (with small buffer) • Remove small holes and polygons • Dissolve into one multipart feature • Simplify to remove unneeded vertices
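The first three steps can be sketched on a grid of cells instead of vector polygons, which is a simplification of the GIS operations the slide names. Each layer is a set of (row, col) cells, and the function names and the `min_cells` cutoff are assumptions. Filling small holes, dissolving, and vertex simplification are vector operations and are left out here.

```python
def components(cells):
    """Group cells into 4-connected regions (cells sharing an edge)."""
    remaining, groups = set(cells), []
    while remaining:
        stack = [remaining.pop()]
        group = set(stack)
        while stack:
            r, c = stack.pop()
            for n in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if n in remaining:
                    remaining.remove(n)
                    group.add(n)
                    stack.append(n)
        groups.append(group)
    return groups


def derive_output(layers, study_area, min_cells=2):
    """Merge the layers, clip to the study area, and drop small isolated regions."""
    merged = set().union(*layers)     # merge all layers together
    clipped = merged & study_area     # clip to the study area outline
    kept = [g for g in components(clipped) if len(g) >= min_cells]
    return set().union(*kept)


# Hypothetical inputs on a 5x5 study area.
study_area = {(r, c) for r in range(5) for c in range(5)}
populated = {(0, 0), (0, 1)}
roads = {(0, 1), (1, 1)}
points_of_interest = {(3, 3)}
coast = {(5, 5), (5, 6)}              # falls outside the study area
result = derive_output([populated, roads, points_of_interest, coast], study_area)
print(sorted(result))
```

In this toy run the lone point-of-interest cell is discarded as a small polygon; a real model would likely buffer points first so that they survive this step.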
Using the model output • Output is a vector dataset that can be used as a template for creating cached tiles • Compare model output area with total area to understand percent coverage • Compare model output with actual usage over time • Refine if necessary
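The two comparisons could be computed along these lines. The area figures, tile keys, and request log are invented for illustration, and the function names are hypothetical.

```python
def percent_coverage(model_area, total_area):
    """Share of the study area that the model output covers, in percent."""
    return 100.0 * model_area / total_area


def hit_rate(requested_tiles, cached_tiles):
    """Percent of logged tile requests that fall on pre-cached tiles."""
    if not requested_tiles:
        return 0.0
    hits = sum(1 for tile in requested_tiles if tile in cached_tiles)
    return 100.0 * hits / len(requested_tiles)


# Hypothetical numbers: areas in square kilometers, tiles as (level, col, row).
coverage = percent_coverage(model_area=25.0, total_area=100.0)
cached = {(8, 40, 98), (8, 41, 98)}
request_log = [(8, 40, 98), (8, 40, 98), (8, 41, 98), (8, 60, 90)]
rate = hit_rate(request_log, cached)
print(f"coverage {coverage:.1f}%, hit rate {rate:.1f}%")
```

A model is doing its job when the hit rate is much higher than the coverage percentage, meaning a small cached area is serving most requests.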
Limitations • Models of world scope should account for Internet connectivity • Input datasets have varying collection dates • Input datasets vary in resolution and precision • Maps with many scales might require multiple iterations and variations of the model
References • De Cola, L., & Montagne, N. (1993). The PYRAMID system for multiscale raster analysis. Computers & Geosciences, 19(10), 1393–1404. • Tomlinson, R. F., Calkins, H. W., & Marble, D. F. (1976). Computer Handling of Geographical Data. Paris: Unesco.