Extending ArcGIS with R Mark Janikas, PhD mjanikas@esri.com
Outline • Introduction • What is R? Why should I use it? • Application • Point Clustering • Integration options • R versus RPy • Conclusions and Future Directions
What is R? Why should I use it? • R (The R Project for Statistical Computing) is an open-source data analysis package (GNU S). • Widely used • Over 60 CRAN sites across 30+ countries • It's free • GNU General Public License • Base is powerful • Statistics, linear algebra, visualization, etc. • It's extendible • 1800+ contributed extensions • splancs, spatstat, spdep, rgdal, maptools, shapefiles
R Point Clustering Tools for ArcGIS • Resource Center (Code Gallery) • Contains two tools… that do the same thing!
Application: Point Clustering • Cluster a given set of point locations using: • Spatial proximity • Attribute values
Integration with ArcGIS • Two (Three) Integration Options With ArcGIS • Both require Python • Both have pros and cons • ESRI UC Plenary 2008 • predicting plant species in unknown areas
Integration: R Option • Decouples R and Python • Python • Retrieves and organizes parameters from ArcGIS • Converts data (interchange) • Shapefiles, NetCDF, IMG, etc. • Spawns R given the *.r file with the provided parameters • R • Does the analysis • (Diagram: ArcGIS → Python script → R script)
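The "spawn R from Python" step above can be sketched as follows. This is a minimal illustration, not the code shipped with the tools: it assumes the `Rscript` launcher is on the PATH, and the function names, the script name `cluster.r`, and the parameter values are hypothetical.

```python
import subprocess


def build_r_command(r_script, params, rscript_exe="Rscript"):
    """Return the argument list used to launch R on a script file.

    --vanilla starts R without a saved workspace or user profile.
    Every parameter is passed to the R script as a string.
    """
    return [rscript_exe, "--vanilla", r_script] + [str(p) for p in params]


def run_r_script(r_script, params):
    """Spawn a separate R process and return what it printed."""
    cmd = build_r_command(r_script, params)
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError("R failed: " + result.stderr)
    return result.stdout
```

Inside the R script, `commandArgs(trailingOnly = TRUE)` returns the same parameters in order, so a call such as `run_r_script("cluster.r", ["points.shp", 5])` hands the shapefile path and the cluster count to R.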
Integration: RPy Option • R and Python closely coupled • RPy (RPy2) • Python interface to the R programming language • Python • Retrieves and organizes parameters from ArcGIS • RPy module is imported and R commands are executed within the Python script file • (Diagram: ArcGIS → Python script → R processing)
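A rough sketch of the closely coupled approach, assuming the RPy2 package (`rpy2.robjects`) is installed alongside R. The slides do not name the clustering algorithm, so R's built-in `kmeans` is used here purely as a stand-in, and both function names are hypothetical.

```python
def flatten_rows(points):
    """Flatten [(x, y), ...] into [x0, y0, x1, y1, ...] (row-major)."""
    return [float(v) for row in points for v in row]


def kmeans_labels(points, k):
    """Cluster points with R's kmeans, called in-process through rpy2."""
    # Imported here so this module still loads where rpy2 is missing.
    import rpy2.robjects as ro

    n_cols = len(points[0])
    values = ro.FloatVector(flatten_rows(points))
    # Build an R matrix with one row per point.
    matrix = ro.r["matrix"](values, ncol=n_cols, byrow=True)
    fit = ro.r["kmeans"](matrix, centers=k)
    # "cluster" holds one 1-based cluster label per input row.
    return [int(c) for c in fit.rx2("cluster")]
```

No separate R process or *.r file is involved: the data stay in memory and the R call returns directly into Python, which is what makes the single-file script tool possible.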
Which One Should I Use? • R Option • Attractive to R programmers • “Out of Proc”: R is spawned on every execution • Use Copy Features to honor: • Selection sets • Projections and other environment settings • You must use an R library for handling shapefiles • maptools, shapefiles • Two files per script tool (*.py and *.r)
Which One Should I Use? Cont… • RPy Option • For more advanced users (Python and R knowledge) • “In Process” • Will be MUCH faster after the first call • Honors selection sets • A robust choice of database formats • Will honor environment settings (GP Functions) • Only a single file associated with your script tool
RPy Option Code Snippet • Source R libraries • NumPy and R interchange • Cluster analysis • Create output
Which One Should I Use? Cont… • Wait… Why would I go with the R Option? • Doesn’t have as many dependencies/layers • RPy • Python, R, and RPy builds have to play nice! • You must know Python, some R and now RPy. • Currently there is an open bug in RPy that must be fixed in order to run in the “In Process” mode in ArcGIS • Manual fix in the portal tool documentation • Both methods require the editing of Environment Variables in order to run properly
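Since both options depend on environment variables, a small pre-flight check can make failures easier to diagnose. The slides do not say which variables need editing; this sketch assumes the usual pair, `R_HOME` and a PATH entry that exposes `Rscript`, and the function name is hypothetical.

```python
import os
import shutil


def check_r_environment(env=None, which=shutil.which):
    """Return a list of problems found with the R-related environment.

    env   -- mapping of environment variables (defaults to os.environ)
    which -- executable lookup function (defaults to shutil.which,
             which searches the current process PATH)
    """
    env = os.environ if env is None else env
    problems = []
    if which("Rscript") is None:
        problems.append("Rscript not found on PATH")
    if not env.get("R_HOME"):
        problems.append("R_HOME is not set")
    return problems
```

An empty list means both assumed settings are in place; a script tool could report any returned messages before attempting to call R.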
Conclusions • R • contains “cutting edge” data analysis techniques from a wide body of academic and applied fields • extendible • Open-source • Can be integrated with ArcGIS using Python • R versus RPy (RPy2) • Pros and Cons
Future Directions • RPy2 • Web Portal: RTools • Could be expanded upon • Calling Python from R • Leveraging geoprocessing within the R environment • RSPython: http://www.omegahat.org/RSPython/
Links • R • http://www.r-project.org/index.html • RPy (Link to RPy2) • http://rpy.sourceforge.net/ • Python • http://www.python.org/ • NumPy • http://www.numpy.org/
Related Sessions • Developing Python Scripts for Data Analysis Tips and Tricks • Geoprocessing Demo Theater – W, 5:00 – 6:00 • Spatial Statistics: Using Spatial Statistics • TH 1:30 – 2:45 • Regression Analysis for Spatial Data with ArcGIS 9.3 • TH 3:15 – 4:30