Region 3 Data Integration Architecture / Geospatial Information Management (& Expanded Geospatial Services). U.S. EPA GIS Workgroup Meeting, Chicago / Region 5, September 17, 2008. Matthew T. Mellon, U.S. EPA Region III.
Introduction • Integrate everything • Eliminate obstacles to using data • Core principles: • Store once, use many • Allow use of any client • Eliminate file management, importing, exporting, and other “dirty words” • Automate, automate, automate
What are Geospatial Services? • Well beyond just “mapping” in 2-D… • 2-D data exploration • 3-D visualization & exploration • Tabular reporting (for example, to support annual reports / assessments, etc.) • Relating previously un-related data systems through place • Freeing data for use in any app • GPS units with complete base maps with EPA data overlays ($300, waterproof, color)
Example • Points • Lines • Polygons • Inside of ? • Outside of ? • Intersects ? • Near ? • Not near ? • Normal distribution of results?
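The "near / not near" and "intersects" questions above can be sketched in a few lines. This is a minimal illustration, not the deck's implementation: it assumes lon/lat coordinates in degrees, a spherical Earth, and rectangular envelopes (bounding boxes) standing in for full geometries. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (spherical approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_near(p, q, max_m):
    """True when points p and q, each (lon, lat), lie within max_m metres of each other."""
    return haversine_m(p[0], p[1], q[0], q[1]) <= max_m


def envelopes_intersect(a, b):
    """True when two envelopes (min_x, min_y, max_x, max_y) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
```

A spatial database answers the same questions with indexed predicates on real geometries; the sketch only shows what is being asked.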
Primary Goals • Decision Support, Decision Support, Decision Support • Virtually all data gathered by or reported to EPA is used to support decision makers • Statistical analysis, whether geographic or not, is complex; tools to automate data quality objectives serve to increase decision support confidence • Program Management Support • Data about Program activities are used in reporting EPA activities and in budget planning, among other uses
Eliminating Barriers to Data Access • Automatic deployment of full GIS suite to all desktops (and inclusion on all new PCs) • Accomplished and ready to go • Will be deployed when data access templates, assortment of guides and training shortcuts are ready • Eliminates nearly 2 hours per workstation and ensures uniform updates • Inclusion of links to map catalog, geobrowsing applications, training links, documentation and help request forms on every Start Menu • Template created for a pre-defined set of popular data layers, available in multiple clients
The Big Picture • Data Access • Recent efforts, most notably MIRA, have illuminated the Programs’ lack of easy & flexible access to their own data, as well as a near-total deficit regarding access to other Programs’ data • Flexibility • Current widely-used technology is largely inflexible regarding mapping, reporting, etc.
Conceptual Framework: Geospatial Server/Services Implementation (Live data feeds) (Format-free and coordinate-reference-system-free queries)
Why / How Now? • Changes in technology • Why doesn’t this already exist? • The tools are now available; we’re just putting them all together.
Business Processes Drive Technology • Identify Divisional business needs & processes • Provision technology that facilitates solutions • Technology is responsive, not driving business processes • Identify decisions data must support and work backwards to plan data management with that in mind
Data Stewardship Obligations • The Data Integration Architecture will facilitate use of existing information management systems • Both existing and newly created Divisional/Program systems must consider existing Data Stewardship requirements • Managing data quality, geospatial data quality • Providing access to data (think “data feeds”) • Plan data management to meet decision support business needs, including access for internal and external partners
Utilization Projections • Currently grossly underutilizing geospatial services • Triple-track training needs • Most Users: Basic use of templated data in front-ends • Power Users: Advanced use of powerful GIS software for spatial analysis, data visualization and exploration • Data Stewards/Owners: Training in geospatially-enabled SQL and maintenance of geospatial infrastructure • New (conceptual) menu-driven routable workflow management system as management control • Divisional POCs verify valid service requests • Deputy-approval required for any costs
Multiple Levels of Service Needs • End-user services • Training • Template-driven access to data from multiple front-end clients (both web-based and desktop) • Simple location mapping by users • Advanced mapping by power users & via service requests • Infrastructure services • Divisional coordination and consultation for IT and IS planning, design, and stewardship • For those who don’t yet, but should have, information systems • Behind-the-scenes services • Maintaining the Geospatial Architecture • Consultations regarding inclusion of data systems in Architecture
Need: Divisional First Contacts • Anticipated vast increase in utilization of services will require simple location mapping services be pushed back to the users • Routable workflow will manage onslaught of service requests • Need Divisional Points of Contact to be first stop on workflow • Must be knowledgeable users, assess validity of requests, deny if user simply needs more training • Must approve licensing requests where a cost will be incurred by EPA, verifying Branch Chief / Deputy [REVISIT] approval • Must be first contact for all questions, support requests, and must provide basic support for geobrowsing for trained users • GIS Council could serve as forum for bidirectional feedback for all Divisional POCs
Examples • Geobrowsing • Location Maps • Maps including spatial analyses • Who can defend this in court or before Congress? Which algorithm was used, and why? • 3-D visualizations and calculations • Shape file vs. geospatial SQL database, enterprise design considerations, template
New GPS Services • Sync EPA Data as overlays into units containing all Regional topo/USGS base
Submitted maps -- conclusions obvious or even supported? …not exactly.
Good things to be familiar with • CSRS or SRID or EPSG ID numbers • Datum AND Ellipsoid = Coordinate Spatial Reference System, which has a unique Spatial Reference Identifier • EVERY CSRS has a unique “social security number” • NO SUCH THING as a “lat/lon” anymore; now “lat/lon/SRID” • WGS84 as datum & ellipsoid = EPSG 4326 • (Used by Google Maps, Google Earth, Virtual Earth) • JavaScript (how you embed a map client in a web page) • SLDs -- Styled Layer Descriptors • Google Earth Title & Placemark Templates • CMSs -- Content Management Systems (no such thing as editing and uploading HTML files anymore) • R Statistical Language -- like "S" or "S-Plus"; it can be embedded in the SQL server itself, so GIS layers can be defined as queries containing things like "max, mean, stdev, etc." • http://www.r-project.org/
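The "lat/lon/SRID" point can be made concrete with a small sketch: a coordinate pair is only meaningful together with its SRID. The `Position` class and the sample coordinates are hypothetical; the output string follows PostGIS's extended WKT (EWKT) convention of prefixing the geometry with `SRID=<id>;`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """A coordinate pair that always travels with its spatial reference ID."""
    x: float   # longitude or easting, depending on the SRID
    y: float   # latitude or northing, depending on the SRID
    srid: int  # EPSG code, e.g. 4326 for WGS84 lon/lat

    def to_ewkt(self):
        """Render as EWKT, e.g. 'SRID=4326;POINT(x y)'."""
        return f"SRID={self.srid};POINT({self.x} {self.y})"


# Hypothetical sample point (approximately downtown Chicago) in EPSG 4326.
chicago = Position(x=-87.6298, y=41.8781, srid=4326)
print(chicago.to_ewkt())
```

Two positions with identical numbers but different SRIDs compare as unequal, which is the behaviour the slide argues for.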
Data format: WKT examples
POINT(2572292.2 5631150.7)
LINESTRING(2566006.4 5633207.9, 2566028.6 5633215.1, 2566062.3 5633227.1)
MULTILINESTRING((2566006.4 5633207.9, 2566028.6 5633215.1), (2566062.3 5633227.1, 2566083 5633234.8))
POLYGON((2568262.1 5635344.1, 2568298.5 5635387.6, 2568261.04 5635276.15, 2568262.1 5635344.1))
MULTIPOLYGON(((2568262.1 5635344.1, 2568298.5 5635387.6, 2568261.04 5635276.15, 2568262.1 5635344.1)), ((2568194.2 5635136.4, 2568199.6 5635264.2, 2568200.8 5635134.7, 2568194.2 5635136.4)))
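To show how little machinery the simplest WKT forms need, here is a minimal sketch of a parser for POINT and LINESTRING only. It is an illustration under that stated limit, not a full WKT reader; `parse_simple_wkt` is a hypothetical name. It rejects coordinate pairs that have lost their separating space, a common copy-and-paste defect.

```python
import re

# One "x y" pair: two decimal numbers separated by whitespace.
_NUMBER_PAIR = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")


def parse_simple_wkt(wkt):
    """Parse POINT or LINESTRING WKT into (geometry_type, [(x, y), ...])."""
    match = re.fullmatch(r"\s*(POINT|LINESTRING)\s*\((.*)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported or malformed WKT: {wkt!r}")
    geom_type = match.group(1).upper()
    coords = []
    for part in match.group(2).split(","):
        pair = _NUMBER_PAIR.fullmatch(part.strip())
        if pair is None:
            raise ValueError(f"bad coordinate pair: {part!r}")
        coords.append((float(pair.group(1)), float(pair.group(2))))
    if geom_type == "POINT" and len(coords) != 1:
        raise ValueError("POINT must contain exactly one coordinate pair")
    return geom_type, coords


print(parse_simple_wkt("POINT(2572292.2 5631150.7)"))
```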
How to do it • SERVER: • Install base server (OS, PostgreSQL) • Install PROJ, GEOS • Install PostGIS extension inside PostgreSQL • Install Java Enterprise Application Server (Apache Tomcat, GlassFish, WebSphere, JBoss, etc.) • Deploy WAR for GeoServer; configure through web GUI • Admin Workstation: • Install pgAdmin III (GUI for DB admin) • Install Quantum GIS (to test spatial queries, then save in above as a “view”)
…How, continued • Load Data • Use SPIT for consuming shape files in GUI • Use shp2pgsql for more granular control, for use in automated scripts • Use OGR Tools, GDAL for consumption of *many* other formats • Use ArcCatalog • Register data layers loaded in PostGIS in GeoServer • Import process registers spatial layers, now available for use in any client (direct or via WMS, WFS, WFS-T, etc.)
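For the scripted loading path, a sketch like the following can assemble the `shp2pgsql | psql` pipeline. It assumes the standard `shp2pgsql` options `-s` (SRID) and `-I` (create a spatial index); the file, table, and database names are hypothetical, and the function only builds the command string rather than running it.

```python
import shlex


def shp2pgsql_pipeline(shapefile, table, dbname, srid, create_index=True):
    """Build a shell pipeline that loads a shapefile into a PostGIS table."""
    loader = ["shp2pgsql", "-s", str(srid)]
    if create_index:
        loader.append("-I")  # also create a spatial index on the geometry column
    loader += [shapefile, table]
    client = ["psql", "-d", dbname]
    # Quote every argument so paths containing spaces survive the shell.
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in cmd) for cmd in (loader, client)
    )


# Hypothetical names, for illustration only.
print(shp2pgsql_pipeline("facilities.shp", "epadata.frs", "region3", 4326))
```

Generating the command text separately from executing it makes the load step easy to review and to drop into a scheduled job.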
Creating a PostGIS Database
Create database: createdb <dbname>
Load PL/pgSQL language for PostGIS: createlang plpgsql <dbname>
Load PostGIS and object definitions: psql -d <dbname> -f postgis.sql
Create the spatial_ref_sys table (coordinate system codes): psql -d <dbname> -f spatial_ref_sys.sql
This file also contains the CREATE TABLE SQL for the metadata table geometry_columns
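The four steps above lend themselves to automation. The sketch below wraps them in a script with a dry-run mode; the `script_dir` parameter and the function names are assumptions, and the commands are exactly those listed on the slide. Note that `createlang` was removed in PostgreSQL 10, and newer PostGIS releases are installed with `CREATE EXTENSION postgis` instead, so treat this as matching the 2008-era toolchain.

```python
import subprocess


def postgis_setup_commands(dbname, script_dir="."):
    """Return the slide's setup steps as argument lists, in execution order."""
    return [
        ["createdb", dbname],
        ["createlang", "plpgsql", dbname],
        ["psql", "-d", dbname, "-f", f"{script_dir}/postgis.sql"],
        ["psql", "-d", dbname, "-f", f"{script_dir}/spatial_ref_sys.sql"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


# Dry run with a hypothetical database name: prints the commands, runs nothing.
run_all(postgis_setup_commands("gisdb"))
```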
Why bother with all this? • Load data, create views one time only • Stylize & Symbolize disconnected from data (SLDs, FTLs) • Register layers with Geoserver one time • Build all front-ends (templates) using LINKS to data, now also independent from data • Update data themselves without breaking links • Automatic reprojection (get NAD27 as WGS84), file formatting (get AsKML, get AsSHP), filtering (use CQL in http:// strings….)
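The last bullet (reprojection, output format, and filtering all chosen in the URL) can be sketched as a WFS GetFeature request builder. The host and layer name are hypothetical; the parameters assume GeoServer's WFS 1.1.0 key-value interface with its `CQL_FILTER` vendor parameter and `SHAPE-ZIP` output format name.

```python
from urllib.parse import urlencode


def wfs_getfeature_url(base_url, layer, srs, output_format, cql_filter=None):
    """Build a WFS GetFeature URL; the server reprojects, formats, and filters."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": layer,
        "srsName": srs,                 # requested output coordinate system
        "outputFormat": output_format,  # e.g. zipped shapefile
    }
    if cql_filter:
        params["CQL_FILTER"] = cql_filter  # GeoServer vendor parameter
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server and layer, for illustration only.
url = wfs_getfeature_url(
    "http://geoserver.example/geoserver/wfs",
    layer="epa:frs",
    srs="EPSG:4326",
    output_format="SHAPE-ZIP",
    cql_filter="stabbrv = 'IL'",
)
print(url)
```

Because the link carries the whole request, a front-end template can store it once and always receive current data.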
…why? Automate triggers for data acquisition, data review, etc. For example, find facilities recorded as Illinois whose geometry falls outside the Illinois boundary: SELECT frs.* FROM epadata.frs AS frs, usa.states AS states WHERE frs.stabbrv = 'IL' AND states.stabbr = 'IL' AND NOT ST_Within(frs.the_geom, states.the_geom)
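The same data-review check can be sketched outside the database to show the logic: flag records whose stated state does not contain their coordinates. This is an illustration only, using a ray-casting point-in-polygon test on a single ring; the record layout, the function names, and the square "state" in the usage lines are all hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, closed or open."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count edges that a horizontal ray from (x, y) towards +x crosses.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def flag_outside_state(records, state_abbr, state_ring):
    """Return ids of records labelled state_abbr that fall outside state_ring."""
    return [
        r["id"]
        for r in records
        if r["stabbrv"] == state_abbr
        and not point_in_polygon(r["x"], r["y"], state_ring)
    ]


# Hypothetical data: a 10 x 10 square stands in for a state boundary.
square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
records = [
    {"id": 1, "stabbrv": "IL", "x": 5, "y": 5},
    {"id": 2, "stabbrv": "IL", "x": 15, "y": 5},
    {"id": 3, "stabbrv": "IN", "x": 20, "y": 20},
]
print(flag_outside_state(records, "IL", square))
```

Run on a schedule, a check of this kind can feed the flagged ids into a data-review queue.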
Issues • Security • Service accounts for connections to “external authoritative data sources” • No more passwords -- defer to AD for auth • Data owners • Internet Explorer issues • Roles for data/geospatial services • Who runs the “Geospatial Services Desk” ?
Why? Conclusions… • Open standards allow greatest flexibility • Licensing allows cloning, replication, sharing, central management, rapid deployment and provisioning (cost can be as little as just hardware & time) • Overall model allows for a centralized “master replication database” to replicate based on geographically-defined subscriptions to Regional servers • Replication engine allows “cascading replication” whereby Regional servers can 100% replicate to COOPs, “Incident Servers”, etc.
Acknowledgments… Thank you.