Data Model Sharing Steve Grise ESRI
Overview • Schema sharing experiences • Based on several years of experience with template data models and ESRI technology (a lot of projects) • Ideas evolving in the past year • Key points • SDI Template concepts • Design process for SDI • Data Schema sharing discussion • US Examples • Dataset Mapping needs
SDI Template Concepts • Develop useful implementation templates • Make it work for Local Governments first, also make it work for a larger community picture • Local, State/Regional, National/Regional • Support Business needs and Enterprise Architectures • Organize data and web services into a publication/data sharing information model • More than a data model, but that’s a key part • States can play a key role in terms of partnerships and data infrastructure development
SDI Themes Data Content • Emergency Operations • Environmental • Land Use/Land Cover • Basemap • Addresses and Names • Government Units • Structures • Utilities • Transportation • Hydrography • Cadastral • Elevation • Imagery • Geodetic Control • …
Address and name information is associated with features collected at neighborhood extents: building entrances, structures, parcels, and landmarks. It is also collected for addressable features at smaller map scales, such as street centerline representations and place name locations. Names should also be collected for government and other administrative boundary units and for key landform and geographic features.
(US) State Business Drivers • Data integration between internal and external silos • Location is really the only method that can work • This drives a pattern • Get address and transportation data in order • Locate State and other facilities using addresses • Look at service/performance improvements and data integration applications • State Government functions • National Preparedness • Health • Agriculture • Education • Transportation • Taxation
SDI Challenges • We can’t analyze all of the applications • Need Best Practices for data models, datasets, web services that support business needs • Need to communicate the relevance of SDI to many applications/lines of business • Geo is horizontal/pervasive like security or communications networks • Need to blend GIS and IT methods for design, navigate through all of the options • Templates + process • Education and training • Need to do something useful!
Design Approach • Application-Driven Design • Understanding the GIS needs required to support business processes • Documenting the information products and layers required • Creating Application/Data matrices to document the high-level information needs
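An application/data matrix of the kind described above can be sketched in a few lines of Python. The application names, layer names, and helper functions here are illustrative assumptions loosely based on the themes in this deck, not an actual inventory.

```python
# Hypothetical application/data matrix: which thematic layers each
# application needs. All names here are illustrative assumptions.
APP_LAYERS = {
    "Hurricane Response": {"Addresses", "Transportation", "Hydrography", "Imagery"},
    "Disease Cluster Map": {"Addresses", "Government Units"},
    "Floodplain Mapping": {"Hydrography", "Elevation", "Cadastral"},
}


def build_matrix(app_layers):
    """Return (sorted layer list, rows), one row of 'X'/'' marks per application."""
    layers = sorted(set().union(*app_layers.values()))
    rows = {
        app: ["X" if layer in needed else "" for layer in layers]
        for app, needed in app_layers.items()
    }
    return layers, rows


def layer_demand(app_layers):
    """Count how many applications depend on each layer."""
    counts = {}
    for needed in app_layers.values():
        for layer in needed:
            counts[layer] = counts.get(layer, 0) + 1
    return counts


layers, rows = build_matrix(APP_LAYERS)
demand = layer_demand(APP_LAYERS)
```

Layers with the highest demand count are natural candidates to design and publish first, which matches the "get address and transportation data in order" pattern noted earlier.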
Example Applications (figure) • Hurricane Response and Recovery • Agriculture in North Carolina • Perimeter Control • NC OneMap • Disease Cluster Map • Map It Tool
Thematic Layers
Multiple Scales of Use • GIS users work at a range of geographies • Neighborhood • City • Regional • State and National
Geographic Information System Design • Applications & Information Products (Maps, Reports) display and use Layers • Layers (Layer Files, Map Elements) represent Datasets • Datasets (Geodatabases, etc.)
Some Background… • ESRI started 6-7 years ago on Case Tool work for ArcGIS • Basic requirement was to create a geodatabase from a Case Tool environment (Logical model) • Microsoft Repository initial implementation • Active Template Library C++ UML template for “custom features” was an initial goal (not implemented by programmers) • XMI perceived to solve issues because of “standard” • Implementation for Rose and Visio much more complicated than expected
Some Background… • Evolution • User feedback was that this is all too complicated, learning curve too steep • Tools do not help people to “discover” a good design (tool trap) • UML does not capture spatial models/patterns well • Many advanced rules hard to represent in UML • Topological rules, logical network associations • Conclusion is to model simple features and have tools that operate on that simple structure rather than sophisticated object models • In general UML has proven to be better for software design than GIS Design • Result: Diminished interest in Case Tools
Overall Design Process • Conceptual Model • Sketches • UML/Analysis Diagrams • Supports information product needs • Logical Model • Implementation/Representation decisions • Case Tool UML for some projects • Physical Model • Implements rules/networks/topologies • Specific data format and dataset decisions • Datasets • Data Model/Schema + Data • Content / terms and concepts shared, or Conceptual Model shared?
Tools/Outputs • Conceptual Model • Sketches • UML / ER Diagrams • Spreadsheets, etc. • Logical Model • Detailed design for data model implementation • Decisions for representations, scale, datasets according to planned implementation/products • Forward Engineering to Physical Model • Implementation of rules and business logic • Tool/technology specific • Forward Engineering to Datasets • Data Sharing • Data Loading/QC
Conceptual Schema Sharing Issues • Diagram: Conceptual Model to Conceptual Model via Reverse (Cross) Engineering • Case Tool auto-diagram creation and other reverse engineering capabilities are generally poor (e.g., abstract classes, packages) • Conceptual Schema interchange may not save a lot of time or effort: so much entry/re-entry is required between tools that it is easier to start over in most cases • To be useful, many Geo extensions and detailed implementation decisions would have to be wrapped up in one or more common templates (e.g., how to handle field lengths, associations in object-relational systems, …) • UML Standards lack spatial/feature models • UML only addresses a part of the overall design • Not many people really understand UML diagrams
Geo Extensions for UML • Further exploration/standardization to support common models • Spatial patterns • Associations • Primary and foreign keys • Topological • Rules and constraints • Database properties vs. object properties • Field lengths • Data types • Spatial patterns • Reserved words (tables, columns) • Spatial References and Grid sizes • Measures and Z values • Codelists, enumerations, domains • Behavior • Accommodate multiple physical approaches • Handle transition from Conceptual to Logical models • If it is possible to produce consistent XMI from multiple tools, there may be benefits • Template UML versioning for tools • Semantics checkers / validation
Conceptual Model Diagrams • StreetName: PrefixType, PrefixDirection, ProperName, SuffixType, SuffixDirection, CommunityID • Community: CommunityID, CommunityName, StateorProvinceID • AddressRange: LowAddress, HighAddress, OddorEven (Pattern), Side, StreetSegmentID • StreetSegment: StreetSegmentID, FullStreetName • Associations between classes carry * multiplicities • Key Point: Class diagrams do not represent different spatial models/patterns
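Two of the classes in the diagram above, written as plain Python dataclasses, make the key point concrete: nothing in them says whether a StreetSegment is a centerline, a network edge, or a route with measures. This is only a sketch. The attribute names follow the diagram, but the field types, the value encodings for Side and OddorEven, and the `contains` helper are assumptions.

```python
from dataclasses import dataclass


@dataclass
class StreetSegment:
    street_segment_id: int
    full_street_name: str
    # Note: no geometry here. The class diagram gives no hint of the
    # spatial pattern (centerline, network edge, route, ...).


@dataclass
class AddressRange:
    low_address: int
    high_address: int
    odd_or_even: str  # assumed encoding: "odd", "even", or "both"
    side: str         # assumed encoding: "left" or "right"
    street_segment_id: int  # foreign key to StreetSegment

    def contains(self, house_number: int) -> bool:
        """True if the house number falls in this range and matches its parity."""
        if not self.low_address <= house_number <= self.high_address:
            return False
        if self.odd_or_even == "odd":
            return house_number % 2 == 1
        if self.odd_or_even == "even":
            return house_number % 2 == 0
        return True


segment = StreetSegment(street_segment_id=1, full_street_name="N Main St")
left = AddressRange(101, 199, "odd", "left", segment.street_segment_id)
```

The attribute-level logic works without any geometry, which is exactly why the spatial representation has to be decided and documented somewhere other than the class diagram.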
Logical Schema Sharing Issues • Diagram: Logical Model to Logical Model via Reverse (Cross) Engineering • XMI implementations can vary significantly • Visio is XMI 1.0 • Rational Rose (Unisys) is XMI 1.1 • This made ESRI goals for multiple CASE tool support difficult • Each logical model closely tied to implementation decisions • Not software vendor, but data model, technology and architecture decisions • Where, when, and how will business rules be enforced? • Database • Software layer • Batch process validation • These implementation decisions are key to logical design
Physical Schema Sharing Issues • Diagram: Physical Model to Physical Model via Reverse (Cross) Engineering • Features of different implementations will make a universal model difficult to create • Exchanges will tend to be lossy for rules, constraints, etc. (e.g., Geodatabase Feature Datasets, DGN Elements, Oracle Triggers, topology management) • General services oriented architecture trend indicates a simpler approach is needed • SOAs with contracts that hide implementation details
CASE Tools in the Project Lifecycle • Most project teams use CASE tools to manage schemas until going to production, because of change management issues.
Dataset/Schema Sharing • Diagram: Datasets to Datasets via Extract, Transform, and Load (ETL) processes • Benefits • COTS tools exist for ETL mapping and data management • Simple data structures dominate, most rules are private to implementation • Not all schemas will come from a Case Tool, GML Profile, or XML Schema document
Dataset/Schema Sharing • Diagram: Datasets to Datasets via Extract, Transform, and Load (ETL) processes • Formats: • Files • Databases • GML • XML Schema • Workspace • Recordset • Tools: • Safe Software FME • XSLT • ArcGIS Data Interop • ArcGIS XML Schema • Vendor/custom
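As a rough illustration of dataset-level mapping, the sketch below renames and converts fields from one source schema into a shared target schema. It is not how FME or any other listed tool works internally. The source field names, the target field names, and the transforms are all hypothetical.

```python
# Hypothetical field mapping from one county's road schema to a shared
# target schema. Field names on both sides are illustrative assumptions.
FIELD_MAP = {
    "RD_NAME": "FullStreetName",
    "FROM_ADDR": "LowAddress",
    "TO_ADDR": "HighAddress",
}

# Per-field transforms applied after renaming (target field -> function).
TRANSFORMS = {
    "FullStreetName": lambda v: " ".join(str(v).split()).title(),
    "LowAddress": int,
    "HighAddress": int,
}


def transform_record(source):
    """Rename mapped fields, apply transforms, and drop unmapped fields."""
    target = {}
    for src_field, tgt_field in FIELD_MAP.items():
        if src_field not in source:
            raise KeyError(f"source record is missing {src_field!r}")
        convert = TRANSFORMS.get(tgt_field, lambda v: v)
        target[tgt_field] = convert(source[src_field])
    return target


def run_etl(records):
    """Extract-transform-load over an iterable of source records."""
    return [transform_record(r) for r in records]


county_rows = [
    {"RD_NAME": "  north  MAIN st ", "FROM_ADDR": "101", "TO_ADDR": "199", "OWNER": "x"},
]
loaded = run_etl(county_rows)
```

The mapping table is the shared artifact here: each partner keeps its own schema and rules private and only agrees on how its fields map into the common one.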
Data Sharing Procedures Are Important: The Key for Integrating Our Individual Efforts • Data Transformation • Formats • Data Models • Rules and Constraints • Projections / Datum • Diagram labels: GeoWeb • Interoperability Procedures (ETL) • Distributed Collaboration • Semantic Translation • National • States
GIS Web Services Will Provide a Framework Supporting Many Geospatial Communities Over Time • Expanded GIS Services • More Synergy • Easier Exploration Tools • Pervasive Use • The GeoWeb Will Evolve Rapidly, Driven by Millions of Participants • Diagram labels: GeoWeb • Enterprise Integration • Location Based Services • Consumer Mapping • Sensor Networks • Focused Applications • GIS Networks • Situational Awareness
Issues to be Resolved… • Schema/Semantic mapping between datasets, services, and products • EuroRoadS is a good example • Partners map into common schema • State/Regional Aggregation to European datasets • End User Applications access consistent: • Schemas (content standards) • Datasets (ETL/aggregation of regional data) • Vocabularies/search tools • Both distributed and integrated datasets are required • Search tools must connect to a Catalog that has a knowledge of vocabularies/terms and relationships (more on this tomorrow)
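The last point, a catalog that knows about vocabularies and term relationships, can be sketched as a search that expands the user's term before matching metadata keywords. The vocabulary, catalog entries, and function names below are hypothetical and stand in for whatever a real catalog service would provide.

```python
# Hypothetical vocabulary: preferred term -> alternate terms.
VOCABULARY = {
    "roads": {"streets", "transportation", "centerlines"},
    "parcels": {"cadastral", "lots"},
}

# Hypothetical catalog entries: dataset title -> keywords from its metadata.
CATALOG = {
    "County Street Centerlines": {"centerlines", "addresses"},
    "State Road Network": {"roads"},
    "Local Tax Parcels": {"cadastral"},
}


def expand(term):
    """Return the term plus every term the vocabulary relates it to."""
    term = term.lower()
    for preferred, alternates in VOCABULARY.items():
        if term == preferred or term in alternates:
            return {preferred} | alternates
    return {term}


def search(term):
    """Return titles whose keywords overlap the expanded search terms."""
    wanted = expand(term)
    return sorted(title for title, keywords in CATALOG.items() if keywords & wanted)
```

A search for "streets" then finds both the county centerlines and the state road network, even though neither is tagged with that exact word.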
New Jersey • NJGIN Portal • Facility Locations • Metadata Catalog (with security roles) • Web Services, Products • Produce: Geocoding and address validation • Consume: County map services • Data • Integrated Data Sets • Examples • Roads • Counties • Federated Data Sets • Examples • Local Parcels
North Carolina • NC OneMap Portal • Floodplain Mapping • Metadata Catalog (with preferred source designations) • Web Services, Products • Produce: OneMap • Consume: County web services • Data • Integrated Data Sets • Examples • Multi-Hazard Threat Database in Agriculture Dept. • NC Floodplain • Federated Data Sets • Examples • All datasets registered with NC OneMap
Federated System Requirements • Applications • User-driven tools • Web Services, Products • Built on datasets; data location will change over time but web services must be consistent • Portal/Viewer & Metadata Catalog • Includes metadata for federated and integrated datasets, services • Federated Datasets • Source datasets in published form • Integrated Datasets • Integration for multiple requirements (e.g., EuroRoadS dataset)
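The requirement that data locations may change while web services stay consistent amounts to a level of indirection between a stable service name and the dataset behind it. The sketch below shows that idea only. The `ServiceRegistry` class and the location strings are hypothetical and do not describe any particular portal product.

```python
class ServiceRegistry:
    """Maps a stable service name to whatever dataset location currently backs it."""

    def __init__(self):
        self._locations = {}

    def publish(self, service_name, dataset_location):
        """Register or re-point a service; callers keep using the same name."""
        self._locations[service_name] = dataset_location

    def resolve(self, service_name):
        """Return the current dataset location for a service name."""
        try:
            return self._locations[service_name]
        except KeyError:
            raise LookupError(f"no service named {service_name!r}") from None


registry = ServiceRegistry()
registry.publish("roads", "county-server/roads_2005")
before = registry.resolve("roads")
registry.publish("roads", "state-warehouse/roads_integrated")  # data moved
after = registry.resolve("roads")
```

Applications that only ever ask for "roads" are unaffected when the data moves from a federated source to an integrated copy.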
Needs for Integrated Datasets • Consistency and Quality • Applications require consistent data structures for analytical purposes • Availability • Copies of data required in secure/highly available locations for business and mission critical applications • Security • Extract, Transform, Load and other practices required to ensure that sensitive data is not present in less secure locations. Application-level security not enough • Performance • Decisions to locate data in multiple locations should not affect architecture. It should be an engineering choice to support performance needs
SDI Template Concept • More than a data model, templates need to include: • Guidance and best practices on: • Data content, capture, and processing procedures • Data import and migration procedures • Production of standard map products • Data sharing agreements • Security standards and procedures • Hardware, software, and infrastructure standards • Operational use / sample applications • Data QA/QC, maintenance, and updating workflows • Infrastructure and hosting strategies
Summary • Suggest a dataset-level mapping approach rather than conceptual model interchange • Tools readily available • Fits data management lifecycle (ETL etc.) • Geo-UML extensions for common conceptual schema could be developed, but a transition to implementation models should be considered • UML and CASE tool environments do not capture all aspects of Geo models • Applications and Information Products • Content Guidelines and Strategies • Layers and Representations • UML methods alone have not produced good GIS data models • Datasets and web services most important for the GeoWeb