Pipeline Architecture
Eugene Feld, Vladimir Zhukov
Introduction
• What it is:
  • Platform for integration of external (primarily bulk) content sources
  • Used currently for core Map and DiCi Transit sourcing
• What it has:
  • Set of tools for data cleansing and normalization
  • Engine(s) for cross-layer spatial and non-spatial data conflation
  • Configurable workflows for source-specific processing
  • Supports various standard GIS data formats as well as custom formats
  • Allows for integration of COTS ETL tools into the processing pipeline
Technology Stack 2009 (diagram)
• Source Formats: Transit XML, CSV, ESRI Formats
• Stack layers: Workflow; Visualization and Editing; Computational Logic; Spatial ETL
• Technologies: Apache Ant, Java SE, ArcGIS Desktop, ArcObjects Plug-ins, FME, ArcSDE, Oracle PL/SQL, Spatial Indexes
Architectural Characteristics 2009
• All manual activities precede batch processing and integration to Core. No support for review/approve tasks within the workflow.
• Manual process kick-off, even for recurring source deliveries.
• Core Map insert/delete via R2R/ClipTool. No support for update.
• Monolithic computational components. No support for scaling through distributed architecture.
• Exclusive write lock on Core Repos required for integration.
• Low visibility into process status with Ant workflows.
• Brittle and lengthy Reference NQ creation process from IWs.
Business Reasons For Change
• Significant map expansion of 2009-2010 shifts focus to rapid update of the newly added areas through external sources
• Map Update ability is required in addition to Add/Delete, which exists today
• Emergence of online change collection
• Need for event-driven, transactional source processing and integration
• High volume of source deliveries calls for greater efficiency of the map update process
• Automated change detection
• Visibility into automated processing workflows
• Complementary update detection logic between Postal, Map and Transit data processing is identified
• Emergence of a common platform for source-based change detection and processing
Pipeline Data Flow 2010+ (diagram)
Diagram labels: BPM; ArcDesktop; Map Sources (ESRI); TrX Generation; Change Candidates; EGIS; Turbo Map; Geometry Change Detection; Attribute Change Detection; Asset Management and Access Interface; DiCi Transit Sources (XML); NQ IW; Source Normalization; Attribute Derivation; Refresh TBD; Reference NQ; Graph Conflation Engine
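The Geometry and Attribute Change Detection stages in the data flow compare incoming source features against reference data and emit Change Candidates. The slides do not show how this is implemented, so the following is only a minimal sketch of the attribute side. It assumes features are already matched by a shared ID (in the pipeline that matching is the job of the Graph Conflation Engine), and `ChangeCandidate` and `detect_changes` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Attributes = Dict[str, object]


@dataclass(frozen=True)
class ChangeCandidate:
    """Hypothetical record for one detected change (not the pipeline's schema)."""
    feature_id: str
    kind: str  # "add", "update", or "delete"
    changed_attributes: Tuple[str, ...] = ()


def detect_changes(reference: Dict[str, Attributes],
                   source: Dict[str, Attributes]) -> List[ChangeCandidate]:
    """Classify differences between reference and source as add/delete/update.

    Assumes both inputs are keyed by an already-conflated feature ID.
    """
    candidates: List[ChangeCandidate] = []

    # Features present only in the source delivery are additions.
    for fid in sorted(source.keys() - reference.keys()):
        candidates.append(ChangeCandidate(fid, "add"))

    # Features present only in the reference are deletions.
    for fid in sorted(reference.keys() - source.keys()):
        candidates.append(ChangeCandidate(fid, "delete"))

    # Features present in both are updates if any attribute value differs.
    # A missing attribute and an attribute set to None are treated the same.
    for fid in sorted(source.keys() & reference.keys()):
        old, new = reference[fid], source[fid]
        changed = tuple(sorted(
            key for key in old.keys() | new.keys()
            if old.get(key) != new.get(key)
        ))
        if changed:
            candidates.append(ChangeCandidate(fid, "update", changed))

    return candidates
```

The "update" branch is the case the 2009 insert/delete-only flow could not express: a feature that exists on both sides but whose attributes differ.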
Functional View 2010+ (diagram)
Diagram labels: Human Activities; EGIS; Asset Management; Source2NTMapping; Reference NQ; New Coverage NQ; New Coverage tNQ; Sources (PGDB, SHP); Change Detection; Normalization; Correlation Tools; jBPM Workflow; Conflation Review; Graph Conflation; Attr Change Detection; New Geo Detection; Geo Change Detection; Arc SDE; Change Candidate Management; Change Candidate DB; Change Review; TurboMap TrX generation; Change Correlation; Turbo Map; Core Repos; Production Cleanup; Map IW; DiCi; RMOB2NQ; DC Sync
Improvements circa 2010+
• Manual review/approval tasks interspersed with computational tasks
• Automated change detection through Graph Conflation Engine
• Change Candidates are tracked, reviewed, and approved by users prior to TM submission
• Process kick-off of recurrent source deliveries is triggered by AM notifications
• Rich configuration for Reference NQ creation
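The rule that Change Candidates are tracked, reviewed, and approved before TM submission can be pictured as a small state machine. This is a sketch under assumptions, not the jBPM workflow itself: the status names, the `ALLOWED` transition table, and the `CandidateTracker` class are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical lifecycle states for a Change Candidate."""
    DETECTED = "detected"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    REJECTED = "rejected"
    SUBMITTED = "submitted"


# Assumed transition table: submission is reachable only through approval.
ALLOWED = {
    Status.DETECTED: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: {Status.SUBMITTED},
    Status.REJECTED: set(),
    Status.SUBMITTED: set(),
}


class CandidateTracker:
    """Tracks one candidate's status and keeps a history for visibility."""

    def __init__(self, candidate_id: str) -> None:
        self.candidate_id = candidate_id
        self.status = Status.DETECTED
        self.history = [Status.DETECTED]

    def advance(self, new_status: Status) -> None:
        """Move to new_status, or raise ValueError if the step is not allowed."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"{self.candidate_id}: cannot go from "
                f"{self.status.value} to {new_status.value}"
            )
        self.status = new_status
        self.history.append(new_status)
```

Keeping the allowed transitions in one table makes the "approved prior to submission" constraint explicit, and the history list gives the kind of process visibility the earlier Ant workflows lacked.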