310 likes | 554 Views
Steward system. Seokjoo Lee University of Florida seokjoo@ufl.edu. Table of contents. System configuration Operation Management and expansion Migration Hotter program. Steward System configuration. Steward data flow.
E N D
Steward system Seokjoo Lee University of Floridaseokjoo@ufl.edu
Table of contents • System configuration • Operation • Management and expansion • Migration • Hotter program
Steward data flow Every day, TSS data are delivered from D2, D4 and D6 into Steward. Data sizes are 100MB, 230MB and 110MB each.
Steward system configuration-ETL(1) • FTP server - Filezila FTP server version 0.9.30 beta version is installed to serve the TSS data transfer from each district. - C:\Steward_ftp is assigned for the FTP storage - Each district can access C:\Steward_ftp\DistrctX folder for its file transfer. - Host IP: cdw880.ce.ufl.edu Administrator User ID: cdw Administrator Password: trc513 District User ID: cdw_user_d1 ~ cdw_user_d8 District Password: steward08 • Data backup - 2BrightSpark SyncBack version 3.2.19.0 is used to backup FTP folders into local file storage daily - Two copies of backup are created into two local file storages (external hard drive: 1GByte each, USB interface)
Steward system configuration-ETL(2) • Data transformation - Custom-build SunETLUtility is used to transform the raw TSS data into the aggregated Steward data format - SunETLUtility can be executed in command-line or stand-alone program. - SunETLUtility needs the following subfolders: FacilityDataFolder: Input facility data folder ToConvertFolder: Input TSS data folder ConvertedDataFolder: Output folder for converted data/report GroupDataFolder: Output folder for group data - SunETLUtility command line format: SunETLUtility /dDx /iInputDataFolder /oOutputDataFolder " example) SunETLUtility /d"D2" /i"c:\D2Data" /o"c:\Steward\Input"
Steward system configuration-DB(1) • Steward database tools - Oracle Database 10g 10.2.0.1.0 Enterprise edition - Oracle Warehouse Builder 10g 10.2.0.1.311: GUI interface - Oracle Workflow 2.6.4.0.0: Scheduler • Steward database/web server - Dell Optiplex GX620 - Pentium 4 3.2GHz/ 4GB Ram/Windows XP3 SP3 - Seagate 160GB HDD (SATA I) for Oracle SW - Seagate 500GB HDD (SATA II) for database 1Oracle patch version 10.2.0.3 is expected to be used. metalink
Steward system configuration-DB(2) • Steward DB tables - Design Strategy for Steward database -- Keep the simple architecture to avoid the costly joins -- Allow duplicate data tables/views rather than on-the-fly computations - Fact tables (Cubes) -- Station-level tables: TSS_5min, TSS_15min and TSS_1hr -- Lane-level tables: TSS_5min_lane, TSS_15min_lane and TSS_1hr_lane - Dimension tables (Dimension) -- date_ref, time_ref, tss_station
Steward system configuration-DB(3) - Materialized views: -- SYSTEM_LANE_5MIN, SYSTEM_LANE_15MIN, SYSTEM_LANE_1HR -- VOLUME_MAP_5MIN, VOLUME_MAP_15MIN, VOLUME_MAP_1HR -- TRAFFIC_COUNTS_5MIN, TRAFFIC_COUNTS_15MIN, TRAFFIC_COUNTS_1HR -- PM_5MIN, TT_REL_5MIN -- MAX_FLOW_5MIN, MAX_FLOW_15MIN, MAX_FLOW_1HR -- EFF_DET_LEN_5MIN, EFF_DET_LEN_15MIN, EFF_DET_LEN_1HR - Materialized views are partitioned by Date and District and indexed by Station_ID, Hour and Date - Fact tables will be partitioned and indexed on next major revision - Materialized views are refreshed on demand. They will be updated into on-commit on next major revision
Steward system configuration-Web(1) • Web design concepts - Minimize the on-the-fly computations by using pre-developed materialized views - Minimize the complex queries on the client side by using the query functions where the materialized are not applicable. • Steward web tools - Microsoft Internet Information Services 5.1 for web server - Text-based ASP, JavaScript, HTML
Steward system configuration-Web(2) • Steward web architecture - Home - Overview - Resources - Maps - District data -- TSS Facility-Level Reports -- TSS Section-Level Reports -- TSS Station-Level Reports -- TSS ETL Reports -- TVT Station-Level Reports -- TVT ETL Reports -- Report Archive
Steward system configuration-Web(3) • Steward web pages
Steward operations -ETL process (1) • D2, D4 and D6 pushes archived TSS data into Steward FTP server daily at 3AM, 2AM and 2AM. - Average file sizes are 15MB, 30MB and 15MB in compressed file format (100MB, 230MB and 110MB in uncompressed format) - File location: c:\Steward_FTP\DistrictX • File backup and uncompressing - Backup location: u:\DISTRICT_DATA\Districtx\rawdata - Uncompressing location: c:\Program Files\TRC\SunETLUtility\ToConvert • Data transformation - Raw traffic data are processed using custom program (SunETLUtility) - Process the data files -- Assign Station/Lane ID -- Aggregated 1min, 5min, 15min and 60min lane/station data -- Run the basic quality check
Steward operations -ETL process (2) -- One input data file would generate one converted file, one daily report, eight aggregated data file. For example, TSS-03032009--1.dat from District 2 would generate the following output files ConvertedData\D2_TSS-03032009-0.csv ConvertedData\D2_TSS-03032009-DailyReport.Log GroupData\D2_TSS-03032009-1S.csv GroupData\D2_TSS-03032009-1L.csv GroupData\D2_TSS-03032009-5S.csv GroupData\D2_TSS-03032009-5L.csv GroupData\D2_TSS-03032009-15S.csv GroupData\D2_TSS-03032009-15L.csv GroupData\D2_TSS-03032009-60S.csv GroupData\D2_TSS-03032009-60L.csv - Report process results to Steward Admin and users - Archive all the processed files and prepare the data loading -- Backup location: u:\DISTRICT_DATA\Districtx\processeddata\ConvertedData u:\DISTRICT_DATA\Districtx\processeddata\GroupData • Data backup - Steward_FTP folder are backup into U drive at 4AM daily - U drive are back-up into the identical V drive at 5AM daily
Steward operations -ETL process (3) • Data loading - Oracle Process Flow modules are used to load TSS data files - For one raw TSS file, six processed data files are loaded (5min, 15min, 60min station and lane data) - Oracle Process Flow modules can be executable manually using Control Center or automatically using Oracle Schedules
Steward operations -ETL process (4) • After data uploading, data changes can be verified from their table row counts.
Steward operations -ETL process (5) • Database performance can be monitored from Oracle Enterprise Manager
Steward operations - Current status • Traffic data from districts *: Daily data are not received, data CRC error or our ftp server issues.
Steward operations -Web • Steward web statistics
Management plan • Daily data loading - Daily transform report by email • DB performance - Oracle enterprise manager console • Web statistics - WebLog Expert Lite version 5.6
Expansion scenario (1) • New data for CDW: three possible scenarios • Limited information on district changes/information - New HOT lanes in District 4 and 6 - New facility (I-75) in District 4 - Limited information (I-295 milepost) in District 2
Expansion scenario (2) • Expansion might cause the performance issues on preprocessing, DB performance or web interface. • Unexpected system configurations could impact the system modifications: - Two HOT lanes in District 4 and District 6 - HOT lanes in NB are operational but SB will be implemented later - 30sec aggregation interval in D5
Expansion scenario (3) • New analysis report - New materialized views needs be created. - Web interface needs be revised
Statewide traffic CDW at other states • Rob Hranac at Berkeley Transportation Systems defines the progress of Archived Data User Service (ADUS) as follows: - Data Reports (for decision support system) Application (web 2.0) Prediction Control Automation • Most of states are in the stage 2 (Report) and move toward stage 3 (Application) • PeMS at California are developing integration of traffic data and incident data on GIS. • PeMS are trying to integrate the transportation simulation model with the traffic data • WisTransPortal integrated traffic data database with lane closure system • U of Maryland developed traffic data warehouse for Washigton DC, Maryland and Virginia • U of Maryland developed the web-based analytical tools for the traffic data and incident data
Migration plan • Web service - Web sources are server independent and ready to migrate • Database - Migrate Oracle warehouse builder metadata - TSS data will be re-loaded rather than table migration - TSS data tables will be reconfigured for Index and partitions - Refresh method for TSS materialized views will be reconfigured from on-demand to on-commit • ETL process - ETL process will be customized for new system
Hotter program • Utility program for analyzing the operation of High Occupancy Toll (HOT) lanes within SunGuide traffic management systems in Florida. • I-95 in D4 and D6 has HOV lanes on SB and HOT lanes on NB • Input data are generated from Steward manually • Hotter program generates analysis report on HOV/HOT lanes in comma-delimited (CSV) file format • Results may be plotted if desired from the analysis files