Explore GIS applications in transportation & logistics, focusing on online geocoding methods and data matching strategies for improved accuracy. Learn about two-stage matching procedures and experimental results.
Some Recent GIS Applications in Transportation and Logistics • New York Metropolitan Transportation Council, November 19, 2008 • Fan Yang, City College of New York • fyang@ce.ccny.cuny.edu
Outline • Introduction to GIS • Online Geocoding Methods • Web-based GIS solutions • GIS in Logistics
Introduction to GIS • An information system for maintaining and using spatial information • A generic platform for working with geographic information [Figure labels: Views, Products, Updates, Analysis, Mobile / LBS, Mission Critical Applications]
Online Geocoding Processing • Data cleansing from multiple data sources • Various errors might exist in the input data [Figure: data flow from ‘dirty’ input data to ‘clean’ output data through three stages (data collection, data cleansing, data disseminating); labels: State DOT, Local Agencies, State Patrol, Clean Reference DB, Traffic Information Service Provider (ISP), Radio]
An Example: Online Incident Locator • Measure the similarity between an input record and the reference records [Table: example input record and reference records]
Data Matching • Various input data errors including spelling mistakes, truncations, inconsistent conventions and missing fields
Two-Stage Matching • 1. Use an inexpensive metric to quickly find a relatively small candidate set. • 2. Identify the best matches for the input within the candidate set by similarity score. • Use an offline, pre-built similarity index to speed up online operations. • Three ways to build the similarity index in the first stage: • Build on the whole words in every important column • Build on every token in every important column (token based) • Build on every q-gram in every important column (q-gram based)
Token and Q-gram Based Matching Methods • Token based: • Build a tree- or hash-based similarity index on all tokens in the important columns of all reference records. • Candidates should share at least one token with the input record in each of the columns “main_base” and “cross_base” (e.g., “Mountain Viw” / “Mountain View”). • Q-gram based: • Divide a token (word) into overlapping character groups (grams) of equal length q. E.g., for “Redlands” with q=3, the six q-grams are “Red”, “edl”, “dla”, “lan”, “and”, and “nds”. • Build a tree- or hash-based similarity index on all q-grams. • Candidates should share at least one q-gram with the input record in each of the columns “main_base” and “cross_base” (e.g., “Moutain Viw” / “Mountain View”).
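The first-stage index can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind the slides: it indexes a single column instead of the separate “main_base” and “cross_base” columns, lowercases everything, and uses a plain dictionary as the hash-based index. The names `build_index`, `candidates`, `token_keys`, and `qgram_keys` are hypothetical.

```python
from collections import defaultdict


def qgrams(token, q=3):
    """Return the overlapping q-grams of a token (the token itself if it is short)."""
    token = token.lower()
    if len(token) <= q:
        return [token]
    return [token[i:i + q] for i in range(len(token) - q + 1)]


def token_keys(value):
    """Index keys for the token-based method: one key per whole token."""
    return value.lower().split()


def qgram_keys(value, q=3):
    """Index keys for the q-gram based method: every q-gram of every token."""
    return [gram for token in value.split() for gram in qgrams(token, q)]


def build_index(reference, keyfn):
    """Offline step: map each key to the ids of the reference rows containing it."""
    index = defaultdict(set)
    for row_id, value in enumerate(reference):
        for key in keyfn(value):
            index[key].add(row_id)
    return index


def candidates(index, value, keyfn):
    """Online step: ids of reference rows sharing at least one key with the input."""
    found = set()
    for key in keyfn(value):
        found |= index.get(key, set())
    return found


reference = ["Mountain View", "Redlands", "Waterman"]
token_index = build_index(reference, token_keys)
qgram_index = build_index(reference, qgram_keys)

print(qgrams("Redlands"))
print(candidates(token_index, "Mountain Viw", token_keys))  # one token still intact
print(candidates(token_index, "Moutain Viw", token_keys))   # both tokens misspelled
print(candidates(qgram_index, "Moutain Viw", qgram_keys))   # q-grams still overlap
```

In this toy data the token index finds no candidate once both tokens are misspelled, while the q-gram index still retrieves “Mountain View” through shared grams such as “mou” and “ain”.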
Second Stage: Measuring Similarity Score • Edit distance ed(s1,s2): the minimum number of character edit operations (delete, insert, replace) required to transform s1 into s2, divided by the maximum length of s1 and s2. • Example: “Waterman” vs. “Watman”, aligned as W a t e r m a n / W a t _ _ m a n with per-position mismatches 0 0 0 1 1 0 0 0: two deletions over a maximum length of 8, so ed(s1,s2) = 2/8. • IDF weight: the more frequent a token, the lower its weight. • The record similarity function [formula not recoverable from the slide] is defined in terms of the cost to transform record u into v, which is proportional to ed(u,v).
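The normalized edit distance can be checked with a short Python sketch. The dynamic-programming routine below is the standard Levenshtein recurrence. The `idf_weights` helper uses the common log(N/df) form, which is an assumption, since the slide only states that more frequent tokens get less weight. The record-level similarity function is not reproduced here because its formula is not given.

```python
import math
from collections import Counter


def edit_distance(s1, s2):
    """Minimum number of single-character deletes, inserts, and replaces."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,          # delete c1
                            curr[j - 1] + 1,      # insert c2
                            prev[j - 1] + cost))  # replace or keep
        prev = curr
    return prev[-1]


def normalized_ed(s1, s2):
    """Edit distance divided by the length of the longer string."""
    longest = max(len(s1), len(s2))
    return edit_distance(s1, s2) / longest if longest else 0.0


def idf_weights(records):
    """Assumed IDF form: log(N / df), so frequent tokens weigh less."""
    doc_freq = Counter()
    for record in records:
        doc_freq.update(set(record.lower().split()))
    total = len(records)
    return {token: math.log(total / df) for token, df in doc_freq.items()}


print(normalized_ed("Waterman", "Watman"))  # 2 edits / 8 characters = 0.25

weights = idf_weights(["Main St", "Oak St", "Waterman Ave"])
print(weights["st"] < weights["waterman"])  # the frequent token "st" weighs less
```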
Experimental Results • The road network (reference table): the processed TIGER database for the Los Angeles area. • Starting from 500 correctly geocoded incident records in downtown LA, we randomly generated “dirty” input data.
Online Performance • With the pre-built index, only a small portion of the reference data is retrieved to match an input record, which significantly improves online performance.
The Size of the Candidate Set • A smaller q value means finer granularity and may catch candidates that larger q values would miss. • The size of the candidate set increases as q becomes smaller.
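A toy example illustrates the trade-off. The sketch below is not the experiment reported in the talk: the reference names and the misspelled query are made up, and for brevity it scans the reference list directly rather than using a pre-built index.

```python
def qgrams(token, q):
    """Set of overlapping q-grams of a token (the token itself if it is short)."""
    token = token.lower()
    if len(token) <= q:
        return {token}
    return {token[i:i + q] for i in range(len(token) - q + 1)}


def record_qgrams(text, q):
    """Union of the q-grams of every token in a record."""
    grams = set()
    for token in text.split():
        grams |= qgrams(token, q)
    return grams


def candidate_set(reference, query, q):
    """Reference names that share at least one q-gram with the query."""
    query_grams = record_qgrams(query, q)
    return [name for name in reference if record_qgrams(name, q) & query_grams]


# Hypothetical reference names and a misspelled input.
reference = ["Waterman", "Watson", "Mountain View", "Fountain", "Redlands", "Highland"]

for q in (2, 3, 4):
    print(q, candidate_set(reference, "Watrman", q))
```

With this data the candidate set shrinks from four names at q=2 to two at q=3 and one at q=4: short grams such as “an” match many unrelated names, while longer grams are more selective but leave fewer chances to overlap with a misspelled input.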
Remarks • Proposed two efficient approximate matching methods for online incident data cleansing. • Developed a two-stage matching procedure that significantly improves online performance. • The q-gram based method outperforms the token based one in match accuracy; we suggest q=3. • This study can be applied to ITS online data management, such as loop detector data and construction data. More geographical information can be accommodated.
Why Web-based GIS Solutions • Consume fewer licenses and require a thinner client. • Provide rich spatial analysis and editing functionality. • Support Service-Oriented Architecture (SOA). • Provide services based on SOAP (Simple Object Access Protocol), WMS (Web Map Service), and KML (Keyhole Markup Language).
NYSDMV Application: Accident Location Information System (ALIS) • Location Editing, Query and Reporting • Integration with NYSDMV and NYSDOT legacy systems • Multi-agency effort • Web application host: NYSOFT • Data management: NYSCSCIC • Application users: NYSDMV, NYSDOT, NYSCSCIC, GIS Data Co-op (local government agencies)
ALIS: Web-based GIS Application • Automatically verifies the location information against the GIS basemap • Allows users to edit or update accident locations based on the availability of improved map data in a region or the availability of more information pertaining to the accident case. • Allows users to monitor and record changes made to the geospatial database. • Provides users the ability to select street segments for editing using either spatial queries, attribute queries, or network tracing.
What is “GIS Logistics”? • “Using advanced Geographic Information Systems (GIS) tools and methods in conjunction with existing infrastructure and procedures in order to solve logistics problems” • Main Applications: • Site Selection Analysis • Asset and Property Management • Territory Optimization • Real-time Dynamic Routing and Scheduling • Supply Chain Management
Why use GIS Logistics? • Overwhelming planning task • Efficient routes are not guaranteed • Dependent on local knowledge
What is Territory Optimization? • A periodic vehicle routing solution in a big territory • Distribute periodic orders among available trucks/drivers • Input: service requests, truck schedules, and business rules • Output: truck daily schedule • Large-scale problem size and complicated business rules • The goal • Balance workloads among employees • Minimize total travel time (by all trucks over the entire planning period) • Minimize time window violation • Minimize overtime
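One way to make the four goals comparable is to fold them into a single cost. The sketch below is an illustrative assumption, not the objective used by the tool in the talk: the weighted-sum form, the `DayPlan` fields, the 480-minute shift, and the equal default weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DayPlan:
    """One truck's plan for one day (all fields are hypothetical)."""
    truck: str
    travel_minutes: float
    work_minutes: float
    late_minutes: float  # total time-window violation for the day


def schedule_cost(plans, shift_minutes=480.0, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of travel time, workload imbalance, lateness, and overtime."""
    w_travel, w_balance, w_late, w_overtime = weights
    total_travel = sum(p.travel_minutes for p in plans)

    # Workload per truck over the whole planning period.
    workload = {}
    for p in plans:
        workload[p.truck] = workload.get(p.truck, 0.0) + p.work_minutes
    imbalance = max(workload.values()) - min(workload.values()) if workload else 0.0

    lateness = sum(p.late_minutes for p in plans)
    overtime = sum(max(0.0, p.work_minutes - shift_minutes) for p in plans)
    return (w_travel * total_travel + w_balance * imbalance
            + w_late * lateness + w_overtime * overtime)


balanced = [DayPlan("A", 60, 400, 0), DayPlan("B", 60, 400, 0)]
unbalanced = [DayPlan("A", 60, 560, 0), DayPlan("B", 60, 240, 0)]

print(schedule_cost(balanced))    # travel only: 120.0
print(schedule_cost(unbalanced))  # 120 travel + 320 imbalance + 80 overtime = 520.0
```

Both plans have the same travel time and total work, but the unbalanced one is penalized for the workload gap and for truck A's overtime.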
Territory Optimization [Screenshot: user interface with Tool Bar, Map View, List View, Explorer Tree View, and Gantt Chart View]
Routing & Scheduling System Architecture [Diagram labels: Desktop App Client, Enterprise Systems, Web Service Bus, GIS Server, GIS Enterprise Database]
What is Real-time Dynamic Routing and Scheduling? • Customers call for periodic service requests • Used to determine optimal truck schedule candidates • Need to be served in a real-time fashion • Might change the existing daily schedule
Real-time Dynamic Routing and Scheduling • Efficiently handles recurring service requests • Dynamically constructs the service tree structure • Considers combinations of feasible employees and date ranges • Re-sequences a daily route by solving a Vehicle Routing Problem with Time Windows (VRPTW)
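Re-sequencing a single daily route under time windows can be illustrated with a brute-force sketch. This is not the solver described in the talk: it handles one truck only, tries every stop order (practical only for a handful of stops), and all travel times, windows, and service times below are made-up values in minutes.

```python
from itertools import permutations


def route_finish_time(order, travel, windows, service, start=0):
    """Return the time the truck is back at the depot, or None if a window is missed."""
    time, here = start, "depot"
    for stop in order:
        time += travel[here][stop]
        earliest, latest = windows[stop]
        if time > latest:
            return None  # arrived after the window closed
        time = max(time, earliest) + service[stop]  # wait if early, then serve
        here = stop
    return time + travel[here]["depot"]


def resequence(stops, travel, windows, service):
    """Try every order and keep the feasible one that finishes earliest."""
    best_order, best_time = None, None
    for order in permutations(stops):
        finish = route_finish_time(order, travel, windows, service)
        if finish is not None and (best_time is None or finish < best_time):
            best_order, best_time = order, finish
    return best_order, best_time


# Hypothetical symmetric travel times between the depot and three stops.
pairs = {("depot", "a"): 10, ("depot", "b"): 20, ("depot", "c"): 15,
         ("a", "b"): 10, ("a", "c"): 25, ("b", "c"): 10}
travel = {}
for (u, v), minutes in pairs.items():
    travel.setdefault(u, {})[v] = minutes
    travel.setdefault(v, {})[u] = minutes

windows = {"a": (0, 100), "b": (40, 100), "c": (0, 20)}  # (earliest, latest) arrival
service = {"a": 5, "b": 5, "c": 5}

print(resequence(["a", "b", "c"], travel, windows, service))
```

Stop “c” must be reached within 20 minutes, so only orders that visit it first are feasible. Of those, c, b, a returns to the depot at minute 70, even after waiting at “b” for its window to open.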
Thank you! Questions?