Machine Learning Approach to Report Prioritization with an Application to Travel Time Dissemination Piotr Szczurek, Bo Xu, Jie Lin, Ouri Wolfson
Agenda • Background • Model and Problem Definition • Machine Learning Approach • Application – Travel Time Dissemination • Results • Conclusion
Background • Technology in vehicles • Computers • GPS • Communication devices (802.11p, C2C) • Sensing of the environment • Video cameras • GPS • Temperature • Automobile status: brake sensors, accelerometers
Background • Dissemination of information • Limited by connectivity and bandwidth • Store-and-forward communication • Information is stored in a local database of limited size • Addresses connectivity issues • Prioritization • Not all information can be communicated • Not all information can be stored • Need to select the most useful information to keep and communicate
Model and Problem Definition • System: • Set of mobile nodes: physical entities capable of data computation, storage, and short-range wireless communication • Nodes observe the environment through sensing devices (e.g., GPS) • Reports: • Data derived from the sensing devices • Fixed set of attributes and their values • All reports have a fixed size • Created over time by nodes • Examples: • Speed report (average speed, timestamp, vehicle ID) • Parking space report (parking meter ID, availability) • Once created, stored in the report database
Model and Problem Definition • Report database • Local database maintained by each node • Limited in size • Communication • Reports stored in the database are communicated over time to a subset of other nodes in the network • Broadcast communication: reports are sent to all nodes within transmission range • Communication protocol: decides when and how many reports to broadcast • Remaining question: which reports should be broadcast?
Model and Problem Definition • Relevance value • The utility a report provides if it is sent to other nodes, given the sending node's current characteristics and the report's attribute values • Highly application-specific and difficult to specify • The value of a report can change over time • Can be continuous (a value in [0, 1]) or Boolean (0 or 1) • Example: parking space availability (0 for occupied, 1 for not occupied) • What to broadcast and keep in the report database? • Find the relevance value of each report and keep (or broadcast) the highest-valued reports • Problem: finding the relevance value of a report
Machine Learning Approach • Idea: use received reports as input to a machine learning process • Assumptions: • Nodes can judge the relevance of a report after it is received • The relevance value is based on a goal common to all nodes • Method description: • Define a goal on which the relevance value is based • The relevance value of a report is then defined by how much the report contributes to achieving the goal • For every incoming report, use the report's attributes and the sender's characteristics as input values and the relevance value as the output; each such pair is a training example • Use a supervised machine learning algorithm to learn a model that maps inputs to outputs • Use the learned model to estimate the relevance value of new reports
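The method above (inputs from each incoming report, a judged relevance as output, then a supervised learner) can be sketched in a few lines. This is a minimal illustration only, assuming numeric features and a Boolean relevance label; the names `TrainingExample`, `train_logistic`, and `predict_relevance` are hypothetical, and logistic regression by plain batch gradient descent stands in for whichever supervised algorithm a node would actually use.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainingExample:
    features: list[float]  # report attributes + sender characteristics (assumed numeric)
    relevance: int         # relevance judged by the receiver: 0 or 1


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(examples: list[TrainingExample], epochs: int = 500, lr: float = 0.1) -> list[float]:
    """Fit logistic-regression weights (bias stored last) by batch gradient descent on log-loss."""
    n = len(examples[0].features)
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for ex in examples:
            x = ex.features + [1.0]  # constant input for the bias term
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - ex.relevance
            for i, xi in enumerate(x):
                grad[i] += err * xi
        w = [wi - lr * g / len(examples) for wi, g in zip(w, grad)]
    return w


def predict_relevance(w: list[float], features: list[float]) -> float:
    """Estimated probability that a report with these features is relevant."""
    x = features + [1.0]
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


# Hypothetical data: a single feature (report age); fresh reports were judged relevant.
examples = [TrainingExample([0.0], 1), TrainingExample([1.0], 1),
            TrainingExample([5.0], 0), TrainingExample([6.0], 0)]
w = train_logistic(examples)
```

Once trained, `predict_relevance` gives a score that can be used directly to rank reports.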
Machine Learning Approach • Two ways of learning: • Online: the model is continuously updated with new training examples while it is in use • Offline: • First, collect training examples and learn a model • Second, use the learned model without further updates • Offline learning • Advantage: nodes do not incur the overhead of learning • Disadvantage: the model does not adapt • Can also be used to bootstrap online learning • Can be used to identify useful attributes • Research questions: • Can the relevance value be learned? • What advantage does the learned model offer?
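The online mode can be illustrated with a learner that takes one gradient step per judged report, optionally starting from offline-learned weights (the bootstrapping case). This is a hedged sketch: `OnlineLogisticModel` and its learning rate are hypothetical, and stochastic gradient descent on logistic regression is just one possible online learner.

```python
import math


class OnlineLogisticModel:
    """Hypothetical online learner: one stochastic-gradient step on log-loss per labelled report."""

    def __init__(self, n_features: int, lr: float = 0.05, initial_weights=None):
        # Weights from an offline-trained model (bias last) can bootstrap the online model.
        if initial_weights is not None:
            self.w = list(initial_weights)
        else:
            self.w = [0.0] * (n_features + 1)
        self.lr = lr

    def predict(self, features) -> float:
        x = list(features) + [1.0]
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, relevance: int) -> None:
        """Learn from one report whose relevance (0 or 1) the receiver has just judged."""
        x = list(features) + [1.0]
        err = self.predict(features) - relevance
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]


model = OnlineLogisticModel(n_features=1)
model.update([1.0], 1)  # a report with feature value 1.0 turned out to be relevant
```

The trade-off on the slide shows up directly: every `update` call is per-node computation that an offline-only model avoids, in exchange for adapting to new conditions.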
Application – Travel Time Dissemination • Assume every vehicle in the system carries a GPS receiver and an on-board computer with communication capabilities (e.g., 802.11b) • Each vehicle has a known destination, to which it travels along the shortest path • Vehicles measure travel times on road segments as they traverse them • Travel times are encapsulated in reports. Each report contains: • Report ID • Road segment ID • Travel time • Time of measurement • Reports are stored in a report database of limited size (200 reports) • The report database is a list of all received or generated reports, ranked by a ranking function. If the database size is exceeded, the lowest-ranked report is discarded. • Example ranking function: r = 1/ageOfReport
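A bounded, ranked report database like the one described above might look as follows. This is a sketch under stated assumptions: the class and function names are hypothetical, time is measured in minutes, and a report of age 0 is given infinite rank so that 1/ageOfReport stays defined.

```python
from dataclasses import dataclass


@dataclass
class Report:
    report_id: int
    segment_id: int
    travel_time: float   # minutes
    measured_at: float   # time of measurement, in minutes (assumed unit)


def inverse_age(report: Report, now: float) -> float:
    """Example ranking function r = 1/ageOfReport; a report of age 0 ranks highest."""
    age = now - report.measured_at
    return 1.0 / age if age > 0 else float("inf")


class ReportDatabase:
    """Bounded report list; when full, the lowest-ranked report is discarded."""

    def __init__(self, rank, capacity: int = 200):
        self.rank = rank          # ranking function (report, now) -> score
        self.capacity = capacity
        self.reports: list[Report] = []

    def add(self, report: Report, now: float) -> None:
        self.reports.append(report)
        if len(self.reports) > self.capacity:
            # Ranks can depend on the current time, so they are recomputed here.
            lowest = min(self.reports, key=lambda r: self.rank(r, now))
            self.reports.remove(lowest)


db = ReportDatabase(inverse_age, capacity=2)
for rid, t in [(1, 0.0), (2, 5.0), (3, 8.0)]:
    db.add(Report(rid, segment_id=123, travel_time=10.0, measured_at=t), now=10.0)
```

With capacity 2, inserting the third report evicts the oldest one (report 1). Swapping `inverse_age` for a learned scoring function changes only the `rank` argument.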
Application – Travel Time Dissemination • Reports are disseminated over a VANET • Incoming and newly generated reports are used to update a digital map • The digital map contains: • Road segment identifier • Coordinates of the segment endpoints • Road type • Travel time estimate (average of all reports for the latest time interval; initially the free-flow travel time) • List of reports used for the estimate • Time period number (index of the 5-minute interval; initially -1)
Application – Travel Time Dissemination • Travel time updates • Executed at the end of each 5-minute interval • Use all reports generated or received within that interval • For each road segment in the digital map: • Reports for the most recent period are identified; all others are discarded. • That period number is then compared with the period number in the digital map: • Greater: the time period is updated and all reports are inserted in the list. The travel time estimate is the average of all inserted reports. • Less: all reports are discarded. • Equal: all reports are inserted in the list; duplicates are discarded. The travel time estimate is the average of all reports in the list. • After each update, vehicles recalculate the shortest path to their destination
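The per-segment update rule above (greater, less, or equal period) can be sketched as one function. Assumptions: reports arrive as hypothetical `(report_id, period, travel_time)` tuples, the "greater" case clears the previous list before inserting, and the 8-minute free-flow time in the usage lines is invented. The usage replays the worked example for segment 123 from these slides.

```python
class SegmentEntry:
    """Digital-map entry for one road segment."""

    def __init__(self, free_flow_time: float):
        self.estimate = free_flow_time       # initially the free-flow travel time
        self.reports: dict[int, float] = {}  # report ID -> travel time (minutes)
        self.period = -1                     # no time period known yet


def update_segment(entry: SegmentEntry, new_reports: list[tuple[int, int, float]]) -> None:
    """Apply (report_id, period, travel_time) reports for this segment from one 5-minute interval."""
    if not new_reports:
        return
    latest = max(period for _, period, _ in new_reports)
    # Keep only reports for the most recent period; the dict also drops duplicate report IDs.
    current = {rid: tt for rid, period, tt in new_reports if period == latest}
    if latest > entry.period:
        # Newer period (assumption: the previous list is cleared before inserting).
        entry.period = latest
        entry.reports = current
    elif latest == entry.period:
        # Same period: merge, ignoring reports already in the list.
        for rid, tt in current.items():
            entry.reports.setdefault(rid, tt)
    else:
        return  # older period: discard all reports
    entry.estimate = sum(entry.reports.values()) / len(entry.reports)


# Vehicle A: report 1 for segment 123, period 2, 10 minutes.
a = SegmentEntry(free_flow_time=8.0)  # hypothetical free-flow time
update_segment(a, [(1, 2, 10.0)])
# Vehicle B already holds report 2 (11 minutes, period 2), then receives report 1.
b = SegmentEntry(free_flow_time=8.0)
update_segment(b, [(2, 2, 11.0)])
update_segment(b, [(1, 2, 10.0)])
```

After these calls, vehicle A's estimate is 10 minutes and vehicle B's is 10.5 minutes, matching the example.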
Application – Travel Time Dissemination • Communication • Based on the TrafficInfo algorithm • Combination of flooding and periodic broadcasting • Flooding for freshly created reports • Periodic broadcasting of a subset of reports from the report database, chosen by the ranking function: the K highest-ranked reports are broadcast • The subset size K is determined by the Good Citizen Formula • Based on transmission range, node density, and last broadcast time • The broadcast period is determined by transmission range and vehicle velocity
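Choosing the K highest-ranked reports for a periodic broadcast is a top-K selection. In the sketch below, K is simply a parameter (the Good Citizen Formula is not reproduced), reports are hypothetical `(report_id, age, distance)` tuples, and the ranking function is the 1/(age+distance) heuristic that the Results slide attributes to TrafficInfo.

```python
import heapq


def select_for_broadcast(reports, k, rank):
    """Return the k highest-ranked reports (k would come from the Good Citizen Formula)."""
    return heapq.nlargest(k, reports, key=rank)


def age_distance_heuristic(report):
    """Heuristic rank 1/(age + distance); report = (report_id, age, distance)."""
    _, age, distance = report
    total = age + distance
    return 1.0 / total if total > 0 else float("inf")


reports = [("a", 1, 1), ("b", 5, 5), ("c", 0, 3), ("d", 4, 0)]
top = select_for_broadcast(reports, k=2, rank=age_distance_heuristic)
```

`heapq.nlargest` avoids sorting the whole database when K is much smaller than its 200-report capacity.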
Application – Travel Time Dissemination • Example: • Vehicle A has just traversed road segment 123 at 1:04pm (time period 2). The recorded travel time was 10 minutes. • Vehicle A creates a report with ID 1, using the measured travel time. The report contains: • Report ID (1) • Road segment ID (123) • Travel time (10 minutes) • Time of measurement (1:04pm) • Vehicle A updates its digital map at 1:05pm. It currently holds no reports for segment 123. The following changes are applied for road segment 123: • Travel time estimate = 10 minutes • List of reports: [report 1] • Time period = 2
Application – Travel Time Dissemination • Example (continued): • Vehicle A broadcasts report 1 at 1:06pm (in time period 3). • Vehicle B receives the report. • Vehicle B updates its digital map at 1:10pm. It currently has one report (report 2) for segment 123, with a travel time of 11 minutes, for time period 2. The following changes are applied for road segment 123: • Travel time estimate = 10.5 minutes • List of reports: [report 1, report 2] • Time period = 2 (unchanged)
Application – Travel Time Dissemination • Learning the ranking function (offline) • Goal of the application: vehicles choose the best (shortest) paths to their destinations • Relevance of a report: a report is relevant when it changes the vehicle's shortest path • 0 if the report does not change the path • 1 if the report changes the shortest path • Attributes: • Age of the report in time periods • Distance to the road segment, measured as the free-flow travel time from the vehicle's current position to the road segment contained in the report • Road type: either highway or city street
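The Boolean relevance label defined above can be computed by running a shortest-path search before and after applying a report. This sketch assumes a directed graph given as `node -> [(neighbor, segment_id)]`, uses Dijkstra's algorithm, and ignores the period-number checks; the function names and the four-segment network are hypothetical.

```python
import heapq


def shortest_path(graph, travel_time, source, target):
    """Dijkstra; graph: node -> [(neighbor, segment_id)], travel_time: segment_id -> minutes."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, seg in graph.get(u, []):
            nd = d + travel_time[seg]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = (u, seg)
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path, node = [], target
    while node != source:
        node, seg = prev[node]
        path.append(seg)
    return path[::-1]  # list of segment IDs from source to target


def relevance_label(graph, travel_time, position, destination, report_segment, report_time):
    """1 if applying the report's travel time changes the shortest path, else 0."""
    before = shortest_path(graph, travel_time, position, destination)
    updated = dict(travel_time)
    updated[report_segment] = report_time
    after = shortest_path(graph, updated, position, destination)
    return 1 if after != before else 0


graph = {"A": [("B", "s1"), ("C", "s3")], "B": [("D", "s2")], "C": [("D", "s4")]}
free_flow = {"s1": 5.0, "s2": 5.0, "s3": 6.0, "s4": 6.0}
```

Here the free-flow route is s1, s2 (10 minutes). A report of 1 minute on s3 makes s3, s4 shorter (label 1), whereas 5.5 minutes on s4 leaves the route unchanged (label 0).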
Application – Travel Time Dissemination • Learning (continued) • Learning examples were created artificially by emulating different scenarios • 25 learning epochs: • In each epoch, vehicles were placed randomly on a road network (a region of Chicago) • Each vehicle has a random destination • Each vehicle has a digital map with one report containing the free-flow travel time and a random period number between 0 and 100 • A random segment is chosen from the road network. Its travel time is drawn from a uniform distribution between 0 and its free-flow travel time • 101 reports, with ages 0 to 100, are created for each vehicle • Each report refers to the chosen road segment and contains the assigned travel time • Every vehicle applies the 101 reports independently. After each report is applied, the vehicle checks whether its shortest path would change. • If the report would change the path, a positive training example is created; otherwise, a negative training example is created • Two road networks were used (from different regions of Chicago): 100 vehicles were used on the smaller region and 250 on the larger region.
Application – Travel Time Dissemination • Learning (continued) • The Weka toolkit was used for learning • Negative examples were downsampled to match the number of positives • 7677 positive and 7677 negative examples • 5 classifiers were tested: • Naïve Bayes (using the NaiveBayes Weka implementation) • Logistic Regression (using the Logistic Weka implementation) • Support Vector Machines (using the SMO Weka implementation, with buildLogisticModels enabled) • Artificial Neural Network (using the MultilayerPerceptron Weka implementation) • Decision Tree (using the J48 Weka implementation)
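Balancing the classes by downsampling negatives, as described above, amounts to random sampling without replacement. A minimal sketch, assuming examples are hypothetical `(features, label)` pairs and a fixed seed for reproducibility; this is not necessarily how the authors performed the step in Weka.

```python
import random


def downsample_negatives(examples, seed=0):
    """examples: list of (features, label). Randomly drop negatives until both classes are equal in size."""
    pos = [e for e in examples if e[1] == 1]
    neg = [e for e in examples if e[1] == 0]
    rng = random.Random(seed)
    if len(neg) > len(pos):
        neg = rng.sample(neg, len(pos))  # sample without replacement
    balanced = pos + neg
    rng.shuffle(balanced)
    return balanced


examples = [([i], 1) for i in range(3)] + [([i], 0) for i in range(10)]
balanced = downsample_negatives(examples)
```

Balancing keeps a classifier from scoring well simply by predicting "irrelevant" for every report, which would otherwise be the majority outcome.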
Results • 10-fold cross-validation • All algorithms except decision trees performed similarly, with an accuracy of approximately 83-84% • Decision trees had the best accuracy, 96.22% • However, the model was unusable: a complex tree in which most leaf nodes were homogeneous • The logistic regression model was the most understandable: U = -0.0322*age - 0.02*distance + 0.3885*[road=highway] - 0.3885*[road=city street] + 4.9053
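To show how the learned logistic regression model could serve as a ranking function, the sketch below evaluates U with the coefficients from the slide and sorts hypothetical reports by it. Assumptions: U is the log-odds of a report being relevant (so the sigmoid of U is a probability), `distance` uses the same free-flow-time unit as in training (the unit is not stated), and the report tuples are invented. For ranking, only the ordering by U matters, so the sigmoid is optional.

```python
import math

# Coefficients of the learned logistic regression model, as reported.
COEF_AGE = -0.0322
COEF_DISTANCE = -0.02
COEF_HIGHWAY = 0.3885
COEF_CITY = -0.3885
INTERCEPT = 4.9053


def utility(age: float, distance: float, road: str) -> float:
    """Linear score U; indicator terms are 1 for the matching road type, else 0."""
    is_highway = 1.0 if road == "highway" else 0.0
    is_city = 1.0 if road == "city street" else 0.0
    return (COEF_AGE * age + COEF_DISTANCE * distance
            + COEF_HIGHWAY * is_highway + COEF_CITY * is_city + INTERCEPT)


def relevance_probability(age: float, distance: float, road: str) -> float:
    """Assumes U is the log-odds of relevance."""
    return 1.0 / (1.0 + math.exp(-utility(age, distance, road)))


# Hypothetical reports: (report_id, age in periods, distance, road type).
reports = [("r1", 2, 30.0, "city street"), ("r2", 0, 10.0, "highway"), ("r3", 40, 5.0, "highway")]
ranked = sorted(reports, key=lambda r: utility(r[1], r[2], r[3]), reverse=True)
```

Both age and distance carry negative coefficients, so older and farther reports rank lower, while highway reports get a constant boost over city-street reports.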
Results • 3 logistic regression models were derived: • Using region 1 examples • Using region 2 examples • Using examples from both regions • No major difference in coefficients across the models, except for road type • This shows that the relevance values depend on the makeup of the road network • Usefulness of the derived models in report prioritization • Tested using the SWANS/STRAW simulator • 100 vehicles were randomly placed in region 1. Each travelled to random destinations for 1 hour. • The majority of highway segments had reduced speed limits (simulated accident scenario) • The number of broadcast reports was limited to 10 • Two evaluation metrics were used: • Average Trip Time: the average time to reach the destination • Total Path Travel Time Difference: the absolute value of the difference between the travel time along the shortest path given the vehicle's current knowledge and the travel time along the shortest path given full knowledge • Compared against a common heuristic (1/(age+distance), used by TrafficInfo)
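The two evaluation metrics above can be made concrete as follows. This is a sketch under assumptions not stated in the slides: both paths are evaluated with the same ground-truth segment travel times (a static snapshot), and "Total" means summing the per-sample difference over the evaluation samples (for example, one per vehicle). All function names and data are hypothetical.

```python
def path_travel_time(path, true_time):
    """Travel time of a path (list of segment IDs) under actual segment travel times."""
    return sum(true_time[seg] for seg in path)


def path_travel_time_difference(path_current, path_full, true_time):
    """|time of path chosen with current knowledge - time of path chosen with full knowledge|."""
    return abs(path_travel_time(path_current, true_time) - path_travel_time(path_full, true_time))


def total_difference(samples, true_time):
    """Sum the difference over (path_current, path_full) samples (assumed aggregation)."""
    return sum(path_travel_time_difference(c, f, true_time) for c, f in samples)


def average_trip_time(trip_times):
    """Mean time for vehicles to reach their destinations."""
    return sum(trip_times) / len(trip_times)


true_time = {"s1": 10.0, "s2": 5.0, "s3": 6.0, "s4": 6.0}
samples = [(["s1", "s2"], ["s3", "s4"]), (["s3", "s4"], ["s3", "s4"])]
```

A vehicle whose knowledge leads it onto the same path as full knowledge contributes 0; one misled onto a 15-minute path instead of a 12-minute one contributes 3 minutes.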
Conclusion • Proposed a machine learning approach to report prioritization for peer-to-peer environments • Uses incoming reports as input to supervised machine learning algorithms • The learned model can then be used by all nodes to rank the reports to be disseminated • Accurate prediction is feasible • The learned model outperformed heuristics in disseminating the information most likely to affect a vehicle's path