Towards improving data presentation in the TripCheck system Rafael J. Fernández-Moctezuma rfernand@cs.pdx.edu
You’re about to leave… … and you’re addicted to TripCheck. So you go and check it out and see this:
What’s wrong with this picture? Gray is not the new black!
Estimation may be better than no data on final products • Data may not be displayed for various reasons • Sensor failure • Data quality • May prefer to estimate the system state instead of displaying gray areas • Not enough sensors – but may still be able to recover information. • Must be careful with estimation – at least report a confidence factor.
System State Estimation • Must carefully choose good sources of correlated data • Every sensor station has its own estimator • The PORTAL project does a great job of archiving data – this makes statistical regressors a good option for state estimation. • May consider relying on features observed in the past, in addition to well-known transportation theory.
Regression • Find a description of data in terms of a function • Example: height (H) and weight (W) data transformed into a function F(H) = W.
Which functional family? • May consider a linear family first, which can easily be derived (least squares). • May also consider the expected value of a conditional Gaussian. • A conditional Gaussian buys us statistics: the conditional mean is a linear regressor! Plus, estimating the joint is easy.
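The two linear options on this slide can be compared directly. The sketch below is illustrative only: the data are synthetic (a made-up neighbouring-station speed `x` and target-station speed `y`), and the function names are hypothetical, not taken from the talk. It fits a least-squares line, then fits a joint Gaussian and reads off the conditional mean E[y | x] = mu_y + (cov_xy / var_x)(x - mu_x). For one input the two lines coincide, and the Gaussian additionally yields a conditional variance that could serve as a rough confidence factor.

```python
import numpy as np

def fit_least_squares(x, y):
    """Return slope and intercept of the least-squares line y ≈ a*x + b."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1]

def fit_conditional_gaussian(x, y):
    """Fit a joint Gaussian to (x, y).

    Returns the slope and intercept of the conditional mean E[y | x]
    and the conditional variance Var[y | x].
    """
    mu_x, mu_y = x.mean(), y.mean()
    cov = np.cov(x, y)  # 2x2 sample covariance matrix
    slope = cov[0, 1] / cov[0, 0]
    intercept = mu_y - slope * mu_x
    cond_var = cov[1, 1] - cov[0, 1] ** 2 / cov[0, 0]
    return slope, intercept, cond_var

# Synthetic speeds in mph (assumed for illustration, not PORTAL data).
rng = np.random.default_rng(0)
x = rng.uniform(10.0, 60.0, size=500)
y = 0.8 * x + 5.0 + rng.normal(0.0, 3.0, size=500)

a_ls, b_ls = fit_least_squares(x, y)
a_cg, b_cg, cond_var = fit_conditional_gaussian(x, y)
```

The conditional variance is constant in `x` for a single Gaussian, so it describes the average spread around the line rather than a per-reading confidence.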
Which functional family? • May also want to consider non-linear functions. A good first approach is an Artificial Neural Network
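As a rough idea of what a small neural-network regressor involves, here is a minimal sketch: one hidden tanh layer trained by full-batch gradient descent on mean squared error. Everything in it is an assumption for illustration (the network size, learning rate, synthetic two-input data, and all function names); the slides do not describe the architecture or training procedure actually used.

```python
import numpy as np

def init_params(n_in, hidden, seed=0):
    """Random initial weights for a one-hidden-layer network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, size=(n_in, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.5, size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def predict(p, x):
    """Forward pass: tanh hidden layer, linear output."""
    h = np.tanh(x @ p["W1"] + p["b1"])
    return h @ p["W2"] + p["b2"]

def mse(p, x, y):
    return float(np.mean((predict(p, x) - y) ** 2))

def train(p, x, y, lr=0.02, epochs=3000):
    """Full-batch gradient descent on the mean squared error."""
    n = x.shape[0]
    for _ in range(epochs):
        h = np.tanh(x @ p["W1"] + p["b1"])
        err = h @ p["W2"] + p["b2"] - y           # shape (n, 1)
        d_hidden = (err @ p["W2"].T) * (1.0 - h ** 2)
        grads = {
            "W2": 2.0 / n * (h.T @ err),
            "b2": 2.0 * err.mean(axis=0),
            "W1": 2.0 / n * (x.T @ d_hidden),
            "b1": 2.0 * d_hidden.mean(axis=0),
        }
        for key in p:
            p[key] -= lr * grads[key]
    return p

# Synthetic, already-scaled inputs standing in for two nearby stations.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(300, 2))
y = (np.tanh(2.0 * x[:, 0]) + 0.3 * x[:, 1] ** 2).reshape(-1, 1)

params = init_params(n_in=2, hidden=8)
mse_before = mse(params, x, y)
params = train(params, x, y)
mse_after = mse(params, x, y)
```

In practice the inputs would need scaling to a comparable range, and a held-out set would be needed to judge whether the nonlinear model really improves on the linear one.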
Experimental results • Looked at rush hour (06:00 – 10:30) data from a “typical” Portland week, from US 26 E (Oct. 16 – Oct. 20 2006) • Found a segment that is typically shown gray (it is my commute, so I notice these things) • Inputs: current measurements of speed at nearby stations • Goal: come up with a good enough estimate to color the TripCheck map
Confusion matrices [Figure: 3×3 confusion matrices of observed vs. predicted map color (R, Y, G), one per milepost and model] • Milepost 73.62: Linear 80%, ANN 100% • Milepost 71.37: Linear 100%, ANN 89%
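A confusion matrix of this kind can be built by mapping observed and estimated speeds to map colors and counting the pairs. The sketch below assumes hypothetical speed thresholds (the slides do not state where TripCheck draws the red/yellow/green boundaries) and made-up speeds; the overall accuracy shown is one common summary and is not necessarily the percentage reported on the slide.

```python
import numpy as np

LABELS = ("R", "Y", "G")

def speed_to_color(speed_mph, red_below=25.0, yellow_below=45.0):
    """Map a speed to a map color. Thresholds are placeholders."""
    if speed_mph < red_below:
        return "R"
    if speed_mph < yellow_below:
        return "Y"
    return "G"

def confusion_matrix(observed, predicted):
    """Rows index the observed color, columns the predicted color."""
    index = {label: i for i, label in enumerate(LABELS)}
    matrix = np.zeros((len(LABELS), len(LABELS)), dtype=int)
    for obs, pred in zip(observed, predicted):
        matrix[index[obs], index[pred]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal (color predicted correctly)."""
    return np.trace(matrix) / matrix.sum()

# Made-up observed and estimated speeds (mph) for illustration.
observed_speeds = [10, 30, 50, 55, 20]
estimated_speeds = [12, 47, 52, 40, 22]

observed_colors = [speed_to_color(s) for s in observed_speeds]
estimated_colors = [speed_to_color(s) for s in estimated_speeds]
cm = confusion_matrix(observed_colors, estimated_colors)
acc = accuracy(cm)
```

Because the map only needs a color, an estimate can be several mph off and still count as correct, as long as it stays on the same side of a threshold.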
Future work • May still be able to recover system information from faraway stations with a nonlinear model • Still need to explore other segments and build a representative number of regressors (20%?) to demonstrate the effectiveness of the approach • What keeps us from using this approach to estimate intermediate location states?
Future work • May want to consider regressors with more inputs (shifted speeds, time, etc.) • If nonlinear regressors are effective, we may want to use Gaussian Mixture Models (cheaper to train, statistically rich) • Addressing quality in data presentation can be a by-product of a more general problem: constructing a framework for reliable system state estimation.
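One way to read "more inputs (shifted speeds, time, etc.)" is to extend each row of the regressor's input with earlier readings from the same stations plus the time of day. The sketch below is an assumed layout, not the author's design: `build_features`, the lag choices, and the time scaling are all hypothetical.

```python
import numpy as np

def build_features(speeds, minutes_of_day, lags=(1, 2)):
    """Stack current speeds, lagged speeds, and scaled time of day.

    speeds: array of shape (T, S), one row per time step, one column
        per station.
    minutes_of_day: array of shape (T,), minutes since midnight.
    Returns an array of shape (T - max(lags), S * (1 + len(lags)) + 1).
    The first max(lags) time steps are dropped because they have no
    complete history.
    """
    speeds = np.asarray(speeds, dtype=float)
    minutes_of_day = np.asarray(minutes_of_day, dtype=float)
    max_lag = max(lags)
    n_steps = speeds.shape[0]
    columns = [speeds[max_lag:]]                      # current readings
    for lag in lags:
        columns.append(speeds[max_lag - lag:n_steps - lag])  # shifted
    columns.append(minutes_of_day[max_lag:, None] / 1440.0)  # in [0, 1)
    return np.hstack(columns)

# Tiny made-up example: 5 time steps, 2 stations, 5-minute spacing.
speeds = np.arange(10).reshape(5, 2)
minutes = np.array([360, 365, 370, 375, 380])
features = build_features(speeds, minutes)
```

The resulting matrix can be passed to any of the regressors discussed earlier in place of the current-speed inputs.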