Dempsy - Stream Based “Big Data” Applied to Traffic
[Diagram: Traffic End-to-End Data Fusion – collection (Traffic.com sensor network, DOT sensor / flow data, incident and event data, historic data, probe data) feeds fusion, which feeds dissemination (television, radio, internet, wireless, in-vehicle).]
Overview of Arterial Model
• Map Matcher – matches the probe data to the road network in real time, with associated probabilities.
• Path Analysis – routes between pairs of probable matches across chains and applies a Hidden Markov Model to the results to determine the most likely path through a set of points.
• Travel Time Allocation – assigns the path travel times to the appropriate arterial segments.
• Arterial Model – combines expected values with the allocated travel times and previous estimates into the current estimate.
[Diagram: Probe Data → Map Matcher → Path Analysis → Travel Time Allocation → Arterial Travel Times → Arterial Traffic Data]
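The "most likely path" selection that Path Analysis performs with its Hidden Markov Model can be sketched as a standard Viterbi pass over candidate road matches. Everything below (the log-probability matrices, the state layout) is an illustrative stand-in, not the production algorithm:

```java
// Illustrative Viterbi sketch of the HMM step in Path Analysis: states are
// candidate road-segment matches, observations are probe points, and the
// transition/emission models are placeholders supplied by the caller.
public class ViterbiSketch {
    /**
     * @param start log-probabilities of starting in each state
     * @param trans trans[i][j]: log-probability of moving from state i to j
     * @param emit  emit[t][j]: log-probability that observation t came from state j
     * @return the most likely state sequence, one state index per observation
     */
    public static int[] mostLikelyPath(double[] start, double[][] trans, double[][] emit) {
        int T = emit.length, S = start.length;
        double[][] score = new double[T][S]; // best log-prob of any path ending in state j at time t
        int[][] back = new int[T][S];        // back-pointers for path reconstruction
        for (int j = 0; j < S; j++) score[0][j] = start[j] + emit[0][j];
        for (int t = 1; t < T; t++) {
            for (int j = 0; j < S; j++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int i = 0; i < S; i++) {
                    double s = score[t - 1][i] + trans[i][j];
                    if (s > best) { best = s; arg = i; }
                }
                score[t][j] = best + emit[t][j];
                back[t][j] = arg;
            }
        }
        // Pick the best final state, then walk the back-pointers.
        int last = 0;
        for (int j = 1; j < S; j++) if (score[T - 1][j] > score[T - 1][last]) last = j;
        int[] path = new int[T];
        path[T - 1] = last;
        for (int t = T - 1; t > 0; t--) path[t - 1] = back[t][path[t]];
        return path;
    }
}
```

Working in log space keeps long probe chains from underflowing, which matters when a path covers many points.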
Width of the Road
• Center the normal distribution over the probe's reported location.
• Compute the distance from the peak of the distribution to the edges of the road.
• Road width can be estimated from the number of lanes.
• The integral of the normal distribution across the road gives the probability of the probe being on that road.
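The steps above amount to integrating a normal distribution across the road width. A minimal sketch, in which the GPS-error sigma and the 3.5 m-per-lane width rule are assumptions for illustration:

```java
// Sketch of the road-width probability described above: center a normal
// distribution (std. dev. from GPS error) at the probe's offset from the
// road centerline and integrate it across the road width. The sigma value
// and the lanes-to-width rule of thumb are illustrative assumptions.
public class RoadWidthSketch {
    public static double normalPdf(double x, double mu, double sigma) {
        double z = (x - mu) / sigma;
        return Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2.0 * Math.PI));
    }

    /** Probability mass of N(probeOffset, sigma^2) between the two road edges,
     *  by simple trapezoidal integration; offsets are measured from the centerline. */
    public static double probOnRoad(double probeOffset, double sigma, double halfWidth) {
        int n = 1000;
        double a = -halfWidth, b = halfWidth, h = (b - a) / n;
        double sum = 0.5 * (normalPdf(a, probeOffset, sigma) + normalPdf(b, probeOffset, sigma));
        for (int i = 1; i < n; i++) sum += normalPdf(a + i * h, probeOffset, sigma);
        return sum * h;
    }

    /** Assumed rule of thumb: estimate road width from the lane count (3.5 m per lane). */
    public static double roadWidthFromLanes(int lanes) { return lanes * 3.5; }
}
```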
Technology Survey
• Streams processing engines
• Hadoop / MapReduce
• “Distributed Actors Model”
Technology Survey
• Streams processing engines
  • Oracle, IBM, SQLstream
  • Not a good fit – oriented toward relational data processing.
• Hadoop MapReduce
  • Not a good fit for low-latency computations (15 to 30 minutes per batch).
  • HBase coprocessors are a possibility, but more of a hack.
• Actors model
  • S4, Akka, Storm
  • Just what we need.
Dempsy – Distributed Elastic Message Processing System
• POJO-based actors programming abstraction eliminates synchronization bugs
• Framework handles messaging and distribution
• Fine-grained partitioning of work
• Elastic
• Fault tolerant
Dempsy – Distributed Elastic Message Processing System
• Separation of concerns – scale-agnostic applications versus a scale-aware platform
• Supports code-quality goals (guidelines, reuse, design patterns, etc.)
• Functional programming (-like)
• MapReduce (-like)
• Distributed actors model (-like)
Dempsy
[Diagram: Dempsy architecture – a Distributor feeding MP Container Clusters, each holding multiple MP Containers, coordinated through ZooKeeper.]
System Characteristics - DevOps
• Manage every node and every process in exactly the same way – e.g., arterial, path analyzer, and map matcher look the same to an operations person.
• Everything runs on exactly the same hardware.
• Scale elastically: to increase throughput, just add a machine to the cluster – no extra work required. The system can even be scaled automatically as load increases.
• Robust failure handling – no real-time manual intervention required when nodes fail.
• Development, QA, and integration teams can use a pool of resources rather than dedicated resources; the pool can grow elastically as required by overlapping project schedules.
Map Matching and Path Analysis as an Example
• Algorithm decomposition into discrete business-logic components
  • Map Matching
  • Vehicle Accumulation
  • Path Analysis (currently A* routing)
• MP addressing
  • Map Matching – tile-based addressing
  • Vehicle Accumulation – addressing by vehicle id
  • Path Analysis – tile-based addressing
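Tile-based addressing can be sketched as deriving a message key from a probe's coordinates, so that every probe falling inside one map tile routes to the same MP instance. The 0.01-degree tile size and the key-packing scheme below are assumptions for illustration, not Dempsy's actual addressing:

```java
// Illustrative sketch of tile-based MP addressing: the message key is the
// map tile a probe's coordinates fall in. The 0.01-degree tile size and
// the row/column packing are assumptions, not the production scheme.
public class TileKeySketch {
    static final double TILE_DEG = 0.01;

    /** Pack the tile row and column into a single long usable as a message key. */
    public static long tileKey(double lat, double lon) {
        long row = (long) Math.floor((lat + 90.0) / TILE_DEG);
        long col = (long) Math.floor((lon + 180.0) / TILE_DEG);
        return row * 36000L + col; // 36000 columns of 0.01 degrees span 360 degrees
    }
}
```

Keying by tile keeps each MP's scope narrow (one tile's worth of road network) while letting the framework spread the roughly 40k tile keys across nodes.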
Dempsy – Arterial Model Example
[Diagram: message-processor pipeline, with the message key and approximate key count per stage]
• Adaptor (OLTP) – x 1
• MapMatch MP – key: tile, x 40k
• Vehicle Accumulator MP – key: probeId, x 10M
• PathAnalyzer MP – key: tile, x 40k
• TravelTime MP – key: tile, x 40k
• TrafficState MP – key: segment id, x 2M
• Traffic Reporter – x 9, every 60 seconds
• Singletons: MapMatcher, PathAnalyzer, TravelTime, TrafficState (each x 50)
• Reference data: Linkset, A* graph, traffic history, segment table extract
• Analytics: distributed log collection (quality & audit logs, app logs) into distributed file storage
Dempsy Testing and Analysis
• Decomposed Arterial (MegaVM) into Dempsy message processors
  • Implemented the first two stages of Arterial: Map Match and Path Analysis
  • Implemented the message processors as trivial POJOs around the existing map match and path analysis libraries
  • Wrapped them into a Dempsy application
  • Front-ended with a Dempsy Adaptor to read probe data from files and inject it into Dempsy
• Deployed to Amazon EC2 to prove out scaling, collect performance data, and analyze the behavior of the system under load
• Three main rounds of testing
  • Original HornetQ transport (Sprint 6.2)
  • Lighter-weight TCP/socket-based transport (Sprint 6.3)
  • More finely grained message keys (Sprint 6.3)
Distributed Map Match / Path Analyzer Testing
• Ran multiple tests on EC2 with an increasing number of Dempsy nodes
• Scaled Map Match in parallel
• Used a constant number of probe readers, empirically set at 3
Development Life Cycle
• Write Message Processor (MP) prototypes
• Configure using the dependency-injection container of your choice (currently supports Spring)
• Develop using one node or pseudo-distributed mode
• No messaging code to write
  • No queues
  • No synchronization
• Narrow scope of concern – each processing element deals with only a limited set of data; there may be millions of processing elements
• Simple debugging and unit testing
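A message processor prototype of the kind described above might look like the following sketch. The annotation stubs stand in for Dempsy's annotations (in a real application they come from the framework), and the accumulator logic and message shape are placeholders:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins for the Dempsy annotations named on the lifecycle slide, declared
// locally so this sketch compiles on its own.
@interface Start {}
@interface MessageHandler {}
@interface Output {}

// A hypothetical probe message; the field names are illustrative only.
class ProbeMessage {
    final String vehicleId;
    final double lat, lon;
    ProbeMessage(String vehicleId, double lat, double lon) {
        this.vehicleId = vehicleId;
        this.lat = lat;
        this.lon = lon;
    }
}

// One MP prototype: the container clones it per message key, so each clone
// sees only its own key's messages and needs no synchronization.
public class VehicleAccumulatorMp implements Cloneable {
    private final List<ProbeMessage> probes = new ArrayList<>();

    @MessageHandler
    public void handle(ProbeMessage m) {
        probes.add(m); // accumulate this vehicle's probes between output cycles
    }

    @Output
    public int probeCount() {
        return probes.size();
    }

    @Override
    public VehicleAccumulatorMp clone() {
        return new VehicleAccumulatorMp(); // fresh, empty instance per key
    }
}
```

Because the container clones the prototype once per message key, instance state is single-threaded by construction, which is what lets the POJO stay free of locks and queues.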
Trade-offs
• There's no free lunch
  • Sacrifice guaranteed delivery, message ordering, message uniqueness
  • Gain response time
  • Gain simple clustering
  • Gain memory efficiency (no queuing)
  • Gain lower latency under load
• Where this works
  • Statistically based analytics
  • Techniques where sacrificing some input data only slightly degrades output quality
• Where this doesn't work
  • Transaction-based systems
  • Techniques where a lost message produces 'false' results (e.g., bank transactions)
Dempsy – MP Lifecycle Diagram
[Diagram: message-processor lifecycle. At startup the prototype is constructed and @Start is called; when a message arrives for a new key, the container clones the prototype (clone()) or explicitly instantiates it, calls @Activate, and the instance becomes Ready. @MessageHandler runs for each message; @Output runs on the scheduled output cycle; @Evictable is consulted on the scheduled evict check, and evicted instances are passivated (@Passivate) and reclaimed by JVM GC / finalize. Elasticity-driven passivate/activate transitions are marked as proposed or future additions.]