Hui Lei, Fan Ye IBM T. J. Watson Research Mobile CrowdSensing and Context-aware Real-time Data Fusion in MCS Applications
Outline • Mobile Crowdsensing and its applications • A general MCS platform and brief research agenda • Context-aware real-time data fusion in MCS applications • Relevance to Army scenarios • Summary
With contributions from Han Chen and Raghu Ganti
What is Mobile Crowdsensing? • Mobile sensing devices are pervasively available and are a rich, inexpensive source of sensor data • One data point: 59.6 million iPhone users • Sensors embedded in the iPhone: GPS, accelerometer, gyroscope, ambient light, proximity, microphone, and camera • Mobile crowdsensing (MCS) refers to applications that leverage consumer mobile devices (e.g., GPS units, smartphones, and in-vehicle sensors) to collect and share data about the user or the physical world, either interactively or autonomously, toward a common goal • MCS enables a new category of applications, both participatory and opportunistic, including smarter city, smarter transportation, and smarter energy applications, without requiring major investment in sensing infrastructure • [Figure: mobile sensor data collection, analysis, and consumption]
Example MCS application: Public safety • Sample phones to obtain human crowd density and movements over large areas to maintain public safety • Selectively sample a set of devices for location, density, temperature, and noise levels • Aggregate to obtain a global picture of crowd density and movements • Handle emergencies such as disasters (e.g., hurricanes, high-rise building fires) and terror attacks • Authorities (police, fire departments) can gain an overall picture of the situation at different places to prioritize and coordinate the response • Individuals can report their locations (e.g., room/floor when trapped in a high-rise) and send video, picture, voice, or text reports on the detailed situation • Public event order maintenance • Stampedes cause numerous deaths and injuries • Cambodia bridge ('10): 350 deaths; The Station nightclub, Rhode Island ('03): 100 deaths, 230 injured; Love Parade, Germany ('10): 19 deaths, 342 injured; Mecca, Saudi Arabia: more than 2,000 deaths since 1990 • Dozens of incidents around the globe in the 21st century alone, with thousands of deaths and injuries • Authorities can estimate the number of people participating in a public event (e.g., protests, gatherings, open-air concerts) and ensure the orderly movement of people • Help track and model disease spread (e.g., SARS) • Based on the movement patterns and density of crowd flows, the CDC could build better models to predict the scope and speed of disease spread and take appropriate control measures
Mobile Crowdsensing Presents Unique Challenges (compared to conventional sensor-based applications) • The population of mobile sensing devices is highly dynamic; at times there may be a surplus of, or gaps in, sensing capability • Depending on resource availability on the device, the sensing function is not always available for external use • Crowdsensing data may serve many diverse use cases, whereas a conventional sensor network typically supports a single use case • Human participants are an important part of crowdsensing. A social architecture with incentive mechanisms is required to recruit, engage, and retain them, and their privacy must be preserved
Current State of the Art: Application Silos • Based on a study of nearly 30 existing crowdsensing applications • Each application requires two application-specific pieces, one on the device and one in the backend • Data is pushed from device to backend, with optional primitive processing on the device, upon triggering conditions (e.g., entering a store, riding a bus) • Limitations of the current paradigm • Inability to scale • Phones have a cap on the number of concurrent applications • Data gathered from societal-scale sensing may overwhelm the network and backend servers • Low efficiency: applications perform sensing and processing independently, without understanding their effects on each other • Likely duplicate sensing and processing across applications • No collaboration or coordination across devices, even though not all devices are needed when the device population is dense • Hard to program • Applications must address challenges in energy, privacy, and data quality in an ad hoc manner, reinventing the wheel each time • Applications must deal with heterogeneous devices, which limits the number of device platforms an application can run on • [Figure: application silos; each of App 1, App 2, …, App k deploys its own component on every device and in the backend]
Our Vision: A General MCS Platform [Figure: platform layers, top to bottom: applications (app1, app2, app3; smart grid, smart building, smart healthcare, smart supply chain); data center (Domain Analytics Library, MCS Data Broker, Application Gateway); wide area network; access appliances (MCS Gateway); mobile sensing devices (MCS Device Agent, Social Architecture, MCS Data Collector)] • Develop a general platform reusable across different mobile crowdsensing applications • On devices: provide middleware components that run on mobile sensing devices to enable crowdsensing in a coordinated, privacy-preserving, and energy-adaptive manner • MCS Device Agent: supports interaction with the backend infrastructure • Social Architecture: supports interaction with the human participants in crowdsensing • MCS Data Collector: supports interaction with embedded sensors • At the cloud backend: • Provide MCS Gateways at the edge of the network to connect with local mobile devices and present an aggregated view of nearby sensor data to backend applications • Provide an MCS Data Broker as a backend application service that consolidates data needs from multiple applications and discovers and orchestrates the data those applications need • Build a rich Domain Analytics Library for processing spatio-temporal crowdsensing data to derive domain-specific insight
Research Agenda in Brief • Broker: understand data-needs specifications from applications and negotiate with gateways to select those that can provide the requested data • Gateways: maintain metadata about device-level and aggregate data availability; select devices that can produce the desired data and generate sensing directives to configure their sensing plug-ins; monitor changes in data availability and quality, and adapt to ensure data needs are satisfied continuously • Devices: a set of commonly used sensing analytics running on devices, and a device agent that coordinates sensing and local processing for efficiency • Application-specific problems in both local sensing and backend mining
Real-time Data Fusion in the Monitoring of Human Crowd Distribution and Movements • In airports, railway stations, and public gatherings, how can we gain an accurate overview of the distribution and movements of human crowds? • Conventional infrastructure-based methods (e.g., camera-based) have drawbacks • Constrained by camera angle, lighting conditions, movement speed, and population density • Hard to track movements across cameras • Complete coverage requires careful planning • Cost of installing and maintaining the infrastructure • Using their sensors (radios, microphones), phones can detect other devices in their vicinity and estimate the size of the local neighborhood • The overall distribution and movements are obtained by fusing information from individual devices • Interesting tradeoff between efficiency and timeliness • Latency is lowest when phones sense and report continuously, but energy and bandwidth efficiency is worst • Key to efficiency: each device's scanning frequency should depend on 1) how much has changed since its last scan, and 2) how much other devices have already reported • The more the neighborhood changes, the more frequently it should be scanned • Devices can piggyback how much they have reported during the scanning exchange, or a device waking up can first obtain hints from the backend • Generalized as an optimization problem: given a tolerable latency, how should devices adapt their scanning so that energy consumption is minimized? • Joint work with Tian He, University of Minnesota
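The scanning rule above can be sketched as a simple heuristic. This is a minimal sketch, not a solution to the optimization the slide poses: it assumes each scan yields a set of neighbor IDs, uses the Jaccard distance between consecutive scans as the measure of change, and treats the tolerable latency as a hard cap on the interval. The function names, the halving and back-off factors, and the 0.3 change threshold are all hypothetical.

```python
def neighborhood_change(prev, curr):
    """Jaccard distance between two scans: 0.0 = same neighbors, 1.0 = all new."""
    union = prev | curr
    if not union:
        return 0.0
    return 1.0 - len(prev & curr) / len(union)


def next_scan_interval(interval_s, prev, curr, peer_reports, max_latency_s,
                       min_interval_s=5.0, change_threshold=0.3):
    """Hypothetical heuristic for a device's next scanning interval (seconds).

    interval_s    -- interval used for the scan that just finished
    prev, curr    -- sets of neighbor IDs seen in the previous and current scans
    peer_reports  -- recent reports for this area piggybacked by neighbors
    max_latency_s -- tolerable latency; the interval never exceeds it
    """
    if neighborhood_change(prev, curr) > change_threshold:
        interval = interval_s / 2      # neighborhood is changing: scan more often
    else:
        interval = interval_s * 1.5    # neighborhood is stable: back off
    # Each recent peer report already covers part of this area, so back off further.
    interval *= 1 + 0.5 * peer_reports
    # Clamp between a minimum interval (caps energy use) and the latency bound.
    return min(max(interval, min_interval_s), max_latency_s)
```

Multiplicative back-off keeps the rule cheap enough to run on a phone; solving the slide's optimization problem would instead choose intervals jointly across devices.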
Context-aware Data Fusion in Queuing Time Estimation • Queuing time is an important piece of information in many application scenarios • Queues at check-in counters, security, and restaurants in airports; check-out lines in supermarkets • Important for airport operators to foresee potential bottlenecks and for passengers to plan their journeys • Infrastructure-based methods (e.g., BLIP Systems) require significant human and monetary costs for installation and maintenance • Use accelerometer data to infer changes in human activity and estimate queuing time • Sample data is promising: queuing shows different patterns compared to walking [Figure: sample data labeled Food Queue, Check-out Queue, and Walking] • Use context input to improve accuracy and reduce false alarms • A person wandering around may be falsely interpreted as being in a queue • Leverage context: similar temporal patterns from different passengers in the same spatial scope indicate they are more likely in a queue • People in the same queue move one after another, each covering about the same distance in about the same time • Collecting more data to measure how reliably context differentiates wandering from queuing • Generalization: how to build a framework that exploits spatial and temporal correlation in context to infer information more reliably? • Joint work with Jiawei Han, UIUC
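A rule-based sketch of the queue detection and context check described above, assuming an on-device classifier (hypothetical, not shown) has already turned accelerometer data into per-second "still"/"walk" labels. A queue is modeled as stop-and-go: stops separated by short forward steps. The context check accepts a queue only when several devices in the same zone report episodes of similar length. All names and thresholds are illustrative assumptions, not the method from the joint work.

```python
from itertools import groupby
from statistics import median


def runs(labels):
    """Collapse per-second activity labels into (label, duration_s) runs."""
    return [(label, len(list(group))) for label, group in groupby(labels)]


def queue_time(labels, max_step_s=5, min_steps=3):
    """Duration (s) of the longest stop-and-go episode, or 0 if none.

    An episode is a stretch of 'still' runs separated by short 'walk' runs
    (<= max_step_s), containing at least min_steps such short steps.
    A longer 'walk' run ends the episode.
    """
    best, length, steps = 0, 0, 0
    for label, dur in runs(labels):
        if label == "still":
            length += dur
        elif dur <= max_step_s:      # short shuffle forward, typical of a queue
            length += dur
            steps += 1
        else:                        # sustained walking ends the episode
            if steps >= min_steps:
                best = max(best, length)
            length, steps = 0, 0
    if steps >= min_steps:
        best = max(best, length)
    return best


def corroborated(durations, min_people=3, tolerance=0.5):
    """Context check: accept a queue only if at least min_people devices in the
    same zone report episodes within tolerance of the median episode length."""
    active = [d for d in durations if d > 0]
    if len(active) < min_people:
        return False
    m = median(active)
    return sum(1 for d in active if abs(d - m) <= tolerance * m) >= min_people
```

A lone wanderer who happens to stop and shuffle produces an episode, but fails the context check unless enough neighbors show a similar pattern.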
Penetration Threshold for Reliable Data Fusion • Samples may be sparse: not everyone carries a phone, and not every phone has the MCS agent installed • What is the minimum penetration threshold needed for reliable fusion of results? • Examples: how many samples are needed for 90% confidence in queuing time estimation or fuel consumption prediction? • Model the fusion from a theoretical perspective • Human arrivals follow a Poisson distribution; service times follow an exponential distribution • Derive statistical bounds on the threshold needed to achieve a given confidence level of estimation • Compare against empirical data for validation • Collect fuel consumption data from a fleet of ~100 trucks • Use data from different fractions of the trucks to predict overall fuel consumption • Find the minimum fraction needed for predictions to fall within a given margin of the actual numbers • Generalization • If only a certain percentage of devices can be sampled, which ones should be chosen to maximize confidence? • How can the selection be made progressively, each time using the previously sampled subset as context to help choose the next batch? • Joint work with Tarek Abdelzaher, UIUC
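One way to obtain such a bound, sketched under extra assumptions: a single-server (M/M/1) queue with arrival rate λ and service rate μ, independent samples, and a normal approximation. For M/M/1, the time a person spends in the system is exponentially distributed with mean 1/(μ − λ), so its coefficient of variation (CV) is 1, and estimating the mean within relative error ε at two-sided confidence 1 − α needs roughly n ≥ (z_{1−α/2} · CV / ε)² samples. Dividing n by the number of people passing through in one estimation window gives a rough penetration threshold. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def mm1_mean_time_in_system(arrival_rate, service_rate):
    """Mean time a customer spends in an M/M/1 queue (waiting + service)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival_rate must be < service_rate")
    return 1.0 / (service_rate - arrival_rate)


def min_samples(confidence, rel_error, cv=1.0):
    """Samples needed so the sample mean falls within rel_error of the true mean
    at the given two-sided confidence (normal approximation, i.i.d. samples).
    cv is the coefficient of variation; it is 1 for the exponentially
    distributed time in system of an M/M/1 queue."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * cv / rel_error) ** 2)


def min_penetration(confidence, rel_error, people_in_window, cv=1.0):
    """Fraction of the people passing through in one estimation window who
    must report, capped at 1.0."""
    return min(1.0, min_samples(confidence, rel_error, cv) / people_in_window)
```

For example, 90% confidence within ±10% needs 271 reports per window; if 2,000 people pass through in that window, roughly 14% of them must carry the agent. People in the same queue have correlated waits, so the independence assumption makes this an optimistic lower bound.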
Relevance to Army problems • In non-conventional combat scenarios that require crowd control, e.g., civilian order maintenance and peacekeeping • Monitor the distribution and movements of crowds of different mixtures • Gain a broad, real-time view of the overall situation • Complementary to existing technologies that rely on special hardware or infrastructure (cameras, UAVs) • Support both an overview of large crowds and zooming in on specific spots • Instruct devices around interesting phenomena to sample more modalities, at higher frequencies, or at finer granularities • In disaster response, where ground conditions and relief resources change quickly • Monitor the evolving situation for victims who need care and attention • Keep track of the locations of response personnel and the amount of relief supplies • Prioritize and coordinate the response so that personnel and supplies are directed to the most urgent cases • The MCS platform has broader impact on intelligence collection on the battlefield • Each GI or vehicle can be equipped with various sensors and mobile devices • Multiple applications, each serving a different purpose (e.g., one for road conditions and one for suspicious activities), can run in parallel on the same MCS platform, collecting data from the same underlying set of devices and sensors • The MCS platform will handle dynamic changes in device population, mobility, and resource levels to ensure the quality of application data
Summary of the status • MCS is a new paradigm for building large-scale sensing applications, but the current approach has major drawbacks • Defined the architecture of the MCS platform and the functions of its components • Identified a number of key application scenarios and a research agenda to drive the development of the MCS platform • Public safety • Airport passenger (pax) flow monitoring • Initial results on context-aware real-time data fusion in a few driving application problems, with academic collaborators • Real-time crowd detection and movement tracking • Context-aware queuing time estimation in public transport • Penetration threshold analysis for reliable data fusion • The general MCS platform will greatly ease the development of applications, including those relevant to the Army • Submissions and ongoing research efforts • Mobile Crowdsensing: Current State and Future Challenges, Raghu Ganti, Fan Ye, Hui Lei, in submission to IEEE Communications Magazine • Ongoing research work with Jiawei Han and Tarek Abdelzaher
Existing Crowdsensing Applications • Environment • Suelo - Human assisted soil monitoring • Common Sense - Air quality monitoring using handheld devices • Ear phone assessment - Noise pollution monitoring • Harbor monitoring - Monitor quality of harbor using mobile phones • PEIR - Personal environment monitoring • Hab watch - Monitor habitats • SoundSense - Noise pollution monitoring • CreekWatch – Creek monitoring • Life in a city • PetrolWatch - Monitor petrol prices using mobile phone cameras • Neighborhood culture and identities inferred from mobile phones • ParkNet - Parking space estimation • Video highlights of events using mobile phones • Walkability - Safety of walking on streets • YellowButton – Emergency reporting • Traffic • CarTel - Traffic using mobile phones • Nericell - Monitoring road and traffic conditions using mobile phones • Cooperative transit tracking • GreenGPS - Fuel consumption • Individual health, entertainment, finance • BikeNet - Bike route monitoring • CenceMe - Sensing presence in social networks • CenWits - Hiker tracking using sensor networks (can be easily extended to mobile phones) • DietSense - Diet monitoring using pictures of what you eat • Clean cooking in India • Market price dispersion - mobile phones + bill scanning • Public works maintenance • Garbage watch • Pothole portal
Crowdsensing: From Autonomous to Participatory • A continuum of effort: mobile crowdsensing varies along a continuum from autonomous to participatory, depending on how much effort individuals must expend to gather data • A range of incentives: the incentives (and incentive systems) needed depend on how much effort is asked of the user, but even autonomous crowdsensing requires some incentive to attract users and encourage them to opt in
[Figure: the mobile crowdsensing continuum from autonomous to participatory; each level adds to the user effort of the level before it
 1 (autonomous): users are made aware of the crowdsensing application and must install the app and opt in to sensing activity
 2: user may turn sensing on/off
 3: user goes to places / takes routes where data is needed
 4: user may take measures to improve the quality of a sample
 5 (participatory): user may need to do things to collect a sample]
Functions at the broker • Receive data-needs specs from applications • What elements should the data-needs spec language have? • Consolidate data needs from multiple applications • Identify and avoid duplicates across data needs. Two cases: • 1) the same data is requested by more than one application; • 2) one application requests higher-quality data (e.g., higher resolution) than another, and that data can be consumed by both • Negotiate with gateways and select those that can provide the requested data • Perhaps gateways could use the same language to describe data availability at the aggregate level? • Reuse or adapt existing data streams for a new data need • 1) a subset of existing streams already fits: same event types, same spatial/temporal scope, and the same or higher resolution/quality; • 2) the new app requires data at a semantically lower level than existing apps (e.g., spikes vs. potholes) • One plausible solution is to make local analytics modular and migrate some of them so that they can run either on devices or on gateways. • Alternative solutions include: 1) allow the broker/gateways to generate and host a small "flow graph" that turns low-level events into high-level ones; the risk is that there is no limit on the complexity of such auto-generated flow graphs. 2) require app developers to make apps flexible enough to use different types of events, and let the broker and apps negotiate switching between event types; the drawback is the extra burden this places on app developers.
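A minimal sketch of the two duplicate cases above, assuming a data need can be reduced to (event type, region, sampling period). Needs for the same event type and region collapse into one stream sampled at the finest requested period, and apps that asked for coarser data downsample it. The `DataNeed` and `Stream` types and `consolidate` are hypothetical, and the semantic-level case (spikes vs. potholes) is not handled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataNeed:
    app: str          # requesting application
    event_type: str   # e.g. "road_accel" or "sound_db"
    region: str       # spatial scope, e.g. a grid-cell or gateway ID
    period_s: int     # sampling period; smaller means higher resolution


@dataclass
class Stream:
    event_type: str
    region: str
    period_s: int
    apps: set = field(default_factory=set)   # subscribers served by this stream


def consolidate(needs):
    """Merge needs with the same event type and region into one stream.

    Case 1 (identical requests) and case 2 (one app asks for finer data than
    another) both collapse to a single stream at the finest requested period.
    """
    streams = {}
    for need in needs:
        key = (need.event_type, need.region)
        stream = streams.get(key)
        if stream is None:
            streams[key] = Stream(need.event_type, need.region, need.period_s, {need.app})
        else:
            stream.period_s = min(stream.period_s, need.period_s)
            stream.apps.add(need.app)
    return list(streams.values())
```

Keying on (event type, region) is the simplest matching rule; a richer spec language would also need overlap tests on spatial and temporal scope.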
Functions at gateways (1) • Divide devices into groups and assign each group to a gateway • In a data center, the gateway is probably a process on a VM. • The division need not follow geographical constraints. What division schemes would make it easier to find device groups that match a data-needs spec? • Maintain metadata about device-level and aggregate data availability • What state to maintain about the data availability of individual devices, and in what form? • This lets gateways track the kind and quality of data each device can produce, so they can select devices and produce sensing directives for them. • What information to maintain about aggregate data availability, and in what form? • This is used for negotiation between the broker and gateways, and for the broker to choose which gateways can satisfy a data need. • Elements of the metadata include: for individual devices, event types, energy, and quality in different metrics; at the aggregate level, event types and quality metrics. The data-needs spec language, or its elements, could likely be reused here. • Select devices that can produce the desired data and generate sensing directives to configure their sensing plug-ins • Given data needs from the broker, how does a gateway decide which devices to choose? • Matching event types is straightforward, but quality may not be. • Define quality as a vector of scalar values, one per dimension, • Or supply a comparator that computes the quality and makes the decision on the fly • How to generate parameters to configure the corresponding sensing plug-ins? • Which sensor to sample, at what granularity, what local analytics to run, etc.
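A sketch of gateway-side device selection, assuming device metadata holds event types, remaining energy, and a per-metric quality map. It supports both options above: the default comparator checks each quality dimension against a threshold, and a caller can pass its own comparator instead. Preferring devices with more battery and the `min_energy` cutoff are assumed policies; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeviceInfo:
    device_id: str
    event_types: frozenset   # event types the device can produce
    energy: float            # remaining battery fraction, 0.0 to 1.0
    quality: dict            # quality metric name -> score (higher is better)


@dataclass
class SensingDirective:
    device_id: str
    event_type: str
    period_s: int            # how often the device's sensing plug-in should sample


def meets_thresholds(quality, required):
    """Default comparator: every required metric must reach its threshold."""
    return all(quality.get(metric, 0.0) >= t for metric, t in required.items())


def select_devices(devices, event_type, required, k, period_s,
                   comparator=meets_thresholds, min_energy=0.2):
    """Pick up to k devices that can produce event_type at the required quality,
    preferring devices with more remaining energy, and emit a directive for each."""
    eligible = [d for d in devices
                if event_type in d.event_types
                and d.energy >= min_energy
                and comparator(d.quality, required)]
    eligible.sort(key=lambda d: d.energy, reverse=True)
    return [SensingDirective(d.device_id, event_type, period_s) for d in eligible[:k]]
```

Sorting by energy spreads the sensing load; a utility-based gateway could replace that sort key with a score combining quality and cost.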
Functions at gateways (2) • Device selection • May need a universal language and negotiation protocol that runs between adjacent layers: between devices and gateways, and between gateways and the broker • Monitor changes in data availability and quality, and adapt to ensure data needs are satisfied continuously • Select different subsets of devices, or adapt their sensing directives. • If gateways can no longer absorb the changes, they may need to notify the broker to select other gateways. • May need to define common quality metrics and control mechanisms, possibly as a library of common quality-control modules reusable by different apps. • A utility-based approach could drive the adaptation (e.g., selecting different devices or adapting sensing directives), weighing the utility to apps (benefits) against the cost to devices and their owners. • 'De-perturb' the noise added to the data by the privacy mechanism on devices • Aggregate the data and remove the aggregate effect of the noise
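The slides do not name the on-device privacy mechanism. As one concrete, swapped-in example, Warner's randomized response has each device report a yes/no attribute (e.g., "in a queue") truthfully with probability p and flipped otherwise. No single report can be trusted, but the gateway can remove the aggregate effect of the noise. Function names are hypothetical.

```python
import random


def randomize(bit, p, rng):
    """Device side: report the true bit with probability p, else the flipped bit."""
    return bit if rng.random() < p else 1 - bit


def debias(reports, p):
    """Gateway side: estimate the true fraction of 1s from randomized reports.

    E[observed] = p*pi + (1 - p)*(1 - pi), so pi = (observed - (1 - p)) / (2p - 1).
    Requires p != 0.5 (at p = 0.5 the reports carry no information).
    """
    if p == 0.5:
        raise ValueError("p must differ from 0.5")
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)
```

A larger p leaks more about each individual but needs fewer reports for the same accuracy, since the estimator's variance scales with 1/(2p − 1)².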
Functions at devices • A set of common local analytics plug-ins, and a common plug-in development spec • Local analytics written to the spec can be installed and managed in a plug-in management platform (i.e., the agent platform) • There are probably two kinds of plug-ins: • those that call the physical sensor access API of the device OS and provide a device-independent API for analytics processing; • those that take the data produced by the first kind and do some local processing to produce sensing events. • A common plug-in library shipped as part of the MCS middleware will make it easy to develop new apps. • Plug-ins that produce semantically more abstract event types (e.g., potholes) could be composed modularly for efficient placement (a more advanced feature) • The platform needs to maintain the data availability of the hosting device and keep it updated at the associated gateway • Need to decide which elements the data-availability metadata contains. Besides event types and quality metrics, resource availability and user policy should probably be included as well • Need to design an efficient protocol for devices to synchronize the state maintained by gateways. • Privacy protection for device owners • We need to understand which aspects of privacy (e.g., location, activity patterns) owners want to protect, and design mechanisms to provide it. • Incentive mechanisms to recruit, engage, and retain participating owners • This function is likely to span all three layers; we need to define how it interacts with the rest of MCS. • Hand-off protocols for devices moving across the association boundaries of gateways • Debate on the implications of the architecture for business/operating models
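A minimal sketch of the two plug-in kinds, assuming plug-ins are expressed as Python abstract base classes: a sensor plug-in hides the OS sensor API behind `read()`, and an analytics plug-in turns samples into events with `process()`. The replayed accelerometer trace stands in for a real device sensor, and the spike threshold used as a pothole detector is an arbitrary assumption. All class and method names are hypothetical.

```python
from abc import ABC, abstractmethod


class SensorPlugin(ABC):
    """Kind 1: wraps the OS sensor API behind a device-independent read()."""

    @abstractmethod
    def read(self):
        """Return the next sample, or None if no sample is available."""


class AnalyticsPlugin(ABC):
    """Kind 2: consumes samples from a sensor plug-in and emits sensing events."""

    @abstractmethod
    def process(self, sample):
        """Return an event dict, or None if the sample produces no event."""


class ReplayAccelerometer(SensorPlugin):
    """Stand-in for a real accelerometer: replays recorded vertical readings (g)."""

    def __init__(self, trace):
        self._samples = iter(trace)

    def read(self):
        return next(self._samples, None)


class SpikeDetector(AnalyticsPlugin):
    """Flags a 'pothole_candidate' when |vertical acceleration| exceeds a threshold."""

    def __init__(self, threshold_g=1.5):
        self.threshold_g = threshold_g

    def process(self, sample):
        if sample is not None and abs(sample) > self.threshold_g:
            return {"event": "pothole_candidate", "value": sample}
        return None


class PluginHost:
    """Minimal agent platform: pipes one sensor plug-in into one analytics plug-in."""

    def __init__(self, sensor, analytics):
        self.sensor = sensor
        self.analytics = analytics

    def run(self, steps):
        events = []
        for _ in range(steps):
            event = self.analytics.process(self.sensor.read())
            if event is not None:
                events.append(event)
        return events
```

Because the analytics plug-in only sees `read()` output, the same `SpikeDetector` could run on a gateway fed with uploaded samples, which is the kind of placement flexibility the slide mentions.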