Local Business Ambience Characterization Through Mobile Audio Sensing He Wang, Dimitrios Lymberopoulos, Jie Liu
Local Search Experience
Example queries: "crowded bar playing loud pop music", "quiet Thai restaurant outdoor seating"
Today's results offer only a static/stale experience:
• Ratings/Reviews
• Images
• Location/Phone
• URL
Questions they cannot answer: Is it crowded now? What music is it playing now? How loud is the music now? Is it noisy now? Is outdoor seating available now?
Online User Study (65 users)
• Which information would you like to have?
• What is the single most important piece of information?
Zhou et al., IODetector: A Generic Service for Indoor/Outdoor Detection, SenSys 2012
Goal: reliably detect the levels of
• Human chatter
• Music
• Noise
• Occupancy
How? Extract each business's unique ambience using the phone in the user's hands. Among the phone's sensors (…), the approach is audio-based.
Examples of Real Data
Example  Occupancy  Chat    Music      Noise   Near-phone talking
1        normal     high    low        normal  no
2        high       high    high       normal  yes
3        normal     high    very high  low     no
4        normal     normal  very high  high    yes
5        low        low     low        high    no
Challenges
• Multiple sound sources are recorded simultaneously
  • Music, human chatter, environmental noise
  • Sound source separation is hard
• Near-phone talking
  • By the phone owner, or from a nearby TV or speaker
  • Saturates the audio recording, hiding the business's conditions
• A single model across multiple businesses
  • Scalability
Architecture
• Record 7-15 s of audio and split it into 1 s windows
• Feature extraction per 1 s window: time- and frequency-domain features over tiny windows (32K features), reduced via statistics to 40 features
• 1 s models (produced by the model training stage): near-phone talking, occupancy, chat, music, noise, …
• Majority voting over all 1 s model outputs gives the final levels, e.g., Occupancy: High, Chatter: High, Music: Very High, Noise: Normal
• Real time: it takes 1 second to process 1 second of audio
• Privacy: everything runs on the phone!
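The final fusion step, majority voting over the 1 s model outputs, can be sketched as below. This is a minimal illustration, not the authors' code: the function name `majority_vote`, the string level labels, and the `near_phone` flag list are hypothetical, and dropping near-phone-talking seconds before voting follows the slide stating that such segments are ignored. Ties are broken in favor of the level that appeared first, which is an assumption.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(
    labels: Sequence[str],
    near_phone: Optional[Sequence[bool]] = None,
) -> Optional[str]:
    """Fuse per-second model outputs for one ambience dimension into one level.

    labels[i]     -- level predicted by the 1 s model for second i (hypothetical format)
    near_phone[i] -- True if the near-phone talking detector fired for second i
    Returns None when every second was discarded.
    """
    if near_phone is None:
        near_phone = [False] * len(labels)
    if len(near_phone) != len(labels):
        raise ValueError("labels and near_phone must have the same length")
    # Drop seconds flagged as near-phone talking before voting.
    kept = [lab for lab, talking in zip(labels, near_phone) if not talking]
    if not kept:
        return None
    # Counter.most_common orders equal counts by first occurrence,
    # so ties go to the level that appeared earliest in the trace.
    return Counter(kept).most_common(1)[0][0]


if __name__ == "__main__":
    music = ["high", "very high", "very high", "high", "very high", "very high", "normal"]
    talking = [False, False, False, True, False, True, False]
    print(majority_vote(music, talking))  # very high
```

Voting over many 1 s decisions lets a 7-15 s recording tolerate a few misclassified seconds, which is one plausible reason for this design.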
Feature Extraction: Temporal Features
[Figure panels: Letter "S", Letter "O"]
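The slides do not list the individual temporal features, so the sketch below is an assumption: it uses two common temporal audio features, zero-crossing rate (ZCR) and RMS energy, computed over tiny windows and summarized with statistics (mean and standard deviation) into one feature vector per 1 s of audio. The window and hop sizes and the 16 kHz sampling rate are hypothetical. The demo contrasts a noise-like signal (as in the fricative "S") with a periodic one (as in the voiced "O"); the noise-like signal should show a much higher mean ZCR.

```python
import math
import random
from statistics import mean, pstdev
from typing import List, Sequence


def frame(signal: Sequence[float], win: int, hop: int) -> List[Sequence[float]]:
    """Split a 1 s buffer into short ("tiny") overlapping windows."""
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, hop)]


def zero_crossing_rate(w: Sequence[float]) -> float:
    # Fraction of adjacent sample pairs whose sign differs.
    crossings = sum(1 for a, b in zip(w, w[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(w) - 1)


def rms_energy(w: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in w) / len(w))


def temporal_features(signal: Sequence[float], win: int = 256, hop: int = 128) -> List[float]:
    """Per-window ZCR and RMS, summarized by mean and std over the 1 s buffer."""
    frames = frame(signal, win, hop)
    zcr = [zero_crossing_rate(f) for f in frames]
    rms = [rms_energy(f) for f in frames]
    return [mean(zcr), pstdev(zcr), mean(rms), pstdev(rms)]


if __name__ == "__main__":
    sr = 16000  # assumed sampling rate (Hz)
    rng = random.Random(0)
    s_like = [rng.gauss(0.0, 0.1) for _ in range(sr)]  # noise-like, as in a fricative "S"
    o_like = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]  # periodic, as in a voiced "O"
    print("S-like mean ZCR:", round(temporal_features(s_like)[0], 3))
    print("O-like mean ZCR:", round(temporal_features(o_like)[0], 3))
```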
Data Collector on Windows Phone
1. The user selects a business
2. The app collects 15 s of sensor data
3. The user provides the ground truth
Evaluation: 150 real business audio traces, 3-fold cross-validation
[Figure legend: High, Normal]
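A 3-fold cross-validation over the 150 traces can be sketched as follows. This is a generic k-fold split, not the authors' evaluation code: the function name, the shuffling seed, and the trace IDs are hypothetical. Splitting by trace (rather than by 1 s segment) is an assumption made here so that segments from the same business visit never appear in both the training and the test set.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int = 3, seed: int = 0) -> List[Tuple[List[T], List[T]]]:
    """Return k (train, test) pairs; every item lands in exactly one test fold."""
    if not 2 <= k <= len(items):
        raise ValueError("need 2 <= k <= len(items)")
    order = list(items)
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # round-robin assignment to folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    traces = [f"trace_{i:03d}" for i in range(150)]  # hypothetical trace IDs
    for train, test in k_fold_splits(traces, k=3):
        print(len(train), len(test))  # 100 50 on each of the 3 lines
```

Each model is trained on two folds and tested on the third; averaging the three test accuracies gives the cross-validated estimate.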
Audio Recording Time
[Figure panels: 4-level classification, 2-level classification; x-axis: audio recording time]
Near-Phone Talking Detection: 97% accuracy
• Audio segments containing near-phone talking are ignored
Device Variability
[Figure panels: Before Calibration, After Calibration]
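The slides show results before and after calibration but do not describe the calibration procedure. The sketch below assumes the simplest possible scheme: record the same calibration sound on a reference phone and on the target phone, estimate a single broadband gain offset in dB, and apply it to the target phone's audio. The function names are hypothetical; real microphones can also differ in frequency response, which a single gain does not correct.

```python
import math
from typing import List, Sequence


def level_db(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))  # floor avoids log10(0)


def estimate_gain_offset_db(reference: Sequence[float], device: Sequence[float]) -> float:
    """dB to add to the device's audio so it matches the reference phone's level.

    Both recordings are assumed to capture the same calibration sound.
    """
    return level_db(reference) - level_db(device)


def apply_gain(samples: Sequence[float], offset_db: float) -> List[float]:
    gain = 10.0 ** (offset_db / 20.0)  # convert dB offset to a linear amplitude factor
    return [x * gain for x in samples]


if __name__ == "__main__":
    sr = 16000
    reference = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    quieter_phone = [0.5 * x for x in reference]  # same sound, mic about 6 dB less sensitive
    offset = estimate_gain_offset_db(reference, quieter_phone)
    print(round(offset, 2))  # 6.02
    calibrated = apply_gain(quieter_phone, offset)
    print(round(level_db(calibrated) - level_db(reference), 6))
```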
Device Variability: 2-level classification test on 20 traces
Summary
• Real-time business ambience metadata
  • Changes the way users interact with local search results
  • Changes the way search engines rank local businesses
• Mobile audio sensing approach
  • Crowd-sources ambience information from real users visiting businesses
  • >80% accuracy
  • Robust to near-phone talking