Synthesis of Streaming Data from Multiple Sensors via Embedded Data Extraction
April 15th, 2004 Project Report
Magdiel Galán
CSE591: Data Mining, Dr. Huan Liu, Spring 2004
http://www.public.asu.edu/~mgalan/StreamProjApr15.ppt
Outline • Problem/Project Description • Sampling • Smoothing • Clustering • Current Status • Plans
Project Description • Synthesis of Streaming Data from Multiple Sensors (~100’s) via Embedded Data Extraction for mission-critical applications • Work done in conjunction with Motorola’s Human Interface Lab (ongoing project) • Simulation Environment
Project Description • Goal: Develop a driver assistance system that provides feedback, but not control, during unsafe instances • Arising from distractions caused by cellphones, PDAs, e-mail, etc. • Why: Targeting a government initiative to create a safer car environment amid the information-age explosion • How: Develop an intelligent system by mining streaming data from multiple automotive sensors • Development work is being done on a driving simulator with projection screens and up to 400 parameters/sensors, including video links for eye-gaze and foot-pedal movement
Sample Cases • Case Scenario #1: • Passing slow traffic • which slowed down due to an accident • which you are also rubbernecking at • while fidgeting with your radio • Case Scenario #2: • Making a left turn • while hearing directions from MapTracker • while checking the time because you are late • while reaching for the cellphone to answer an incoming call
Simulation Environment • [Figure: 150 Simulated View]
Driving Experience • [Figure: Driver and surrounding sensor/device inputs: Gas, GPS, Internet, Batt, PDA, CellPhone, EngineTemp, GearShift, A/C, Oil, CD, Air Bag, Acceleration, Sonar, Proximity Sensor, RPMs, Lateral Acc., Wheel Rotation, Brake Pressure]
Motivation • Primary Interest: Robotics • Merging of Sensors/Sensor Fusion • optical • proximity (IR, sonar, radar) • location (GPS, visual maps) • movement (actuators, rotations) • system (battery, temperature, bump switches) • Problem: decide the agent’s next best action toward a goal • Not too dissimilar from an automobile environment • Other Applications: • Manufacturing Environment • Increase Yields/Productivity/Reduce Defects using quality control daily monitor data (100’s Parameters 1K’s) • Pentium Ex.: Oxide Thickness, Poly Width, Boron Implant Density, Plasma Etch eV’s, Litho PM, Diffuser RPMs, etc.
Stream Data Properties • Numerical/Continuous • Speed • Steering/Heading • Acceleration (Forward/Lateral) • Distance (Lane Edge, Vehicle in Front) • Categorical • Lane Position • Gear: P/R/D/OD/L1/L2 • Headlights On/Off • Radio/CD On • Incoming Call • Sampling Rate: 60 Hz
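As a rough illustration of this mixed record type, one sample of the stream could be modeled as below. The field names are hypothetical (the simulator's parameter names are not listed in these slides). The helper shows the arithmetic behind the sampling rate: at 60 Hz a sample arrives every 1/60 s (about 16.7 ms), so a 0.5 s window holds 30 samples and a 3 s window holds 180.

```python
from dataclasses import dataclass

SAMPLING_RATE_HZ = 60  # stated sampling rate of the stream


@dataclass
class SensorSample:
    """One sample mixing numerical and categorical attributes (field names are hypothetical)."""
    # Numerical/continuous attributes
    speed_kmh: float
    steering_deg: float
    heading_deg: float
    forward_acc: float
    lateral_acc: float
    lane_edge_dist: float
    lead_vehicle_dist: float
    # Categorical attributes
    lane_position: str
    gear: str  # one of "P", "R", "D", "OD", "L1", "L2"
    headlights_on: bool
    radio_cd_on: bool
    incoming_call: bool


def samples_in_window(seconds: float, rate_hz: int = SAMPLING_RATE_HZ) -> int:
    """Number of samples a time window of the given length contains at rate_hz."""
    return round(seconds * rate_hz)
```

The mix of `float`, `str`, and `bool` fields is what later rules out purely numerical clustering methods.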
Critical/Special Conditions • Left/Right Turn • Passing/Changing Lanes • U-Turn • Reverse • Tailgating • Not On Road
Some Warning Signs • Lane Drifting • Erratic Behavior • droopy eyes • eyes not facing the road • foot/pedal movement does not correspond with road conditions • Incoming Call while performing a Critical Maneuver
Goal • Identify instances outside normal patterns as an indication of an abnormal situation • Hence: need to draw the driver’s attention to the impending situation • Ultimate Goal: • Develop a bootstrapping mechanism that combines driving situation classifiers (e.g., LeftTurn/Passing) with instance selection methods in active learning • Bootstrapping: selecting high-utility data for re-training
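One common way to pick "high-utility" data in active learning is uncertainty sampling: send for labeling and re-training the instances the current classifier is least sure about. The sketch below only illustrates that general criterion; it is not the project's bootstrapping mechanism, and the instance ids and probability lists (e.g. over LeftTurn/Passing/Other) are hypothetical.

```python
import heapq


def least_confident(probabilities: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k instances with the lowest top-class probability.

    `probabilities` maps a (hypothetical) instance id to the classifier's
    predicted class probabilities, e.g. [P(LeftTurn), P(Passing), P(Other)].
    """
    # A low maximum probability means the classifier is unsure, which makes
    # the instance a good candidate for labeling and re-training.
    return heapq.nsmallest(k, probabilities, key=lambda iid: max(probabilities[iid]))
```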
Instance Selection Properties • Selected instances should be representative • Instance selection reduces the number of rows • Ideal outcome of instance selection: • choose a data subset that achieves the same result as the whole data, with little or no deterioration in performance P • Should be model independent: for any two models $M_i$ and $M_j$, $\Delta P(M_i) \doteq \Delta P(M_j)$ [LM01]
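The two criteria above can be checked mechanically once each model has been evaluated on the full data and on the selected subset. This is a minimal sketch: it takes $\Delta P(M)$ as full-data performance minus subset performance, and the tolerances `max_loss` and `tol` are hypothetical values standing in for "little or no deterioration" and "$\doteq$".

```python
def performance_deltas(full: dict[str, float], subset: dict[str, float]) -> dict[str, float]:
    """ΔP(M) for each model M: performance with all data minus performance with the subset."""
    return {model: full[model] - subset[model] for model in full}


def selection_is_acceptable(deltas: dict[str, float],
                            max_loss: float = 0.02, tol: float = 0.01) -> bool:
    """Accept the subset when every ΔP is at most max_loss (little deterioration)
    and all ΔP(M_i), ΔP(M_j) lie within tol of each other (model independence)."""
    values = list(deltas.values())
    return max(values) <= max_loss and max(values) - min(values) <= tol
```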
Problem #1: Sampling • Initial step towards instance selection: select a representative subset • Divide the population into a collection of elements that cover the whole population without overlapping [GHL01] • These elements are called sampling units
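A minimal sketch of sampling units, assuming the population is the sequence of stream samples and each unit is a fixed number of consecutive samples (`unit_size` is a hypothetical parameter). The units are disjoint and together cover every sample, as [GHL01] requires; drawing one random element per unit is only one illustrative way to then select a representative subset.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def sampling_units(stream: Sequence[T], unit_size: int) -> list[list[T]]:
    """Split the stream into consecutive, non-overlapping units covering every element."""
    if unit_size < 1:
        raise ValueError("unit_size must be >= 1")
    return [list(stream[i:i + unit_size]) for i in range(0, len(stream), unit_size)]


def sample_one_per_unit(units: list[list[T]], rng: random.Random) -> list[T]:
    """Draw one element at random from each sampling unit."""
    return [rng.choice(unit) for unit in units]
```

Passing an explicit `random.Random` instance keeps the draw reproducible when the same seed is reused.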
Sampling Results • [Figure: sampling at 10 ms; x-axis: signal duration, y-axis: count]
Problem #2: Smoothing • Reduce/filter out noise and outliers • Smoothing techniques used: • Bin Median/Rolling Average [LM01]/[D03] • Median preferred over mean since it is less sensitive to outliers • Thresholding/Bin Boundaries [LM01]/[HK01] • 10% offset threshold
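The smoothing steps listed above can be sketched as follows. The sketch assumes equal-frequency bins of a fixed count (smoothing by bin medians, as in [HK01]) and a trailing rolling average. It also reads the 10% offset threshold as "flag a value that deviates more than 10% from its reference, such as its bin median". The bin size, the window length, and that reading of the threshold are assumptions, not details given in the slides.

```python
from statistics import median


def bin_median(values: list[float], bin_size: int) -> list[float]:
    """Smoothing by bin medians: each value is replaced by the median of its bin."""
    smoothed: list[float] = []
    for start in range(0, len(values), bin_size):
        chunk = values[start:start + bin_size]
        smoothed.extend([median(chunk)] * len(chunk))
    return smoothed


def rolling_average(values: list[float], window: int) -> list[float]:
    """Trailing moving average; the first window-1 outputs average fewer points."""
    out: list[float] = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        out.append(total / min(i + 1, window))
    return out


def exceeds_offset(value: float, reference: float, offset: float = 0.10) -> bool:
    """True when value deviates from reference by more than `offset` (10%) of it."""
    return abs(value - reference) > offset * abs(reference)
```

The bin median absorbs the spike in `[1, 2, 100]` entirely (all three become 2), which is why the median is preferred over the mean here.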
Pre-Smoothing: Raw Data • [Figure: x-axis: driving time elapsed (minutes); y-axis: speed (km/h), steering (degrees), heading (degrees)]
Raw Data Map • [Figure: course route map; starting point at (0,0)]
Smoothing Results: Median • [Figure: x-axis: driving time elapsed (minutes); y-axis: speed (km/h), steering (degrees), heading (degrees)]
Dr. Liu’s Incremental Instance Selection Algorithm
Given: data streams with instances I
Output: indicative instances
For each data stream
    Do the following incrementally:
        Create a profile P for I
        Check new instance i against P
        If i is an outlier of P
            Return i
        Else
            Update P with i
    End do
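A minimal Python sketch of the loop above, for one numerical stream. The slide does not say what the profile P is or how outliers are detected. This sketch therefore assumes that P keeps a running mean and variance per attribute (Welford's method), that an instance is an outlier when any attribute lies more than `k` standard deviations from its mean, and that P is first built from `warmup` instances. `RunningProfile`, `k`, and `warmup` are all hypothetical. Outliers are yielded (the "Return i" step) and kept out of P; every other instance updates it.

```python
import math
from typing import Iterable, Iterator, Sequence


class RunningProfile:
    """Hypothetical profile P: running mean and variance per attribute (Welford's method)."""

    def __init__(self, n_attrs: int) -> None:
        self.n = 0
        self.mean = [0.0] * n_attrs
        self.m2 = [0.0] * n_attrs  # running sum of squared deviations from the mean

    def update(self, x: Sequence[float]) -> None:
        """Update P with instance x."""
        self.n += 1
        for j, v in enumerate(x):
            delta = v - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (v - self.mean[j])

    def is_outlier(self, x: Sequence[float], k: float) -> bool:
        """Outlier if any attribute is more than k sample standard deviations from its mean."""
        if self.n < 2:
            return False  # too little history to judge
        for j, v in enumerate(x):
            std = math.sqrt(self.m2[j] / (self.n - 1))
            if abs(v - self.mean[j]) > k * std:
                return True
        return False


def indicative_instances(stream: Iterable[Sequence[float]], n_attrs: int,
                         k: float = 3.0, warmup: int = 30) -> Iterator[Sequence[float]]:
    """Incremental instance selection over one data stream (run one generator per stream)."""
    profile = RunningProfile(n_attrs)
    for count, x in enumerate(stream):
        if count >= warmup and profile.is_outlier(x, k):
            yield x  # "Return i": report the indicative instance
        else:
            profile.update(x)  # "Update P with i"
```

Because it is a generator, each instance is inspected once and only the fixed-size profile stays in memory. That fits the one-pass setting discussed under clustering.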
Problem #3: Clustering • Why? • Data is unclassified • Previous results used numerical data on the most significant key parameters • Develop clusters exemplifying ALL attributes • Select instances that do not belong to any cluster as a triggering mechanism
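The triggering idea in the last bullet can be sketched as a membership test: an instance raises a flag when it is not close enough to any existing cluster. Comparing against representative instances rather than centroids keeps the test usable for categorical attributes, provided a suitable distance function is supplied. The `threshold` value, the `distance` callable, and the sample representatives are assumptions for illustration.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def should_trigger(x: T, representatives: Sequence[T],
                   distance: Callable[[T, T], float], threshold: float) -> bool:
    """True when x lies farther than `threshold` from every cluster representative,
    i.e. x belongs to no cluster. With no clusters yet, every instance triggers."""
    return all(distance(x, r) > threshold for r in representatives)


# Usage with purely numerical instances and Euclidean distance (math.dist, Python 3.8+).
reps = [(60.0, 0.0), (30.0, 15.0)]  # hypothetical (speed, steering) representatives
print(should_trigger((61.0, 1.0), reps, math.dist, threshold=5.0))   # False: near the first
print(should_trigger((95.0, 40.0), reps, math.dist, threshold=5.0))  # True: near neither
```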
Stream Clustering Challenges • Large “Unclassified” Database • Fast On-Line Resolution within a Small Window • 0.5 to 2 or 3 seconds • One-Pass-Only Restriction (need fast I/O) • Mix of Numerical and Categorical Data • Traditional algorithms do not work well for categorical attributes (remember P/R/D/OD/L1/L2, or CD On) • The centroid approach cannot be used • Hard to reflect the properties of the neighborhood of the points • Memory Constraints
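The mixed-data, one-pass, and memory constraints can be illustrated by combining two well-known techniques that are not this project's method. The first is a k-prototypes-style dissimilarity: squared Euclidean distance on the numerical attributes plus a weight `gamma` times the number of categorical mismatches. The second is single-pass "leader" clustering, which keeps actual records as cluster leaders instead of centroids. `gamma`, `threshold`, and `max_clusters` (a stand-in for the memory limit) are hypothetical parameters.

```python
from dataclasses import dataclass


@dataclass
class Record:
    num: tuple[float, ...]  # numerical attributes (should be normalized in practice)
    cat: tuple[str, ...]    # categorical attributes, e.g. gear, radio on/off


def mixed_distance(a: Record, b: Record, gamma: float = 1.0) -> float:
    """Squared Euclidean distance on numbers plus gamma * count of categorical mismatches."""
    numeric = sum((x - y) ** 2 for x, y in zip(a.num, b.num))
    mismatches = sum(x != y for x, y in zip(a.cat, b.cat))
    return numeric + gamma * mismatches


def leader_clustering(stream, threshold: float, max_clusters: int, gamma: float = 1.0):
    """One pass: each record joins the first leader within `threshold`, else founds a
    new cluster. Once `max_clusters` leaders exist, records that fit no cluster are
    collected as unassigned (candidate triggers). Returns (leaders, counts, unassigned)."""
    leaders: list[Record] = []
    counts: list[int] = []
    unassigned: list[Record] = []
    for rec in stream:
        for i, lead in enumerate(leaders):
            if mixed_distance(rec, lead, gamma) <= threshold:
                counts[i] += 1
                break
        else:  # no leader was close enough
            if len(leaders) < max_clusters:
                leaders.append(rec)
                counts.append(1)
            else:
                unassigned.append(rec)
    return leaders, counts, unassigned
```

Leader clustering is order dependent, like BIRCH, but it needs no random sampling and never revisits a record.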
Clustering Techniques vs. Streaming Data • SVM • Good: handles multidimensional data • Not good: needs classified data, lots of I/O, data in memory • BIRCH • Good: handles multidimensional data and large databases; single scan, linear I/O time • Not good: predominantly for “numerical” attributes; order dependent
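BIRCH's single scan and linear I/O rest on its clustering feature CF = (N, LS, SS). Each subcluster is summarized by its point count, the linear sum of its points, and the sum of their squared norms. Two CFs merge by simple addition, so raw points never need to be revisited. The sketch below shows only that summary, not the CF tree with its branching factor and threshold. It also makes the "numerical attributes" limitation visible, since LS and SS are undefined for values like P/R/D.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusteringFeature:
    """BIRCH clustering feature CF = (N, LS, SS) for d-dimensional numeric points."""
    n: int
    ls: list[float]  # linear sum of the points
    ss: float        # sum of the squared norms of the points

    @classmethod
    def of_point(cls, x: list[float]) -> "ClusteringFeature":
        return cls(1, list(x), sum(v * v for v in x))

    def add(self, other: "ClusteringFeature") -> "ClusteringFeature":
        """CF additivity: merging two subclusters only adds their summaries."""
        return ClusteringFeature(self.n + other.n,
                                 [a + b for a, b in zip(self.ls, other.ls)],
                                 self.ss + other.ss)

    def centroid(self) -> list[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        """Root-mean-square distance of members to the centroid: sqrt(SS/N - |LS/N|^2)."""
        c2 = sum(v * v for v in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))
```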
Clustering Techniques vs. Streaming Data (2) • CURE (Clustering Using REpresentatives) [D03] • Good: handles outliers; hierarchical • Not good: relies on random sampling (won’t fit streaming) • ROCK (RObust Clustering using linKs) [D03] • Good: hierarchical clustering for categorical attributes • Not good: relies on random sampling to scale up
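ROCK's notion of links can be sketched for categorical records encoded as sets of attribute=value items. Two records are neighbors when their Jaccard similarity is at least θ, and link(p, q) counts their common neighbors. This simplified sketch does not count a record as its own neighbor and omits ROCK's goodness measure and its random sampling step. θ and the item encoding are illustrative assumptions. Computing all pairwise neighbors is quadratic in the number of records, which is why ROCK samples to scale up.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """|a ∩ b| / |a ∪ b|; two empty records count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rock_links(records: list[frozenset], theta: float) -> dict[tuple[int, int], int]:
    """link(p, q) = number of common neighbors; p and q are neighbors when Jaccard >= theta."""
    n = len(records)
    neighbors = [
        {j for j in range(n) if j != i and jaccard(records[i], records[j]) >= theta}
        for i in range(n)
    ]
    return {(i, j): len(neighbors[i] & neighbors[j]) for i, j in combinations(range(n), 2)}


# Usage: categorical driving states encoded as attribute=value items (hypothetical).
records = [
    frozenset({"gear=D", "radio=on", "call=no"}),
    frozenset({"gear=D", "radio=on", "call=yes"}),
    frozenset({"gear=D", "radio=off", "call=no"}),
    frozenset({"gear=R", "radio=off", "call=yes"}),
]
links = rock_links(records, theta=0.5)
```

Records 1 and 2 are not neighbors of each other (Jaccard 0.2), yet both are neighbors of record 0. They therefore get a link of 1, which is how ROCK captures neighborhood information that a single pairwise similarity misses.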
My 1st Clustering Attempt • [Figure: Move in Reverse]
My 1st Clustering Attempt (2) • [Figure: region zoomed on next slide]
My 1st Clustering Attempt (3) • [Figure: Move in Reverse]
Current Status/Plans • This is an ON-GOING project • Cluster Technique Development • Evolve from known methods? • Generalization of the technique • Not just Automobile Streaming Data
References • [LM01] H. Liu, H. Motoda. "Data Reduction via Instance Selection". In Instance Selection and Construction for Data Mining. Kluwer Academic Publishers (KAP), 2001. ASU Library • [GHL01] B. Gu, F. Hu, H. Liu. "Sampling: Knowing Whole from Its Part". In Instance Selection and Construction for Data Mining. Kluwer Academic Publishers (KAP), 2001. ASU Library • [HK01] J. Han, M. Kamber. Data Mining: Concepts and Techniques. Chps. 3, 8 (Data Cleaning, Clustering). Morgan Kaufmann. ASU Library • [D03] M. Dunham. Data Mining: Introductory and Advanced Topics. Chps. 3-5 (Mining Techniques, Classification, Clustering). Prentice Hall. ASU Library