
Magdiel Galán CSE591: Data Mining Dr. Huan Liu Spring 2004

Synthesis of Streaming Data from Multiple Sensors via Embedded Data Extraction. Project Report, April 15th, 2004. http://www.public.asu.edu/~mgalan/StreamProjApr15.ppt



  1. Synthesis of Streaming Data from Multiple Sensors via Embedded Data Extraction • April 15th, 2004 Project Report • Magdiel Galán • CSE591: Data Mining • Dr. Huan Liu • Spring 2004 • http://www.public.asu.edu/~mgalan/StreamProjApr15.ppt

  2. Outline • Problem/Project Description • Sampling • Smoothing • Clustering • Current Status • Plans

  3. Project Description • Synthesis of Streaming Data from Multiple Sensors (~100’s) via Embedded Data Extraction for mission-critical applications • Work in conjunction with Motorola’s Human Interface Lab (ongoing project) • Simulation Environment

  4. Project Description • Goal: Develop a driver assistance system that provides feedback, but not control, during unsafe instances • From distractions caused by cellphones, PDAs, eMail • Why: Targeting a government initiative to create a safer car environment in the information age explosion • How: Develop an intelligent system by mining Streaming Data from multiple automotive sensors • Development work is being done using a driving simulator with projection screens and up to 400 parameters/sensors, including video links for eye-gaze and foot-pedal movement

  5. Sample Cases • Case Scenario #1: • Passing Slow Traffic • which slowed down due to an accident • which you are also rubber-necking • while fidgeting with your radio • Case Scenario #2: • Making a left turn • while hearing directions from MapTracker • while checking the time because you are late • while reaching for the cellphone with an incoming call

  6. Simulation Environment • 150° Simulated View

  7. Driving Experience [Diagram: the Driver at the center, surrounded by information sources and sensors: Gas, GPS, Internet, Batt, PDA, CellPhone, EngineTemp, GearShift, A/C, Oil, CD, Air Bag, Acceleration, Sonar, Proximity Sensor, RPMs, Lateral Acc., Wheel Rotation, Brake Pressure]

  8. Motivation • Primary Interest: Robotics • Merging of Sensors/Sensor Fusion • optical • proximity (IR, sonar, radar) • location (GPS, visual maps) • movement (actuators, rotations) • system (battery, temperature, bump switches) • Problem: decide agent’s next best action vs. a goal • Not too dissimilar from an Automobile environment • Other Applications: • Manufacturing Environment • Increase Yields/Productivity/Reduce Defects using daily quality-control monitoring data (100’s to 1K’s of parameters) • Pentium Ex.: Oxide Thickness, Poly Width, Boron Implant Density, Plasma Etch eV’s, Litho PM, Diffuser RPMs, etc.

  9. Stream Data Properties • Numerical/Continuous • Speed • Steering/Heading • Acceleration (Forward/Lateral) • Distance (Lane Edge, Vehicle in Front) • Categorical • Lane Position • Gear: P/R/D/OD/L1/L2 • Headlights On/Off • Radio/CD On • Incoming Call • Sampling Rate: 60 Hz

  10. Critical/Special Conditions • Left/Right Turn • Passing/Changing Lanes • U-Turn • Reverse • Tailgating • Not On Road

  11. Some Warning Signs • Lane Drifting • Erratic Behavior • droopy eyes • eyes not facing the road • foot/pedal movement does not correspond with road conditions • Incoming Call while performing Critical Maneuver

  12. Goal • Identify Instances outside normal patterns as an indication of an Abnormal Situation • Hence: Need to draw Driver’s Attention to Impending Situation • Ultimate Goal: • Develop a bootstrapping mechanism that combines driving situation classifiers (e.g. LeftTurn/Passing) with instance selection methods in active learning • Bootstrapping: selecting high-utility data for re-training

  13. Instance Selection Properties • Instance representative • Instance selection → reduce rows • Ideal outcome of instance selection • choose a data subset that achieves the same result as the whole data set with little or no deterioration in performance P • Should be model independent • ∆P(Mi) ≐ ∆P(Mj) [LM01]

  14. Problem#1: Sampling • Initial step towards instance selection: select a representative subset… • Divide the population into a collection of elements that cover the whole population without overlapping [GHL01] • These are called sampling units
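
The idea of non-overlapping sampling units that together cover the whole population can be sketched as follows. This is only an illustration, not the project's code: the function names, the unit size of 6 samples (about 100 ms at the 60 Hz rate quoted earlier), and the choice of keeping the first sample of each unit are all assumptions.

```python
def sampling_units(stream, unit_size):
    """Split a sequence into consecutive, non-overlapping units.

    Every element lands in exactly one unit, so the units cover the
    whole population without overlapping.
    """
    return [stream[i:i + unit_size] for i in range(0, len(stream), unit_size)]


def systematic_sample(stream, unit_size):
    """Keep one element per sampling unit (here, hypothetically, the first)."""
    return [unit[0] for unit in sampling_units(stream, unit_size)]


# Hypothetical speed readings (km/h) taken at 60 Hz.
speed = [50.0, 50.2, 50.1, 50.3, 50.2, 50.4,
         51.0, 51.1, 51.3, 51.2, 51.4, 51.5]

units = sampling_units(speed, unit_size=6)      # two units of six samples
reduced = systematic_sample(speed, unit_size=6)  # one value per unit
```

Any other per-unit representative (a random element, a median) could replace `unit[0]` without changing the partitioning step.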

  15. Sampling Results • Sampling at 10 ms (x-axis: signal duration; y-axis: count)

  16. Problem#2: Smoothing • Reduce/Filter out noise and outliers • Smoothing Techniques used: • Bin Median/Rolling Average [LM01]/[D03] • Median preferred over Mean since it is less sensitive to outliers • Thresholding/Bin Boundaries [LM01]/[HK01] • 10% offset threshold
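
A minimal sketch of the two smoothing ideas named above. The slide does not define the "10% offset threshold" precisely; the version below assumes it means a dead band in which a new reading replaces the held value only when it differs from that value by more than 10%. Function names and the sample data are hypothetical.

```python
from statistics import median


def bin_median(values, bin_size):
    """Replace every value in each consecutive bin by that bin's median."""
    smoothed = []
    for i in range(0, len(values), bin_size):
        chunk = values[i:i + bin_size]
        smoothed.extend([median(chunk)] * len(chunk))
    return smoothed


def threshold_hold(values, offset=0.10):
    """Hold the last accepted value until a reading moves more than
    `offset` (as a fraction of the held value) away from it.

    Assumed interpretation of the 10% offset threshold. Note that when
    the held value is 0, any change is accepted.
    """
    if not values:
        return []
    held = values[0]
    out = [held]
    for v in values[1:]:
        if abs(v - held) > offset * abs(held):
            held = v
        out.append(held)
    return out


# The spike of 120 is an outlier that the bin median suppresses.
print(bin_median([50, 51, 120, 52, 53, 54], bin_size=3))
# Small fluctuations around 100 are held; larger moves pass through.
print(threshold_hold([100, 105, 109, 115, 116, 90]))
```

With a mean instead of a median, the first bin would be pulled up to about 73.7 by the single spike, which is the sensitivity to outliers the slide refers to.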

  17. PreSmoothing - RAW Data x-axis: driving time elapsed in minutes y-axis: speed(km/h); steering(degrees), heading(degrees)

  18. RAW Data Map/Course Route Map – starting point at (0,0)

  19. Smoothing Results - Median x-axis: driving time elapsed in minutes y-axis: speed(km/h); steering(degrees), heading(degrees)

  20. Smoothing Results - Median

  21. Smoothing Results - Threshold

  22. Smoothing Results - Threshold

  23. Dr. Liu’s Incremental Instance Selection Algorithm
      Given: Data streams with instances I
      Output: indicative instances
      For each data stream, do the following incrementally:
        Create a profile P for I
        Check new instance i against P
        if i is an outlier of P
          Return i
        else
          Update P with i
      End do
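
One possible reading of this algorithm for a single numeric stream is sketched below. The slide does not say what the profile P contains or how an outlier is decided; this sketch assumes a running mean and standard deviation (updated with Welford's method) and a z-score test. The class name, `z_limit`, and `warmup` are hypothetical choices.

```python
import math


class RunningProfile:
    """Profile P: running mean/variance of the instances accepted so far."""

    def __init__(self, z_limit=3.0, warmup=5):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # sum of squared deviations from the mean
        self.z_limit = z_limit  # assumed outlier cutoff
        self.warmup = warmup    # accept the first few instances unconditionally

    def is_outlier(self, x):
        if self.n < self.warmup:
            return False
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return x != self.mean
        return abs(x - self.mean) / std > self.z_limit

    def update(self, x):
        # Welford's one-pass update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


def select_instances(stream, profile):
    """Yield outliers; fold every other instance into the profile."""
    for x in stream:
        if profile.is_outlier(x):
            yield x
        else:
            profile.update(x)


speeds = [50, 51, 49, 50, 52, 51, 50, 95, 49, 51]
flagged = list(select_instances(speeds, RunningProfile()))
```

As in the pseudocode, a flagged instance is returned and is not used to update the profile, so a single spike does not widen what later counts as normal.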

  24. Outliers

  25. Problem#3: Clustering • Why? • Data is Unclassified • Previous results using Numerical Data on most significant key parameters • Develop clusters exemplifying ALL attributes • Select instances that do not belong to a cluster as triggering mechanism

  26. Stream Clustering Challenges • Large “Unclassified” Database • Fast On-Line Resolution within a small window • 0.5 to 2 or 3 seconds • One Pass Only restriction (need fast I/O) • Mix of Numerical and Categorical Data • Traditional algorithms do not work well for categorical attributes (remember P/R/D/OD/L1/L2, or CD On) • Centroid approach cannot be used • Hard to reflect the properties of the neighborhood of the points • Memory Constraints
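
To make the one-pass and mixed-attribute constraints concrete, here is a hedged sketch. It is not the project's method (the deck says the technique is still under development): it swaps in a simple leader-style clustering with a Gower-style dissimilarity, storing an actual instance per cluster rather than a centroid. All names, the attribute ranges, and the `radius` value are assumptions.

```python
def mixed_distance(a, b, numeric_ranges):
    """Gower-style dissimilarity in [0, 1] for dict-shaped instances.

    Numeric attributes contribute |a - b| / range (capped at 1);
    categorical attributes contribute 0 on a match and 1 otherwise.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            total += min(abs(a[key] - b[key]) / numeric_ranges[key], 1.0)
        else:
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


def one_pass_cluster(stream, numeric_ranges, radius):
    """Single scan: assign each instance to the nearest stored leader,
    or start a new cluster when no leader is within `radius`."""
    leaders = []   # one representative instance per cluster (no centroid)
    labels = []
    for inst in stream:
        best, best_d = None, None
        for idx, leader in enumerate(leaders):
            d = mixed_distance(inst, leader, numeric_ranges)
            if best_d is None or d < best_d:
                best, best_d = idx, d
        if best is not None and best_d <= radius:
            labels.append(best)
        else:
            # Instance fits no existing cluster: a candidate trigger.
            leaders.append(inst)
            labels.append(len(leaders) - 1)
    return leaders, labels


instances = [
    {"speed": 60.0, "gear": "D", "radio": False},
    {"speed": 62.0, "gear": "D", "radio": False},
    {"speed": 5.0, "gear": "R", "radio": False},
    {"speed": 61.0, "gear": "D", "radio": True},
]
leaders, labels = one_pass_cluster(instances, {"speed": 200.0}, radius=0.2)
```

Like BIRCH, this kind of scan is order dependent, and memory grows with the number of leaders, so it only illustrates the constraints rather than resolving them.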

  27. Clustering Techniques vs. Streaming Data • SVM • Good at handling multidimensional data • Not good: needs classified data, lots of I/O, data in memory • BIRCH • Good at handling multidimensional data, large databases; single scan, linear I/O time • Not good: predominantly for “numerical” attributes; order dependent

  28. Clustering Techniques vs. Streaming Data (2) • CURE (Clustering Using REpresentatives) [D03] • Good at handling outliers; hierarchical • Not good: random sampling (won’t fit streaming) • ROCK (RObust Clustering using linKs) [D03] • Good at hierarchical clustering for categorical attributes • Not good: random sampling for scale-up

  29. My 1st Clustering Attempt… Move in Reverse

  30. My 1st Clustering Attempt(2) Zoom Next Page

  31. My 1st Clustering Attempt(3) Move in Reverse

  32. Current Status/Plans • This is an ON-GOING project • Cluster Technique Development • Evolve from known methods? • Generalization of the technique • Not just Automobile Streaming Data

  33. References • [LM01] H. Liu, H. Motoda. “Data Reduction via Instance Selection”. Instance Selection and Construction for Data Mining. 2001. KAP. ASU Library • [GHL01] B. Gu, F. Hu, H. Liu. “Sampling: Knowing Whole From its Part”. Instance Selection and Construction for Data Mining. 2001. KAP. ASU Library • [HK01] J. Han, M. Kamber. Data Mining: Concepts and Techniques. Chps. 3, 8: Data Cleaning, Clustering. Morgan Kaufmann. ASU Library • [D03] M. Dunham. Data Mining: Introductory and Advanced Topics. Chps. 3-5: Mining Techniques, Classification, Clustering. Prentice Hall. ASU Library
