
Data Mining for Engineers

This presentation explores graphic methods for extracting useful diagnostic information from large, messy data sets. Topics include regression analysis, data dependence, averages, standard deviation, correlation, derived variables, and more.


Presentation Transcript


  1. Data Mining for Engineers: Graphic Methods of Pulling Useful Diagnostic Information from Large Messy Data Sets. Slides extracted from a talk given at the Vibo-Rama 2011 Meeting of the Vibration Institute, March 10, 2011, Holiday Inn Express, Latham, NY 12065.

  2. DATA MINING FOR ENGINEERS: Graphic Methods of Pulling Useful Diagnostic Information from Large Messy Data Sets

  3. Talk Outline

  4. Talk Outline

  5. Data Mining for Engineers: Assessment of Learning Questions* You Will Be Asked to Answer When the Talk Ends. Who’s taken a formal course in statistics? What is Regression Analysis? What is data dependence? Who’s tried using statistics to analyze data? What is average? Mean? Median? Standard deviation? What is Correlation? Who knows what derived variables are? Sliders? What other kinds of data manipulators can you think of? What is replicated data, and when can it be used, and not used? When is it OK to delete/not include data points in a statistical analysis? What kinds of non-numerical information might you want to apply statistical methods to? *Questions will be interspersed at the beginning and throughout the presentation to assess participant pre-knowledge as well as audience understanding of and experience with basic statistical parameters.

  6. Answers to Questions. Data Mining for Engineers: Assessment of Learning. Who’s taken a formal course in statistics? (See hands raised.) What is Regression Analysis? Regression Analysis is a statistical approach to forecasting change in a dependent variable on the basis of observed changes in one or more independent variables. Regression Analysis is also known as curve fitting or line fitting, because a regression equation can be used to fit a curve or line to data points. Relationships depicted in a Regression Analysis are, however, associative only, and any cause-effect inference is purely subjective unless otherwise proven. What is a simple definition of Data Dependence? Data dependence is when one set of information is directly related to another. One goal of regression analysis is to find a mathematical relationship that describes the connection between the two sets of data. Who’s tried using statistics to analyze data? (See hands raised.) What is Average? An Average is the total numeric sum of all the data divided by the number of data points. Mathematically it can be stated as follows: Average = Sum of Numbers / Quantity of Numbers. What is Arithmetic Mean? Arithmetic Mean is the average obtained by dividing a sum by the number of its addends. The halfway point between the extreme values in the data is sometimes loosely called the “mean”, but it is properly the midrange, and it generally differs from the arithmetic mean.

  7. More Answers to Questions. Data Mining for Engineers: Assessment of Learning. What is the Median? The Median is the middle value when the data are sorted in order (for an even number of values, the average of the two middle values). Define Standard Deviation. The Standard Deviation is a statistical measure of the spread or variability in a data set. Mathematically, the Standard Deviation is the root mean square (RMS) of the deviations of the values from their arithmetic mean. What is Correlation? Correlation is the degree of positive or negative relationship existing between two measures. What are Derived Variables? Derived Variables are new variables computed from existing ones using a user-provided formula. What are Sliders? In the Data Desk program, Sliders are a rapid way of changing and entering variable values to get quick results. What other kinds of data manipulators can you think of? (Students’ ideas only.) What is Replicated Data, and when can it be used? Replicated data is the process of adding copies of subsets of data you already have into your database. You might want to Replicate Data if it truly strengthens the associated relationship between this … When is it OK to Delete (or Not Include) data points in a statistical analysis? You should Not Delete Data entries if a faulty (untrue) relationship between your data sets would result after deletion. What kinds of Non-numerical Information might you want to apply statistical methods to? Any that help describe actual relationships your data may have.

  8. NSEWACOUSTICS.WORDPRESS.COM Website

  9. NSEWACOUSTICS.WORDPRESS.COM

  10. At the Start … Let me start off by saying: this presentation cannot be appreciated by just looking at a set of static slides presented ONE AT A TIME. What I’m about to show you is highly dynamic and requires the use of a real-time computer. Only after experiencing the dynamic effects of this presentation will you get a real feel for what it’s like to DATA MINE. In this highly interactive presentation I will give you just a glimpse of what you can learn from huge amounts of data in very short order, using some graphic analytical tools that are available today.

  11. 1 MILLION DATA POINTS Did you ever think about what a million data points look like? Have you ever seen a million data points all at once? You’re looking at ’em. The plot below contains 1 million data points.

  12. I can’t believe there’s really 1 million points in this plot … If you don’t believe there are a million data points here, let’s rotate ’em in real time and see if you can pick out each and every point and count them one by one. Rotate Plot. Now that I’m rotating them, do you believe there are 1 million points? Half of them are green.

  13. LOOKING CLOSER AT THIS DATA Here’s an output from the data mining package I’m going to use throughout the day. Below is a plot matrix of the data viewed along three different axes. One of the viewing angles has been magnified to reveal individual points. Half of the data has been highlighted in green.

  14. Here’s a closer look at what this data mining package can tell us. Now, along with some of the multi-plots, we can see a list of the data rows by count. The arrow points to the one-millionth row, which happens to be highlighted green. Other details show data subset icons listed by name and a few other icons that represent the action plots we have made up to this point. The nice thing is that the program keeps track of whatever you do, as you do it, so you can backtrack and review what you did and what you found.

  15. Package Has Dynamic Tables & Plots

  16. Box plots, area plots and Multi-Series Plots

  17. MULTI-PLOT MATRIX

  18. MULTI-PLOT MATRIX: Estimate Accuracy Plot

  19. Stock Market Data: Can Profits Be Estimated from Company Info?

  20. Seeded cloud data

  21. Individual Plots by Data Row

  22. Highlight Data by Clicking

  23. Mining Tools are Menu Driven

  24. Mining Tools are Menu Driven

  25. Mining Tools are Menu Driven

  26. How to Do Regression Analysis

  27. One Click Regression

  28. Multi-Click Multi-Regression

  29. Regression Analysis Live Demo Using Five Variables & 11 Sliders

  30. Regression Analysis Live Demo Using Five Variables & 11 Sliders

  31. Regression Analysis Live Demo Using Five Variables & 11 Sliders

  32. Wheelset Data Multi-Plots: Almost Parallel and Vertically Offset (Measured Data vs. Regression Fit Data)

  33. Lake Michigan Level Analysis

  34. Lake Michigan Water Levels: Predicted Versus Actual

  35. Lake Michigan Water Levels: Predicted Versus Actual

  36. Lake Michigan Water Levels: Predicted Versus Actual

  37. Dynamic Slider Demo

  38. Simulation Using Real Data

  39. Simulation Using Real Data

  40. Simulation Using Real Data
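
The definitions on slide 6 can be made concrete with a few lines of code. This is an illustration only, not the speaker's software (the talk used an interactive package): it computes the average, contrasts it with the midrange (the halfway point between the extremes), and fits a straight line y = a + b*x by ordinary least squares, one common form of regression analysis. The function names and data values are hypothetical.

```python
def average(values):
    """Average (arithmetic mean): sum of the numbers divided by how many there are."""
    return sum(values) / len(values)


def midrange(values):
    """Halfway point between the smallest and largest values; generally NOT the mean."""
    return (min(values) + max(values)) / 2


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x with one independent variable x.

    Returns (a, b): the intercept and the slope.
    """
    x_bar, y_bar = average(xs), average(ys)
    # Sum of cross-products and sum of squares of the deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar  # the fitted line passes through (x_bar, y_bar)
    return a, b


# Hypothetical data, chosen only to illustrate the calculations
data = [2, 3, 3, 4, 13]
print(average(data))   # 5.0
print(midrange(data))  # 7.5

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f} * x")  # y = 0.05 + 1.99 * x
```

The single large value 13 pulls the midrange (7.5) well away from the average (5.0), which is why the two should not be confused. As slide 6 notes, a good line fit shows association only, not cause and effect.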
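
Slide 7's definitions of the median, standard deviation, correlation, and derived variables can be sketched the same way. This is a minimal illustration with hypothetical data and function names. It uses the population standard deviation (divide by n), whereas the sample standard deviation divides by n - 1 (Python's `statistics.pstdev` and `statistics.stdev`, respectively). Correlation is computed as the Pearson coefficient, which measures linear association. The derived variable is a hypothetical example: shaft speed in rpm converted to rotation frequency in Hz by a user-supplied formula.

```python
import math


def median(values):
    """Middle value of the sorted data; mean of the two middle values when the count is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def std_dev(values):
    """Population standard deviation: RMS of the deviations from the arithmetic mean."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def correlation(xs, ys):
    """Pearson correlation: +1 perfect positive, -1 perfect negative, 0 no linear relation."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def derive(values, formula):
    """Derived variable: apply a user-provided formula to every value of an existing variable."""
    return [formula(v) for v in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(median(data))   # 4.5
print(std_dev(data))  # 2.0

print(correlation([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0

# Hypothetical derived variable: shaft speed in rpm -> rotation frequency in Hz
speed_rpm = [1800, 3600]
print(derive(speed_rpm, lambda rpm: rpm / 60))  # [30.0, 60.0]
```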
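
Slides 27 and 28 show regression run with one click and with several independent variables from the package's menus, but the slides do not show the arithmetic behind it. As a generic sketch (not a description of how that package works), multiple linear regression y = b0 + b1*x1 + ... + bk*xk can be fitted by solving the least-squares normal equations (X^T X) b = X^T y. The helper names are hypothetical, and the test data are generated from the known relationship y = 1 + 2*x1 + 3*x2 so the recovered coefficients can be checked.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_regression(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    X = [[1.0] + list(r) for r in rows]  # leading column of 1s produces the intercept b0
    k = len(X[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise)
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coef = multiple_regression(rows, y)
print([round(c, 6) for c in coef])  # [1.0, 2.0, 3.0]
```

Solving the normal equations directly is easy to follow but can lose accuracy when the independent variables are nearly collinear; for real data, a dedicated least-squares solver such as NumPy's `numpy.linalg.lstsq` is the more robust choice.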
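
Slides 34 to 36 compare predicted and actual Lake Michigan water levels. The slides do not say which accuracy measures were used; a common way to summarize such a comparison is with the residuals (actual minus predicted), their RMS value, and the coefficient of determination R^2. The sketch below assumes those measures and uses made-up numbers, not Lake Michigan data; the function name `compare` is hypothetical.

```python
import math


def compare(actual, predicted):
    """Summarize predicted-versus-actual agreement.

    Returns (residuals, rms_error, r_squared), where
    r_squared = 1 - (sum of squared residuals) / (total sum of squares about the mean).
    """
    residuals = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(r * r for r in residuals)
    rms_error = math.sqrt(ss_res / len(residuals))
    mean_actual = sum(actual) / len(actual)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r_squared = 1 - ss_res / ss_tot
    return residuals, rms_error, r_squared


# Made-up values for illustration only
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]
residuals, rms_error, r_squared = compare(actual, predicted)
print(residuals)  # [-1.0, 1.0, -1.0, 1.0]
print(rms_error)  # 1.0
print(f"R^2 = {r_squared:.2f}")  # R^2 = 0.80
```

An R^2 near 1 means the predictions track the actual values closely; an RMS error expressed in the data's own units (for water levels, feet or meters) is often easier for engineers to judge directly.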
