Engineering Data Analysis & Modeling: Practical Solutions to Practical Problems. Dr. James McNames, Biomedical Signal Processing Laboratory, Electrical & Computer Engineering, Portland State University
Course Overview • Key question: How to extract useful information from data? • Some theory • Mostly methods & applications • Problem oriented, not technology focused • Project course
Talk Overview • Problem definitions • Applications • Project ideas • Course specifics
Problem Definitions • Preprocessing (briefly) • Variable selection • Dimension reduction • Decision theory (hypothesis testing) • Density estimation • Nonlinear optimization • Pattern recognition/Classification (very briefly) • Nonlinear modeling (univariate & multivariate)
Variable Selection • Many algorithms fail when given too many inputs • Often fewer inputs are sufficient because some inputs are redundant or irrelevant • Goal: Find a subset of inputs that maximizes model accuracy • Is Greenspan’s BP relevant?
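A minimal Python sketch of the variable-selection goal above, under stated assumptions: the greedy forward search, the leave-one-out 1-nearest-neighbor error used as the accuracy measure, the function names, and the toy data are all illustrative choices, not taken from the course. In the toy data, input 0 drives the output, input 1 is irrelevant, and input 2 is a redundant rescaling of input 0.

```python
def loo_nn_error(X, y, subset):
    """Leave-one-out mean squared error of a 1-nearest-neighbor predictor
    that only looks at the input columns listed in `subset`."""
    n = len(X)
    total = 0.0
    for i in range(n):
        # Nearest other sample, measured on the chosen columns only.
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: sum((X[i][k] - X[j][k]) ** 2 for k in subset),
        )
        total += (y[i] - y[nearest]) ** 2
    return total / n


def forward_select(X, y):
    """Greedy forward selection: add the input that lowers the error most,
    and stop as soon as no remaining input gives a strict improvement."""
    selected, best_err = [], float("inf")
    remaining = list(range(len(X[0])))
    while remaining:
        err, k = min((loo_nn_error(X, y, selected + [k]), k) for k in remaining)
        if err >= best_err:
            break
        selected.append(k)
        remaining.remove(k)
        best_err = err
    return selected


# Hypothetical toy data: columns are (relevant, irrelevant, redundant).
X = [[0, 5, 0], [1, 0, 2], [2, 3, 4], [3, 1, 6], [4, 4, 8], [5, 2, 10]]
y = [0, 1, 2, 3, 4, 5]

selected = forward_select(X, y)
print("selected inputs:", selected)
```

On this toy data the search keeps only input 0: the irrelevant input raises the error, and the redundant input gives no strict improvement. Greedy search is only one option; it can miss subsets whose inputs are useful only in combination.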
Dimension Reduction • Redundant inputs can also be combined into a smaller composite set • Improves accuracy • Reduces computation • If done well, minimal information is lost • Used for signal compression • Principal component analysis is most common
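As a rough illustration of combining redundant inputs into one composite input, the sketch below computes the first principal component with power iteration on the sample covariance matrix. The function name, the starting vector, the iteration count, and the toy data are assumptions made for this example; a numerical library would normally be used instead.

```python
import math


def first_principal_component(X, iters=100):
    """Return (direction, scores, explained_fraction) for the first
    principal component of the rows of X."""
    n, d = len(X), len(X[0])
    mean = [sum(row[k] for row in X) / n for k in range(d)]
    centered = [[row[k] - mean[k] for k in range(d)] for row in X]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration. Assumption: the all-ones start is not orthogonal
    # to the leading direction (otherwise the norm below can be zero).
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Composite input: projection of each centered sample onto v.
    scores = [sum(r[k] * v[k] for k in range(d)) for r in centered]
    total_var = sum(cov[k][k] for k in range(d))
    explained = (sum(s * s for s in scores) / (n - 1)) / total_var
    return v, scores, explained


# Hypothetical data: the second input is exactly twice the first.
X = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
direction, scores, explained = first_principal_component(X)
print(direction, explained)
```

Because the two inputs here are perfectly redundant, a single composite input carries essentially all of the variance, which is the "minimal information is lost" case.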
Nonlinear Optimization • Find the vector a such that E(a) is minimized • Many algorithms have parameters that must be “fit” to the data • Usually “fit” by minimizing error measure • Sometimes subject to a constraint G(a) = 0 • Unconstrained optimization more common • Very widely used • Many engineering applications
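One simple way to "fit" parameters by minimizing an error measure E(a) is gradient descent. The sketch below is an unconstrained example only; the finite-difference gradient, the fixed step size, the iteration count, and the straight-line model are all assumptions chosen so the result is easy to check. The same loop accepts any smooth E(a), but a fixed step that works here may diverge on other problems.

```python
def numerical_gradient(E, a, h=1e-6):
    """Central-difference estimate of the gradient of E at a."""
    grad = []
    for k in range(len(a)):
        up = a[:k] + [a[k] + h] + a[k + 1:]
        down = a[:k] + [a[k] - h] + a[k + 1:]
        grad.append((E(up) - E(down)) / (2 * h))
    return grad


def gradient_descent(E, a0, step=0.1, iters=2000):
    """Repeatedly move a small step against the gradient of E."""
    a = list(a0)
    for _ in range(iters):
        g = numerical_gradient(E, a)
        a = [ak - step * gk for ak, gk in zip(a, g)]
    return a


# Hypothetical data generated from y = 1 + 2x with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]


def E(a):
    """Mean squared error of the model a[0] + a[1] * x on the data."""
    return sum((a[0] + a[1] * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


a_fit = gradient_descent(E, [0.0, 0.0])
print(a_fit)  # close to [1.0, 2.0]
```

A constraint G(a) = 0 is not handled here; that case needs a different method, such as Lagrange multipliers or a penalty term.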
Pattern Recognition • Closely related to nonlinear modeling • Goal is to identify the most likely category given an input vector • Equivalent to drawing decision boundaries • The following example uses crab data with four categories and two composite inputs
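To make the decision-boundary idea concrete, here is a small nearest-centroid classifier with four categories and two inputs. It is a sketch under assumptions: the data are made up (they are not the crab data), and the nearest-centroid rule picks the most likely category only if every class is modeled as an equal-prior Gaussian with the same spherical covariance. Its decision boundaries are straight lines halfway between centroids.

```python
def fit_centroids(X, labels):
    """Average the training vectors of each category."""
    sums, counts = {}, {}
    for x, label in zip(X, labels):
        s = sums.setdefault(label, [0.0] * len(x))
        for k, xk in enumerate(x):
            s[k] += xk
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def classify(centroids, x):
    """Return the category whose centroid is closest to x."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


# Hypothetical training set: two samples per category.
X = [[0, 0], [1, 1], [0, 10], [1, 9], [10, 0], [9, 1], [10, 10], [9, 9]]
labels = ["A", "A", "B", "B", "C", "C", "D", "D"]

centroids = fit_centroids(X, labels)
print(classify(centroids, [2, 3]), classify(centroids, [8, 7]))
```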
Biomedical Application • Goal: identify brain cell types from microrecordings • Current research project • 5 categories of cell types • Created metrics to characterize signals • The following scatterplot shows two of these metrics
Nonlinear Modeling • Given many examples of observed variables, create a model that can predict the output • No other assumed knowledge • Observed variables • Quantitative • Measurable
Nonlinear Modeling • Observed variables may not be causal • Not all causal effects are observed • Model will not be perfect • How do you measure how good the model is?
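One common answer to "how good is the model?" is to measure prediction error on data the model was not fit to. The sketch below uses k-fold cross-validation with mean squared error; the fold assignment, the two candidate models, and the noise-free toy data are assumptions for illustration and are not necessarily the measures used in the course.

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the mean training output."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Least-squares straight line through the training data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cross_val_mse(fit, xs, ys, k=4):
    """k-fold cross-validation: fit on k-1 folds, test on the held-out fold."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in test)
    return total / n


# Hypothetical data from y = 2x + 1 with no noise.
xs = [float(i) for i in range(8)]
ys = [2.0 * x + 1.0 for x in xs]

mse_mean = cross_val_mse(fit_mean, xs, ys)
mse_line = cross_val_mse(fit_line, xs, ys)
print(mse_mean, mse_line)
```

Comparing a candidate model against a trivial baseline such as the mean predictor gives the error a scale: a model that cannot beat the baseline on held-out data has learned nothing useful.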
Smoothing • For single-input single-output (SISO) systems, the data can be plotted • The problem is to estimate a curve that most accurately predicts future points • One could draw a smooth curve by hand • This is more difficult to implement automatically • More than one curve may be reasonable
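A kernel smoother is one automatic way to draw such a curve. The sketch below is a Gaussian-weighted local average (Nadaraya-Watson form); the function name, the bandwidth values, and the data are assumptions for illustration. Different bandwidths give different but defensible curves, which is one reason more than one curve may be reasonable.

```python
import math


def kernel_smooth(xs, ys, x0, bandwidth=1.0):
    """Estimate the curve at x0 as a Gaussian-weighted average of ys.
    Caveat: if x0 is far from all data relative to the bandwidth, the
    weights can underflow to zero and the division fails."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Hypothetical noisy observations of a roughly linear trend.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.2, 5.9]

# A narrow bandwidth follows the points closely; a wide one flattens the curve.
rough = [kernel_smooth(xs, ys, x, bandwidth=0.5) for x in xs]
smooth = [kernel_smooth(xs, ys, x, bandwidth=2.0) for x in xs]
print(rough)
print(smooth)
```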
Nonlinear Modeling • Many methods do not work well • The problem is usually much more difficult because of • Noise • Multiple inputs • Time-varying system • Small data sets • Still an active area of research • Will discuss “tried and true” solutions
Overview of Course • Introduction & review • Linear models • Univariate smoothing • Optimization algorithms • Nonlinear modeling • Pattern recognition & classification
Application Areas • Engineering • Controls (system identification) • Signal processing (estimation & prediction) • Communications (channel equalization) • Statistics • Mathematics • Computer science • Systems science
Application Examples • Time series prediction • Aircraft carrier landing systems • Spatial wafer patterns • Fault detection • Machinery health monitoring • Automated, objective credit rating • Fraud detection
Aircraft Carrier Landing System • Can be very hard • Limited visibility • Rough seas • Night • Predict location at touch down • Flight deck • Aircraft • Is rocking of flight deck predictable?
Machinery Health Monitoring • Cost of machinery failure can be very high • Recent growth in real-time monitoring • Health and Usage Monitoring Systems (HUMS) • Condition Based Maintenance (CBM) • Reduce costs • Increase safety
Fraud Detection • Credit card fraud cost $864 million in 1992 • How quickly can fraud be detected? • Credit card companies have amassed large databases • What are the patterns of fraud? • Active area of research
Past Projects • Many past projects • See reports & slides on the web • Many time series applications • Need not be time series related • Many have resulted in conference and journal publications • Expect improved quality this term
Project Ideas • It is up to you to identify a project • Preferred • Data readily available (no new instrumentation or study design) • Independent samples (not time series data) • Engineering related • High likelihood of success (no financial forecasting)
Course Logistics • Project oriented • Project reports • Must meet IEEE journal requirements • May be encouraged to publish • Brief oral slide presentation at end of term • Most projects are applied • May also create new methods or compare existing methods
Prerequisites • Helpful • Random processes (ECE 565) • Signal processing (ECE 566) • Proficient at MATLAB or similar • Required • Calculus • Probability & statistics (STAT 451) • Linear algebra (MTH 343) • Proficiency at programming