Lecture 1 – Introduction and Organization • Rice ELEC 697 • Farinaz Koushanfar • Fall 2006
Summary • Syllabus • Course outline • Motivation • Class census
Syllabus – ELEC 697 • Title: “Applications of Modern Statistical Learning Theory in Embedded Networked Systems” • Instructor • Farinaz Koushanfar, Rice University • Meeting time • 02:30 PM - 03:50 PM TR • Meeting place • 2014, Duncan Hall • Prerequisites • Self-contained, but assuming undergraduate level knowledge of probability and math
Syllabus – Overview and Goals • Overview • Practical statistical learning methods and tools • Modeling and optimizing emerging embedded systems • Research areas: embedded networked systems, sensor networks, and your own research area, on the assumption that you will need these methods there • Emphasis on the methods rather than the theoretical aspects • Goals • Solid understanding of state-of-the-art learning methods • Hands-on experience with statistical modeling software • Applications of statistical modeling in sensor networks, the Internet, networks, intrusion detection, CAD, and VLSI • A universal tool for your own research
Syllabus – Book and More… • Textbook • The Elements of Statistical Learning: Data Mining, Inference, and Prediction, T. Hastie; R. Tibshirani; J. Friedman; New York: Springer, 2001. • Recommended further reading • Pattern Classification (2nd ed.), R. Duda; P. Hart; D. Stork; Wiley-Interscience, 2001. • Modern Applied Statistics with S-PLUS (3rd ed.), W. Venables; B. Ripley; Springer, 1999. • Papers from the literature • Course webpage • http://www.ece.rice.edu/~fk1/classes/ELEC697.htm
Syllabus – Grading and Project • Grading • Weekly assignments (20%) • Mid-semester oral presentation (15%) • Paper presentation and discussion (15%) • Class project report (30%) • Class project presentation (20%) • Project • Groups of 1 or 2 (collaborations encouraged) • Dataset to analyze and model, can be more theoretical • Either propose or select from my projects/datasets
Syllabus – Software • Hands-on experience with data analysis and modeling tools • S programming language (S-PLUS/R) • You can download R from CRAN at: http://cran.us.r-project.org/ • Documentation is also available at CRAN • Many more resources are available on the web
Course Outline • Week 1: Orientation and overview of supervised learning and its applications in embedded networks • Week 2: Intro to R, linear regression, model selection, validation • Week 3: Applications of regression in embedded networks (HW 0) • Week 4: Linear classification: LDA, logistic regression, separating hyperplanes • Week 5: Applications of classification in embedded networks (HW 1) • Week 6: Available datasets, possible project proposals, and project selection • Week 7: Model assessment and selection • Week 8: Applications of model selection and validation in embedded networked systems (HW 2)
Course Outline (Cont’d) • Week 9: Kernel methods • Week 10: Applications of kernel methods in embedded networked systems (HW 3) • Week 11: Mid-term project proposals and presentations • Week 12: Model inference and averaging: boosting, ML, EM • Week 13: Applications of model inference in embedded networked systems (HW 4) • Week 14: Progress report: presenting the work related to your project and your goals • Week 15: Summary • Week 16: Final project presentation and reports (Report) + Paper presentations!
Class Census • Tell me about yourself! • Your name • Your year of study • Your field – or your interest • Your advisor
Statistical Learning - General • Key role in science, finance, and industry. Examples: • Predict the probability of a second heart attack (demographic, diet, clinical measures) • Predict stock prices 6 months ahead (company performance and economic data) • Identify the numbers in a handwritten zip code • Estimate the glucose level in a diabetic patient's blood (infrared absorption spectrum) • Identify the risk factors for prostate cancer (clinical and demographic variables)
Sensor Networks (SN) • Figure labels: xbow MICA2 DOT motes; contaminant transport; seismic response; environmental sensing • Courtesy of Prof. Deborah Estrin (UCLA-CENS)
Statistical Learning - SN • Classification/target detection • Modeling the biological systems • Inter-sensor modeling • Sleeping coordination, compression, intrusion detection/security • Characterization of sensors - a rapidly growing market, e.g. • Pressure sensors – revenue: $4,018.8M in 2004, projected $5,545.1M in 2011 • Image sensors – $4B++ in 2005, led by the camera phone application • Fiber-optic sensors – $288.1M now, projected $304.3M in 2006 • Bio-sensors – ?? • Proximity, photoelectric, and linear displacement sensors – $1B in 2004, projected $1.05B in 2007 • Nano-sensors – projected to grow by more than 30% by 2009 • Source: Sensors & Transducers Magazine (S&T e-Digest), Vol. 62, Issue 12, December 2005, pp. 456-461
Statistical Learning – VLSI/CAD • Nanometer-scale devices: increased process variation and decreased predictability of circuit performance • Traditionally, corner-case models were used, which is pessimistic • The magnitude of variations in the gate length is predicted to increase from 35% in a 130nm technology to ~60% in a 70nm technology • The variations are specified as the fraction 3σ/μ • The major trade-off is the computational efficiency • Figure: photoresist line pattern PDF (King, Wada, Woo, IEEE Transactions on Semiconductor Manufacturing, Vol. 17, No. 2, May 2004)
Sources of Variations • Process variations • The value of process parameters observed after fabrication • Parametric yield: the fraction of manufactured samples that meet the performance constraints • Environmental variations • Modeling variations • Power and delay models used to perform design, analysis and optimization are inaccurate • Other sources • Change in process parameters with time • Hot electrons • Process instability
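The definition of parametric yield above can be illustrated with a small Monte Carlo sketch. Everything numeric here is an assumption made for illustration: the normal delay model, the nominal delay of 1.0, the standard deviation of 0.05, and the constraint of 1.1 are hypothetical and do not come from the lecture.

```python
import random


def parametric_yield(delays, max_delay):
    """Fraction of samples whose delay meets the constraint delay <= max_delay."""
    passing = sum(1 for d in delays if d <= max_delay)
    return passing / len(delays)


def simulate_delays(n, nominal=1.0, sigma=0.05, seed=0):
    """Draw n hypothetical path delays from a normal model (an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(nominal, sigma) for _ in range(n)]


# Hypothetical "manufactured samples": 10,000 simulated delays.
delays = simulate_delays(10_000)
estimate = parametric_yield(delays, max_delay=1.1)
print(f"estimated parametric yield: {estimate:.3f}")
```

In practice the samples would come from measurements or from a fitted statistical model of the process parameters rather than from a fixed normal distribution.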
The Theme of the Course • About practical learning methods – something you can learn and use in your research • This is neither an embedded system design course nor a sensor network design course! • The research topics are there to motivate real applications of statistical learning in other fields • You do not need any prior knowledge of these subjects to follow this course • Dynamic reading list
Learning from Data • Supervised learning • Outcome measurement: either categorical or quantitative • Predict outcome from a set of features • Training set of data • A good learner can predict a testing set well • Unsupervised learning • Only features, no outcome
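As a minimal sketch of the supervised setting described above, the following trains a learner on a training set and evaluates it on a separate testing set. The data is synthetic, and the one-nearest-neighbor learner is chosen only because it is short; neither is taken from the lecture.

```python
import random


def make_data(n, seed=0):
    """Synthetic data: one quantitative feature and a categorical outcome (0 or 1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 * label, 0.5)  # class 0 centered at 0, class 1 at 2
        data.append((x, label))
    return data


def predict_1nn(train, x):
    """Predict the outcome of the training point whose feature is closest to x."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def error_rate(train, test):
    """Fraction of points in `test` that the learner built on `train` gets wrong."""
    wrong = sum(1 for x, y in test if predict_1nn(train, x) != y)
    return wrong / len(test)


data = make_data(400)
train, test = data[:300], data[300:]  # hold out the last 100 points for testing
print(f"training error: {error_rate(train, train):.3f}")
print(f"test error:     {error_rate(train, test):.3f}")
```

One-nearest-neighbor reproduces its own training labels exactly (barring identical inputs with different labels), so the training error says little here; the error on the held-out testing set is the quantity that indicates how well the learner predicts.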
Example 1: Email Spam • Categorical outcome: spam or email • 4601 email messages • Rule based learning, e.g. • if (%george < 0.6) & (%you > 1.5) then spam else email
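The rule on this slide can be written directly as a small function. This is only a sketch: the dictionary layout and the three example messages are hypothetical, while the thresholds 0.6 and 1.5 are the ones quoted in the rule above.

```python
def classify(features):
    """Apply the slide's rule to the word percentages of one message.

    `features` maps a word to the percentage of words in the message
    equal to that word (a hypothetical representation).
    """
    if features["george"] < 0.6 and features["you"] > 1.5:
        return "spam"
    return "email"


# Hypothetical messages, described only by the two percentages the rule uses.
messages = [
    {"george": 0.0, "you": 2.3},
    {"george": 1.2, "you": 2.0},
    {"george": 0.0, "you": 0.4},
]
for m in messages:
    print(m, "->", classify(m))
```

A learning method would choose the words and thresholds from the training messages instead of fixing them by hand.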
Example 2: Prostate Cancer • Correlation between the level of prostate specific antigen (PSA) and clinical predictors • Regression problem!
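To show what a regression problem looks like in code, here is a least-squares fit of a straight line to one predictor. The numbers are made up for illustration and are not the prostate cancer data; the real problem has several clinical predictors, which this single-predictor sketch does not cover.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical predictor and quantitative outcome values.
predictor = [0.0, 1.0, 2.0, 3.0, 4.0]
outcome = [1.1, 2.9, 5.2, 7.1, 8.8]

intercept, slope = fit_line(predictor, outcome)
print(f"outcome ~ {intercept:.2f} + {slope:.2f} * predictor")
```

The outcome is quantitative rather than categorical, which is what makes this a regression problem and not a classification problem.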
Example 3: Handwritten Digit Recognition • Automatic envelope sorting procedure • 16x16 8-bit grayscale images, with intensities from 0 to 255 • Classification problem!
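A 16x16 grayscale image becomes input to a classifier once it is turned into a feature vector. The sketch below flattens one image into 256 features; the scaling to the range 0 to 1 and the hand-drawn stroke are assumptions made for illustration, not part of the lecture.

```python
SIZE = 16  # images are 16x16 pixels, as on the slide


def make_blank_image():
    """Return a 16x16 image with every intensity set to 0."""
    return [[0] * SIZE for _ in range(SIZE)]


def to_feature_vector(image):
    """Flatten a 16x16 image of 0-255 intensities into 256 features in [0, 1]."""
    if len(image) != SIZE or any(len(row) != SIZE for row in image):
        raise ValueError("expected a 16x16 image")
    return [pixel / 255 for row in image for pixel in row]


# A crude vertical stroke, standing in for a handwritten "1" (hypothetical data).
image = make_blank_image()
for r in range(2, 14):
    image[r][8] = 255

features = to_feature_vector(image)
print(len(features), "features; total intensity:", sum(features))
```

Each image then contributes one row of 256 features plus a categorical outcome (the digit), which is the form a classification method expects.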