Data Mining – Regression

Dr. Avi Rosenfeld

Uses of regression
• Prediction
  • We have a collection of data and want to understand what will happen in the future
  • Example: linear regression (which you have already covered)
• Classification
  • We have a collection of data and want to categorize it
  • Linear regression can also be used
  • SVM (Support Vector Machine)
  • Logistic Regression
  • The topic of today's lecture

Linear regression for prediction
[Figure: regression line; axes: independent variable (x), dependent variable]
• You have a collection of data
• Fit a line that minimizes some measure of error
• If this succeeds, the line is a good tool for prediction
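The line-fitting step above can be sketched in a few lines of plain Python. This is a minimal illustration that assumes squared error as the measure being minimized and uses made-up data; the function name `fit_line` is hypothetical and not part of the lecture.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
# Use the fitted line to predict the value at an unseen point x = 5.
prediction = slope * 5.0 + intercept
```

Once the slope and intercept are known, prediction is just evaluating the line at a new x.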

Example

What are we trying to minimize? Which objective function?
• Absolute error (Least Absolute Error)
• Squared error (Least Square Error)
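To make the difference between the two objectives concrete, the sketch below evaluates both for one candidate line on made-up data containing a single outlier. The helper names and the data are assumptions for illustration only.

```python
def absolute_error(xs, ys, slope, intercept):
    """Sum of absolute residuals of the line y = slope * x + intercept."""
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))


def squared_error(xs, ys, slope, intercept):
    """Sum of squared residuals of the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: three points on y = x plus one outlier with residual 10.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 13.0]

abs_err = absolute_error(xs, ys, 1.0, 0.0)
sq_err = squared_error(xs, ys, 1.0, 0.0)
```

The same outlier contributes 10 to the absolute error but 100 to the squared error, which is why a squared-error fit is pulled more strongly toward outliers.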

Nonlinear Regression
Nonlinear functions can also be fit as regressions. Common choices include power, logarithmic, exponential, and logistic functions, but any continuous function can be used.
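One common way to fit an exponential curve is to take logarithms and fit a straight line to the transformed data. The sketch below assumes the model y = a * exp(b * x) with all y values positive; note that it minimizes squared error in log space, which is not identical to minimizing it on the original scale. The function name and data are hypothetical.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on (x, ln y). Requires all y > 0."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    # Ordinary least-squares slope and intercept for ln y = ln a + b * x.
    b = (sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
         / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_l - b * mean_x)
    return a, b


# Hypothetical noise-free data generated from y = 3 * exp(0.5 * x).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [3.0 * math.exp(0.5 * x) for x in xs]

a, b = fit_exponential(xs, ys)
```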

Regression for classification – decision tree

A simpler model: regression

The problem: it is not always clear where to cut
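Assuming "where to cut" refers to choosing the threshold that turns a regression score into a category, the sketch below shows how different cut points on the same made-up scores give different accuracies, and how two different cuts can tie. The scores, labels, and function name are hypothetical.

```python
def accuracy(scores, labels, threshold):
    """Fraction classified correctly when predicting 1 for score >= threshold."""
    correct = sum(
        1 for s, y in zip(scores, labels)
        if (1 if s >= threshold else 0) == y
    )
    return correct / len(labels)


# Hypothetical regression outputs and true categories (not linearly separable).
scores = [0.1, 0.3, 0.45, 0.5, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

# Try a few candidate cut points and record the accuracy of each.
results = {t: accuracy(scores, labels, t) for t in (0.45, 0.5, 0.7)}
```

Here the "natural" cut at 0.5 gets 4 of 6 right, while cuts at 0.45 and 0.7 each get 5 of 6 right, so the data alone does not single out one best cut.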

Differences in accuracy between models

SVM: the general idea is to maximize the margin between the categories

Defining the solution
• Given: a collection of data in which X is the vector of features and Y are the categories
• In the ideal case we want:

According to the definitions...

But reality does not always cooperate...
• We need to reduce the hinge loss, that is, the instances that fall on the "wrong" side
• Hinge loss is only one possible loss function
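As a rough illustration of the hinge loss mentioned above, the sketch below uses the standard definition max(0, 1 - y * (w·x + b)) with labels in {+1, -1}, averaged over the data. The weights, points, and function name are made up for the example.

```python
def hinge_loss(w, b, xs, ys):
    """Mean hinge loss of the linear classifier (w, b); labels must be +1 or -1."""
    total = 0.0
    for x, y in zip(xs, ys):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        # Zero loss when the point is on the correct side with margin >= 1.
        total += max(0.0, 1.0 - y * score)
    return total / len(xs)


# Hypothetical classifier and data.
w = [1.0, 0.0]
b = 0.0
xs = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [1.0, 0.0]]
ys = [1, 1, -1, -1]

loss = hinge_loss(w, b, xs, ys)
```

The first and third points are outside the margin on the correct side (loss 0), the second is correct but inside the margin (loss 0.5), and the fourth is on the wrong side (loss 2).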

The formulas...

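Linear SVM Mathematically
• Goal:
  1) Correctly classify all training data:
     $w^T x_i + b \ge 1$ if $y_i = +1$
     $w^T x_i + b \le -1$ if $y_i = -1$
     $y_i (w^T x_i + b) \ge 1$ for all $i$
  2) Maximize the margin $M = 2 / \|w\|$, which is the same as minimizing $\frac{1}{2} w^T w$
• We can formulate a Quadratic Optimization Problem and solve for $w$ and $b$
• Minimize $\Phi(w) = \frac{1}{2} w^T w$ subject to $y_i (w^T x_i + b) \ge 1$ for all $i$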

Solving the Optimization Problem
Find $w$ and $b$ such that $\Phi(w) = \frac{1}{2} w^T w$ is minimized, and for all $\{(x_i, y_i)\}$: $y_i (w^T x_i + b) \ge 1$
• Need to optimize a quadratic function subject to linear constraints.
• Quadratic optimization problems are a well-known class of mathematical programming problems, and many (rather intricate) algorithms exist for solving them.
• The solution involves constructing a dual problem where a Lagrange multiplier $\alpha_i$ is associated with every constraint in the primal problem:
Find $\alpha_1 \dots \alpha_N$ such that $Q(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^T x_j$ is maximized and
(1) $\sum_i \alpha_i y_i = 0$
(2) $\alpha_i \ge 0$ for all $\alpha_i$
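For reference, the standard textbook relationship between a dual solution and the primal variables is sketched below. It is not stated on the slide itself; it uses the same symbols as the dual problem above, with $k$ denoting any index whose multiplier is strictly positive (a support vector).

```latex
\begin{aligned}
w &= \sum_{i=1}^{N} \alpha_i y_i x_i \\
b &= y_k - w^T x_k \quad \text{for any } k \text{ with } \alpha_k > 0 \\
f(x) &= \operatorname{sign}\Big( \sum_{i=1}^{N} \alpha_i y_i \, x_i^T x + b \Big)
\end{aligned}
```

Only the points with $\alpha_i > 0$ contribute to $w$, and the data enters both the dual objective and the classifier only through inner products $x_i^T x_j$.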

Further improvements
• Use of a nonlinear function (Kernel Trick)
  • Polynomials
  • Gaussian
  • and more...
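The two kernels named above can be sketched as plain functions that take the place of the inner product between two feature vectors. The parameter names (`degree`, `c`, `sigma`) and their default values are assumptions chosen for illustration, not values from the lecture.

```python
import math


def polynomial_kernel(x, z, degree=2, c=1.0):
    """Polynomial kernel (x . z + c) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + c) ** degree


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Hypothetical feature vectors.
x = [1.0, 2.0]
z = [3.0, 4.0]

k_poly = polynomial_kernel(x, z)      # (11 + 1) ** 2
k_gauss_same = gaussian_kernel(x, x)  # identical points give the maximum value
k_gauss = gaussian_kernel(x, z)       # squared distance is 8
```

Both functions return a similarity score computed directly from the original features, without ever building the higher-dimensional representation explicitly.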

Probabilistic regression: Logistic Regression
• Uses a probabilistic (logistic) objective function
• Classification: an output close to 1 or to -1 indicates high probability
• One parameter is the intercept, where f(x)=0; another controls the shape of the graph

How categories are turned into probabilities
• Note that logistic regression produces categories (not numbers) as its output
• It translates the numbers into log-odds
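The link between a linear score, a probability, and log-odds can be sketched as follows. This assumes the standard logistic (sigmoid) function and a single feature; the weight, bias, threshold, and function names are hypothetical, and the category labels +1 and -1 are chosen to match the previous slide.

```python
import math


def sigmoid(t):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def log_odds(p):
    """Inverse of the sigmoid: ln(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def classify(x, weight, bias, threshold=0.5):
    """Return (label, probability); label is +1 when probability >= threshold."""
    p = sigmoid(weight * x + bias)
    label = 1 if p >= threshold else -1
    return label, p


# Hypothetical model with weight 1 and bias -1.
label_pos, p_pos = classify(3.0, 1.0, -1.0)
label_neg, p_neg = classify(-3.0, 1.0, -1.0)
```

The linear part `weight * x + bias` is exactly the log-odds of the positive category, so `log_odds(sigmoid(t))` returns `t`.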

Fundamental differences between the models
• Trees are built incrementally, step by step
• Regression builds a weight for every parameter simultaneously
• Regression partitions only according to the form of the function (linear, logistic, etc.). Trees are more flexible.
• The output of a tree is more meaningful (doctors and most clients prefer trees)
• Regression may achieve better accuracy

The output of regression

The output of trees

Differences between the regression models: the split is not always linear
