Predicting the winner of the Cy Young (C.Y.) Award

Advisor: Dr. 黃三益 • Team members: 尹川, 陳隆賢, 陳偉聖

Introduction • Baseball in Taiwan • CPBL (Chinese Professional Baseball League) • MLB (Major League Baseball) • Baseball in the USA • Cy Young Award, presented since 1956 • Voted on by the Baseball Writers' Association of America • Decided by weighted ballot scores • One winner per league per year (since 1967; from 1956 to 1966 a single award covered both leagues)
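The "weighted ballot scores" point can be illustrated with a small tally. This is a minimal sketch, assuming a three-place ballot scored 5-3-1 (first place worth 5 points, second 3, third 1); the ballots and pitcher names are made up and are not real voting data.

```python
from collections import Counter

# Assumed points for 1st, 2nd and 3rd place on each ballot.
WEIGHTS = (5, 3, 1)

def tally(ballots):
    """Sum weighted points over all ballots; return (pitcher, points) pairs, highest first."""
    points = Counter()
    for ballot in ballots:
        for pitcher, weight in zip(ballot, WEIGHTS):
            points[pitcher] += weight
    return points.most_common()

# Hypothetical ballots: each lists three pitchers in ranked order.
ballots = [
    ("Pitcher A", "Pitcher B", "Pitcher C"),
    ("Pitcher B", "Pitcher A", "Pitcher C"),
    ("Pitcher A", "Pitcher C", "Pitcher B"),
]
print(tally(ballots))  # → [('Pitcher A', 13), ('Pitcher B', 9), ('Pitcher C', 5)]
```

Pitcher A wins with fewer first-place votes than a sweep would give, because second-place votes also carry weight.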

Measurements • There are no fixed rules for judging. • Nevertheless, many measures can be used to judge whether a pitcher is good: • Wins • ERA (earned run average) • WHIP (walks plus hits per inning pitched) • G/F (ground-ball to fly-ball ratio), etc.
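The rate statistics above follow standard formulas: ERA is earned runs per nine innings, WHIP is walks plus hits per inning, and G/F is ground-ball outs divided by fly-ball (air) outs. The sketch below computes them; the helper names are hypothetical, and it assumes innings are given in box-score notation where "200.1" means 200 1/3 innings.

```python
def innings_from_notation(ip: str) -> float:
    """Convert box-score innings such as '200.1' (200 1/3 innings) to a float."""
    whole, _, outs = ip.partition(".")
    return int(whole) + int(outs or 0) / 3

def era(earned_runs: int, innings: float) -> float:
    # Earned runs allowed per nine innings.
    return 9 * earned_runs / innings

def whip(walks: int, hits: int, innings: float) -> float:
    # Walks plus hits allowed per inning pitched.
    return (walks + hits) / innings

def go_ao(ground_outs: int, air_outs: int) -> float:
    # Ground-ball outs per fly-ball (air) out.
    return ground_outs / air_outs

ip = innings_from_notation("180.0")
print(round(era(40, ip), 2), round(whip(45, 135, ip), 2), go_ao(200, 160))  # → 2.0 1.0 1.25
```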

Aim of the study • To analyze the historical statistics of pitchers • To build a predictive model • To predict future Cy Young Award winners

Data mining procedure • We follow a ten-step data mining methodology

Step 1: Translate the Problem • Directed data mining problem • Target variable: Cy Young Award winner • Classification • Decision tree • Purposes • Betting games • Predictive activities

Step 2: Select Appropriate Data • Only MLB pitching statistics (1871~2006) • Cy Young Award: 1956~2006 • 21,456 records in total • List of Cy Young Award winners • "Time" factor • 1999 is used as the dividing year, because new statistical items appear from then on • Variables: items that are not representative of a pitcher's performance are removed
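The selection step can be sketched as a filter plus a split on the dividing year. This is a minimal sketch, assuming each record is a dict that carries its season in a `Year` field; the dropped column names (`team_name`, `birth_city`) are hypothetical stand-ins for the attributes that are not representative of a pitcher.

```python
# Hypothetical attributes judged not representative of pitching quality.
DROP = {"team_name", "birth_city"}

def select(records, first_year=1956, split_year=1999):
    """Keep seasons from first_year on (the award era), drop non-pitching attributes,
    and split the result into seasons before split_year and from split_year on."""
    kept = [
        {k: v for k, v in r.items() if k not in DROP}
        for r in records
        if r["Year"] >= first_year
    ]
    early = [r for r in kept if r["Year"] < split_year]
    recent = [r for r in kept if r["Year"] >= split_year]
    return early, recent

records = [
    {"Year": 1950, "ERA": 3.2, "team_name": "X"},
    {"Year": 1980, "ERA": 2.9, "team_name": "Y"},
    {"Year": 2003, "ERA": 2.5, "team_name": "Z"},
]
early, recent = select(records)
print(len(early), len(recent))  # → 1 1
```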

Step 3: Get to Know the Data • All the data we used come from the official MLB site • These data have been publicly available for many years • The data quality is very good • Some attributes have values only from 1999 on

Step 4: Create a Model Set • We divide the data into training data and testing data • We do not create a balanced sample • The MLB records are not consistent from season to season • We therefore pick the data from 1999 onward
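Splitting without balancing means the rare winner class keeps its natural frequency in both sets. A minimal sketch, assuming a random split with a one-third test fraction and a boolean `winner` field; the ratio, seed and toy data are illustrative assumptions, not the study's settings.

```python
import random

def split(records, test_fraction=1 / 3, seed=0):
    """Randomly split records into (train, test). No rebalancing is done,
    so winners stay as rare in each set as they are in the full data."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def winner_rate(records):
    # Fraction of records labeled as award winners.
    return sum(r["winner"] for r in records) / len(records)

# Toy data: 3 winners among 300 pitcher-seasons.
records = [{"winner": i % 100 == 0} for i in range(300)]
train, test = split(records)
print(len(train), len(test), winner_rate(records))  # → 200 100 0.01
```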

Step 5: Fix Problems with the Data • The data are taken from the official MLB site • No missing values • Single source

Step 6: Transform Data to Bring Information to the Surface • No attributes are combined • We delete some attributes • We add an attribute, Year • We add a class attribute, CyYoungAward_Winner, for classification
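Adding the two attributes amounts to tagging each season's rows with its year and then joining against the list of winners. A minimal sketch, assuming one table of rows per season and a hypothetical `player_id` key shared with the winners list; the IDs below are invented.

```python
def add_year(season_rows, year):
    # Tag every row of one season's table with its season (the Year attribute).
    return [dict(row, Year=year) for row in season_rows]

def label(records, winners):
    """Add the class attribute CyYoungAward_Winner ('yes'/'no').
    `winners` is a set of (year, player_id) pairs built from the list of award winners."""
    for r in records:
        r["CyYoungAward_Winner"] = "yes" if (r["Year"], r["player_id"]) in winners else "no"
    return records

season_2006 = [{"player_id": "p1", "ERA": 2.8}, {"player_id": "p2", "ERA": 3.9}]
rows = label(add_year(season_2006, 2006), winners={(2006, "p1")})
print([r["CyYoungAward_Winner"] for r in rows])  # → ['yes', 'no']
```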

Step 7: Build Models • Tools Used • Weka Crash Problem • Blank Attributes • Build Model • Handling Blank Attributes

Tools Used

Weka Crash Problem • Raw data: 21,456 instances, 42 attributes • Weka crashed during model construction • Solution: give Weka more memory (a larger Java heap, set with the JVM's -Xmx option)

Blank Attributes

Build Model • MLB 1956~2006, with blank attributes: ADTree • MLB 1956~2006, without blank attributes: ADTree • MLB 1999~2006: ADTree

Handling Blank Attributes
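The three model sets differ only in how blank attributes are handled. A minimal sketch, assuming blanks are stored as `None` and records carry a `Year` field; `NewStat` is a hypothetical attribute that is only recorded from 1999 on. Keeping the blanks corresponds to the first model set (Weka's ARFF format writes missing values as `?`).

```python
def blank_attributes(records):
    """Attributes that are empty (None) in at least one record."""
    return {k for r in records for k, v in r.items() if v is None}

def without_blank_attributes(records):
    # Second model set: drop every attribute that is blank anywhere.
    blanks = blank_attributes(records)
    return [{k: v for k, v in r.items() if k not in blanks} for r in records]

def since(records, year=1999):
    # Third model set: keep only seasons in which the newer attributes exist.
    return [r for r in records if r["Year"] >= year]

records = [
    {"Year": 1980, "ERA": 3.1, "NewStat": None},
    {"Year": 2001, "ERA": 2.8, "NewStat": 1.2},
]
print(blank_attributes(records))          # → {'NewStat'}
print(without_blank_attributes(records))  # → [{'Year': 1980, 'ERA': 3.1}, {'Year': 2001, 'ERA': 2.8}]
print(since(records))                     # → [{'Year': 2001, 'ERA': 2.8, 'NewStat': 1.2}]
```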

1956~2006, with blank attributes, ADTree

1956~2006, without blank attributes, ADTree

1999~2006, ADTree
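The tree figures are not reproduced here. To show how an alternating decision tree (ADTree) classifies a pitcher, here is a minimal evaluator: unlike an ordinary decision tree, every splitter under a reached prediction node is tested, and the values of all reached prediction nodes are summed; a positive sum means "winner". The tree below, with its attributes, thresholds and values, is invented for illustration and is not one of the models built in this study.

```python
from dataclasses import dataclass, field

@dataclass
class Prediction:
    value: float
    splitters: list = field(default_factory=list)

@dataclass
class Splitter:
    attribute: str
    threshold: float
    if_less: "Prediction"      # followed when record[attribute] < threshold
    if_not_less: "Prediction"  # followed otherwise

def score(node, record):
    """Sum the values of all prediction nodes the record reaches."""
    total = node.value
    for s in node.splitters:
        child = s.if_less if record[s.attribute] < s.threshold else s.if_not_less
        total += score(child, record)
    return total

# Hypothetical tree: a negative root value reflects how rare winners are.
tree = Prediction(-0.9, [
    Splitter("ERA", 2.8,
             Prediction(0.8, [Splitter("WHIP", 1.05, Prediction(0.6), Prediction(-0.2))]),
             Prediction(-0.5)),
    Splitter("Wins", 18, Prediction(-0.4), Prediction(0.7)),
])

ace = {"ERA": 2.5, "WHIP": 1.0, "Wins": 20}
average = {"ERA": 3.5, "WHIP": 1.3, "Wins": 12}
for name, rec in [("ace", ace), ("average", average)]:
    s = score(tree, rec)
    print(name, round(s, 2), "winner" if s > 0 else "not winner")
```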

Step 8: Assess Models (1/2) • The predictive accuracy is not good enough for gambling
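Because winners are a tiny fraction of all pitchers, overall accuracy can look high even when no winner is found, so precision and recall for the winner class are more telling. A minimal sketch with made-up numbers (2 winners among 200 pitchers), not the study's results:

```python
def metrics(actual, predicted):
    """Accuracy, plus precision and recall for the winner (True) class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)
    fp = sum(p and not a for a, p in pairs)
    fn = sum(a and not p for a, p in pairs)
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

actual = [True] * 2 + [False] * 198
always_no = [False] * 200  # a "model" that never predicts a winner
print(metrics(actual, always_no))  # → (0.99, 0.0, 0.0)
```

A 99% accurate model that never names a winner is useless for betting, which is why recall on the winner class matters here.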

Step 8: Assess Models (2/2) • Some attributes are more important than others
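One common way to quantify how important an attribute is for a decision tree is information gain: the drop in class entropy after splitting on that attribute. ADTree itself chooses splits with its own boosting-based criterion, so this is a swapped-in, widely used measure for illustration; the ERA values and labels are made up.

```python
from math import log2

def entropy(labels):
    # Shannon entropy (bits) of a list of class labels.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in (labels.count(v) for v in set(labels)))

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting records on value < threshold."""
    left = [l for v, l in zip(values, labels) if v < threshold]
    right = [l for v, l in zip(values, labels) if v >= threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder

era_values = [2.1, 2.4, 3.5, 4.0]
labels = ["yes", "yes", "no", "no"]
print(information_gain(era_values, labels, 3.0))  # → 1.0 (the split separates the classes perfectly)
```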

Step 9: Deploy Models • Implement a computer program based on the built model • Make predicting the Cy Young Award winner easier
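A deployed program has to turn per-pitcher scores into one pick per league and season, since a classifier may flag zero or several pitchers in the same league. A minimal sketch, assuming records with `Year`, `league` and `player` fields; the scoring function used in the example (negative ERA) is only a placeholder for the model's score, such as an ADTree's summed prediction values.

```python
def predict_winners(records, score):
    """Return {(year, league): player} for the highest-scoring pitcher in each league-season."""
    best = {}
    for r in records:
        key = (r["Year"], r["league"])
        s = score(r)
        if key not in best or s > best[key][0]:
            best[key] = (s, r["player"])
    return {key: player for key, (_, player) in best.items()}

records = [
    {"Year": 2006, "league": "AL", "player": "A", "ERA": 3.0},
    {"Year": 2006, "league": "AL", "player": "B", "ERA": 2.5},
    {"Year": 2006, "league": "NL", "player": "C", "ERA": 3.2},
]
print(predict_winners(records, score=lambda r: -r["ERA"]))
# → {(2006, 'AL'): 'B', (2006, 'NL'): 'C'}
```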

Step 10: Assess Results • Compare the predicted winner directly with the actual Cy Young Award winner • The goal is "interest", not "business" • The assessment relies on personal judgment
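The direct comparison can be summarized as a hit rate over league-seasons. A minimal sketch, assuming both predictions and actual results are dicts keyed by (year, league); the player names are placeholders.

```python
def hit_rate(predicted, actual):
    """Fraction of league-seasons (present in both dicts) where the predicted winner is the real one."""
    keys = actual.keys() & predicted.keys()
    hits = sum(predicted[k] == actual[k] for k in keys)
    return hits / len(keys) if keys else 0.0

predicted = {(2005, "AL"): "X", (2005, "NL"): "Y"}
actual = {(2005, "AL"): "X", (2005, "NL"): "Z"}
print(hit_rate(predicted, actual))  # → 0.5
```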

Conclusions • We used classification to build a model for predicting the winner • The accuracy of the built model is not high • Some factors were not considered • The model should not be relied on where significant interests are at stake • Just for fun
