Artificial Intelligence Project 1: Neural Networks
Biointelligence Lab, School of Computer Sci. & Eng., Seoul National University
Outline
• Classification problems
  • Task 1
    • Estimate several statistics on the Diabetes data set
  • Task 2
    • Given an unknown data set, achieve the best performance you can
    • The labels of the test data are hidden.
(C) 2000-2002 SNU CSE BioIntelligence Lab
Network Structure (1)
[Figure: network with two output units, positive and negative]
• If fpos(x) > fneg(x), then x is positive
Network Structure (2)
[Figure: network with a single output f(x)]
• If f(x) > thres, then x is positive
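The two decision rules can be sketched in a few lines. This is only an illustration, assuming the network outputs have already been computed as plain floats; the function names and the default threshold of 0.5 are hypothetical and not part of the assignment.

```python
def classify_two_outputs(f_pos: float, f_neg: float) -> str:
    """Structure (1): one output unit per class; the larger output wins."""
    return "positive" if f_pos > f_neg else "negative"


def classify_single_output(f: float, thres: float = 0.5) -> str:
    """Structure (2): a single output compared against a threshold.

    The default of 0.5 is an assumption; the choice of thres is left to you.
    """
    return "positive" if f > thres else "negative"


# Toy output values, chosen only to exercise both rules.
print(classify_two_outputs(0.8, 0.3))
print(classify_single_output(0.4))
print(classify_single_output(0.4, thres=0.3))
```

With structure (2), moving thres trades false positives against false negatives, which is why the report asks for the value you used.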
Pima Indian Diabetes
• Data: 768 examples
• 8 attributes
  • Number of times pregnant
  • Plasma glucose concentration in an oral glucose tolerance test
  • Diastolic blood pressure (mm Hg)
  • Triceps skin fold thickness (mm)
  • 2-hour serum insulin (μU/mL)
  • Body mass index (kg/m²)
  • Diabetes pedigree function
  • Age (years)
• Positive: 268, negative: 500
Report (1/4)
• Number of epochs
Report (2/4)
• Number of hidden units
• At least 10 runs for each setting
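One way to summarize repeated runs for a setting is the mean and standard deviation of the per-run accuracy. The sketch below assumes accuracy is the statistic being reported; the function name and the accuracy values are placeholders, not results from any experiment.

```python
import statistics


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) of per-run accuracies."""
    if len(accuracies) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Placeholder numbers for one hypothetical setting; replace with your own runs.
runs = [0.74, 0.76, 0.75, 0.77, 0.73, 0.76, 0.75, 0.74, 0.78, 0.72]
mean_acc, std_acc = summarize_runs(runs)
print(f"{len(runs)} runs: accuracy {mean_acc:.3f} +/- {std_acc:.3f}")
```

Reporting the spread alongside the mean shows whether a difference between two settings is larger than the run-to-run variation caused by random weight initialization.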
Report (3/4)
Report (4/4)
• The normalization method you applied
• Other parameter settings
  • Learning rates
  • The threshold value above which you predict an example as positive
    • E.g. if f(x) > thres, classify x as positive, otherwise as negative.
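The slides do not prescribe a normalization method. As one possible choice, the sketch below shows min-max scaling, with the bounds taken from the training data and then reused for any other data. The function names and toy values are hypothetical.

```python
def fit_min_max(rows):
    """Compute per-column (min, max) bounds from the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, bounds):
    """Scale each column with the given bounds; constant columns map to 0.0.

    Training rows land in [0, 1]; other data can fall slightly outside
    that range if it exceeds the training bounds.
    """
    scaled = []
    for row in rows:
        scaled.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, (lo, hi) in zip(row, bounds)
        ])
    return scaled


# Toy values with two attributes on very different scales.
train = [[1.0, 100.0], [3.0, 150.0], [5.0, 200.0]]
bounds = fit_min_max(train)
print(apply_min_max(train, bounds))
```

Z-score standardization (subtract the mean, divide by the standard deviation) is a common alternative; whichever you use, state it in the report.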
Challenge (1)
• Unknown data
• Data for you: 5822 examples
  • Pos: 348, Neg: 5474
• Test data: 4000 examples
  • Pos: 238, Neg: 3762
• Labels are HIDDEN!
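Because the classes are heavily imbalanced, a classifier that always answers "negative" already reaches about 94% accuracy on both sets. The short calculation below, using the counts from this slide, makes that baseline explicit; the function name is hypothetical.

```python
def majority_baseline_accuracy(pos: int, neg: int) -> float:
    """Accuracy of always predicting the more frequent class."""
    return max(pos, neg) / (pos + neg)


# Class counts as given on the slide.
train_pos, train_neg = 348, 5474
test_pos, test_neg = 238, 3762

print(f"train baseline: {majority_baseline_accuracy(train_pos, train_neg):.4f}")
print(f"test baseline:  {majority_baseline_accuracy(test_pos, test_neg):.4f}")
```

Accuracy alone is therefore a weak measure here; a network is only useful if it beats this baseline or finds positives that the baseline misses.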
Challenge (2)
• Data
  • train.data: 5822 x 86 (5822 examples with 86 dimensions; the label is in the 86th column: positive 1, negative 0)
  • test.data: 4000 x 85 (4000 examples with 85 dimensions)
  • Test labels are not given to you.
• Verify your NN at
  • http://knight.snu.ac.kr/aiproj1/ai_nn.asp
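A minimal loading sketch follows. The slide does not state the file format, so this assumes a plain-text file with one example per line and values separated by whitespace or commas; the function names are hypothetical, and the sample lines are a toy stand-in, not real data.

```python
import re


def parse_rows(lines):
    """Parse text lines into lists of floats, skipping blank lines.

    Assumes values are separated by whitespace and/or commas.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rows.append([float(token) for token in re.split(r"[,\s]+", line)])
    return rows


def split_features_labels(rows):
    """Treat the last column as the 0/1 label and the rest as features."""
    features = [row[:-1] for row in rows]
    labels = [int(row[-1]) for row in rows]
    return features, labels


# Toy stand-in: 3 features plus a label per line.
sample = ["1 0 2 1", "0,3,1,0", ""]
X, y = split_features_labels(parse_rows(sample))
print(X, y)

# With the real file (if the format assumption holds):
# with open("train.data") as fh:
#     X, y = split_features_labels(parse_rows(fh))
```

After loading the real file it is worth checking that there are 5822 rows of 85 features each, and that the labels sum to 348, the positive count given for the training data.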
Challenge (3)
• Include the following in your report
  • The best performance you achieved
  • The specification of your NN when it achieved that performance
    • Structure of the NN
    • Learning epochs
    • Your techniques
    • Other remarks…
• Confusion matrix
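A confusion matrix for a two-class problem can be tallied as below. This is a sketch that assumes labels are encoded as 1 (positive) and 0 (negative), as in train.data; the function name and the toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Count TP, FN, FP, TN for binary labels (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1:
            counts["TP" if pred == 1 else "FN"] += 1
        else:
            counts["FP" if pred == 1 else "TN"] += 1
    return counts


# Toy labels, for illustration only.
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)

print("              pred pos  pred neg")
print(f"actual pos    {cm['TP']:8d}  {cm['FN']:8d}")
print(f"actual neg    {cm['FP']:8d}  {cm['TN']:8d}")

accuracy = (cm["TP"] + cm["TN"]) / len(y_true)
print(f"accuracy: {accuracy:.3f}")
```

Since the test labels are hidden, such a matrix can only be computed on data whose labels you hold, for example a validation split of the training data.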
References
• Source code
  • Free software
  • NN libraries (C, C++, Java, …)
  • MATLAB Toolbox
  • Weka
• Web sites
  • http://www.cs.waikato.ac.nz/~ml/weka/
Pay Attention!
• Due: April 14, 2004, by 11:59 pm
• Submission
  • Results obtained from your experiments
    • Compress the data
    • Send via e-mail (jmoh@bi.snu.ac.kr)
  • Report: printed version (Room 419, Jangmin Oh)
    • Software used and running environments
    • Results of multiple experiments with various parameter settings
    • Your own analysis and explanation of the results
  • The e-mail subject must include “[4a05project1]”
Optional Experiments
• Various learning rates
• Number of hidden layers
• Applying feature selection techniques
• Output encoding