A Study of Academic Performance using Random Forest, Artificial Neural Network, Naïve Bayesian and Logistic Regression
Nurissaidah Ulinnuha
Artificial Neural Network
Advantages
• ANNs are useful in several application areas, including pattern recognition, classification, forecasting, and process control.
• ANNs are robust to noisy datasets.
Limitations
• ANNs do not have parametric statistical properties (e.g. they lack individual coefficient or model significance tests based on the t and F distributions).
• An ANN may converge to a local rather than the global minimum, thereby providing a non-optimal fit to the data.
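The local-minimum limitation can be illustrated with a minimal sketch. This is not a neural network: it runs plain gradient descent on a hypothetical one-dimensional loss chosen only because it has two minima. The function, learning rate, and starting points are all assumptions made for illustration.

```python
def f(x):
    # Hypothetical 1-D "loss" with a local and a global minimum.
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Analytical derivative of f.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=2000):
    # Plain gradient descent from starting point x0.
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


x_right = gradient_descent(2.0)   # settles in the shallower (local) minimum
x_left = gradient_descent(-2.0)   # settles in the deeper (global) minimum

print(f"start  2.0 -> x = {x_right:.3f}, f(x) = {f(x_right):.3f}")
print(f"start -2.0 -> x = {x_left:.3f}, f(x) = {f(x_left):.3f}")
```

The two runs differ only in their starting point, yet one ends at a noticeably worse loss. Training an ANN from different random initial weights can behave in a similar way.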
Logistic Regression
Advantages
• LR provides information about the statistical significance of each predictor.
• LR makes no assumption that the data are normally distributed.
Limitations
• Standard (binary) LR works only with a binary criterion variable.
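For reference, the standard binary logistic regression model can be written as follows. The symbols (y, x_1 to x_k, and the coefficients β) are introduced here for illustration and do not appear in the slides.

```latex
\[
P(y = 1 \mid x_1, \dots, x_k)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
\]
```

Each coefficient \(\beta_i\) can be tested for significance (commonly with a Wald test), which is the source of the predictor-level information mentioned above. The model outputs the probability of one of two outcomes, which is why the criterion variable must be binary.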
Naïve Bayesian
Advantages
• Naïve Bayes requires less training data than many other classification methods.
Limitations
• The dataset should satisfy the independence assumption: the features are assumed to be conditionally independent given the class.
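The independence assumption can be stated compactly. In the sketch below, C denotes the class and x_1 to x_n the features; this notation is an assumption added here, not taken from the slides.

```latex
\[
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C)
\]
```

The product form holds only if the features are conditionally independent given the class. When features are strongly correlated, the estimated probabilities can be distorted.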
Random Forest Decision Tree
Advantages
• Random Forest runs efficiently on large databases.
• Random Forest can handle thousands of input variables without variable deletion.
• Random Forest gives estimates of which variables are important in the classification.
• Random Forest has an effective method for estimating missing data and maintains accuracy when a large proportion of the data are missing.
• Random Forest can perform classification, clustering, and outlier detection.
Limitations
• Random forests have been observed to overfit on some datasets with noisy classification/regression tasks.
• Unlike single decision trees, the classifications made by Random Forests are difficult for humans to interpret.
Mukta Paliwal and Usha Kumar
Title: Academic performance of business school graduates using neural network and statistical techniques.
Overview: This research compares ANN with several statistical techniques. Paliwal and Kumar conclude that neural networks outperform regression analysis on prediction problems, whereas their performance is comparable to that of logistic regression and discriminant analysis on classification problems.
J. Zimmerman
Title: Predicting graduate-level performance from undergraduate achievements
Result: This research predicts graduate-level performance using a random forest. It shows that a random forest can not only perform classification but also indicate the significance of each variable.
Raw data
Graduation data of Informatics Engineering master's students, ITS (2008-2011)
Preprocessing (165 records)
• Filter out records with null values
• Convert all attributes to numeric values
• Convert the class attribute to a nominal value
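The three preprocessing steps above can be sketched in plain Python. This is a minimal illustration only: the slides do not name the attributes, so `undergrad_gpa`, `toefl`, and `grade_class` are hypothetical, and so is the sample data.

```python
def preprocess(rows, class_key="grade_class"):
    """Filter, convert, and label records given as dicts (hypothetical schema)."""
    cleaned = []
    for row in rows:
        # Step 1: drop any record that contains a null (None or empty) value.
        if any(v is None or str(v).strip() == "" for v in row.values()):
            continue
        out = {}
        for key, value in row.items():
            if key == class_key:
                # Step 3: keep the class attribute as a nominal (string) label.
                out[key] = str(value)
            else:
                # Step 2: convert every other attribute to a number.
                out[key] = float(value)
        cleaned.append(out)
    return cleaned


# Made-up sample records; two of them contain null values.
raw = [
    {"undergrad_gpa": "3.2", "toefl": "500", "grade_class": "B"},
    {"undergrad_gpa": "", "toefl": "480", "grade_class": "A"},
    {"undergrad_gpa": "3.7", "toefl": None, "grade_class": "A"},
    {"undergrad_gpa": "3.8", "toefl": "550", "grade_class": "A"},
]

clean = preprocess(raw)
print(len(raw), "records before,", len(clean), "records after filtering")
```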
Dataset
Graduation data of Informatics Engineering master's students, ITS (2008-2011)
Dataset information
• Features: 7 features and 104 records
Class
• A: GPA > 3.5
• B: GPA <= 3.5
Tools
• Weka
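The class rule can be expressed as a small function. Python is used here only for illustration (the study itself used Weka), and the function name `gpa_class` is hypothetical.

```python
def gpa_class(gpa, threshold=3.5):
    # Class A: GPA strictly above the threshold; class B: GPA at or below it.
    return "A" if gpa > threshold else "B"


for gpa in (3.2, 3.5, 3.8):
    print(gpa, "->", gpa_class(gpa))
```

Note that a GPA of exactly 3.5 falls into class B, because the rule for class A uses a strict inequality.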
Discussion
• The composition of the training data influences classifier performance.
• Random Forest overfits on some datasets.
• Random Forest is not more accurate than the other methods on datasets with few features.
Future Work
• Discard unimportant dataset attributes using Principal Component Analysis.
• Find a method to address the overfitting problem of Random Forest Decision Tree.