Efficient Use of Data for Prediction and Validation

Original Research

Sin-Ho Jung
Professor of Biostatistics & Bioinformatics; Biostatistics & Bioinformatics, Basic Science Departments
Adv. Artif. Intell. Mach. Learn., 4 (1):1834-1846
DOI: https://dx.doi.org/10.54364/AAIML.2024.41106
Article History: Received on: 12-May-23, Accepted on: 23-Jan-24, Published on: 30-Jan-24

Abstract

Building a prediction model is one of the most important tasks in the analysis of high-dimensional data, and a fitted prediction model should be validated before future use. When conducting such an analysis, we therefore have to use the whole data set for both training and validation. With a hold-out method, the fitted prediction model will be more efficient if the training set is bigger, but the validation power will be lower because the validation set is smaller. To balance the efficiency of the fitted prediction model against the power of its validation, a 50-50 allocation of the whole data set is popularly used as a hold-out method.

In a prediction and validation procedure, we have to use the information embedded in the whole data set as efficiently as possible. As one such effort, cross-validation (CV) methods have become very popular. In a CV method, a large portion of the data set is used for training and the remaining small portion is used for validation, and this procedure is repeated until every data point has been used for validation. Each data point is thus used for both training and validation, so as the training portion increases, the efficiency of training increases, while the validation power decreases because of increased over-fitting, i.e. more frequent use of each data point for training.

As another effort toward efficient use of the whole data set, we propose to use the whole data set for both training and validation, which we call the 1-fold CV method. By using the whole data set to fit a prediction model, training efficiency will be highest, but by reusing the whole data set for validation, the validation power is expected to be very low. The validation power of the CV methods is estimated by permutation methods. Through extensive simulation studies and real data analysis, we find that the newly proposed 1-fold CV method uses the whole data set very efficiently.
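The procedures compared in the abstract (K-fold CV, with K = 1 meaning that the whole data set is used for both training and validation, plus a permutation-based validation test) can be illustrated with a minimal sketch. The abstract does not specify the prediction model or the validation statistic, so the nearest-centroid classifier, the accuracy statistic, and all function names below (`fit_centroids`, `cv_accuracy`, `permutation_p_value`, `simulate`) are assumptions made for illustration only, not the author's implementation.

```python
import random

def fit_centroids(X, y):
    """Fit a nearest-centroid 'model': the mean feature vector of each class.
    (Illustrative stand-in; the paper's actual model is not given in the abstract.)"""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(sorted(centroids), key=lambda lab: dist(centroids[lab]))

def cv_accuracy(X, y, k, seed=0):
    """K-fold CV accuracy. k == 1 trains and validates on the whole data set
    (the '1-fold CV' of the abstract); k == len(y) is leave-one-out."""
    n = len(y)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # random assignment of points to folds
    correct = 0
    for fold in range(k):
        test_idx = order[fold::k]
        if k == 1:
            train_idx = test_idx  # reuse the whole data set for training
        else:
            train_idx = [i for j, i in enumerate(order) if j % k != fold]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / n

def permutation_p_value(X, y, k, n_perm=200, seed=0):
    """Permutation test: shuffle the labels, redo the whole CV procedure, and
    count how often the permuted accuracy reaches the observed accuracy."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y, k)
    exceed = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        if cv_accuracy(X, y_perm, k) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)

def simulate(n=40, p=5, shift=1.5, seed=1):
    """Hypothetical toy data: two classes that differ in the mean of feature 0."""
    rng = random.Random(seed)
    y = [0] * (n // 2) + [1] * (n - n // 2)
    X = [[rng.gauss(shift * lab if j == 0 else 0.0, 1.0) for j in range(p)] for lab in y]
    return X, y

if __name__ == "__main__":
    X, y = simulate()
    for k in (1, 2, 5, len(y)):
        acc, p_val = permutation_p_value(X, y, k)
        print(f"k={k:2d}  accuracy={acc:.3f}  permutation p-value={p_val:.3f}")
```

Because the labels are permuted before the entire fit-and-validate procedure is repeated, the over-fitting that inflates the accuracy for K = 1 also inflates the permuted accuracies, which is why a permutation reference distribution can still be used when the same data serve for training and validation. Repeating the test over many simulated data sets and recording how often the p-value falls below a chosen level would give an empirical estimate of validation power for each K.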

Advances in Artificial Intelligence and Machine Learning (oajaiml) is a journal that publishes recent advances in artificial intelligence, machine learning, and their applications. Article: https://www.oajaiml.com/archive/efficient-use-of-data-for-prediction-and-validation
