30 likes | 136 Views
Chapter 8: Extensions and Applications. Learning from Massive Datasets. Can it be held in main memory?---Naïve Byaes Method Some learning schemes are incremental; some are not. What about time it takes to model?—should be linear or near linear What to do when data set is too large?
E N D
Learning from Massive Datasets • Can it be held in main memory?---Naïve Byaes Method • Some learning schemes are incremental; some are not. • What about time it takes to model?—should be linear or near linear • What to do when data set is too large? • Use a small subset of data for training---law of diminishing returns • Some schemes do better with more data; but there is also a danger of overfitting • Parallelization is another way---develop parallelized versions of learning schemes
Incorporating Domain Knowledge :Metadata---data about data---semantic, causal, and functional • Text and web mining: • Adversarial situations: Junk email filtering, for example