
Random Sets Approach and its Applications


Presentation Transcript


  1. Random Sets Approach and its Applications Vladimir Nikulin, Suncorp, Australia • Introduction: input data, objectives and main assumptions. • Basic iterative feature selection, and modifications. • Random sets approach. • Tests for independence & trimmings (similar to HITON algorithm). • Experimental results with some comments. • Concluding remarks.

  2. Introduction Training data: {(x_t, y_t), t = 1, ..., n}, where y_t is a binary label and x_t = (x_t,1, ..., x_t,m) is a vector of m features. In a practical situation the label y may be hidden, and the task is to estimate it using the vector of features. The area under the receiver operating curve (AUC) will be used as the evaluation and optimisation criterion.
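The slide gives no formula for the AUC itself. As a minimal illustration, assuming scikit-learn is available, the criterion can be computed as follows; the arrays y and p are hypothetical examples, not data from the talk:

```python
# Minimal sketch (not from the slides): AUC as the evaluation criterion,
# computed with scikit-learn on a hypothetical label/score pair.
import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([0, 0, 1, 1, 0, 1])               # binary labels
p = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])  # model scores
print("AUC =", roc_auc_score(y, p))
```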

  3. Causal relations Manipulations are actions or experiments performed by an external agent on a system, whose effect disrupts the natural functioning of the system. By definition, direct features cannot be manipulated. [Causal graph over features X1–X9 and the target Y.] Main assumption: direct features have a stronger influence on the target variable and are therefore more likely to be selected by FS-algorithms.

  4. Basic iterative FS-algorithm
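The transcript does not reproduce the BIFS pseudocode from this slide. Below is a hedged sketch of one plausible reading: a forward feature-selection loop scored by cross-validated AUC. The function name bifs, the logistic-regression scorer, and the stopping rule are assumptions, not the author's actual algorithm.

```python
# Hedged sketch of a basic iterative (forward) feature-selection loop guided
# by AUC. The scoring model and stopping rule are assumptions.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def bifs(X, y, max_features=40):
    selected, remaining = [], list(range(X.shape[1]))
    best_auc = 0.0
    while remaining and len(selected) < max_features:
        # Try adding each remaining feature and keep the best improvement.
        scores = []
        for j in remaining:
            cols = selected + [j]
            auc = cross_val_score(LogisticRegression(max_iter=1000),
                                  X[:, cols], y, cv=5,
                                  scoring="roc_auc").mean()
            scores.append((auc, j))
        auc, j = max(scores)
        if auc <= best_auc:          # stop when AUC no longer improves
            break
        best_auc = auc
        selected.append(j)
        remaining.remove(j)
    return selected, best_auc
```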

  5. BIFS: behaviour of the target function. [Plots for the CINA, LUCAP, MARTI and REGED sets.]

  6. RS-algorithm
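The RS-algorithm itself is not spelled out in the transcript. The sketch below illustrates the general random-sets idea under stated assumptions: draw many random feature subsets, score each by cross-validated AUC, and credit the features appearing in the best-scoring subsets. Reading RS(10000, 40) on the next slide as "10000 random subsets of 40 features" is an assumption, as are the function name random_sets, the scorer, and the top_fraction parameter.

```python
# Hedged sketch of the random-sets idea: score many random feature subsets
# and count how often each feature appears among the best subsets.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def random_sets(X, y, n_sets=1000, set_size=40, top_fraction=0.1, seed=0):
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    results = []
    for _ in range(n_sets):
        cols = rng.choice(m, size=min(set_size, m), replace=False)
        auc = cross_val_score(LogisticRegression(max_iter=1000),
                              X[:, cols], y, cv=3,
                              scoring="roc_auc").mean()
        results.append((auc, cols))
    # Keep the best-scoring subsets and count feature occurrences.
    results.sort(key=lambda r: r[0], reverse=True)
    counts = np.zeros(m)
    for _, cols in results[:max(1, int(top_fraction * n_sets))]:
        counts[cols] += 1
    return counts  # higher count = feature appears more often in good subsets
```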

  7. RS(10000, 40), MARTI case. [Figure; the two 10% labels are region annotations from the original plot.]

  8. Test for independence (or trimming)
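The test itself is not given in the transcript. As a hedged sketch, a simple (unconditional) chi-square test between each discretized feature and the target can serve as a trimming step; HITON proper relies on conditional independence tests, so this is a simplification. The function name trim_features and the alpha threshold are assumptions.

```python
# Hedged sketch of an independence-based trimming step, in the spirit of
# HITON-style tests: drop features that look independent of the target.
# Features are assumed binary or already discretized.
import numpy as np
from scipy.stats import chi2_contingency

def trim_features(X, y, alpha=0.05):
    keep = []
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        if len(values) < 2:          # constant feature carries no information
            continue
        table = np.array([[np.sum((X[:, j] == a) & (y == b))
                           for b in np.unique(y)]
                          for a in values])
        _, p_value, _, _ = chi2_contingency(table)
        if p_value < alpha:          # keep features dependent on the target
            keep.append(j)
    return keep
```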

  9. Base models and software

  10. Final results (first 4 lines)

  11. Some particular results

  12. Behaviour of linear filtering coefficients, MARTI-set

  13. CINA-set: AdaBoost, plot of one solution against another

  14. SIDO, RF(1000, 70, 10)

  15. Some comments In practical applications we deal not with pure probability distributions but with mixtures of distributions, which reflect trends and patterns that change over time. Accordingly, it appears more natural to form the training set as an unlabeled mixture of subsets derived from different (manipulated) distributions, for example REGED1, REGED2, ..., REGED9. As the distribution for the test set we can select any "pure" distribution. Proper validation is particularly important when the training and test sets have different distributions. In this setting, it is good to apply the traditional strategy: split the available test set randomly into two parts 50/50, using one part for validation and the other for testing.
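A minimal sketch of the 50/50 split described above, assuming scikit-learn's train_test_split; the arrays X_test and y_test are placeholders for the available test set, not data from the talk.

```python
# Split the available test set at random into a validation half and a
# final-testing half (50/50), as suggested on the slide.
import numpy as np
from sklearn.model_selection import train_test_split

X_test = np.random.rand(200, 10)           # placeholder test features
y_test = np.random.randint(0, 2, 200)      # placeholder test labels

X_valid, X_final, y_valid, y_final = train_test_split(
    X_test, y_test, test_size=0.5, random_state=0)
```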

  16. Concluding remarks The random sets approach is heuristic in nature and has been inspired by the growing speed of computation. It is a general method, and there are many directions for further development. The performance of the model depends on the particular data; we certainly cannot expect one method to produce good solutions for all problems. Probably, a more aggressive FS-strategy should have been applied in the case of the Causal Discovery competition. Our results against all unmanipulated and all validation sets are in line with the top results.
