1: Experiments & Evaluation
• a) HIVA: plot the cost, explore & exploit scores - DONE
  - measure correlation
  - observe greedy vs. CELF selection
• b) HIVA: different cost function (number of features present) - DONE
• c) Temporal: 20 Newsgroups
2: Writing: compile papers into thesis proposal (highest priority, in parallel)
Status: Experiments & Evaluation
• 20 Newsgroups: plot the cost, explore & exploit scores
  - Debug setup - DONE
  - Measure correlation - TODO: exact measures not calculated yet, but the scores visually looked uncorrelated (see the correlation sketch below)
  - Observe greedy vs. CELF selection: it is difficult to rationalize why one method selected some claims over others, but we validated that the two selections share quite a few items while still differing on some (greedy/CELF sketch below)
• HIVA
  - Debug setup - DONE
  - Measure correlation: exact measures not calculated yet, but visually the scores looked uncorrelated
  - Observe greedy vs. CELF selection
  - Different cost function (number of features present) - running 4/3/13 (needs analysis)
• Claims
  - Compile results - DONE
  - Run class-dependent experiment - DONE
  - 3 factors came out similar to 2-factor cost-sensitive exploitation (i.e., exploration didn't help)
  - The class-dependent dynamic-cost setup significantly improves over the uniform-threshold dynamic-cost setup; class-dependent dynamic cost is the best-performing setup
  - C:\mohit\official\temporal_activeLearning\interactive-sep12\claims\wlp_results_run4.xlsx
• Temporal - 20 Newsgroups - DONE (1 version)
  - Results are not great for 20 Newsgroups; no clear pattern
• Class-dependent threshold as a thought experiment for the value of dynamic cost
  - Experiment done for 20 Newsgroups with Pos 0.8 / Neg 0.001 - DONE
  - Results not good (the value of the dynamic cost function is shown by the earlier results rather than these)
  - Compiled results for Pos9Neg5 for HIVA - results not great
• Noise robustness test (planned; see the noise-injection sketch below)
  - For 1 dataset (maybe 20 Newsgroups)
  - Introduce noise in 1 model, 2 models, or all models & observe the performance degradation
Writing: compile papers into thesis proposal (highest priority, in parallel)
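As a concrete starting point for the correlation TODOs above, a minimal sketch of how the exact measures could be computed, assuming the explore and exploit scores can be dumped as equal-length arrays (the file names and variable names here are hypothetical, not from the experiment code):

```python
# Quantify how related the explore and exploit scores are,
# instead of judging visually.
import numpy as np
from scipy.stats import pearsonr, spearmanr

explore_scores = np.loadtxt("explore_scores.txt")  # hypothetical per-example dump
exploit_scores = np.loadtxt("exploit_scores.txt")  # hypothetical per-example dump

r, r_p = pearsonr(explore_scores, exploit_scores)        # linear correlation
rho, rho_p = spearmanr(explore_scores, exploit_scores)   # rank correlation, robust to scale

print(f"Pearson r = {r:.3f} (p = {r_p:.3g})")
print(f"Spearman rho = {rho:.3f} (p = {rho_p:.3g})")
```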
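For the planned noise-robustness test, a hypothetical sketch of one way to run it: flip a fraction of the training labels seen by 1, 2, or all models in the ensemble and record how performance degrades. The model class, metric, and data handling are stand-ins, not the actual experiment code.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

def flip_labels(y, fraction, rng):
    """Return a copy of binary 0/1 labels with `fraction` of them flipped."""
    y_noisy = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]
    return y_noisy

def ensemble_accuracy(X_tr, y_tr, X_te, y_te, n_models, noisy_models, noise=0.2, seed=0):
    """Train n_models; corrupt the labels of the first `noisy_models` of them."""
    rng = np.random.default_rng(seed)
    votes = np.zeros((n_models, len(y_te)))
    for m in range(n_models):
        y_m = flip_labels(y_tr, noise, rng) if m < noisy_models else y_tr
        votes[m] = LinearSVC().fit(X_tr, y_m).predict(X_te)
    y_pred = (votes.mean(axis=0) >= 0.5).astype(int)  # majority vote
    return accuracy_score(y_te, y_pred)

# Degradation curve: corrupt 0, 1, 2, ... all models and compare.
# for k in range(4):
#     print(k, ensemble_accuracy(Xtr, ytr, Xte, yte, n_models=3, noisy_models=k))
```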
C:\mohit\official\temporal_activeLearning\interactive-sep12\hiva\analysis\Greedy_ChtDyn_Fixed_ChtDynErrrRed_trial1
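For reference, the difference between the two selection procedures compared above: a sketch of plain greedy vs. lazy-greedy (CELF) selection. Here candidates are assumed to be integer indices, and `gain(S, c)` is a placeholder for the marginal gain of adding candidate c to the selected set S under the experiments' (submodular) objective; neither name comes from the actual code.

```python
import heapq

def greedy(candidates, gain, k):
    """Plain greedy: re-evaluate every remaining candidate each round."""
    S = []
    for _ in range(k):
        best = max((c for c in candidates if c not in S), key=lambda c: gain(S, c))
        S.append(best)
    return S

def celf(candidates, gain, k):
    """Lazy greedy (CELF): only re-evaluate the current top of the heap."""
    heap = [(-gain([], c), c) for c in candidates]  # max-heap via negation
    heapq.heapify(heap)
    S = []
    while heap and len(S) < k:
        _, c = heapq.heappop(heap)
        fresh = gain(S, c)                           # refresh the stale gain
        if not heap or fresh >= -heap[0][0]:         # still best => select (submodularity)
            S.append(c)
        else:
            heapq.heappush(heap, (-fresh, c))        # reinsert with fresh gain
    return S

# Quantify the partial agreement observed manually:
# common = set(greedy(C, gain, k)) & set(celf(C, gain, k))
```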
Temporal data with interactive framework
• Ran 20 Newsgroups with CD0.2
• Results not great, as there is no desired pattern
Temporal Data & Framework - 20 Newsgroups, concept drift prob. 0.2 (CD0.2)
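An illustrative sketch of how a CD0.2 stream can be generated: at each time step the active concept (e.g., which newsgroup is treated as positive) switches with probability 0.2. The drift mechanics here are an assumption for illustration, not necessarily the exact generator used in the experiments.

```python
import numpy as np

def drifting_concept_sequence(n_steps, n_concepts, drift_prob=0.2, seed=0):
    """Return the index of the active concept at each time step."""
    rng = np.random.default_rng(seed)
    concept = 0
    seq = []
    for _ in range(n_steps):
        if rng.random() < drift_prob:          # concept drift event
            concept = int(rng.integers(n_concepts))
        seq.append(concept)
    return seq

# e.g. drifting_concept_sequence(23, n_concepts=4) -> [0, 0, 2, 2, 1, ...]
```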
Analysis of Active Learning Curves
• RCV1: 'FCD\rcv1\analysis\activeLearningCurvesSVMv1.mat'
• The number of queries is approximately the same across all active iterations
• The number of queries differs across strategies
  - Different for model-fixed, 'conf-balanced', and 'dynErr-unsupActivepool'
  - The rest are all similar and close to the expected average number of queries
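A sketch of how the query counts could be tallied from the curves file. This assumes each non-metadata entry in the .mat stores a queries-per-iteration vector for one strategy; that layout is an assumption, so inspect the keys first.

```python
from scipy.io import loadmat
import numpy as np

curves = loadmat(r"FCD\rcv1\analysis\activeLearningCurvesSVMv1.mat")
for name, queries in curves.items():
    if name.startswith("__"):   # skip MATLAB metadata entries
        continue
    q = np.ravel(queries)       # queries issued at each active iteration
    print(f"{name}: mean queries/iter = {q.mean():.1f}, std = {q.std():.1f}")
```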
Thesis chapters - TODO 5/20/2013
• Temporal Active Learning
  - Merge the current KDD paper with FCD & SSD with the Claims data
  - Show data characteristics of the Claims data
• Test for FCD (sketch below)
  - Plot the relative EOB-reason percentages across time iterations, i.e., a <# of EOB codes> x <# of time iterations = 23> matrix for each test case (10 of them); each cell <r, c> corresponds to (# of cases with EOB code r) / (total positive cases)
• Test for SSD (sketch below)
  - Compute the relative similarity of all examples across time iterations: compare the 1000 x 1000 similarities between each pair of time iterations and average them, so each test iteration yields a <# of time iterations> x <# of time iterations> matrix of average similarities
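A sketch of the FCD test above. The data structure `cases` is an assumption for illustration: one list per time iteration of (eob_codes, is_positive) pairs, where eob_codes are the EOB reason codes attached to a case.

```python
import numpy as np

def eob_percentage_matrix(cases, eob_codes, n_iters=23):
    """Cell (r, c) = fraction of positive cases at time c carrying EOB code r."""
    M = np.zeros((len(eob_codes), n_iters))
    code_idx = {code: i for i, code in enumerate(eob_codes)}
    for t, iteration in enumerate(cases[:n_iters]):
        positives = [codes for codes, is_pos in iteration if is_pos]
        for codes in positives:
            for code in codes:
                M[code_idx[code], t] += 1
        if positives:
            M[:, t] /= len(positives)   # normalize by total positive cases
    return M   # plot one such matrix per test case (10 of them)
```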
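Likewise, a sketch of the SSD test, assuming one feature matrix per time iteration; cosine similarity is used as a reasonable stand-in for whatever similarity measure the experiments adopt.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def avg_similarity_matrix(X_by_iter):
    """Average pairwise similarity between examples of every pair of time iterations."""
    T = len(X_by_iter)
    S = np.zeros((T, T))
    for i in range(T):
        for j in range(i, T):
            sims = cosine_similarity(X_by_iter[i], X_by_iter[j])  # e.g. 1000 x 1000
            S[i, j] = S[j, i] = sims.mean()
    return S   # one <time iters> x <time iters> matrix per test iteration
```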