Data not in the pre-defined feature vectors that can be used to construct predictive models.

0 represent ANN Petal.Length< 2.45 | F1 F2 F4 Data1 1 1 0 Data2 1 0 1 Data3 1 1 0 Data4 0 0 1 ……… 1 DT setosa Petal.Width< 1.75 versicolor virginica 0 1 SVM LR ANN Petal.Length< 2.45 | represent DT setosa Petal.Width< 1.75 F1 F2 F4 Data1 1 1 0 Data2 1 0 1 Data3 1 1 0 Data4 0 0 1 ……… Any classifiers you can name versicolor virginica SVM LR Any classifiers you can name Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree Wei Fan, Kun Zhang, Hong Cheng, Jing Gao, Xifeng Yan, Jiawei Han, Philip S. Yu, Olivier Verscheure Motivation: Itemset Mining Motivation Experiments Proposed Algorithm • Data not in the pre-defined feature vectors that can be used to construct predictive models. • Applications: • Transactional database • Sequence database • Graph database • Frequent pattern is a good candidate for discriminative features, especially for data of complicated structures. • 5 Datasets:UCI Machine Learning Repository • Scalability Study: Problems Direct Mining & Selection via Model-based Search Tree • Case Study Procedures asFeature MinerOr Be Itself asClassifier 1 2 3 4 5 6 7 dataset mine & select 1 mine & select • Accuracy of Mined Itemsets 5 2 mine & select 6 7 3 4 …….. Few Data + + …….. • Why Frequent Patterns? • A non-linear conjunctive combination of single features • Increase the expressive and discriminative power of the feature space • Examples: • Exclusive OR problem & Solution Divide-and-Conquer Based Frequent Pattern Mining Mined Discriminative Patterns • Mine and Select most discriminative patterns; • Represent data in the feature space using such patterns; • Build classification models. • Take Home Message: • Highly compact and discriminative frequent patterns can be directly mined throughModel based Search Treewithout worrying about combinatorial explosion. • Software and datasets are available by contacting the authors. Graph Mining y • Analyses: • Scalability of pattern enumeration • Upper bound • “Scale down” ratio • Bound on number of returned features • Subspace pattern selection • Non-overfitting • Optimality under exhaustive search map data to higher space 1 0 • Scalability Study • 11 Datasets: • 9 NCI anti-cancer screen datasets • PubChem Project. • Positive class : 1% - 8.3% • 2 AIDS anti-viral screen datasets • URL: http://dtp.nci.nih.gov. • H1: 3.5%, H2: 1% L1 1 0 x L2 mine & transform Data is non-linearly separable in (x, y) Data is linearly separable in (x, y, xy) Conventional Frequent Pattern-Based Classification: Two-Step Batch Method • Predictive Quality of Mined Frequent Subgraphs Basic Flows: • Problems of Separated Mine & Select in Batch Method • Mine step:Issues of scalability and combinatorial explosion • Dilemma of setting minsupport • Promising discriminative candidate patterns? • Tremendous number of candidate patterns? • Select step:Issue of discriminative power AUC DataSet Frequent Patterns 1---------------------- ---------2----------3 ----- 4 --- 5 -------- --- 6 ------- 7------ Mined Discriminative Patterns 1 2 4 select mine AUC of MbT, DT MbT VS Benchmarks Mine frequent patterns; Select most discriminative patterns; Represent data in the feature space using such patterns; Build classification models.

Data not in the pre-defined feature vectors that can be used to construct predictive models.

Data not in the pre-defined feature vectors that can be used to construct predictive models.

Presentation Transcript

the device that used to be your phone …

Not Everything That Counts Can Be Counted and Not Everything that can be Counted, Counts Albert Einstein

Not everything that counts can be counted; and not everything that can be counted counts.. Albert Einstein

Physical Models Used (and Not Used) in GNSS Data Processing

NUMERICAL METHODS THAT CAN BE USED IN BIOMECHANICS

To be or not to be that is the…

Can Data from MDTs be used to Enhance Cancer Registration Data?

Conventions to be used in Modgen models

How Predictive Coding Can be Used by Records Managers

Pre-Defined Data Types

Adding Vectors that are not perpendicular

« To be or not to be, that is the question… »

Data not in the pre-defined feature vectors that can be used to construct predictive models.

Write a program that can be used to determine/tutorialoutletdotcom

Explore different circuits that can be used/tutorialoutletdotcom

Surfaces That Can Be Used For Feeding Tortoises

Formulas That Can Be Used To Increase Twitter Retweets

Science can be defined as…

Tools that can be used Options

Classification: Feature Vectors

Types Of Transport Carts That Can Be Used In Laboratories