HASAR: Mining Sequential Association Rules for Atherosclerosis Risk Factor Analysis
Laurent Brisson, Nicolas Pasquier, Céline Hebert, Martine Collard
I3S Laboratory, University of Nice-Sophia Antipolis
GREYC Laboratory, University of Caen
Contents
1. Analytic question & Objectives
2. Model & Data Preparation
3. Algorithms
4. Results
Analytic Question
Are there any differences in the development of risk factors and other characteristics between men of the risk group who came down with the observed cardiovascular diseases and those who stayed healthy?
Objectives
• Evolution of risk factors according to behavioural changes
• Groups RG versus PG and NG
• Healthy patients (NCVD) versus those with cardiovascular diseases (CVD)
• Groups based on patient education level and job
Sequential Rules
IDE_itemset BEH_time_itemset RF_time_item
• IDE_itemset: static identification attributes
  • Age of the patient
  • Educational level of the patient
  • Alcohol consumption at the beginning of the study
Sequential Rules
IDE_itemset BEH_time_itemset RF_time_item
• BEH_time_itemset: behavioural change attributes
  • Consumption of cigarettes per day
  • Physical activity after work
  • Physical activity at work
  • Different kinds of diet
  • Medicine for cholesterol
  • Medicine for blood pressure
Sequential Rules
IDE_itemset BEH_time_itemset RF_time_item
• RF_time_item: risk factor change attribute
  • Cholesterol level
  • HDL cholesterol level
  • LDL cholesterol level
  • Triglycerides level
  • Obesity
  • …
Model
IDE_itemset BEH_time_itemset RF_time_item
• Action period: contains at least one control
• Latency period: a waiting time before effects are observed
• Observation period: contains only one control
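The three periods can be illustrated with a minimal sketch that cuts one patient's ordered controls into an action window, a latency window, and a single observed control. The function name `split_windows` and the choice to measure window lengths in number of controls are assumptions made for illustration; the slides do not state the unit of the time windows.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_windows(
    controls: Sequence[T], action_len: int, latency_len: int, start: int = 0
) -> Tuple[List[T], List[T], T]:
    """Split ordered controls into (action, latency, observation).

    Assumption: window lengths are counted in controls, not in months or years.
    """
    if action_len < 1:
        raise ValueError("the action period must contain at least one control")
    if latency_len < 0:
        raise ValueError("the latency period cannot be negative")
    obs_index = start + action_len + latency_len
    if obs_index >= len(controls):
        raise ValueError("not enough controls for the requested windows")
    action = list(controls[start:start + action_len])
    latency = list(controls[start + action_len:obs_index])
    # The observation period holds exactly one control.
    return action, latency, controls[obs_index]


action, latency, observed = split_windows(["c1", "c2", "c3", "c4", "c5"], 2, 1)
```

With five controls, an action length of 2 and a latency of 1, the behaviour changes are read from `c1` and `c2`, `c3` is skipped, and the risk factor is read at `c4`.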
Data Preparation: Flattening operation
Initial table: 1 row per control
Data Preparation: Flattening operation
Flattened table: 1 row per patient (static attributes, control 1 to control n)
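A minimal sketch of such a flattening step follows, assuming the initial table is a list of dictionaries with one entry per control. The column names (`patient_id`, `control`, `age`, `chol`) and the function `flatten_controls` are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Sequence


def flatten_controls(
    rows: Iterable[Dict[str, Any]],
    id_key: str = "patient_id",
    order_key: str = "control",
    static_keys: Sequence[str] = ("age",),
) -> List[Dict[str, Any]]:
    """Turn one-row-per-control records into one-row-per-patient records."""
    by_patient: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        by_patient[row[id_key]].append(row)

    flattened = []
    for pid, patient_rows in by_patient.items():
        patient_rows.sort(key=lambda r: r[order_key])
        flat = {id_key: pid}
        # Static attributes are copied once, from the first control.
        for key in static_keys:
            flat[key] = patient_rows[0][key]
        # Every other attribute gets one column per control: name_1 ... name_n.
        for position, row in enumerate(patient_rows, start=1):
            for key, value in row.items():
                if key in (id_key, order_key) or key in static_keys:
                    continue
                flat[f"{key}_{position}"] = value
        flattened.append(flat)
    return flattened


rows = [
    {"patient_id": 1, "control": 2, "age": 45, "chol": 6.1},
    {"patient_id": 1, "control": 1, "age": 45, "chol": 5.8},
    {"patient_id": 2, "control": 1, "age": 50, "chol": 5.0},
]
flat = flatten_controls(rows)
```

Patients with different numbers of controls end up with different numbers of columns in this sketch; a real pipeline would have to decide how to pad or truncate them.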
Evolutionary Approach
A genetic algorithm searching for temporal rules
Fixed-length chromosome: Identification | Behaviours | Risk factor
Evolutionary Approach
A gene for each static identification attribute
IDE 1 … IDE j | Behaviours | Risk factor
Evolutionary Approach
A gene for each kind of behavioural change
Identification | BEH 1 … BEH k (action period) | Risk factor
Evolutionary Approach
One gene to describe a risk factor
Identification | Behaviours (action period) | RF i (observation period)
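One possible way to picture the fixed-length chromosome is sketched below. The `Chromosome` class, the convention that `None` means "attribute not used in the rule", and the value domains are all assumptions made for this illustration; the slides do not describe the actual encoding, and the time windows are left out here.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Chromosome:
    identification: Tuple[Optional[str], ...]  # one gene per static attribute
    behaviours: Tuple[Optional[str], ...]      # one gene per kind of behavioural change
    risk_factor: Tuple[str, str]               # single gene: (risk factor, change)


def random_chromosome(
    ide_domains: Sequence[Sequence[str]],
    beh_domains: Sequence[Sequence[str]],
    rf_domains: Dict[str, Sequence[str]],
    rng: random.Random,
) -> Chromosome:
    """Draw a random rule; None marks a gene that is switched off (assumption)."""

    def pick(domain: Sequence[str]) -> Optional[str]:
        return rng.choice([None, *domain])

    identification = tuple(pick(d) for d in ide_domains)
    behaviours = tuple(pick(d) for d in beh_domains)
    name = rng.choice(sorted(rf_domains))
    return Chromosome(identification, behaviours, (name, rng.choice(list(rf_domains[name]))))


rng = random.Random(0)
individual = random_chromosome(
    ide_domains=[["<50", ">=50"], ["basic", "university"]],
    beh_domains=[["up", "down"], ["up", "down"], ["up", "down"]],
    rf_domains={"cholesterol": ["up", "down"]},
    rng=rng,
)
```

Because the number of genes is fixed by the number of attributes, every individual has the same length regardless of how many attributes the rule actually uses.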
Evolutionary Approach
Fitness function: support * confidence * lift
Identification | Behaviours (action period) | latency period | RF i (observation period)
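The fitness product can be sketched from patient counts as follows. The slide gives only the product itself; the standard definitions of support, confidence and lift used here, the function name `rule_fitness`, and the choice to return 0 for rules that match no patient are assumptions.

```python
def rule_fitness(n_total: int, n_antecedent: int, n_consequent: int, n_both: int) -> float:
    """Fitness of a rule A -> C as support * confidence * lift.

    n_total:      number of patients
    n_antecedent: patients matching the identification and behaviour items (A)
    n_consequent: patients showing the risk factor change (C)
    n_both:       patients matching both A and C
    """
    if n_total <= 0 or n_antecedent == 0 or n_consequent == 0:
        return 0.0  # assumption: a rule that matches nothing gets the lowest fitness
    support = n_both / n_total                    # P(A and C)
    confidence = n_both / n_antecedent            # P(C | A)
    lift = confidence / (n_consequent / n_total)  # P(C | A) / P(C)
    return support * confidence * lift


score = rule_fitness(n_total=100, n_antecedent=40, n_consequent=50, n_both=30)
```

For these counts the support is 0.3, the confidence 0.75 and the lift 1.5, so the fitness is about 0.3375. Multiplying the three measures rewards rules that are frequent, reliable, and more informative than the base rate of the risk factor change.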
Genetic Algorithm Optimization
• A CLOSE-based approach for initialization
• The CLOSE algorithm improves:
  • extraction efficiency, by reducing the search space (use of generators and frequent closed itemsets)
  • result relevance, by suppressing redundant rules (generation of bases)
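The notion of a closed itemset can be illustrated with a short sketch of the closure operator: the closure of an itemset is the intersection of all transactions that contain it, and an itemset is closed when it equals its own closure. This shows only that operator, not the CLOSE algorithm itself; the function names and the toy transactions are made up for the example.

```python
from typing import FrozenSet, Iterable, List, Set


def closure(itemset: Iterable[str], transactions: List[Set[str]]) -> FrozenSet[str]:
    """Intersection of every transaction that contains the itemset."""
    items = frozenset(itemset)
    covering = [frozenset(t) for t in transactions if items <= frozenset(t)]
    if not covering:
        # Assumption for this sketch: an unsupported itemset is returned unchanged.
        return items
    result = covering[0]
    for transaction in covering[1:]:
        result &= transaction
    return result


def is_closed(itemset: Iterable[str], transactions: List[Set[str]]) -> bool:
    return frozenset(itemset) == closure(itemset, transactions)


transactions = [
    {"a", "c", "d"},
    {"b", "c", "e"},
    {"a", "b", "c", "e"},
    {"b", "e"},
    {"a", "b", "c", "e"},
]
closure_of_b = closure({"b"}, transactions)
```

Here every transaction containing `b` also contains `e`, so `{b}` and `{b, e}` have the same support and only the closed itemset `{b, e}` needs to be kept; `{b}` acts as a generator of that closed itemset. This is the kind of search-space reduction the slide refers to.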
Results: Patient classes comparison
• Best rules on PG versus NG and RG
Results: Patient classes comparison
• Best rules on CVD versus NCVD
Results: Initialization methods
• Comparison on RG group
Conclusion
• Different tendencies among groups
• Confirmation of prior medical knowledge
• Contradictions with some "assumptions"
• Further investigations with the assistance of medical experts
Future Research
• To analyse relationships between time windows and various risk factors
• To develop new evaluation criteria
• To integrate physicians' prior knowledge
• To apply the HASAR approach to other temporal datasets