Modeling and clustering disease progression for correlation with genetic and demographic factors

Robert Kingan

What is SSIFT?
• “To address […] common diseases, which include schizophrenia, depression, and breast cancer, it is essential to incorporate observations of the clinical progression of the disease to refine the definition of phenotype.” – Michael N. Liebman, U. Penn.
• Yes, but what is SSIFT?
• SSIFT = Stratification and Synchronization Inference Technology

What is SSIFT?
• Stratification: Dividing a patient population into groups which are meaningful for diagnosis, prognosis, treatment selection, or genotype-phenotype correlation.
• Synchronization: Recognizing a pattern of disease progression, regardless of disease stage for a particular patient.

SSIFT overview
• Assumptions: what is SSIFT-able
• Other constraints on data selection
• Outline of technique
• Identifying variables
• Modeling disease progression
• Parameterizing different models
• Clustering patients by progression patterns
• Interpreting the results

Pattern of disease progression
[Figure: disease marker plotted against time, annotated with the initial value, the period of change, and the final value]
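The pattern above (an initial value, a period of change, and a final value) can be illustrated with a simple parametric curve. The transcript does not give the functional form SSIFT uses, so the following is only a minimal sketch that assumes a logistic transition; the function name `progression_curve` and the convention that the period of change spans the 10% to 90% points of the transition are assumptions made for illustration.

```python
import math


def progression_curve(t, initial, final, t_start, t_end):
    """Marker value at time t for an assumed logistic progression pattern.

    Assumption (not from the slides): the period of change [t_start, t_end]
    is the window in which the transition goes from 10% to 90% complete.
    """
    if t_end <= t_start:
        raise ValueError("t_end must be later than t_start")
    midpoint = 0.5 * (t_start + t_end)
    # logistic(x) equals 0.9 at x = ln(9), so the half-width maps to ln(9).
    rate = 2.0 * math.log(9.0) / (t_end - t_start)
    # Clamp the exponent so very early or late times cannot overflow exp().
    x = max(-60.0, min(60.0, rate * (t - midpoint)))
    fraction = 1.0 / (1.0 + math.exp(-x))
    return initial + (final - initial) * fraction


if __name__ == "__main__":
    # Hypothetical marker rising from 10 to 50 between years 4 and 6.
    for year in range(0, 11, 2):
        print(year, round(progression_curve(year, 10.0, 50.0, 4.0, 6.0), 2))
```

With this parameterization each patient's marker history reduces to four numbers, which is the kind of compact description that could serve as a feature vector for clustering.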

SSIFT workflow
• Survey the data
• Construct feature vectors
• Assign feature weights
• Select useful variables
• Cluster weighted feature vectors
• Fit disease progression models
• Evaluate the clustering results
• Complete? (No / Yes)
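The "Cluster weighted feature vectors" step can be sketched as follows. The transcript does not say which clustering algorithm or weighting scheme SSIFT uses, so this example substitutes plain k-means (Lloyd's algorithm) with a per-feature weighted Euclidean distance; the names `weighted_distance` and `cluster`, the deterministic farthest-point initialization, and the sample vectors are all hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with a non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def cluster(vectors, weights, k, max_iter=100):
    """Plain k-means on weighted feature vectors (a stand-in technique)."""
    # Deterministic start: first vector, then repeatedly the vector that is
    # farthest from all centers chosen so far.
    centers = [list(vectors[0])]
    while len(centers) < k:
        farthest = max(
            vectors,
            key=lambda v: min(weighted_distance(v, c, weights) for c in centers),
        )
        centers.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: weighted_distance(v, centers[j], weights))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    # Hypothetical two-feature vectors (e.g. fitted curve parameters).
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    labels, centers = cluster(data, weights=[1.0, 1.0], k=2)
    print(labels)
```

Setting a feature's weight to zero removes it from the distance entirely, which is one simple way to express the "Assign feature weights" and "Select useful variables" steps in the same code path.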


SSIFT curve types

Converting parameters
• y* = population mean
• t1 = first time point
• tn = last time point

Modified Mahalanobis distance
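The slide names a modified Mahalanobis distance, but the modification itself did not survive in the transcript. As a point of reference only, the standard (unmodified) distance between two feature vectors, given a covariance matrix, can be sketched as below. The helper names `solve` and `mahalanobis` are hypothetical, and the covariance matrix is assumed to have been estimated from the patient population beforehand.

```python
import math


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def mahalanobis(a, b, cov):
    """Standard Mahalanobis distance sqrt((a-b)^T cov^-1 (a-b))."""
    diff = [x - y for x, y in zip(a, b)]
    z = solve(cov, diff)  # z = cov^-1 * diff, without forming the inverse
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))


if __name__ == "__main__":
    # With an identity covariance the result is the Euclidean distance.
    print(mahalanobis([0.0, 0.0], [3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]]))
```

Compared with a plain Euclidean distance, this rescales each direction by the population's variance and accounts for correlation between features, so parameters measured on very different scales can be compared.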


SSIFT workflow
• Correlate results with:
• demographic data
• genetic data

Application of SSIFT to NIDDK
• About NIDDK
• SSIFT and transplant data
• Variable selection
• Modeling
• Results

Candidate variables
• α-Fetoprotein
• Albumin
• Alkaline phosphatase (AP)
• Bicarbonate
• Blood urea nitrogen (BUN)
• Calcium
• Creatinine clearance
• Cholesterol
• Chlorine
• Corrected PT control
• Creatinine
• Direct bilirubin
• FK506 level
• Glomerular filtration rate
• Gamma GTP
• Glucose
• Hematocrit (HCT)
• Hemoglobin
• CSA HPLC level
• Potassium
• CSA monoclonal level
• Sodium
• Platelet count
• Prothrombin time
• Part. thromboplastin CT
• Part. thromboplastin PT
• CSA RIA level
• SGOT (AST)
• SGPT (ALT)
• Total bilirubin
• CSA TDX level
• Total protein
• White blood cells (WBC)
• Weight in KG

Selected variables

Evaluating Kaplan-Meier curves Ŝ
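The Ŝ score used on this slide is not defined in the transcript, so it is not reproduced here. What can be sketched is the standard Kaplan-Meier (product-limit) estimator that produces the survival curves being evaluated. The function name `kaplan_meier` and the sample follow-up data are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival)] at each distinct time with at least one event.

    durations: follow-up time per patient.
    observed:  True if the event occurred at that time, False if censored.
    """
    records = sorted(zip(durations, observed))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        events = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(records) and records[i][0] == t:
            if records[i][1]:
                events += 1
            leaving += 1
            i += 1
        if events:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up times (years) and event indicators.
    times = [1, 2, 2, 3, 4]
    events = [True, True, False, True, False]
    for t, s in kaplan_meier(times, events):
        print(t, round(s, 3))
```

Running the estimator once per cluster gives one survival curve per cluster, and those curves can then be compared to judge how well the clustering separates patients by outcome.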

Final selected variables
• Best pair: AST + AP, Ŝ = 0.34
• Best triple: AST + AP + hematocrit, Ŝ = 0.42
• No set of four variables exceeded Ŝ = 0.42

Survival by clustered SSIFT AST, AP and HCT parameters Ŝ = 0.42

Cluster mean curves
[Figure: mean curves for the best cluster and the worst cluster]

SSIFT in Gene Discovery: Simulation
[Figure: diagram with the labels Analyze Markers, Time, SSIFT™, Determine Disease Progression Pattern, Discover Disease Genes]

Simulated data
[Figure: marker value (relative scale) plotted against time (years)]

Clustering Results

Nearest-Neighbor Analysis
555555566666666666645526566351142222222222224412114442333333333344344
C2353 is related to SSIFT pattern of disease progression (p < 10⁻⁴¹).

SSIFT: Stratification and Synchronization Inference Technology
Discussion
