A Novel Approach To Search For Dominant Risk Factors. Presented by Dr. Ke-Shiuan Lynn, Sep. 17, 2004
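A Conventional Approach
• Cox Proportional Hazards Model: Let there be m covariates. The Cox model assumes that the hazard function $\lambda(t)$ can be approximated by
$$\lambda(t) = \lambda_0(t)\,\exp(\boldsymbol{\beta}^T \mathbf{x}),$$
where $\lambda_0(t)$ is the baseline hazard, $\boldsymbol{\beta} = [\beta_1\ \beta_2\ \dots\ \beta_m]^T$ and $\mathbf{x} = [x_1\ x_2\ \dots\ x_m]^T$.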
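As a minimal sketch of the exponential form of the Cox model, the snippet below computes exp(β^T x) for two subjects and takes the ratio of their hazards, in which the baseline hazard cancels. The function name `relative_hazard` and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def relative_hazard(beta, x):
    """Return exp(beta^T x), the hazard relative to the baseline hazard."""
    beta = np.asarray(beta, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.exp(beta @ x))

# Hypothetical coefficients and covariate vectors (illustrative values only).
beta = [0.5, 0.0, -0.2]
x_a = [1.0, 3.0, 0.0]
x_b = [0.0, 3.0, 0.0]

# The baseline hazard appears in both hazards, so it cancels in the ratio.
ratio = relative_hazard(beta, x_a) / relative_hazard(beta, x_b)
print(ratio)
```

The two subjects differ only in the first covariate, so the ratio equals exp(β_1), which is the relative risk attached to that covariate.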
A Conventional Approach (cont.)
• The model is fitted to the data $X = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n]$, one covariate vector per subject, by the method of maximum likelihood.
• Consider an event occurring at time $t_j$, and suppose that $n_j$ subjects were alive just before $t_j$, with covariate vectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_{n_j}$.
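A Conventional Approach (cont.)
• Assuming that the event at time $t_j$ occurs to subject 1, the probability that the observed event was exactly the one that happened to subject 1 is
$$\frac{\exp(\boldsymbol{\beta}^T \mathbf{x}_1)}{\sum_{k=1}^{n_j} \exp(\boldsymbol{\beta}^T \mathbf{x}_k)}.$$
• The method of maximum likelihood is then applied to maximize the product of these probabilities over all event times,
$$L(\boldsymbol{\beta}) = \prod_{j} \frac{\exp(\boldsymbol{\beta}^T \mathbf{x}_{(j)})}{\sum_{k \in R_j} \exp(\boldsymbol{\beta}^T \mathbf{x}_k)},$$
where $\mathbf{x}_{(j)}$ is the covariate vector of the subject with the event at $t_j$ and $R_j$ is the set of subjects alive just before $t_j$.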
A Conventional Approach (cont.)
• The conventional Cox model computes a relative risk for each factor.
• Some of the computed relative risks may be close to 1 and therefore insignificant.
• Some factors may be replaceable by a combination of several other factors.
An Example Of Redundant Risk Factors • Consider the following Cox model • Assume that there exist the following relationships • The model with minimal dominant risk factors is
A Modification
• Idea: Eliminate the insignificant risk factors and re-evaluate the remaining ones.
• Approach: Cox backward (forward) stepwise regression model.
• Rationale: Iteratively remove the most insignificant risk factor (e.g. the one whose relative risk is closest to one, i.e. whose coefficient $\beta_i$ is closest to zero).
• Drawbacks:
  • The resulting number of risk factors may not be minimal.
  • The risk factor to be removed (added) depends strongly on the previously removed (added) factors.
A Novel Approach
• Goal: Search for the dominant risk factors and compute their relative risks.
• What's new: Perform a simultaneous search instead of a stepwise search.
• Advantages:
  • It is capable of obtaining the minimal dominant risk factors in a single run.
  • The minimal dominant risk factors may help biomedical professionals identify the crucial lesions of a disease and their associated genes.
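The Rationale
• In the method of maximum likelihood, the value of $\boldsymbol{\beta}$ is determined by solving the following optimization problem:
$$\max_{\boldsymbol{\beta}}\ L(\boldsymbol{\beta}) = \prod_{j=1}^{n} \left[ \frac{\exp(\boldsymbol{\beta}^T \mathbf{x}_j)}{\sum_{k \in R_j} \exp(\boldsymbol{\beta}^T \mathbf{x}_k)} \right]^{\delta_j},$$
where $R_j$ is the risk set at the time $t_j$ associated with subject j.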
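The Rationale (cont.)
• $\delta_j = 0$ if subject j is censored; otherwise $\delta_j = 1$.
• The product runs over all the subjects.
• Each factor is the probability that the event observed at $t_j$ was exactly the one that happened to subject j at that time.
• [Figure: follow-up timelines of subjects a to f on a time axis t running from $t_0$ to $t_f$; at the event time $t_j$ the subjects in the risk set are {b, c, d, e}.]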
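The likelihood described on this slide can be sketched in code as a log partial likelihood. This is a minimal illustration, not the presenter's implementation: it assumes no tied event times and takes the risk set of subject j to be every subject whose observed time is at least t_j. The names `cox_log_partial_likelihood`, `X`, `time` and `delta` are hypothetical.

```python
import numpy as np

def cox_log_partial_likelihood(beta, X, time, delta):
    """Log of the product over subjects of [exp(b'x_j) / sum_{k in R_j} exp(b'x_k)]^delta_j."""
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    delta = np.asarray(delta, dtype=int)
    eta = X @ beta  # linear predictor beta^T x for every subject
    total = 0.0
    for j in range(len(time)):
        if delta[j] == 0:
            continue  # a censored subject contributes a factor of 1
        at_risk = time >= time[j]  # assumed risk set: still observed at t_j
        total += eta[j] - np.log(np.sum(np.exp(eta[at_risk])))
    return total

# Toy data (made up): four subjects, two covariates, the second subject censored.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
time = np.array([1.0, 2.0, 3.0, 4.0])
delta = np.array([1, 0, 1, 1])
value_at_zero = cox_log_partial_likelihood([0.0, 0.0], X, time, delta)
```

With β = 0 every subject in a risk set is equally likely to be the one with the event, so each event contributes the negative log of its risk-set size; this gives a quick sanity check for the function.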
The Rationale (cont.)
• We attempt to find the dominant risk factors and compute their relative risks by suppressing the factors whose relative risk is insignificant (close to 1).
• By driving as many relative risks as possible to one (that is, as many $\beta_i$ as possible to zero), we obtain the dominant risk factors.
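To make the link between coefficients and relative risks concrete, the following sketch converts each β_i to a relative risk exp(β_i) and keeps the factors whose relative risk differs noticeably from 1. The tolerance `tol`, the function name `dominant_factors`, the factor names and the coefficient values are all assumptions made for this illustration.

```python
import numpy as np

def dominant_factors(beta, names, tol=0.05):
    """Return (name, relative risk) pairs whose relative risk differs from 1 by more than tol."""
    relative_risk = np.exp(np.asarray(beta, dtype=float))
    return [(name, float(rr))
            for name, rr in zip(names, relative_risk)
            if abs(rr - 1.0) > tol]

# Hypothetical coefficients: two are (nearly) zero, so their relative risk is close to 1.
beta = [0.0, 0.7, -0.01]
names = ["f1", "f2", "f3"]
selected = dominant_factors(beta, names)
print(selected)
```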
The Rationale (cont.) • We change the original optimization problem as follows: where $\mathbf{1}$ is a vector of all ones, i.e. $\mathbf{1} = [1\ 1\ 1\ 1\ \dots\ 1]^T$
The Rationale (cont.) • The sigmoidal function =5
Advantages Of Simultaneous Search Over Sequential Search
• Simultaneous search is computationally efficient (one iteration versus several iterations).
• Simultaneous search guarantees a (locally) optimal solution, whereas sequential search may not reach a local optimum in the original search space.
• Factors are removed independently in simultaneous search.
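Solve For The β's
The conventional approach:
• Let $L(\boldsymbol{\beta})$ denote the likelihood to be maximized.
• Let $l(\boldsymbol{\beta}) = \ln L(\boldsymbol{\beta})$.
• Solve $\partial l(\boldsymbol{\beta}) / \partial \boldsymbol{\beta} = \mathbf{0}$.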
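As a worked form of these steps, and assuming the likelihood is the Cox partial likelihood with censoring indicators $\delta_j$ and risk sets $R_j$ (notation assumed here), the log-likelihood and the equations to be solved can be written as:

```latex
\begin{align}
l(\boldsymbol{\beta})
  &= \sum_{j=1}^{n} \delta_j \left[ \boldsymbol{\beta}^T \mathbf{x}_j
     - \ln \sum_{k \in R_j} \exp(\boldsymbol{\beta}^T \mathbf{x}_k) \right], \\
\frac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}
  &= \sum_{j=1}^{n} \delta_j \left[ \mathbf{x}_j
     - \frac{\sum_{k \in R_j} \mathbf{x}_k \exp(\boldsymbol{\beta}^T \mathbf{x}_k)}
            {\sum_{k \in R_j} \exp(\boldsymbol{\beta}^T \mathbf{x}_k)} \right]
   = \mathbf{0}.
\end{align}
```

These equations have no closed-form solution in general and are usually solved numerically, for example with Newton-Raphson iterations.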
Solve For The β's (cont.)
• It is difficult to solve the proposed optimization problem using the same method.
• The derivatives of the objective function are difficult to compute.
Solve For The β's (cont.)
• Possible alternatives:
  • Greedy Search
  • Simulated Annealing
  • Genetic Algorithm
Genetic Algorithm – What
• The GA is a stochastic global search method that mimics natural biological evolution.
• It operates on a population of potential solutions, applying the principle of survival of the fittest to produce (hopefully) better and better approximations to a solution.
Genetic Algorithm – Why
• It handles problems that have no precisely defined solving method, or for which the exact method would take far too much time.
• Such problems are often characterised by multiple, complex and sometimes even contradictory constraints that must all be satisfied at the same time.
Genetic Algorithm – How
1. Generate the initial population (chromosomes).
2. Evaluate the fitness of each chromosome.
3. Select two chromosomes from the population with probabilities proportional to their fitness values, and perform crossover with probability Pc to generate new chromosomes. Repeat this step until there are as many new chromosomes as old ones.
4. Perform mutation on the new chromosomes with probability Pm.
5. Has the maximal number of iterations been reached? If no, return to the fitness evaluation; if yes, output the best solution found.
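The flow above can be sketched as follows. This is a toy illustration under several assumptions that are not in the slides: chromosomes are bit strings, crossover is single-point, mutation flips each bit independently with probability `p_m`, the fitness function must return strictly positive values, and the best chromosome seen so far is remembered across generations. All names and default parameter values are hypothetical.

```python
import random

def genetic_algorithm(fitness, n_genes, pop_size=20, p_c=0.8, p_m=0.02,
                      max_iter=50, seed=0):
    """Toy GA over bit strings; fitness must be strictly positive, n_genes >= 2."""
    rng = random.Random(seed)
    # Generate the initial population of random bit-string chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(max_iter):
        # Evaluate the fitness of each chromosome.
        scores = [fitness(c) for c in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            # Select two parents with probability proportional to fitness.
            a, b = rng.choices(pop, weights=scores, k=2)
            a, b = a[:], b[:]  # copy so the old population is left untouched
            if rng.random() < p_c:
                # Single-point crossover with probability p_c.
                cut = rng.randrange(1, n_genes)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            new_pop.extend([a, b])
        new_pop = new_pop[:pop_size]
        # Mutation: flip each bit of each new chromosome with probability p_m.
        for c in new_pop:
            for i in range(n_genes):
                if rng.random() < p_m:
                    c[i] = 1 - c[i]
        pop = new_pop
        best = max(pop + [best], key=fitness)  # remember the best seen so far
    return best

# Toy fitness: number of ones plus one, so every weight is positive.
def count_ones(chromosome):
    return 1 + sum(chromosome)

solution = genetic_algorithm(count_ones, n_genes=10)
```

For the risk-factor search, each chromosome would instead encode a candidate coefficient vector and the fitness would be the modified objective; the bit-string encoding here only keeps the example short.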
Covariates In The Young Hypertension Database

age1      bmi       W_BR      height    weight    waist     A_S       A_D       SMK       ALC
598       598       597       598       598       597       598       598       598       598

GOT       GPT       UA        CHO       TG        HDL_C     LDL_C     Glu0      GLu1      GLu2
595       595       580       593       594       594       592       575       446       446

GLu3      GLu4      in1       in2       in3       in4       in5       T4        PTH       Epinephr
447       444       461       355       355       357       360       337       337       240

NEP       Dopamine  creatine  CRE       DUCRE     MicroA    bna       Bk        bcl       una
240       240       247       593       236       241       552       552       551       248

uk        ucl       DUNA      DUK       DUCL      ALDOS     Uprotein  BUN       Activity  PRA
248       252       236       236       238       328       0         0         535       325
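Covariates In The Young Hypertension Database (cont.)

Abbreviation  Description
GOT           GOT
GPT           GPT
CRE           Blood creatinine
UA            Uric acid
CHO           Cholesterol
TG            Triglycerides
HDL_C         High-density lipoprotein
LDL_C         Low-density lipoprotein
bna           Blood sodium
Bk            Blood potassium
bcl           Blood chloride
ALDOS         Aldosterone
T4            Thyroxine
PTH           Parathyroid hormone
PRA           Plasma renin activity
Epinephrine   Epinephrine
NEP           Norepinephrine
Dopamine      Dopamine
KALLIKREN     Kallikrein
una           Urine sodium
uk            Urine potassium
ucl           Urine chloride
creatine      Urine creatinine
MicroA        MicroA
DUCRE         24-hour creatinine
DUNA          24-hour urine sodium
DUK           24-hour urine potassium
DUCL          24-hour urine chloride
Uprotein      Urine protein
BUN           Blood urea nitrogen
Glu0          Glucose at 0 minutes
GLu1          Glucose at 30 minutes
GLu2          Glucose at 60 minutes
GLu3          Glucose at 90 minutes
GLu4          Glucose at 120 minutes
in1           Insulin at 0 minutes
in2           Insulin at 30 minutes
in3           Insulin at 60 minutes
in4           Insulin at 90 minutes
in5           Insulin at 120 minutes
Activity      ACE activity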
Covariate set with 103 subjects:
age1, bmi, W_BR, height, weight, waist, A_S, A_D, GOT, GPT, UA, CHO, TG, HDL_C, LDL_C, Glu1-Glu0, Glu2-Glu1, Glu3-Glu2, Glu4-Glu3, in2-in1, in3-in2, in4-in3, in5-in4, Glu0xin1, T4, PTH, Epinephr, NEP, Dopamine, creatine, CRE, DUCRE, bna, Bk, bcl, Una/Vol, uk/creatine, ucl/creatine, DUNA, DUK, DUCL, ALDOS, Activity, PRA
[Figure: plot with axis label "Number of clusters"]
Covariate set with 519 subjects:
age1, bmi, W_BR, height, weight, waist, A_S, A_D, GOT, GPT, UA, CHO, TG, HDL_C, LDL_C, Glu0, CRE, bna, Bk, bcl