STATISTICAL ASSOCIATION AND CAUSALITY

STATISTICAL ASSOCIATION AND CAUSALITY Nigel Paneth

CAUSALITY AT DIFFERENT LEVELS • Molecular cause • Physiological cause • Personal cause • Social cause, etc. • We will discuss “cause” from the perspective of what aspect (or aspects) of the environment, broadly defined, if removed or controlled, would reduce the burden of disease.

CAUSAL INFERENCE 1. DETERMINISTIC CAUSALITY Many expect a cause to be very closely related to an effect, as in necessary and sufficient causes:

Necessary cause: The cause must be present for the outcome to happen. However, the cause can be present without the outcome happening. Sufficient cause: If the cause is present the outcome must occur. However, the outcome can occur without the cause being present.

EXAMPLE OF NECESSARY CAUSE If outcomes are defining in terms of causes, the cause is necessary by definition. For example, the tubercle bacillus is necessary for tuberculosis by the definition of tuberculosis. Etiologic (as contrasted to manifestational) classification of diseases often produce necessary causes. Hepatitis B once looked to be a necessary cause of hepatocellular carcinoma. But now we see that Hepatitis C may produce it too.

EXAMPLE OF SUFFICIENT CAUSE Sufficient causes are very rare in medicine, because it is exceptional that one exposure is by itself enough to cause disease. Usually exposures are much more common than the diseases they cause. Only about 5% of people who smoke get lung cancer. The measles virus virtually always causes people to get clinical measles, and rabies infection is always fatal.

EXAMPLE OF NECESSARY AND SUFFICIENT CAUSE • HIV could once be classified as both the necessary and sufficient cause of AIDS. • Now, however, it may be that one can be infected with HIV and never get AIDS, either because of rare genetic protection, or because of treatment of the virus.

NECESSARY CAUSE(e.g. the tubercle bacillus and tuberculosis)

SUFFICIENT CAUSE( Rabies infection and death)

BOTH NECESSARY AND SUFFICIENT (e.g. HIV and AIDS in the past)

Koch’s postulates were an example of deterministic causality. To prove that an organism causes a disease, he required that: 1. The organism must be isolated in every case of the disease (i.e. be necessary) 2. The organism must be grown in pure culture 3. The organism must always cause the disease when inoculated into an experimental animal (i.e. be sufficient) 4. The organism must then be recovered from the experimental animal and identified.

PROBABILISTIC CAUSALITY In epidemiology, most causes have much weaker relationships to effects. For example, high cholesterol may lead to heart disease, but it need not (insufficient), and heart disease does not require a high cholesterol (unnecessary). The emphasis on multiple causes in probabilistic causality leads to expressions such as the web of causation, or chain of causation

The measures of association - odds ratio, risk ratio, or correlation coefficient, and of public health impact - e.g. population attributable risk - are related to the strength of the causal relationship. The higher the odds ratio, the closer the cause is to being necessary and sufficient. A PAR of 100% means that the cause is necessary - all cases would be prevented if the cause were removed.

One pragmatic definition of a cause (or a determinant) of a disease is an exposure which produces a regular and predictable change in the risk of the disease. Thus the increase of lung cancer in women, and its magnitude, were predicted based on information on their cigarette smoking habits

ASSOCIATION VS CAUSATION To decide whether exposure A causes disease B, we must first find out whether the two variables are associated, i.e. whether one is found more commonly in the presence of the other.

Almost all of statistics is an attempt to discover whether two variables are associated, and if so, how strongly, and whether chance can explain the observed association. Statistics are primarily designed to assess the role of chance in that association. A p value only tells us how unlikely the association is to have arisen by chance. Therefore, Statistical analysis alone cannot constitute proof of a causal relationship.

MAKING CAUSAL INFERENCES The use of causal criteria in making inferences from data.

The process of weighing evidence at the level of the individual is clinical judgment (e.g. should this patient with a urinary tract infection be treated with Ampicillin or Sulfisoxazole?)

The process of weighing evidence at the level of the population is epidemiological judgment (e.g. should middle-aged men take aspirin daily to prevent heart attacks?)

When looking at data from epidemiological studies, we often use casual criteria to assist in weighing the evidence. The most commonly used are the following criteria, derived initially from the work of the British statistician Austin Bradford Hill, and later further developed by the U.S, Surgeon General's Office in its 1964 report on smoking and cancer.

Causal criteria are usually applied to a group of articles on a topic, though, in modified form, they can be applied to an individual paper.

CAUSAL CRITERIA Five commonly used criteria for assessing causality in exposure-outcome relationships have been used by epidemiologists for many years.

1. STRENGTH dose response 2. TIME-ORDER 3. SPECIFICITY 4. COHERENCE 5. CONSISTENCY

STRENGTH Is the association strong? Heavy smoking is associated with a twenty-fold higher rate of lung cancer, and a doubled rate of coronary heart disease. The association of smoking with lung cancer is therefore stronger than its association with heart disease. The stronger the association the more likely it is to be truly causal.

STRENGTH One reason for the importance is is that any confounding variable must have a larger association with the outcome to be confounding. The larger the relative risk observed, the less likely it is that a confounder with an even larger relative risk is lurking in the background.

EXAMPLE: The strength of the association was the key evidence for the association between folic acid supplements and neural tube defects, in spite of less-than-ideal study design.

Dose-response relationship If a regular gradient of disease risk is found to parallel a gradient in exposure (e.g. light smokers get lung cancer at a rate intermediate between non-smokers and heavy smokers) the likelihood of a causal relationship is enhanced. Dose-response is generally thought of as a sub-category of strength.

Dose-response relationship However, dose-response is not relevant to all exposure-disease relationships, because disease sometimes only occurs above a fixed threshold of exposure, and thus a dose-response relationship need not be seen. (remember also that misclassification of adjacent classes can easily produce an apparent dose-response relationship)

EXAMPLE: For each increase in amount of cigarettes smoked, the risk of lung cancer rises.

TIME ORDER This very important criterion simply states that one must know for sure that the cause preceded the effect in time. Sometimes this is hard to know, especially in cross-sectional studies.

EXAMPLE 1. Studies have found an inverse relationship between a person’s blood pressure and a person’s serum calcium. But which is the cause and which the effect? Time-order can also be uncertain when disease has a long latent period, and when the exposure may also represent a long duration of effect.

EXAMPLE 2: Low serum cholesterol has been linked to increased risk of colon cancer in prospective cohort studies. But is a low serum cholesterol a cause of colon cancer, or does an early phase of colon cancer cause low cholesterol levels?

SPECIFICITY Causality is enhanced if an exposure is associated with a specific disease, and not with a whole variety of diseases

EXAMPLE 1. Asbestos causes a specific lung disease, asbestosis, distinguishable from many other lung diseases. But low level lead exposure is associated with lower IQ rather than a distinguishable brain syndrome. Thus lead is more uncertain as a cause because of possible confounding with other causes of this rather non-specific effect, low IQ (e.g. SES).

Causality is also enhanced if a disease is associated with a specific exposure, and not with a whole variety of exposures.

EXAMPLE 2: Which disease is benzene more likely to be a cause of? Significant Adjusted ORs for the association of two diseases with five exposures Disease XDisease Y 1 .smoking 2.1 1.1 2 .low SES 4.2 0.9 3. male gender 2.3 1.2 4. works with 3.0 3.0 benzene 5. factory 1.5 0.8 employee

IMPORTANT PRINCIPLE: Specificity is enhanced by hypothesis formulation. Pre-specification is our major protection against chance findings.

COHERENCE Does the association fit with other biological knowledge? One must look for support in the laboratory, or from other aspects of the biology of the condition.

EXAMPLE: Presence of a serological marker of hepatitis B infection is associated (in Asia at least) with greatly elevated rates of liver cancer. That Hepatitis B infection is a true cause of liver cancer is also supported by the finding of the viral genome in many liver cancers.

By contrast, Reserpine (an anti-hypertensive drug) was thought to be a cause of breast cancer based on some studies done in the early 1970's. But there was no other supporting biological information, or any truly plausible biological mechanism. Subsequent larger studies failed to support this association. Similarly for EMF and carcinogenesis.

CONSISTENCY Is the same association found in many studies? Hundreds of studies have shown that smoking and lung cancer are associated, and no serious study has failed to show this association. But whether oral contraceptives are associated with breast cancer is uncertain because some studies show an association, but others do not.

Meta-analysis is a formal method to assess the consistency of the measure of association across many studies.

CONSISTENCY Consistency can mean either: • Exact replication, as in the laboratory sciences, or • Replication under many different circumstances. In epidemiology, exact replication is impossible

WHEN TO APPLY CAUSAL CRITERIA? Causal criteria are principally designed to deal with the problem of confounding. By applying the criteria, we reduce the possibility of falsely assigning cause to the wrong exposure. Causal criteria do not work well in the case of bias.

FOR EXAMPLE Prenatal care was widely believed to prevent low birthweight. However, women not getting prenatal care tended to have all sorts of problems associated with low birthweight. Because studies of prenatal care assembled biased samples, it was often impossible to remove the bias by adjustment. Moreover, the biased association was very consistent, and the effect size was strong! (it lacked coherence however)

STATISTICAL ASSOCIATION AND CAUSALITY

STATISTICAL ASSOCIATION AND CAUSALITY

Presentation Transcript

Causality and Experimental Design

Causality

Emergence and Causality

Shoemaker, “Causality and Properties”

Statistical Causality by Rich Neapolitan http://orion.neiu.edu/~reneapol/renpag1.htm

Causality Matters

Correlation, Regression, and Causality

Correlation and Causality

Computational and Statistical Challenges in Association Studies

Causality

From Association Rules To Causality

Decision and Causality

The American Statistical Association

Statistical Association

V13: Causality

Causality

Causality and causal inference

Statistical association of genotype and phenotype

CAUSALITY ANALYSIS

Causality

Causality Project