Genetic analysis of human disorders. Tom Scerri. Basic ideas in genetics and linkage analysis. Why do we think genes cause disease? Family aggregation: family pedigrees with many affected individuals. Caveat: shared environmental influences may cause familial aggregation.
Genetic analysis of human disorders Tom Scerri Basic ideas in genetics and linkage analysis
Why do we think genes cause disease? • Family aggregation: • Family pedigrees with many affected individuals. • Caveat: • shared environmental influences may cause familial aggregation: • e.g. living close to an area of pollution • Large twin-based studies.
Large family pedigree Example 1: Norwegian family Adapted from Fagerheim et al. (1999; Journal of Medical Genetics)
Large family pedigree Example 2: Finnish family Adapted from Hannula-Jouppi et al. (2005; PLoS Genetics) [Figure key: Unaffected, Unknown, Dyslexic]
Large family pedigree Example 3: Dutch family Adapted from Kovel et al. (2006; Journal of Medical Genetics) [Figure key: Unaffected, Unknown, Dyslexic]
Twin Studies • Twins are assumed to share 100% of their environment. • Identical twins also share 100% of their genes. • Non-identical twins share on average 50% of their genes. [Figure: pedigrees for identical twins and non-identical twins]
Twin Studies • Identical twins: 68% concordance • Non-identical twins: 38% concordance
MENDELIAN (Single gene) MULTIFACTORIAL (many genes + environment) Complete penetrance Incomplete penetrance Environmental factors Genetic heterogeneity
Phenotypes (or traits) • Categorical or dichotomous • yes/no or present/absent or affected/unaffected • diseases (e.g. cystic fibrosis) • disorders (e.g. dyslexia) • traits (e.g. taste perception of phenylthiocarbamide (PTC), flower colour) • often monogenic • Quantitative or continuous • range of values, often normally distributed • Height, weight • Intelligence, or reading ability • often polygenic • small effects from multiple genes • often modified by environmental influences • e.g. diet, education
Penetrance versus Expressivity • Penetrance • The proportion of people carrying a disease-causing allele who are affected • Can be complete or partial • Expressivity • The extent to which an allele “displays or expresses” its effect • May depend on other genes and/or the environment [Figure labels: Complete penetrance; Partial penetrance; Polydactyly in cats, Danforth, Journal of Heredity (1947)]
Phenocopies, Heterogeneity and Oligogenicity • Phenocopy • Affected individuals carrying the ‘normal’ variant of a gene • Heterogeneity • Variants in distinct genes that may result in the same phenotype (e.g. in different families). • Oligogenicity • Variants from several distinct genes acting together to create a phenotype [Figure: carriers of high-risk versus low-risk (‘normal’) variants of Genes A to D, with Phenotypes X, Y and Z]
Familiality versus Heritability • Familiality • the extent to which a ‘trait’ passes down through generations • genetic • environmental • Heritability (H²) • the proportion of phenotypic variation that is attributable to genetic variation • Phenotype (P) = Genotype (G) + Environment (E) • Var(P) = Var(G) + Var(E) (simplistic model) • H² = Var(G) / Var(P)
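A minimal sketch of the simplistic variance model above, using simulated data. The sample size, the seed and the variances Var(G) = 3 and Var(E) = 1 are hypothetical values chosen for illustration, and G and E are assumed to be independent.

```python
import random
import statistics


def broad_sense_heritability(genetic_values, phenotypes):
    """Var(G) / Var(P) under the simplistic model P = G + E."""
    return statistics.pvariance(genetic_values) / statistics.pvariance(phenotypes)


random.seed(1)  # hypothetical seed, only for reproducibility
n = 20000       # hypothetical sample size

# Hypothetical variances: Var(G) = 3, Var(E) = 1
g = [random.gauss(0, 3.0 ** 0.5) for _ in range(n)]
e = [random.gauss(0, 1.0) for _ in range(n)]
p = [gi + ei for gi, ei in zip(g, e)]  # Phenotype = Genotype + Environment

h2 = broad_sense_heritability(g, p)
print(round(h2, 2))
```

With these assumed variances the estimate should land close to 3 / (3 + 1) = 0.75.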
The Human Genome • 23 pairs of chromosomes (23 from mother, 23 from father): • each made of DNA • each contains hundreds of genes [Figure: a human; a human cell]
Genetic Inheritance - simplistic model I (correct) [Figure: pedigree of mother, father and children 1 to 3; key: female, male]
Genetic Inheritance - simplistic model II (wrong) [Figure: pedigree of mother, father and children 1 to 3; key: female, male]
Genetic Inheritance - recombination (correct) [Figure: pedigree of mother, father and children 1 to 3; key: female, male]
Linkage Analysis • Principle: Identify regions of the genome co-segregating with disease in affected individuals. [Figure: Family 1, Family 2, Family 3, ..., Family 300, each with mother, father, child 1 and child 2; shared region labelled ‘Linked to disease’]
Types of Linkage Analysis • Parametric linkage analysis • Must define a precise model of inheritance • e.g. dominant, recessive • disease allele frequency • penetrance of alleles • Suitable for Mendelian phenotypes • Non-parametric linkage analysis • Model free • Suitable for complex disorders • Requires larger samples for comparable power • Looks for chromosomal regions shared by affected individuals
IBS versus IBD • Identity by state (IBS) • Two alleles that appear the same (e.g. a1) • Not necessarily from the same ancestor • Identity by descent (IBD) • Two alleles that appear the same (e.g. a1) • They must be IBS • Derived from the same ancestor • Requires parental information • Used for affected sibling pair (ASP) analysis [Figure: two nuclear families (mother, father, two children) with genotypes a1a2, a1a3, a1a3, a1a2 and a1a2, a2a3, a1a3, a1a2]
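The IBS/IBD distinction can be sketched in a few lines. This is an illustration only: it assumes the genotypes in the figure are listed in the order mother, father, child, child, and that each parent carries at most one copy of the allele in question. The function names are hypothetical.

```python
from collections import Counter


def ibs_count(genotype1, genotype2):
    """Number of alleles (0, 1 or 2) that two genotypes share by state."""
    return sum((Counter(genotype1) & Counter(genotype2)).values())


def parental_origins(child, mother, father):
    """All (maternal allele, paternal allele) assignments consistent with the parents."""
    x, y = child
    origins = set()
    if x in mother and y in father:
        origins.add((x, y))
    if y in mother and x in father:
        origins.add((y, x))
    return origins


def transmitting_parents(allele, child, mother, father):
    """Which parent(s) could have transmitted `allele` to this child."""
    parents = set()
    for maternal, paternal in parental_origins(child, mother, father):
        if maternal == allele:
            parents.add("mother")
        if paternal == allele:
            parents.add("father")
    return parents


# Assumed reading of the figure: the same sib genotypes in two different families
sib1, sib2 = ("a1", "a3"), ("a1", "a2")
family1 = {"mother": ("a1", "a2"), "father": ("a1", "a3")}
family2 = {"mother": ("a1", "a2"), "father": ("a2", "a3")}

print("alleles shared IBS:", ibs_count(sib1, sib2))
for name, family in (("family 1", family1), ("family 2", family2)):
    origin1 = transmitting_parents("a1", sib1, **family)
    origin2 = transmitting_parents("a1", sib2, **family)
    # IBD only if a1 unambiguously came from the same parent in both sibs
    is_ibd = len(origin1) == 1 and origin1 == origin2
    print(name, "a1 shared IBD:", is_ibd)
```

Under that reading, both sib pairs share a1 by state, but only in the second family can a1 be traced to the same parent, which is why parental genotypes are needed to establish IBD.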
Affected Sibling Pairs • Assume affection status of both parents not known • Given a random chromosomal locus, siblings will be expected to share 0, 1 or 2 haplotypes IBD with frequencies 0.25, 0.5 or 0.25 respectively. • Given a chromosomal locus ‘linked’ to a disease, i.e. a disease allele (A) is on a haplotype carried by affected individuals, siblings will share 0, 1 or 2 haplotypes IBD with frequencies: • 0.0, 0.5 or 0.5 respectively, if dominant • 0.0, 0.0 or 1.0 respectively, if recessive [Figure: parents AB and CD; relative to a child AC, a sibling AC shares 2 IBD (0.25), AD or BC shares 1 IBD (0.50), BD shares 0 IBD (0.25); further pedigrees illustrate the dominant and recessive cases]
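The 0.25 / 0.5 / 0.25 expectation at a random locus can be checked by enumeration. A minimal sketch, assuming parents with four distinguishable haplotypes (AB and CD, as in the figure; which parent is the mother is an arbitrary choice here) and no recombination within the locus:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

mother = ("A", "B")  # assumed assignment of the figure's haplotypes
father = ("C", "D")

# Each child receives one maternal and one paternal haplotype: 4 equally likely outcomes
children = list(product(mother, father))

counts = Counter()
for sib1, sib2 in product(children, repeat=2):
    # Haplotypes shared IBD: same maternal haplotype plus same paternal haplotype
    shared = (sib1[0] == sib2[0]) + (sib1[1] == sib2[1])
    counts[shared] += 1

total = sum(counts.values())
ibd_distribution = {k: Fraction(v, total) for k, v in sorted(counts.items())}
for shared, probability in ibd_distribution.items():
    print(shared, "IBD:", probability)
```

Because all four parental haplotypes are distinct in this example, sharing by state and sharing by descent coincide, which keeps the enumeration simple.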
Affected Sib Pair Analysis • Non-parametric • i.e. it is “model free” (no need to define model) • Collect lots of nuclear families: • Two parents • Affected sibling pairs • Genotype lots of markers: • Preferably polymorphic, e.g. microsatellites • Look for deviations from the 0.25, 0.5, 0.25 frequencies of 0, 1 or 2 IBD. • Caveats with complex disorders: • Partial penetrance and varied expressivity. • Susceptibility loci are not always necessary or sufficient to cause disease. • A single chromosomal region may not be shared by all affected sib pairs. • Can lead to large candidate regions. • Programs such as Merlin, GENEHUNTER and MAPMAKER/SIBS can derive nonparametric lod scores
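"Looking for deviations" from the 0.25, 0.5, 0.25 expectation can be illustrated with a plain goodness-of-fit statistic at one marker. This is a simplified sketch, not the statistic computed by Merlin, GENEHUNTER or MAPMAKER/SIBS; the observed counts are invented, and IBD status is assumed to be known without ambiguity for every pair.

```python
EXPECTED_PROPORTIONS = (0.25, 0.5, 0.25)  # sharing 0, 1 or 2 haplotypes IBD
CRITICAL_VALUE_5_PERCENT_2DF = 5.991      # chi-square, 2 degrees of freedom


def ibd_chi_square(observed_counts):
    """Pearson chi-square of observed IBD counts against the null proportions."""
    total = sum(observed_counts)
    return sum(
        (observed - total * proportion) ** 2 / (total * proportion)
        for observed, proportion in zip(observed_counts, EXPECTED_PROPORTIONS)
    )


# Hypothetical data: 100 affected sib pairs sharing 0, 1 or 2 haplotypes IBD
observed = (15, 50, 35)
chi2 = ibd_chi_square(observed)
print(chi2, chi2 > CRITICAL_VALUE_5_PERCENT_2DF)
```

In these made-up counts the excess of pairs sharing 2 haplotypes (and the deficit of pairs sharing 0) is the pattern expected near a susceptibility locus.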
Quantitative Trait Loci (QTL) Mapping • Non-parametric (or model-free) linkage methods • Hence, do not necessarily require: • allele/gene frequencies • rates of penetrance • mode of inheritance • assumption of monogenic inheritance • Can incorporate or better handle heterogeneity and oligogenicity • Therefore more suitable for complex genetic traits • Method 1: Haseman-Elston Linkage Analysis • Method 2: Variance Components (VC) Linkage Analysis
Basic Haseman-Elston Linkage Analysis • Squared trait differences for sib-pairs are regressed on IBD allele-sharing • For a locus that does not influence trait levels, the sib-pair IBD will not be correlated with their squared trait-differences. [Plot axis: squared trait difference]
Basic Haseman-Elston Linkage Analysis • Squared trait differences for sib-pairs are regressed on IBD allele-sharing • For a locus that does influence trait levels, the sib-pair IBD will be negatively correlated with their squared trait-differences. [Plot axis: squared trait difference]
Basic Haseman-Elston Linkage Analysis • Squared trait differences for sib-pairs are regressed on IBD allele-sharing • For a locus that influences trait levels, the sib-pair IBD will be negatively correlated with their squared trait-differences. [Plot: squared trait difference at Locus 1 to Locus 6]
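The regression idea can be sketched with simulated sib pairs. Everything numeric here is an assumption made for illustration: a single additive locus, perfectly known IBD sharing, and hypothetical variances, so that the expected squared difference is 2 x residual variance + 2 x locus variance x (1 - proportion shared IBD).

```python
import random


def ols_slope(x, y):
    """Slope of the ordinary least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


random.seed(7)           # hypothetical seed
QTL_VARIANCE = 1.0       # hypothetical variance due to the linked locus
RESIDUAL_VARIANCE = 1.0  # hypothetical variance from everything else

ibd_sharing = []
squared_differences = []
for _ in range(5000):
    # Proportion of alleles shared IBD: 0, 0.5 or 1 with probabilities 0.25, 0.5, 0.25
    pi = random.choices((0.0, 0.5, 1.0), weights=(0.25, 0.5, 0.25))[0]
    # Sibs sharing more alleles IBD at the locus differ less in trait value
    difference_variance = 2 * RESIDUAL_VARIANCE + 2 * QTL_VARIANCE * (1 - pi)
    difference = random.gauss(0, difference_variance ** 0.5)
    ibd_sharing.append(pi)
    squared_differences.append(difference ** 2)

slope = ols_slope(ibd_sharing, squared_differences)
print("regression slope:", round(slope, 2))
```

A clearly negative slope is the evidence for linkage; setting QTL_VARIANCE to 0 in this sketch gives a slope near zero, matching the case of a locus that does not influence the trait.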
VC Linkage Analysis • Dissects the genetic variation within the quantitative trait. • Advantage - large sibships or entire pedigrees can be simultaneously analysed. • Advantage - all phenotypic variability is considered. • Disadvantage - Computationally intensive. • Uses maximum-likelihood estimation • A statistical method for fitting a statistical model to data • Provides estimates for the parameters • Takes a fixed set of data (i.e. genotypes, phenotypes and pedigree structure) and derives the model parameters that produce the distribution most likely to have resulted in the observed data • Trait variability is partitioned into major-gene, polygenic and environmental factors. • Linkage analysis compares null hypothesis (no major gene effect) to the alternative hypothesis (where the major gene component can vary freely).
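The comparison of the null and alternative hypotheses is commonly summarised as a LOD score, the base-10 logarithm of the likelihood ratio. A small sketch of that arithmetic only; the maximised log-likelihood values are hypothetical and no claim is made about how any particular program computes them.

```python
import math


def lod_score(log_likelihood_null, log_likelihood_alt):
    """LOD = log10(L_alt / L_null), computed from natural-log likelihoods."""
    return (log_likelihood_alt - log_likelihood_null) / math.log(10)


# Hypothetical maximised natural-log likelihoods at one chromosomal position
ln_l_null = -1523.4  # no major-gene effect
ln_l_alt = -1516.5   # major-gene component free to vary

lod = lod_score(ln_l_null, ln_l_alt)
likelihood_ratio_statistic = 2 * (ln_l_alt - ln_l_null)  # equals 2 * ln(10) * LOD
print(round(lod, 2), round(likelihood_ratio_statistic, 2))
```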
Example comparing two methods: Chromosome 18 Linkage Results [Figure: Haseman-Elston Analysis versus Variance Components Analysis for UK Sample 1, UK Sample 2 and UK Sample 3, with the centromere marked; key: CCI, CCN, Olson, Read, Spell, Spoon]
Merlin • Performs NPL analysis: • Qualitative • Quantitative (an extension of the HE method) • Uses 3 specific input files: • Ped file (.ped) • Dat file (.dat) • Map file (.map)
Merlin: ped file • Tab-delimited • Describes pedigree structure • Each row represents a different individual • 5 mandatory columns on left side: • Family ID (numeric, unique between families) • Individual ID (numeric, unique within family) • Father ID • Mother ID • Sex of individual (1 = male, 2 = female) [Figure: example ped file for Family 1 with Mother [1], Father [2], Child [3] and Child [4]; the parents are founders; the header row is shown only for this lecture and must not be used in a real file]
Merlin: ped file • Subsequent columns contain: • genotype information • Two consecutive integers per marker • One for each allele • Else, X = missing allele • phenotype information • Qualitative (affection status) • 1 = unaffected • 2 = affected • 0 = missing phenotype • Quantitative • Numeric values • X = missing phenotype [Figure: example rows for Family 1: Mother [1], Father [2], Child [3], Child [4]]
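A sketch of what a small ped file following these rules might look like, written out from Python. The family, the affection statuses and the alleles are invented, and the choice of one affection column followed by one marker (two allele columns) is an assumption that a matching .dat file would have to describe. Founders are given father and mother IDs of 0.

```python
# Columns: family, individual, father, mother, sex, affection, allele 1, allele 2
ped_rows = [
    (1, 1, 0, 0, 2, 1, 1, 2),  # mother: founder, female, unaffected
    (1, 2, 0, 0, 1, 1, 3, 4),  # father: founder, male, unaffected
    (1, 3, 2, 1, 1, 2, 1, 3),  # child: male, affected
    (1, 4, 2, 1, 2, 2, 1, 4),  # child: female, affected
]

# One tab-delimited row per individual, with no header row
ped_text = "\n".join("\t".join(str(value) for value in row) for row in ped_rows)
print(ped_text)

# Simple integrity check: every non-founder's parents exist in the same family
individuals = {(row[0], row[1]) for row in ped_rows}
parents_present = all(
    (row[0], row[2]) in individuals and (row[0], row[3]) in individuals
    for row in ped_rows
    if row[2] != 0 or row[3] != 0
)
print("parents present:", parents_present)
```

To use the sketch with real tools, `ped_text` would be written to a file such as `example.ped` (a hypothetical name).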
Merlin: ped file • Can be massive, e.g.: • Families • Many siblings • Half-relatives • Multigenerational • Markers and/or phenotypes: • Tens, hundreds, thousands or even millions! • Requires a .dat file to describe the user-definable columns
Merlin: dat file • Tab delimited file • Describes columns from 6th onwards. • Each row: • describes a subsequent column (starting from the 6th) • contains two columns: • 1st = nature of .ped file column • A = affection status • T = quantitative trait • M = genetic marker (actually corresponds to 2 columns from ped file) • 2nd = alphanumeric name of the phenotype or genetic marker
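A sketch of a matching dat file and of how it determines the expected width of the ped file. The phenotype and marker names are hypothetical; only the A, T and M codes listed above are handled.

```python
# Each entry: (column type, name), describing ped columns from the 6th onwards
dat_rows = [
    ("A", "dyslexia"),     # affection status: one ped column (hypothetical name)
    ("T", "reading"),      # quantitative trait: one ped column (hypothetical name)
    ("M", "marker1"),      # genetic marker: two ped columns (hypothetical name)
    ("M", "marker2"),
]

dat_text = "\n".join("\t".join(row) for row in dat_rows)
print(dat_text)

COLUMNS_PER_TYPE = {"A": 1, "T": 1, "M": 2}
MANDATORY_PED_COLUMNS = 5

# Number of columns every row of the corresponding ped file should have
expected_ped_columns = MANDATORY_PED_COLUMNS + sum(
    COLUMNS_PER_TYPE[column_type] for column_type, _ in dat_rows
)
print("expected ped columns:", expected_ped_columns)
```

Comparing `expected_ped_columns` with the actual width of each ped row is a quick way to catch a ped/dat mismatch before running an analysis.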
Merlin: map file • Tab delimited file • Describes the positions of genetic markers present in the .dat file • First row is a header row: • CHROMOSOME, MARKER, POSITION • Each subsequent row gives the: • chromosome number of the marker • name of the marker • genetic position of the marker (in centiMorgans)
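A sketch of a small map file in the layout described above, with a check that markers are listed in order of position. The marker names and centiMorgan positions are invented.

```python
map_header = ("CHROMOSOME", "MARKER", "POSITION")
map_rows = [
    (18, "marker1", 0.0),   # hypothetical markers and positions (cM)
    (18, "marker2", 12.5),
    (18, "marker3", 27.3),
]

lines = ["\t".join(map_header)]
lines += ["\t".join(str(value) for value in row) for row in map_rows]
map_text = "\n".join(lines)
print(map_text)

# Parse the text back and confirm positions increase along the chromosome
parsed = [line.split("\t") for line in map_text.split("\n")[1:]]
positions = [float(fields[2]) for fields in parsed]
positions_sorted = positions == sorted(positions)
print("positions in order:", positions_sorted)
```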
Exercise 1a: Using Merlin to perform NPL analysis • Data and 1st lecture available here: • www.well.ox.ac.uk/~clicker/Bologna/Lecture1/ • Merlin website available here: • http://www.sph.umich.edu/csg/abecasis/merlin/tour/linkage.html • Or simply Google “Merlin Linkage” • Follow the Merlin “Linkage” tutorial using the ASP example files. • Understand the input files • Make sure to check for data integrity using pedstats: • Check for family connectivity • Perform NPL and VC linkage analyses • Tip: use the “--pdf” option to see a graph of the output
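The steps of the exercise can be laid out as command lines. This sketch only assembles and prints them; it assumes the ASP example files are named asp.dat, asp.ped and asp.map and that the pedstats and merlin executables are on the PATH, so check both against the tutorial before running anything.

```python
import shlex

# Assumed file names for the ASP example data
data_files = ["-d", "asp.dat", "-p", "asp.ped"]

# Step 1: check data integrity and family connectivity
pedstats_command = ["pedstats", *data_files]

# Steps 2 and 3: NPL and variance-components linkage, with graphical output
merlin_base = ["merlin", *data_files, "-m", "asp.map"]
npl_command = [*merlin_base, "--npl", "--pdf"]
vc_command = [*merlin_base, "--vc", "--pdf"]

for command in (pedstats_command, npl_command, vc_command):
    print(shlex.join(command))
```

Each list could be passed to `subprocess.run(command, check=True)` once the programs and data files are in place.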
Exercise 1b: Using Merlin to perform NPL analysis • Click on “Regression” on the left-hand menu • Perform “regression-based” linkage analysis with: • ASP example data files • chr18 data set: • 50+ microsatellites (majority named d18s###) • 3 quantitative phenotypes: • Read_T_2003 • Spell_T_2003 • Spoon_Resid_2003 • Contains 3 bugs that need fixing • Tip: use the “--pdf” option to see a graph of the output