Linkage analysis

Linkage analysis Jan Hellemans 6

Finding causal mutations • 2 opposing strategies • sequence then select • select then sequence • Sequencing • traditional Sanger sequencing only possible after selection • Massively parallel sequencing possible prior to or after selection • RNA sequencing • exome sequencing • genome sequencing

Finding causal mutations • Selection • positional (prior to sequencing) • linkage analysis • GWAS • structural variations (e.g. microdeletions) • functional (prior to & after sequencing) • candidate genes selected based on known function or involvement in related disorders • filtering of variants based on functional predictions • overlap (after sequencing) • looking for genes / variants that occur in multiple independent patients • mostly a combination is used

exome sequencing

Aims Interprete microsatellite results Add genotypes to pedigrees Create pedigree and genotype files Calculate and interprete LOD-scores Delineate linkage intervals Basic principles of linkage analysis Analyze other types of markers Association studies Learn how to work with specific pedigree programs

Starting linkage analysis

Preparations • Clearly define the phenotype • If not specific enough than you may analyze different disorders that can map to different genomic loci • LOD scores are additive • Find suitable families • larger is better • more patients is better • Collect genomic DNA from as much family members as possible • Determine the type of inheritance • Calculate the power to prove linkage with the available material (SLink – not part of this course)

Linkage analysis types • Directed linkage analysis • Evaluate linkage at a specific locus such as a candidate gene • Common approach: evaluate an intragenic, 5’ and 3’ markeroften microsattelites • Genome wide linkage analysis • Screen for linkage for markers spread across the entire genome • Microsatellites: ~400 markers spaced at about 10cM • SNP’s: 500k SNP array • Homozygosity mapping • Screen only affected individuals in inbred families • Select homozygous markers (typically SNP markers) • Very efficient technology • Fine mapping • Some linked markers are known, but the borders of the linkage interval still need to be defined

Exercise – Part 1 • 2 inbred families with a recessive disorder • With a homozygosity mapping based on 500k SNP arrays 2 candidate regions could be identified • Chromosome 4 • Patient 1 homozygous for • 6.052Mb - 14.488Mb • 21.008Mb – 37.477Mb • Patient 2 homozygous for • 11.186Mb – 37.219Mb • Task: find microsatellite markers to confirm linkage

Find additional flanking markers • Find physical position of marker in NCBI > UniSTS • NCBI map viewer: http://www.ncbi.nlm.nih.gov/mapview/ • Go to Homo sapiens and to the wright chromosome • Maps & options: show • DeCode, Généthon & Marshfield (genetic maps) • Genes • Set region: e.g. 2Mb up- and downstream of your marker • Click ‘Data as table view’ • Click on STS behind a marker to see its details • Select markers that • locate to only 1 genomic location • have a PCR product with an extended size rangeone size  not polymorphic

http://www.ncbi.nlm.nih.gov/projects/mapview

Exercise – Part 1 > possible solution • Markers in 1st candidate region • D4S3017 (21.078Mb) • D4S3044 (25.189Mb) • D4S1618 (33.857Mb) • D4S3350 (33.857Mb) • D4S2988 (36.889Mb) • Markers in 2nd candidate region • D4S1582 (10.311Mb) • D4S2906 (12.321Mb) • D4S2944 (13.141Mb) • D4S1602 (14.059Mb) • D4S2960 (15.437Mb) •  Order primers & analyze them on all family members

Analyzing microsatellite data

Microsatellites > basics • Repeats of short sequences (e.g. 2bp)NNNNAC(AC)nACNNNN • Number of repeats is variable (instable sequence) • Number of repeats determines the allele • Number of repeats corresponds to specific length of PCR product: • allel 1: NNNNACACACACACNNNN (5*AC  18bp) • allel 2: NNNNACACACACACACNNNN (6*AC  20bp) • allel 3: NNNNACACACACACACACNNNN (7*AC  22bp) • ... • Determine length to know the allele (sequencer)

Microsatellites > basics

Microsatellites > determine size • Use internal size standard (other color) 220bp 230bp 225bp

Microsatellites > heterozygotes 220bp 230bp 223bp 225bp

Microsatellites > stutter peaks • Repeats are difficult to copy  polymerase slips • Some amplicons have 1 repeat lessa few even loose multiple repeats • Small repeats are more prone to slippage and show more pronounced stutter peaks • Largest product is the correct one • Distance between peaks = length of a repeat

Microsatellites > stutter peaks allelic peak 1st stutter peak 2nd stutter peak

Microsatellites > stutter peaks • Allelic peaks are the heighest • Stutter peaks are lower A1 A2

Microsatellites > stutter peaks A1 A2

Microsatellites > +A peaks • Taq polymerase tends to add an extra A at the 3’ end • Variable degree of products with or without this extra A • Do not confuse with stutter peaks (only 1bp difference) allelic peak allelic peak + A 1st stutter peak 1st stutter peak + A 2nd stutter peak 2nd stutter peak + A

Microsatellites > complex plots (stutter & +A) A1 A2

Microsatellites > mutliplex • Combine multiple markers in a single analysis ($$$) • Different size range • Multicolor • Commercial kits: e.g. 16 markers / lane

Microsatellite plots examples

Genotyping pedigrees

Genotyping pedigrees • Screen one or multiple markers for some or all family members • For every marker: • Make a list of all occuring allele sizes • Due to technical variation on sizing the same allele can have a slightly different size in different measurements (-0.4bp _ +0.4bp). Give all alleles within this range the same allele number • Add the allele numbers to the pedigree at the corresponding individual/marker combination • Find the wright phase • Advanced software like GeneMapper can generate tables with allele numbers for every sample / marker • Advanced pedigree programs like Progeny can store genotype information for family members • Verify inheritance

Exercise – Part 2 • Genotype 3 markers in all available individuals of 2 families • Pedigrees & microsatellite plots inExercisePart2-GenotypingData.pdf • Add allele numbers for the 3 markers to the pedigree • Interprete the genotyped pedigrees: linked?

Family 1

Family 2

Exercise – Part 2 > Conclusions • D4S1582 • Mendelian error  can not be interpreted • D4S2944 • Linked • D4S3017 • Not-linked: unaffected individuals with the same genotype as a patient

Calculate LOD scores

EasyLinkage EasyLinkage = UI for linkage analysis http://genetik.charite.de/hoffmann/easyLINKAGE/index.html#start Bioinformatics. 2005 Feb 1;21(3):405-7 PMID: 15347576 Bioinformatics. 2005 Sep 1;21(17):3565-7 PMID: 16014370 Interface for many linkage analysis programs Input Pedigree file (linkage format) Genotype file(s) Marker information (already provided for popular markers) Settings

Pedigree file Naming requirements for EasyLinkage:p_xxx.pro  e.g. p_SMMD.pro Format: Tab delimited text file 1 individual per row Columns: 1  family ID 2  person ID 3  father ID 4  mother ID 5  sex (1=male, 2=female, 0=unknown) 6  affection status (1=unaffected, 2=affected, 0=unknown) 7  DNA availability (optional, relevant for power calculations) 8  liability class (to be provided if multiple liability classes are used)

Genotype files Person ID’s have to match exactly with those provided in the pedigree file Naming requirements for EasyLinkage:MarkerName_xxx.abi  e.g. D1S1609_SMMD.abi Format: Tab delimited text file 1 individual per row Columns (for microsatellite based analysis): 1  marker (same as in file name and matching a marker in an available marker set) 2  custom information (content doesn’t matter, but column must be present) 3  individual ID (match person ID in pedigree file) 4 & 5  genotypes for 2 alleles (unknown=0)

Marker information Contains information on the chromosome and position of every marker Already available for a number of commercial SNP-arrays and for the microsatellite markers from Genethon Marshfield DeCode Custom marker sets can be created (see manual)

EasyLinkage settings Choose a program: FastLink  Parametric, single-point SuperLink  Parametric, single-/multipoint SPLink  Nonparametric, single-point Genehunter  Nonpara-/parametric, single-/multipoint Genehunter Plus  Nonpara-/parametric, single-/multipoint Genehunter MOD  Nonpara-/parametric, single-/multipoint Genehunter Imprinting  Nonpara-/parametric, single-/multipoint GeneHunter TwoLocus  Parametric, two-locus, single-/multipoint Merlin  Nonpara-/parametric, single-/multipoint SimWalk  Nonparametric, single-/multipoint Allegro  Nonpara-/parametric, single-/multipoint & simulation, single-/multi-point PedCheck  Mendelian error check FastSLink  Simulation, single-/multi-point

EasyLinkage settings Parametric <-> non-parametric Single point <-> multipoint Frequency of the disease allele Penetrance vectors (wt/wt, wt/mt, mt/mt) Standard dominant: 0 1 1 Standard recessive: 0 0 1 Reduced penetrance: replace 1 by penetrance (e.g. 0.9) Phenocopy: replace 0 by percentage of phenocopy (e.g. 0.1) Example: 0.01 0.9 0.991% chance to show a similar phenotype despite a normal genotype90% chance to show the phenotype when 1 mutant allele (dominant with incomplete penetrance)99% likelihood to present with the phenotype if both alleles are mutant

Evaluate calculated LOD-scores Maximum LOD-scores can be seen in EasyLinkage Details about LOD-scores at different recombination fractions can be found in text files generated by EasyLinkage  process in Excel (generate graphs, ...) Standard rules for LOD-scores >3  significant linkage 2<LOD<3  suggestive linkage -2<LOD<2  uninformative <-2  significant absence of linkage

Interpreting LOD plots

Linkage analysis

Linkage analysis

Presentation Transcript

Linkage analysis

Linkage Analysis I -- Parametric

Basics of Linkage Analysis

Linkage Analysis: An Introduction

Power of linkage analysis

Linkage analysis: basic principles

Non-parametric Linkage Analysis

Linkage analysis - genehunting

Linkage Analysis in Merlin

LINKAGE ANALYSIS

Genetic Linkage Analysis

Introduction to linkage analysis

Linkage Analysis

Linkage Analysis

Linkage Analysis

Introduction to Linkage Analysis

Genetic linkage analysis

Basics of Linkage Analysis

Linkage Analysis in Merlin

Linkage analysis and genetic mapping

Power in QTL linkage analysis