Heritability and genomics of gene expression in peripheral blood. Paper: http://www.nature.com/ng/journal/v46/n5/full/ng.2951.html Related news: http://news.ncsu.edu/releases/wrightnatgen/ Presenter: Pak-Kan, WONG (13/05/2014). Contents: Background, Method Summary, Results.
Heritability and genomics of gene expression in peripheral blood Paper: http://www.nature.com/ng/journal/v46/n5/full/ng.2951.html Related news: http://news.ncsu.edu/releases/wrightnatgen/ Presenter: Pak-Kan, WONG (13/05/2014)
Contents • Background • Method Summary • Results • Heritability in peripheral blood transcriptome • eQTL analysis • Biomedical relevance • Discussions
Expression Quantitative Trait Loci (eQTL) • QTL: Stretches of DNA containing or linked to the genes that underlie a quantitative trait • eQTL: QTLs that regulate expression levels of mRNAs or proteins • Figure labels: cis-eQTL, trans-eQTL, master trans-eQTL • Image credit: http://www.biostat.jhsph.edu/GenomeCAFE/ExpressionistSeminarSlides/eQTL_review_s.ppt
Peripheral Venous Blood • Blood vessels outside the heart make up the peripheral blood system. • Venous blood is deoxygenated blood that travels from the peripheral vessels, through the venous system, into the right atrium. Ref.: http://en.wikipedia.org/wiki/Peripheral_blood http://www.circulationfoundation.org.uk/help-advice/vascular-health/the-circulatory-system/
Classical Twin Design (CTD) • Allows the study of varying family environments (across pairs) and widely differing genetic makeup: • "Identical" or monozygotic (MZ) twins • Share nearly 100% of their genes, which means that most differences between the twins (such as height, susceptibility to boredom, intelligence, depression, etc.) are due to experiences that one twin has but the other does not. • "Fraternal" or dizygotic (DZ) twins • Share only about 50% of their genes. • Thus powerful tests of the effects of genes can be made. Twins share many aspects of their environment (e.g., uterine environment, parenting style, education, wealth, culture, community) by virtue of being born at the same time and place. • The presence of a given genetic trait in only one member of a pair of identical twins (called discordance) provides a powerful window into environmental effects. Ref.: http://en.wikipedia.org/wiki/Twin_study http://ibg.colorado.edu/cdrom2012/keller/Assumptions/Keller_Coventry_CTD_Indeterminacy_2005.pdf
Classical Twin Design: Mathematical Model • Monozygotic (MZ) twins: share all of their alleles • Dizygotic (DZ) twins: share on average 50% of their polymorphic alleles • Assumption: equal environments for identical and fraternal twins • Assess the variance of a phenotype in a large group and estimate how much of it is due to: • Genetic effects (heritability): factors A, D • Shared environment (factor C): events that happen to both twins, affecting them in the same way • Unshared, or unique, environment (factor E): events that occur to one twin but not the other, or events that affect each twin in a different way
ACE Model • A = h2: additive genetics • C = c2: common environment • E = e2: unique environment • A + C + E = 1 • MZ twins share 100% of their genes and all of the common environment, so the correlation between identical twins estimates rMZ = A + C • DZ twins share on average 50% of their genes and all of the common environment, so the correlation between fraternal twins estimates rDZ = 0.5A + C • Resulting estimates: E = 1 - rMZ • A = 2(rMZ - rDZ) • C = rMZ - A • Ref.: http://onlinelibrary.wiley.com/doi/10.1002/0470013192.bsa002/pdf
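The three estimates on this slide follow directly from the two twin correlations. A minimal sketch of that arithmetic, assuming made-up correlations of 0.6 (MZ) and 0.4 (DZ) that are not taken from the paper; the function name `falconer_ace` is hypothetical:

```python
def falconer_ace(r_mz, r_dz):
    """Return (A, C, E) from MZ and DZ twin correlations.

    Uses the slide's relations: r_mz = A + C and r_dz = 0.5*A + C.
    """
    a = 2.0 * (r_mz - r_dz)   # additive genetics (h2)
    c = r_mz - a              # common environment (c2)
    e = 1.0 - r_mz            # unique environment (e2)
    return a, c, e


# Illustrative inputs only (assumed values, not data from the study).
a, c, e = falconer_ace(r_mz=0.6, r_dz=0.4)
print(f"A={a:.2f}  C={c:.2f}  E={e:.2f}")
```

These are simple moment estimates, so they can fall outside the range 0 to 1. For example, C comes out negative whenever rMZ is more than twice rDZ, which is one reason likelihood-based fitting is usually preferred.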
From the Netherlands Twin Register (NTR): Introduction
Quantifying Human Transcriptomic Heritability • Although genes with genome-wide significant eQTLs are by definition "heritable", additional polygenic variation may be widespread and fail to reach statistical significance in standard genotype-expression association tests. • Genes with substantial polygenic variation may also be subject to unique selection pressures not apparent from the analysis of local eQTLs.
Association Analysis of Genetical Genomics Data • Only a few studies have a sample size > 1000, but > 3000 is required • Findings often do not replicate, even when using the same HapMap LCLs (lymphoblastoid cell lines) under standardized procedures • Especially for trans-eQTLs (due to tissue type, ancestry, winner's curse, …) • Gene expression in the commonly used LCLs is sensitive to EBV copy number and growth rates • Ref.: Franke, Lude, and Ritsert C. Jansen. "eQTL analysis in humans." Cardiovascular Genomics. Humana Press, 2009. 311-328.
Proposed Method • Classical twin design (MZ vs. DZ) • 2752 individual twins • Cohort study • Peripheral venous blood samples
Goals • To describe and evaluate the heritability of all transcripts measured in peripheral blood • To identify a comprehensive list of local and distant eQTLs and evaluate their characteristics and replicability • To assess the biomedical relevance of the identified eQTLs
Data Collection and Pre-Processing • Subjects and biological sampling • Using harmonized protocols • Two longitudinal cohort studies (2-year follow-up) • Netherlands Twin Register (NTR): 2752 (out of 3516) samples • Netherlands Study of Depression and Anxiety (NESDA): 1895 (out of 2783) samples • 227 controls • Steady-state transcription in peripheral blood for 43638 probe sets from 18392 genes • Gene expression assays • Removed sex-mismatched samples and additional samples of poor quality • Removal of the 19 samples with the lowest D values resulted in the largest number of significant transcripts (q<0.10) • Genome-wide SNP assays • Among 714 monozygotic twin pairs, the intrapair agreement for 686895 autosomal SNPs was 0.9985 • 8.3 million SNPs were used • Examined replication in eQTL analyses
Demography of 2752 subjects from 1444 twin pairs for twin-based heritability analyses
Twin-Based Transcript Heritability • Maximize the logarithm of the profile restricted maximum likelihood (REML) function [equation not preserved], whose terms are: • the rank of [term not preserved] • the correlation matrix of zygosity • the correlation matrix of twins • the expression values • the covariates • Estimated quantities: twin-based heritability and shared environmental effects
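The two correlation matrices named on this slide (zygosity and twins) suggest the usual ACE covariance structure. The sketch below shows that structure only. It assumes covariance = sigma2 × (h2 × zygosity matrix + c2 × twin matrix + e2 × identity) with e2 = 1 - h2 - c2, which is a standard textbook form and not necessarily the authors' exact parameterization. The function name and the example values are hypothetical, and no REML fitting is performed here.

```python
def twin_covariance(zygosities, h2, c2, sigma2=1.0):
    """Build the phenotype covariance matrix for twin pairs.

    Individuals are stacked pair by pair: (twin 1, twin 2) of pair 0,
    then pair 1, and so on. Different pairs are assumed uncorrelated.
    """
    e2 = 1.0 - h2 - c2                     # unique environment share
    n = 2 * len(zygosities)
    cov = [[0.0] * n for _ in range(n)]
    for pair_index, zygosity in enumerate(zygosities):
        if zygosity not in ("MZ", "DZ"):
            raise ValueError(f"unknown zygosity: {zygosity!r}")
        # Zygosity correlation: 1 for MZ co-twins, 0.5 for DZ co-twins.
        genetic_r = 1.0 if zygosity == "MZ" else 0.5
        i, j = 2 * pair_index, 2 * pair_index + 1
        cov[i][i] = cov[j][j] = sigma2 * (h2 + c2 + e2)
        # Co-twins share the common environment fully (twin matrix entry 1).
        cov[i][j] = cov[j][i] = sigma2 * (h2 * genetic_r + c2)
    return cov


# Assumed illustrative parameters: one MZ pair followed by one DZ pair.
cov = twin_covariance(["MZ", "DZ"], h2=0.4, c2=0.2)
for row in cov:
    print([round(value, 3) for value in row])
```

With unit total variance, the off-diagonal entries within a pair reduce to the ACE expectations A + C for MZ pairs and 0.5A + C for DZ pairs.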
Results: Twin-Based Heritability in the Peripheral Blood Transcriptome
Investigating Expression Covariates • To identify a minimal set of covariates • Increases power for expression heritability calculation and improves eQTL mapping • The covariates can be roughly divided into • Covariates related to technical variation • Clinical covariates that are subject specific • Covariates related to blood counts, which, if not properly accounted for, might produce spurious "eQTL" relationships.
Manhattan plot of heritability P values for the transcript with the highest h2 estimate (18392 genes) • h2 for all genes: [value not preserved] • h2 for expressed genes: 0.153 • Max h2 = 0.905
K-means clustering of 777 (4.2%) genes with q<0.05 for h2 estimates • Mean within-cluster expression correlation r ranged from 0.46 to 0.006
Tissue relevance (figure showing clusters 1 to 9)
Heritability was strongly associated with expression mean and variance. Values in bold correspond to P<0.0022, for Bonferroni significance at α=0.05 for 23 tests in each of uncorrected and corrected analyses. And numerous KEGG and GO pathways ...
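The P<0.0022 threshold quoted above is the Bonferroni cut-off for 23 tests at α=0.05. A short check of that arithmetic (variable names are illustrative):

```python
alpha = 0.05      # family-wise significance level from the slide
n_tests = 23      # number of tests in each analysis, from the slide

# Bonferroni: divide the family-wise level by the number of tests.
threshold = alpha / n_tests
print(round(threshold, 4))  # 0.0022
```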
Disease Relevance? • Using the NHGRI GWAS catalog, the nearest gene (GWAS gene) was identified for each of 3628 significantly disease-associated SNPs (P ≤ 5×10^-8), for a total of 2343 GWAS genes. • Figure annotation: elevated
Hypothesis “Disease-causing genes are highly heritable.” • Given that GWAS genes were designated only on the basis of proximity to NHGRI-listed SNPs, these results may reflect an even stronger true tendency of disease-causing genes to be highly heritable. • These results are complementary to observations that disease-associated SNPs show eQTL enrichment. • OMIM database shows similar heritability enrichment, even though NHGRI GWAS and OMIM only partly overlap (of genes in either list, 10% are in both). • The OMIM genes with significant heritability (q<0.05) are also quite diverse, further supporting the potential relevance of peripheral blood to other tissues and developmental processes. • Evolutionary associations are consistent with the observation that heritability is necessary for responsiveness to selection. • Enrichment of disease-associated heritability may reflect other underlying sources of commonality but still point to transcription as an important intermediary in disease risk.
Results: Local Genetic Contributions and Bias in Heritability Estimation
Local Genetic Contributions and Bias in h2 Estimation • In published studies, estimates have been complicated by bias and variability in h2 estimation.
Definitively Assessing the True Extent of Transcriptomic Heritability • Model true h2 as following a gamma distribution, with sampling variation determined by the ACE model • Figure annotations: 7.9%; similar mean h2; less variation • For twin-based h2 estimates (n = 2752; 8818 expressed genes shown), subtracting the effects of sampling variation produces an estimated true distribution (blue curve). Resimulating from the fitted, assumed true distribution closely approximates the observed h2 estimates (black curve).
Discrepancy between NTR and MuTHER • Expressed genes in both skin and LCLs with h2>0.5 • The MuTHER report estimated >700 • NTR estimated ~100 • Effect of age? • NTR mean age was ~20 years younger • But age is not a covariate • Effect of sample size? • The MuTHER sample size is much smaller. • Apply the gamma fit and artificially add sampling error to the true distribution (inflating the sampling variation) • Fit the NTR estimated h2 distribution again
How many samples do we need? Effect of Sample Size • Figure annotations: small sample size; a correlation of 1.0 is not attainable…
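How strongly sample size drives the spread of h2 estimates can be illustrated with a small simulation. This is only a sketch under loud assumptions: it uses the simple estimator A = 2(rMZ - rDZ) instead of the REML fit, bivariate-normal phenotypes, equal numbers of MZ and DZ pairs, and arbitrary true values (h2 = 0.15, c2 = 0.10). All function names are hypothetical.

```python
import math
import random
import statistics


def simulate_pairs(n_pairs, r, rng):
    """Draw twin-pair phenotypes (standard normal) with correlation r."""
    pairs = []
    for _ in range(n_pairs):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        pairs.append((z1, r * z1 + math.sqrt(1.0 - r * r) * z2))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation between first and second twins."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def h2_sampling_sd(n_pairs, true_h2, true_c2, n_reps, rng):
    """SD of h2 = 2(r_mz - r_dz) over repeated simulated studies."""
    r_mz = true_h2 + true_c2
    r_dz = 0.5 * true_h2 + true_c2
    estimates = []
    for _ in range(n_reps):
        est_mz = pearson(simulate_pairs(n_pairs, r_mz, rng))
        est_dz = pearson(simulate_pairs(n_pairs, r_dz, rng))
        estimates.append(2.0 * (est_mz - est_dz))
    return statistics.stdev(estimates)


rng = random.Random(0)  # fixed seed so the run is repeatable
sd_small = h2_sampling_sd(50, 0.15, 0.10, 200, rng)
sd_large = h2_sampling_sd(800, 0.15, 0.10, 200, rng)
print(f"SD of h2 estimates, 50 pairs per zygosity:  {sd_small:.3f}")
print(f"SD of h2 estimates, 800 pairs per zygosity: {sd_large:.3f}")
```

With few pairs, the estimates scatter widely around the true value, so a small study can show many apparently highly heritable genes purely through sampling variation.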
Results: eQTL Analyses of Peripheral Blood
Genotypes as Predictors of Transcription • Two types of eQTLs • Local: SNP located from 1 Mb upstream of the gene's transcription start site (TSS) to 1 Mb downstream of its transcription end site (TES) • Distant: otherwise • Genes with at least one local eQTL (q<0.01) had significantly higher expression levels and heritability (P < 1×10^-200 for both)
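The local versus distant rule above can be written as a small classifier. This is a sketch of the 1 Mb window as stated on the slide. The function name, argument layout, and example coordinates are hypothetical, and taking the minimum and maximum of TSS and TES is an assumption made so that genes on either strand are handled the same way.

```python
LOCAL_WINDOW_BP = 1_000_000  # 1 Mb, as stated on the slide


def classify_eqtl(snp_chrom, snp_pos, gene_chrom, tss, tes,
                  window=LOCAL_WINDOW_BP):
    """Label a SNP-gene pair as 'local' or 'distant'.

    A pair is local when the SNP is on the gene's chromosome and lies
    between 1 Mb upstream of the TSS and 1 Mb downstream of the TES.
    """
    if snp_chrom != gene_chrom:
        return "distant"
    # For minus-strand genes the TSS coordinate is larger than the TES,
    # so take min/max to get the gene span in genomic coordinates.
    start = min(tss, tes) - window
    end = max(tss, tes) + window
    return "local" if start <= snp_pos <= end else "distant"


# Made-up coordinates for illustration only.
print(classify_eqtl("19", 8_100_000, "19", 8_585_000, 8_642_000))  # local
print(classify_eqtl("19", 5_000_000, "19", 8_585_000, 8_642_000))  # distant
print(classify_eqtl("7", 8_600_000, "19", 8_585_000, 8_642_000))   # distant
```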
Number of Unique Genes with Evidence of Local Association • For NTR, the number of genes with significant eQTLs (q<0.01) was 11384. After final quality control steps, 9640 significant genes remained. • With increasing sample size, it seems that most expressed genes (>10000) show evidence of local eQTL influence in peripheral blood. • Figure annotation: little difference among the transformations
Overlap of local eQTL findings with two other large blood studies, at q<0.01 • Figure labels: peripheral blood eQTL meta-analysis of Westra et al.; NTR; NESDA • Local eQTL replication: true discovery rate 59.6% and 59.7% (annotated genes)
Results: Characteristics of Distant eQTLs
Number of unique genes with evidence (q<0.01) for distant association • Roughly linear on a log-log scale
Overlap of distant eQTL findings (q<0.001) with previous studies (within 1 Mb of gene) • Figure labels: peripheral blood eQTL meta-analysis of Westra et al.; NTR; NESDA • Distant eQTL replication
Properties of Distant eQTLs • Examined using Ensembl Variant Effect Predictor v2.8 • Figure annotation: lowest rate of overlap with regulatory features or replication in NESDA
eQTL Hotspots (SNPs influencing numerous transcripts) • 304 distant eQTL SNPs grouped into 203 regional clusters • 160 clusters: 1 SNP • 43 clusters: 2 kb to 2 Mb of DNA (median 89 kb) • Potential hotspots: 11 clusters associated with ≥ 6 genes • The proportion of associated transcripts was estimated using NESDA data to avoid selection bias. • eQTL hotspots and significant distant eQTLs influence relatively few genes. • Lower than reported in the MuTHER study • Figure caption: estimated proportion influenced by the 304 SNPs
Putative eQTL Hotspot • A distant eQTL hotspot on chr19 was associated with the expression of 12 distant genes and 1 local gene (MYO1F) • MYO1F expression is independent of the expression of the other distant genes, given the expression of the transcription factor SOX13
Biomedical Relevance • NHGRI GWAS catalog + filtering at P < 1×10^-8 • 3415 SNPs, 498 traits and 4167 SNP-trait pairs from 927 reports • Of the 3118 genes in OMIM, 74.4% were part of a SNP-gene local eQTL pair (q<0.05). • …
Conclusion • Assessed gene expression profiles in 2,752 twins • Used the classical twin design to quantify expression heritability and eQTLs in peripheral blood • Grouped 777 highly heritable genes into 9 clusters • Suggests that previous heritability estimates, when examined in a replication set, have been upwardly biased • Provides a new resource toward understanding the genetic control of transcription
Comments • New resource to support newly identified SNPs • Computational pipeline for a broad range of twin-based experiments • Sampling variation with small sample sizes • Why and how are they correlated? • Functions of each gene in the cluster, multiple layers of control? • New things to explore?
Data • Nature Genetics paper + Supplementary Notes • http://www.nature.com/ng/journal/v46/n5/fig_tab/ng.2951_ft.html • Expression data and genotypes (Affymetrix 6.0 and U219) • http://www.ncbi.nlm.nih.gov/gap/?term=phs000486 • Summary results in the seeQTL browser (GWAS results p<5e-8) • http://gbrowse.csbio.unc.edu/cgi-bin/gb2/gbrowse/seeqtl/
Related Links • Netherlands Twin Register (NTR in Dutch) • http://www.tweelingenregister.org/en/ • FastFacts about NTR • http://fastfacts.nl/en/content/netherlands-twin-register • The Multiple Tissue Human Expression Resource (MuTHER) • http://www.muther.ac.uk/
Correlation Matrix • In GW heritability analysis using DZ twins, [symbol not preserved] is re-estimated by PLINK, with mean 0.501 and standard deviation 0.038 • [Definitions and re-expressed form of the matrix not preserved]
On the Profile Function for Twin-Based Heritability • REML considers the loss in degrees of freedom associated with the fixed-effect estimates. • REML estimates are less biased than the corresponding maximum likelihood estimates and control type I error better. • The profile function has only three parameters regardless of the number of fixed effects, and is computationally more efficient than maximizing over the full REML function • Developed an algorithm in R for twin-based heritability analysis