Projective LA (PLA). Schematic illustration of LA. Projection-based LA for studying a group of genes.
Projection-based LA for studying a group of genes
[Figure 2(a), Figure 2(b). Axes: X (first projection), Y (second projection); mediator Z.]
X: vector of p variables, $X_1, \ldots, X_p$, each measuring the expression level of one gene.
• One-dimensional projection: $a'X = a_1X_1 + \cdots + a_pX_p$ with norm $\|a\| = 1$.
• 2-D projection: let a and b be orthogonal, $a'b = a_1b_1 + \cdots + a_pb_p = 0$.
• Liquid association between a'X and b'X as mediated by Z: $LA(a'X, b'X \mid Z) = E(a'X \, b'X \, Z) = E(a'XX'b \, Z) = a'E(ZXX')b$.
• Most informative 2-D projection: maximize $|a'E(ZXX')b|$ over all pairs of orthogonal projection directions a, b.
Solution: eigenvalue decomposition of the matrix $E(ZXX')$: $E(ZXX')v_i = \lambda_i v_i$, $\lambda_1 \ge \cdots \ge \lambda_p$, where the $v_i$ are eigenvectors and the $\lambda_i$ are eigenvalues.
Theorem. Assume Z is normal with mean 0 and SD 1. Subject to $\|a\| = \|b\| = 1$ and $a'b = 0$, the maximum of the absolute value of $LA(a'X, b'X \mid Z)$ is $(\lambda_1 - \lambda_p)/2$. The optimal 2-D projection is given by $a = (v_1 + v_p)/\sqrt{2}$ (or $-a$), $b = (v_1 - v_p)/\sqrt{2}$ (or $-b$).
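The theorem can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the columns of X and the mediator z are already standardized, replaces the expectation E(ZXX') by a sample average, and uses simulated data. The function name `projective_la` is hypothetical.

```python
import numpy as np

def projective_la(X, z):
    """Optimal 2-D projection for LA mediated by z (illustrative sketch).

    X: (n, p) array, columns assumed standardized.
    z: (n,) array, assumed standardized.
    """
    n = X.shape[0]
    M = (X * z[:, None]).T @ X / n       # sample estimate of E(Z X X'), symmetric p x p
    evals, evecs = np.linalg.eigh(M)     # eigh returns eigenvalues in ascending order
    v_max, v_min = evecs[:, -1], evecs[:, 0]
    a = (v_max + v_min) / np.sqrt(2)
    b = (v_max - v_min) / np.sqrt(2)
    la_max = (evals[-1] - evals[0]) / 2  # (largest - smallest eigenvalue) / 2
    return a, b, la_max, M

# Simulated data, for illustration only
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
z = rng.standard_normal(200)
z = (z - z.mean()) / z.std()

a, b, la_max, M = projective_la(X, z)
# Sample LA(a'X, b'X | Z) along the optimal directions; it equals la_max
la_sample = np.mean(z * (X @ a) * (X @ b))
```

In this sketch, a and b come out orthonormal by construction, and the sample LA along them matches half the gap between the largest and smallest eigenvalues.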
Patterns of Coexpression for Protein Complexes by Size in Saccharomyces cerevisiae
NAR 2008, Ching-Ti Liu, Shinsheng Yuan, Ker-Chau Li
• Many successful functional studies by gene expression profiling in the literature have led to the perception that profile similarity is likely to imply functional association. But how true is the converse of this statement? Do functionally associated genes tend to be co-regulated at the transcription level? In this paper, we focused on a set of well-validated yeast protein complexes provided by the Munich Information Center for Protein Sequences (MIPS). Using four well-known large-scale microarray expression datasets, we computed the correlations between genes from the same complex. We then analyzed the relationship between the distribution of correlations and the complex size (the number of genes in a protein complex). We found that, except for a few large protein complexes such as the mitochondrial ribosomal and cytoplasmic ribosomal proteins, the correlations are on average not much higher than those from a pair of randomly selected genes. The global impact of large complexes on the expression of other genes in the genome is also studied. Our results also showed that the expression of over 85% of the genes is affected by six large complexes: the cytoplasmic ribosomal complex, mitochondrial ribosomal complex, proteasome complex, F0/F1 ATP synthase (complex V) (size 18), rRNA splicing (size 24), and H+-transporting ATPase, vacuolar (size 15).
Figure 1. Comparison of correlation distributions for protein pairs with respect to functional association (shown in left panel) and complex size (shown in right panel). The terms “cc”, “yg”, “rst” and “st1” represent four different data sets: cellcycle, segregation genetics, rosetta and stress data, respectively. Protein complex pairs are abbreviated as “rel” and unrelated pairs are abbreviated as “unrel”.
Yeast protein complexes: the Saccharomyces cerevisiae Protein Complexes database of MIPS (http://mips.gsf.de/proj/yeast/catalogues/complexes/index.html). For a group of genes, find the mediator gene that has the strongest influence on their co-expression. Curse of dimensionality; projection pursuit.
Cytoplasmic translation initiation complex eIF2 • SUI2, SUI3, GCD11: bind and deliver initiator Met-tRNAiMet to the 40S ribosomal subunit • PLA finds: small subunits RPS26A, RPS23A; large subunit assembly/maintenance RPL11B, RPL10, DBP10; rRNA processing IFH1, DBP10
Translation initiation complex eIF4F • TIF4631, TIF4632: positive role in translation; bind to mRNA cap-binding protein CDC33p • CAF20: small p20, negative regulator of translation; competitive binding to CDC33p • PLA finds TIF5 (best score, initiation factor eIF5), RPL30, YTM1 (ribosomal large subunit biogenesis), PNO1 (90S preribosome), RAP1 (best known TF for ribosomal proteins)
Translation initiation complex eIF4F. The first panel shows the optimal LA projection as mediated by TIF5. The second panel shows that down-regulation of TIF5 indicates weak CAF20 activity. In the third panel, a negative trend between CAF20 and TIF4631 is observed when TIF5 is up-regulated, reflecting the antagonistic roles of CAF20 and TIF4631 in translation regulation.
Trait-trait dynamic interaction: 2D-trait eQTL mapping for genetic variation study • Wei Sun, Department of Statistics, University of California, Los Angeles (*) • Shinsheng Yuan, Institute of Statistical Science, Academia Sinica, 128, Academia Rd. Sec. 2, Taipei 115, Taiwan (**) • Ker-Chau Li: (*), (**) and (***) • (***) To whom correspondence should be addressed. E-mail: kcli@stat.ucla.edu, kcli@stat.sinica.edu.tw
Abstract • Many studies have shown that gene expression variation is inheritable. Analogous to the traditional genetic study, most researchers treat the variation in expression of a gene as a quantitative trait and map it to expression quantitative trait loci (eQTL). This common approach can be described as "one-dimensional-trait (1D-trait) mapping" because each trait is mapped separately. 1D-trait mapping ignores trait-trait interaction completely, which is a major shortcoming. • To overcome this limitation, we study the expression of a pair of genes and treat the variation in their co-expression pattern as a two-dimensional quantitative trait. We develop a method to find gene pairs whose co-expression patterns, including both signs and strengths, are mediated by genetic variations, and we map these 2D-traits to the corresponding genetic loci. We report several applications that combine 1D-trait mapping with 2D-trait mapping, including the contribution of genetic variations to the perturbations in the regulatory mechanisms of yeast metabolic pathways. • Our approach of 2D-trait mapping provides a novel and effective way to connect genetic variation with higher-order biological modules via gene expression profiles.
eQTL • Known as "eQTL" or "genetical genomics", this approach treats gene expression profiles as quantitative traits and maps them to genetic loci. It has been applied in yeast (Brem02, Yvert03), mouse (Schadt03, Chesler05, Bystrykh05), rat (Hubner05), and human (Morley04). • Results from these studies have shown that gene expression level is highly inheritable and can be linked to a local locus (cis-linkage) or to different distant loci (trans-linkage). • One gene at a time: compare the expression profile of a single gene with the genetic marker profiles to find the most significant association. • Densely distributed genetic markers.
Two parent strains, RM11 and BY4716, are crossed to generate offspring with diverse genetic make-ups.
Leucine biosynthesis: X = leu1, Y = leu2, Z = marker. Find Leu2. O: lab strain; a leu2 null mutant.
III. Correlating gene expression with gene markers

          c.line1  c.line2  ...  c.linep
gene1     x11      x12      ...  x1p
gene2     x21      x22      ...  x2p
...
Marker 1  y11      y12      ...  y1p
Marker 2  y21      y22      ...  y2p
...

Brem, R., Yvert, G., Clinton, R., Kruglyak, L. (2002) Science Vol 296, 752-755. Yeast segregation.
2D traits: dynamic co-expression • Dynamic co-expression pattern of two genes mapped to an eQTL. This cartoon shows that the co-expression pattern of two genes, Gene1 and Gene2, varies depending on the genotype of eQTL $M$. $M_A$ and $M_a$ are two alleles of $M$. Expression of Gene1 and Gene2 is positively correlated if the allele at locus $M$ is $M_A$, but negatively correlated if the allele at locus $M$ is $M_a$. However, no correlation between Gene1 and Gene2 can be observed without conditioning on the genotype of $M$.
1D-trait mapping (two genes mapped to the same marker) • Co-expression pattern of two genes due to a common eQTL. This cartoon shows that two genes, Gene1 and Gene2, which are both linked to eQTL $M$, have correlated expression, where $M_A$ and $M_a$ are two alleles of $M$. However, there is no correlation, or a much weaker one, between Gene1 and Gene2 conditional on one particular allele. Gene1 and Gene2 can be both cis-linked, both trans-linked, or one cis-linked and one trans-linked.
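The two cartoons can be reproduced with simulated data. The sketch below is an illustration under assumed effect sizes, not data from the study: in the 2D-trait scenario the sign of the Gene1-Gene2 relationship flips with genotype, and in the 1D-trait scenario both genes merely shift their mean with genotype. All variable names are hypothetical.

```python
import numpy as np

def corr(x, y):
    """Pearson correlation of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

rng = np.random.default_rng(1)
n = 2000
m = rng.integers(0, 2, size=n)          # genotype at M: 1 = M_A, 0 = M_a

# 2D-trait scenario: correlation is positive under M_A, negative under M_a
g1 = rng.standard_normal(n)
g2 = np.where(m == 1, g1, -g1) + 0.5 * rng.standard_normal(n)
corr_2d = {
    "all": corr(g1, g2),
    "M_A": corr(g1[m == 1], g2[m == 1]),
    "M_a": corr(g1[m == 0], g2[m == 0]),
}

# 1D-trait scenario: both genes are linked to M (mean shift) but are
# otherwise independent of each other
shift = np.where(m == 1, 2.0, -2.0)
h1 = shift + rng.standard_normal(n)
h2 = shift + rng.standard_normal(n)
corr_1d = {
    "all": corr(h1, h2),
    "M_A": corr(h1[m == 1], h2[m == 1]),
    "M_a": corr(h1[m == 0], h2[m == 0]),
}
```

In the first scenario the pooled correlation is near zero while the within-genotype correlations are strong and of opposite sign. In the second, the pooled correlation is strong but nearly vanishes within each genotype.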
Data
• 40 yeast segregants generated from a cross of two budding yeast strains: a standard laboratory strain (BY) and a wild isolate from a California vineyard (RM).
• Data for 6229 gene expression traits and 3313 single nucleotide polymorphism (SNP) markers are collected for each yeast segregant.
• The genotype profile of each marker is a binary vector, indicating from which parental strain the allele is inherited.
• The gene expression data are downloaded from Gene Expression Omnibus (GEO) (http://www.ncbi.nih.gov/geo/): GDS 91 and GDS 92 in series GSE37. The genotype data are downloaded from Leonid Kruglyak's laboratory website (http://www.fhcrc.org/labs/kruglyak/Data/).
The genotype profiles of neighboring markers tend to have very high correlations, and some are even identical. • We merge adjacent markers into marker blocks sequentially, using the criterion that any two marker profiles within one block are either the same or differ in only one segregant. The 3313 markers are merged into 667 marker blocks. The dichotomized centroid of all the markers within a marker block is used to represent the marker block.
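The merging rule can be sketched as a greedy pass over markers in chromosomal order. This is one plausible reading of the criterion, not the authors' code: a marker joins the open block only if it differs from every marker already in that block in at most one segregant. Two points are assumptions: ties in the centroid (a mean of exactly 0.5) are rounded up to 1, and chromosome boundaries are ignored. The function names and the toy genotype matrix are hypothetical.

```python
import numpy as np

def merge_markers(G, max_diff=1):
    """Greedy sequential merge of adjacent markers into blocks.

    G: (n_markers, n_segregants) 0/1 array, rows in chromosomal order.
    Returns a list of blocks, each a list of marker indices.
    """
    blocks = [[0]]
    for j in range(1, G.shape[0]):
        members = G[blocks[-1]]                      # profiles in the open block
        diffs = np.sum(members != G[j], axis=1)      # Hamming distance to each member
        if np.all(diffs <= max_diff):
            blocks[-1].append(j)
        else:
            blocks.append([j])
    return blocks

def block_profiles(G, blocks):
    """Dichotomized centroid: block mean thresholded at 0.5 (tie rule assumed)."""
    return np.array([(G[b].mean(axis=0) >= 0.5).astype(int) for b in blocks])

# Toy genotype matrix: 5 markers, 6 segregants
G = np.array([
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],   # identical to marker 0
    [0, 1, 1, 1, 0, 1],   # one difference from markers 0 and 1
    [1, 1, 0, 1, 0, 1],   # too different from the first block: opens a new block
    [1, 1, 0, 1, 1, 1],   # one difference from marker 3
])
blocks = merge_markers(G)
profiles = block_profiles(G, blocks)
```

On the toy matrix the five markers collapse into two blocks, each represented by a single binary profile.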
Liquid association for binary profile • We define the Liquid Association score specifically for a binary variable Z using a proper rescaling of Z. Assume $P(Z = 1) = a$ and $P(Z = 0) = b$ ($a + b = 1$). We transform Z to Z' such that $Z' = -(a/b)^{0.5}$ if $Z = 0$ and $Z' = (b/a)^{0.5}$ if $Z = 1$. Under this transformation, $E(Z') = 0$, $Var(Z') = 1$, and $LA(X, Y \mid Z)$ is given by • $E(XYZ') = (ab)^{0.5}[E(XY \mid Z = 1) - E(XY \mid Z = 0)]$
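A minimal sketch of the binary LA score follows. It takes Z' to be negative when Z = 0 and positive when Z = 1, which is the sign choice under which E(Z') = 0 holds, and it estimates a and b by sample proportions. Standardizing X and Y by mean and SD is an assumption here (other normalizations are possible), and the data are simulated.

```python
import numpy as np

def standardize(v):
    return (v - v.mean()) / v.std()

def rescale_binary(z):
    """Map a 0/1 profile to Z' with mean 0 and variance 1."""
    a = z.mean()          # sample estimate of P(Z = 1)
    b = 1.0 - a           # sample estimate of P(Z = 0)
    return np.where(z == 1, np.sqrt(b / a), -np.sqrt(a / b))

def la_binary(x, y, z):
    """LA(X, Y | Z) for a binary mediator, as the sample mean of X * Y * Z'."""
    return float(np.mean(standardize(x) * standardize(y) * rescale_binary(z)))

# Simulated example: X and Y are positively related when Z = 1,
# negatively related when Z = 0
rng = np.random.default_rng(4)
z = np.array([1] * 18 + [0] * 22)
x = rng.standard_normal(40)
y = np.where(z == 1, x, -x) + 0.3 * rng.standard_normal(40)

z_prime = rescale_binary(z)
score = la_binary(x, y, z)

# Right-hand side of the identity: sqrt(ab) * [E(XY | Z=1) - E(XY | Z=0)]
xy = standardize(x) * standardize(y)
a, b = z.mean(), 1.0 - z.mean()
rhs = float(np.sqrt(a * b) * (xy[z == 1].mean() - xy[z == 0].mean()))
```

With sample proportions used for a and b, the two sides of the identity agree up to floating-point error, and the score is positive for this simulated pair.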
Dynamic Co-expression Pattern of Gene Pairs Within One Pathway • The pathway information is downloaded from Saccharomyces Genome Database (SGD, ftp://ftp.yeastgenome.org/yeast/). • Among the 139 pathways annotated by SGD, 121 of them include at least 2 genes with expression profiles in our expression dataset. • The majority of these 121 pathways (78/121) have no more than five genes. There are altogether 1711 gene pairs that can be formed from genes within the same pathway.
Permutation p-value • The permutation p-value of the most positive or the most negative LA score for each gene pair is calculated based on 5000 permutations. • At the permutation p-value cutoff of 0.005, we find 207 gene pairs with positive LA scores (FDR = 4.13%) and 176 gene pairs with negative LA scores (FDR = 4.86%). • In total we get 349 unique gene pairs, covering 70 pathways.
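One way such a permutation test could be organized is sketched below. The slides do not spell out the permutation scheme, so this is an assumption: the segregant labels of the gene pair are shuffled jointly relative to the genotypes, and the extreme LA score over all marker blocks is recomputed each time. The p-value is the fraction of permutations whose extreme score is at least as extreme as the observed one. The function names and the simulated data are hypothetical.

```python
import numpy as np

def standardize(v):
    return (v - v.mean()) / v.std()

def rescale_binary(G):
    """Row-wise rescaling of 0/1 profiles to mean 0 and variance 1."""
    a = G.mean(axis=1, keepdims=True)      # P(Z = 1) per marker block
    b = 1.0 - a
    return np.where(G == 1, np.sqrt(b / a), -np.sqrt(a / b))

def extreme_la_pvalues(x, y, G, n_perm=5000, seed=0):
    """Permutation p-values for the most positive and most negative LA score.

    G: (n_blocks, n_segregants) 0/1 genotype matrix with no monomorphic rows.
    """
    rng = np.random.default_rng(seed)
    Zs = rescale_binary(G)
    xy = standardize(x) * standardize(y)
    n = xy.size
    obs = Zs @ xy / n                      # LA score against every marker block
    obs_pos, obs_neg = obs.max(), obs.min()
    hits_pos = hits_neg = 0
    for _ in range(n_perm):
        # Shuffling the product x*y permutes both genes jointly across segregants
        scores = Zs @ rng.permutation(xy) / n
        hits_pos += int(scores.max() >= obs_pos)
        hits_neg += int(scores.min() <= obs_neg)
    return float(obs_pos), hits_pos / n_perm, float(obs_neg), hits_neg / n_perm

# Simulated example with a planted mediator at row 3
rng = np.random.default_rng(2)
G = rng.integers(0, 2, size=(30, 100))
x = rng.standard_normal(100)
y = np.where(G[3] == 1, x, -x) + 0.3 * rng.standard_normal(100)
obs_pos, p_pos, obs_neg, p_neg = extreme_la_pvalues(x, y, G, n_perm=1000)
```

Taking the maximum (or minimum) over all marker blocks inside each permutation accounts for the fact that the best block is selected after scanning all of them.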
Leucine biosynthesis • Dynamic co-expression patterns of (LEU2, LEU1) and (LEU2, BAT1) are mediated by the genotype of marker block 75; LEU2 is cis-linked to marker block 75. LA(LEU2, LEU1|MB75) = 0.3654 (p-value < 2e-4); LA(LEU2, BAT1|MB75) = 0.3349 (p-value < 2e-4).
Dynamic co-expression patterns of (RNR3, IMD2) and (RNR2, IMD2) are mediated by the genotype of marker block 578, and CRT10 is cis-linked to marker block 578. LA(RNR3, IMD2|MB578) = 0.4693 (p-value < 2e-4); LA(RNR2, IMD2|MB578) = 0.3410 (p-value = 0.0396).
Dynamic co-expression patterns of (ADE5,7, ADE13) and (HIS2, HIS4) are mediated by the genotype of marker block 473, and IMD3 is located around marker block 473. LA(ADE5,7, ADE13|MB473) = -0.4395 (p-value < 4e-3); LA(HIS2, HIS4|MB473) = 0.4169 (p-value = 1.8e-3).
Histidine and purine biosynthesis • Dynamic co-expression patterns of (HIS1, IMD3) and (HIS5, IMD3) are mediated by the genotype of marker block 473, and IMD3 is located around marker block 473. LA(HIS1, IMD3|MB473) = -0.4068 (p-value < 2.8e-3); LA(HIS5, IMD3|MB473) = -0.4327 (p-value = 1.08e-2).
How to explain trans-linked genes • If there is a cis-linked gene in a locus, a straightforward explanation of the trans-linkages is that the sequence polymorphism in the eQTL affects the expression of the cis-linked gene first, and then the cis-linked gene affects the expression of the trans-linked genes. • In this situation, we would expect to observe overall co-expression between the cis-linked gene and the trans-linked genes.
A more complicated situation encountered in 1D-trait mapping: loci with only trans-linkages but no cis-linkages are detected. • The correlation between the expression profile of a trans-linked gene and that of any gene in or around the eQTL is most likely to be low. • This suggests that we may use 2D-trait mapping to find out whether there are more subtle dynamic co-expression patterns.
With 1D-trait mapping, we find altogether 76 genes trans-linked to marker blocks that contain no cis-linked genes (details in the supplementary materials). We focus on spots with more than 3 trans-linked genes. Altogether, 7 such linkage spots (corresponding to a total of 44 trans-linked genes) are identified.
For each linkage spot, we measure the function enrichment of the trans-linked genes by GO Term Finder of SGD (http://db.yeastgenome.org/cgi-bin/GO/goTermFinder).
• We find 3 spots with enriched GO term annotation:
• 1. Marker block 391: 8 genes are trans-linked to it, with enriched GO term "ATP metabolic process" (1.97e-7). (To be discussed next.)
• 2. Marker block 335: 3 genes are trans-linked to it, with enriched GO term "formate metabolic process" (3.87e-10).
• 3. Marker block 446: 4 genes are trans-linked to it, with enriched GO term "mitochondrial electron transport, ubiquinol to cytochrome c" (0.00041).
First, by 1D-trait mapping, eight genes functioning in ATP metabolism and aerobic respiration are linked to Chromosome XI: 235.0kb to 252.8kb (marker block 390-391). • HAP4, which encodes a transcription activator of respiratory genes, is found in this locus. • Genome-wide TF binding data show that Hap4 binds the upstream regions of ATP5, ATP7, and ATP14. • HAP4 is not cis-linked, since this locus is a cis-null/all trans-linkage spot. • The correlations in expression between HAP4 and any of the 8 trans-linked genes are quite low.
To identify the possible dynamic co-expression patterns between HAP4 and the eight trans-linked genes, • we take the expression profile of each of the eight trans-linked genes as X, • the expression profile of HAP4 as Y, • and the genotypes of all the 667 marker blocks as Z to calculate LA scores. • We look for marker blocks appearing multiple times in the short lists of marker blocks with the best LA scores (20 most positive and 20 most negative).
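The scan described above can be sketched as follows. This is an illustration on simulated data, not the authors' code: each target gene is paired with the common partner gene, LA scores are computed against every marker block, and a tally is kept of how often each block lands among the k most negative or k most positive scores. The function name `recurrent_blocks`, the planted block index, and the simulated effect are all hypothetical.

```python
import numpy as np
from collections import Counter

def standardize(v):
    return (v - v.mean()) / v.std()

def rescale_binary(G):
    """Row-wise rescaling of 0/1 profiles to mean 0 and variance 1."""
    a = G.mean(axis=1, keepdims=True)
    b = 1.0 - a
    return np.where(G == 1, np.sqrt(b / a), -np.sqrt(a / b))

def recurrent_blocks(targets, y, G, k=20):
    """Count how often each marker block is among the k most negative /
    k most positive LA scores across the target genes."""
    Zs = rescale_binary(G)
    ys = standardize(y)
    neg, pos = Counter(), Counter()
    for x in targets:
        scores = Zs @ (standardize(x) * ys) / ys.size
        order = np.argsort(scores)             # ascending: most negative first
        neg.update(order[:k].tolist())
        pos.update(order[-k:].tolist())
    return neg, pos

# Simulated data: 8 target genes, one partner gene y, 100 marker blocks.
# For 6 targets, co-expression with y exists only when the planted block is 0.
rng = np.random.default_rng(3)
n, n_blocks, planted = 200, 100, 41
G = rng.integers(0, 2, size=(n_blocks, n))
y = rng.standard_normal(n)
targets = []
for i in range(8):
    noise = rng.standard_normal(n)
    if i < 6:
        targets.append(np.where(G[planted] == 0, y + 0.5 * noise, noise))
    else:
        targets.append(noise)

neg_counts, pos_counts = recurrent_blocks(targets, y, G)
```

In this simulation the planted block appears in the most-negative short list for each of the six affected targets. Scores for different targets against the same partner gene are correlated, so some unrelated blocks can also recur, which is one reason to follow up recurrent blocks with additional evidence.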
We find that one marker block, marker block 41 (Chromosome II: 328.5kb to 334.0kb), appears six times (Table 6) among the 20 marker blocks with the most negative LA scores. We further find that HAP4 co-expresses well with these genes if the sequence of marker block 41 is inherited from the RM strain.
Using 1D-trait mapping, a gene, TCM62, is found to be cis-linked to this marker block. • It is known that Tcm62 forms a complex containing at least three SDH subunits, Sdh1, Sdh2 and Sdh3 (Dibrov et al., 1998), and all these SDH genes are involved in aerobic respiration (Oyedotun and Lemire, 2004), which is consistent with the function of HAP4 and its target genes. • Thus marker block 41, or more specifically the gene TCM62, is a plausible candidate that mediates the co-expression pattern between HAP4 and its target genes.
References
• Thesis of Wei Sun, chapter 3, UCLA, Statistics, 2007.
• Li, K-C. Genome-wide co-expression dynamics: theory and application. Proc. Natl. Acad. Sci. USA 2002; 99: 16875-16880.
• Li, K-C. and S. Yuan. A functional genomic study on NCI's anticancer drug screen. The Pharmacogenomics Journal 2004; 4: 127-135.
• Li, K-C, C-T Liu, W Sun, S Yuan and T Yu. A system for enhancing genome-wide co-expression dynamics study. Proc. Natl. Acad. Sci. USA 2004.