Exon Array Design Strategy GeneChip
1. Gene Level Expression Profiling Using Affymetrix Exon Arrays Alan Williams, Ph.D., Director, Chip Design, Affymetrix, Inc.
2. Exon Array Design Strategy: GeneChip® Human Exon 1.0 ST All content is projected onto the genome
Content has hard edges and soft edges:
Hard edges partition regions into multiple probe selection regions
Soft edges infer a probe selection region, but can be extended into a larger region by other content
Hard Edges
Internal splice site boundaries
PolyA sites
CDS start and stop positions
Soft Edges
Transcript start and stop positions (except when there is evidence of a PolyA site)
Internal splice site boundaries for aligned cDNAs when there are unaligned cDNA bases
All splice site boundaries from syntenic cDNA content
Introducing some new concepts:
Probe Selection Region (PSR)
Exon cluster
Transcript cluster (gene locus)
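As a rough illustration of how hard edges partition projected content, the sketch below splits one genomic interval at a list of hard-edge coordinates. The function name, the half-open coordinate convention, and the example positions are assumptions made for illustration; the actual design pipeline is not described in these slides, and soft edges are not modeled here.

```python
def split_at_hard_edges(start, end, hard_edges):
    """Split the half-open interval [start, end) at every hard edge inside it.

    Hypothetical helper: each returned (start, end) pair stands for one
    candidate probe selection region (PSR).
    """
    # Keep only edges strictly inside the interval, de-duplicated and sorted.
    cuts = sorted({e for e in hard_edges if start < e < end})
    bounds = [start, *cuts, end]
    return list(zip(bounds[:-1], bounds[1:]))


# Example with made-up coordinates: the edge at 500 lies outside the
# interval and is ignored.
regions = split_at_hard_edges(100, 400, [250, 180, 500])
print(regions)
```

A soft edge would instead mark a boundary that other overlapping content is allowed to extend, so it would not force a cut on its own.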
3. Probe Coverage Exon vs 3’ Array Gene Coverage
4. Content Sources GeneChip® Human Exon 1.0 ST Core Gene Annotations
RefSeq alignments
GenBank annotated full length alignments
Extended Gene Annotations
cDNA alignments
Ensembl annotations (Hubbard, T. et al.)
Mapped syntenic mRNA from rat and mouse
microRNA annotations
MitoMAP annotations
Vegagene (The HAVANA group, Hillier et al., Heilig et al.)
VegaPseudogene (The HAVANA group, Hillier et al., Heilig et al.)
Full Gene Annotations
Geneid (Grup de Recerca en Informàtica Biomèdica)
Genscan (Burge, C. et al.)
GenscanSubopt (Burge, C. et al.)
Exoniphy (Siepel et al.)
RNAgene (Sean Eddy Lab)
SgpGene (Grup de Recerca en Informàtica Biomèdica)
Twinscan (Korf, I. et al.)
5. Probes per RefSeq Transcript
6. Gene Level Summaries With exon arrays we can combine exon-level probesets to obtain better gene-level estimates.
More probes for greater sensitivity
Gene level signal estimates based on expression throughout the locus rather than a single point
Simplified bioinformatics
More flexibility in restructuring probe groupings based on expert knowledge
There is a variety of well-established tools (including R/Bioconductor) and methods for secondary analysis of gene-level array data
Challenge
Non-constitutive exons
Discovery/Speculative content
7. Gene Level Analysis on Exon Arrays Sketch Normalization (Quantile-like)
PM-GCBG
IterPLIER
using Extended Meta Probeset File groupings
Users may want to do post summarization operations:
Normalization
Log transform
Variance stabilization by adding a positive bias (i.e., PLIER+16)
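The "PLIER+16" step can be sketched as adding a constant of 16 to each signal estimate before taking a base-2 log. This is a minimal sketch: the function name and the example values are assumptions, and any normalization would be a separate step.

```python
import math


def stabilize_and_log(signals, bias=16.0):
    """Add a positive bias to each signal, then log2-transform it.

    The bias keeps low signals away from zero, which damps the variance
    of the log values at the low end.
    """
    return [math.log2(s + bias) for s in signals]


# Made-up signal estimates chosen so the results are exact powers of two.
logged = stabilize_and_log([0.0, 16.0, 112.0])
print(logged)  # [4.0, 5.0, 7.0]
```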
8. Different Meta Probeset Lists
9. IterPLIER Start by generating PLIER signal estimate using all the probes
Pick 22 probes which are best correlated to the PLIER signal
Run PLIER on just the 22 probes
Pick 11 probes which are best correlated to the PLIER signal
Generate a final PLIER estimate with the 11 probes
Corollary:
If the meta probeset has 11 or fewer probes, then only 1 run of PLIER is performed and the result is equal to a regular PLIER result
If the meta probeset has more than 11 but 22 or fewer probes, then PLIER is run twice: once on the full set of probes and once on the best 11
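The iteration above can be sketched as follows. This is not PLIER: the `summarize` function is a stand-in (a per-sample median across probes) used only so the control flow runs, and all function names and the demo data are assumptions. Only the 22-then-11 selection logic comes from the slide.

```python
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def summarize(probes):
    """Stand-in for PLIER: per-sample median across probes."""
    return [median(col) for col in zip(*probes)]


def iter_summarize(probes, sizes=(22, 11)):
    """Summarize, then repeatedly keep the probes best correlated with the signal.

    `probes` is a list of per-probe intensity lists, one value per sample.
    Returns the final signal and the number of summarization runs.
    """
    signal = summarize(probes)
    runs = 1
    for k in sizes:
        if len(probes) <= k:
            continue  # nothing to drop at this stage
        ranked = sorted(probes, key=lambda p: pearson(p, signal), reverse=True)
        probes = ranked[:k]
        signal = summarize(probes)
        runs += 1
    return signal, runs


# Deterministic made-up data: 30 probes across 4 samples.
demo = [[float((i * 7 + j * 3) % 11 + 1) for j in range(4)] for i in range(30)]
signal, runs = iter_summarize(demo)
print(runs, signal)
```

With 30 probes the sketch runs the summarizer three times; with 12 to 22 probes it runs twice, and with 11 or fewer it runs once, matching the corollary.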
10. Correlation of Different Gene Level Estimates
11. Adding Low-signal Decoys
OWNER: Chuck
4 and 11 probesets are the 25th and 75th percentiles, respectively, across the 1,674 loci with at least 3 constitutive probesets.
Experimental design:
Use cDNA evidence to identify constitutive probesets: those included in all transcripts, with at least 10 ESTs (or 5 mRNAs) at that probeset.
Generate gene-level estimates from constitutive probesets and use them as gold standard.
Add low-signal decoy sets (Genscan Suboptimals) and observe effect on correlation with original estimates.
Add high-signal decoy sets (mRNA based) and observe effect on correlation with original estimates.
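The decoy experiment can be sketched as: compute a gold-standard gene-level estimate from constitutive probesets, recompute it with decoys added, and correlate the two across samples. The summary below is a plain per-sample mean standing in for the real summarization method, and every number is invented for illustration; it says nothing about the results on the slide.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def gene_estimate(probesets):
    """Stand-in gene-level summary: per-sample mean across probesets."""
    return [sum(col) / len(col) for col in zip(*probesets)]


# Made-up probeset signals across four samples.
constitutive = [
    [100.0, 200.0, 400.0, 800.0],
    [110.0, 190.0, 420.0, 780.0],
    [90.0, 210.0, 390.0, 820.0],
]
# Low-signal decoys: near-background values unrelated to the gene's profile.
low_decoys = [
    [12.0, 9.0, 11.0, 10.0],
    [8.0, 11.0, 10.0, 9.0],
]

gold = gene_estimate(constitutive)
with_decoys = gene_estimate(constitutive + low_decoys)
r = pearson(gold, with_decoys)
print(round(r, 4))
```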
12. Gene Level Performance HuEx 1.0 ST vs HG-U133 Plus 2.0
13. Platform Concordance: % Probe Set Pairs vs. Correlation Coefficient (1-way ANOVA p <= 10^-8)
14. High Correlation: GLYAT: r=0.9902
15. Moderate Correlation: TSN: r=0.6575
16. Poor Correlation: SREBF1: r=0.0482
17. Platform Gene Level Sensitivity
18. One Array, Two functions Gene Level Expression and Transcript Diversity
19. TPM2
22. “Splicing Index” defined
23. Splicing Index Examples
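The definition on the slide itself is not preserved in this transcript. As commonly described for exon arrays, the splicing index is the log2 ratio, between two sample groups, of exon signal normalized to gene-level signal. The sketch below assumes that definition; the function name and the numbers are illustrative only.

```python
import math


def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """log2 ratio of gene-normalized exon signal between samples A and B."""
    ni_a = exon_a / gene_a  # normalized intensity in sample A
    ni_b = exon_b / gene_b  # normalized intensity in sample B
    return math.log2(ni_a / ni_b)


# Same gene-level signal in both samples, but the exon is 4x lower in A.
si = splicing_index(exon_a=50.0, gene_a=400.0, exon_b=200.0, gene_b=400.0)
print(si)  # -2.0
```

A value near zero means the exon tracks the gene in both samples; a large absolute value flags a candidate for differential exon inclusion.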
24. Alternative Splicing Detection PAttern-based Correlation (PAC)
Test whether exons correlate with each other
ANOVA-based (MiDAS)
Test a log-linear model
For more information see the Alternative Transcript Analysis Methods for Exon Arrays whitepaper:
http://www.affymetrix.com/support/technical/whitepapers/exon_alt_transcript_analysis_whitepaper.pdf
OWNER: Earl
Colon Cancer: Median normalization over entire data set
Tissue Data Set: Quantile w/in rep, Median over set
Method Assessment
Manufacture “unspliced” gene set
Choose 5,800 well sequenced genes
Bioinformatically prune any alternative exons
Remaining exons form a “gene”
Simulate splice data
Move first exon of each gene to a different gene
Calculate ROC curves of “true positive” on simulated splice set versus “false positive” on unspliced gene set
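The ROC step can be sketched as follows: scores from the simulated splice set count as positives, and scores from the unspliced gene set count as negatives. The score values are invented, and the function names are assumptions; any splicing statistic where higher means "more likely spliced" could be plugged in.

```python
def roc_points(pos_scores, neg_scores):
    """Return (false positive rate, true positive rate) points, from (0, 0) to (1, 1)."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        # A call is "positive" when its score is at or above the threshold.
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores: simulated splice events vs. the "unspliced" gene set.
spliced = [0.9, 0.8, 0.6, 0.4]
unspliced = [0.7, 0.3, 0.2, 0.1]

pts = roc_points(spliced, unspliced)
area = auc(pts)
print(area)  # 0.875
```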
25. ROC Curves PAC method not suitable for a two-group data set
No filter on input data
Synthetic Data
Tissues – mix exons across genes
Cancer – mix in low expression exons
OWNER: Earl
26. Alternative Splicing Detection: Active Area of Research Exon Array Workshop
45 attendees
11 presentations
New alternative splicing algorithms
New confidence in using Exon Arrays for Gene-Level expression profiling
New directions for filtering data for more robust results
http://www.affymetrix.com/corporate/events/2006_exon_tiling_workshop.affx
We have just reviewed our efforts on the commercial vendor front. There is also a considerable amount of ongoing work in the research community, which has been extremely active in designing new algorithms for microarray data analysis.
To collaborate proactively with the research community, and in response to requests from early customers who wanted to hear directly about each other's experiences, we sponsored the first Exon Array Data Analysis Workshop.
This is a format that we intend to continue to support. We would appreciate your feedback, as a user, on the usefulness of such events and on what you need from Affymetrix to help you get your research started.
27. Resources Human, Mouse, & Rat array content and annotation information
Array Support Page on Affymetrix.com
Various Analysis Whitepapers
Array Support Page on Affymetrix.com
Sample Data Sets
Sample Data section under Support
Colon cancer data set with 10 paired samples
Tissue data set
11 tissues in triplicate
4 different mixture levels for 3 tissues
Includes HG-U133 Plus 2.0 and Human Exon 1.0 ST
Analysis Software
Affymetrix Power Tools (APT)
ExACT