UCSF/Stanford Oligo Array Meeting. Ash Alizadeh, Max Diehn, Chris Seidel, Joseph DeRisi, Mike Hagen, David Erle, Kate Rubins, Stephen Popper, Nicki Chin, Joseph Marquis, Elena Seraia, John Coller, Jon Pollack, Young Kim, Mike Fero, Jean Yang, Andrea Barczak, Peng Zhang, Jing Zhu April 22, 2004.
Background
• Open questions:
  • How 3’ biased should the oligos be (how biased are the various types of labeling reactions)?
  • What are the optimal conditions for oligo arrays to allow us to answer the above:
    • i. Printing issues (substrates, amino modification, concentration, density)
    • ii. Post-processing (substrates, shampooing, boiling, BSA, alkylation)
    • iii. Labeling (direct vs indirect Cy incorporation, amplification, differential expression)
    • iv. Hybridization (stringency, formamide vs SSC/SDS, temperature)
    • v. Post-hyb washing (stringency)
The Problem: dT primed, 3’ biased amp method Rosetta Method Genome Biology 2003, 4:R66 (http://genomebiology.com/2003/4/10/R66)
Affymetrix On Tiling Controls-I “What 3'/5' ratio for control genes, for example GAPDH and Actin, should I anticipate to obtain on GeneChip probe arrays? • In addition to the conventional probe sets designed to be within the most 3' 600 bp of a transcript, additional probe sets in the 5' region and middle portion (M) of the transcript have also been selected for certain housekeeping genes, including GAPDH and Actin. Signal intensity ratio of the 3' probe set over the 5' probe set is often referred to as the 3'/5' ratio. This ratio gives an indication of the integrity of your starting RNA, efficiency of first strand cDNA synthesis, and/or in vitro transcription of cRNA. The signal of each probe set reflects the sequence of the probes and their hybridization properties. A 1:1 molar ratio of the 3' to 5' transcript regions will not necessarily give a signal ratio of 1.” http://www.affymetrix.com/support/help/faqs/ge_assays/faq_17.jsp
Affymetrix On Tiling Controls-II “What 3'/5' ratio for control genes, for example GAPDH and Actin, should I anticipate to obtain on GeneChip probe arrays? • There is no single threshold cutoff to assess sample quality for all of the diverse organisms and tissues. This is due to the presence of different isoforms of these house-keeping genes and their different expression patterns in various tissues and organisms. Although we routinely refer to a threshold ratio of less than 3 for the most common tissues, such as mammalian liver and brain, this may not be applicable to all situations. It may be more appropriate to document the 3'/5' ratios within a particular study and flag the results that deviate, therefore representing an unusual sample that deserves further investigation.” http://www.affymetrix.com/support/help/faqs/ge_assays/faq_17.jsp
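The 3'/5' ratio described in the two Affymetrix excerpts above can be sketched in a few lines. This is only an illustration: the function names, the sample names, and the 2-fold "deviation from the study median" rule are assumptions made for the example. Only the ratio definition and the commonly quoted threshold of 3 come from the quoted FAQ.

```python
from statistics import median
from typing import Dict, List


def three_to_five_ratio(signal_3p: float, signal_5p: float) -> float:
    """Signal of the 3' probe set divided by signal of the 5' probe set."""
    if signal_5p <= 0:
        raise ValueError("5' probe set signal must be positive")
    return signal_3p / signal_5p


def flag_samples(
    ratios: Dict[str, float],
    max_ratio: float = 3.0,             # threshold quoted in the FAQ
    max_fold_from_median: float = 2.0,  # assumption, not from the FAQ
) -> List[str]:
    """Return samples whose ratio is >= max_ratio or far from the study median."""
    study_median = median(ratios.values())
    flagged = []
    for name, ratio in ratios.items():
        # Fold difference from the study median, in either direction.
        fold = max(ratio / study_median, study_median / ratio)
        if ratio >= max_ratio or fold > max_fold_from_median:
            flagged.append(name)
    return flagged


# Hypothetical GAPDH ratios for three samples in one study.
example = {"s1": 1.0, "s2": 1.2, "s3": 4.0}
print(flag_samples(example))
```

Documenting the ratios per study and flagging outliers, rather than applying one global cutoff, follows the FAQ's own advice that no single threshold fits all tissues and organisms.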
RNA digestion plot: shows strong dependency on chip design – identical biological probes HG_U95Av2 HG_U133A: 10/11 probes from U95Av2 used
Distance from 3’ end vs log2 spot intensity (David Erle, UCSF). Legend: purple = aRNA; blue = cDNA; yellow = predicted aRNA; light green = aRNA minus predicted aRNA
Is comparing 70mers at various distances from different genes fair? Plots: average log2 Unigene cluster size (mRNAs + ESTs); representative mRNA size per Unigene cluster
Our tiling experiment
• Tiled 16 long mouse mRNAs, known to be expressed at high levels in lung by Operon arrays and Affymetrix arrays (picked by David Erle)
• Best 70mer picked for each exon (Ash Alizadeh/Chris Seidel)
• Added 10 “housekeeping” mouse mRNAs without attention to exon structure
• 3 random sequences, 4 empty wells
Microarray Versions
• Tiling 1: 3 plates printed on PLL, Schott AS, Corning UltraGAPS, 6 spots per well:
  • Mm tiling plate, 300 uM (accidental)
  • Hs tester plate from Illumina (unmodified 70mers at 50, 40, and 20 uM)
  • Hs tester plate from Qiagen/Operon (c-6 amino modified 69mers at 50, 40, and 20 uM)
• Tiling 2: 9 plates printed on PLL, Schott AS, Corning UltraGAPS, 6 spots per well (3 tandem replicates, each plate printed 2x):
  • Largely same as Tiling 1, added Version 2 of amino-mod from Illumina for tiling plate and MJDC
  • Corrected 300 uM Mm tiling to 50 and 25 uM, did this for both amino-mod versions (V1 and V2)
  • Also printed 192 MJ 70mers at 25 and 50 uM, V1 and V2
• Tiling 3 (in progress):
  • Increased density, sharper tips
  • No human spots
  • 4 spot replicates per print-plate well
  • Titrated down to 2 uM (2-50 uM)
Mike Hagen: Ti1n014 Hagen heart vs lung prep 1 (rescanned 2 days later to be subsat)
Kate Rubins: Ti2n039 Falkow Spleen Total vs. Stratagene Reference
Kate Rubins: Ti2n040 Stratagene Total RNA Spleen vs. Stratagene Total RNA Brain
Nicki Chin: Ti2n095 Stratagene Ref vs Stratagene Ref ; PLL; 42C, 30% formamide, 5XSSC
Nicki Chin: Ti2n073 Stratagene Ref vs Stratagene Ref ; AS; 42C, 30% formamide, 5XSSC
Part II-Oligo Array Conditions
• Oligos produce bigger spots with older SFGF tips (~120 µm median spot size); spot size seems independent of substrate and printing concentration
• cDNA protocols do NOT directly apply to oligo arrays
• Boiling after SA/NMP causes catastrophic failure
• Not shampooing causes a faint glow on many spots, which appears non-specific
• Oligos printed on cDNA arrays seem to confound cDNA measurements when arrays are treated w/ conditions optimized for cDNAs (Kate Rubins; LC + smallpox ORF 70mers)
• Suspect oligos cross-contaminating each other (‘DNA jumping’) during post-processing without shampoo
Regular Packing vs Dense Packing. With spot spacing r, dense packing offsets alternate rows by r/2, so the row spacing x satisfies x^2 + (r/2)^2 = r^2, giving x = 0.866 r. Thus, when you dense pack, you reduce one dimension by 13.4% at the expense of increasing the other dimension by a fraction 1/(2Nc), where Nc is the number of columns. This increase in width is negligible in the limit of large Nc, so in the limit of large Nr and Nc the gain in packing is 13.4%.
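As a quick check of the packing arithmetic above, here is a minimal sketch that compares the footprint of a regular grid with a dense (row-offset) grid. The function name and the simplified footprint model (pitch times count, ignoring edge effects) are assumptions for illustration, not part of the original slide.

```python
import math


def dense_packing_gain(n_rows: int, n_cols: int) -> float:
    """Fractional area saved by dense packing relative to a regular grid.

    Both layouts use the same spot spacing r. Dense packing shrinks the row
    spacing to sqrt(r^2 - (r/2)^2) = 0.866 r and widens the block by r/2
    because alternate rows are shifted sideways by half a spacing.
    """
    r = 1.0  # spacing in arbitrary units; it cancels out of the ratio
    regular_area = (n_rows * r) * (n_cols * r)
    row_spacing = math.sqrt(r ** 2 - (r / 2) ** 2)
    dense_area = (n_rows * row_spacing) * (n_cols * r + r / 2)
    return 1.0 - dense_area / regular_area


# The gain approaches 13.4% as the number of columns grows.
for n_cols in (5, 20, 100, 10_000):
    print(n_cols, round(dense_packing_gain(100, n_cols), 4))
```

In this simplified model the number of rows cancels out, so the gain depends only on Nc. For very small Nc the extra half-column of width outweighs the row savings and the gain can even be negative.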
Effect of printing concentration on intensity (by vendor) on Ti1 PLL array. Panels: Sybr Green stain (SFGF); random 9mer (nonamer) hyb (UCSF Sandler Core)
Effect of printing concentration on hybridization intensity (Nonamers on AS – Ti2 array)
Effect of Illumina’s V1 vs V2 amino modification on real hybridization Ti2010 (Mike Hagen)
Effect of Illumina’s V1 vs V2 amino modification on real hybridization Ti2010 (PLL, Mike Hagen)
“Cy3 and Cy5 average intensity values for 96 oligos printed in duplicate at the concentrations indicated. Median background for Cy3 and Cy5 was 71 and 47 respectively. So I printed Operon at 4 diff concentrations, and illumina at 2 diff concentrations (several replicates of each spot), and just took the averages of each group.”
Effect of printing concentration on random 9-mer signal on PLL (left) and AS (right) using Tiling 3 array (Andrea Barczak, UCSF) PLL_9mer.0116a.gpr Schott_AS_9mer.0026.gpr
Post-Processing: BSA instead of alkylation is commonly preferred for AS, and some PLL users use BSA too, though no analysis has been published
42 deg (30% formamide; Corning Protocol) vs 65 deg (Ash/Max protocol) hyb (Nicki Chin, SFGF MM QC cDNA arrays)
Array CGH of female vs male genomic DNA (Jon Pollack; Young Kim). UCSF Protocol from Andrea Barczak. T test: UCSF vs sheg: 8.77 x 10^-8
UCSF Good v Bad (-35 cutoff) : T test: 0.147
pick70_oligos
+------------------+--------------+------+-----+---------+-------+
| Field            | Type         | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| oligo_name       | varchar(25)  |      | PRI |         |       |
| pick70_name      | varchar(120) |      |     |         |       |
| sequence         | varchar(80)  |      |     |         |       |
| percent_gc       | float        | YES  |     | NULL    |       |
| int_repeat       | smallint(6)  | YES  |     | NULL    |       |
| self_anneal      | smallint(6)  | YES  |     | NULL    |       |
| pick70_target_id | varchar(120) | YES  |     | NULL    |       |
| energy           | float        | YES  |     | NULL    |       |
+------------------+--------------+------+-----+---------+-------+

pick70_secondary
+------------------+--------------+------+-----+---------+-------+
| Field            | Type         | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| oligo_name       | varchar(20)  |      | MUL |         |       |
| secondary_target | varchar(20)  |      |     |         |       |
| position         | int(11)      | YES  |     | NULL    |       |
| energy           | float        |      |     | 0       |       |
| sec_target_seq   | varchar(150) | YES  |     | NULL    |       |
+------------------+--------------+------+-----+---------+-------+

oligo2target
+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| oligo_name | varchar(25) |      | PRI |         |       |
| target_id  | varchar(20) |      | MUL |         |       |
| id_source  | varchar(20) | YES  |     | NULL    |       |
+------------+-------------+------+-----+---------+-------+

mousdb_islands
+-------------+-----------------------+------+-----+---------+-------+
| Field       | Type                  | Null | Key | Default | Extra |
+-------------+-----------------------+------+-----+---------+-------+
| c_id_ex_i   | varchar(20)           |      | PRI |         |       |
| c_id_ex     | varchar(15)           |      | MUL |         |       |
| c_id        | varchar(15)           |      | MUL |         |       |
| chr         | varchar(5)            | YES  |     | NULL    |       |
| strand      | char(1)               | YES  |     | NULL    |       |
| left_bound  | int(10) unsigned      | YES  |     | NULL    |       |
| right_bound | int(10) unsigned      | YES  |     | NULL    |       |
| 5_marg      | mediumint(8) unsigned | YES  |     | NULL    |       |
| 3_marg      | mediumint(8) unsigned | YES  |     | NULL    |       |
| freq_full   | smallint(5) unsigned  | YES  |     | NULL    |       |
| full        | smallint(5) unsigned  | YES  |     | NULL    |       |
| freq_all_t  | smallint(5) unsigned  | YES  |     | NULL    |       |
| all_t       | smallint(5) unsigned  | YES  |     | NULL    |       |
| n_min       | smallint(5) unsigned  | YES  |     | NULL    |       |
| n_max       | smallint(5) unsigned  | YES  |     | NULL    |       |
| defline     | varchar(120)          | YES  |     | NULL    |       |
| sequence    | text                  | YES  |     | NULL    |       |
| ex_type     | varchar(5)            | YES  |     | NULL    |       |
| refseq      | tinyint(1)            | YES  |     | NULL    |       |
+-------------+-----------------------+------+-----+---------+-------+

mousdb_exons
+-------------+-----------------------+------+-----+---------+-------+
| Field       | Type                  | Null | Key | Default | Extra |
+-------------+-----------------------+------+-----+---------+-------+
| c_id_ex     | varchar(15)           |      | PRI |         |       |
| c_id        | varchar(15)           |      | MUL |         |       |
| chr         | varchar(5)            | YES  |     | NULL    |       |
| strand      | char(1)               | YES  |     | NULL    |       |
| left_bound  | int(10) unsigned      | YES  |     | NULL    |       |
| right_bound | int(10) unsigned      | YES  |     | NULL    |       |
| 5_marg      | mediumint(8) unsigned | YES  |     | NULL    |       |
| 3_marg      | mediumint(8) unsigned | YES  |     | NULL    |       |
| freq_full   | smallint(5) unsigned  | YES  |     | NULL    |       |
| full        | smallint(5) unsigned  | YES  |     | NULL    |       |
| freq_all_t  | smallint(5) unsigned  | YES  |     | NULL    |       |
| all_t       | smallint(5) unsigned  | YES  |     | NULL    |       |
| defline     | varchar(120)          | YES  |     | NULL    |       |
| sequence    | text                  | YES  |     | NULL    |       |
| ex_type     | varchar(5)            | YES  |     | NULL    |       |
| refseq      | tinyint(1)            | YES  |     | NULL    |       |
+-------------+-----------------------+------+-----+---------+-------+
Splicing examples • Hs SORBS1 hg16dev chr10.838 -> 12 skips
Rules for picking a 70mer from exons for a gene
• 70mer penalty score = (# of 2ndary hits) + (sum of 3*2ndary energies/-30) + 3*(distance from 3' end)/1500 + evidence score
• where evidence score = 20 - 20 * (eT - e.min)/(e.max - e.min)
• eT is the denominator in the last field within the defline and represents the total evidence available for that exon,
• e.min represents the minimum denominator seen among the exons, and
• e.max represents the max denominator seen among the exons.
• The 3 in the second and third terms is an arbitrary constant, and the 20 in the fourth term (the evidence score) is likewise an arbitrary constant. The idea is to give 20 points to the exon with the least amount of evidence and give 0 points to the exon with the most evidence.
• After the scores are calculated for each exon, we pick the one with the lowest penalty score.
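The scoring rules above can be sketched as follows. This is a minimal illustration, not the actual picking code: the `ExonCandidate` class, its field names, and the handling of the case where all exons have equal evidence (score 0) are assumptions. Secondary energies are taken to be negative values, so dividing by -30 yields a positive penalty.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExonCandidate:
    name: str
    distance_from_3prime: int  # bases from the 3' end
    evidence: int              # eT: denominator in the last defline field
    secondary_energies: List[float] = field(default_factory=list)


def evidence_score(e_t: int, e_min: int, e_max: int) -> float:
    """20 points for the least-supported exon, 0 for the best-supported one."""
    if e_max == e_min:
        return 0.0  # assumption: equal evidence everywhere adds no penalty
    return 20.0 - 20.0 * (e_t - e_min) / (e_max - e_min)


def penalty(c: ExonCandidate, e_min: int, e_max: int) -> float:
    hits_term = len(c.secondary_energies)
    energy_term = sum(3.0 * e / -30.0 for e in c.secondary_energies)
    distance_term = 3.0 * c.distance_from_3prime / 1500.0
    return (hits_term + energy_term + distance_term
            + evidence_score(c.evidence, e_min, e_max))


def pick_best(candidates: List[ExonCandidate]) -> ExonCandidate:
    """Return the candidate with the lowest penalty score."""
    e_min = min(c.evidence for c in candidates)
    e_max = max(c.evidence for c in candidates)
    return min(candidates, key=lambda c: penalty(c, e_min, e_max))


# Hypothetical exons: A has one secondary hit but strong evidence,
# B has no secondary hits but is far from the 3' end with weak evidence.
exon_a = ExonCandidate("A", distance_from_3prime=0, evidence=10,
                       secondary_energies=[-30.0])
exon_b = ExonCandidate("B", distance_from_3prime=1500, evidence=5)
print(pick_best([exon_a, exon_b]).name)
```

With these made-up values, exon A scores 1 + 3 + 0 + 0 = 4 and exon B scores 0 + 0 + 3 + 20 = 23, which shows how strongly the evidence term dominates the other three.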
What is the average ratio of # transcripts for a given exon relative to the exon with maximal evidence, starting from the last exon (3’ most exon is farthest left)?
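One way to compute the quantity asked about above is sketched here. The input layout (a dict mapping each gene to its per-exon transcript counts, listed from the 3'-most exon) and the gene names are hypothetical. The sketch only illustrates the averaging and does not reproduce the plotted data.

```python
from typing import Dict, List


def mean_relative_evidence(evidence_by_gene: Dict[str, List[int]]) -> List[float]:
    """Average, per exon position, of count / max count within each gene.

    Position 0 is the 3'-most exon, position 1 the next exon upstream, etc.
    Genes with fewer exons simply do not contribute to later positions.
    """
    sums: List[float] = []
    counts: List[int] = []
    for exon_counts in evidence_by_gene.values():
        if not exon_counts or max(exon_counts) == 0:
            continue  # skip genes with no usable evidence
        top = max(exon_counts)
        for i, n in enumerate(exon_counts):
            if i == len(sums):
                sums.append(0.0)
                counts.append(0)
            sums[i] += n / top
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


# Hypothetical transcript counts, 3'-most exon first.
example = {"geneA": [10, 5], "geneB": [4, 4, 2]}
print(mean_relative_evidence(example))
```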