Design of Micro-arrays

Design of Micro-arrays Lecture Topic 6

Experimental design • Proper experimental design is needed to ensure that questions of interest can be answered and that this can be done accurately, given experimental constraints, such as cost of reagents and availability of mRNA.

Design considerations in micro-arrays • Design enters a Micro-array experiment through 2 main components: • Probe Design • Allocation of RNA to probes

Array/Probe design • Which representative sequence, from which gene collection, should be printed on the array? • Where on the array? • Controls or not? • Numbers: how many controls, how many genes? • Duplicate or replicate spots within a slide, and their position.

Commonly asked questions • Should we put duplicates on a slide? • What should be the percentage of control spots? • Where should the control spots be placed? [These relate to preprocessing steps such as quality assessment and normalization.]

Probe Design As Statisticians we often have VERY little say on the probe designs. The only input may be in location of control spots. However, we may have some input in the allocation of RNA samples to the probes.

Idea behind Experimental Design • It was introduced by Sir Ronald Fisher in the 1920s to deal with systematic sources of variation in agricultural field trials. • The same ideas apply TODAY to Micro-arrays. • Fisher’s idea was divided into 3 main principles: • Randomization • Replication • Local Control or Blocking Let’s discuss some terms USED in design.

Terms and Definitions • Treatment/Condition: any attribute of primary interest • Unit: an independent replicate that is subject to the treatment • Block: any attribute that is believed to influence the response but is NOT of primary interest • Crossing: assigning all possible combinations of factors to units • Confounding: occurs when the effect of one factor cannot be separated from the effect of another factor

Designing using Principles • Randomization: using a chance device to assign treatments to units; essential to reduce systematic bias. • Replication: including more than one unit per condition; allows us to estimate random variation and improves the precision of the estimated effects. • Local control/blocking: if we believe that there is a systematic source of variation that may affect the response, we should identify this source, group the units into blocks accordingly, and randomize within the blocks.
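
The three principles can be sketched in a few lines of Python. This is only an illustration under assumed names: the function `randomize_within_blocks`, the block labels ("day1", "day2") and the sample labels are hypothetical and not part of the lecture. Each block receives every treatment once (replication across blocks), and the order inside a block is drawn at random (randomization within blocks).

```python
import random

def randomize_within_blocks(blocks, treatments, seed=0):
    """Assign each treatment once per block, in random order within the block.

    blocks: dict mapping a block label (e.g. hybridization day) to its units.
    treatments: list of treatment labels; every block must hold exactly
                len(treatments) units.
    Returns a dict: unit -> (block, treatment).
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    allocation = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs {len(treatments)} units")
        shuffled = list(treatments)  # copy, so the caller's list is untouched
        rng.shuffle(shuffled)        # the "chance device"
        for unit, treatment in zip(units, shuffled):
            allocation[unit] = (block, treatment)
    return allocation

# Hypothetical example: 2 hybridization days (blocks), 3 conditions, 6 samples.
blocks = {"day1": ["s1", "s2", "s3"], "day2": ["s4", "s5", "s6"]}
allocation = randomize_within_blocks(blocks, ["A", "B", "C"])
for unit, (block, treatment) in allocation.items():
    print(unit, block, treatment)
```

Because every condition appears on every day, a day-to-day shift affects all conditions equally and can be separated from the condition effect.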

Crossing and Confounding • Crossing: refers to assigning all possible combinations of factors to the units. A common example is the dye-swap, which exposes every experimental condition to both dyes. • Confounding: happens when the effect of one factor cannot be told apart from the effect of another factor. • Example: you have two conditions H and C and 2 slides. You label condition H with Red dye on both slides and condition C with Green dye on both slides. Here you cannot distinguish the effect of Dye from the effect of Condition. This is called confounding.
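
The H/C example can be checked mechanically. The sketch below is an illustration only: the function `is_confounded` and the tuple layout (slide, dye, condition) are assumptions introduced here. It treats dye and condition as completely confounded when each dye is only ever paired with a single condition.

```python
def is_confounded(design):
    """Return True if dye and condition cannot be separated in this design.

    design: list of (slide, dye, condition) tuples, one per labelled sample.
    The two factors are completely confounded when every dye is only ever
    paired with one condition.
    """
    conditions_per_dye = {}
    for _slide, dye, condition in design:
        conditions_per_dye.setdefault(dye, set()).add(condition)
    return all(len(conds) == 1 for conds in conditions_per_dye.values())

# Layout from the example: H is always Red, C is always Green.
confounded = [(1, "red", "H"), (1, "green", "C"),
              (2, "red", "H"), (2, "green", "C")]

# Dye-swap: the second slide reverses the labelling, crossing dye with condition.
dye_swap = [(1, "red", "H"), (1, "green", "C"),
            (2, "red", "C"), (2, "green", "H")]

print(is_confounded(confounded))  # True
print(is_confounded(dye_swap))    # False
```

In the dye-swap layout each condition is measured with both dyes, so a dye effect can be estimated separately from the H versus C difference.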

The main goal is: Avoidance of bias • The conditions of an experiment (mRNA extraction and processing, the reagents, the operators, the scanners and so on) can leave a “global signature” in the resulting expression data. Hence it is essential to follow the principles of proper experimentation to avoid bias.

Replication and related issues What type of replicate is to be used?

Allocation of samples to the slides A. Types of Samples • Replication: technical vs biological. This always needs to be considered in microarrays, since we often do NOT have biological replication. • Pooled vs individual samples. • Pooled vs amplified samples.

Biological Replication • The number of organisms from which you have taken RNA is your number of biological replicates. If you used 3 mice and obtained RNA from each mouse, you have 3 biological replicates. • Biological replication allows us to draw inferences about the general population of interest. • According to McClure and Wit: “the only thing that is good enough to answer a biological questions are the so-called biological replicates”

Technical Replicate • Sometimes it is more convenient to take material from 3 organisms, pool it, extract the RNA, and then divide the RNA into 3 samples to be hybridized. • This is NOT biological replication; rather, it is called technical replication after pooling. • This is more convenient and shows less variability between samples (pooling generally decreases variability), but often leads to bias. • Another way is to obtain RNA from one organism and divide the RNA into 3 batches for hybridization. This is an extreme technical replicate.

More on Technical and Biological Replication • Having a particular gene (or EST) repeated on a slide (as in Affy chips) is an example of technical replication. • This is NOT biological replication, since the whole chip is exposed to the same experimental condition. • However, technical replicates are useful, since they capture the variability due to measurement error and uneven hybridization across a slide. • The bottom line is: we are interested in the average expression level of a particular gene exposed to a particular condition for a specific biological organism.
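
One way to see why the type of replicate matters is a simple two-component variance model, sketched below. This is an assumption made for illustration, not a model stated in the lecture: the expression of a gene varies between organisms with variance `sigma_bio2` and between hybridizations with variance `sigma_tech2`, and a pool of k organisms behaves like the average of its members ("biological averaging"). The function name and the numeric values are hypothetical.

```python
def variance_of_mean(sigma_bio2, sigma_tech2, n_arrays, pool_size=1,
                     independent_pools=True):
    """Variance of the average log-expression of one gene over n_arrays arrays.

    Assumed model: measurement = organism effect + hybridization error, with a
    pool of `pool_size` organisms acting as the average of its members.
    """
    if independent_pools:
        # every array gets its own pool of `pool_size` distinct organisms
        bio = sigma_bio2 / (pool_size * n_arrays)
    else:
        # one pool is split across all arrays: the biological deviation is
        # shared by every array, so averaging over arrays does not reduce it
        bio = sigma_bio2 / pool_size
    tech = sigma_tech2 / n_arrays
    return bio + tech

# Hypothetical variance components, chosen only for illustration.
sigma_bio2, sigma_tech2 = 1.0, 0.25

# 3 mice, one array each (biological replication)
biological = variance_of_mean(sigma_bio2, sigma_tech2, n_arrays=3)
# 1 mouse, RNA split over 3 arrays (extreme technical replication)
technical = variance_of_mean(sigma_bio2, sigma_tech2, n_arrays=3,
                             independent_pools=False)
# 3 mice pooled, pool split over 3 arrays (technical replication after pooling)
pooled_split = variance_of_mean(sigma_bio2, sigma_tech2, n_arrays=3,
                                pool_size=3, independent_pools=False)

print(biological, technical, pooled_split)
```

Under these assumptions the pooled-and-split design estimates the mean as precisely as three separate mice, but the spread between its three arrays reflects only `sigma_tech2`, so the data themselves would understate the uncertainty about the population.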

How many Replicates? • This is where the theory of optimal design comes in. • Deciding HOW many replicates to use depends upon the questions you are interested in and the contrasts you want to estimate. • In general a rule of thumb is: “at least 3 arrays per condition”. • One thing to keep in mind is that technical replicates are in general highly reproducible (correlation r = .95), whereas biological replicates from the same condition often have r ~ .30.
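
As a rough illustration of how the number of replicates follows from the contrast of interest, the sketch below uses the standard normal-approximation formula for comparing two conditions, n = 2 (z_{1-alpha/2} + z_{power})^2 sigma^2 / delta^2 per condition. Everything here is an assumption for illustration: a single gene, a known standard deviation `sigma` of the log-expression between replicates, a target difference `delta`, and no adjustment for testing many genes at once. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def arrays_per_condition(delta, sigma, alpha=0.05, power=0.8):
    """Approximate number of replicates per condition needed to detect a
    difference `delta` in mean log-expression between two conditions.

    Normal approximation to a two-sided, two-sample comparison for ONE gene;
    sigma is the assumed standard deviation between replicates.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided significance quantile
    z_power = z.inv_cdf(power)            # quantile for the desired power
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)

# Hypothetical inputs: a 2-fold change is delta = 1 on the log2 scale.
print(arrays_per_condition(delta=1.0, sigma=0.5))  # 4
print(arrays_per_condition(delta=1.0, sigma=1.0))  # 16
```

The required number grows with the square of sigma/delta, which is why noisy biological replicates call for more arrays than highly reproducible technical replicates would suggest.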

Different design layouts • Scientific aim of the experiment. • Robustness. • Extensibility. • Efficiency.

Taking physical limitations or cost into consideration: • The number of slides. • The amount of material.

Pooled vs. amplified samples • When we do not have enough material from one biological sample to perform even one array (chip) hybridization, pooling or amplification is necessary. • Amplification: introduces more noise; non-linear amplification (??), with different genes amplified at different rates; able to perform more hybridizations. • Pooling: fewer replicate hybridizations.

Pooled vs individual samples • Pooling is seen as “biological averaging”. • Trade-off between: the cost of performing a hybridization, and the cost of the mRNA samples. • When the cost of mRNA samples << the cost per hybridization, pooling can help reduce the number of hybridizations.
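
The trade-off can be made concrete with a small search over designs. The sketch below is illustrative only and rests on assumptions not stated in the lecture: each array is hybridized with its own pool of k organisms, a pool behaves like the average of its members, and the variance of the estimated mean is sigma_bio2/(n·k) + sigma_tech2/n for n arrays. The function names, costs and variance components are hypothetical.

```python
def design_variance(n_arrays, pool_size, sigma_bio2, sigma_tech2):
    """Variance of the mean when each array gets its own pool (assumed model)."""
    return sigma_bio2 / (n_arrays * pool_size) + sigma_tech2 / n_arrays

def design_cost(n_arrays, pool_size, cost_array, cost_sample):
    """Total cost: one hybridization per array plus every mRNA sample used."""
    return n_arrays * cost_array + n_arrays * pool_size * cost_sample

def cheapest_design(target_variance, sigma_bio2, sigma_tech2,
                    cost_array, cost_sample, max_arrays=20, max_pool=10):
    """Return (cost, n_arrays, pool_size) of the cheapest design whose
    variance does not exceed target_variance, or None if none qualifies."""
    best = None
    for n in range(1, max_arrays + 1):
        for k in range(1, max_pool + 1):
            if design_variance(n, k, sigma_bio2, sigma_tech2) <= target_variance:
                cost = design_cost(n, k, cost_array, cost_sample)
                if best is None or cost < best[0]:
                    best = (cost, n, k)
    return best

# Hypothetical numbers: samples are cheap relative to hybridizations.
params = dict(target_variance=0.3, sigma_bio2=1.0, sigma_tech2=0.25,
              cost_array=500, cost_sample=20)

with_pooling = cheapest_design(**params)                 # pools of up to 10
without_pooling = cheapest_design(**params, max_pool=1)  # individual samples only

print(with_pooling)     # (1120, 2, 3)
print(without_pooling)  # (2600, 5, 1)
```

With these made-up numbers, 2 arrays of 3 pooled organisms reach the target precision at less than half the cost of 5 individual arrays. The search considers only cost and precision; a design with very few arrays leaves little information for estimating variability.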

To pool or not to pool? • Pooling is routinely done when a single organism does not yield enough RNA for hybridization, so several organisms are combined to get enough RNA. • The alternative to pooling is PCR amplification, where you use PCR techniques to physically amplify the harvested RNA. • The literature is not uniform in deciding which is better. Affy (GeneChip help notes) suggests that pooling causes too much averaging, so that weaker expression signals can sometimes be averaged out.
