

The Complete Arabidopsis Transcriptome MicroArray (CATMA) Project

P. Hilson, T. Altmann, S. Aubourg, J. Beynon, F. Bitton, M. Caboche, M. Crowe, P. Dehais, H. Eickhoff, E. Kuhn, S. May, W. Nietfeld, J. Paz-Ares, W. Rensink, P. Reymond, P. Rouzé, U. Schneider, C. Serizet, A. Tabrett, V. Thareau, M. Trick, G. van den Ackerveken, P. Van Hummelen, P. Weisbeek, M. Zabeau

http://jic-bioinfo.bbsrc.ac.uk/CATMA/

Introduction

Most cDNA clones included in DNA arrays are identified by an EST covering only a portion of their length. The complete clone sequence is generally unknown, and the clones were not selected to yield hybridisation results specific to a single gene. Moreover, ESTs represent only about half of the genes identified in model eukaryote genomes. To bypass these shortcomings, we are constructing a collection of high-quality Gene Specific Tags (GSTs) representing most Arabidopsis genes, for use in microarray transcriptome analyses and in other functional genomics approaches.

1. Gene structural annotation

Identifying each gene in the Arabidopsis genome is the starting point of any genome-wide effort to study gene expression. Because the structure of only a minority of Arabidopsis genes has so far been determined experimentally, annotation still relies on gene prediction to identify the boundaries of transcription units and of the exon(s) within them (The AGI Consortium, 2000). Using the AGI nuclear genome sequence, we have generated an updated structural annotation of all 5 Arabidopsis chromosomes. The annotation process is automated: it uses the EuGène software (Schiex et al., 2001) with a single set of parameters and algorithms applied to all chromosome regions (Figure 1A). Its prediction quality has been tested by matching results against a set of experimentally defined full-length cDNAs, as described by Rouzé and collaborators (Pavy et al., 1999). Quality assessment parameters for the chromosome 2 annotation are shown in Table 1.

[Figure 1. Gene identification and GST selection. A: structural annotation with EuGène, combining SplicePredictor, NetGene2, NetStart, RepeatMasker (AtRepBase), BLASTx (SP & PIR) and BLASTn (cDNA & EST). B: GST selection with SPADS (BLASTn checks of GST and primer specificity against the Arabidopsis nuclear genome; Primer3).]

Table 1. Assessment of EuGène prediction results

                          Plant-Gene     Araset
Actual genes              238            51
Correct gene models       182 (76%)      37 (67%)
Partial gene models       50 (21%)       14 (27%)
Split genes               5 (2%)         0
Missing genes             1 (0.5%)       0
Actual exons              1639           254
Missing exons             51 (3%)        15 (6%)
Missing exons in 5’       33 (2%)        8 (3%)
Missing central exons     12 (0.7%)      5 (2%)
Missing exons in 3’       6 (0.4%)       2 (0.8%)
Wrong exons               1 (0.06%)      1 (0.4%)

EuGène identifies 29,804 genes in the Arabidopsis nuclear genome, more than the 25,470 identified by the AGI (Figure 2). A detailed comparative analysis of the EuGène and AGI annotations is under way. Preliminary observations indicate that EuGène’s higher count results from a combination of factors: EuGène can predict two genes where AGI annotates one; it predicts genes where AGI annotates none (3,369 cases) more often than the reverse (1,533 cases); and it seems biased towards overprediction in pericentromeric regions rich in repeated sequences.

[Figure 2. Gene density according to the EuGène and AGI annotations.]

2. Automated design of GSTs

Our GST design is based on expressed sequences (EST or cDNA) or on coding regions predicted by EuGène (i.e. excluding UTRs not represented in EST or cDNA). GST lengths range from 150 to 500 bp, which is sufficient to yield a reproducible microarray signal for transcriptome analysis (Figure 3). Because the Arabidopsis genome is extensively duplicated, not all genes can be represented by perfectly specific GSTs. After rejecting candidate sequences that show over 70% identity with another sequence in the Arabidopsis nuclear genome, our process has so far identified a GST for 21,420 (72%) of the 29,775 genes identified on all 5 chromosomes (Figure 2).

The Specific Primer & Amplicon Design Software (SPADS) selects specific regions within genes and designs primer pairs to amplify these regions (Figure 1B; Thareau et al., 2001). The procedure is summarised in the following four steps:

• Search for the most specific region within each gene. Each exon is tested with BLASTn against the whole genome sequence, and segments with hits are removed. Primer pairs are designed in the remaining regions. If none are found, the BLASTn mismatch parameter is decreased so that only segments with stringent hits are subtracted, thus enlarging the specific regions available for primer design.
• Primer design. The specific regions are used as input for the Primer3 software.
• Selection of specific primer pairs. Oligonucleotides designed by Primer3 are tested for specificity with BLASTn against a 2 Mb segment containing the gene, and are excluded if matches indicate potential unwanted PCR amplification.
• Analysis of amplicon specificity. Each successive amplicon is tested with BLASTn to determine its specificity. If its identity with a putative paralogous sequence exceeds 70%, the amplicon is rejected and the next one is processed. Candidate GSTs are examined from the 3’ to the 5’ end of the gene until one passes.

[Figure 3. Transcription profiling with a test set of GSTs. A: Hybridization (Cy5 cDNA). B: Signal according to probe length (bp) for GSTs from predicted genes, known genes and intergenic regions, with a highly expressed cDNA and a negative control. Samples: flower bud, leaf, root, plantlet.]

[Figure 4. GST characteristics. A: Distribution of GST lengths (150-200 bp: 42%; 200-300 bp: 36%; 300-500 bp: 22%). B: Position of GSTs.]

3. Structure of the GST collection

Each primer designed to synthesize a GST carries a gene-specific 3’ domain corresponding to the sequence selected by SPADS (18-25 nt) and a 5’ extension (17 nt) added to allow reamplification of the GSTs with a limited set of universal primers. A set of 40 extensions has been designed so that each sample in a 384-well plate (16 rows by 24 columns, hence 16 + 24 = 40 extensions) can be amplified with a unique combination of one row primer and one column primer, thereby avoiding the cross-contamination that often plagues the storage and dissemination of large-scale clone collections. The primary amplicons obtained from BAC DNA templates in large excess can be conveniently reamplified and distributed. Using BACs as templates also increases the quality of the GSTs and the fraction of successful PCR amplifications, because it reduces template complexity (Figure 5). All GSTs are oriented with respect to transcription, with column primers at the 5’ end (Figure 5). As of 26 September 2001, the Consortium had PCR-amplified 16,280 GSTs.

[Figure 5. Two-step GST amplification. First-round PCR with specific primer pairs (S 5’ and S 3’, carrying universal tails U 5’ and U 3’) on genomic BAC DNA yields the primary amplicon; second-round PCR uses universal primer pairs (U 5’ column primer, U 3’ row primer, e.g. col. 4 and row 2).]

Conclusion

The project is based on a novel, complete and unified annotation of the Arabidopsis nuclear genome, generated with our upgraded EuGène software, from which GSTs are selected with SPADS. We are currently studying how best to complement the GST collection so as to minimize non-specific probes, i.e. probes that can hybridise with transcripts from non-cognate genes. Given its structure, the GST collection can be adapted to a variety of microarray protocols and procedures. It can also serve as a key resource for other large-scale functional genomics endeavours based on specific nucleic acid hybridisation, such as systematic Arabidopsis RNAi programmes.

References

The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796-815.
Pavy N, Rombauts S, Déhais P, Mathé C, Ramana DV, Leroy P and Rouzé P (1999) Evaluation of gene prediction software using a genomic data set: application to Arabidopsis thaliana sequences. Bioinformatics 15, 887-899.
Schiex T, Moisan A and Rouzé P (2001) EuGène: an eukaryotic gene finder that combines several sources of evidence. In "JOBIM 2000, LNCS 2066", O. Gascuel, M.F. Sagot (Eds.), pp. 111-125.
Thareau V, Déhais P, Rouzé P and Aubourg S (2001) Automatic design of gene specific tags for transcriptome studies. Proc. of JOBIM'2001 (Journées Ouvertes Biologie Informatique Mathématiques), Toulouse, France.
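The first SPADS step (subtract BLASTn hit segments from each exon, then relax the filter when no usable region remains) is essentially interval arithmetic. The Python sketch below is not SPADS code: the hit format `(start, end, mismatches)`, the function names `subtract` and `specific_regions`, the mismatch thresholds and the 150 bp minimum (borrowed from the lower bound of the GST length range) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]   # half-open [start, end) in gene coordinates
Hit = Tuple[int, int, int]   # hypothetical (start, end, mismatches) BLASTn hit


def subtract(region: Interval, masks: Sequence[Interval]) -> List[Interval]:
    """Remove every masked segment from region and return what is left."""
    pieces = [region]
    for m_start, m_end in masks:
        kept = []
        for start, end in pieces:
            if m_end <= start or m_start >= end:  # mask does not overlap piece
                kept.append((start, end))
                continue
            if start < m_start:                   # left remainder survives
                kept.append((start, m_start))
            if m_end < end:                       # right remainder survives
                kept.append((m_end, end))
        pieces = kept
    return pieces


def specific_regions(
    exons: Sequence[Interval],
    hits: Sequence[Hit],
    min_len: int = 150,
    mismatch_limits: Sequence[int] = (3, 1, 0),
) -> List[Interval]:
    """Return exon segments free of genome hits, relaxing the mask if needed.

    Pass i masks only hits with at most mismatch_limits[i] mismatches, so later
    passes subtract only the most stringent (near-identical) hits and leave
    larger regions for primer design. Limits and min_len are illustrative
    assumptions, not SPADS parameters.
    """
    for limit in mismatch_limits:
        masks = [(s, e) for s, e, mm in hits if mm <= limit]
        remaining = [piece for exon in exons for piece in subtract(exon, masks)]
        usable = [(s, e) for s, e in remaining if e - s >= min_len]
        if usable:
            return usable
    return []  # no specific region long enough for a GST
```

With exons `[(0, 400), (500, 900)]` and hits `[(100, 300, 2), (550, 850, 0)]`, the first pass leaves only short fragments, while the second pass ignores the 2-mismatch hit and returns `[(0, 400)]`. Filtering one hit list stands in for rerunning BLASTn with different parameters, which is what the text describes.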

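The last SPADS step scans candidate amplicons from the 3’ to the 5’ end of the gene and keeps the first one whose identity with every putative paralogous sequence is at most 70%. The sketch below uses hypothetical names (`percent_identity`, `pick_gst`) and replaces the BLASTn search with precomputed ungapped alignments of equal length, so it illustrates only the selection rule, not the alignment itself.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions in an ungapped alignment of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two aligned sequences of equal, non-zero length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def pick_gst(
    candidates: Sequence[Tuple[str, str]],
    paralogs: Dict[str, List[str]],
    max_identity: float = 70.0,
) -> Optional[str]:
    """Return the first acceptable amplicon, scanning candidates from 3' to 5'.

    candidates: (name, sequence) pairs already ordered 3' -> 5' (assumption).
    paralogs:   name -> paralogous segments aligned to that amplicon (assumption).
    An amplicon is rejected if any paralog is more than max_identity % identical.
    """
    for name, seq in candidates:
        identities = [percent_identity(seq, p) for p in paralogs.get(name, [])]
        if all(i <= max_identity for i in identities):
            return name
    return None  # no GST can be designed for this gene
```

For example, `pick_gst([("amp_3p", "ACGTACGTAC"), ("amp_mid", "TTTTGGGGCC")], {"amp_3p": ["ACGTACGTAA"]})` rejects `amp_3p` (90% identity) and returns `"amp_mid"`.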
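The row/column addressing of section 3 works because a 384-well plate has 16 rows and 24 columns, so 40 universal extensions give every well a unique primer pair. The sketch below assumes wells named like `B4` and uses placeholder 17-nt tails generated from an index; the real CATMA extension sequences are not given in the poster, and `TAILS`, `tailed_primers` and `universal_pair` are hypothetical names.

```python
import string
from typing import Dict, Tuple

ROWS = string.ascii_uppercase[:16]  # rows A..P of a 384-well plate
COLUMNS = range(1, 25)              # columns 1..24 (16 x 24 = 384 wells)


def placeholder_tail(index: int, length: int = 17) -> str:
    """Hypothetical 17-nt tail: the index written in base 4 over ACGT.

    Only guarantees distinct strings; real extensions would be designed for
    melting temperature, specificity and absence of secondary structure.
    """
    bases = []
    for _ in range(length):
        bases.append("ACGT"[index % 4])
        index //= 4
    return "".join(bases)


# 16 row tails + 24 column tails = 40 universal extensions.
TAILS: Dict[str, str] = {
    label: placeholder_tail(i)
    for i, label in enumerate([f"row_{r}" for r in ROWS] + [f"col_{c}" for c in COLUMNS])
}


def parse_well(well: str) -> Tuple[str, int]:
    row, col = well[0].upper(), int(well[1:])
    if row not in ROWS or col not in COLUMNS:
        raise ValueError(f"not a 384-well position: {well}")
    return row, col


def tailed_primers(well: str, fwd_specific: str, rev_specific: str) -> Tuple[str, str]:
    """First-round primers: column tail on the 5' (sense) primer, row tail on the 3' primer."""
    for primer in (fwd_specific, rev_specific):
        if not 18 <= len(primer) <= 25:
            raise ValueError("gene-specific domain should be 18-25 nt")
    row, col = parse_well(well)
    return TAILS[f"col_{col}"] + fwd_specific, TAILS[f"row_{row}"] + rev_specific


def universal_pair(well: str) -> Tuple[str, str]:
    """Second-round (universal) primers for a well: its column tail and its row tail."""
    row, col = parse_well(well)
    return TAILS[f"col_{col}"], TAILS[f"row_{row}"]
```

Here `tailed_primers("B4", fwd, rev)` prepends the column-4 tail to the 5’ primer and the row-B tail to the 3’ primer, matching the orientation described above, and `universal_pair("B4")` returns the two tails that reamplify only that well.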