Identification of new expressed sequence tags covering putative splicing regions in Drosophila melanogaster.
Marco Aurélio Valtas Cunha1, Valeria Valente1, Josane de Freitas Sousa1, Nadia Monesi2, Rafaela Martins Maia1, Daniela Dover de Araujo1, Wilson Araujo da Silva Jr3, Marco Antonio Zago3, Waleska K. Martins4, Luis F. L. Reis4, Emmanuel Dias Neto4, Sandro José de Souza4, Andrew John George Simpson4, Ricardo Guelerman Pinheiro Ramos1, Enilza Maria Espreafico1, Maria Luisa Paçó-Larson1

1 Departamento de Biologia Celular, Molecular e de Bioagentes Patogênicos, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14049-900, SP, Brazil
2 Departamento de Análises Clínicas, Toxicológicas e Bromológicas, Faculdade de Ciências Farmacêuticas de Ribeirão, Universidade de São Paulo, CEP 14040-903, Ribeirão Preto, Brazil
3 Departamento de Clínica Médica, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 3900 14049-900, SP, Brazil
4 Ludwig Institute for Cancer Research, Rua Professor Antonio Prudente, 109, São Paulo 01509-010, SP, Brazil

[Figure 1 flowchart labels: ORESTES; genomic scaffolds; split against genomic; full-length cDNAs; do not match full-length cDNA; split against cDNA; dbEST; newly validated; new isoform of characterized cDNA; new isoform of non-characterized genes]
[Figure 2 panels: A, B, C, D, E, F, G, H]

Figure 2. Genomic sequences flanking the ORESTES alignment regions were used as queries in BLAST searches against a set of Drosophila protein-coding sequences compiled by Ashburner et al. (http://www.fruitfly.org/sequence/download.htm/), the Drosophila ESTs present in the dbEST database, the ORESTES database and the predicted transcripts compiled in the GadFly (release 2) database. Legend: genomic sequence, ESTs from dbEST, ORESTES, predicted transcripts. Annotated functions of the predicted genes are: (A) actin binding, (B) cell adhesion, (C) enzyme, (D) transporter, (E) unknown, (F) polynucleotide adenylyltransferase, (G) phosphatase.
(H) shows an unannotated region.

Abstract

One important lesson learned from projects on complex genomes is that defining genes from genome sequence alone by computational prediction has serious limitations. For instance, untranslated regions are refractory to this kind of analysis, and intron/exon structures are poorly predicted. The immediate challenge in the post-genomic era is therefore to identify the transcribed regions of the genome. To generate sequence information on the D. melanogaster transcriptome, we have produced over 10,000 ESTs using the Open Reading Frame Expressed Sequence Tags (ORESTES) methodology.

Expressed sequence tag (EST) analysis is an important tool for identifying transcription units but is also subject to errors. EST information can be misleading if, for example, it derives from immature mRNA, genomic contamination or spurious transcription of intergenic regions. The presence of a splice site upon alignment with the genomic sequence is a criterion that increases confidence in EST data. We have used this criterion to analyze the D. melanogaster ORESTES databank. Of 9,157 ORESTES (after excluding sequences derived from rRNA, mtDNA and yeast DNA), 2,702 showed an interrupted alignment with the genomic sequence. To identify those not yet represented in public databases, the 2,702 ORESTES were searched by BLAST against a set of Drosophila full-length cDNA sequences and the ESTs present in the NCBI databanks. Using a Perl script that we developed, complemented by visual inspection of the alignments, we identified 88 ORESTES representing new expressed sequences in Drosophila. Among those, two are possibly derived from new isoforms of characterized genes (coding for calreticulin and elongation factor-2); seven match existing ESTs but show a different splicing pattern and could therefore derive from different isoforms of uncharacterized genes.
The genomic DNA surrounding these novel ORESTES is being analyzed by examining sequence alignments with Drosophila full-length cDNAs or ESTs present in the NCBI databanks, and with BDGP/CG GadFly predicted genes, in an attempt to define new transcription units in silico. This information will lay the basis for the experimental validation of new transcripts.

Financial Support: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) and Ludwig Institute for Cancer Research.

Representative diagrams of sequence alignments between expressed and genomic sequences flanking novel splicing sites identified by ORESTES

Analysis flowchart of the ORESTES which split upon alignment with the Drosophila genomic sequence

Figure 1. The ORESTES were used as queries in BLAST searches against the genomic “scaffold” sequences, full-length cDNAs and dbEST databases. This analysis resulted in the identification of 88 non-redundant new expressed sequence tags covering putative splicing regions: 2 corresponding to new isoforms of characterized genes, 7 corresponding to new isoforms of predicted but non-characterized genes, and 79 validating the gene region for the first time.
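The decision logic of the Figure 1 flowchart can be sketched in a few lines of Python. This is an illustration only and not the authors' Perl script, which is not shown in the poster. It assumes that each BLAST hit has already been parsed into aligned blocks with query and subject coordinates, that all blocks lie on the plus strand of the subject, and that an alignment counts as "split" when two blocks that are adjacent on the ORESTES are separated on the subject. The names `Hsp`, `is_split` and `classify`, the thresholds `min_gap` and `max_query_slack`, and the "known" and "not split" labels are hypothetical choices made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hsp:
    """One aligned block; coordinates are 1-based and inclusive (assumed)."""
    q_start: int  # start on the query (the ORESTES)
    q_end: int    # end on the query
    s_start: int  # start on the subject (genome, cDNA or EST)
    s_end: int    # end on the subject


def is_split(hsps: List[Hsp], min_gap: int = 40, max_query_slack: int = 10) -> bool:
    """Return True if two consecutive blocks are adjacent on the query
    but separated by at least min_gap bases on the subject.

    Both thresholds are hypothetical defaults, not values from the poster.
    """
    ordered = sorted(hsps, key=lambda h: h.q_start)
    for left, right in zip(ordered, ordered[1:]):
        query_gap = right.q_start - left.q_end - 1
        subject_gap = right.s_start - left.s_end - 1
        if abs(query_gap) <= max_query_slack and subject_gap >= min_gap:
            return True
    return False


def classify(genome: List[Hsp], cdna: List[Hsp], dbest: List[Hsp]) -> str:
    """Assign one ORESTES to a flowchart category.

    An empty list means the ORESTES has no hit in that database.
    """
    if not is_split(genome):
        return "not split"  # fails the splice-site criterion
    if cdna:
        # Matches a full-length cDNA: new only if the alignment is interrupted.
        return "new isoform of characterized cDNA" if is_split(cdna) else "known"
    if not dbest:
        return "newly validated"  # spliced, and absent from cDNAs and dbEST
    return "new isoform of non-characterized gene" if is_split(dbest) else "known"


# Toy example: two 100-base blocks adjacent on the ORESTES, 400 bases apart on the genome.
spliced = [Hsp(1, 100, 1000, 1099), Hsp(101, 200, 1500, 1599)]
print(classify(spliced, cdna=[], dbest=[]))
```

In this sketch only subject-side gaps are tested, so an ORESTES that carries an extra exon relative to a cDNA or EST would need a symmetric check on the query coordinates. The poster notes that the script was complemented by visual inspection of the alignments, so automated labels of this kind are best treated as candidates for review.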