640 likes | 1.31k Views
Transcriptome analysis using Open Reading frame ESTs (ORESTES). Emmanuel Dias Neto, PhD Lab of Neurosciences, LIM-27 Instituto de Psiquiatria Faculdade de Medicina Universidade de Sao Paulo, SP - BRAZIL. UNESCO - First North-South Human Genome Conference. Caxambú, MG, Brazil - 1993
E N D
Transcriptome analysis using Open Reading frame ESTs (ORESTES) Emmanuel Dias Neto, PhD Lab of Neurosciences, LIM-27 Instituto de Psiquiatria Faculdade de Medicina Universidade de Sao Paulo, SP - BRAZIL
UNESCO - First North-South Human Genome Conference • Caxambú, MG, Brazil - 1993 • Is there a way to integrate the research performed in developing countries with the US/Europe ‘Human Genome Project’ ? • After the completion of the ‘Human Genome sequencing’, how can we gain access or make use of the technology developed ?
How can we learn ? • Initiate an EST sequencing project of a parasite of local importance (Schistosoma mansoni) • cDNA libraries prepared with Marcelo Bento Soares • cDNA sequencing performed at TIGR (Craig Venter) • Some 1,000 ESTs generated
4 Kb 5’ 3’ 500 nt Open reading frame (ORF) 500 nt ESTs “Expressed Sequence Tags” Partial sequences, usually derived from the ends of cDNA molecules.
Oligo dT primers cDNA Adaptadors
vector Insert ~3kb vector Sequencing Primers ESTs (HGP)
Main problems found • - Repetitive sequencing of highly expressed genes : high redundancy (~60%) • - Necessity of large amounts of mRNA in order to obtain a normalized library • - Reduced information of no matches
Gene expression in a typical eukaryotic cell Diversity <10 500 11.000 Class Abundant Intermediate Rare Abundance/gene 12,000 300 10 Huang et al., 1999
Alternative protocol to generate ESTs • Is there a way to tag rare genes ? • How to generate data from small amounts of mRNA ? • Is it possible to tag the central portion of the transcripts ?
Ideas • The use of a PCR-based strategy, should enable the analysis of small amounts of mRNA. • Using randomly selected primers (in RT-PCR) at low stringency as a means to evaluate other regions of the transcripts...
ORESTES Randomly selected primers
Factors that contribute for the presence of a gene in a cDNA library Nucleotide diversity Abundance ORESTES Usual cDNA libraries
ORESTES - the data normalization
Covering a transcript with ORESTES • The amplification of a gene region requires primer binding at both sides of a point. • The chance of a primer binding, depends on the size of the sequences flanking the amplification point. • If the size of a transcript is taken as 1, and the distance of the 3’ end is taken as S: • The probability (P) of an appropriate amplification of a point is • P = S(1-S) • Coverage of the central point = 0.5(1-0.5) = 0.5x0.5 = 0.25 = 25% • Coverage of the last 10% of a transcript = 0.1x0.9 = 0.09 = 9%
ORESTES- sequence distribution
ORESTES - the data Comparison with dbest data
P P P P P P P P P P P P P P P P P P P P P P P P P P P Project Organisation P P P P P UNICAMP P P P P P P P P P P P P Sequencing Center IQ-USP FM-USP/RP Sequencing Center Sequencing Center Coordination LICR P P P Sequencing Center Sequencing Center FM-USP EPM P P P P P P P P P P
Project Organisation Dept. of Pathology Hospital A.C. Camargo RNA coordination LICR/SP Library coordination LICR/SP Dissected tissue samples Preparation and validation of all mRNAs to be used • cDNA synthesis • and amplification • ORESTES production and development • ORESTES sequencing
Fernando Costa (CM) S é rgio Verjovski (QV) P P Christine Hackel Arthur Gruber P P Helaine Carrer / Dirce Carraro Mari Cleide Sogayar P P Ma F á tima Sonati Edna Kimura P P Gon ç alo G. Pereira Hamza FA El - Dorry P P Maria Aparecida Nagai (MR) Marco Ant ô nio Zago (RC) P P Angelita Gama Enilza Espe á frico P P Daniel Gianella Neto Gustavo H Goldman P P Suely KN Marie Ma Lu í sa Pa çó - Larson P P Elizabeth Martins Paulo L. Hoo Vanderlei Rodrigues P P Eloiza Tajara P Marcelo Briones (PM) Sandro Valentini P P Rui MB Maciel P Luis Eduardo Andrade P Ismael DG Silva P Jo ã o Bosco Pesquero P Maria In ê s Pardini (IL2) Marina N ó brega (IL3) P P S í lvia Rogatto (IL5) P
Using ORESTES to help to define the complete set of genes expressed in different human tissues/tumours
Generation of Colon ESTs HCGP X CGAP = 2,1x more sequences
Generation of Stomach ESTs HCGP X CGAP = 2,5x more sequences
Generation of Breast ESTs HCGP X CGAP = 9,1x more sequences
Generation of Head and Neck ESTs HCGP X CGAP = 34,4x more sequences
Next challenge Data Information
Transcriptional level Tumor Suppressor genes
Looking for putative tumour suppressor genes - Clusters composed of sequences exclusively derived from normal samples - Clusters mapping to genomic regions of frequent Loss (LOH) in H&N tumours Total = 78 clusters
Transcriptional level Oncogenes
Looking for putative oncogenes - Clusters composed of sequences exclusively derived from tumour samples - Clusters mapping to genomic regions frequently amplified in H&N tumours Total = 271 clusters
A B C D
Homo sapiens RAB1, member RAS oncogene family (RAB1), mRNA HSD00365 - TCGTTATGCCAGTGAAAATGTCAACAAATTGTTGGTAGGGAACAAATGTGA RC5-BT0377-030200-012-A06 - .........................a......................... PM2-BT0723-090201-010-c07 - .........................c......................... PM2-BT0723-130900-002-c07 - .........................c......................... MR3-GN0190-301100-004-e08 - .........................c......................... MR4-ET0140-220101-004-d02 - .........................c......................... MR4-EN0075-220101-006-d02 - .........................c......................... IL2-FT0160-070800-121-C02 - .........................a......................... MR4-ET0140-190201-007-h04 - .........................a......................... MR0-RT0037-121200-004-d02 - .........................a......................... CM1-HN0016-161100-568-c06 - .........................a......................... QV3-BN0046-150300-121-a12 - .........................a......................... QV3-DT0045-210100-063-f03 - .........................a......................... QV2-NN0045-220800-323-d03 - .........................a......................... IL5-UM0067-240300-051-g06 - ..............…..........a......................... CM4-HN0021-241100-457-h02 - .........................a......................... MR0-RT0037-011200-002-a07 - .........................a......................... MR2-UM0060-030400-103-g02 - .........................a..............g.......... PM0-IT0018-091100-001-e02 - .........................a......................... PM1-MT0143-101100-003-a06 - .......*.................a......................... PM1-MT0143-101100-003-f11 - .........................a......................... Type Non-Synonymous Codon aaa-caa Nucleotide A-C Aminoacid K(lysine)-Q(glutanine)
"You have made your way from worm to man but much within you is still worm"(Friedrich Nietzche, Zarathustra's Prologue)
S. japonicum 43,707 ESTs 28,839 adult worms 14,868 eggs