Constructions and Applications of Alternative Splicing Databases • Feng Chia University (逢甲大學) • Bioinformatics Research Center • Speaker: Fang-Rong Hsu (許芳榮)
Outline • Introduction • Construction of alternative splicing database • Survey of existing solutions • Applications
Alternative Splicing • Definition: splicing the same pre-mRNA in two or more ways to yield two or more different mRNAs, which produce two or more different protein products
The Troponin T (muscle protein) pre-mRNA is alternatively spliced to give rise to 64 different isoforms of the protein • Constitutively spliced exons: exons 1-3, 9-15, and 18 • Mutually exclusive exons: exons 16 and 17 • Alternatively spliced exons: exons 4-8 • Exons 4-8 are spliced in every possible way, giving rise to 32 different possibilities • Exons 16 and 17, which are mutually exclusive, double the possibilities; hence 64 isoforms
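The count of 64 can be checked by enumeration: five optional exons give 2^5 = 32 combinations, and each combination is paired with exactly one of the two mutually exclusive exons. The sketch below is only an illustration of that arithmetic; the variable and function names are invented for this example.

```python
from itertools import product

# Exon numbers as listed on the slide.
constitutive = [1, 2, 3] + list(range(9, 16)) + [18]
alternative = [4, 5, 6, 7, 8]      # each may be included or skipped
mutually_exclusive = [16, 17]      # exactly one of the two is used

def enumerate_isoforms():
    """Return every exon combination allowed by the rules above."""
    isoforms = []
    for mask in product([False, True], repeat=len(alternative)):
        chosen = [exon for exon, keep in zip(alternative, mask) if keep]
        for exclusive_exon in mutually_exclusive:
            isoforms.append(sorted(constitutive + chosen + [exclusive_exon]))
    return isoforms

isoforms = enumerate_isoforms()
print(len(isoforms))  # 64 = 2**5 * 2
```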
[Figure: ESTs aligned to the genome]
[Figure: gene model drawn 5' to 3' with Exon 1 to Exon 4 separated by Intron 1 to Intron 3; EST 1 to EST 7 are aligned beneath it, some ending in poly-A tails (AAA...)]
[Diagram: genome sequences (3 billion bp) and dbEST (5 million ESTs) are aligned to build a database of exons and introns, which supports alternative splicing analysis, gene discovery, and SNP finding]
Methods of Alternative Splicing Detection • mRNA-EST alignment (or EST consensus) • Works without knowledge of the genomic sequence • Genomic sequence to EST alignment • Informative
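To illustrate why genomic sequence to EST alignment is informative, here is a minimal sketch of one check it makes possible: once each EST is expressed as exon blocks in genome coordinates, an exon of one transcript that falls entirely inside an intron of another is a candidate exon-skipping event. The function names and the half-open `(start, end)` coordinate convention are assumptions made for this example, not the method used in the talk.

```python
def introns(exons):
    """Return the gaps between consecutive exon blocks of one transcript."""
    exons = sorted(exons)
    return [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(exons, exons[1:])]

def skipped_exons(transcript_a, transcript_b):
    """Exons of A that lie entirely inside an intron of B."""
    gaps = introns(transcript_b)
    return [(start, end) for start, end in sorted(transcript_a)
            if any(g_start <= start and end <= g_end for g_start, g_end in gaps)]

# Hypothetical coordinates: EST b joins the first and third exon directly.
est_a = [(0, 100), (200, 300), (400, 500)]
est_b = [(0, 100), (400, 500)]
print(skipped_exons(est_a, est_b))  # [(200, 300)]
```

A production pipeline would also require the flanking splice sites to agree between the two transcripts; that check is omitted here for brevity.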
How to cluster ESTs? • UniGene cluster • Consider the ESTs in the same UniGene cluster • Saves time but is not informative • Genome template • Genomic sequence to EST alignment • Informative but time-consuming
The Approaches of EST Clustering • UniGene-like approach • Overlapping ESTs are grouped into a cluster, as in UniGene. • Generate a consensus sequence for each cluster. • Align the consensus sequences to the genome sequence. • Genome template • Cut the human genome sequence into pieces of 20k base pairs. • Screen the ESTs for similarity with BLAST. • Detect exons with sim4. • Direct alignment
UniGene-like approach • Overlapping ESTs are grouped into a cluster, as in UniGene. • Generate a consensus sequence for each cluster. • Align the consensus sequences to the genome sequence. [Diagram labels: consensus sequence, BLAST, genomic seq, candidates of gene location, STS, gene, report exons]
Genome template • Cut the human genome sequence into pieces of 20k base pairs. • Screen the ESTs for similarity with BLAST. • Detect exons with sim4. [Diagram labels: EST DB, genomic template, WU-BLAST, ESTs with similarity, Sim4, exons]
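The first step of the genome template approach, cutting the genome into 20k base pair pieces, can be sketched as below. The slides do not say whether adjacent pieces overlap, so this example assumes non-overlapping windows; `genome_windows` is a name made up for illustration, and the BLAST and sim4 steps are not shown.

```python
def genome_windows(sequence_length, window=20_000):
    """Yield half-open (start, end) intervals covering the whole sequence.

    Assumption: windows do not overlap; the last one may be shorter.
    """
    for start in range(0, sequence_length, window):
        yield start, min(start + window, sequence_length)

# A 45 kb toy sequence is split into two full windows and one remainder.
print(list(genome_windows(45_000)))
# [(0, 20000), (20000, 40000), (40000, 45000)]
```

In practice overlapping windows may be preferable so that an exon straddling a boundary is not split, but that is a design choice the slides leave open.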
Using UniGene Clusters Is Not Informative • Many ESTs in different UniGene clusters align to the same genome area. • UniGene cluster IDs 101131, 100437, 100738, 101182 and 100143 should be grouped together to detect alternative splicing
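One way to regroup clusters that land on the same genome area is to sort their aligned regions and merge those that overlap. The following is a minimal sketch under that assumption; the cluster labels and coordinates are hypothetical and are not taken from the slide.

```python
def group_by_locus(alignments):
    """Group cluster IDs whose aligned genomic regions overlap.

    alignments: iterable of (cluster_id, chrom, start, end), half-open.
    Returns a list of sets of cluster IDs, one set per merged locus.
    """
    groups = []
    current_ids, current_chrom, current_end = set(), None, None
    for cid, chrom, start, end in sorted(alignments, key=lambda a: (a[1], a[2])):
        if chrom == current_chrom and start < current_end:
            # Overlaps the locus being built: extend it.
            current_ids.add(cid)
            current_end = max(current_end, end)
        else:
            if current_ids:
                groups.append(current_ids)
            current_ids, current_chrom, current_end = {cid}, chrom, end
    if current_ids:
        groups.append(current_ids)
    return groups

# Hypothetical alignments: c1, c2 and c3 chain together on chr1.
example = [
    ("c1", "chr1", 100, 500),
    ("c2", "chr1", 400, 900),
    ("c3", "chr1", 850, 1200),
    ("c4", "chr1", 5000, 5200),
    ("c5", "chr2", 100, 300),
]
print(group_by_locus(example))
```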
Avatar: a value-added transcriptome database • Align the entire dbEST to the genome using PCs
Number of alternative splicing events by organism
Organism                   5' AS    3' AS    Exon skipping   Mutually exclusive   Intron retention
Homo sapiens               14,989   22,969   11,188          330                  7,481
Mus musculus               7,479    13,075   4,850           127                  3,493
Rattus norvegicus          531      900      401             4                    373
Caenorhabditis elegans     162      28       263             5                    174
Drosophila melanogaster    351      117      221             6                    221
Arabidopsis thaliana       83       4        77              1                    32
Applications • Cross-species analysis • Tissue specific analysis • SNP and alternative splicing • Quantity analysis • Splicing enhancer • Gene prediction through dbEST • SNP finding through dbEST
Tissue distributions of 51 tumor-specific alternative splicing sites
Exon skipping • Conserved alternative splicing events (CES events) • Non-conserved alternative splicing events (NCES events) • Condition shown: NCES.F1 > K and NCES.F2 == 0 [Figure: exon-skipping diagrams labelled F1 and F2]
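The condition on this slide can be read as a filter over non-conserved events. The sketch below assumes that F1 and F2 are EST support counts for the two splice forms and that K is a user-chosen threshold; the slide does not define either, and the `Event` record and function name are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Event:
    event_id: str
    conserved: bool   # True for CES events, False for NCES events
    f1: int           # assumed: EST count supporting the first form
    f2: int           # assumed: EST count supporting the second form

def select_nces(events, k):
    """Keep non-conserved events with F1 > K and F2 == 0."""
    return [e for e in events if not e.conserved and e.f1 > k and e.f2 == 0]

# Hypothetical events; only "e2" satisfies the condition for K = 10.
events = [
    Event("e1", conserved=True, f1=50, f2=0),
    Event("e2", conserved=False, f1=48, f2=0),
    Event("e3", conserved=False, f1=48, f2=3),
    Event("e4", conserved=False, f1=5, f2=0),
]
print([e.event_id for e in select_nces(events, k=10)])  # ['e2']
```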
Discovering the different constitutive splicing events [Figure: human SNX3 (EST support: 41; ME12713588-1, ME12751459-1, MR12705131-1) compared with mouse Snx3 (EST support: 90; ME2238811-1, ME2231614-2); the figure also shows the values 91 and 94]
[Figure panels: EST frequency >= 1; EST frequency >= 10]
[Figure: human PSMD13 (ME184161-1, MR178998-1, ME184041-1) and mouse Psmd13 (ME582152-1, ME582275-1, ME579264-1), with F1 and F2 columns and C/T and T/C variants marked; values shown include 48, 0, 2, 2 and 86]
184167,C,T,D,2,2,48,0,0.00452488687782805
184171,T,C,D,2,2,48,0,0.00452488687782805
Human exon: GGTGAACCCTTTGTCCCTCGTGGAAATCATTCTTCATGTAGTTAGACAGATGACTG
Mouse exon: GGTAAACCCTCTGTCCCTGGTAGAAATAATTCTCCATGTGGTTAGACAGATGACCG
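The human and mouse exon sequences on this slide have the same length, so their differences can be listed with a plain position-wise comparison. This is only an illustrative sketch, not a sequence alignment and not the procedure used to produce the records above; `mismatches` is a name chosen for this example, and positions are 1-based within the excerpt.

```python
human = "GGTGAACCCTTTGTCCCTCGTGGAAATCATTCTTCATGTAGTTAGACAGATGACTG"
mouse = "GGTAAACCCTCTGTCCCTGGTAGAAATAATTCTCCATGTGGTTAGACAGATGACCG"

def mismatches(a, b):
    """Return (position, base_in_a, base_in_b) for every differing column."""
    if len(a) != len(b):
        raise ValueError("a position-wise comparison needs equal-length sequences")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

diffs = mismatches(human, mouse)
for position, human_base, mouse_base in diffs:
    print(position, human_base, mouse_base)
```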
Finding SNP from dbEST [Figure: gene model drawn 5' to 3' with Exon 1 to Exon 4 and Intron 1 to Intron 3; EST 1 to EST 7 aligned beneath it, some ending in poly-A tails (AAA...)]
EST to genome alignment with profile [Figure: the same gene model with EST 3 to EST 7 aligned beneath it]
Finding gene from dbEST [Figure: the same gene model with EST 1 to EST 7 aligned beneath it]
Transcriptome Genomics • Where • What • Why • How