Outline for today Lec 04 • Gene Prediction: What are genes? Where are genes? Why do we care about a definition? • Prokaryotic vs. eukaryotic gene models • Introns/exons • Splicing • Alternative splicing • Genes-in-genes • Genes-ad-genes • Multi-subunit proteins • Gene identification • Homology-based gene prediction • Similarity Searches (e.g. BLAST, BLAT) • Genome Browsers • RNA evidence (ESTs) • Ab initio gene prediction • Gene prediction programs: prokaryotes, eukaryotes • Promoter prediction • PolyA-signal prediction • Splice site, start/stop-codon predictions Slide 105
How do we get from here … to here? Lec 04 Slide 106
Again, what is a gene? Lec 04 • “At the end of the day everybody uses the gene definition that is most appropriate for his work.” http://scienceblogs.com/digitalbio/2007/01/what_is_a_gene_my_definition_i.php • Modern Genetic Analysis (by Griffiths, Lewontin, Miller, and Gelbart): A gene is an operational region of the chromosomal DNA, part of which can be transcribed into a functional RNA at the correct time and place during development. Thus, the gene is composed of the transcribed region and adjacent regulatory regions. • Further definition: A gene is a segment of DNA that is involved in producing a polypeptide chain; it can include regions preceding and following the coding DNA as well as introns between the exons. • A gene is a heritable string of nucleotides that can be transcribed, creating a molecule with biological activity. Slide 107
Why do we care about a definition? Lec 04 • The value of gene/genome sequences lies in their annotation. • Annotation: characterizing genomic features using computational and experimental methods. Four levels of gene annotation. • Gene Prediction – Where are genes? • What do they look like? • Domains – What do the proteins do? • Role – What pathway(s) are they involved in? • Once again, what is a gene? A heritable sequence of nucleotides that can be transcribed, creating a molecule (product) with biological activity. • Proteins • Functional RNA molecules • mRNA (messenger RNA) • rRNA (ribosomal RNA) • tRNA (transfer RNA) • RNAi (interfering RNA) • miRNA (microRNA) • snRNA (small nuclear RNA) • snoRNA (small nucleolar RNA) • scaRNA (small Cajal body-specific RNA) • siRNA (small interfering RNA) • srpRNA (signal recognition particle RNA) • piRNA (Piwi-interacting RNA) • gRNA (guide RNA) HOMEWORK Please explain their role! Slide 108
Lec 04 Prokaryotic gene model: ORF-genes • “Small” genomes, high gene density • Operons • One transcript, many genes • No introns • One gene, one protein • Open reading frames • One ORF per gene • ORFs begin with start, end with stop codon Slide 109
Lec 04 Finding genes in prokaryotes • Many more ORFs than genes • In E. coli one finds 6500 ORFs while there are 4290 genes. • In random DNA, a stop codon occurs once every 64/3 ≈ 21 codons on average. • The average protein is ~300 codons long (so search for long ORFs). • Problems: • Short genes • Overlapping long ORFs on opposite strands http://www.clcbio.com/index.php?id=92 http://www.ncbi.nlm.nih.gov/projects/gorf/ Slide 110
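The long-ORF search described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method used by the tools linked above: the function names, the `min_codons` threshold, and the choice to require an ATG start are all assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"

# 3 of the 64 codons are stops, so random DNA has one stop every ~21 codons.
EXPECTED_CODONS_PER_STOP = 64 / 3


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=100):
    """Scan all six reading frames for ATG...stop ORFs.

    Returns tuples (strand, frame, start, end, n_codons). Coordinates are
    0-based, end-exclusive, and refer to the strand that was scanned;
    n_codons excludes the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i  # first in-frame ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, frame, start, i + 3, n_codons))
                    start = None  # look for the next ORF in this frame
    return orfs
```

Raising `min_codons` filters out the many short ORFs that arise by chance, at the cost of missing genuinely short genes, which is the first problem the slide lists.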
Lec 04 Human and Yeast codon usage Slide 111
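Lec 04 Using Codon Frequencies/Usage • Assume each codon is independent • For codon abc calculate frequency f(abc) in coding region • Given coding sequence $a_1 b_1 c_1, \ldots, a_{n+1} b_{n+1} c_{n+1}$ • Calculate $p_1 = f(a_1 b_1 c_1)\, f(a_2 b_2 c_2) \cdots f(a_n b_n c_n)$, $p_2 = f(b_1 c_1 a_2)\, f(b_2 c_2 a_3) \cdots f(b_n c_n a_{n+1})$, $p_3 = f(c_1 a_2 b_2)\, f(c_2 a_3 b_3) \cdots f(c_n a_{n+1} b_{n+1})$ • The probability that the ith reading frame is the coding region: $P_i = p_i / (p_1 + p_2 + p_3)$ Slide 112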
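A small Python sketch of the codon-usage calculation follows. It assumes the usual formulation, in which p_i is the product of the codon frequencies f(abc) read in frame i and the frame probability is P_i = p_i / (p_1 + p_2 + p_3). The function names and the `pseudo` floor for codons never seen in the training set are illustrative choices, not part of the lecture.

```python
import math
from collections import Counter


def codon_frequencies(coding_seqs):
    """Estimate f(abc) from in-frame coding sequences (frame 0 of each)."""
    counts = Counter()
    for s in coding_seqs:
        for i in range(0, len(s) - 2, 3):
            counts[s[i:i + 3]] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def frame_probabilities(seq, freq, pseudo=1e-6):
    """Return [P_1, P_2, P_3] for the three forward reading frames of seq.

    Each frame uses the same number of codons n, and codons are treated
    as independent. Log-probabilities avoid underflow on long sequences.
    """
    n = (len(seq) - 2) // 3
    log_p = []
    for frame in range(3):
        lp = 0.0
        for k in range(n):
            codon = seq[frame + 3 * k: frame + 3 * k + 3]
            lp += math.log(freq.get(codon, pseudo))
        log_p.append(lp)
    # Normalise p_i / (p_1 + p_2 + p_3) in log space.
    m = max(log_p)
    weights = [math.exp(lp - m) for lp in log_p]
    z = sum(weights)
    return [w / z for w in weights]
```

In practice `freq` would be estimated from a set of known genes of the organism, since codon usage differs between species (compare the human and yeast tables on the previous slide).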
Lec 04 G + C content • G + C content (“isochore”) has a strong effect on gene density, gene length, etc. • Gene density in G + C-rich regions is 5 times higher than in regions of moderate G + C content and 10 times higher than in A + T-rich regions. Slide 113
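To relate G + C content to position along a sequence, one can compute it in sliding windows. The sketch below is only an illustration; the window and step sizes are arbitrary defaults, not values from the lecture.

```python
def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window=1000, step=500):
    """Return (start, gc_fraction) for successive windows along seq.

    A sequence shorter than `window` yields a single window covering it.
    """
    last_start = max(len(seq) - window + 1, 1)
    return [(i, gc_content(seq[i:i + window]))
            for i in range(0, last_start, step)]
```

Plotting the second element of each tuple against the first gives a G + C profile that can be compared with annotated gene density.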
A gene includes the nucleotides encoding information Lec 03 • Coding sequence (CDS) • Sequences for controlling synthesis • Enhancer sites • Repressor sites • Promoters • Polyadenylation site • Splice sites • Transcriptional termination site • Ribosome binding site Slide 114
Lec 04 Signals delimit gene features • Coding segments (CDS’s) of genes are delimited by four types of signals: start codons (ATG in eukaryotes), stop codons (usually TAG, TGA, or TAA), donor sites (usually GT), and acceptor sites (AG): Slide 115
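The four signal types can be located with a plain string scan, as in the hedged sketch below. The `SIGNALS` table and `find_signals` are names invented for this example, and only the forward strand is scanned. Because the motifs are so short, most hits are false positives, which is why real gene finders weigh the surrounding sequence as well.

```python
# Signal motifs as listed on the slide (DNA alphabet, forward strand only).
SIGNALS = {
    "start": ["ATG"],
    "stop": ["TAG", "TGA", "TAA"],
    "donor": ["GT"],
    "acceptor": ["AG"],
}


def find_signals(seq):
    """Return {signal_name: sorted list of 0-based motif start positions}."""
    hits = {name: [] for name in SIGNALS}
    for name, motifs in SIGNALS.items():
        for motif in motifs:
            pos = seq.find(motif)
            while pos != -1:
                hits[name].append(pos)
                pos = seq.find(motif, pos + 1)  # allow overlapping hits
        hits[name].sort()
    return hits
```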
Lec 04 Conserved sequences define introns (figure labels: pre-mRNA, donor site, acceptor site) • The ends of nuclear introns are defined by the GT-AG rule. • 5’ splice site (donor site): the exon-intron boundary at the 5’ end of the intron. • 3’ splice site (acceptor site): the exon-intron boundary at the 3’ end of the intron. • Branch point site: an “A” close to the 3’ end of the intron, which is followed by a polypyrimidine tract (Py tract). Slide 116
Lec 04 Frequency of bases in each position of the splice sites

Donor sequences: 5’ splice site (exon: positions 1-4, intron: positions 5-12)

Position     1    2    3    4    5    6    7    8    9   10   11   12
%A          30   40   64    9    0    0   62   68    9   17   39   24
%U          20    7   13   12    0  100    6   12    5   63   22   26
%C          30   43   12    6    0    0    2    9    2   12   21   29
%G          19    9   12   73  100    0   29   12   84    9   18   20
Consensus              A    G    G    U    A    A    G    U

Acceptor sequences: 3’ splice site (intron: positions 1-15, exon: positions 16-17)

Position     1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17
%A          15   10   10   15    6   15   11   19   12    3   10   25    4  100    0   22   17
%U          51   44   50   53   60   49   49   45   45   57   58   29   31    0    0    8   37
%C          19   25   31   21   24   30   33   28   36   36   28   22   65    0    0   18   22
%G          15   21   10   10   10    6    7    9    7    7    5   24    1    0  100   52   25
Consensus    Y    Y    Y    Y    Y    Y    Y    Y    Y    Y    Y    N    Y    A    G    G

Positions 1-13 of the acceptor form the polypyrimidine tract (Y = U or C; N = any nucleotide). Slide 117
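A position-specific frequency table like the one on this slide can be turned into a scoring matrix for candidate splice sites. The sketch below uses the donor-site percentages from the slide; the pseudocount, the uniform 0.25 background, and the function names are assumptions made for illustration, not part of the lecture.

```python
import math

# Donor-site base percentages from the slide, positions 1-12
# (the invariant GU occupies positions 5-6).
DONOR_PERCENT = {
    "A": [30, 40, 64, 9, 0, 0, 62, 68, 9, 17, 39, 24],
    "U": [20, 7, 13, 12, 0, 100, 6, 12, 5, 63, 22, 26],
    "C": [30, 43, 12, 6, 0, 0, 2, 9, 2, 12, 21, 29],
    "G": [19, 9, 12, 73, 100, 0, 29, 12, 84, 9, 18, 20],
}


def log_odds_matrix(percent, pseudocount=1.0, background=0.25):
    """Convert per-position percentages into log2 odds versus background.

    The pseudocount keeps 0% entries from producing log(0); each column
    is renormalised because the rounded percentages need not sum to 100.
    """
    length = len(percent["A"])
    matrix = []
    for pos in range(length):
        column = {base: percent[base][pos] + pseudocount for base in "AUCG"}
        total = sum(column.values())
        matrix.append({base: math.log2(column[base] / total / background)
                       for base in column})
    return matrix


def score_site(site, matrix):
    """Sum the log-odds of each base in a candidate site (DNA or RNA)."""
    site = site.upper().replace("T", "U")
    if len(site) != len(matrix):
        raise ValueError("site length must match the matrix length")
    return sum(matrix[i][base] for i, base in enumerate(site))


DONOR_MATRIX = log_odds_matrix(DONOR_PERCENT)
```

A 12-base window that matches the consensus around GU scores well above zero, while a window lacking GU at positions 5-6 is strongly penalised. The same functions work for the acceptor table if its 17 columns are supplied instead.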
Lec 04 Classes of introns Slide 118
Lec 04 Self-splicing introns reveal that RNA can catalyze RNA splicing • Self-splicing introns: the intron itself folds into a specific conformation within the precursor RNA and catalyzes the chemistry of its own release and the exon ligation. • Practical definition: introns that can remove themselves from pre-RNAs in the test tube in the absence of any proteins or other RNAs. • There are two classes of self-splicing introns: group I and group II. Slide 119
Lec 04 Group I introns • Group I introns are distributed in bacteria, lower eukaryotes and higher plants. • However, their occurrence in bacteria seems to be more irregular than in lower eukaryotes, and they have become prevalent in higher plants. • The genes that group I introns interrupt differ significantly: they interrupt mRNA, rRNA, and tRNA genes in bacterial genomes, as well as in mitochondrial and chloroplast genomes of lower eukaryotes, but only invade rRNA genes in the nuclear genome of lower eukaryotes. • In higher plants, these introns seem to be restricted to a few tRNA and mRNA genes of the chloroplasts and mitochondria. • A small number of group I introns are also found to encode a class of proteins called maturases that facilitate the intron splicing. Slide 120
Lec 04 Group I intron splicing • The process is initiated when the 3’-OH group of a free guanosine nucleoside or nucleotide attacks the phosphodiester bond at the 5’ exon-intron junction, leaving a free 3’-OH group on the upstream exon. • The 3’-OH of the 5’ exon then attacks the phosphoryl group at the 3’ splice site. As a consequence, the 5’ and 3’ exons are joined and the intron is liberated. Slide 121
Lec 04 Group II introns The splicing reaction proceeds in two steps • Group II self-splicing introns are found in mitochondrial and chloroplast pre-mRNAs. • Step 1. Cleavage at the 5’ splice site and joining of the 5’ end of the intron to the branch point “A” within the intron, producing a lariat-like intermediate. • Step 2. Cleavage at the 3’ splice site and simultaneous ligation of the exons, resulting in excision of the intron as a lariat-like structure. • Two transesterification reactions: (1) The 5’ P of the intron is attacked by the 2’-OH of the branch-site adenosine, causing cleavage of a 3’,5’-phosphodiester bond and formation of a 2’,5’-phosphodiester bond (not hydrolysis followed by ligation). (2) The newly formed 3’-OH of exon 1 attacks the 5’ P of exon 2, causing cleavage of a phosphodiester bond and formation of a new bond. Slide 122
Lec 04 Model of spliceosome-mediated splicing Type III introns • Five snRNPs (small nuclear ribonucleoprotein particles), U1, U2, U4, U5 and U6, each containing one small nuclear RNA (snRNA; 107 to 210 nucleotides long) and its associated proteins (6-10 per snRNP), assemble on the pre-mRNA to form the spliceosome. • There are a total of ~100 proteins in the spliceosome, some of which are not associated with snRNPs. These non-snRNP proteins may contribute to the specificity with which snRNPs recognize the splice sites, and some of them have RNA helicase activity that helps rearrange the base pairing in snRNAs during the splicing cycle. • U4 masks the catalytic activity of U6 in the U4/U6/U5 tri-snRNP prior to the actual transesterification reactions. • Massive rearrangements of base-pairing interactions among the various snRNAs convert the spliceosome into a catalytically active form, which releases the U1 and then the U4 snRNPs and brings U2 and U6 together. • RNA molecules play key roles in directing the alignment of splice sites (e.g. U1 and U2 base pairing with the pre-mRNA) and in carrying out the catalysis (a U2/U6 catalytic center). http://www.bioinfocreator.com/animation/group_II_introns.swf Slide 123
Lec 04 E complex A complex Assembly, rearrangement, and catalysis within the spliceosome • Assembly step 1 • U1 recognizes the 5’ splice site. One subunit of U2AF (a splicing factor) binds to the Py tract and the other to the 3’ splice site. The former subunit interacts with BBP (branch-point binding protein) and helps it bind to the branch point. The early (E) complex is formed. • Assembly step 2 • U2 binds to the branch site, and the “A” complex is formed. The base pairing between U2 and the branch site is such that the branch-site “A” is extruded. This “A” residue is available to react with the 5’ splice site. Slide 124
Lec 04 A complex B complex C complex Assembly, rearrangement, and catalysis within the spliceosome • Assembly step 3 • U4, U5 and U6 form the tri-snRNP particle. With the entry of the tri-snRNP, the “A” complex is converted into the “B” complex. • Assembly step 4 • U1 leaves the complex, and U6 replaces it at the 5’ splice site. U4 is released from the complex, allowing U6 to interact with U2. This arrangement is called the “C” complex. Slide 125
Lec 04 Assembly, rearrangement, and catalysis within the spliceosome • Catalysis Step 1 • Formation of the “C” complex produces the active site, with the U2 and U6 RNAs being brought together. • Formation of the active site juxtaposes the 5’ splice site of the pre-mRNA and the branch site, allowing the branch-site “A” residue to attack the 5’ splice site and accomplish the first transesterification reaction. • Catalysis Step 2 • The U5 snRNP helps to bring the two exons together and aids the second transesterification reaction, in which the 3’-OH of the 5’ exon attacks the 3’ splice site. • Final step • Release of the mRNA product and the snRNPs http://www.bioinfocreator.com/animation/group_II_introns.swf Slide 126
Lec 04 Alternative splicing • Many genes in higher eukaryotes encode RNAs that can be spliced in alternative ways to generate two or more different mRNAs and thus different protein products. • Antibodies (immunoglobulins) are produced in two forms: a soluble secreted form and a membrane-bound form. Both forms of the antibody are produced from the same pre-mRNA by alternative splicing. • The transmembrane anchor region (M exon) is either excluded (secreted antibody) or included (membrane-bound antibody). Slide 127
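The inclusion or exclusion of an optional exon, as with the M exon above, can be modelled by enumerating which exons are joined. The sketch below is a toy illustration only: the exon names and sequences are made up, and `enumerate_isoforms` is a hypothetical helper, not a tool from the lecture.

```python
from itertools import combinations


def enumerate_isoforms(exons, optional):
    """List every mRNA obtainable by skipping any subset of optional exons.

    exons: ordered list of (name, sequence) pairs along the pre-mRNA.
    optional: set of exon names that may be skipped (cassette exons).
    Returns a list of (tuple_of_exon_names, joined_sequence).
    """
    optional_names = [name for name, _ in exons if name in optional]
    isoforms = []
    for k in range(len(optional_names) + 1):
        for skipped in combinations(optional_names, k):
            kept = [(name, seq) for name, seq in exons if name not in skipped]
            names = tuple(name for name, _ in kept)
            isoforms.append((names, "".join(seq for _, seq in kept)))
    return isoforms


# Toy pre-mRNA: two constitutive exons flanking one optional "M" exon.
toy_exons = [("C1", "AUGGCU"), ("M", "GGGCCC"), ("C2", "UAA")]
toy_isoforms = enumerate_isoforms(toy_exons, optional={"M"})
```

With one optional exon there are two isoforms, one including M (the membrane-bound analogue) and one skipping it (the secreted analogue); with k optional exons the count grows as 2^k.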
Lec 04 Alternative splicing • In Drosophila, three genes are involved in sex determination (Sxl, tra, dsx). Each of these genes produces a pre-mRNA that has two possible splicing patterns, depending on whether the fly is male or female. • In male flies, splicing results in inactive Sxl and tra gene products. • The dsx gene product is functional and inactivates the female-specific genes. • In females, the Sxl and tra gene products are functional and interact with the dsx gene to alter its splicing pattern such that inactivation of female-specific genes does not occur. ♀ ♂ Slide 128
Lec 04 There are five different ways to alternatively splice a pre-mRNA Slide 129
Lec 04 Alternative splicing • Alternative splicing can be either constitutive or regulated • Constitutive alternative splicing: more than one product is always made from a pre-mRNA. • Regulated alternative splicing: different forms of mRNA are produced at different times, under different conditions, or in different cell or tissue types. • Alternative splicing is regulated by activators and repressors. • The regulating sequences are exonic or intronic: splicing enhancers (ESE or ISE) or silencers (ESS or ISS). The former enhance and the latter repress splicing. • Proteins that regulate splicing bind to these specific sites for their action. Mo Chen & James L. Manley (2009): Nature Reviews Molecular Cell Biology 10, 741-754. Slide 130
Nested intronic genes Lec 04 Slide 131