Introduction to Bioinformatics II • Lecture 6 • By Ms. Shumaila Azam
Gene: A sequence of nucleotides coding for a protein • Gene Prediction Problem: Determine the beginning and end positions of genes in a genome.
Gene Prediction: Computational Challenge aatgcatgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatcctgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcgg Gene!
Central Dogma: DNA -> RNA -> Protein • DNA --(transcription)--> RNA --(translation)--> Protein • DNA: CCTGAGCCAACTATTGATGAA • RNA: CCUGAGCCAACUAUUGAUGAA • Protein: PEPTIDE
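The example on this slide can be reproduced with a short sketch. It assumes the standard genetic code, reads the sequence from its first base, and stops at the first stop codon; the function names `transcribe` and `translate` are illustrative and do not come from the lecture.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(dna):
    """Return the RNA with the same sequence as the coding strand (T -> U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate RNA codon by codon, stopping at the first stop codon."""
    dna = rna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


rna = transcribe("CCTGAGCCAACTATTGATGAA")
print(rna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(rna))  # PEPTIDE
```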
Gene Prediction • Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced. • In computational biology, gene prediction (or gene finding) refers to the process of identifying the regions of genomic DNA that encode genes. • Targets include: protein-coding genes • RNA genes • other functional elements such as regulatory regions
Gene Prediction • Historically, statistical analysis of the rates of homologous recombination of several different genes could determine their order on a given chromosome, and information from many such experiments could be combined to create a genetic map specifying the rough location of known genes relative to each other. • Determining that a sequence is functional should be distinguished from determining the function of the gene or its product. • Determining function has traditionally required in vivo experimentation, for example through gene knockout. • Advances in bioinformatics research are making it increasingly possible to predict the function of a gene from its sequence alone.
Extrinsic approaches • In extrinsic (or evidence-based) gene finding systems, the target genome is searched for sequences that are similar to extrinsic evidence in the form of the known sequence of a messenger RNA (mRNA) or protein product. • Given an mRNA sequence, it is straightforward to derive the unique DNA sequence from which it must have been transcribed. • Given a protein sequence, a family of possible coding DNA sequences can be derived by reverse translation of the genetic code, because most amino acids are encoded by more than one codon.
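To see why a protein yields a family of candidate DNA sequences rather than a single one, the following sketch counts and enumerates the reverse translations of a short peptide. It assumes the standard genetic code; `count_coding_sequences` and `reverse_translate` are hypothetical helper names chosen for illustration.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Invert the table: amino acid -> list of codons that encode it.
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def count_coding_sequences(protein):
    """Number of distinct DNA sequences that encode the given protein."""
    return prod(len(CODONS_FOR[aa]) for aa in protein.upper())


def reverse_translate(protein):
    """Yield every DNA sequence that encodes the given protein."""
    choices = [CODONS_FOR[aa] for aa in protein.upper()]
    for combo in product(*choices):
        yield "".join(combo)


print(count_coding_sequences("PEPTIDE"))  # 1536
print(list(reverse_translate("MK")))      # ['ATGAAA', 'ATGAAG']
```

The count grows multiplicatively with protein length, so practical tools typically compare at the protein level or use degenerate patterns instead of enumerating every candidate.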
Extrinsic approaches • Once candidate DNA sequences have been determined, it is a relatively straightforward algorithmic problem to efficiently search a target genome for matches, complete or partial, and exact or inexact. • BLAST is a widely used system designed for this purpose.
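The idea of exact and inexact matching can be illustrated with a deliberately naive scan. This is not how BLAST works (BLAST uses seeded heuristics and scored alignments); it is only a minimal sketch that slides the query along the genome and counts mismatches, with `find_matches` and `max_mismatches` being names invented for the example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_matches(genome, query, max_mismatches=0):
    """Return (position, mismatches) for every window within the mismatch budget."""
    genome, query = genome.upper(), query.upper()
    hits = []
    for i in range(len(genome) - len(query) + 1):
        d = hamming(genome[i:i + len(query)], query)
        if d <= max_mismatches:
            hits.append((i, d))
    return hits


genome = "AAACGTAAACCTAAA"
print(find_matches(genome, "ACGT"))                    # [(2, 0)]
print(find_matches(genome, "ACGT", max_mismatches=1))  # [(2, 0), (8, 1)]
```

The scan costs time proportional to genome length times query length and does not handle insertions or deletions, which is one reason indexed, heuristic tools are used on real genomes.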
Ab initio approaches • Ab initio gene prediction is an intrinsic method based on gene content and signal detection. • Because of the inherent expense and difficulty of obtaining extrinsic evidence for many genes, it is also necessary to resort to ab initio gene finding. • The genomic DNA sequence alone is systematically searched for certain tell-tale signs of protein-coding genes. • These signs can be broadly categorized as either signals (specific sequences that indicate the presence of a gene nearby) or content (statistical properties of the protein-coding sequence itself).
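One simple way to picture a content measure is a codon-usage log-odds score: codon frequencies are estimated from sequences assumed to be coding, and a candidate region is scored against a uniform background of 1/64 per codon. The sketch below is an illustration under those assumptions, not the method of any particular gene finder; the training sequence is made up and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(coding_seqs, pseudocount=1.0):
    """Estimate in-frame codon frequencies from example coding sequences."""
    counts = Counter()
    for seq in coding_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            counts[seq[i:i + 3]] += 1
    # Pseudocounts keep unseen codons from getting zero probability.
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def content_score(seq, freqs):
    """Sum of log-odds (coding model vs. uniform 1/64) over in-frame codons."""
    seq = seq.upper()
    score = 0.0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in freqs:  # skip codons with ambiguous bases such as N
            score += math.log(freqs[codon] * 64)
    return score


# Made-up training data, for illustration only.
freqs = codon_frequencies(["ATGGCTGCTGCTAAA"])
print(content_score("GCTGCTGCT", freqs) > 0)  # True: resembles the training codons
print(content_score("CCCCCCCCC", freqs) < 0)  # True: codon never seen in training
```

A positive score means the region uses codons the way the training set does; real systems use much larger training sets and richer statistics.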
Ab initio approaches (prokaryotes) • In the genomes of prokaryotes, genes have specific and relatively well-understood promoter sequences (signals). • The sequence coding for a protein occurs as one contiguous open reading frame (ORF). • Because 3 of the 64 possible codons are stop codons, one would expect a stop codon approximately every 20–25 codons, or 60–75 base pairs, in a random sequence; an ORF much longer than this is unlikely to occur by chance. • These characteristics make prokaryotic gene finding relatively straightforward, and well-designed systems are able to achieve high levels of accuracy.
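The ORF idea above can be sketched as a scan of the three forward reading frames for an ATG followed in-frame by a stop codon. This is a minimal illustration, assuming the standard start and stop codons and ignoring the reverse strand and promoter signals; `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def expected_codons_between_stops():
    """With uniform base frequencies, 3 of 64 codons are stops."""
    return 64 / 3  # about 21.3 codons, i.e. about 64 base pairs


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon; `min_codons` counts
    codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


seq = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(seq):
    print(frame, seq[start:end])  # 2 ATGAAATTTTAG
```

In practice `min_codons` would be set well above the roughly 21-codon chance spacing, so that only ORFs unlikely to arise at random are reported.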
Ab initio approaches (eukaryotes) • Ab initio gene finding in eukaryotes, especially complex organisms like humans, is considerably more challenging. • First: the promoter and other regulatory signals in these genomes are more complex and less well-understood. • Two classic examples of signals identified by eukaryotic gene finders are CpG islands and binding sites for a poly(A) tail. • Second: because of splicing mechanisms, a protein-coding sequence is divided into several parts (exons) separated by non-coding sequences (introns), so a gene no longer appears as one contiguous ORF.
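As an illustration of one such signal, a window can be tested for CpG-island character by computing its GC content and its observed/expected CpG ratio. The thresholds below (at least 200 bp, GC content above 50%, ratio above 0.6) are commonly cited values and are an assumption here, not something stated in the lecture; the function names are hypothetical.

```python
def cpg_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA window."""
    w = window.upper()
    n = len(w)
    if n == 0:
        return 0.0, 0.0
    c, g = w.count("C"), w.count("G")
    gc_fraction = (c + g) / n
    expected_cpg = c * g / n      # expected CG dinucleotides if C and G were independent
    observed_cpg = w.count("CG")  # CG dinucleotides actually present
    ratio = observed_cpg / expected_cpg if expected_cpg > 0 else 0.0
    return gc_fraction, ratio


def looks_like_cpg_island(window, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed length, GC-content, and CpG-ratio thresholds."""
    gc_fraction, ratio = cpg_stats(window)
    return len(window) >= min_len and gc_fraction > min_gc and ratio > min_ratio


print(cpg_stats("CGCGCGCG"))              # (1.0, 2.0)
print(looks_like_cpg_island("CG" * 100))  # True
print(looks_like_cpg_island("AT" * 100))  # False
```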
Combined approaches • These methods combine extrinsic and ab initio approaches by mapping protein and EST (expressed sequence tag) data to the genome to validate ab initio predictions.
Comparative genomics approaches • As the entire genomes of many different species are sequenced, a promising direction in current research on gene finding is a comparative genomics approach. • This is based on the principle that the forces of natural selection cause genes and other functional elements to undergo mutation at a slower rate than the rest of the genome. • Genes can thus be detected by comparing the genomes of related species and looking for conserved regions. • This approach was first applied to the mouse and human genomes.
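The conservation principle can be sketched as a windowed percent-identity profile over two sequences that have already been aligned. This is only an illustration: it assumes a pre-computed pairwise alignment with `-` for gaps, treats gaps as mismatches, and uses the hypothetical name `window_identity`; real comparative gene finders use far more elaborate models.

```python
def window_identity(aln_a, aln_b, window=50, step=None):
    """Fraction of identical, non-gap columns in each window of an alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    step = step or window
    scores = []
    for start in range(0, len(aln_a) - window + 1, step):
        a = aln_a[start:start + window].upper()
        b = aln_b[start:start + window].upper()
        matches = sum(x == y and x != "-" for x, y in zip(a, b))
        scores.append(matches / window)
    return scores


# Toy alignment: the first half is conserved, the second half has diverged.
print(window_identity("ACGTACGT", "ACGTTTTT", window=4))  # [1.0, 0.25]
```

Windows with unusually high identity between related species are candidates for genes or other functional elements.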