
Operon Prediction



Presentation Transcript


  1. Operon Prediction Cao Fan

  2. Operon • A functioning unit of genomic material containing a cluster of genes under the control of a single regulatory signal or promoter • Exists primarily in prokaryotes, also found in eukaryotes

  3. Operon

  4. Approaches – wet lab • Demonstrate co-transcription of the candidate gene cluster via RT-PCR of whole-cell RNA • Reverse transcribe a specific RNA into cDNA using a gene-specific primer • Amplify the cDNA via PCR using primers designed from genes within the gene cluster • Successful PCR amplification indicates that the genes are members of an operon • Source: Maritza Guacucano, Gloria Levican, David S. Holmes, Eugenia Jedlicki. An RT-PCR artifact in the characterization of bacterial operons. http://www.ejbiotechnology.info/content/vol3/issue3/full/5/index.html

  5. Approaches – dry lab Features used: • Intergenic distance (IG) • Conserved gene clusters (CG) • Functional relations (FR) • Experimental evidence (EE) • Sequence-based features (SF) • Phylogenetic profiles (PP)

  6. Intergenic distance • IG(contiguous genes, same operon) < IG(contiguous genes, different operons) • The most widely used parameter for operon prediction • Best single predictor
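
The intergenic-distance rule above can be sketched as a simple threshold classifier. This is a minimal illustration, not a method from the presentation: the `Gene` record, the function names, and the 50 bp cutoff `max_gap` are all assumptions chosen for the example, and real predictors typically learn the cutoff or a distance distribution per genome.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def intergenic_distance(a: Gene, b: Gene) -> int:
    """Gap in bp between two neighbouring genes; negative if they overlap."""
    first, second = sorted((a, b), key=lambda g: g.start)
    return second.start - first.end - 1


def same_operon_by_ig(genes: List[Gene], max_gap: int = 50) -> List[Tuple[str, str, int, bool]]:
    """Call each adjacent pair 'same operon' if both genes share a strand
    and the gap is at most max_gap (a hypothetical cutoff)."""
    ordered = sorted(genes, key=lambda g: g.start)
    calls = []
    for a, b in zip(ordered, ordered[1:]):
        gap = intergenic_distance(a, b)
        calls.append((a.name, b.name, gap, a.strand == b.strand and gap <= max_gap))
    return calls
```

A short gap on the same strand is called "same operon"; a long gap or a strand switch is not.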

  7. Conserved gene clusters • Genes in an operon tend to be preserved across phylogenetically related organisms • Order of genes in an operon may not be conserved • Sequence comparison between non-redundant genomes is usually performed to identify conserved clusters

  8. Functional relations • Genes in the same operon tend to encode functionally related proteins • E.g. members of the same protein complex, or enzymes in a single metabolic pathway

  9. Functional relations Functional classifications: • Riley’s functional annotation • Metabolic pathways • Clusters of orthologous groups of proteins (COG) • Gene ontologies (GO)

  10. Sequence-based features • Overrepresented sequence motifs and other sequence elements, such as promoters and terminators, are used • Gene length ratio is also used. The ratio is shown to be genome specific

  11. Phylogenetic profiles • Indicate a general trend for a set of genes to be simultaneously present or absent in related organisms • PP is shown to be genome specific
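
A phylogenetic profile can be represented as a presence/absence vector over a fixed list of genomes. The sketch below scores two genes by the fraction of genomes in which they are both present or both absent. The function name and the scoring choice (simple matching) are assumptions for illustration; the presentation does not say which similarity measure is used.

```python
from typing import Sequence


def profile_similarity(p: Sequence[int], q: Sequence[int]) -> float:
    """Fraction of genomes where two genes agree (both present or both absent).

    p and q are 0/1 vectors indexed by the same ordered list of genomes.
    """
    if len(p) != len(q) or len(p) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(p, q) if x == y)
    return matches / len(p)
```

Genes with highly similar profiles are candidates for the same operon, subject to the other features.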

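  12. Features • Feature combinations listed on this slide: IG only; CG only; IG, SF, EE; SF (the accompanying table is not reproduced in the transcript) • Source: Rutger W.W. Brouwer, Oscar P. Kuipers and Sacha A.F.T. van Hijum. The relative value of operon predictions. Briefings in Bioinformatics, 2008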

  13. Features

  14. Using both genome-specific and general genomic information • Phuongan Dam, Victor Olman, Kyle Harris, Zhengchang Su and Ying Xu • Features used: • Intergenic distance • Neighborhood conservation • Phylogenetic distance • Short DNA motifs • Similarity score between GO terms • Length ratio

  15. Prediction of operons in microbial genomes • By Maria D. Ermolaeva, Owen White and Steven L. Salzberg • Features: • Conserved gene clusters • Scoring method: • Log-likelihood scores

  16. Prediction of operons in microbial genomes • Gene pair: two adjacent genes separated by ≤200 bp • Conserved gene pair: two adjacent genes (A,B) for which a homologous gene pair (A’,B’) can be found in another genome. • Similarity(A,B) < Similarity(B,B’) and Similarity(A,B) < Similarity(A,A’) • Use BLASTP to find homologs
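
The conserved-pair definition above can be sketched as a lookup over adjacent pairs in two genomes. This is a simplified illustration: the `homolog` mapping is assumed to have been computed beforehand (for example from BLASTP hits), the similarity conditions on the slide are not checked, and all names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def conserved_pairs(pairs_ref: Iterable[Pair],
                    pairs_other: Iterable[Pair],
                    homolog: Dict[str, str]) -> List[Pair]:
    """Return reference pairs (A, B) whose homologs (A', B') are also
    an adjacent pair in the other genome, in either order."""
    other = {frozenset(p) for p in pairs_other}
    conserved = []
    for a, b in pairs_ref:
        ha, hb = homolog.get(a), homolog.get(b)
        if ha is not None and hb is not None and frozenset((ha, hb)) in other:
            conserved.append((a, b))
    return conserved
```

Using `frozenset` makes the check order-independent, which matches the earlier point that gene order within an operon may not be conserved.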

  17. Prediction of operons in microbial genomes • S pair: genes in the pair on the same strand • D pair: genes in the pair on different strands • SO pair: gene pair belong to the same operon • SN pair: gene pair belong to different operons • Directon: a maximal set of adjacent genes located on the same DNA strand
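
The directon and S/D pair definitions above can be illustrated from a strand sequence alone. The sketch assumes genes are already sorted by position and, for brevity, ignores the distance cutoff in the gene-pair definition; the function names are hypothetical.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def directons(strands: Iterable[str]) -> List[Tuple[str, int]]:
    """Maximal runs of adjacent genes on the same strand, as (strand, gene count)."""
    return [(strand, len(list(run))) for strand, run in groupby(strands)]


def pair_types(strands: str) -> List[str]:
    """Label each adjacent pair 'S' (same strand) or 'D' (different strands)."""
    return ["S" if a == b else "D" for a, b in zip(strands, strands[1:])]
```

For the strand string `"++--+"` this yields three directons and the pair labels S, D, S, D: every D pair marks a directon boundary.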

  18. Prediction of operons in microbial genomes • Probability of a conserved S pair being an SO pair: P = 1 – P[SN|(conserved, S)] – Pchance • P[SN|(conserved, S)] = (the expression on this slide was an image and is not reproduced in the transcript)
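
The right-hand side for P[SN|(conserved, S)] did not survive in the transcript. As a hedged aid to the reader, and not a reconstruction of the slide, Bayes' rule gives one way to expand this term that connects it to the quantity P(SN|S) computed on the next slide:

```latex
P(\mathrm{SN} \mid \text{conserved}, \mathrm{S})
  = \frac{P(\text{conserved} \mid \mathrm{SN}, \mathrm{S}) \; P(\mathrm{SN} \mid \mathrm{S})}
         {P(\text{conserved} \mid \mathrm{S})}
```

How the authors estimate each factor from gene-pair counts is an assumption left open here; only the identity itself is certain.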

  19. Prediction of operons in microbial genomes Calculate P(SN|S): • Assumption: orientation of operons is random • N(operons) = 2N(directons) • N(SN pairs) = N(operons) – N(adjacent, non-pairs) – N(D pairs) = 2N(directons) – (N(genes) – N(pairs)) – N(D pairs) = 2N(directons) + N(S pairs) – N(genes) • P(SN|S) = N(SN pairs) / N(S pairs)
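
The counting argument above reduces to one line of arithmetic. The sketch below implements the final expression, N(SN pairs) = 2N(directons) + N(S pairs) – N(genes); the function name and the example counts are hypothetical.

```python
def p_sn_given_s(n_directons: int, n_s_pairs: int, n_genes: int) -> float:
    """P(SN|S) under the random-orientation assumption N(operons) = 2 N(directons)."""
    if n_s_pairs <= 0:
        raise ValueError("need at least one S pair")
    n_sn_pairs = 2 * n_directons + n_s_pairs - n_genes
    return n_sn_pairs / n_s_pairs


# Hypothetical genome: 3000 genes, 500 directons, 2500 S pairs.
example = p_sn_given_s(n_directons=500, n_s_pairs=2500, n_genes=3000)
```

With these made-up counts, 500 of the 2500 S pairs are estimated to span an operon boundary.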

  20. Prediction of operons in microbial genomes Calculating Pchance: Pchance = (0.1G / N(conserved S))^h, where G is the number of genomes searched and h is the number of genomes in which homologs of a given gene are found
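
A minimal numeric sketch of the Pchance expression follows. It assumes that the trailing `h` in the transcript is an exponent, which is a reading of the flattened slide text rather than something the transcript confirms; the function and parameter names are hypothetical.

```python
def p_chance(n_genomes_searched: int, n_conserved_s: int, h: int) -> float:
    """Pchance = (0.1 * G / N(conserved S)) ** h, assuming h is an exponent."""
    if n_conserved_s <= 0:
        raise ValueError("N(conserved S) must be positive")
    return (0.1 * n_genomes_searched / n_conserved_s) ** h
```

Under this reading, Pchance shrinks rapidly as homologs are found in more genomes.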

  21. Prediction of operons in microbial genomes Result: • 7699 gene pairs in 34 bacterial genomes predicted to belong to the same operon with probability ≥ 0.98 • Sensitivity: 30%–50%

  22. OperonDB • Gene pair: co-linear, may be separated by other genes with the same orientation • Modified probability estimate that integrates intergenic distances: P = 1 – P(SN|(conserved, S)) × [distance factor, shown as an image and not reproduced in the transcript] – Pchance, where P(l|D) and P(l|S) are the probabilities that a given D or S pair, respectively, has intergenic distance l.

  23. OperonDB Result: • Sensitivity > 60% • Maximum accuracy: 80%

  24. Relation to UROP
