Operon Prediction Cao Fan
Operon • A functioning unit of genomic material containing a cluster of genes under the control of a single regulatory signal or promoter • Exists primarily in prokaryotes, also found in eukaryotes
Approaches – wet lab • Demonstrate co-transcription of the candidate gene cluster via RT-PCR of whole-cell RNA • Reverse transcribe a specific RNA into cDNA using a gene-specific primer • Amplify the cDNA via PCR using primers designed from genes within the gene cluster • Successful PCR amplification indicates that the genes are members of the same operon Maritza Guacucano, Gloria Levican, David S. Holmes, Eugenia Jedlicki. An RT-PCR artifact in the characterization of bacterial operons. http://www.ejbiotechnology.info/content/vol3/issue3/full/5/index.html
Approaches – dry lab Features used: • Intergenic distance (IG) • Conserved gene clusters (CG) • Functional relations (FR) • Experimental evidence (EE) • Sequence-based features (SF) • Phylogenetic profiles (PP)
Intergenic distance • IG(contiguous genes, same operon) < IG(contiguous genes, different operons) • The most widely used parameter for operon prediction • Best single predictor
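A minimal sketch of how the intergenic distance between adjacent same-strand genes could be computed and thresholded. The `Gene` record, the coordinates, and the 50 bp cutoff are illustrative assumptions and do not come from the slides; published predictors fit the distance distribution per genome.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def intergenic_distances(genes):
    """Yield (upstream, downstream, distance) for adjacent same-strand genes.

    The distance is the number of bases strictly between the two genes,
    so it is negative when the genes overlap.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    for a, b in zip(ordered, ordered[1:]):
        if a.strand == b.strand:
            yield a.name, b.name, b.start - a.end - 1


# Hypothetical coordinates, chosen only for illustration.
genes = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 410, 900, "+"),
    Gene("geneC", 1500, 2000, "+"),
    Gene("geneD", 2100, 2600, "-"),
]

CUTOFF_BP = 50  # assumed threshold, not a value from the slides
pairs = list(intergenic_distances(genes))
same_operon_calls = [(a, b) for a, b, d in pairs if d < CUTOFF_BP]
```

Here only geneA and geneB fall under the assumed cutoff; geneC and geneD are skipped because they lie on different strands.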
Conserved gene clusters • Genes in an operon tend to be preserved across phylogenetically related organisms • Order of genes in an operon may not be conserved • Sequence comparison between non-redundant genomes is usually performed to identify conserved clusters
Functional relations • Genes in the same operon tend to encode functionally related proteins • E.g. members of the same protein complex, enzymes part of a single metabolic pathway
Functional relations Functional classifications: • Riley’s functional annotation • Metabolic pathways • Clusters of orthologous groups of proteins (COG) • Gene ontologies (GO)
Sequence-based features • Overrepresented sequence motifs and other sequence elements, such as promoters and terminators, are used • The gene length ratio is also used; this ratio has been shown to be genome specific
Phylogenetic profiles • Indicate a general trend for a set of genes to be simultaneously present or absent in related organisms • PP is shown to be genome specific
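A phylogenetic profile can be represented as a presence/absence vector over a fixed list of genomes. The sketch below scores two genes by the fraction of genomes in which they agree; this simple matching score and the made-up profiles are assumptions for illustration, not the measure used by any particular predictor.

```python
def profile_similarity(profile_a, profile_b):
    """Fraction of genomes in which two genes are both present or both absent."""
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and cover the same genomes")
    matches = sum(a == b for a, b in zip(profile_a, profile_b))
    return matches / len(profile_a)


# 1 = a homolog is present in that genome, 0 = absent (made-up data).
profiles = {
    "geneA": [1, 1, 0, 1, 0, 0],
    "geneB": [1, 1, 0, 1, 0, 1],
    "geneC": [0, 0, 1, 0, 1, 1],
}

sim_ab = profile_similarity(profiles["geneA"], profiles["geneB"])
sim_ac = profile_similarity(profiles["geneA"], profiles["geneC"])
```

In this toy data geneA and geneB co-occur in five of six genomes, while geneA and geneC never agree.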
[Table: features used by different operon prediction methods; surviving entries: IG only; CG only; IG, SF, EE; SF] Rutger W.W. Brouwer, Oscar P. Kuipers and Sacha A.F.T. van Hijum. The relative value of operon predictions. Briefings in Bioinformatics 2008
Using both genome-specific and general genomic information • Phuongan Dam, Victor Olman, Kyle Harris, Zhengchang Su and Ying Xu • Features used: • Intergenic distance • Neighborhood conservation • Phylogenetic distance • Short DNA motifs • Similarity score between GO terms • Length ratio
Prediction of operons in microbial genomes • by Maria D. Ermolaeva, Owen White and Steven L. Salzberg • Features: • Conserved gene clusters • Scoring method: • Log-likelihood scores
Prediction of operons in microbial genomes • Gene pair: two adjacent genes separated by ≤200 bp • Conserved gene pair: two adjacent genes (A,B) for which a homologous gene pair (A’,B’) can be found in another genome. • Similarity(A,B) < Similarity(B,B’) and Similarity(A,B) < Similarity(A,A’) • Use BLASTP to find homologs
Prediction of operons in microbial genomes • S pair: genes in the pair on the same strand • D pair: genes in the pair on different strands • SO pair: gene pair belong to the same operon • SN pair: gene pair belong to different operons • Directon: a maximal set of adjacent genes located on the same DNA strand
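The definitions above can be sketched on a genome-ordered list of gene strands. This simplified version treats the chromosome as linear and counts every adjacent pair, ignoring the ≤200 bp condition on gene pairs; both simplifications, and the example strand list, are assumptions made for illustration.

```python
from itertools import groupby


def directons(strands):
    """Split a genome-ordered list of strands into maximal same-strand runs."""
    return [list(run) for _, run in groupby(strands)]


def count_pairs(strands):
    """Count adjacent gene pairs on the same strand (S) and on different strands (D)."""
    n_adjacent = max(len(strands) - 1, 0)
    n_s = sum(a == b for a, b in zip(strands, strands[1:]))
    return n_s, n_adjacent - n_s


# Made-up strand assignments for nine consecutive genes.
strands = ["+", "+", "+", "-", "-", "+", "+", "+", "+"]
runs = directons(strands)
n_s, n_d = count_pairs(strands)
```

The nine genes form three directons, with six S pairs inside them and two D pairs at the directon boundaries.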
Prediction of operons in microbial genomes • Probability of a conserved S pair being an SO pair: P = 1 – P[SN|(conserved, S)] – Pchance • By Bayes' rule applied within S pairs: P[SN|(conserved, S)] = P(conserved|SN) × P(SN|S) / P(conserved|S)
Prediction of operons in microbial genomes Calculate P(SN|S): • Assumption: orientation of operons is random • N(operons) = 2N(directons) • N(SN pairs) = N(operons) – N(adjacent, non-pairs) – N(D pairs) = 2N(directons) – (N(genes) – N(pairs)) – N(D pairs) = 2N(directons) + N(S pairs) – N(genes) • P(SN|S) = N(SN pairs) / N(S pairs)
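The final expression on this slide reduces to simple arithmetic on three genome-wide counts. A small sketch follows; the function name and the example counts are hypothetical and not taken from any real genome.

```python
def p_sn_given_s(n_directons, n_s_pairs, n_genes):
    """P(SN|S) = (2*N(directons) + N(S pairs) - N(genes)) / N(S pairs)."""
    if n_s_pairs <= 0:
        raise ValueError("need at least one S pair")
    # N(SN pairs), using N(operons) = 2 * N(directons) from the slide.
    n_sn_pairs = 2 * n_directons + n_s_pairs - n_genes
    return n_sn_pairs / n_s_pairs


# Hypothetical counts for illustration only.
p = p_sn_given_s(n_directons=500, n_s_pairs=2400, n_genes=3000)
```

With these counts N(SN pairs) = 1000 + 2400 – 3000 = 400, so P(SN|S) = 400 / 2400, roughly 0.17.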
Prediction of operons in microbial genomes Calculating Pchance: Pchance = (0.1G / N(conserved S))^h, where G is the number of genomes searched and h is the number of genomes in which homologs of a given gene pair are found
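A sketch of how Pchance could enter the final score, reading the trailing h in the formula as an exponent (an assumption about the slide's lost typesetting). The function names, parameter names, and the example numbers are hypothetical.

```python
def p_chance(n_genomes_searched, n_conserved_s, n_genomes_with_homologs):
    """Pchance = (0.1 * G / N(conserved S)) ** h, with h read as an exponent."""
    if n_conserved_s <= 0:
        raise ValueError("N(conserved S) must be positive")
    base = 0.1 * n_genomes_searched / n_conserved_s
    return base ** n_genomes_with_homologs


def p_same_operon(p_sn_given_conserved_s, chance):
    """P = 1 - P[SN|(conserved, S)] - Pchance, as on the earlier slide."""
    return 1.0 - p_sn_given_conserved_s - chance


# Illustrative inputs only; 0.015 is an assumed value of P[SN|(conserved, S)].
chance = p_chance(n_genomes_searched=34, n_conserved_s=340, n_genomes_with_homologs=2)
score = p_same_operon(0.015, chance)
```

Under this reading, each additional genome in which the pair is conserved shrinks Pchance geometrically, which is why support from several genomes pushes the score toward 1.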
Prediction of operons in microbial genomes Result: 7699 gene pairs in 34 bacterial genomes with genes belonging to the same operon with probability >= 0.98 Sensitivity: 30% - 50%
OperonDB • Gene pair: co-linear genes, possibly separated by other genes with the same orientation • Modified probability estimate that integrates intergenic distances: P = 1 – P(SN|(conserved, S)) × P(l|D) / P(l|S) – Pchance, where P(l|D) and P(l|S) are the probabilities that a given D or S pair has intergenic distance l.
OperonDB Result: • Sensitivity > 60% • Maximum accuracy: 80%