Genomic Sequence Analysis using Electron-Ion Interaction Potential

Masumi Kobayashi, Performance Evaluation Laboratory, University of Aizu

Purpose • To find gene regions by using the Lindley equation and the Electron-Ion Interaction Potential (EIIP). • To judge the similarity of two DNA sequences with a shorter processing time by using the Lindley equation and EIIP.

DNA • A DNA sequence consists of four nucleotide letters: A (adenine), T (thymine), G (guanine), and C (cytosine). • Base A always pairs with base T, and base C always pairs with base G; the two paired strands form a double helix.
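
The pairing rule can be illustrated with a minimal Python sketch that derives the opposite strand of a sequence. The function name `reverse_complement` is an illustrative choice, not something taken from the slides.

```python
# Watson-Crick pairing: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


print(reverse_complement("ATGCGA"))  # TCGCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check on the pairing table.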

DNA Sequence and Amino Acid Sequence • A DNA sequence is a row of nucleotides of four kinds, and each nucleotide triplet is called a codon. Each codon corresponds to an amino acid.
DNA Sequence        |・・・|ATG|CGA|TAT|AAA|GCT|TTC|・・・|
Amino Acid Sequence |・・・| M | R | Y | K | A | F |・・・|

Codon • 61 of the 64 codons are translated into amino acids. • For example, both TTT and TTC code for phenylalanine (F). • The remaining 3 codons, TAA, TAG, and TGA, are called stop codons.
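
The codon rules can be checked with a short Python sketch that builds the standard genetic code and translates a DNA string. The names `CODON_TABLE` and `translate` are illustrative, and the sketch assumes the standard code with the reading frame starting at the first letter.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon; trailing letters that do not fill a codon are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(stop_codons)                     # ['TAA', 'TAG', 'TGA']
print(len(CODON_TABLE) - len(stop_codons))  # 61
print(translate("TTTTTCTAA"))          # FF*
```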

Customer Waiting Time in Queuing Theory and a DNA Sequence • To use the Lindley equation, we need to describe the relation between a customer's waiting time in queuing theory and a DNA sequence. • A score is assigned for the similarity of the amino acids of the two target gene sequences, and the sum of the scores is made to correspond to the waiting time in queuing theory.

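Lindley Equation
$W_n = \max(W_{n-1} + X_n,\ 0), \quad W_0 = 0$
$X_n$: the score of the n-th letter. $W_n$: the sum of the scores up to the n-th letter.
[Figure: scores along the amino acid sequence; a negative value of the sum is replaced by 0.]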
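
A minimal Python sketch of the recursion, assuming the standard form of the Lindley equation, W_n = max(W_{n-1} + X_n, 0) with W_0 = 0. The function name `lindley_sums` and the toy scores are illustrative only.

```python
from typing import Iterable, List


def lindley_sums(scores: Iterable[float], start: float = 0.0) -> List[float]:
    """Return W_1, W_2, ... where W_n = max(W_{n-1} + X_n, 0)."""
    w = start
    sums = []
    for x in scores:
        # The running sum is reset to 0 whenever it would become negative.
        w = max(w + x, 0.0)
        sums.append(w)
    return sums


print(lindley_sums([1.0, 2.0, -5.0, 3.0, -1.0]))  # [1.0, 3.0, 0.0, 3.0, 2.0]
```

The reset to zero is what lets a high-scoring region stand out locally, regardless of how negative the scores before it were.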

Electron-Ion Interaction Potential (EIIP) • Prof. Toyoizumi and Tuchiya showed a technique for finding gene coding regions by using the Lindley equation. However, the scores that the Lindley equation requires are determined in an ad hoc way. • In this research, we determine the scores theoretically by using the Electron-Ion Interaction Potential. Each amino acid is represented by its EIIP value, which describes the average energy state of all valence electrons in that amino acid.

Gene Finding Experiment • The target sequence of this experiment is the genome data of Escherichia coli O157:H7 Sakai. • Escherichia coli O157:H7 Sakai is a major food-borne pathogen that causes diarrhea, colitis, and hemolytic uremic syndrome. • We calculate using the Lindley equation and EIIP.

Example of Amino Acid Scores and the Stop Codon Score (1) • Amino acid score = EIIP - 0.0885 [Table: amino acids with a negative score; amino acids with a positive score] • Stop codon score = -2 × 0.0085

Example of Amino Acid Scores and the Stop Codon Score (2-1) • Amino acid score = EIIP - 0.0045 [Table: amino acids with a negative score; amino acids with a positive score] • Stop codon score = -2 × 0.0445

Example of Amino Acid Scores and the Stop Codon Score (2-2) • The stop codon score is changed from -0.089 to -0.178 (-4 × 0.0445).

Threshold of Amino Acid Sequence • The cumulative score W_n may become high by chance in a region of the amino acid sequence that is meaningless. • A threshold is used to distinguish meaningful regions from meaningless ones. • The scores of an amino acid sequence are assumed to be independent and identically distributed. • W_n can then be regarded as the waiting time of a GI/GI/1 queuing system.

Threshold and the Probability of Exceeding the Threshold by Chance • The waiting time of a GI/GI/1 queuing system satisfies an inequality that bounds the probability of exceeding a given threshold. • This bound is the probability that a sequence is judged to be meaningful although it is meaningless. • The threshold is chosen so that the probability that W_n exceeds it by chance is 0.05.
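
One way to obtain such a threshold is sketched below in Python. It assumes, as a guess rather than something the slides confirm, that the inequality in question is Kingman's exponential bound P(W ≥ x) ≤ exp(-θx), where θ > 0 solves E[exp(θX)] = 1 for i.i.d. scores X with negative mean. The function names and the toy score distribution are hypothetical.

```python
import math
from typing import Dict


def mgf_minus_one(theta: float, dist: Dict[float, float]) -> float:
    """E[exp(theta * X)] - 1 for a discrete distribution {score: probability}."""
    return sum(p * math.exp(theta * x) for x, p in dist.items()) - 1.0


def decay_rate(dist: Dict[float, float], tol: float = 1e-12) -> float:
    """Positive root theta of E[exp(theta * X)] = 1, found by bisection."""
    mean = sum(p * x for x, p in dist.items())
    if mean >= 0 or max(dist) <= 0:
        raise ValueError("need a negative mean score and at least one positive score")
    hi = 1.0
    while mgf_minus_one(hi, dist) <= 0.0:  # grow the bracket until it passes the root
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if mgf_minus_one(mid, dist) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def threshold(dist: Dict[float, float], alpha: float = 0.05) -> float:
    """Smallest x with exp(-theta * x) <= alpha under the assumed bound."""
    return -math.log(alpha) / decay_rate(dist)


# Toy distribution (not from the slides): score +1 with probability 0.25, -1 with 0.75.
toy = {1.0: 0.25, -1.0: 0.75}
print(decay_rate(toy))  # close to ln 3
print(threshold(toy))   # about 2.73 for this toy distribution
```

With real scores, `dist` would be the empirical distribution of the amino acid and stop codon scores in a meaningless region.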

Distinction of gene coding regions and junk regions by Threshold

Similarity Comparison Experiment • The target sequences of this experiment are the genome data of human α- and β-hemoglobins. • Hemoglobin is contained in erythrocytes and consists of “heme”, which contains iron, and “globin”, which is a protein; it has the important role of carrying oxygen through the body. • We calculate using the Lindley equation and EIIP.

Sequences of Human α- and β-Hemoglobins • The genome data that we use are the gene coding regions of human α- and β-hemoglobins.
• Gene coding region of human β-hemoglobin:
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
• Gene coding region of human α-hemoglobin:
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

Amino Acid and the Stop Codon Scores • Amino acid score = EIIP - 0.0532 • Stop codon score = -2 × 0.0532
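
The scoring rule on this slide can be sketched in Python as follows. The EIIP values in the table are an assumption: they are the amino acid values commonly tabulated in the EIIP literature, they do not appear in the slides, and they should be checked against the reference in use. The names `EIIP`, `score`, and `cumulative_scores` are illustrative.

```python
# Assumed EIIP values per amino acid (one-letter codes); not taken from the slides.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}

OFFSET = 0.0532            # amino acid score = EIIP - 0.0532 (from the slide)
STOP_SCORE = -2 * OFFSET   # stop codon score = -2 x 0.0532 (from the slide)


def score(symbol: str) -> float:
    """Score of one amino acid letter, or of a stop codon written as '*'."""
    if symbol == "*":
        return STOP_SCORE
    return EIIP[symbol] - OFFSET


def cumulative_scores(protein: str) -> list:
    """Running score that is reset to 0 whenever it would become negative."""
    w = 0.0
    sums = []
    for symbol in protein:
        w = max(w + score(symbol), 0.0)
        sums.append(w)
    return sums


# First residues of the first hemoglobin sequence listed above.
print(cumulative_scores("VHLTPEEK"))
```

Running `cumulative_scores` on both hemoglobin sequences gives the two score curves whose comparison is the subject of the following slides.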

Calculation Results of the Cumulative Score W_n in α-Hemoglobin and β-Hemoglobin

The Difference (Absolute Value) between the Calculation Results of the Cumulative Score W_n in α-Hemoglobin and β-Hemoglobin

Conclusion • We were able to find gene regions in a DNA sequence by using the Lindley equation and EIIP. • We demonstrated a similarity comparison technique that shortens the processing time by using the Lindley equation and EIIP.
