Convolution and Its Applications to Sequence Analysis
Student: Bo-Hung Wu
Advisors: Professor Herng-Yow Chen & R. C. T. Lee
Department of Computer Science & Information Engineering, National Chi Nan University
The Definition of Convolution in the Continuous Case
[Definition and example: shown as figures on the slide]
Reference: Lecture notes, "Introduction to communication", R. C. T. Lee et al.
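For reference, the standard textbook definition of the convolution of two functions is sketched below. The symbols $f$, $g$, $t$ and $\tau$ are the conventional ones and are an assumption here; they may differ from the notation used in the lecture notes.

```latex
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau
```

One function is reversed ($g(-\tau)$), shifted by $t$, multiplied pointwise with the other, and the product is integrated. The discrete case replaces the integral with a sum over index pairs.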
Exact String-Matching Problem
Input. A text string $T = T_1 T_2 \ldots T_n$ and a pattern string $P = P_1 P_2 \ldots P_m$, where $T_i, P_i \in \Sigma$ (the alphabet) and $m \le n$.
Output. All locations $i$ in $T$ where $T_i T_{i+1} T_{i+2} \ldots T_{i+m-1} = P_1 P_2 \ldots P_m$.
It is obvious that string matching is related to convolution.
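As a baseline, the problem statement can be turned directly into a sliding-window check. This is a minimal sketch, not the method of the slides; the function name `naive_match` is hypothetical, and positions are reported 1-based to agree with the indexing $T_1 \ldots T_n$ used above.

```python
def naive_match(text, pattern):
    """Return all 1-based positions i where text[i .. i+m-1] equals pattern."""
    n, m = len(text), len(pattern)
    positions = []
    # Slide a window of length m over the text and compare it to the pattern.
    for start in range(n - m + 1):
        if text[start:start + m] == pattern:
            positions.append(start + 1)  # convert 0-based index to 1-based
    return positions


if __name__ == "__main__":
    print(naive_match("abcabcab", "abc"))
```

Each window costs up to m character comparisons, so the whole scan takes O(nm) time in the worst case.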
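Convolution in the Discrete Case
Definition: Let $X = \langle x_0, \ldots, x_m \rangle$ and $Y = \langle y_0, \ldots, y_n \rangle$ be two given vectors, with $x_i, y_j \in D$. Let $\otimes$ and $\oplus$ be two given functions, where $\otimes : D \times D \to E$ and $\oplus : E \times E \to E$. Then the convolution of $X$ and $Y$ with respect to $\otimes$ and $\oplus$ is $Z = \langle z_0, \ldots, z_{m+n} \rangle$, where $z_k = \bigoplus_{i+j=k} x_i \otimes y_j$ for $k = 0, \ldots, m+n$.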
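The definition can be sketched in a few lines of Python. The function name `generalized_convolution` and the parameter names `otimes` and `oplus` are hypothetical, chosen to stand for the two given functions; the code assumes both vectors are non-empty.

```python
from functools import reduce
import operator


def generalized_convolution(x, y, otimes, oplus):
    """Compute z_k by combining, with oplus, all otimes(x[i], y[j]) where i + j = k."""
    z = []
    for k in range(len(x) + len(y) - 1):
        # Collect every product term whose indices sum to k.
        terms = [otimes(x[i], y[k - i])
                 for i in range(len(x))
                 if 0 <= k - i < len(y)]
        # Fold the terms together with the combining function.
        z.append(reduce(oplus, terms))
    return z


if __name__ == "__main__":
    # With ordinary multiplication and addition this is polynomial multiplication.
    print(generalized_convolution([1, 2, 3], [4, 5], operator.mul, operator.add))
```

With multiplication and addition the result is the coefficient vector of the product of two polynomials, which is the familiar numeric convolution; other choices of the two functions give other "products" of sequences.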
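How can we use convolution to solve the exact string-matching problem? [FP74]
First, we reverse $Y$ to obtain $Y^R = \langle y_n, \ldots, y_0 \rangle$.
Second, we define the functions $\otimes$ and $\oplus$ appropriately: following [FP74], $\otimes$ tests two characters for equality and $\oplus$ is the logical AND.
Note that computing this convolution is equivalent to the sliding-window approach. [FP74]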
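The construction can be illustrated with a short sketch. It rests on assumptions not preserved in the slide text: the text plays the role of X, the pattern plays the role of Y, the first function is an equality test, and the second is a logical AND. The name `match_by_convolution` is hypothetical, the pattern is assumed non-empty, and only entries where the reversed pattern fully overlaps the text are evaluated.

```python
def match_by_convolution(text, pattern):
    """Exact matching as a convolution of the text with the reversed pattern.

    For each index k, z_k is the AND over all i + j = k of
    (text[i] == reversed_pattern[j]).  Returns 1-based match positions.
    """
    n, m = len(text), len(pattern)
    reversed_pattern = pattern[::-1]
    positions = []
    # Only k = m-1 .. n-1 give a full overlap of pattern and text.
    for k in range(m - 1, n):
        z_k = all(text[k - j] == reversed_pattern[j] for j in range(m))
        if z_k:
            # Entry k corresponds to a window starting at 0-based index k - m + 1.
            positions.append(k - m + 2)
    return positions


if __name__ == "__main__":
    print(match_by_convolution("abcabcab", "abc"))
```

Reversing the pattern is what turns the index constraint i + j = k into an aligned, left-to-right comparison of one window of the text, which is why the computation coincides with sliding a window.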
Applying Convolution to Sequence Analysis
• The common substring with k mismatches allowed problem
• The common substrings with k mismatches allowed among multiple sequences problem
• Determining the similarity of two DNA sequences
• Searching in a DNA sequence database
• Finding repeating groups in a DNA sequence
• An aid for detecting transposition
• An aid for detecting insertion/deletion
• An aid for detecting the overlapping of segments resulting from shotgun operations
• The corresponding pair-wise nucleotides in a DNA sequence
• An aid for looking for similar regions in a DNA sequence with a distance constraint
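Several of these applications tolerate mismatches. One way to see how convolution accommodates that is to keep the equality test but combine results by addition instead of AND, so that each entry counts matching characters. The sketch below handles a simpler relative of the listed problems, finding where a pattern occurs in a text with at most k mismatches; it is an illustration under that assumption, not the procedure of the slides, and `k_mismatch_positions` is a hypothetical name.

```python
def k_mismatch_positions(text, pattern, k):
    """Return 1-based positions where pattern occurs in text with at most k mismatches.

    Each convolution entry counts the matching characters between the
    reversed pattern and one window of the text.
    """
    n, m = len(text), len(pattern)
    reversed_pattern = pattern[::-1]
    positions = []
    for idx in range(m - 1, n):  # entries with full overlap only
        matches = sum(text[idx - j] == reversed_pattern[j] for j in range(m))
        if m - matches <= k:
            positions.append(idx - m + 2)
    return positions


if __name__ == "__main__":
    print(k_mismatch_positions("acgtacctacga", "acgt", 1))
```

Setting k to 0 reduces this to exact matching. The match counts themselves can also serve as a rough similarity score between two sequences at each alignment.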
The Corresponding Pair-wise Nucleotides in a DNA Sequence
Substitution rule: A → T, T → A, C → G, G → C
Example: S = "acttgacgtgaac"
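The substitution rule can be applied mechanically, as in the sketch below. Only the substitution step is shown; how the substituted sequence is then fed into the convolution is not preserved in the slide text. The function name `substitute` is hypothetical, and lowercase input is assumed, as in the example.

```python
# Substitution rule from the slide: a <-> t, c <-> g.
SUBSTITUTION = str.maketrans("atcg", "tagc")


def substitute(sequence):
    """Replace every nucleotide by its corresponding partner."""
    return sequence.translate(SUBSTITUTION)


if __name__ == "__main__":
    s = "acttgacgtgaac"
    print(substitute(s))
```

Applying the rule twice returns the original sequence, since each nucleotide is mapped to its partner and back.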
Experiments
• We applied convolution to DNA sequences and to English compositions to measure their similarity.
• In the following experiments, we used the following DNA sequences as input data. (The clustering was known in advance, for evaluation.)
C1 (0-25): Hepatitis B virus
C2 (26-162): Human mitochondrion
C3 (163-1041): Other viruses
Experiment: The Comparison of English Compositions
We applied convolution to two English compositions to detect whether they are similar.
Conclusion and Future Work
• We have shown that several sequence-analysis applications that we identified can be solved by means of convolution.
• Convolution can be used as a negative-answer filter.
• On the practical side, we ran some experiments. The experimental results confirm that this approach is feasible.
• By choosing appropriate operations as the functions in the convolution, we can solve more problems related to sequence analysis.
• For example, we hope to apply convolution to help solve protein structure comparison.