On the Range Maximum-Sum Segment Query Problem Kuan-Yu Chen and Kun-Mao Chao Department of Computer Science and Information Engineering, National Taiwan University, Taiwan 2004/12
Outline • Motivation • Problems arising from bioinformatics applications • Definition of our research problem (RMSQ) • Our main idea • Finding a partner for each index • Reducing the problem to the Range Minima Query problem (RMQ) • Conclusions and applications • Solving three relevant problems in O(n) time
Applications to biomolecular sequence analysis • Locating conserved regions or GC-rich regions • Assign a real number (called a score) to each residue • Look for the maximum-sum or maximum-average segments • Possibly with length constraints or a lower bound on the average
What is a Maximum-Sum Segment? • Also called maximum-sum intervals or maximum scoring regions • Given a sequence of numbers, the maximum-sum segment is simply the contiguous subsequence having the greatest total sum. • Example: <5, -5.1, 1, 3, -4, 2, 3, -4, 7>, total sum = 8 • A segment with a zero-sum prefix or suffix is not allowed, so the answer is <2, 3, -4, 7> rather than <1, 3, -4, 2, 3, -4, 7>, whose prefix <1, 3, -4> sums to zero
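The definition above can be checked with a short linear scan. This is a minimal sketch, not taken from the slides: the function name `max_sum_segment` and the 0-based indexing are assumptions, and the scan is the textbook running-sum method, with ties resolved so that the reported segment has no zero-sum prefix or suffix.

```python
def max_sum_segment(a):
    """Return (best_sum, (start, end)) with 0-based inclusive indices.

    The running sum is reset as soon as it drops to zero or below, so the
    reported segment never begins with a zero-sum prefix. The best segment
    is updated only on a strict improvement, so it never ends with a
    zero-sum suffix.
    """
    best_sum, best_range = None, None
    running, start = 0, 0
    for end, value in enumerate(a):
        running += value
        if best_sum is None or running > best_sum:
            best_sum, best_range = running, (start, end)
        if running <= 0:
            running, start = 0, end + 1
    return best_sum, best_range


print(max_sum_segment([5, -5.1, 1, 3, -4, 2, 3, -4, 7]))
```

On the example sequence this reports the segment at positions 5 through 8 (0-based), that is <2, 3, -4, 7>.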
Finding the maximum-sum segment with length constraints • Lin, Jiang, and Chao [JCSS 2002] and Fan et al. [CIAA 2003] independently gave O(n)-time algorithms for this problem. • The segment length must be at least L and at most U.
Finding all maximal-sum segments • Ruzzo and Tompa [ISMB 1999] gave an O(n)-time algorithm for this problem. • Recursive calls: S is the maximum-sum segment; recurse on the part L to its left and the part R to its right.
Finding the longest segment with average constraints • Wang and Xu [Bioinformatics 2003] gave a linear-time algorithm
Our results • For the RMSQ problem, we propose an algorithm with O(n) preprocessing time and O(1) query time • We use the RMSQ techniques we developed to solve the three problems mentioned above in O(n) time
Problem Definition • Range Maximum-Sum Segment Query problem • The input is a sequence <a1, a2, ..., an> of real numbers, which is to be preprocessed. • A query consists of two intervals [i, j] and [k, l]. The goal is to return the maximum-sum segment whose starting index lies in [i, j] and whose ending index lies in [k, l].
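To make the query semantics concrete, here is a brute-force reference that simply tries every admissible pair of endpoints. It is only a sketch for checking faster methods: the name `rmsq_brute` and the 0-based inclusive indices are assumptions, and it has none of the preprocessing described in the slides.

```python
def rmsq_brute(a, i, j, k, l):
    """Best (total, start, end) with start in [i, j] and end in [k, l].

    Indices are 0-based and inclusive, and start <= end is required.
    Runs in quadratic time per query; returns None if no pair is admissible.
    """
    best = None
    for start in range(i, j + 1):
        total = 0
        for end in range(start, l + 1):
            total += a[end]
            if end >= k and (best is None or total > best[0]):
                best = (total, start, end)
    return best


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(rmsq_brute(seq, 2, 4, 6, 9))
```

The query intervals in the usage line are chosen for illustration and are not the regions drawn on the slides.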
A Nonoverlapping Example • Input Sequence: • 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3 • Total sum = 6 [Figure: the starting region and the end region marked on the sequence]
An Overlapping Example • Input Sequence: • 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3 • Total sum = 8 [Figure: the starting region and the end region marked on the sequence]
Main Idea • Reduce to the RMQ problem • Theorem. If there is a <f(n), g(n)>-time solution for the RMQ problem, then there is a <f(n)+O(n), g(n)+O(1)>-time solution for the RMSQ problem. [Figure: RMSQ reduces to RMQ with O(n) extra preprocessing time and O(1) extra query time]
A relevant problem - RMQ • Range Minima Query Problem (also called Discrete Range Searching)
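An RMQ structure answers "where is the minimum of values[lo..hi]?" after some preprocessing. The sketch below uses a sparse table, which is a swapped-in technique chosen for brevity: it needs O(n log n) preprocessing and O(1) per query, so it is not the O(n)-preprocessing structure the linear-time claim relies on. The class name `SparseTableRMQ` and its method names are assumptions.

```python
class SparseTableRMQ:
    """Range-minimum index queries via a sparse table (0-based, inclusive)."""

    def __init__(self, values):
        self.values = list(values)
        n = len(self.values)
        # table[level][i] is the index of the minimum of values[i : i + 2**level]
        self.table = [list(range(n))]
        span = 1
        while 2 * span <= n:
            prev = self.table[-1]
            row = []
            for i in range(n - 2 * span + 1):
                left, right = prev[i], prev[i + span]
                row.append(left if self.values[left] <= self.values[right] else right)
            self.table.append(row)
            span *= 2

    def argmin(self, lo, hi):
        """Index of the minimum of values[lo..hi]; ties go to the leftmost index."""
        level = (hi - lo + 1).bit_length() - 1
        left = self.table[level][lo]
        right = self.table[level][hi - (1 << level) + 1]
        return left if self.values[left] <= self.values[right] else right


rmq = SparseTableRMQ([0, 9, -1, 3, 1, 5, 0, 4])
print(rmq.argmin(3, 7))
```

A range-maximum query can be obtained from the same structure by negating the values.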
Case 1: Nonoverlapping • sum(i, j) = prefix-sum(j) - prefix-sum(i-1) • To maximize sum(i, j), maximize prefix-sum(j) and minimize prefix-sum(i-1) • 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3 • Find a lowest point of the prefix sums for the starting region and a highest point for the end region • Can be reduced to the RMQ problem
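The nonoverlapping case can be sketched directly from the prefix-sum identity. In this illustration the two `min`/`max` scans stand in for the RMQ lookups (an assumption made to keep the block short), the function name `rmsq_nonoverlapping` is hypothetical, and indices are 0-based with `prefix[t]` holding the sum of the first t elements.

```python
from itertools import accumulate


def rmsq_nonoverlapping(a, i, j, k, l):
    """Best (total, start, end) with start in [i, j], end in [k, l], j <= k.

    A segment from start to end has sum prefix[end + 1] - prefix[start], so the
    two endpoints can be optimized independently when the ranges do not overlap.
    """
    prefix = [0] + list(accumulate(a))
    # Lowest point for the starting region (stand-in for a range-minimum query).
    start = min(range(i, j + 1), key=lambda t: prefix[t])
    # Highest point for the end region (stand-in for a range-maximum query).
    end = max(range(k, l + 1), key=lambda t: prefix[t + 1])
    return prefix[end + 1] - prefix[start], start, end


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(rmsq_nonoverlapping(seq, 2, 4, 6, 9))
```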
Case 2: Overlapping • Some problems occur in the overlapping case: • 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3 • Choosing a highest point and a lowest point independently can place the highest point before the lowest point, which gives a negative sum
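Case 2: Overlapping • Divide into 3 possible cases: • 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3 • In two of the cases the starting range and the ending range do not overlap, so Case 1 applies: find a lowest point in the starting range and a highest point in the ending range • The remaining case places both endpoints in the shared range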
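One way to read the three-way split, assuming a query with i <= k <= j <= l (0-based): the start lies before the shared range, or the end lies after it, or both endpoints lie inside it. The sketch below follows that reading; the function names are hypothetical, the linear scans stand in for RMQ lookups, and the shared-range case is answered by a plain scan rather than by the constant-time technique the slides go on to develop.

```python
from itertools import accumulate


def best_between(prefix, i, j, k, l):
    """Case 1 helper: start in [i, j], end in [k, l], ranges disjoint (j < k)."""
    if i > j or k > l:
        return None  # one of the ranges is empty
    start = min(range(i, j + 1), key=lambda t: prefix[t])
    end = max(range(k, l + 1), key=lambda t: prefix[t + 1])
    return prefix[end + 1] - prefix[start], start, end


def best_within(a, lo, hi):
    """Single-range query by a linear scan (stand-in, not constant time)."""
    best, running, start = None, 0, lo
    for end in range(lo, hi + 1):
        running += a[end]
        if best is None or running > best[0]:
            best = (running, start, end)
        if running <= 0:
            running, start = 0, end + 1
    return best


def rmsq_overlapping(a, i, j, k, l):
    """Best (total, start, end) for i <= k <= j <= l, as the best of three cases."""
    prefix = [0] + list(accumulate(a))
    candidates = [
        best_between(prefix, i, k - 1, k, l),  # start before the shared range
        best_between(prefix, k, j, j + 1, l),  # end after the shared range
        best_within(a, k, j),                  # both endpoints in the shared range
    ]
    return max((c for c in candidates if c is not None), key=lambda c: c[0])


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(rmsq_overlapping(seq, 2, 8, 4, 12))
```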
A special case of RMSQ: single range query • Input Sequence: • 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3 • Total sum = 6 • Challenge: Can this special case be reduced to the RMQ problem?
Idea • Step 1. Find a partner for each index. • Step 2. Record the sum of each pair in an array. • Step 3. Reduce to the RMQ problem: retrieve the maximum-sum pair within the query interval.
Our First Attempt (1) • Step 1: For each index i, define the lowest point preceding i as its partner, partner(i)
Our First Attempt (2) • Step 2: Record sum(i, partner(i)) in an array
Our First Attempt (3) • Step 3: Apply the RMQ techniques to the array to retrieve the maximum-sum pair
Faults • What if an index's partner lies outside the query interval? • The recorded sum(i, partner(i)) then needs to be updated, which is costly in the worst case
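The first attempt and its fault can be illustrated together. This sketch assumes 0-based indices and reads "lowest point preceding i" as the lowest prefix sum at or before position i; the function name `first_attempt_pairs` is hypothetical.

```python
from itertools import accumulate


def first_attempt_pairs(a):
    """For each end index, pair it with the globally lowest preceding prefix sum.

    Returns (partners, pair_sums), where partners[e] is the chosen start index
    and pair_sums[e] is the sum of a[partners[e] .. e].
    """
    prefix = [0] + list(accumulate(a))
    partners, pair_sums = [], []
    lowest = 0  # index of the lowest prefix value seen so far
    for end in range(len(a)):
        if prefix[end] < prefix[lowest]:
            lowest = end
        partners.append(lowest)
        pair_sums.append(prefix[end + 1] - prefix[lowest])
    return partners, pair_sums


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
partners, pair_sums = first_attempt_pairs(seq)

# Query the single range [4, 8]: the largest recorded pair sum in that range
# belongs to end index 8, but its partner is index 2, outside the range.
lo, hi = 4, 8
end = max(range(lo, hi + 1), key=lambda e: pair_sums[e])
print(end, partners[end], pair_sums[end])
```

The recorded value for that pair is 8, while the best segment that actually stays inside positions 4 through 8 sums to 7, so the recorded pair cannot be returned as is.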
Nesting Property • Input Sequence: • 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3 • Recorded pair sums: • 9, -10, 4, -2, 6, -5, 4, -3, 8, -11, 8, -3, 9, -5, 3 • Apply RMQ techniques to the recorded sums • Update can be done in O(1) time
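The slide does not spell out how the partners behind the second array are chosen. One rule that reproduces that array, offered here as an assumption rather than as the authors' stated definition, is to search for the lowest prefix sum only back to the nearest earlier prefix sum that is at least as high as the one at the current end. The quadratic loops and the names below are for illustration only.

```python
from itertools import accumulate


def bounded_partner_pairs(a):
    """Partners restricted by a left bound (assumed rule, 0-based indices)."""
    prefix = [0] + list(accumulate(a))
    partners, pair_sums = [], []
    for end in range(len(a)):
        p = end + 1
        bound = 0
        # Nearest earlier prefix value that is at least prefix[p], if any.
        for t in range(p - 1, -1, -1):
            if prefix[t] >= prefix[p]:
                bound = t
                break
        start = min(range(bound, p), key=lambda t: prefix[t])
        partners.append(start)
        pair_sums.append(prefix[p] - prefix[start])
    return partners, pair_sums


def is_nested(partners):
    """True if no two intervals [partners[e], e] partially overlap."""
    intervals = [(start, end) for end, start in enumerate(partners)]
    return not any(s1 < s2 <= e1 < e2
                   for s1, e1 in intervals for s2, e2 in intervals)


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
partners, pair_sums = bounded_partner_pairs(seq)
print(pair_sums)
print(is_nested(partners))
```

Under this rule the intervals from each partner to its index are either disjoint or contained in one another, which is one plausible reading of the nesting property named on the slide.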
Use RMSQ Techniques to Solve the Other Two Relevant Problems • 1. Finding the Maximum-Sum Segment with length constraints in O(n) time. - Y.-L. Lin, T. Jiang, K.-M. Chao, 2002 - T.-H. Fan, S. Lee, H.-I. Lu, T.-S. Tsou, 2003 • 2. Finding all maximal scoring subsequences in O(n) time. - W. L. Ruzzo & M. Tompa, 1999
Maximum-Sum Segment with length constraints • Length at least L, at most U • Runs in O(n) time: one query per index, and each query costs O(1) time
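For each end index, the admissible starts form a sliding window, so the problem reduces to a window minimum over prefix sums. The sketch below swaps in a deque-based sliding-window minimum for the per-index RMSQ query described on the slide; that substitution, the function name, and the 0-based indices are assumptions, and 1 <= L <= U is required.

```python
from collections import deque
from itertools import accumulate


def max_sum_with_length_bounds(a, L, U):
    """Best (total, start, end) over segments whose length is in [L, U]."""
    prefix = [0] + list(accumulate(a))
    window = deque()  # admissible start indices, prefix values increasing
    best = None
    for end in range(L - 1, len(a)):
        newest = end - L + 1  # latest start allowed for this end
        while window and prefix[window[-1]] >= prefix[newest]:
            window.pop()
        window.append(newest)
        while window[0] < end - U + 1:  # drop starts that make the segment too long
            window.popleft()
        start = window[0]
        total = prefix[end + 1] - prefix[start]
        if best is None or total > best[0]:
            best = (total, start, end)
    return best


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(max_sum_with_length_bounds(seq, 2, 4))
```

Each index enters and leaves the deque at most once, so the whole scan is linear in the sequence length.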
All Maximal Scoring Subsequences • Recursive calls on L and R, the parts to the left and right of the maximum-sum segment S • Runs in O(n) time since each query costs O(1) time
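The recursive structure can be sketched as follows. This illustration answers each "maximum-sum segment of a range" request with a linear scan instead of an O(1) RMSQ query, so it is quadratic in the worst case; it also assumes that only segments with a positive sum are reported. The function names and 0-based indices are hypothetical.

```python
def best_within(a, lo, hi):
    """Maximum-sum segment of a[lo..hi] as (total, start, end), by linear scan."""
    best, running, start = None, 0, lo
    for end in range(lo, hi + 1):
        running += a[end]
        if best is None or running > best[0]:
            best = (running, start, end)
        if running <= 0:
            running, start = 0, end + 1
    return best


def all_maximal_segments(a, lo=0, hi=None):
    """List of (start, end, total), left to right, found by recursing on L and R."""
    if hi is None:
        hi = len(a) - 1
    if lo > hi:
        return []
    total, start, end = best_within(a, lo, hi)
    if total <= 0:
        return []  # nothing with a positive sum remains in this range
    left = all_maximal_segments(a, lo, start - 1)
    right = all_maximal_segments(a, end + 1, hi)
    return left + [(start, end, total)] + right


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(all_maximal_segments(seq))
```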
The End • Thank You.