Computer Processing for Amyotrophic Lateral Sclerosis On Parallelizing a Dynamic Programming Algorithm for RNA Folding
Outline • Work to be Undertaken • Background and Significance • Expected Significance • Relation to Class Materials • Relation to Present State of Knowledge • Preliminary Studies / Progress Report • General Plan of the Work • Broad Design of Activities to be Undertaken • Project Design and Methods
Work to be Undertaken • Take existing parallel code • Modify it for use with MPI. • Find or create code to measure Altix performance • Time complexity • Message complexity • See whether analysis in a published article matches experimental measurements of complexity. • See whether knowledge about data dependencies in the algorithm can be used to design localization, and thereby improve parallel performance by reducing unnecessary communication.
Background and Significance There was a physician who was very kind to me who had this disease. I would like to be of service in helping people with this disease. There is increasing evidence for RNA processing problems causing motor neuron degeneration. (ALSR Today, 2008, Vol4, Advances in ALS Genetics, Ammar Al-Chalabi, Ph.D., F.R.C.P. )
RNA Processing • RNA is created when genes are transcribed. (“Central dogma”, Watson and Crick) • “When termination finally occurs, the RNA transcript is released from the DNA template and in the case of eukaryotes it is rapidly processed.” (Proudfoot and Whitelaw, p. 97) • 3’ end processing has ...role in regulation of gene expression (ibid., p. 98) • “RNA splicing is a series of cleavage and ligation reactions that result in the precise excision of introns from the precursor RNA” (Krainer and Maniatis, p. 131) • “it appears that recognition of splice sites by the tRNA splicing enzymes is based primarily on common structural features of the exons, and on the conserved position of the intron” (ibid., p. 133)
RNA Processing and Folding • “the presence of the splice sites in single-stranded loops are characteristic features of S. cerevisiae pre-tRNAs, but these features are not required for splicing” (ibid., p. 132)
Background and Significance The remaining obstacles are therefore the statistical methods needed to analyze the data, the huge computing resources needed to handle billions of DNA results in thousands of people, and the money required to finance the research. (ALSR Today, 2008, Vol4, Advances in ALS Genetics, Ammar Al-Chalabi, Ph.D., F.R.C.P. )
Computational Aspects • According to Ogoubi et al.: “Many algorithms have been developed for the inference and (database) similarity search of RNA secondary structure. However, the execution time (and memory requirement) is often a polynomial with a degree as high as 6. This complexity limits the application of these algorithms. For instance, Baird et al. estimated that it would take 6 months to search for a single regulatory element in a database of 20,000 entries (untranslated regions) [2].”
Background of Computation RNA folding is predicted with several algorithms, at least one of which is a dynamic programming algorithm. Dynamic programming algorithms solve a large problem by computing the values of smaller subproblems, storing those values in an array (matrix), and combining the stored values to solve progressively larger subproblems. Values stored in different parts of the array can often be computed in parallel; for example, the elements along one diagonal of the matrix might be independent of one another.
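As an illustration of diagonal independence, here is a minimal Python sketch. It uses the textbook Nussinov base-pair maximization recurrence as a stand-in, which is an assumption: it is not necessarily the algorithm studied by Ogoubi et al. The names `can_pair` and `fill_by_diagonals` are hypothetical, and the minimum hairpin loop length is ignored for simplicity.

```python
# Sketch only: Nussinov-style base-pair maximization, filled one diagonal
# at a time. Assumption: this stands in for the folding algorithm in the
# paper, which may use a different (energy-based) recurrence.

ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """True if bases a and b may form a pair (Watson-Crick or G-U wobble)."""
    return (a, b) in ALLOWED


def fill_by_diagonals(seq, reverse_within_diagonal=False):
    """Return table where table[i][j] is the max number of pairs in seq[i..j]."""
    n = len(seq)
    table = [[0] * n for _ in range(n)]
    for d in range(1, n):                      # d = j - i, the diagonal index
        starts = range(n - d)
        if reverse_within_diagonal:
            starts = reversed(starts)
        # Every cell on diagonal d reads only cells on diagonals < d,
        # so the iterations of this inner loop do not depend on each other.
        for i in starts:
            j = i + d
            best = max(table[i + 1][j], table[i][j - 1])
            if can_pair(seq[i], seq[j]):
                # table[i + 1][j - 1] lies below the main diagonal when d == 1;
                # that region is never written and stays 0.
                best = max(best, table[i + 1][j - 1] + 1)
            for k in range(i + 1, j):          # split into two sub-intervals
                best = max(best, table[i][k] + table[k + 1][j])
            table[i][j] = best
    return table


if __name__ == "__main__":
    seq = "GGGAAAUCC"
    forward = fill_by_diagonals(seq)
    backward = fill_by_diagonals(seq, reverse_within_diagonal=True)
    print(forward[0][len(seq) - 1], forward == backward)
```

Visiting the cells of a diagonal in either order yields the same table, which is the practical meaning of "independent": those cells could be assigned to different processors without ordering constraints among them.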
Time Complexity According to Ogoubi et al., “Many algorithms have been developed for the inference and (database) similarity search of RNA secondary structure. However, the execution time (and memory requirement) is often a polynomial with a degree as high as 6. This complexity limits the application of these algorithms. For instance, Baird et al. estimated that it would take 6 months to search for a single regulatory element in a database of 20,000 entries (untranslated regions) [2].”
Parallelized Dynamic Programming • Dynamic programming has a matrix creation/fill phase, and a readback phase. • The readback phase is sequential, but it is linear. • According to Ogoubi et al., the “execution time of the fill stage of the RNA folding algorithm can be done in O(N²).”
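The two phases can be sketched as follows in Python, again assuming the simple Nussinov pair-maximization recurrence rather than the paper's own algorithm. `fill` and `traceback` are hypothetical names. This plain version of the readback rescans for a split point, so it illustrates the sequential walk over the table without claiming the linear bound stated above.

```python
# Sketch only: a fill phase followed by a sequential readback (traceback).
# Assumption: Nussinov-style recurrence, no minimum loop length.

PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def fill(seq):
    """Fill phase: t[i][j] is the max number of pairs in seq[i..j]."""
    n = len(seq)
    t = [[0] * n for _ in range(n)]
    for d in range(1, n):
        for i in range(n - d):
            j = i + d
            best = max(t[i + 1][j], t[i][j - 1])
            if seq[i] + seq[j] in PAIRS:
                best = max(best, t[i + 1][j - 1] + 1)
            for k in range(i + 1, j):
                best = max(best, t[i][k] + t[k + 1][j])
            t[i][j] = best
    return t


def traceback(seq, t):
    """Readback phase: recover one optimal structure in dot-bracket form."""
    n = len(seq)
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if t[i][j] == t[i + 1][j]:             # base i left unpaired
            stack.append((i + 1, j))
        elif t[i][j] == t[i][j - 1]:           # base j left unpaired
            stack.append((i, j - 1))
        elif seq[i] + seq[j] in PAIRS and t[i][j] == t[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:                                  # the optimum came from a split
            for k in range(i + 1, j):
                if t[i][j] == t[i][k] + t[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return "".join(structure)


if __name__ == "__main__":
    seq = "GGGAAAUCC"
    table = fill(seq)
    print(seq)
    print(traceback(seq, table))
```

The readback only reads the finished table and follows one chain of decisions, which is why it is usually left sequential while the parallelization effort goes into the fill.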
Dynamic Programming The dependencies of one cell of the matrix upon previously computed cells form a pattern that is known at compile time. This pattern can be represented as a graph. If we imagine a fine-grained case where each cell of the matrix is computed on a separate processor, every dependency results in interprocessor communication along a graph edge. When we consider how to deploy the individual cell computations most efficiently onto coarser-grained processors, we would use this dependency graph to group the vertices into subsets, one per processor, so that as many edges as possible fall inside a subset and as few as possible cross between subsets.
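A minimal Python sketch of this idea counts how many dependency edges cross processor boundaries under a given assignment of cells to processors. The dependency pattern assumed here is that of the simple Nussinov recurrence (row to the left, column below, and the inner diagonal neighbour), and the two mappings `row_block_owner` and `row_cyclic_owner` are hypothetical examples, not deployments taken from the paper.

```python
# Sketch only: count dependency edges that cross processor boundaries.
# Assumption: cell (i, j), i < j, reads (i, k) and (k, j) for i < k < j,
# plus (i + 1, j - 1) when that cell lies above the main diagonal.


def dependencies(i, j):
    """Set of previously computed cells that cell (i, j) reads."""
    deps = set()
    for k in range(i + 1, j):
        deps.add((i, k))                       # same row, to the left
        deps.add((k, j))                       # same column, below
    if i + 1 < j - 1:
        deps.add((i + 1, j - 1))               # enclosed interval
    return deps


def count_edges(n, owner):
    """Return (total_edges, cut_edges) for an n x n upper-triangular table.

    owner(i, j) gives the processor id that computes cell (i, j); an edge
    is 'cut' when its two endpoints belong to different processors.
    """
    total = cut = 0
    for i in range(n):
        for j in range(i + 1, n):
            for dep in dependencies(i, j):
                total += 1
                if owner(*dep) != owner(i, j):
                    cut += 1
    return total, cut


def row_block_owner(n, p):
    """Contiguous blocks of rows per processor (hypothetical mapping)."""
    return lambda i, j: i * p // n


def row_cyclic_owner(p):
    """Rows dealt out round-robin to processors (hypothetical mapping)."""
    return lambda i, j: i % p


if __name__ == "__main__":
    n, p = 12, 4
    for name, owner in [("row block", row_block_owner(n, p)),
                        ("row cyclic", row_cyclic_owner(p))]:
        total, cut = count_edges(n, owner)
        print(f"{name}: {cut} of {total} edges cross processors")
```

The cut-edge count is only a rough proxy for communication, since it ignores message aggregation and load balance, but it gives a quick way to compare candidate deployments before measuring them.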
Expected Significance To quickly and efficiently fold long ribonucleic acid (RNA) sequences, fast computational models are needed. This paper compares two parallel multiprocessor computer architectures for the prediction of RNA secondary structure. We show promising experimental results using the OpenMP programming environment. This work is intended to be a testbed for the development of new approaches for the prediction of consensus RNA secondary structure from multiple sequences. (Source: Etienne Ogoubi, David Pouliot, Marcel Turcotte, and Abdelhakim Hafid, Parallel Multiprocessor Approaches to the RNA Folding Problem, PPAM 2007, LNCS 4967, pp. 1230–1239, 2008.)
Relation to Class Materials In class we are studying the use of parallel machines, including the MPI library and shared-memory programming with its underlying cache coherence protocols. This class has taught us that shared-memory implementations still imply interprocessor communication. A statement such as the following is therefore open to doubt: “The ability of all the processors to access the same pool of variables with no communication overheads cost and no network transit time cost makes OpenMP more suitable for our application compared to Message Passing Interface (MPI).”
Relation to Student’s Research Interests These researchers are working on RNA folding, which is part of RNA processing, and problems in RNA processing are implicated in ALS. Perhaps these researchers can benefit from the material taught in this course.
Relation to Present State of Knowledge in the Field Compared to the present state of knowledge in computer science, the proposed work is probably not a new contribution. However, researchers in RNA folding might benefit from it.
Preliminary Studies / Progress Report • Minimal: • Single process program runs on SGI Altix. • Excel spreadsheet tool for prediction of bus saturation in preparation.
General Plan of the Work Find out what measurement tools, for time and message complexity, are available with SGI Altix, for monitoring loading of processor resources. Design a deployment of the algorithm onto processors. Measure the time and messages of the MPI and shared memory implementations. Compare with paper. Check apparent assertion in paper about no interprocessor communication.
Broad Design of the Activities to be Undertaken • See what can be measured • Implement some code • Predict its bus use with spreadsheet tool • Measure its performance, comparing the MPI implementation with the shared memory implementation
Project Design and Methods Establish a monitor which can measure interprocessor communication, especially cache coherence bus use. Establish a procedure for measuring elapsed time. Predict communication patterns from algorithm, and attempt to minimize interprocess communication by using localization considerations to deploy matrix cell computation onto specific processors. Compare different implementations and deployments with different localities.
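A procedure for measuring elapsed time might look like the following Python sketch. It is a generic single-process wall-clock harness under stated assumptions, not a tool specific to the SGI Altix; `measure_elapsed` and `workload` are hypothetical names. In an MPI program the analogous wall-clock call would be `MPI_Wtime`.

```python
# Sketch only: repeat a computation several times and report the best and
# median wall-clock times. Assumption: a single-process measurement; it
# says nothing about interprocessor communication or bus use.
import statistics
import time


def measure_elapsed(func, *args, repeats=5):
    """Run func(*args) `repeats` times; return (best, median) in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()            # monotonic, high-resolution clock
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings), statistics.median(timings)


def workload(n):
    """Hypothetical stand-in for the fill phase: a cubic triple loop."""
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(i + 1, j):
                total += 1
    return total


if __name__ == "__main__":
    for n in (50, 100, 200):
        best, median = measure_elapsed(workload, n)
        print(f"n={n}: best {best:.6f} s, median {median:.6f} s")
```

Reporting the minimum alongside the median is one common way to reduce the influence of interference from other jobs; running several problem sizes allows the measured growth to be compared with the complexity claimed in the paper.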
References • Proudfoot and Whitelaw, in Transcription and Splicing, Hames and Glover, eds., IRL Press, 1988 • Krainer and Maniatis, in Transcription and Splicing, p. 97, Hames and Glover, eds., IRL Press, 1988 • Baird, S.D., Turcotte, M., Korneluk, R.G., Holcik, M.: Searching for IRES. RNA 12(10), 1755–1785 (2006), cited in Ogoubi, Pouliot, et al., Parallel Multiprocessor Approaches to the RNA Folding Problem, in R. Wyrzykowski et al. (Eds.): PPAM 2007, LNCS 4967, pp. 1230–1239, 2008.