Visualization of Sequence Alignment for Large Genomes
Advisor: 黃耀廷
Project student: 洪碩懋
Abstract
Next-Generation Sequencing alignment compares millions of reads with a reference sequence. The alignment results are written to a Sequence Alignment/Map file (SAM file). However, a SAM file is too large to be checked easily by eye; its size may reach gigabytes. In this project, I developed visualization software for the sequence alignment of large genomes using the OpenGL API. I devised methods for prefetching the necessary alignments according to the user's navigation behavior in the interface.

The Problem Encountered
• The size of a SAM file can be 10+ gigabytes, so it is not feasible to load an entire SAM file into a system's memory.
• Poor performance makes the viewer unpleasant to use.

Method : Records mapping position
During preprocessing, I create a table that records the file seek positions of the mapping positions. Searches then start from these file pointers, so searching from the beginning of the file is no longer needed.

Additional Improvement : File Mapping
Originally, repeated standard read/write operations on each segment of data created a very large overhead. I switched from standard read/write operations to file mapping in order to increase I/O performance. File mapping maps the file directly into virtual memory, so there is no overhead from copying data into user space.

Experiments
To verify whether my methods really improved performance, I tested performance with the methods above. The results show that my methods perform better than using fstream without recorded positions.

Program Features
■ The ComboBox displays the list of reference sequences.
■ The button labelled “Take A Screenshot” takes a screenshot.
■ The progress bar shows whether the preprocessing is done.
■ After the user specifies all settings and clicks the button labelled “View”, the OpenGL window is created to visualize the sequence alignment as follows.
■ In the bottom half of the window, the colors green, blue, yellow and red represent A, T, C and G, respectively.
■ In the middle section, the curve represents the coverage of the segment sequences.
■ The top-right corner displays properties such as the current position, the average coverage and the value on the curve, depending on where the mouse cursor is.
■ Pressing the “A” and “D” keys moves the view left or right.
■ Users can zoom in/out with the mouse wheel to view ranges of varying size.
■ Clicking the button in the top-left corner immediately converts all color squares into the letters A, T, C and G, as shown in the figure above.
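
The offset table from “Method : Records mapping position” and the file mapping from “Additional Improvement : File Mapping” can be combined roughly as follows. This is a minimal Python sketch, not the project's own code: the names `build_offset_table`, `fetch` and `write_demo_sam`, the bin width `BIN_SIZE` and the demo records are all hypothetical, and the sketch assumes a coordinate-sorted SAM file. For simplicity it returns only alignments whose start position lies inside the requested window.

```python
import mmap
import os
import tempfile

BIN_SIZE = 1000  # hypothetical bin width, in bases


def build_offset_table(path, bin_size=BIN_SIZE):
    """One preprocessing pass over a coordinate-sorted SAM file.

    Returns {reference name: {bin index: byte offset of the first
    alignment line whose start position falls in that bin}}.
    """
    table = {}
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            if not line.startswith(b"@"):          # skip header lines
                fields = line.split(b"\t")
                rname, pos = fields[2].decode(), int(fields[3])
                if rname != "*":                   # skip unmapped reads
                    bins = table.setdefault(rname, {})
                    bins.setdefault((pos - 1) // bin_size, offset)
            offset += len(line)                    # byte offset of next line
    return table


def fetch(mm, table, rname, start, end, bin_size=BIN_SIZE):
    """Return alignment lines on `rname` whose 1-based POS is in [start, end].

    `mm` is a memory-mapped view of the SAM file, so the lookup jumps to a
    recorded offset instead of scanning from the beginning of the file.
    """
    bins = table.get(rname, {})
    first_bin = (start - 1) // bin_size
    last_bin = (end - 1) // bin_size
    # First bin in the window that actually contains an alignment.
    hit_bin = next((b for b in range(first_bin, last_bin + 1) if b in bins), None)
    if hit_bin is None:
        return []
    mm.seek(bins[hit_bin])
    hits = []
    while True:
        line = mm.readline()
        if not line:                               # end of file
            break
        fields = line.split(b"\t")
        if fields[2].decode() != rname:            # moved on to another reference
            break
        pos = int(fields[3])
        if pos > end:                              # sorted file: nothing more to find
            break
        if pos >= start:
            hits.append(line.decode().rstrip("\n"))
    return hits


def write_demo_sam():
    """Write a tiny made-up SAM file and return its path."""
    rows = [
        "@HD\tVN:1.6\tSO:coordinate",
        "@SQ\tSN:chr1\tLN:5000",
        "@SQ\tSN:chr2\tLN:5000",
        "r1\t0\tchr1\t5\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t0\tchr1\t1200\t60\t4M\t*\t0\t0\tTTGA\tIIII",
        "r3\t0\tchr1\t1250\t60\t4M\t*\t0\t0\tCCGA\tIIII",
        "r4\t0\tchr1\t3100\t60\t4M\t*\t0\t0\tGGTA\tIIII",
        "r5\t0\tchr2\t10\t60\t4M\t*\t0\t0\tACCA\tIIII",
    ]
    with tempfile.NamedTemporaryFile("wb", suffix=".sam", delete=False) as f:
        f.write(("\n".join(rows) + "\n").encode())
        return f.name


if __name__ == "__main__":
    path = write_demo_sam()
    table = build_offset_table(path)
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for line in fetch(mm, table, "chr1", 1000, 2000):
            print(line)
    os.remove(path)
```

In a viewer, the mapping would presumably stay open for the whole session, and the window passed to `fetch` would follow the user's navigation. A finer bin width shortens each scan at the cost of a larger table.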