HPC Technologies Applied to the Burrows-Wheeler Transform to Enhance Short-Read Assembly
Ignacio Blanquer

Objectives
• To justify the suitability of the Burrows-Wheeler Transform for problems related to NGS, especially alignment and assembly.
• To present how GPUs can provide the computing resources needed for the large-scale problems in assembly and NGS.
• To discuss limitations and other approaches.
• All the work presented here is part of a collaboration between the I3M and the CIPF.
  • Thanks to Ignacio Medina, José Salavert, Joaquin Tárraga and Joaquin Dopazo.
  • First results published in Salavert J, Blanquer I, Tomas A, et al. "Using GPUs for the Exact Alignment of Short-read Genetic Sequences by Means of the Burrows-Wheeler Transform". IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2012.
SeqAhead Workshop on HPC4NGS

I am a computer scientist (nobody is perfect)
• Member of the High Performance and Grid Computing Research Group.
• Working in medical and life science applications.
• Responsible for application communities in the Spanish e-Science Network and the VENUS-C cloud computing project.

Content
• The problem of assembly
• The overlap detection: a main bottleneck for NGS
• Techniques for efficiently mapping short reads
  • Suffix tries and suffix arrays.
  • Burrows-Wheeler Transform.
  • FM-Index.
• Porting of an FM-Index based searching tool to GPUs
  • Zero-error techniques.
  • One-error techniques.
  • Bottlenecks and improvements.
• Extension to the problem of assembly.
• Conclusions.

The Problem of Assembly in NGS
• A puzzle with tens of thousands of millions of pieces
  • Many of them are repeated.
  • Some of them are missing.
  • There is no exact reference, and sometimes there is no reference at all.
• Finding 10^10 needles in 10^10 haystacks!

Stages in the Assembly
• Experiments -> Preprocessing -> Overlap Detection -> Layout -> Consensus Sequence -> Analysis
References:
• Idury, R.M. & Waterman, M.S. "A new algorithm for DNA sequence assembly". J. Comput. Biol. 2, 291–306 (1995).
• T. Chen, S. Skiena, "Trie-Based Data Structures for Sequence Assembly", Combinatorial Pattern Matching 1997.

Computational Analysis
• For the hard stages:
  • Overlap detection (1)
    • In practical terms, limited in the best case by |X|·log(|X|)+|X|·avg(|Xi|).
  • Layout and Consensus Sequence (2)
    • Described as a bidirected weighted graph.
    • Nodes are the different sequences and arrows (edges) describe overlaps.
    • Arrows' weights typically define the unoverlapped fragment.
    • Typically NP-hard, but there exist solutions in O(|E|^2·log^2(|V|)).
    • |E| is the number of edges and |V| is the maximum number of nodes.
(1) Jared T. Simpson and Richard Durbin, "Efficient de novo assembly of large genomes using compressed data structures", Genome Res., December 2011.
(2) Medvedev P, Georgiou K, Myers G, Brudno M, "Computability of Models for Sequence Assembly", Lecture Notes in Computer Science 2007, 4645:289-301.

Overlap Detection

Nomenclature
• X = {X1…Xu} is the set of all the u sequences in an NGS experiment.
• Each sequence Xi has a length of ni elements over an alphabet Σ of 4 symbols.
• For simplicity, quality indicators are not considered in the algorithms.
  • They are actually implemented in the final versions.
• W denotes a sequence to be searched over X or Xi.

The Overlap Detection
[Figure: two reads Xi and Xj overlapping in k elements]
• Problem:
  • To find all pairs Xi, Xj such that Xi -> Xj in at least k elements.
  • Xi -> Xj if Xi[ni-k+1..ni] == Xj[1..k], i.e. the last k elements of Xi equal the first k elements of Xj.
• A brute-force approach requires checking each Xi against every Xj, i != j.
  • An NGS experiment may involve 20 Gigabases.
  • Infeasible for any traditional searching process (FASTA, BLAST, SW, etc.).
• Advanced searching structures are needed.

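To make the cost of the brute-force approach concrete, here is a minimal Python sketch of the all-against-all check. The function names are illustrative, and the three short reads are the ones used later in the multi-sequence examples; real data would have around 10^8 reads, which is exactly why this approach does not scale.

```python
def overlap_length(xi, xj, k):
    """Length of the longest suffix of xi that equals a prefix of xj,
    considering only overlaps of at least k symbols (0 if none)."""
    for m in range(min(len(xi), len(xj)), max(k, 1) - 1, -1):
        if xi[-m:] == xj[:m]:
            return m
    return 0

def brute_force_overlaps(reads, k):
    """Compare every ordered pair (i, j), i != j: quadratic in the number of reads."""
    hits = []
    for i, xi in enumerate(reads):
        for j, xj in enumerate(reads):
            if i != j:
                m = overlap_length(xi, xj, k)
                if m:
                    hits.append((i, j, m))
    return hits

reads = ["AGGAGC", "GAGCTAG", "GCTAGA"]
pairs = brute_force_overlaps(reads, 3)  # (index of Xi, index of Xj, overlap length)
```

With u reads the double loop performs u·(u-1) comparisons, which motivates the indexed structures discussed next.
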
The Overlap Detection
• Issues
  • Computational time
    • We should avoid complete cross comparison.
    • Linear or quasi-linear methods are needed.
  • Memory storage
    • Indexed searching requires 9-10 bytes per base.
    • This would mean around 200 GB of RAM.
• Efficient structures are needed.

Searching Structures
• Different structures speed up the searching process while also reducing memory requirements:
  • Suffix arrays
  • Suffix tries
  • BWT-based suffix tries
  • FM-Index.
• These techniques are also valuable when searching for short seeds that are then extended using dynamic programming
  • E.g. Smith-Waterman.

Suffix Arrays & Suffix Trees

Suffix Arrays
• A sorted list of the indexes to the different suffixes of a sequence.
• Can be built in O(n·log(n)) time, where n is the size of the text.
• Needs 6n bytes, and searching for a string of length p requires O(p·log(n)).

Example: X = AGGAGC (positions 1 to 6)

Start  Suffix   Position in sorted order
6      C        3
5      GC       5
4      AGC      1
3      GAGC     4
2      GGAGC    6
1      AGGAGC   2

SA = 4 1 6 3 5 2

Binary search for W = "GAG":
Li=1; Ls=6 -> k=(1+6)/2 -> 3; SA(3) = 6 -> X(6:$) = "C" < W
Li=4; Ls=6 -> k=(4+6)/2 -> 5; SA(5) = 5 -> X(5:$) = "GC" > W
Li=4; Ls=4 -> k=(4+4)/2 -> 4; SA(4) = 3 -> X(3:$) = "GAGC" matches W

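The suffix array example can be reproduced with a short Python sketch. It is a naive illustration only: sorting the suffixes directly is much slower than the O(n·log(n)) construction mentioned on the slide, and the function names are assumptions made for this example.

```python
def suffix_array(x):
    """1-based start positions of the suffixes of x, in lexicographic order.
    Naive construction by direct sorting, for illustration only."""
    return sorted(range(1, len(x) + 1), key=lambda i: x[i - 1:])

def find(x, sa, w):
    """Binary search over the suffix array.
    Returns a 1-based position where w occurs in x, or None."""
    lo, hi = 0, len(sa) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        start = sa[mid] - 1
        prefix = x[start:start + len(w)]  # compare only the first |w| symbols
        if prefix == w:
            return sa[mid]
        if prefix < w:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

x = "AGGAGC"
sa = suffix_array(x)
position = find(x, sa, "GAG")
```

Each probe compares at most |W| symbols, and the number of probes is logarithmic in n, which gives the O(p·log(n)) search cost.
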
Suffix Trees and Suffix Tries
• A Trie (from reTRIEval) is a special tree used to code the suffixes of a string or group of strings.
• Equivalent to the Suffix Array.
• By condensing the different leaves, a Suffix Tree is obtained.
[Figure: suffix trie and suffix tree for X = AGGAGC (positions 1 to 6)]

Burrows-Wheeler Transform

Burrows-Wheeler Transform
• Typically used in bzip compression and text searching.
• It consists of the characters that precede each of the suffixes, taken in Suffix Array order.
• It can be seen as the last character of all the sorted rotations of the reference sequence.
• The BWT groups all possible suffixes, speeding up the searching.
[Figure: sorted rotations of the reference, rows 0 to 6, with Suffix Array 6 3 0 5 2 4 1]

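Both views of the transform (preceding characters in suffix array order, and last column of the sorted rotations) can be checked with a small Python sketch. It assumes the text ends with a `$` terminator that sorts before every other symbol, and it builds everything by naive sorting.

```python
def bwt_from_suffix_array(x):
    """x must end with '$'. Returns the 0-based suffix array and the BWT:
    for each sorted suffix, the symbol that precedes it in x."""
    sa = sorted(range(len(x)), key=lambda i: x[i:])
    # x[i - 1] with i == 0 wraps around to the final '$'
    return sa, "".join(x[i - 1] for i in sa)

def bwt_from_rotations(x):
    """Same transform, read as the last column of the sorted rotations."""
    rotations = sorted(x[i:] + x[:i] for i in range(len(x)))
    return "".join(rot[-1] for rot in rotations)

sa, b = bwt_from_suffix_array("AGGAGC$")
b_rot = bwt_from_rotations("AGGAGC$")
```

For the running example both functions give the same string, and the suffix array agrees with the row order shown on the slide.
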
Recovering the original text from BWT
• In order to recover the original text, only the first (F) and last (B, the BWT) columns are needed.
• Starting from the last symbol, B(2) -> '$'.
• The first symbol of the original string should be F(2) -> 'A'.
• The following symbol should be the first one in the row that ends with this 'A'.
• Since it is the second 'A' in F, it should also be the second one in the BWT -> B(6).
• So the next symbol is F(6) -> 'G'.
• Repeating the process gives the original string:
  • F(2), F(6), F(4), F(1), F(5), F(3), F(0)
  • AGGAGC$

X = AGGAGC$ (positions 0 to 6), B = CG$GGAA

i   SA  F  B
0   6   $  C
1   3   A  G
2   0   A  $
3   5   C  G
4   2   G  G
5   4   G  A
6   1   G  A

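The walk through the F and B columns can be written as a few lines of Python. This is a sketch of the procedure described above, not an optimised implementation; it relies on the fact that the r-th occurrence of a symbol in F and the r-th occurrence of the same symbol in B refer to the same text position.

```python
def inverse_bwt(b):
    """Rebuild the original text (ending in '$') from its BWT string b."""
    f = sorted(b)                      # first column: the symbols of b, sorted
    # row in B of the r-th occurrence of each symbol
    seen, row_in_b = {}, {}
    for i, sym in enumerate(b):
        seen[sym] = seen.get(sym, 0) + 1
        row_in_b[(sym, seen[sym])] = i
    # occurrence number of each entry of F
    seen, rank_in_f = {}, []
    for sym in f:
        seen[sym] = seen.get(sym, 0) + 1
        rank_in_f.append(seen[sym])
    row = b.index("$")                 # the row whose last symbol is '$'
    visited, text = [], []
    for _ in range(len(b)):
        visited.append(row)
        text.append(f[row])
        row = row_in_b[(f[row], rank_in_f[row])]
    return "".join(text), visited

text, visited = inverse_bwt("CG$GGAA")
```

The `visited` list records the rows in the order they are read, so it can be compared with the sequence F(2), F(6), F(4), ... given on the slide.
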
Burrows-Wheeler Transform (BWT)
• The leaves of the suffix tree are annotated with the range of suffixes that match the input sequence.
• The advantage is that each node groups several matches and provides additional information to deal with errors.
[Figure: search tree for R = "^AGGAGC$"; each node is annotated with its interval of Suffix Array rows, starting from (0,6) at the root]

Burrows-Wheeler Transform (BWT)
Searching AGC with at most 1 mismatch over R = "AGGAGC$" (upper case marks the symbols already processed):
• $ [0,6]
• agA (1 mismatch) [1,2]. A, C excluded.
  • aGA (1 mismatch) [4,4]. A, C excluded.
    • GGA (2 mismatches) [6,6] -> discarded.
• agC (0 mismatches) [3,3]. A, C excluded.
  • aGC (0 mismatches) [5,5]. C, G excluded.
    • AGC (0 mismatches) [1,1] -> match.
• agG (1 mismatch) [4,6]. C excluded.
  • aAG (2 mismatches) [1,2] -> discarded.
  • aGG (1 mismatch) [6,6]. C, G excluded.
    • AGG (1 mismatch) [2,2] -> match.
[Figure: the same search tree, with the explored branches highlighted]

FM-Index

FM-Index
• Presented by Ferragina and Manzini (*).
• Provides an efficient way to construct and traverse a BWT suffix tree.
  • Construction in O(n) time once the BWT is constructed.
  • Searching in linear time, proportional to the length of the input sequence.
(*) Ferragina, P. and Manzini, G. (2000). Opportunistic data structures with applications. In 41st IEEE Symposium on Foundations of Computer Science, FOCS, 390-398.

FM-Index
• Using the BWT, two data structures are created that enable searching in linear time.
• Vector C contains, for each symbol of the alphabet, the cumulative number of occurrences in the BWT of its predecessors (the symbols that are alphabetically lower).
• Matrix O contains the number of occurrences of each symbol up to each element of the BWT.
BWT = "CG$GGAA" (positions 0 to 6)
[Tables: vector C, indexed by A, C, G, T; matrix O, with one row per symbol A, C, G, T and columns 0 to 7]

Searching with the FM-Index
• Searching along the tree
  • We use the formula:
    k = C(b) + O(b, k) + 1
    l = C(b) + O(b, l + 1)
    where b is the character to be processed.
  • The string is searched in reverse order.
• C represents the number of suffixes whose first symbol is alphabetically lower
  • I.e. the offset in the M matrix of the BWT.
• O represents the offset within the block of sequences where the complete current sequence could appear.

Searching with the FM-Index: Example
• X = "AGGAGC", W = "GAG"
• "G": k=0, l=6
  • k = C(G)+O(G,k)+1 = 3+0+1 = 4
  • l = C(G)+O(G,l+1) = 3+3 = 6
  • [4,6] are the 3 suffixes starting with "G".
• "A": k=4, l=6
  • k = C(A)+O(A,k)+1 = 0+0+1 = 1
  • l = C(A)+O(A,l+1) = 0+2 = 2
  • [1,2] are the 2 suffixes starting with "AG".
• "G": k=1, l=2
  • k = C(G)+O(G,k)+1 = 3+0+1 = 4
  • l = C(G)+O(G,l+1) = 3+1 = 4
  • [4,4] is the suffix starting with "GAG".

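The worked example can be checked with a small Python sketch of the two structures and the update formula. It follows the indexing convention the example appears to use, which is an assumption on my part: `C` does not count the `$` terminator, and `O[c][i]` is the number of occurrences of `c` in the first `i` symbols of the BWT, so `O` has one more column than the BWT has symbols.

```python
ALPHABET = "ACGT"

def fm_index(b):
    """Build vector C and matrix O from a BWT string b (which contains one '$')."""
    C, total = {}, 0
    for c in ALPHABET:
        C[c] = total                  # symbols alphabetically lower than c
        total += b.count(c)
    O = {c: [0] for c in ALPHABET}    # O[c][i] = occurrences of c in b[0:i]
    for sym in b:
        for c in ALPHABET:
            O[c].append(O[c][-1] + (sym == c))
    return C, O

def backward_search(w, C, O, n):
    """Process w from its last symbol to its first, returning every (k, l) interval."""
    k, l = 0, n - 1
    steps = []
    for sym in reversed(w):
        k, l = C[sym] + O[sym][k] + 1, C[sym] + O[sym][l + 1]
        steps.append((k, l))
        if k > l:                     # empty interval: w does not occur
            break
    return steps

b = "CG$GGAA"                         # BWT of AGGAGC$
C, O = fm_index(b)
steps = backward_search("GAG", C, O, len(b))
```

Each step costs two lookups in `O`, independent of the size of the reference, which is where the search time proportional to |W| comes from.
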
Searching with the FM-Index (*)
• Searching starts from the end of the string.
• At element "j", up to 9 possible branches should be explored:
  • Exact matching: the error counter is not increased, and new values for k and l are calculated according to the formula. The process continues with element j-1.
  • Mismatch: the search tree indicates that there are additional matches in the reference that differ in the current symbol. If the error counter has not reached the maximum allowed, all the possible branches are explored (up to 3). New values for k and l are calculated, the error counter is incremented and processing continues with element j-1.
  • Deletion: the algorithm considers that the current symbol may have been inserted, and searches for matches in the reference skipping the current symbol. The error counter is increased (if possible) and processing continues with element j-1, but the values of k and l are kept unmodified.
  • Insertion: the algorithm considers that a symbol is missing and checks the tree for the possible branches that include a new symbol at the present position (up to 4 branches). The error counter is increased, new values for k and l are calculated and processing continues with the same symbol.
(*) Heng Li and Richard Durbin, "Fast and accurate short read alignment with Burrows–Wheeler Transform", Bioinformatics Vol. 25 no. 14, 2009, doi:10.1093/bioinformatics/btp324

Dealing with errors
• Early termination
  • It is possible to predict whether a branch will lead to an infeasible solution by computing the number of errors that have to be assumed.
  • It requires computing vector D for each searched sample, using the inverted reference string.
  • It reduces the branching explosion.

ComputeD(W)
  z ← 0
  j ← 0
  for i = 0 to |W|−1 do
    if W[j,i] is not a substring of X then
      z ← z+1
      j ← i+1
    D(i) ← z

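The general case: recursive approach

CalculateD(W)
  k ← 1
  l ← |X|−1
  z ← 0
  for i = 0 to |W|−1 do
    k ← C(W[i]) + O(W[i], k−1) + 1
    l ← C(W[i]) + O(W[i], l)
    if k > l then
      k ← 1
      l ← |X|−1
      z ← z+1
    D(i) ← z

InexRecur(W, i, z, k, l)
  if z < D(i) then return ∅
  if i < 0 then return {[k, l]}
  I ← ∅
  I ← I ∪ InexRecur(W, i−1, z−1, k, l)
  for each b ∈ {A, C, G, T} do
    k' ← C(b) + O(b, k−1) + 1
    l' ← C(b) + O(b, l)
    if k' ≤ l' then
      I ← I ∪ InexRecur(W, i, z−1, k', l')
      if b = W[i] then
        I ← I ∪ InexRecur(W, i−1, z, k', l')
      else
        I ← I ∪ InexRecur(W, i−1, z−1, k', l')
  return I

In CalculateD, C and O refer to the index of the inverted reference string. In InexRecur, the interval of each branch is kept in k' and l' so that every symbol b is tried from the same starting interval [k, l].
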
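A minimal Python sketch of the two procedures, to make the recursion and the pruning by D concrete. Several things here are assumptions for the example and not taken from the slides: the helper names, the naive index construction, and the row convention of the earlier exact-search example (initial interval [0, n-1], `O[c][i]` counting `c` in the first `i` BWT symbols) in place of the `k−1` indexing of the pseudocode.

```python
ALPHABET = "ACGT"

def fm_index(text):
    """Naive FM-index of text + '$': (C, O, number of rows)."""
    t = text + "$"
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    b = "".join(t[i - 1] for i in sa)
    C, total = {}, 0
    for c in ALPHABET:
        C[c] = total
        total += text.count(c)
    O = {c: [0] for c in ALPHABET}          # O[c][i] = occurrences of c in b[0:i]
    for sym in b:
        for c in ALPHABET:
            O[c].append(O[c][-1] + (sym == c))
    return C, O, len(t)

def step(C, O, b, k, l):
    """One backward-search update of the interval [k, l] with symbol b."""
    return C[b] + O[b][k] + 1, C[b] + O[b][l + 1]

def calculate_d(w, rev_index):
    """D[i]: lower bound on the differences needed to align W[0..i].
    Uses the index of the reversed reference."""
    C, O, n = rev_index
    k, l, z, d = 0, n - 1, 0, []
    for sym in w:
        k, l = step(C, O, sym, k, l)
        if k > l:                            # W[j..i] is not a substring of X
            k, l, z = 0, n - 1, z + 1
        d.append(z)
    return d

def inex_recur(w, i, z, k, l, d, index):
    C, O, _ = index
    if z < (d[i] if i >= 0 else 0):          # not enough differences left: prune
        return set()
    if i < 0:
        return {(k, l)}
    hits = inex_recur(w, i - 1, z - 1, k, l, d, index)             # skip W[i]
    for b in ALPHABET:
        k2, l2 = step(C, O, b, k, l)
        if k2 <= l2:
            hits |= inex_recur(w, i, z - 1, k2, l2, d, index)      # extra reference symbol
            cost = 0 if b == w[i] else 1                           # match or mismatch
            hits |= inex_recur(w, i - 1, z - cost, k2, l2, d, index)
    return hits

def inexact_search(x, w, max_diff):
    index, rev_index = fm_index(x), fm_index(x[::-1])
    d = calculate_d(w, rev_index)
    return inex_recur(w, len(w) - 1, max_diff, 0, index[2] - 1, d, index)

hits = inexact_search("AGGAGC", "AGC", 1)
```

For the AGGAGC example with one allowed difference, the result contains the intervals [1,1] (AGC) and [2,2] (AGG) from the mismatch-only walkthrough, plus the intervals reached by skipping one symbol of W.
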
Difficulties in using GPUs
• The recursive model, although supported in the latest versions, is not effective.
• Multiple branches reduce the degree of parallelism.
• GPU memory is limited (insufficient for the human genome).
• Memory access coherence has a critical impact on the final performance.
• Simplifications may be needed.
• GPU-CPU cooperation is the key to success.

Exact Search
• By removing (or limiting) the branching for multiple errors, the code for processing multiple sequences simultaneously can be homogeneous.
  • Different sequences have different values of k and l.
• However, different sequences can stop at different steps
  • Due to different lengths or the presence of mismatches.
  • Pre-sorting by size could speed up the algorithm.

Exact Search with GPUs

void BWSearchGPU(W[][], nW[], k[], l[], k_ini, l_ini, C, O)
{
  // One search string per CUDA thread
  id_thread = blockIdx.x * blockDim.x + threadIdx.x;
  // The first 4 threads of each block copy vector C to shared memory
  if (threadIdx.x < 4) CopyToSharedMemory(C);
  __syncthreads();
  k2 = k_ini; l2 = l_ini;
  // Backward search: stop as soon as the interval becomes empty (k2 > l2)
  for (i = nW[id_thread]-1; (k2 <= l2) && (i >= 0); i--)
    BWiteration(k2, l2, k2, l2, W[id_thread][i], C, O);
  k[id_thread] = k2;
  l[id_thread] = l2;
}

• GPU parallelization is achieved by running simultaneous searches, one on each CUDA thread.
• The FM-index (C and O vectors) of the reference is copied to the GPU before searching.
• The search strings (W) and the transform intervals (k, l) must be transferred between CPU and GPU.

Extension to 1 error
• Each node can lead to up to 9 branches.
• E.g. AGC
[Figure: search tree for AGC with up to one error; branches such as agA, aGA, agG, aGG, AGG, agC, aGC and AGC are annotated with their intervals, and the accepted branches are marked "Match"]

Support for a variable number of errors
• Use exact searching to pre-filter the sequences that lead to an exact match
  • Short computing time.
  • It may reduce the problem by 39%.
• For the sequences not found, define a threshold and use the matching fragments as seeds
  • The rest of the sequence can be aligned using Smith-Waterman or similar approaches.
• 1-error searching slightly increases alignment time (overlapped), but increases accuracy (42%).

Towards a useful tool: combination of CPU and GPU

Other Optimizations
• The O matrix is huge
  • The number of elements stored can be reduced by storing only one element out of every 32 and storing the changes as bits in a 32-bit word.
  • This enables storing the whole O for the human genome in state-of-the-art GPU boards.
  • Performance is not compromised thanks to the use of machine instructions, such as (_popcnt).
• Partial sorting of the reference
  • Considering genetic variability, the ordering of the S array can take into account only the first n<|X| elements.
  • However, this is incompatible with the compression of the S vector.
• Overlapping I/O and processing
  • Input and output of the different blocks of sequences take place during the processing in the GPU.

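The sampling of O can be illustrated with a short Python sketch: one stored count per block of 32 BWT positions for a given symbol, plus a 32-bit mask whose set bits mark where that symbol occurs inside the block. The names `compress` and `occ` are illustrative, and `bin(...).count("1")` stands in for the hardware population-count instruction that a GPU or CPU implementation would use.

```python
BLOCK = 32

def compress(b, c):
    """For symbol c in BWT string b: one sampled count per block of 32
    positions, plus a bit mask marking the occurrences inside each block."""
    samples, masks = [], []
    count = 0
    for start in range(0, len(b), BLOCK):
        samples.append(count)
        mask = 0
        for off, sym in enumerate(b[start:start + BLOCK]):
            if sym == c:
                mask |= 1 << off
        masks.append(mask)
        count += bin(mask).count("1")
    samples.append(count)   # sentinel block, so that i == len(b) is always valid
    masks.append(0)
    return samples, masks

def occ(samples, masks, i):
    """Occurrences of c in b[0:i]: sampled count + popcount of the low bits."""
    blk, off = divmod(i, BLOCK)
    low_bits = masks[blk] & ((1 << off) - 1)
    return samples[blk] + bin(low_bits).count("1")

b = "CG$GGAA" * 10
samples, masks = compress(b, "G")
g_before_40 = occ(samples, masks, 40)
```

Each query touches one sample and one mask, so the lookup stays constant-time while the storage per symbol drops from one counter per position to one counter and one 32-bit word per 32 positions.
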
Other Optimizations
• Compression of the Suffix Array
  • The Suffix Array is, again, huge (one integer per element of the BWT).
  • Compression is feasible by storing a fraction of the elements (with a fixed stride) and iterating with the formula:
    S(k) = S((Ψ^-1)^(j)(k)) + j
    Ψ^-1(i) = C(B[i]) + O(B[i], i)
    where (Ψ^-1)^(j) denotes applying Ψ^-1 j times.
• Combining searching in both strands.

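A small Python sketch of the suffix array compression idea: keep only every `stride`-th row of S and recover any other entry by repeatedly applying the inverse mapping until a stored row is reached. The indexing convention is an assumption chosen to keep the example self-consistent (here `C` counts the `$` terminator as well, and the rank of B[i] excludes position i), so the constants differ slightly from the slide's formula.

```python
def build(text, stride):
    """Returns the full suffix array (for checking only), the mapping
    row -> row of the suffix starting one position earlier, and the sampled rows."""
    t = text + "$"
    n = len(t)
    sa = sorted(range(n), key=lambda i: t[i:])
    b = [t[i - 1] for i in sa]
    C, total = {}, 0
    for c in sorted(set(t)):
        C[c] = total                       # symbols of t smaller than c, '$' included
        total += t.count(c)
    seen, psi_inv = {}, []
    for sym in b:
        rank = seen.get(sym, 0)            # occurrences of sym in b before this row
        psi_inv.append(C[sym] + rank)
        seen[sym] = rank + 1
    sampled = {row: sa[row] for row in range(0, n, stride)}
    return sa, psi_inv, sampled

def locate(row, psi_inv, sampled, n):
    """S(row) recovered from the sampled rows only."""
    j = 0
    while row not in sampled:
        row = psi_inv[row]
        j += 1
    return (sampled[row] + j) % n          # modulo handles the wrap past position 0

sa, psi_inv, sampled = build("AGGAGC", 3)
recovered = [locate(row, psi_inv, sampled, len(sa)) for row in range(len(sa))]
```

The stride trades memory for time: a larger stride stores fewer integers but needs more iterations per lookup.
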
Response time: find and show 1 match

Response time: show all matches

Response time: for different block times

Speed-up: for different input sizes

Distribution of processing time
[Figure: two panels, "With disk caching" and "Without disk caching"]

Use of BWT in Assembly

Can we directly apply the FM-index?
• Construct a BWT for each of the Xi sequences
  • Computational time: sum(|Xi|·log(|Xi|)+|Xi|)
• Search each Xi sequence over all the BWTs
  • Computational time: |X|·sum(|Xi|)
• Memory requirements (FM-index)
  • O: 4·8·sum(|Xi|)
  • S: 8·sum(|Xi|)
• Instantiation: 200 million sequences of 100 bases
  • Computational time > Pf.
  • Memory requirements: > 600 GB
• Infeasible!

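As a quick sanity check of the instantiation, the formulas can be evaluated directly. This sketch assumes that the memory formulas are expressed in bytes and that GB means 10^9 bytes; neither is stated on the slide.

```python
reads = 200_000_000                 # |X|: number of sequences
read_len = 100                      # bases per sequence
bases = reads * read_len            # sum(|Xi|)

search_ops = reads * bases          # |X| * sum(|Xi|): every read against every BWT
o_bytes = 4 * 8 * bases             # O: 4 symbols x 8 bytes per position
s_bytes = 8 * bases                 # S: 8 bytes per position
total_gb = (o_bytes + s_bytes) / 1e9
```

Under these assumptions the index alone takes about 800 GB, in line with the "> 600 GB" bound, and the search needs on the order of 4·10^18 elementary steps.
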
The SGA (*) algorithm
• One approach is to create a single BWT that can be used to search all the hits for each sequence simultaneously.
• Ideally, once the multiple BWT is created, the computing time is linear in the size of the sequencing data.
• Moreover, the multiple BWT already gives information about similar sequences.
(*) Jared T. Simpson and Richard Durbin, "Efficient de novo assembly of large genomes using compressed data structures", Genome Res., December 2011.

Suffix arrays for multiple sequences
• A Suffix Array can be extended to cover a set of sequences
  • SA(i) = (j,k)
  • In the j-th sequence, the suffix [Sj(k) .. Sj(|Sj|)] occupies the i-th position in alphabetical order.
• All sequences are terminated by a $j symbol, where $j is alphabetically lower than any symbol of the alphabet and $p < $q if p<q.
• If two suffixes are equal, the order is given by the order of the sequences.

A picture is worth a million words
R1 = AGGAGC$1
R2 = GAGCTAG$2
R3 = GCTAGA$3

i    SA(i)   Suffix
1    (1,7)   $1
2    (2,8)   $2
3    (3,7)   $3
4    (3,6)   A$3
5    (2,6)   AG$2
6    (3,4)   AGA$3
7    (1,4)   AGC$1
8    (2,2)   AGCTAG$2
9    (1,1)   AGGAGC$1
10   (1,6)   C$1
11   (2,4)   CTAG$2
12   (3,2)   CTAGA$3
13   (2,7)   G$2
14   (3,5)   GA$3
15   (1,3)   GAGC$1
16   (2,1)   GAGCTAG$2
17   (1,5)   GC$1
18   (2,3)   GCTAG$2
19   (3,1)   GCTAGA$3
20   (1,2)   GGAGC$1
21   (2,5)   TAG$2
22   (3,3)   TAGA$3

Definition for the BWT
SA(i) = (j, k)
B(i) = Rj(k-1)
R1 = AGGAGC$1
R2 = GAGCTAG$2
R3 = GCTAGA$3

i    SA(i)   B(i)  F(i)
1    (1,7)   C     $1
2    (2,8)   G     $2
3    (3,7)   A     $3
4    (3,6)   G     A
5    (2,6)   T     A
6    (3,4)   T     A
7    (1,4)   G     A
8    (2,2)   G     A
9    (1,1)   $1    A
10   (1,6)   G     C
11   (2,4)   G     C
12   (3,2)   G     C
13   (2,7)   A     G
14   (3,5)   A     G
15   (1,3)   G     G
16   (2,1)   $2    G
17   (1,5)   A     G
18   (2,3)   A     G
19   (3,1)   $3    G
20   (1,2)   A     G
21   (2,5)   C     T
22   (3,3)   C     T

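FM-Index for multiple sequences
BWT = C G A G T T G G $1 G G G A A G $2 A A $3 A C C
C = [3 9 12 20] (for A, C, G, T)

i      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
O(A) = 0  0  1  1  1  1  1  1  1  1  1  1  2  3  3  3  4  5  5  6  6  6
O(C) = 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  2  3
O(G) = 0  1  1  2  2  2  3  4  4  5  6  7  7  7  8  8  8  8  8  8  8  8
O(T) = 0  0  0  0  1  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
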
Searching with the FM-index
W = GAGCTAG$2, searched from its last base backwards over the table of i, SA(i), B(i), F(i) for R1, R2 and R3.
(k, l) = (2, 22)
k' = C(x)+O(x,k-1)+1
l' = C(x)+O(x,l)
G -> (C(G)+O(G,1)+1, C(G)+O(G,22)) -> (12+0+1, 12+8) = (13, 20)
A -> (C(A)+O(A,12)+1, C(A)+O(A,20)) -> (3+1+1, 3+6) = (5, 9)
T -> (C(T)+O(T,4)+1, C(T)+O(T,9)) -> (20+0+1, 20+2) = (21, 22)
C -> (C(C)+O(C,20)+1, C(C)+O(C,22)) -> (9+1+1, 9+3) = (11, 12)
G -> (C(G)+O(G,10)+1, C(G)+O(G,12)) -> (12+5+1, 12+7) = (18, 19)

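The multi-sequence example can be reproduced with a compact Python sketch. It is a naive construction for illustration, not the SGA implementation: each terminator $j is encoded as the pair `(0, j)` and each base as `(1, base)`, so that terminators sort before the bases and by read number. Function names are illustrative.

```python
ALPHABET = "ACGT"

def multi_bwt(reads):
    """Generalised suffix array (1-based (j, k) pairs) and BWT of a set of reads."""
    enc = [[(1, ch) for ch in r] + [(0, j)] for j, r in enumerate(reads, 1)]
    suffixes = sorted((seq[k:], j, k + 1)
                      for j, seq in enumerate(enc, 1)
                      for k in range(len(seq)))
    sa = [(j, k) for _, j, k in suffixes]
    # symbol preceding each suffix; for k == 1 this wraps around to the read's own $j
    bwt = [enc[j - 1][k - 2] for j, k in sa]
    return sa, bwt

def fm_index(bwt, n_reads):
    C, total = {}, n_reads                 # the $j symbols precede every base
    for c in ALPHABET:
        C[c] = total
        total += sum(1 for s in bwt if s == (1, c))
    O = {c: [0] for c in ALPHABET}         # O[c][i] = occurrences of c in B(1..i)
    for s in bwt:
        for c in ALPHABET:
            O[c].append(O[c][-1] + (s == (1, c)))
    return C, O

def backward_steps(w, C, O, n):
    """Intervals (k, l), 1-based, after each symbol of w processed right to left."""
    k, l, steps = 1, n, []
    for x in reversed(w):
        k, l = C[x] + O[x][k - 1] + 1, C[x] + O[x][l]
        steps.append((k, l))
        if k > l:
            break
    return steps

reads = ["AGGAGC", "GAGCTAG", "GCTAGA"]
sa, bwt = multi_bwt(reads)
C, O = fm_index(bwt, len(reads))
steps = backward_steps("GAGCTAG", C, O, len(bwt))
bwt_text = "".join(f"${sym}" if kind == 0 else sym for kind, sym in bwt)
```

After the fifth step the interval is (18, 19); row 19 is SA = (3,1), i.e. read 3 starts with GCTAG, the last five bases of read 2. Finding a row with k = 1 from another read inside the interval is what signals an overlap.
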
A picture is worth a million words
R1 = AGGAGC$1
R2 = GAGCTAG$2
R3 = GCTAGA$3
W = GAGCTAG$2
[Figure: the sorted suffixes of R1, R2 and R3 with their SA(i) entries, rows 1 to 22, shown again with the search for W]