110 likes | 241 Views
Processor Memory Networks Based on Steiner Systems. Tom VanCourt Boston University Martin C. Herbordt ECE Department. Introduction. Problem: “Big Science” brought genomes to the desktop Primary analysis engine is still PC! Our Goal: Bring “Big Computation” to the desktop.
E N D
Processor Memory Networks Based on Steiner Systems Tom VanCourt Boston University Martin C. Herbordt ECE Department
Introduction • Problem: “Big Science” brought genomes to the desktop Primary analysis engine is still PC! • Our Goal: Bring “Big Computation” to the desktop. • Proposed Solution: PC/Workstation plus FPGA-based coprocessor
Desired Problem Characteristics • Low resolution data Data have few bits • Large but manageable data sets Thousands of genes, thousands of samples (at a time) • High-dimensional parameter set Must be searched or enumerated to identify solution • Simple performance criteria (score functions) Evaluated for each candidate parameter vector • Decomposable search strategy Multiple small problems to solve in parallel • Heavy reuse of data Combinations, permutations, orientations
Biology/Chemistry Problems • Sequence alignment Approximate string matching, dynamic programming • Molecule interactions – voxel model 3-axis rotation, 3D convolution • Microarray data analysis Typical: 10 to 10 2 samples,10 3 to 10 4 genes/sample • Hidden Markov models Baum-Welch training, soft Viterbi decode • Compute-intensive statistical analysis Bootstraps, sampling background models
Va f Vb Vc Sample Task-Specific Processor • Input: Vectors V1, V2, … Vv • Query: Which set of t vectors maximizes f(Va, Vb, …)? • Architecture: Parallel PEs on FPGA coprocessor
Problem: Input Bandwidth • Assume ~128 PEs × 3 inputs per PE = 384 values per cycle × 4 bits per value = 1536 bits per cycle • Wide-word solution: INFEASIBLE 400-ported RAM? Data fetched faster than host can load it 384 input values needed
k 3 Distribution Network • X memory: k data values supply PEs. k= 9 84 PEs, 252 PEs X inputs, 28× reuse k= 10 120 PEs, 360 PEs X inputs, 36× reuse • Generates all size-3 subsets of k data values X1-10 Vector Data Memory … PE119 PE0 PE1 PE2 PE3 PE4
Vector Data Memory • Steiner systemS(v, k, t) Dividevobjects into subsets of sizek, so that every size-tsubset is in just onesize-ksubset. • t= 3 (triplets),k= X memories,v= total genes • Host selects vector sets {m1, m2,…}via RAM content ~100 vectors per X memory 105-106 sets of vector choices indx1 Vector Select RAM X1 indx2 X2 indxm Xm indxSEL
Two-level data reuse Temporal reuse by Vector Select Ram Spatial reuse by Distribution Network Whole VDM duplicated Double buffering Overlaps reload, reading Memory and Data Distribution k … Vector data memory (VDM) Dividevobjects into subsets of sizek … … so that every size-3subset is injust onesize-ksubset
Conditions for Success • Data must be strings/vectors If scalar, then Vector Select RAM would be enough … host to PE transfer would be enough • Longer vectors better Longer vector More time per Vector Select word … Longer time between reloads • Narrow data words better Fewer bits per vector, more vectors for bandwidth
Open problems • Steiner systems are special cases k-setsthat contain eacht-setexactly once Theory guarantees large numbers of cases • Set-covering problem in other cases k-setsthat contain eacht-setat least once • Finding collections of k-setshard Believed NP hard Constructive forms of existence theorems?