SAXS Scatter Performance Analysis
Chris Wilcox, 2/6/2008
Scatter Status
• Prototype of the basic algorithm, supporting an arbitrary number of atoms and any topology.
• Atom types: C, N, O, H, P, S, Zn; more are very easy to add.
• Matches results from Stefan's original R prototype for several small molecules.
• Computes the intensity function over a specified number of steps.
Scatter Performance (Current)
• Original algorithm, no optimization, debug build: 5000 atoms = ~60 hours
• Original algorithm, no optimization, release build: 5000 atoms = ~4 hours
• Obvious restructuring, pre-computed factors, release build: 5000 atoms = ~39 minutes
• Avoided redundant work, used compiler optimization flags, release build: 5000 atoms = ~19 minutes
Measured on a Pentium Core Duo mobile CPU, 1.66 GHz.
Scatter Performance (Analysis)
• Scatter factors are pre-computed and take ~0% of the run time in the fastest version.
• Distance calculations are step-independent; they take ~3%, almost entirely due to the SQRT function.
• The FSIN function appears to consume ~60% of processor cycles; is there an alternative?
• The intensity calculation itself uses ~86% of the cycles; this needs to be re-verified on the latest version.
No real optimization yet; the compiler wins anyway!
Scatter Performance (Model)
N = number of atoms, S = number of steps, A = number of atom types
• Scatter factors are O(S·A) × (4 exp + 4 pow + 4 fmul), i.e. 10K iterations for 1000 steps and 10 types.
• Distance math is O(N²/2) × (1 sqrt + 3 fmul + 2 fadd), i.e. 12.5M iterations for 5000 atoms, independent of the number of steps.
• Intensity math is O(S·N²/2) × (1 fsin + 9 fmul + 2 fadd), i.e. 12.5G iterations for 1000 steps and 5000 atoms.
• Operation counts are based on reading the code; actual floating-point instructions are ~2x more frequent.
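The example figures in the model follow directly from the formulas. A small sketch that reproduces them, using a hypothetical `model_counts` helper; `intensity_ops` multiplies the intensity iterations by the 12 operations per iteration listed above (1 fsin + 9 fmul + 2 fadd), before the ~2x instruction-count correction:

```c
#include <stdint.h>

/* Iteration and operation counts from the performance model. */
typedef struct {
    uint64_t factors;       /* S*A scatter-factor evaluations         */
    uint64_t distances;     /* N^2/2 pairwise distances               */
    uint64_t intensity;     /* S*N^2/2 inner-loop iterations          */
    uint64_t intensity_ops; /* 12 ops each: 1 fsin + 9 fmul + 2 fadd  */
} ModelCounts;

ModelCounts model_counts(uint64_t n_atoms, uint64_t s_steps, uint64_t a_types)
{
    ModelCounts m;
    m.factors = s_steps * a_types;
    m.distances = n_atoms * n_atoms / 2;
    m.intensity = s_steps * m.distances;
    m.intensity_ops = 12 * m.intensity;
    return m;
}
```

For N = 5000, S = 1000 and A = 10 this yields 10,000, 12,500,000 and 12,500,000,000 iterations, and 150 billion operations in the intensity loop.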
Scatter Performance (Future)
• Complete the optimizations and convert the sine function to a lookup table: 5000 atoms = ~500 seconds?
• Find faster floating-point hardware; beating the current machine by 8x should not be hard: 5000 atoms = ~60 seconds?
• Intensity calculations are independent, so use more processors: 5000 atoms = ~10 seconds?
• Question: how many molecules need to be run to represent a non-rigid structure?
Next Steps (Short Term)
• Add precise timing and develop a model to predict performance for an arbitrary number of atoms.
• Analyze the instructions in the inner loop of scatter, though it may be impossible to improve on the compiler.
• Extend to read the .pdb file format, or integrate with existing Python code.
• Try a processor with better floating point, or a parallel machine; what is required to do this?
Project setup takes precedence for several weeks.
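For the first item, a minimal performance model, assuming the O(S·N²/2) intensity loop dominates the run time so that time scales linearly with steps and quadratically with atoms. `predict_seconds` is a hypothetical helper, and relying on a single reference measurement is an assumption; fitting several timed runs would be more robust:

```c
/* Predict run time from one reference measurement (t_ref seconds for
 * n_ref atoms and s_ref steps), assuming time ~ S * N^2. */
double predict_seconds(double t_ref, double n_ref, double s_ref,
                       double n, double s)
{
    double ratio = n / n_ref;
    return t_ref * (s / s_ref) * ratio * ratio;
}
```

Taking the ~19-minute (1140 s) release-build figure for 5000 atoms as the reference, 10,000 atoms at the same step count would be predicted at about 4560 s (~76 minutes), if the quadratic assumption holds.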
Next Steps (Long Term)
• Close the loop with experimental data on a known molecule, changing algorithms as necessary.
• Develop a streaming version of the program that accepts multiple molecules and averages the results.
• Write a new program for modeling elastic topology, previously called the “parametric” model.
• Investigate a change to a streaming architecture; possibly prototype a simple framework user interface.
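The averaging step of the streaming version could keep only a running mean of the intensity curves, so memory does not grow with the number of molecules. A sketch, assuming every molecule's curve uses the same steps; the `IntensityAverage` type and its functions are hypothetical:

```c
#include <stddef.h>

/* Hypothetical accumulator for averaging intensity curves from a stream of
 * molecules, one curve of `steps` values at a time. */
typedef struct {
    double *mean;  /* caller-provided buffer of length steps */
    size_t steps;
    size_t count;  /* number of curves seen so far */
} IntensityAverage;

void average_init(IntensityAverage *avg, double *buffer, size_t steps)
{
    avg->mean = buffer;
    avg->steps = steps;
    avg->count = 0;
    for (size_t s = 0; s < steps; s++)
        buffer[s] = 0.0;
}

/* Incremental mean update: mean += (x - mean) / count. */
void average_add(IntensityAverage *avg, const double *curve)
{
    avg->count++;
    for (size_t s = 0; s < avg->steps; s++)
        avg->mean[s] += (curve[s] - avg->mean[s]) / (double)avg->count;
}
```

The incremental form avoids keeping a large running sum, and the current mean is available after every molecule, which suits a streaming design.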