510 likes | 576 Views
Disconnected Diagrams, Multi-grid, Nvidia & all that y. Richard Brower (Boston University) James Brannick (Penn) Ron Babich (BU) Kipton Barros (BU) Mike Clark (BU) George Fleming (Yale) James Osborn (Argonne) Claudio Rebbi (BU) QCDNA 2008 – Regensburg Sept 5, 2008.
E N D
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Disconnected Diagrams, Multi-grid, Nvidia & all thaty Richard Brower (Boston University) James Brannick (Penn) Ron Babich (BU) Kipton Barros (BU) Mike Clark (BU) George Fleming (Yale) James Osborn (Argonne) Claudio Rebbi (BU) QCDNA 2008 – Regensburg Sept 5, 2008 yWARNING: Much here is a FUTURE plan NOT proven results but .....
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) What is QCD? Acronym Definition: QCD Qualified Charitable Distribution (IRS) QCD Quality, Cost, Delivery QCD Quantum Chromodynamics QCD QuarkCopyDesk (file extension) QCD Quasi-Cyclic Dyadic QCD Quick Change Directory QCD Quick Claim Deed (real estate) QCD Quintessential CD (PC media player) QCD Quit Claim Deed (real estate) QCD Quality Control Department
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) What do these mean? • QCD From Wikipedia, the free encyclopedia • Quintessential Player, formerly known as Quintessential CD • Quality, Cost, Delivery, a three-letter acronym used in lean manufacturing • Quad City DJ's, Southern rap group • Quick Control Dial, a control on many DSLR cameras, like the Canon EOS 40D • Quote-Comma-Delimited known also as Comma-separated values • Quantum chromodynamics, the theory describing the Strong Interaction
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Outline • Physics: (How strange is the proton?) • Algorithms: (Multi-grid to the rescue?) • Hardware: (GPU propagator farm?)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Physics: Disconnected Diagrams Connected vs. Disconnected Want matrix element:
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) How strangey is the proton? Who cares? • Violation of Standard Model: • Dark Energy (Neutralino scattering): • NuTev anomaly: • Nucleon Physics (include u/d + s quares): • iso-scalar Form Factors, nucleon structure function, Spin crisis for proton, matrix element etc. y see Lattice 2008: http://conferences.jlab.org/lattice2008/parallel-bytopic-struct.htmlS.Collins, G. Bali, A.Schafer“Hunting for the strangeness ... nucleon”Takumi Doi et al “Strangeness and glue in the nucleon from lattice QCDRon Babich et al “Strange quark content of the nucleon”
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Direct detection of dark matter • In SUSY, the neutralino scatters from a nucleon via Higgs exchange: • The strange scalar matrix element is a major uncertainty: • Uncertainty in fTs gives up to a factor of 4 uncertainty in the cross-section! • Bottino et al., hep-ph/0111229; • Ellis et al., hep-ph/0502001
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Nuclear Experiment • PVES + BNL E734 (νp scattering) Parity-violating electron scattering (SAMPLE, HAPPEx, PVA4, G0) J. Liu et al., arXiv:0706.0226 [nucl-ex] (see also Young et al., nucl-ex/0605010) Pate et al., arXiv:0805.2889 [hep-ex]
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Algorithm • Monte Carlo update (Long auto correlations times) • Global Heat bath aka “Stochastic Estimator:” (Zero auto correlations) Find Á = D-1´ for ´ Gaussian or Gauge or Z2 (Zero auto correlations!) • With < ´y´x > = ±yx Axy
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Improving Stochastic Estimate • Variance reduction: • Dilution vs hopping parametery (Short distance) • Multi-grid vs “deflation”/truncationy (Long distance) • Curing volume divergence • Trace versus Gauge fluctuations • Better and more source (all to all?). • Full multi-grid O(N long N) Trace? x y y S.Collins, G. Bali, A.Schafer“Hunting for the strangeness ... nucleon”
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Trace estimation • Two sources of error: gauge noise and error in trace. In this calculation, we largely eliminate the second source by calculating a “nearly exact” trace on four time-slices. • 864 sources (x12 for color/spin). A given source is nonzero on 4 sites on each of 4 time-slices. • Minimal spatial separation between sites is . Small residual contamination is gauge-variant and averages to zero. • Equivalent to using a single stochastic source with “extreme dilution.” 4 x 63 = 864
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Preliminary Methods • Configurations were provided by the LHPC “Spectrum Collaboration” • anisotropic lattice with • 2 dynamical flavors, Wilson fermion and gauge actions • 863 configurations • 64 (x 12) inversions per configuration at the light quark mass, for the nucleon correlators • 864 (x 12) inversions per configuration at the strange mass, for the trace
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Strange scalar form factor
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Ratio approach • Conventionally, one extracts the (e.g. zero-momentum) form factor from the large t behavior of the ratio • (or from a similar expression integrated over time). • Instead, we fit the numerator directly, since this allows us • to avoid contamination from backward-propagating states, which are problematic due to the short temporal • extent of our lattice ( ). • to explicitly take into account the contribution of (forward-propagating) excited states. • In the following, we always treat the system • symmetrically with
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Direct fit • First, we perform a fit to the nucleon two-point function, of the form • The coefficients and masses are very well-determined, since we are required to calculate correlators from all initial times (a total of 863 x 64 = 55,232). • Next, we perform a fit to the three-point function, • Here j1 and j2 are the form factors for the proton and its first excited state, and j12 is a transition matrix element between them. In practice, we expect j2 and j12 to absorb the contribution of still higher states, and trust only j1 to be reliable.
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Strange scalar form factor • For the renormalization-invariant quantity fTs, we estimate where we have inserted the physical nucleon mass. The second error is the uncertainty in relating this mass to the lattice scale, the first error is statistical, and no other systematics are included. • Note that the matrix element in the numerator was calculated for a world with a 400 MeV pion. If we work consistently in such a world by inserting our calculated nucleon mass, the scale dependence drops out, and we find
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Momentum dependence of GS(q2) s PRELIMINARY
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Strange axial form factor • Results have not been renormalized. • Calculated value is distinct from zero at the 3-s level. PRELIMINARY
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Error = O(L3/2) ) as L3)1For Exact Trace in a Connect correlator,
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Most Important New Trick: Multi-grid Variance Reduction • The signal and variance of the first term is down by 1 to 2 orders of magnitude because Dc» D • The Coarse level Trace for D-1c is as cheap to calculate as the level down operator inverse. • This can of course be done recursively giving (I think) an O(N log N)trace calculation to fixed tolerance.
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) HARDWARE • Graphics hardware is well suited to highly parallel numerical tasks. • Hardware vendors provide development tools to support high performance computing. • NVIDIA'S CUDA offers direct access to graphics hardware through a programming language similar to C. • Dirac-Wilson operator which runs at an effective 68 Gigaflops on the Tesla C870 GPU. • The recently released GTX 280 GPU at 92 Gigaflops and we expect improvement pending code optimization. • (Now 98 Gigaflops hope to get O(150) Gigaflops)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Nvidia GPU architecture
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Two Generations Consumer vs HPC GPUs Consumer cards ) High Performance (HPC) GPUs I. 8880 GTX ) Tesla C870 (16 multi-processor with 8 cores each) II. GTX 280 ) Tesla C1060 (30 multi-processor with 8 cores each)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) C870 code using 60% of the memory bandwidth.
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.)
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) http://www.scala-lang.org/
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Future software Plans • Need find out why we are only saturating 60% of Memory bandwidth • Further educe memory traffic: 8 real number per SU(3) matrix (2/3 of 12 used now) shear spinors in 43 blocks (5/9 of used now) • Generalize to clover Wilson & Domain Wall operator (slightly better flops/mem ratio). • DMA between GPU on Quad system and network for cluster • Start to design SciDAC API for many-core technologies.
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Tesla 10-Series: What’s the Big Deal?
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Consumer Chip GTX 280 ) Tesla C1060
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) 1 U Quad S1070 System $8K
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) CUDA 2.0 (Compute Unified Device Architecture) • Can compile CUDA code into highly efficient SSE-based multi-threaded C code
QCDNA 2008 - Sept 5, 2008 Rich Brower (Boston U.) Need a GPU Dirac Propagator Farm • The Clark-Kennedy RHMC Paradox:(Faster you go harder it is to keep up) • Analysis is now the “Ἀχιλλεύςheel” • Solution: Dedicated Analysis farm. • GPU can deliver O(10) to O(100) gain in flops/$ • Two quad Tesla ) 1 Sustained Teraflop! • Two quad Tesla @ 25K ´ One BG/L rack @ 2,000 K