Since 2008, USQCD has explored high-performance Dirac solvers in CUDA on NVIDIA GPUs [73]. This effort was initially supported by NSF funding, but has rapidly expanded into a major SciDAC project with the development of the QUDA (QCD in CUDA) library [74, 75, 76] and the rapid deployment of GPU-accelerated clusters at Jefferson Laboratory and Fermilab. Our ability to respond rapidly to this new architecture demonstrates the advantage of our clear factorization of Level 3 solvers in the QCD API. At present the QUDA library includes all Dirac solvers used in QCD (Wilson-Clover, HISQ/asqtad, domain wall and twisted mass). The result has been a dramatic improvement in price/performance for a range of analysis work that is dominated by Dirac solvers. The most recent advance has been the extension of the code from single to multiple GPUs. The multiple-GPU codes enable us to analyze the full set of lattice sizes generated by USQCD members with excellent weak scaling. In a paper presented at Supercomputing 2011, we demonstrated good strong scaling on up to 256 GPUs for the HISQ/asqtad and Wilson-Clover solvers running on the Edge cluster at LLNL.
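To make the distinction between weak and strong scaling concrete, the following minimal Python sketch computes the two parallel efficiencies under their usual definitions. The function names and all timings are hypothetical and chosen only for illustration; they are not measurements from QUDA or from any USQCD cluster.

```python
def strong_scaling_efficiency(t_base, n_base, t_n, n):
    """Efficiency at fixed TOTAL problem size.

    Ideal behavior: the time to solution falls in proportion to the
    number of GPUs, so t_n == t_base * n_base / n gives 1.0.
    """
    return (t_base * n_base) / (t_n * n)


def weak_scaling_efficiency(t_base, t_n):
    """Efficiency at fixed problem size PER GPU.

    Ideal behavior: the time to solution stays constant as GPUs (and
    total lattice volume) grow together, so t_n == t_base gives 1.0.
    """
    return t_base / t_n


# Hypothetical timings in seconds (illustrative assumptions, not data).
strong = strong_scaling_efficiency(t_base=100.0, n_base=16, t_n=10.0, n=256)
weak = weak_scaling_efficiency(t_base=10.0, t_n=12.5)

print(f"strong-scaling efficiency: {strong:.3f}")
print(f"weak-scaling efficiency:   {weak:.3f}")
```

In this convention, an efficiency near 1 indicates near-ideal scaling. Strong scaling is the harder target for a solver because the local volume per GPU shrinks as GPUs are added, which increases the relative cost of communication.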
Dsg, the latest cluster built at FNAL (upper left), consists of 76 nodes, each with 2.53 GHz dual-CPU, eight-core Intel Xeon ("Westmere") conventional processors, two NVIDIA M2050 GPU processors, and a QDR fabric. It has a throughput of over 8 TF, depending on the task.
The latest GPU-accelerated cluster at JLab is 10g (lower left). This cluster consists of 32 host systems, each containing 4 NVIDIA Tesla C2050 GPUs with 3 GBytes of memory per GPU, and 20 host systems, each containing 4 GTX-480 GPUs with 1.5 GBytes of memory per GPU, for a total of 128 Tesla and 80 GTX-480 GPUs. The 10g host systems are dual-socket, quad-core 2.53 GHz Intel Westmere computers with 48 GBytes of host memory. The hosts are interconnected with either single-data-rate (GTX-480 hosts) or quad-data-rate (Tesla C2050 hosts) InfiniBand.
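The GPU totals quoted for 10g follow directly from the host counts, and the same arithmetic gives the aggregate GPU memory of each partition. The short Python sketch below reproduces these figures from the numbers stated above; the `Partition` class and its field names are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """One homogeneous group of 10g hosts (hypothetical helper type)."""
    gpu_model: str
    hosts: int
    gpus_per_host: int
    gpu_mem_gbytes: float  # memory per GPU

    @property
    def gpus(self) -> int:
        # Total GPUs in this partition.
        return self.hosts * self.gpus_per_host

    @property
    def total_gpu_mem_gbytes(self) -> float:
        # Aggregate GPU memory across the partition.
        return self.gpus * self.gpu_mem_gbytes


# Figures taken from the cluster description in the text.
partitions = [
    Partition("Tesla C2050", hosts=32, gpus_per_host=4, gpu_mem_gbytes=3.0),
    Partition("GTX-480", hosts=20, gpus_per_host=4, gpu_mem_gbytes=1.5),
]

for p in partitions:
    print(f"{p.gpu_model}: {p.gpus} GPUs, "
          f"{p.total_gpu_mem_gbytes:.0f} GBytes of GPU memory")

total_gpus = sum(p.gpus for p in partitions)
total_mem = sum(p.total_gpu_mem_gbytes for p in partitions)
print(f"10g total: {total_gpus} GPUs, {total_mem:.0f} GBytes of GPU memory")
```

The aggregate memory per partition bounds the largest lattice whose fields can be held entirely on the GPUs of that partition, which is one reason multiple-GPU solvers are needed for the larger lattices.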
Members of USQCD employed directly by the NVIDIA Corporation are working in collaboration with academic members of USQCD to optimize code for Titan, the GPU-based supercomputer at the Oak Ridge Leadership Computing Facility.