APE group
Does HPC really fit GPUs?
Davide Rossetti
INFN – Roma
[email protected]
Incontro di lavoro della CCR, Napoli, 26-27 January 2010
A case study: Lattice QCD
V. Lubicz – CSN4 talk, September 2009
A case study: Lattice QCD
● Most valuable product is the gauge configuration
  ● Different types: Nf, schemes
  ● Different sizes
● A grid-enabled community (www.ildg.org)
  ● Storage sites
  ● Production sites
  ● Analysis sites
● Gauge configuration production really expensive!!!
HPC in INFN
● Focus on compute intensive Physics (excluding LHC stuff): LQCD, Astro, Nuclear, Medical
● Needs for 2010-2015:
  ● ~0.01-1 Pflops for a single research group
  ● ~0.1-10 Pflops nationwide
● Translates to:
  ● Big infrastructure (cooling, power, …)
  ● High procurement costs (€/Gflops)
  ● High maintenance costs (W/Gflops)
LQCD on GPU?
● Story begins with video games (Egri, Fodor et al. 2006)
● Wilson-Dirac operator at 120 Gflops (K. Ogawa 2009)
● Domain Wall fermions (Tsukuba/Taiwan 2009)
● Definitive work: QUDA lib (M.A. Clark et al. 2009):
  o double, single, and half precision
  o half-precision solver with reliable updates > 100 Gflops (sketched below)
  o MIT/X11 Open Source License
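The "reliable updates" bullet refers to running the Krylov iteration in cheap low precision while the solution and the true residual are periodically restored in high precision. What follows is only a minimal sketch of that general idea, written as mixed-precision defect correction; it is not QUDA's code or API, and apply_A, solve_low and dot are hypothetical placeholders.

#include <cmath>

// Hypothetical interfaces for the sketch:
//   apply_A   : double-precision Dirac operator (matrix-vector product)
//   solve_low : cheap single/half-precision inner solver for A*e = r
//   dot       : double-precision dot product
void   apply_A(const double *x, double *Ax, int n);
void   solve_low(const double *rhs, double *corr, int n, double inner_tol);
double dot(const double *a, const double *b, int n);

// The outer loop keeps x and the true residual r = b - A*x in double precision;
// the low-precision solver is only asked for a correction, so its rounding
// errors are wiped out every time the true residual is recomputed.
void mixed_precision_solve(const double *b, double *x, int n, double tol)
{
    double *r  = new double[n];
    double *Ax = new double[n];
    double *e  = new double[n];
    const double bnorm = std::sqrt(dot(b, b, n));

    for (;;) {
        apply_A(x, Ax, n);                               // high-precision matvec
        for (int i = 0; i < n; i++) r[i] = b[i] - Ax[i]; // true residual
        if (std::sqrt(dot(r, r, n)) <= tol * bnorm) break;

        solve_low(r, e, n, 0.1);                         // reduce the residual cheaply
        for (int i = 0; i < n; i++) x[i] += e[i];        // fold the correction back in
    }
    delete[] r; delete[] Ax; delete[] e;
}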
INFN on GPUs
● 2D Spin models (Di Renzo et al., 2008)
● LQCD Stag. fermions on Chroma (Cossu, D'Elia et al., Ge+Pi 2009)
● Bio-Computing on GPU (Salina, Rossi et al., ToV 2010?)
● Gravitational wave analysis (Bosi, Pg 2010?)
● Geant4 on GPU (Caccia, Rm 2010?)
How many GPUs?
Raw estimate of the memory footprint (reproduced in the sketch after the table):
• Full solver in GPU
• Gauge field + 15 fermion fields
• No symmetry tricks
• No half-precision tricks

Lattice size   Single-prec memory (GiB)   Double-prec memory (GiB)   # GTX280   # Tesla C1060   # Tesla C2070
24³×48                 1                          2.1                    3           1               1
32³×64                 3.3                        6.7                   4-8          2              1-2
48³×96                17                         34                    17-35         5-9            3-6
64³×128               54                        108                    55-110       14-28           9-18
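The table can be roughly reproduced from a per-site count. Assuming the usual LQCD storage of 4 SU(3) links per site (3×3 complex = 18 reals each) plus 24 reals per fermion field per site, the gauge field plus 15 fermion fields give 432 reals per site; the per-site counts are my assumption, consistent with the "no symmetry tricks" bullet above. A small sketch of that arithmetic:

#include <stdio.h>

// Rough memory-footprint estimate for the full solver resident on the GPU:
// gauge field (4 links/site, 18 reals/link) plus 15 fermion fields
// (24 reals/site each); no link reconstruction or half-precision tricks.
int main(void)
{
    const int dims[][4] = { {24, 24, 24, 48}, {32, 32, 32, 64},
                            {48, 48, 48, 96}, {64, 64, 64, 128} };
    const double reals_per_site = 4 * 18 + 15 * 24;   // = 432 reals per site

    for (int i = 0; i < 4; i++) {
        double sites = (double)dims[i][0] * dims[i][1] * dims[i][2] * dims[i][3];
        double bytes_sp = sites * reals_per_site * 4;  // float
        double bytes_dp = sites * reals_per_site * 8;  // double
        printf("%2d^3 x %3d : %6.1f GiB (single)  %6.1f GiB (double)\n",
               dims[i][0], dims[i][3],
               bytes_sp / (1024.0 * 1024 * 1024),
               bytes_dp / (1024.0 * 1024 * 1024));
    }
    return 0;
}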
If one GPU is not enough
Multi-GPU, the Fastra II* approach:
● Stick 13 GPUs together
● 12 TFLOPS @ 2 kW
● CPU threads feed GPU kernels (see the sketch below)
● Embarrassingly parallel → great!!!
● Full problem fits → good!
● Enjoy the warm weather :)
* University of Antwerp, Belgium
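A minimal sketch of the "CPU threads feed GPU kernels" pattern on a single Fastra-style box: one host thread per device, each thread binds to its GPU with cudaSetDevice and drives it independently. The kernel, buffer size and thread count are placeholders for the sketch, not the Fastra II code.

#include <pthread.h>
#include <cuda_runtime.h>

#define N (1 << 20)   // placeholder per-GPU problem size

// Placeholder kernel: each GPU works on its own independent chunk of data.
__global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// One host thread per GPU: bind to the device, allocate, launch, synchronize.
static void *worker(void *arg)
{
    int dev = (int)(long)arg;
    cudaSetDevice(dev);                 // CUDA calls in this thread now target 'dev'

    float *d_x;
    cudaMalloc((void **)&d_x, N * sizeof(float));
    cudaMemset(d_x, 0, N * sizeof(float));
    scale<<<(N + 255) / 256, 256>>>(d_x, 2.0f, N);
    cudaDeviceSynchronize();
    cudaFree(d_x);
    return NULL;
}

int main(void)
{
    int ndev = 0;
    cudaGetDeviceCount(&ndev);          // e.g. up to 13 on a Fastra II-like box

    pthread_t tid[16];
    for (int d = 0; d < ndev && d < 16; d++)
        pthread_create(&tid[d], NULL, worker, (void *)(long)d);
    for (int d = 0; d < ndev && d < 16; d++)
        pthread_join(tid[d], NULL);
    return 0;
}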
multi-GPUs need scaling!
Seems easy:
1. 1-2-4 GPUs in a 1-2U system (or buy Tesla M1060)
2. Stack many
3. Add an interconnect (IB, Myrinet 10G, custom) & plug accurately :)
4. Simply write your program in C+MPI+CUDA/OpenCL(+threads); see the sketch below
[Diagram: hierarchy of parallelism: multi-node parallelism, multi-GPU management, single-GPU kernel]
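A skeleton for step 4, C+MPI+CUDA, under the usual one-rank-per-GPU assumption: each rank binds to a GPU, runs its local kernel, and exchanges a boundary buffer with its neighbours through pinned host memory. The names, sizes and ring-style exchange are placeholders for the sketch, not a real Dirac-operator domain decomposition (which would pack and unpack the actual boundary sites).

#include <mpi.h>
#include <cuda_runtime.h>

#define HALO 4096   // placeholder boundary size (elements)

// Placeholder kernel standing in for the local part of the Dirac operator.
__global__ void local_update(float *field, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) field[i] += 1.0f;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Bind each rank to one GPU (assumes ranks-per-node == GPUs-per-node).
    int ngpus = 0;
    cudaGetDeviceCount(&ngpus);
    if (ngpus > 0) cudaSetDevice(rank % ngpus);

    const int n = 1 << 20;                 // local sub-lattice volume (placeholder)
    float *d_field, *h_send, *h_recv;
    cudaMalloc((void **)&d_field, n * sizeof(float));
    cudaMemset(d_field, 0, n * sizeof(float));
    cudaMallocHost((void **)&h_send, HALO * sizeof(float));   // pinned host buffers
    cudaMallocHost((void **)&h_recv, HALO * sizeof(float));

    int next = (rank + 1) % size, prev = (rank - 1 + size) % size;

    for (int iter = 0; iter < 10; iter++) {
        local_update<<<(n + 255) / 256, 256>>>(d_field, n);

        // Stage the boundary through the host: GPU -> host -> MPI -> host -> GPU.
        cudaMemcpy(h_send, d_field, HALO * sizeof(float), cudaMemcpyDeviceToHost);
        MPI_Sendrecv(h_send, HALO, MPI_FLOAT, next, 0,
                     h_recv, HALO, MPI_FLOAT, prev, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaMemcpy(d_field, h_recv, HALO * sizeof(float), cudaMemcpyHostToDevice);
    }

    cudaFree(d_field); cudaFreeHost(h_send); cudaFreeHost(h_recv);
    MPI_Finalize();
    return 0;
}

Note the GPU → network latency warned about in the conclusions: every halo exchange here crosses PCIe twice before the network even sees the data.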
Some near-term solutions for LQCD
Two INFN approved projects:
● QUonG: cluster of GPUs with custom 3D torus network (APEnet+ talk by R. Ammendola)
● Aurora: dual Xeon 5500 custom blade with IB & 3D first-neighbor network
INFN assets
● 20 years of experience in high-speed 3D torus interconnects (APE100, APEmille, apeNEXT, APEnet)
● 20 years writing parallel codes
● Control over HW architecture vs. algorithms
Wish list for multi-GPU computing
Open the GPU to the world:
• Provide APIs to hook inside your drivers
• Allow PCIe-to-PCIe DMAs (see the sketch below), or better …
• … add some high-speed data I/O port toward an external device (FPGA, custom ASIC)
• Promote the GPU from simple accelerator to main computing engine status!!
[Diagram: GPU and DRAM main memory connected over PCI Express]
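For reference, the device-to-device DMA the wish list asks for is close to what CUDA later exposed as peer-to-peer access. A minimal sketch using that later API (cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess, cudaMemcpyPeer), which was not available at the time of the talk; two GPUs and a 64 MiB test buffer are assumed.

#include <cstdio>
#include <cuda_runtime.h>

// Copy a buffer from GPU 0 to GPU 1 directly over PCIe, without staging
// through host memory, using the peer-to-peer API of later CUDA releases.
int main(void)
{
    const size_t bytes = 64 << 20;   // 64 MiB test buffer
    float *d0, *d1;

    cudaSetDevice(0);
    cudaMalloc((void **)&d0, bytes);
    cudaSetDevice(1);
    cudaMalloc((void **)&d1, bytes);

    // Check whether the two devices can map each other's memory over PCIe.
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (!can01 || !can10) {
        printf("peer access not supported between these GPUs\n");
        return 1;
    }
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    // PCIe-to-PCIe DMA: source on device 0, destination on device 1.
    cudaMemcpyPeer(d1, 1, d0, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(d1);
    cudaSetDevice(0);
    cudaFree(d0);
    return 0;
}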
In conclusion
● GPUs are good at small scales
● Scaling from single GPU to multi-GPU to multi-node: the hierarchy deepens
  ● Programming complexity increases
  ● Watch the GPU → network latency
● Please, help us link your GPU to our 3D network!!!
Game over :)