Meeting with the LHC Computing Referees (Riunione con Referee Calcolo LHC)
Padova, 25 May 2016
ALICE Computing: status and funding requests
Domenico Elia
Outline
• ALICE Computing status:
  – resource usage in 2015, Run2 computing activities
  – performance of the Italian sites, R&D activities
• Funding requests:
  – CPU and storage situation at the Tier-2s, decommissioning
  – supplementary requests for 2016 (Tier-1)
  – ordinary requests for 2017 (Tier-1 and Tier-2)
ALICE Computing status
First year Run2 data taking
• pp @ 13 TeV
• PbPb @ 5.02 TeV
• 2010-2013: 7.3 PB (one replica)
• All data processed in final reconstruction pass
• 2015: 7.2 PB (one replica)
ALICE Computing status
Resource usage in 2015
• Overall CPU/DISK/TAPE usage:
  – CPU at T1 and T2 above pledge (opportunistic, extra-WLCG resources)
  – DISK usage below request (delay in the 2015 data reconstruction)
  – high TAPE usage (unexpectedly high pile-up in pp at 13 TeV with 25 ns bunch spacing)
Source: CERN-RRB-2016-049
ALICE Computing status
Resource usage in 2015
• ALICE Grid:
  – new entries in 2015-2016
  – HLT farm used for offline activities (when not in run): included in the Grid as a fully virtual site
  – usual share of the activities:
      - ~150 MC cycles (papers + first physics analysis of 2015 data)
      - Run1 raw data re-processing, Run2 data processing (bulk of raw and MC production for Run2, both pp and PbPb, still to be done)
      - organized and user (chaotic) analysis
  – share by activity (61K parallel jobs on average):
      MC productions        71%
      Organized analysis    14%
      RAW data processing    9%
      User analysis          6%
ALICE Computing status
Resource usage in 2015
• Current activities:
  – MC productions (papers, first physics 2015, upgrade)
  – Raw data processing:
      - code improved to reduce memory consumption (now 2 GB/job)
      - 2015 data partially reconstructed: distortions in the TPC occur in runs with high interaction rate; specific corrections need to be developed and are currently being validated; the plan is to complete the fully calibrated reconstruction within the next ~1-1.5 months
  – Changing replication policy:
      - needed to cope with the available storage → single ESD replica
      - global disk space needed for 2015 processing: 5-6 PB (RAW + MC), barely feasible with the expected resources
  – Popularity and cleanup:
      - removed very old MC productions
      - removed second ESD replica for low-access productions
[Figure: volume of data vs number of accesses in X = 3, 6, 12 months. First bin: data created before period X began and not accessed during that period]
ALICE Computing status
Performance of the Italian sites
[Figure labels: TO, BA, LNL, CT, CNAF; INFN ~14%]
• Problems with the LUSTRE file system at the old Bari site (BC2S) → fully migrated to the new ReCaS datacenter
ALICE Computing status
Performance of the Italian sites
• Resource usage at the T2s:
  – following the usual internal coordination plan
  – monthly meetings (performance recording) + annual workshop
  – overall ~50% increase in total wall-clock time (WCT) from 2014 to 2015
  – large upgrade at 2 sites (ReCaS) during 2015:
      - CATANIA (in production since April, ~1500 cores, 1 PB: Catania-VF)
      - BARI (in production for ALICE since mid-August):
          ~300 servers, 105 kHS06 (~10000 cores), of which 25 kHS06 CMS pledge + 10 kHS06 ALICE pledge
          ~4 PB disk storage + 2.75 PB tape library, of which 900 TB CMS pledge + 900 TB ALICE pledge
          20 Gbit/s network connection (ready for 40 Gbit/s)
ALICE Computing status
New ReCaS BA infrastructure
Official opening July 9, 2015:
https://agenda.infn.it/conferenceDisplay.py?confId=9856
BARI Tier-2 from BC2S to ReCaS:
  – migration from LUSTRE to pure XRootD
  – large opportunistic use of CPU (up to ~6000 slots)
[Figure labels: ReCaS, BC2S, Pledge 2015]
ALICE Computing status
Performance of the Italian sites
[Figure panels, each shown against the pledge: Catania (new ReCaS center, Catania-VF), Bari (new ReCaS center), Torino, PD-LNL]
ALICE Computing status
Performance of the Italian sites
• Monitoring of T2 data from APEL:
https://faust01.to.infn.it/#/dashboard/script/pledge_mc_sum.js
[Figure panels: BA, LNL, CT, TO, T1]
ALICE Computing status
R&D activity and software for Run3
• Virtual Analysis Facility (STOA-LHC PRIN):
  – cloud-based VAF deployed in BA, CA, LNL, TO and TS
  – XRootD-based Data Federation (DF) set up and populated: local redirectors at each site + national redirector in BA
  – system fully tested, final PRIN report completed by end of April '16
• Software development for Run3:
  – ITS standalone tracking based on cellular automaton (TO)
  – ITS geometry (AL)
  – response simulation for the pixel (pAlpide) chip (TS, BS-PV)
• First experience with EOS at TS
ALICE Computing status
R&D activity on the Dashboard
The project: a Dashboard that gathers in a single graphical interface all the information concerning the ALICE activity at each site (MonALISA, local batch system, local monitoring system metrics).
• Running at the Bari T2 site for ~2 years
• Recently exported to the Torino site as well
• Next steps:
  – export to all ALICE T2s and to other WLCG sites
  – global dashboard for the Italian computing in ALICE
• Abstract submitted to CHEP'16
ALICE Italia computing website
https://web2.infn.it/ALICE-Italia-computing/index.php/it/
Resource situation and funding requests
Funding requests
CPU/storage situation in Italy
• In production at the Tier-1:
  – CPU: 29000 HS06 (pledge 2016)
  – DISK: 3900 TB (pledge 2016)
  – TAPE: 5500 TB (pledge 2016)
  (slide annotation: 2015-2016 pledge share since February 2016 (x2))
• In production at the Tier-2s (+ Cagliari):
  Pledge 2016: 43845 HS06 + 4829 TB
  Available in May 2016 (including obsolete resources not yet decommissioned):

          Bari   Catania   Padova-LNL   Torino   Cagliari   Total
  HS06   12080     13147        16881    10373       1120   53601
  TB       984      1204         1152     1123         70    4533
Funding requests
CPU/storage situation at the Tier-2s
• Purchases in the second half of 2015:
  – CPU: 1720 HS06 at LNL (bonus ~450 HS06) + 1400 HS06 at TO
  – storage: 4x180 TB expansions at BA and LNL (bonus ~50 TB)
  – outcome optimised by combining tenders (BA) and site-level purchases
Funding requests
CPU/storage situation at the Tier-2s
• 2016 funding from CSN3:
  – requests: 435 k€ (387 growth and replacements + 48 overhead)
  – allocations: 332 k€ (308 growth and replacements + 24 overhead)
  – storage decommissioning postponed at CT/CA, and half of the decommissioning postponed at PD-LNL and TO
  – overhead request granted at 50%
  – 2016 pledges guaranteed in accordance with the CRSG/RRB outcome
• Breakdown among the sites:
  CPU: ~14400 HS06
    BA:  → 10200 HS06 (1950 growth + 1568 replacements = 3518 HS06)
    LNL: → 10200 HS06 (2500 growth + 5496 replacements = 7996 HS06)
    TO:  → 10300 HS06 (1300 growth + 1584 replacements = 2884 HS06)
  DISK: ~620 TB
    BA:  → 1184 TB (260 growth = 260 TB)
    LNL: → 1202 TB (50 growth + 130 replacements = 180 TB)
    TO:  → 1223 TB (100 growth + 80 replacements = 180 TB)
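The per-site breakdown can be cross-checked with a few lines of Python. This is a minimal sketch that only re-adds the figures quoted on the slide; the dictionary layout and the helper name `per_site_totals` are illustrative assumptions, not part of the original presentation.

```python
# Purchases planned per site with the 2016 CSN3 funding, as quoted on the slide.
# Each entry is (growth, replacements); the layout is illustrative.
cpu_hs06 = {"BA": (1950, 1568), "LNL": (2500, 5496), "TO": (1300, 1584)}
disk_tb = {"BA": (260, 0), "LNL": (50, 130), "TO": (100, 80)}


def per_site_totals(plan):
    """Return growth + replacements for each site."""
    return {site: growth + repl for site, (growth, repl) in plan.items()}


cpu_per_site = per_site_totals(cpu_hs06)
disk_per_site = per_site_totals(disk_tb)
cpu_total = sum(cpu_per_site.values())
disk_total = sum(disk_per_site.values())

print("CPU per site (HS06):", cpu_per_site, "total:", cpu_total)
print("DISK per site (TB):", disk_per_site, "total:", disk_total)
```

The CPU sum comes out slightly below the rounded ~14400 HS06 quoted on the slide, while the disk sum matches the ~620 TB figure.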
Funding requests
CPU/storage situation at the Tier-2s
• 2016 funding from CSN3:
  – requests: 435 k€ (387 growth and replacements + 48 overhead)
  – allocations: 332 k€ (308 growth and replacements + 24 overhead)
• Status of the 2016 purchases:
  – completed:
      BA: 3840 HS06 (BA) + 180 TB (expansion for LNL)
      LNL: 8600 HS06 (LNL) + licence for the storage expansion
  – to be finalised:
      BA: 260 TB (joint tender with CMS, ~200 k€ in total)
      TO: 2880 HS06 + 180 TB (synergies with purchases by other experiments and C3S)
      overhead (survey of needs completed and budget transfers made)
Funding requests
CPU/storage situation at the Tier-2s
• Updated situation including the 2016 resources:
  – CPU: 45333 HS06 (1488 HS06 above pledge)
  – DISK: 4876 TB (47 TB above pledge)
Available at the end of 2016 (after decommissioning and completion of the 2016 purchases*):

          Bari   Catania   Padova-LNL   Torino   Cagliari   Total
  HS06   10512     13147        11385    10289          0   45333
  TB      1244      1204         1202     1226          0    4876

* Assuming the remaining purchases at BA and TO are successful
Pledge 2016: 43845 HS06 + 4829 TB
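The totals and the margin over the 2016 pledge follow directly from the per-site numbers. A minimal Python sketch of that arithmetic, using only values quoted on the slide (the variable names are illustrative):

```python
# Resources expected at the end of 2016 per Tier-2 site, as quoted in the table.
sites = ["Bari", "Catania", "Padova-LNL", "Torino", "Cagliari"]
hs06 = [10512, 13147, 11385, 10289, 0]
tb = [1244, 1204, 1202, 1226, 0]

# 2016 pledge for the Tier-2s, as quoted on the slide.
pledge_2016 = {"HS06": 43845, "TB": 4829}

total_hs06 = sum(hs06)
total_tb = sum(tb)
excess_hs06 = total_hs06 - pledge_2016["HS06"]
excess_tb = total_tb - pledge_2016["TB"]

for site, cpu, disk in zip(sites, hs06, tb):
    print(f"{site:<12}{cpu:>8} HS06{disk:>8} TB")
print(f"Total: {total_hs06} HS06 ({excess_hs06:+d} vs pledge), "
      f"{total_tb} TB ({excess_tb:+d} vs pledge)")
```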
Funding requests
Decommissioning 2016-17

  Year   Unit   Bari   Catania   LNL-Padova   Torino   Cagliari   Total
  2016   HS06   1568         0         5496     1584       1120    9768
  2016   TB        0       130          260      157         20     567
  2017   HS06      0         0            0     3840          0    3840
  2017   TB        0       114            0      117          0     231

Storage decommissioning postponed from the second half of 2016 to 2017:
130 TB (CT) + 130 TB (LNL) + 80 TB (TO) + 20 TB (CA) = 360 TB
Funding requests
Decommissioning 2016-18

  Year   Unit   Bari   Catania   LNL-Padova   Torino   Cagliari   Total
  2016   HS06   1568         0         5496     1584       1120    9768
  2016   TB        0         0          130       77          0     207
  2017   HS06      0         0            0     3840          0    3840
  2017   TB        0       244          130      197         20     591
  2018   HS06   6672     13147            0     2149          0   21968
  2018   TB        0         0            0      205          0     205

Storage decommissioning postponed from the second half of 2016 to 2017:
130 TB (CT) + 130 TB (LNL) + 80 TB (TO) + 20 TB (CA) = 360 TB
ReCaS decommissioning (BA and CT) expected in 2018 = 20000 HS06
• Overall Tier-2 situation at the start of 2017:
  – CPU: 45333 - 3840 = 41493 HS06
  – DISK: 4876 - 591 = 4285 TB
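The effect of the postponement, and the resulting Tier-2 totals at the start of 2017, can be reproduced as follows. This is a sketch under the assumption that the postponed terabytes simply move from the 2016 row to the 2017 row for each site; all numbers are taken from the slides, the names are illustrative.

```python
SITES = ["Bari", "Catania", "LNL-Padova", "Torino", "Cagliari"]

# Storage decommissioning plan (TB) before the postponement (2016-17 table).
tb_2016_before = dict(zip(SITES, [0, 130, 260, 157, 20]))
tb_2017_before = dict(zip(SITES, [0, 114, 0, 117, 0]))

# TB postponed from the second half of 2016 to 2017 (CT, LNL, TO, CA).
postponed = {"Catania": 130, "LNL-Padova": 130, "Torino": 80, "Cagliari": 20}

# Assumption: each postponed amount moves from the 2016 row to the 2017 row.
tb_2016_after = {s: tb_2016_before[s] - postponed.get(s, 0) for s in SITES}
tb_2017_after = {s: tb_2017_before[s] + postponed.get(s, 0) for s in SITES}

# Tier-2 totals at the start of 2017: end-2016 resources minus 2017 decommissioning.
cpu_start_2017 = 45333 - 3840
disk_start_2017 = 4876 - sum(tb_2017_after.values())

print("2016 TB after postponement:", tb_2016_after)
print("2017 TB after postponement:", tb_2017_after)
print("Start of 2017:", cpu_start_2017, "HS06,", disk_start_2017, "TB")
```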
Funding requests
RRB April 2016
RRB October 2015
• INFN share for 2017:
  – CPU and DISK for Tier-1 and Tier-2: 18.9% (18.5% for 2016)
  – TAPE for Tier-1: 34.8% (35.2% for 2016, 41.1% for 2015)
Funding requests
RRB April 2016
• +7% (4%) CPU at the Tier-1 (Tier-0): increased processing time for high pile-up pp events (x2) + TPC calibration issues
• +30% (22%) TAPE at the Tier-1 (Tier-0): increased raw data volume for pp events (x3.5), as observed in the 2015 sample
• Supplementary request for 2016 at the Tier-1:
  – CPU: 2700 HS06 → 35 k€ (revised 2016 pledge: 31752 HS06)
  – TAPE: 1.6 PB → 40 k€ (revised 2016 pledge: 7.1 PB)
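The unit costs implied by the supplementary request can be worked out from the quoted amounts. A minimal sketch, assuming 1 PB = 1000 TB and taking the 5500 TB tape pledge for 2016 from the Tier-1 figures quoted earlier in the talk; the variable names are illustrative.

```python
# Supplementary 2016 Tier-1 request, as quoted on the slide.
cpu_request_hs06 = 2700
cpu_cost_eur = 35_000
tape_request_tb = 1600      # 1.6 PB, assuming 1 PB = 1000 TB
tape_cost_eur = 40_000

# Implied unit costs.
cpu_eur_per_hs06 = cpu_cost_eur / cpu_request_hs06
tape_eur_per_tb = tape_cost_eur / tape_request_tb

# Tape pledge after the supplement, starting from the 5500 TB pledge for 2016.
revised_tape_tb = 5500 + tape_request_tb

print(f"CPU:  {cpu_eur_per_hs06:.2f} EUR/HS06")
print(f"TAPE: {tape_eur_per_tb:.2f} EUR/TB")
print(f"Revised tape pledge: {revised_tape_tb / 1000:.1f} PB")
```

The CPU figure is close to the 13 €/HS06 used for the Tier-1 cost estimates later in the talk.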
Funding requests
RRB April 2016
RRB October 2015
• Increases from 2016 (revised) → 2017:

           T0       T1       T2
  CPU    13.8%    31.5%    17%
  DISK   27.4%    16.8%    19.9%
  TAPE   30.8%    39.9%
Funding requests
Requests for 2017: Tier-1 and Tier-2

                                         CPU Tier-1   DISK Tier-1   TAPE Tier-1   CPU Tier-2   DISK Tier-2
                                             (HS06)          (TB)          (TB)       (HS06)          (TB)
  Pledged T1 / available - decomm. T2         31752          3885          7064        41493          4285
  ALICE 2017 (scrutinised)                    41769          4725          9883        51975          5916
  Delta                                       10017           840          2819        10482          1631
  Estimated cost (k€)                         130.2         176.4          70.5        115.3         326.2

  Total (k€): 377.1 (Tier-1), 441.5 (Tier-2)
  Overhead T2 (k€): 54.1

Estimated unit costs for T2 (T1): 11 (13) €/HS06 and 200 (210) €/TB
Tier-1 decommissioning: not included
Tier-2 overhead: 6% CPU + 5% DISK (network) + 7% total (additional servers)
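The cost estimates in the table can be reproduced from the deltas and the quoted unit costs. The sketch below rests on two assumptions that are not stated on the slide: a tape unit cost of 25 €/TB (inferred from 70.5 k€ for 2819 TB) and an overhead computed as 6% of the Tier-2 CPU cost plus 5% of the Tier-2 disk cost plus 7% of the Tier-2 total. All names are illustrative.

```python
# Unit costs in EUR; CPU and DISK values are quoted on the slide.
UNIT_COST_EUR = {
    ("T1", "CPU"): 13,    # EUR/HS06
    ("T1", "DISK"): 210,  # EUR/TB
    ("T1", "TAPE"): 25,   # EUR/TB, ASSUMPTION inferred from 70.5 kEUR / 2819 TB
    ("T2", "CPU"): 11,    # EUR/HS06
    ("T2", "DISK"): 200,  # EUR/TB
}

# Delta between the scrutinised 2017 requests and the available resources.
DELTA = {
    ("T1", "CPU"): 10017,
    ("T1", "DISK"): 840,
    ("T1", "TAPE"): 2819,
    ("T2", "CPU"): 10482,
    ("T2", "DISK"): 1631,
}

# Cost per item in kEUR.
cost_keur = {key: DELTA[key] * UNIT_COST_EUR[key] / 1000 for key in DELTA}

total_t1 = sum(v for (tier, _), v in cost_keur.items() if tier == "T1")
total_t2 = sum(v for (tier, _), v in cost_keur.items() if tier == "T2")

# ASSUMPTION: overhead percentages apply to the CPU cost, the DISK cost
# and the Tier-2 total, respectively.
overhead_t2 = (0.06 * cost_keur[("T2", "CPU")]
               + 0.05 * cost_keur[("T2", "DISK")]
               + 0.07 * total_t2)

for key, value in cost_keur.items():
    print(key, f"{value:.1f} kEUR")
print(f"Total T1: {total_t1:.1f} kEUR, total T2: {total_t2:.1f} kEUR")
print(f"Overhead T2: {overhead_t2:.1f} kEUR")
```

Under these assumptions the computed totals and the overhead agree with the slide values to one decimal place.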
Funding requests
Requests for 2017: per Tier-2 site

                               HS06     TB   CPU (k€)   DISK (k€)   Total (k€)
  Decommissioning
    Bari                          0      0        0.0         0.0          0.0
    Catania                       0    244        0.0        48.8         48.8
    LNL-Padova                    0    130        0.0        26.0         26.0
    Torino                     3840    197       42.2        39.4         81.6
    Cagliari                      0     20        0.0         4.0          4.0
  Decommissioning total        3840    591       42.2       118.2        160.4
  Net growth                   6642   1040       73.1       207.9        281.0
  Decommissioning + growth    10482   1631      115.3       326.1        441.4
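The per-site replacement costs follow from the decommissioned capacity and the Tier-2 unit costs quoted earlier in the talk (11 €/HS06 and 200 €/TB). A minimal sketch of that calculation; the function and variable names are illustrative assumptions.

```python
CPU_EUR_PER_HS06 = 11   # Tier-2 unit cost quoted in the talk
DISK_EUR_PER_TB = 200   # Tier-2 unit cost quoted in the talk

# Capacity to be decommissioned in 2017 per site: (HS06, TB).
decommissioned = {
    "Bari": (0, 0),
    "Catania": (0, 244),
    "LNL-Padova": (0, 130),
    "Torino": (3840, 197),
    "Cagliari": (0, 20),
}


def replacement_cost_keur(hs06, tb):
    """Return (CPU cost, DISK cost) in kEUR for replacing the given capacity."""
    return (hs06 * CPU_EUR_PER_HS06 / 1000, tb * DISK_EUR_PER_TB / 1000)


costs = {site: replacement_cost_keur(*cap) for site, cap in decommissioned.items()}
total_cpu = sum(cpu for cpu, _ in costs.values())
total_disk = sum(disk for _, disk in costs.values())

for site, (cpu, disk) in costs.items():
    print(f"{site:<12}{cpu:>7.1f}{disk:>7.1f}{cpu + disk:>7.1f}")
print(f"{'Total':<12}{total_cpu:>7.1f}{total_disk:>7.1f}{total_cpu + total_disk:>7.1f}")
```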
Backup
ALICE Computing status
Resource usage in 2015
• CPU resource evolution:
  – steady growth of the number of active jobs
  – system scaled from 500 to 100,000 concurrently running jobs
  – scheduled analysis now prevailing over chaotic analysis
  – organized analysis: +60% in 2015 with respect to 2014 → better efficiency
ALICE Computing status
Run2 overview
ALICE Computing status
Status of 2015 data processing
• Substantial IR-induced distortions in the TPC
• Affect both p-p and Pb-Pb data
• Sophisticated correction algorithms developed over the past 6 months
• Data partially reconstructed (first physics, lower-IR runs)
• Bulk of reconstruction still pending