Meeting with the LHC Computing Referees - Indico
Meeting with the LHC Computing Referees
Padova, 25 May 2016
ALICE Computing: status and financial requests
Domenico Elia

Domenico Elia, Riunione Referee Calcolo LHC / Padova, 25.5.2016

Outline
• ALICE Computing status:
  - resource usage in 2015, Run2 computing activities
  - performance of the Italian sites, R&D activities
• Financial requests:
  - CPU and storage status at the Tier-2s, decommissioning
  - supplementary requests for 2016 (Tier-1)
  - ordinary requests for 2017 (Tier-1 and Tier-2)

ALICE Computing status: first year of Run2 data taking
• pp @ 13 TeV
• PbPb @ 5.02 TeV
• 2010-2013: 7.3 PB (one replica); all data processed in the final reconstruction pass
• 2015: 7.2 PB (one replica)

ALICE Computing status: resource usage in 2015
Overall CPU/DISK/TAPE usage (CERN-RRB-2016-049):
• CPU @ T1, T2 over pledge (opportunistic, extra-WLCG)
• DISK usage below request (delay in 2015 data reconstruction)
• high TAPE usage (unexpectedly high pile-up in pp at 13 TeV with 25 ns bunch spacing)

ALICE Grid:
• new entries in 2015-2016
• HLT farm used for offline activities (when not in run): included in the Grid as a fully virtual site
• usual share of the activities:
  - ~150 MC cycles (papers + first physics analysis of 2015 data)
  - Run1 raw data re-processing, Run2 data processing (bulk of raw and MC production for Run2, both pp and PbPb, still to be done)
  - organized and user (chaotic) analysis

Share of Grid activities (61K parallel jobs on average):
• MC productions: 71%
• Organized analysis: 14%
• RAW data processing: 9%
• User analysis: 6%

Current activities:
• MC productions (papers, first physics 2015, upgrade)
• Raw data processing:
  - code improved to reduce memory consumption (now 2 GB/job)
  - 2015 data only partially reconstructed: distortions in the TPC occur in runs with high interaction rate
  - specific corrections had to be developed and are currently being validated
  - plan to complete the fully calibrated reconstruction within the next ~1-1.5 months
• Changing replication policy:
  - needed to cope with the available storage
  - single ESD replica
  - global disk space needed for 2015 processing: 5-6 PB (RAW + MC), barely feasible with the expected resources
• Popularity and cleanup:
  - removed very old MC productions
  - removed second ESD replica for low-access productions
  - [Figure: volume of data vs number of accesses in X = 3, 6, 12 months. First bin: data created before period X began and not accessed during that period]

ALICE Computing status: performance of the Italian sites
[Figure: site contributions (TO, BA, LNL, CT, CNAF); INFN ~14%]
• Problems with the Lustre file system at the old Bari site (BC2S)
• Bari fully migrated to the new ReCaS datacenter
Performance of the Italian sites, resource usage @ T2:
• following the usual internal coordination plan
• monthly meetings (performance recording) + annual workshop
• overall ~50% increase in total wall-clock time (WCT) from 2014 to 2015
• large upgrade in 2 sites (ReCaS) within 2015:
  - CATANIA (in production since April): ~1500 cores, 1 PB (Catania-VF)
  - BARI (in production for ALICE since mid-August):
    ~300 servers, 105 kHS06 (~10000 cores): 25 kHS06 CMS pledge + 10 kHS06 ALICE pledge
    ~4 PB disk storage + 2.75 PB tape library: 900 TB CMS pledge + 900 TB ALICE pledge
    20 Gbit/s network connection (ready for 40 Gbit/s)

New ReCaS BA infrastructure
• Official opening on July 9, 2015: https://agenda.infn.it/conferenceDisplay.py?confId=9856
• BARI Tier-2 from BC2S to ReCaS:
  - migration from Lustre to pure XRootD
  - large opportunistic use of CPU (up to ~6000 slots)
[Figure: BC2S and ReCaS usage vs Pledge 2015]

[Figures: new ReCaS center in Bari; new ReCaS center in Catania (Catania-VF); pledge for Catania, Bari, Torino, PD-LNL]

Monitoring T2 data from APEL: https://faust01.to.infn.it/#/dashboard/script/pledge_mc_sum.js
[Figure panels: BA, LNL, CT, TO, T1]

ALICE Computing status: R&D activity and software for Run3
Virtual Analysis Facility (STOA-LHC PRIN):
• Cloud-based VAF deployed in BA, CA, LNL, TO and TS
• XRootD-based Data Federation (DF) set up and populated:
  - local redirectors at each site + national redirector in BA
  - system fully tested, final PRIN report completed by end of April 2016
Software development for Run3:
• ITS standalone tracking based on cellular automaton (TO)
• ITS geometry (AL)
• response simulation for the pixel (pAlpide) chip (TS, BS-PV)
First experience with EOS at TS

ALICE Computing status: R&D activity on the Dashboard
The project: a Dashboard that concentrates in a single graphical interface all the information concerning the ALICE activity at each site (MonALISA, local batch system, local monitoring system metrics).
• Running at the Bari T2 site for ~2 years
• Recently exported to the Torino site as well
• Next steps:
  - export to all ALICE T2s and other WLCG sites
  - global dashboard for the Italian computing in ALICE
• Abstract submitted to CHEP'16

ALICE Italia computing website
https://web2.infn.it/ALICE-Italia-computing/index.php/it/

Resource status and financial requests

Financial requests: CPU/storage status in Italy
In production at the Tier-1:
• pledge quota 2015-2016 from February 2016 (x2)
• CPU: 29000 HS06 (pledge 2016)
• DISK: 3900 TB (pledge 2016)
• TAPE: 5500 TB (pledge 2016)
In production at the Tier-2s (+ Cagliari):
• Pledge 2016: 43845 HS06 + 4829 TB
• Available in May 2016 (including obsolete resources not yet decommissioned):

        Bari  Catania  Padova-LNL  Torino  Cagliari  Total
HS06   12080    13147       16881   10373      1120  53601
TB       984     1204        1152    1123        70   4533

Financial requests: CPU/storage status
Tier-2, purchases in the second half of 2015:
• CPU: 1720 HS06 at LNL (bonus ~450 HS06) + 1400 HS06 at TO
• storage: 4x180 TB expansions at BA and LNL (bonus ~50 TB)
• outcome optimized by combining tenders (BA) with site purchases

Tier-2, 2016 funding from CSN3:
• requests: 435 k€ (387 growth and replacements + 48 overhead)
• allocations: 332 k€ (308 growth and replacements + 24 overhead)
  - storage decommissioning postponed at CT/CA, and by half at PD-LNL and TO
  - overhead request granted at 50%
  - 2016 pledges guaranteed in accordance with the CRSG/RRB outcome

Split among the sites:
• CPU: ~14400 HS06
  - BA: 10200 HS06 (1950 growth + 1568 replacements = 3518 HS06)
  - LNL: 10200 HS06 (2500 growth + 5496 replacements = 7996 HS06)
  - TO: 10300 HS06 (1300 growth + 1584 replacements = 2884 HS06)
• DISK: ~620 TB
  - BA: 1184 TB (260 growth = 260 TB)
  - LNL: 1202 TB (50 growth + 130 replacements = 180 TB)
  - TO: 1223 TB (100 growth + 80 replacements = 180 TB)

Status of 2016 purchases:
• completed:
  - BA: 8600 HS06 (LNL) + license for storage expansion
  - LNL: 3840 HS06 (BA) + 180 TB (expansion for LNL)
• to be finalized:
  - BA: 260 TB (joint tender with CMS, total ~200 k€)
  - TO: 2880 HS06 + 180 TB (synergies with purchases by other experiments and C3S)
  - overhead (survey of needs completed and budget transfers made)

Financial requests: CPU/storage status
Tier-2, updated status with 2016 resources:
• CPU: 45333 HS06 (1488 HS06 in excess of the pledge)
• DISK: 4876 TB (47 TB in excess of the pledge)
• Pledge 2016: 43845 HS06 + 4829 TB
• Available at the end of 2016 (after decommissioning and once the 2016 purchases are completed*):

        Bari  Catania  Padova-LNL  Torino  Cagliari  Total
HS06   10512    13147       11385   10289         0  45333
TB      1244     1204        1202    1226         0   4876

* Assuming the remaining purchases at BA and TO are successful

Financial requests: decommissioning 2016-17

Year  Unit  Bari  Catania  LNL-Padova  Torino  Cagliari  Total
2016  HS06  1568        0        5496    1584      1120   9768
2016  TB       0      130         260     157        20    567
2017  HS06     0        0           0    3840         0   3840
2017  TB       0      114           0     117         0    231

Storage decommissioning postponed from the second half of 2016 to 2017: 130 TB (CT) + 130 TB (LNL) + 80 TB (TO) + 20 TB (CA) = 360 TB

Financial requests: decommissioning 2016-18 (storage postponement applied: 567 - 360 = 207 TB in 2016, 231 + 360 = 591 TB in 2017)

Year  Unit  Bari  Catania  LNL-Padova  Torino  Cagliari  Total
2016  HS06  1568        0        5496    1584      1120   9768
2016  TB       0        0         130      77         0    207
2017  HS06     0        0           0    3840         0   3840
2017  TB       0      244         130     197        20    591
2018  HS06  6672    13147           0    2149         0  21968
2018  TB       0        0           0     205         0    205

Decommissioning of ReCaS resources (BA and CT) foreseen in 2018 = 20000 HS06

Overall Tier-2 status at the beginning of 2017:
• CPU: 45333 - 3840 = 41493 HS06
• DISK: 4876 - 591 = 4285 TB

Financial requests: RRB April 2016
[Figure: RRB October 2015]
INFN share for 2017:
• CPU and DISK for Tier-1 and Tier-2: 18.9%
  (18.5% for 2016)
• TAPE for Tier-1: 34.8% (35.2% for 2016, 41.1% for 2015)

Financial requests: RRB April 2016
• +7% (+4%) CPU at Tier-1 (Tier-0): increased processing time for high pile-up pp events (x2) + TPC calibration issues
• +30% (+22%) TAPE at Tier-1 (Tier-0): increased raw data volume for pp events (x3.5), as observed in the 2015 sample

Supplementary request for 2016 for the Tier-1:
• CPU: 2700 HS06, 35 k€ (revised 2016 pledge: 31752 HS06)
• TAPE: 1.6 PB, 40 k€ (revised 2016 pledge: 7.1 PB)

Increases from 2016 (revised) to 2017 (RRB April 2016, RRB October 2015):

          T0     T1     T2
CPU    13.8%  31.5%    17%
DISK   27.4%  16.8%  19.9%
TAPE   30.8%  39.9%

Financial requests: requests for 2017, Tier-1 and Tier-2

                                   CPU T1  DISK T1  TAPE T1  CPU T2  DISK T2
                                   (HS06)     (TB)     (TB)  (HS06)     (TB)
Pledged T1 / avail. - decomm. T2    31752     3885     7064   41493     4285
Scrutinized ALICE 2017              41769     4725     9883   51975     5916
Delta                               10017      840     2819   10482     1631
Estimated cost (k€)                 130.2    176.4     70.5   115.3    326.2

Total (k€): 377.1 (Tier-1), 441.5 (Tier-2)
Overhead T2 (k€): 54.1

• Estimated unit costs for T2 (T1): 11 (13) €/HS06 and 200 (210) €/TB
• Tier-1 decommissioning: not included
• Tier-2 overhead: 6% of CPU + 5% of DISK (network) + 7% of total (additional servers)

Financial requests: requests for 2017, per Tier-2 site

                            HS06     TB  CPU (k€)  DISK (k€)  Total (k€)
Decommissioning:
  Bari                         0      0       0.0        0.0         0.0
  Catania                      0    244       0.0       48.8        48.8
  LNL-Padova                   0    130       0.0       26.0        26.0
  Torino                    3840    197      42.2       39.4        81.6
  Cagliari                     0     20       0.0        4.0         4.0
Total decommissioning       3840    591      42.2      118.2       160.4
Net growth                  6642   1040      73.1      207.9       281.0
Decommissioning + growth   10482   1631     115.3      326.1       441.4

Backup

ALICE Computing status: resource usage in 2015
CPU resource evolution:
• steady growth of the number of active jobs
• system scaled from 500 to 100,000 concurrently running jobs
• scheduled analysis now prevailing over chaotic analysis
• organized analysis +60% in 2015 with respect to 2014
  - better efficiency

ALICE Computing status: Run2 overview

ALICE Computing status: status of 2015 data processing
• Substantial IR-induced distortions in the TPC
• They affect both p-p and Pb-Pb data
• Sophisticated correction algorithms developed in the past 6 months
• Data reconstructed partially (first physics, lower-IR runs)
• Bulk of reconstruction still pending
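
As a cross-check of the 2017 Tier-1/Tier-2 request table, the arithmetic behind the deltas, the cost estimates and the Tier-2 overhead can be sketched in Python. This is only an illustrative sketch: the numbers are the ones quoted in the slides (resources available, scrutinized requests, unit costs of 11 (13) €/HS06 and 200 (210) €/TB, overhead fractions of 6%, 5% and 7%), while the dictionary and function names are hypothetical and not part of any ALICE or INFN tool. Tape is left out because the slides do not state a unit cost for it.

```python
# Illustrative sketch only: names are hypothetical, numbers are taken from the slides.

# Unit costs quoted in the talk, in EUR per HS06 (cpu) and EUR per TB (disk).
UNIT_COST_EUR = {
    "T1": {"cpu": 13, "disk": 210},
    "T2": {"cpu": 11, "disk": 200},
}

# Tier-2 resources at the end of 2016 and the decommissioning planned for 2017.
T2_END_2016 = {"cpu": 45333, "disk": 4876}
T2_DECOMMISSIONED_2017 = {"cpu": 3840, "disk": 591}

# Starting point: pledged resources for Tier-1, available minus decommissioned for Tier-2.
BASELINE = {
    "T1": {"cpu": 31752, "disk": 3885},
    "T2": {k: T2_END_2016[k] - T2_DECOMMISSIONED_2017[k] for k in T2_END_2016},
}

# Resources scrutinized for ALICE in 2017 (INFN share).
SCRUTINIZED_2017 = {
    "T1": {"cpu": 41769, "disk": 4725},
    "T2": {"cpu": 51975, "disk": 5916},
}


def delta(tier, kind):
    """Resources still to be acquired: scrutinized minus baseline."""
    return SCRUTINIZED_2017[tier][kind] - BASELINE[tier][kind]


def cost_keur(tier, kind):
    """Estimated cost in kEUR: delta times the unit cost."""
    return delta(tier, kind) * UNIT_COST_EUR[tier][kind] / 1000.0


def t2_overhead_keur():
    """Tier-2 overhead: 6% of CPU cost + 5% of disk cost + 7% of their sum."""
    cpu = cost_keur("T2", "cpu")
    disk = cost_keur("T2", "disk")
    return 0.06 * cpu + 0.05 * disk + 0.07 * (cpu + disk)


if __name__ == "__main__":
    for tier in ("T1", "T2"):
        for kind in ("cpu", "disk"):
            print(f"{tier} {kind}: delta = {delta(tier, kind)}, "
                  f"cost = {cost_keur(tier, kind):.1f} kEUR")
    print(f"T2 overhead = {t2_overhead_keur():.1f} kEUR")
```

Rounded to one decimal place, the computed values can be compared directly with the "Stima costo" and "Overhead T2" entries of the table.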