AliEn v2-20 and beyond
Workshop dei Tier-2 italiani di ALICE
A. Abramyan, S. Bagnasco, L. Betev, D. Goyal, A. Grigoras, C. Grigoras, M. Litmaath, N. Manukyan, M. Martinez, J. Porter, P. Saiz, S. Sankar, S. Schreiner
Experiment Support, CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it
19 Dec 2012, Pablo Saiz

Content
• What is AliEn
• New features in v2.20
– TaskQueue
– Catalogue
– Service communication
• What is next?
• Summary

AliEn
• All components to create a GRID
• File Catalogue
– UNIX-like file system
– Mapping to physical files
– Metadata information
– SE discovery
• Transfer Model
– With different plugins
• TaskQueue
– Job Agent & pull model
– Automatic installation of software packages
– Simulation, reconstruction, analysis...
• Developed by ALICE
– Used by several communities

AliEn File Catalogue
• Globally unique namespace
– Mapping from LFN to PFN
• UNIX-like file system interface
• Powerful metadata catalogue
• Automatic SE selection
• Integrated quota system
• Multiple storage protocols: xrootd, torrent, srm, file
• Collections of files
• Physical file archival
• Roles and users

Job execution
[Diagram. Central services: Job Manager, TaskQueue, Job Broker, File catalogue (LFN, GUID, metadata). Sites A, B and C: CE, JA, MonALISA, xrootd, CREAM CE.]

New in v2.20

TaskQueue database layout
• Single DB
• InnoDB tables
– Row locking
– Foreign keys
– Transactions (not used…)
• Lookup tables
• 2 JDLs per job
• JDL fields mapped to columns
• Link to full graph

Brokering
• Avoid ClassAd matching
– Fewer fields to parse
• Match in a single SQL statement
• Four attempts at matching:
– With packages already installed
– With any packages
– With remote data and packages already installed
– With remote data, any packages

File brokering
[Diagram: File 1 to File 5 distributed over Site A, Site B and Site C]
• Current schema
– Submit 4 jobs covering File1 to File5
• Broker per file
– Submit 3 empty subjobs
– When a job starts, analyze as much as possible
– If nothing left, just exit
– In the diagram, one subjob handles File1, 2, 4, 5 and another handles File 3

More TaskQueue
• MaxWaitingTime: maximum amount of time that a job can stay in ‘WAITING’
– If the time is exceeded, the job ends up in error
– New state: ERROR_EW (Expired Waiting)
• Retrial:
– Number of times that a single job can be resubmitted
– Resubmission done by central services
• Reusing JobId in resubmission
• Direct removal of KILLED jobs

Some results…
• DB time to insert a job and apply 8 status changes [chart not recovered]
• Time to process all 250M ALICE jobs: 4.8 days

Service communication
• Replacing SOAP with JSON
– Less overhead (no XML encoding)
– Easier to interact with other clients
• Backward-incompatible change
• To be deployed in ALICE…
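
To make the "less overhead (no XML encoding)" point concrete, here is a minimal sketch that encodes the same call once as JSON and once as a SOAP 1.1 envelope and compares the sizes. It is not AliEn's actual wire format: the method name `getJobStatus`, the `method`/`params` keys and the `argN` element names are assumptions made only for this illustration.

```python
import json
from xml.sax.saxutils import escape


def json_request(method, params):
    """Encode a call as a small JSON object (hypothetical layout)."""
    return json.dumps({"method": method, "params": params}).encode("utf-8")


def soap_request(method, params):
    """Encode the same call as a bare SOAP 1.1 envelope (hypothetical layout)."""
    # Every argument needs an opening and a closing tag, plus XML escaping.
    args = "".join(
        f"<arg{i}>{escape(str(p))}</arg{i}>" for i, p in enumerate(params)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        f"<soap:Body><{method}>{args}</{method}></soap:Body>"
        "</soap:Envelope>"
    ).encode("utf-8")


# Hypothetical call: ask for the status of one job.
as_json = json_request("getJobStatus", ["12345"])
as_soap = soap_request("getJobStatus", ["12345"])
print(len(as_json), "bytes as JSON,", len(as_soap), "bytes as SOAP")
```

The envelope and namespace declaration are a fixed cost paid on every SOAP call, which matters most for the many short requests that job agents and services exchange.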
SOAP vs JSON
• Apache web server
• 32 hosts for clients
– 16 cores
– 8000 calls per client
• Without SSL

Catalogue
• InnoDB tables
– Row locking
– Transactions
– Foreign keys
• To be deployed in ALICE…

What is next?

And for the next versions…
• Trust model
• File popularity
• Interactive jobs
• Correlate monitoring data
• Multicore jobagents
• Catalogue crawler
• Error classification
• Distributed brokering

File catalogue
• Removal of GUID
– Decrease size of the catalogue
– Storage on the sites based on LFN + timestamp
• Using file system instead of database
– Keep database for metadata, quotas, SE
• Improve handling of zip archives
– More than 80% of the LFNs are inside an archive

TaskQueue
• Compression of JDL
– And/or storing diffs
• Brokering alternatives:
– 2-level brokering: JA asks CM, CM asks the CS in bulk
– Combining jobs with similar input and dispatching them together
• Multicore jobagent
– One agent per core or per machine?

Human grid
• Scotland, VO to VO
• USA, JA memory
• Germany, ORACLE
• Switzerland, Main dev.
• Armenia, XML model, File Popularity
• China, Trust Model
• Italy, CREAMCE
• India, File deletion
• South Korea, Quota system
• Chile, Trust model

Summary
• Parts of AliEn v2.20 already deployed for ALICE!
– Needs another intervention, with 48h downtime
– PANDA runs all the latest components
• TaskQueue speed improved drastically
– Insertion rate 40 times higher
– Resubmission 20 times faster
– Improved concurrency
• Plenty of areas to develop and contribute
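
As an illustration of the Brokering slide ("match in a single SQL statement", "four attempts at matching"), the sketch below runs the four attempts in order against an in-memory SQLite table. It is not the AliEn schema or query: the `queue` table, its columns, the one-package-per-job simplification and the `ASSIGNED` update are assumptions made only for this example.

```python
import sqlite3


def make_queue(jobs):
    """Build a toy queue from (job_id, data_site, package) tuples (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE queue (job_id INTEGER PRIMARY KEY, status TEXT, "
        "data_site TEXT, package TEXT)"
    )
    db.executemany("INSERT INTO queue VALUES (?, 'WAITING', ?, ?)", jobs)
    return db


def match_job(db, site, installed):
    """Return (job_id, attempt) for the first attempt that finds a job, else None."""
    marks = ",".join("?" * len(installed)) or "NULL"
    local = "data_site = ?"
    has_pkg = f"package IN ({marks})"
    attempts = [
        (f"{local} AND {has_pkg}", [site, *installed]),  # 1: local data, packages installed
        (local, [site]),                                 # 2: local data, any packages
        (has_pkg, [*installed]),                         # 3: remote data, packages installed
        ("1 = 1", []),                                   # 4: remote data, any packages
    ]
    # Attempts 3 and 4 do not filter on site: jobs with local data were
    # already tried in attempts 1 and 2, so whatever is left is remote.
    for number, (condition, params) in enumerate(attempts, start=1):
        row = db.execute(
            "SELECT job_id FROM queue WHERE status = 'WAITING' AND "
            f"{condition} ORDER BY job_id LIMIT 1",
            params,
        ).fetchone()
        if row:
            db.execute("UPDATE queue SET status = 'ASSIGNED' WHERE job_id = ?", row)
            return row[0], number
    return None
```

Each attempt is one `SELECT` over plain columns, so no ClassAd expression has to be parsed per job; the price is that only requirements mapped to columns can take part in the match.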