Care and Feeding of the ALICE Grid
Torino, Jan 15-16, 2009
ALICE and the Grid
Stefano Bagnasco, INFN Torino

Outline
• The ALICE Computing Model
• AliEn, the ALICE Environment
• Integration with LCG/INFNGrid
• Then:
  - Aliensh basics
  - Job submission hands-on
  - Job postmortem hands-on
  - Monitoring hands-on

The ALICE Computing Model
• For pp, similar to the other experiments
  - Quasi-online data distribution and first reconstruction at T0
  - Further reconstructions at T1s
• For AA, a different model
  - Calibration, alignment, pilot reconstructions and partial data export during data taking
  - Data distribution and first reconstruction at T0 in the four months after the AA run (shutdown)
  - Further reconstructions at T1s

The ALICE Computing Model (continued)
• Three kinds of data analysis
  - Fast pilot analysis of the data “just collected”, to tune the first reconstruction, at the CERN Analysis Facility (CAF)
  - Scheduled batch analysis on the Grid (ESDs and AODs)
  - End-user interactive or batch analysis using PROOF and the Grid (AODs and ESDs)
• T0 (CERN)
  - Does: first-pass reconstruction; calibration and alignment
  - Stores: one copy of RAW, calibration data and first-pass ESDs
• T1s
  - Does: reconstructions and scheduled batch analysis
  - Stores: second collective copy of RAW, one copy of all data to be kept, disk replicas of ESDs and AODs
• T2s
  - Does: simulation and end-user analysis
  - Stores: disk replicas of AODs and ESDs

The ALICE Computing Model
[Figure not recovered]

The components
• AliRoot: ROOT + Geant3 + … (You probably know this better than I do…)
• AliEn: data catalogue, job management
• Xrootd: data access
• MonALISA: monitoring
• Underlying infrastructure: LCG/INFNGrid
  - But also OSG, NorduGrid, … that use different middleware

ALICE Computing Centres
[Figure not recovered]

AliEn: the ALICE Environment

Credits
[Slide content not recovered]

AliEn components
• Job management
  - Task Queue: database of all submitted jobs; keeps track of status, etc.
  - Job optimizers: run on the TQ; enforce policies, split jobs, etc.
  - Job Agents: run jobs on sites
  - Cluster Monitor: site service working as a proxy for Job Agents
• Data management
  - File Catalogue: with metadata
  - File Transfer Service: similar to the Task Queue; uses FTS or xrootd
  - Storage Element: not really a piece of AliEn; several “flavours” exist
• Package Manager: did not know where to put this

Job execution: basic concepts
• “Pull model”: works better than push…
• Task Queue: central DB holds record of ALL jobs
• VO-Box: “edge service”, acts as an interface between AliEn and the underlying Grid
• Job Agent
  - A.k.a. “Pilot Job”, “Joblet”, “Dirty trick”, “Damn ALICE thing”
  - “Virtual grid” on top of different flavours
  - Identity issue: all jobs on a site run with the same credentials

AliEn structure
[Diagram. Central services, labels as extracted: MonALISA, API, Authen, Task Queue, Manager, Optimizers, Broker, IS, LDAP, Proxy, CM, File Catalogue, Transfers.]
[Diagram, continued. Further central-service labels: Logger, Manager, Broker. Site services (~70 in ALICE), each with: xrootd, JA, FTD, SE, CE, PackMan, CM, MonALISA.]
Source: Pablo Saiz’s talk @ Offline Week, Oct 2008

Central services
• Biggest improvements
  - Stateless services
  - Several instances
  - Running on alias
  - Only servers below a certain threshold may answer
  - If all services are loaded, no new connections
  - Keep connection to database
• New services: PackManMaster, SEMaster, Messages
• Security envelope
• Reduced Proxy
[Diagram labels: IS, Optimizers, Authen, PackManMaster, SEMaster, Messages, Proxy, Manager, Logger, Broker, API, MonALISA]
Source: Pablo Saiz’s talk @ Offline Week, Oct 2008

Site services
• Reduced connections
  - PackMan talks to PackManMaster
  - SE talks to SEMaster
  - JA talks to Authen
• Access to replicas
• To do:
  - Pre-staging of files
  - Verify PackMan dependencies
  - Enable automatic orphan file deletion
  - Quotas: on jobs (Artem’s banking system); on files
[Diagram labels: MonALISA, JA, FTD, SE, xrootd, CM, CE, PackMan]
Source: Pablo Saiz’s talk @ Offline Week, Oct 2008

Beware!
An AliEn “job” is different from an ALICE LCG “job”.
• An LCG job:
  - Is run with alicesgm credentials
  - Is submitted to an RB/WMS and shipped to a CE
  - Starts the AliEn JobAgent
  - Goes through the LCG job state machine (ready, waiting, scheduled, etc.)
  - Is NEVER directly submitted by an ALICE user!
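The split between the two kinds of job can be pictured with a toy pull loop: the LCG job only starts an agent, and the agent then asks the central queue for real work. This is a minimal sketch, not AliEn code; the class `TaskQueue`, the function `job_agent`, and the state names are hypothetical and chosen only for illustration.

```python
from collections import deque


class TaskQueue:
    """Toy stand-in for the central Task Queue (hypothetical API)."""

    def __init__(self, jobs):
        self._waiting = deque(jobs)
        # Illustrative state names, not the real AliEn job states.
        self.status = {job: "WAITING" for job in jobs}

    def pull(self):
        """Hand out the next waiting job, or None if nothing is left."""
        if not self._waiting:
            return None
        job = self._waiting.popleft()
        self.status[job] = "RUNNING"
        return job

    def report(self, job, state):
        """Record the final state of a job."""
        self.status[job] = state


def job_agent(queue, site_env_ok=True):
    """Toy pilot: started by the LCG job, then pulls AliEn jobs."""
    if not site_env_ok:
        # A broken site never asks for work, so no user job is lost on it.
        return []
    done = []
    while (job := queue.pull()) is not None:
        # A real agent would run the payload here.
        queue.report(job, "DONE")
        done.append(job)
    return done


tq = TaskQueue(["job-1", "job-2"])
finished = job_agent(tq)
```

The design point is that the queue never pushes work to a site: an agent that lands on a misconfigured worker node simply exits, and the user's jobs stay in the central queue for another agent.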
• An AliEn job:
  - Is submitted by a user or by the production system
  - Is run by a JobAgent (which was started by the LCG job)
  - Goes through the AliEn job states

Data catalogue
• Used by all other services
• Mapping from LFN to SE and PFN
• UNIX-like file system
• GUID
[Figure: UNIX-like catalogue tree with user directories (admin, aliprod, fca, psaiz, barbera, …, grouped by initial letter under cern.ch/user/), a simulation branch (simulation/2001-01/V3.05/ with Config.C and grun.C), and numbered directories 36/, 37/, 38/ each holding stderr, stdin and stdout. Branch labels: ALICE LOCAL, ALICE USERS, ALICE SIM.]
Source: Pablo Saiz’s talk @ CHEP07

Job state machine
[Figure not recovered]

Data catalogue features
• Split between LFN and GUID catalogues
  - Fast queries if GUID cached
• Automatic PFN generation
  - Thus no need for a ‘Local File Catalogue’ on the SE
• Advanced features
  - File collections
  - Triggers
  - Metadata: user-defined schema, at the file or directory level
  - Expiration time of the entries: depending on the storage system, no need for the user to ‘clean up’
Source: Pablo Saiz’s talk @ CHEP07

Independent LFN and GUID catalogues
[Diagram: AliEn File & Metadata Catalogue. The LFN catalogue maps LFN to GUID, with index entries /, /alice, /alice/user/p/psaiz, /alice/simulation/2006, …; the GUID catalogue maps GUID to PFN, with index entries 1-JAN-1970, 1-JAN-2006, 14-FEB-2007, 23-AUG-2008, …]
Source: Pablo Saiz’s talk @ CHEP07
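The two-step mapping described on the catalogue slides (LFN to GUID, then GUID to one or more PFNs) can be sketched with two dictionaries. This is a minimal illustration under stated assumptions, not the AliEn schema: the class `FileCatalogue`, its methods, and the example paths and hosts are all hypothetical.

```python
import uuid


class FileCatalogue:
    """Toy two-level catalogue: LFN -> GUID and GUID -> list of PFNs."""

    def __init__(self):
        self.lfn_to_guid = {}   # the "LFN catalogue"
        self.guid_to_pfns = {}  # the "GUID catalogue"

    def register(self, lfn, pfn):
        """Add a replica; a new GUID is created only for a new LFN."""
        guid = self.lfn_to_guid.setdefault(lfn, str(uuid.uuid4()))
        self.guid_to_pfns.setdefault(guid, []).append(pfn)
        return guid

    def whereis(self, lfn):
        """Resolve an LFN to all of its physical replicas."""
        guid = self.lfn_to_guid[lfn]
        return list(self.guid_to_pfns[guid])


fc = FileCatalogue()
lfn = "/alice/user/p/psaiz/example.root"  # hypothetical LFN
g1 = fc.register(lfn, "root://se1.example.org//data/example.root")
g2 = fc.register(lfn, "root://se2.example.org//data/example.root")
replicas = fc.whereis(lfn)
```

Because replicas hang off the GUID rather than the LFN, renaming or moving a logical file only touches the first table, while adding a replica only touches the second.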
Data transfers
• All “scheduled” transfers use the FTD
  - Transfer queue similar to the TQ
  - Aliensh “mirror” command
• T0-T1 transfers use LCG’s FTS
  - Defined “channels”
  - Data go in and out of the SEs via the SRM interface
• T1-T2 and T2-T2 use xroot
  - No predefined channels
  - Data go in and out of the SEs via the xrootd server

Authentication and authorization
• Authentication
  - Via Grid proxy certificate, with VOMS extensions
  - And subsequently via session token
• Authorization
  - All authorization and policies are enforced in the central services (TQ for jobs, FC for data)
  - Authorization information for storage is sent via a secure “sealed envelope” mechanism (see Andreas Peters and Derek Feichtinger’s presentation)
  - SB note: nobody except AP and DF really understands how this works
Source: Pablo Saiz’s talk @ Offline Week, Oct 2008

Connection via gapi service
[Diagram labels: API clients (libgapiUI, Aliensh:[1]>), API service (GAPI Server, AlienAS.pl), middleware]
• Authentication chain
  - The user cert is used to generate a proxy
  - The proxy is used to obtain a session token
  - This is done automatically or by hand
• Encrypted communications
• Submission is done via ‘alicesgm’ user proxies (at least for now)

AliEn2
[Slide content not recovered]

User interface
• Aliensh: a nearly standard bash shell with extensions
• New commands:
  - setSElimit: view only the part of the catalogue present in a particular SE
  - jobListMatch: print requirements that prevent a job from running
  - get collections: copy all the files of a collection, keeping the same LFN
• Automatic transfer resubmission
• To do: combine ‘find’ and setSElimit
Source: Pablo Saiz’s talk @ Offline Week, Oct 2008
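The idea behind jobListMatch, reporting which requirements keep a job from running, can be illustrated with a toy check of a job against one site. This is a sketch under stated assumptions and not how AliEn implements matchmaking: the function `unmet_requirements`, the dictionary fields, and the SE and package names are hypothetical.

```python
def unmet_requirements(job, site):
    """Return human-readable reasons why `job` cannot run at `site`."""
    problems = []

    # Software: every package the job needs must be installed at the site.
    missing = sorted(set(job["packages"]) - set(site["packages"]))
    if missing:
        problems.append("missing packages: " + ", ".join(missing))

    # Data: at least one SE holding the input must be close to the site.
    if job["input_ses"] and not set(job["input_ses"]) & set(site["close_ses"]):
        problems.append("no close SE holds the input data")

    return problems


job = {"packages": ["AliRoot", "GEANT3"], "input_ses": ["SE_A"]}
site_ok = {"packages": ["AliRoot", "GEANT3", "ROOT"], "close_ses": ["SE_A", "SE_B"]}
site_bad = {"packages": ["ROOT"], "close_ses": ["SE_B"]}

for name, site in (("site_ok", site_ok), ("site_bad", site_bad)):
    reasons = unmet_requirements(job, site)
    print(name, "->", reasons if reasons else "job can run")
```

An empty list means the site matches; a non-empty list is the kind of explanation a user wants when a job sits in the queue without starting.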
MonALISA
We know where you are!
[MonALISA screenshots not recovered]

On to the gory details
Please don appropriate equipment

Integration with LCG and INFNGrid

Job submission loop
[Diagram. Labels as extracted: the user submits a job; ALICE central services with the ALICE Job Catalogue (sub-jobs Job 1.1, 1.2, 1.3, 2.1, …, 3.2, each with its list of LFNs), the ALICE File Catalogue (lfn, guid, {SEs}) and the Optimizer; “Matchmakes” on close SEs and software; “Updates TQ”; “Registers output”; “Execs agent”; “Site env OK?”, if yes “Retrieves workload”.]
[Diagram, continued. Labels as extracted: “Asks workload”; if the site environment is not OK, “Die with grace”; “Receives workload”; “Sends job result”; PackMan, Computing Agent, RB, CE, WN; “Sends job agent to site”.]

Integration with INFNGrid
• User interaction always through AliEn
  - Job submission & tracking: Aliensh “submit”, “ps”…
  - Catalogue query & data management: Aliensh “ls”, “find”, “cp”, “tag”…
  - Data access for analysis: Aliensh “cp” to a local file; TGrid::Connect("alien://") from ROOT; TFile::Open("alien://<LFN>") through xrootd from ROOT
• No need to use an LCG UI
  - AliEn installs on a laptop
  - Interacts with the UI at sites (“VO-Box”)

The VO-Box
[Diagram labels: LCG RB, job submission, VO-Box (CE interface, SE interface, PackMan), File Catalogue, file registration, TQ, job configuration request(s), LCG site (LCG CE, LCG SE, WN, JobAgent)]
Source: SB’s talk @ almost everywhere

VO-Box bits and pieces
• Implement as much as possible thin interface services to (stable) LCG standard services
• Be “good citizens” of the Grid – xrootd is now a front door
• Use the VO-Box manager’s certificate
  - All jobs in a site still share the same LCG user
  - As requested by some sites, an enhancement for security: glexec is still under discussion
• Service interfaces on the VO-Box:
  - Job submission (WMS clients) is more or less ready to use gLite
  - SRM clients useful in T1 only; xrootd redirector on VO-Box not recommended
  - Xroot is used for T1-T2 and T2-T2 data transfer
  - LFC not used any more (if it ever was…)
• Proprietary services:
  - Package Manager
  - Cluster Monitor
Source: SB’s talk @ INFNGRID Workshop 2006

Advanced features in the VO-Box
• Failover submission: several RBs, with memory and fallback
• WMS monitoring: via queries to L&B and IS
• SAM tests
• Monitoring:
  - LCG & AliEn services, proxy lifetimes, WMS, …

Configuration
• The LDAP database
  - ldap://aliendb06a.cern.ch:8389
  - DN o=alice,dc=cern,dc=ch
  - http://alien.cern.ch/twiki/bin/view/AliEn/VOBoxConfigurationReference
• Local configuration file
  - On the VO-Box: ~alicesgm/.alien/alice.conf
  - Used only for tests & debugging, if localconfig="add" or "overwrite"
• Environment files
  - ${ALIEN_HOME}/.Environment
  - ~/.alien/Environment

Site configuration
[Figure not recovered]

Proxies on the VO-Box
• Intricate issue…
• The proxies used to submit JobAgents (which are LCG jobs!) are kept in a DB on the VO-Box
• They are kept alive by a specific service using a MyProxy server
• Proxy lifetime is monitored by MonALISA
• See also: http://alien.cern.ch/twiki/bin/view/AliEn/HowToManageVOBoxProxies

Proxy mgmt
[Diagram repeated on each of these slides. Labels: VO-Box (PRS, AliEn), MyProxy Server, VOMS DB, LCG User Interface, and “The Grid™” (WMS, FTS Server, CREAM, Resource Broker).]
Commands shown, in slide order:
• voms-proxy-init --voms alice:/alice/Role=lcgadmin
• myproxy-init -s myproxy.cern.ch -d -n -t 48 -c 720
• gsissh -p 1975 [address obscured in the source]
• vobox-proxy --vo alice --voms alice:/alice/Role=lcgadmin register
• /opt/lcg/bin/lcg-proxy-renew -a $file -d -t 72 -o /tmp/tmpfile.$$ --cert $X509_USER_PROXY --key $X509_USER_KEY
Credentials labelled on the final diagram: the Certificate, the UI Proxy, the MyProxy, the Login Proxy, the “user” Proxy.

Data storage: xrootd
• Uniform protocol for data access
• Developed by SLAC and INFN for BaBar
• ALICE is integrating xrootd capability in most SRMs available
  - CASTOR2: under test at CERN; not yet deployed elsewhere
  - dCache: not a plugin but a Java reimplementation; under test at FZK and GSI
  - DPM: under test in Torino and Catania
  - StoRM: this is still to be developed…

xrootd
• Logical File Names: alien://alice/cern.ch/users/s/sbagnasc/testfile.1
• Physical file names (TURL): root://grid008.to.infn.it:1094//dpm/to.infn.it/home/xrootd/ …
• (and there is of course a GUID)

References
• Registration & Certificates:
  - http://alien.cern.ch/twiki/bin/view/Alice/UserRegistration
  - https://ca.cern.ch/ca/
• AliEn:
  - http://alien.cern.ch
• User’s guide:
  - http://project-arda-dev.web.cern.ch/project-arda-dev/alice/apiservice/AAUserGuide-0.0m.pdf
• GAPI:
  - http://alien.cern.ch/twiki/bin/view/AliEn/GAPI
• aliensh Grid Command Online Reference:
  - http://project-arda-dev.web.cern.ch/project-arda-dev/alice/apiservice/guide/guide1.0.htm