Care and Feeding of the ALICE Grid
Torino, Jan 15-16, 2009
ALICE and the Grid
S. Bagnasco, INFN Torino
Care & Feeding of the ALICE Grid – Torino, Jan 15-16 2009
Outline
The ALICE Computing Model
AliEn, the Alice Environment
Integration with LCG/INFNGrid
Then:
 Aliensh basics
 Job submission hands-on
 Job postmortem hands-on
 Monitoring hands-on
Stefano Bagnasco - INFN Torino
Care & Feeding of the ALICE Grid – Torino Jan 15-16, 2009 - 3/3475
The ALICE Computing Model

The ALICE Computing Model
For pp similar to the other experiments
 Quasi-online data distribution and first reconstruction at T0
 Further reconstructions at T1’s
For AA different model
 Calibration, alignment, pilot reconstructions and partial data export during data taking
 Data distribution and first reconstruction at T0 in the four months after AA run (shutdown)
 Further reconstructions at T1’s

The ALICE Computing Model
Three kinds of data analysis
 Fast pilot analysis of the data “just collected” to tune the first reconstruction at CERN Analysis Facility (CAF)
 Scheduled batch analysis on the Grid (ESDs and AODs)
 End-user interactive or batch analysis using PROOF and GRID (AODs and ESDs)
T0 (CERN)
 Does: first pass reconstruction; calibration and alignment
 Stores: one copy of RAW, calibration data and first-pass ESDs
T1s
 Does: reconstructions and scheduled batch analysis
 Stores: second collective copy of RAW, one copy of all data to be kept, disk replicas of ESDs and AODs
T2s
 Does: simulation and end-user analysis
 Stores: disk replicas of AODs and ESDs

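As a rough aide-memoire, the tier roles listed on this slide can be written down as a small lookup table. This is only an illustrative Python sketch: the names `TIER_ROLES` and `tiers_that_do` are invented here and are not part of AliEn or AliRoot.

```python
# Illustrative summary of the tier roles listed on the slide.
# TIER_ROLES and tiers_that_do are hypothetical names, not AliEn APIs.
TIER_ROLES = {
    "T0": {
        "does": ["first pass reconstruction", "calibration and alignment"],
        "stores": ["one copy of RAW", "calibration data", "first-pass ESDs"],
    },
    "T1": {
        "does": ["reconstructions", "scheduled batch analysis"],
        "stores": ["second collective copy of RAW",
                   "one copy of all data to be kept",
                   "disk replicas of ESDs and AODs"],
    },
    "T2": {
        "does": ["simulation", "end-user analysis"],
        "stores": ["disk replicas of AODs and ESDs"],
    },
}


def tiers_that_do(activity):
    """Return the tiers whose 'does' list mentions the given activity."""
    return [tier for tier, role in TIER_ROLES.items()
            if any(activity in task for task in role["does"])]
```

For instance, `tiers_that_do("simulation")` picks out only the T2s.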
The ALICE Computing Model
The components
AliRoot
 ROOT + Geant3 + …
 (You probably know this better than I do…)
AliEn
 Data catalogue
 Job management
Xrootd
 Data access
MonALISA
 Monitoring
Underlying infrastructure
 LCG/INFNGrid
 But also OSG, NorduGrid,… that use different middleware

ALICE Computing Centres
AliEn
The ALICE Environment
Credits
AliEn components
Job management
 Task Queue
  Database of all submitted jobs
  Keeps track of status, etc.
 Job optimizers
  Run on the TQ
  Enforce policies, split jobs, etc.
 Job Agents
  Run jobs on sites
Data management
 File Catalogue
  With metadata
 File Transfer Service
  Similar to the Task Queue
  Uses FTS or xrootd
 Storage Element
  Not really a piece of AliEn
  Several “flavours” exist
Cluster Monitor
 Site service working as a proxy for Job Agents
Package Manager
 Did not know where to put this

Job execution basic concepts
“Pull model”
 Works better than push…
Task Queue
 Central DB holds record of ALL jobs
VO-Box
 “Edge service”, acts as an interface between AliEn and underlying Grid
Job Agent
 A.k.a. “Pilot Job”, “Joblet”, “Dirty trick”, “Damn ALICE thing”
 “Virtual grid” on top of different flavours
 Identity issue: all jobs on a site run with the same credentials

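The pull model can be pictured with a toy task queue: jobs wait centrally, and an agent that has already landed on a worker node asks for something it is able to run. The sketch below is illustrative Python only. The class, method and state names are hypothetical, and matching on software packages alone is a simplifying assumption, not the real AliEn matchmaking.

```python
from collections import deque


class TaskQueue:
    """Toy central task queue: records every job and hands them out on request."""

    def __init__(self):
        self.waiting = deque()
        self.status = {}  # job id -> state, kept for ALL jobs

    def submit(self, job_id, required_packages):
        self.waiting.append((job_id, set(required_packages)))
        self.status[job_id] = "WAITING"

    def pull(self, site_packages):
        """Called by a job agent: return the first waiting job the site can run."""
        available = set(site_packages)
        for entry in list(self.waiting):
            job_id, required = entry
            if required <= available:
                self.waiting.remove(entry)
                self.status[job_id] = "RUNNING"
                return job_id
        return None  # nothing matches: the agent simply exits


def job_agent(queue, site_packages):
    """A pilot job: once running on a worker node, it asks the queue for work."""
    job_id = queue.pull(site_packages)
    if job_id is not None:
        queue.status[job_id] = "DONE"  # stand-in for running the real payload
    return job_id
```

The point of the design is that nothing is pushed to a site that cannot run it: a job stays in the central queue until an agent that satisfies its requirements shows up.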
AliEn Structure
[Diagram: Central Services (MonALISA, API, Authen, Task Queue, Manager, Optimizers, Brokers, ISS, LDAP, Proxy, CM, IS, File Catalogue, Transfers, Logger) and Site Services (~ 70 in ALICE), each site running CM, MonALISA, xrootd, JA, FTD, SE, CE and PackMan]
Pablo Saiz’s talk @ Offline Week Oct 2008

Central services
Stateless services
Several instances
New services:
 Messages
 PackManMaster
 SEMaster
 Security envelope
 Reduced Proxy
Running on alias
 Only servers below a certain threshold may answer
 If all services loaded, no new connections
Biggest improvements
 Keep connection to database
[Diagram labels: Messages, IS, Optimizers, Authen, PackManMaster, SEMaster, Proxy, Manager, Logger, Brokers, API, MonALISA]
Pablo Saiz’s talk @ Offline Week Oct 2008

Site services
Reduced connections
 PackMan talks to PackManMaster
 SE talks to SEMaster
 JA talks to Authen
Access to replicas
To do:
 Pre-staging of files
 Verify PackMan dependencies
 Enable automatic orphan file deletion
 Quotas:
  On jobs (Artem’s banking system)
  On files
[Diagram labels: MonALISA, JA, FTD, SE, xrootd, CM, CE, PackMan]
Pablo Saiz’s talk @ Offline Week Oct 2008

Beware!
An AliEn “job” is different from an ALICE LCG “job”
An LCG job:
 Is run with alicesgm credentials
 Is submitted to an RB/WMS and shipped to a CE
 Starts the AliEn JobAgent
 Goes through the LCG job state machine (ready, waiting, scheduled, etc.)
 Is NEVER directly submitted by an ALICE user!
An AliEn job:
 Is submitted by a user or by the production system
 Is run by a JobAgent (which was started by the LCG job)
 Goes through the AliEn job states

Data catalogue
 Used by all other services
 Mapping from LFN to SE and PFN
 UNIX-like file system
[Figure: example catalogue tree. ALICE USERS: ./cern.ch/user/ with a/admin/, a/aliprod/, b/barbera/, f/fca/ and p/psaiz/ (as/, dos/, local/). ALICE SIM: simulation/2001-01/V3.05/ with Config.C and grun.C. Job directories 36/, 37/ and 38/, each holding stderr, stdin and stdout. Other labels: ALICE LOCAL, Tier1, GUID]
Pablo Saiz’s talk @ CHEP07

Job state machine
Data catalogue features
Split between LFN and GUID catalogues
 Fast queries if GUID cached
Automatic PFN generation
 Thus no need for ‘Local File catalogue’ on the SE
Advanced features
 File collections
 Triggers
 Metadata
  User-defined schema
  At the file or directory level
 Expiration time of the entries
  Depending on the storage system, no need for the user to ‘clean up’
Pablo Saiz’s talk @ CHEP07

Independent LFN and GUID catalogues
[Diagram: AliEn File & Metadata Catalogue. LFN Catalogue: an index over paths (/, /alice, /alice/user/p/psaiz, /alice/simulation/2006, …) pointing to LFN–GUID tables. GUID Catalogue: an index over dates (1-JAN-1970, 1-JAN-2006, 14-FEB-2007, 23-AUG-2008, …) pointing to GUID–PFN tables]
Pablo Saiz’s talk @ CHEP07

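The two-level lookup (LFN to GUID, then GUID to the replicas on the SEs) can be sketched with two dictionaries. This is a minimal illustration in Python, not the AliEn schema: the class `Catalogue`, its methods, the SE names and the PFNs in the usage note are all hypothetical.

```python
import uuid


class Catalogue:
    """Toy file catalogue with independent LFN and GUID tables."""

    def __init__(self):
        self.lfn_to_guid = {}   # "LFN catalogue": logical name -> GUID
        self.guid_to_pfns = {}  # "GUID catalogue": GUID -> list of (SE, PFN)

    def register(self, lfn, se, pfn):
        """Register a replica; a new LFN gets a fresh GUID, a known one keeps it."""
        if lfn not in self.lfn_to_guid:
            self.lfn_to_guid[lfn] = str(uuid.uuid4())
        guid = self.lfn_to_guid[lfn]
        self.guid_to_pfns.setdefault(guid, []).append((se, pfn))
        return guid

    def whereis(self, lfn):
        """Resolve an LFN to all its replicas via the GUID."""
        return self.guid_to_pfns[self.lfn_to_guid[lfn]]
```

Registering the same LFN on two (made-up) storage elements, say `SE_A` and `SE_B`, leaves one GUID with two replicas, which is what `whereis` returns.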
Data transfers
All “scheduled” transfers use the FTD
 Transfer queue similar to the TQ
 Aliensh “mirror” command
T0-T1 transfers use LCG’s FTS
 Defined “channels”
 Data go in and out the SEs via SRM interface
T1-T2 and T2-T2 use xroot
 No predefined channels
 Data go in and out the SEs via xrootd server

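The routing rule on this slide fits in a few lines. The Python sketch below is only an illustration: the function name is invented, and treating a tier pair as unordered (T1 to T2 the same as T2 to T1) is an assumption that the slide does not state.

```python
def transfer_method(tier_a, tier_b):
    """Pick the transfer mechanism for a pair of tiers, as listed on the slide."""
    pair = frozenset((tier_a, tier_b))  # unordered pair: an assumption
    if pair == frozenset(("T0", "T1")):
        return "FTS"     # predefined channels, SRM interface
    if pair in (frozenset(("T1", "T2")), frozenset(("T2",))):
        return "xrootd"  # no predefined channels, xrootd server
    raise ValueError(f"no transfer route listed for {tier_a}-{tier_b}")
```

Pairs that the slide does not mention, such as T0-T2, raise an error instead of guessing.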
Authentication and authorization
Authentication via Grid Proxy certificate
 VOMS extensions
 And subsequently via session token
Authorization:
 All authorization and policies enforced in the central services (TQ for jobs, FC for data)
 Authorization information for storage sent via secure “sealed envelope” mechanism (see Andreas Peters and Derek Feichtinger’s presentation)
  SB note: nobody except AP and DF really understands how this works
Pablo Saiz’s talk @ Offline Week Oct 2008

Connection via gapi service
[Diagram labels: API Clients (libgapiUI, the Aliensh:[1]> prompt), API Service (GAPI Server), Middleware (AlienAS.pl)]
Authentication chain
 The user cert is used to generate a proxy
 The proxy is used to obtain a session token
  This is done automatically or by hand
 Encrypted communications
 Submission is done via ‘alicesgm’ user proxies
  (at least for now)

Alien2
User interface
Aliensh
 a nearly standard bash shell with extensions
New commands:
 setSElimit: view only the part of the catalogue present in a particular SE
 jobListMatch: print requirements that prevent a job from running
 get collections: Copy all the files of a collection, keeping the same lfn
 Automatic transfer resubmission
To do:
 Combine ‘find’ and setSElimit
Pablo Saiz’s talk @ Offline Week Oct 2008

MonALISA
We know where you are!
On to the gory details
Please don appropriate equipment
Integration with LCG and INFNGrid

Job submission loop
[Diagram, recoverable labels only. The User submits a job to the ALICE central services. In the ALICE Job Catalogue the Optimizer splits it into sub-jobs (Job 1.1, 1.2, 1.3, 2.1, 3.1, 3.2, …) according to their input LFNs (lfn1 to lfn4); the ALICE File Catalogue maps each lfn to a guid and a set of SEs. The Computing Agent sends job agents to the site through RB, CE and WN. The agent is executed and checks “Env OK?”: if not it dies with grace, if yes it asks for workload. The central services matchmake on close SEs and software; the agent receives and retrieves the workload (packman), sends the job result and registers the output, and the TQ is updated.]

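The splitting step in the loop can be illustrated as grouping the input files by where they are stored. The Python sketch below is hypothetical: it assumes one sub-job per storage element and assigns each LFN to the first SE in its replica list, which is an arbitrary choice for illustration and not necessarily what the AliEn optimizers do.

```python
from collections import defaultdict


def split_by_se(input_lfns, replicas):
    """Group input LFNs into sub-jobs, one per storage element.

    replicas maps each LFN to the list of SEs holding a copy; each LFN goes
    to the first SE in its list (an arbitrary choice for this sketch).
    """
    groups = defaultdict(list)
    for lfn in input_lfns:
        groups[replicas[lfn][0]].append(lfn)
    return dict(groups)
```

With lfn1 and lfn3 on one (made-up) SE and lfn2 and lfn4 on another, a four-file job splits into two sub-jobs of two files each.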
Integration with INFNGrid
User interaction always through AliEn
 Job submission & tracking
  Aliensh “submit”, “ps”…
 Catalogue query & data management
  Aliensh “ls”, “find”, “cp”, “tag”…
 Data access for analysis
  Aliensh “cp” to a local file
  TGrid::Connect("alien://") from ROOT
  TFile::Open("alien://<LFN>") through xrootd from ROOT
No need to use an LCG UI
 AliEn installs on laptop
 Interacts with UI at sites (“VO-Box”)

The VO-Box
[Diagram labels: LCG RB (job submission); VO-Box with CE Interface, SE Interface and PackMan; LCG Site with LCG CE, LCG SE and WN running the JobAgent; File Catalogue (file registration); TQ (job configuration requests)]
SB’s talk @ almost everywhere

VO-Box bits and pieces
Implement as much as possible thin interface services
 To (stable) LCG standard services
 Be “good citizens” of the Grid – xrootd is now a front door
Use the VO-Box manager’s certificate
 All jobs in a site still share the same LCG user
 As requested by some sites, an enhancement for security: glexec is still under discussion
Service interfaces on the VO-Box:
 Job Submission (WMS clients) are more or less ready to use gLite
 SRM clients useful in T-1 only, xrootd redirector on VO-Box not recommended
 Xroot is used for T1-T2 and T2-T2 data transfer
 LFC not used any more (if it ever was…)
Proprietary services:
 Package Manager
 Cluster Monitor
SB’s talk @ INFNGRID Workshop 2006

Advanced features in the VO-Box
Failover submission
 Several RB, with memory and fallback
WMS monitoring
 Via queries to L&B and IS
SAM tests
 Monitoring LCG & AliEn services, proxy lifetimes, WMS,…

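One way to read “several RB, with memory and fallback” is sketched below in Python: remember which broker worked last, start from it, and fall through to the others on failure. This is an interpretation, not the VO-Box code. `FailoverSubmitter` and the `send` callback are hypothetical names, and the broker names in the usage note are placeholders.

```python
class FailoverSubmitter:
    """Try a list of resource brokers, starting from the last one that worked."""

    def __init__(self, brokers):
        self.brokers = list(brokers)
        self.last_good = 0  # "memory": index of the broker that last succeeded

    def submit(self, job, send):
        """Submit via send(broker, job), which must raise an exception on failure."""
        count = len(self.brokers)
        for offset in range(count):
            index = (self.last_good + offset) % count
            try:
                result = send(self.brokers[index], job)
            except Exception:
                continue  # fall back to the next broker in the list
            self.last_good = index
            return result
        raise RuntimeError("all resource brokers failed")
```

If the first of three brokers, say `rb1`, is down, the first submission lands on `rb2`, and later submissions go straight to `rb2` without retrying `rb1`.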
Configuration
The LDAP Database
 ldap://aliendb06a.cern.ch:8389
 DN o=alice,dc=cern,dc=ch
 http://alien.cern.ch/twiki/bin/view/AliEn/VOBoxConfigurationReference
Local configuration file
 On the VO-Box: ~alicesgm/.alien/alice.conf
 Used only for tests & debugging if localconfig=“add” or “overwrite”
Environment files
 ${ALIEN_HOME}/.Environment
 ~/.alien/Environment

Site configuration
Proxies on the VO-Box
Intricate issue…
 The proxies used to submit JobAgents (that are LCG jobs!) are kept in a DB on the VO-Box
 They are kept alive by a specific service using a myproxy server
 Proxy lifetime monitored by MonALISA
See also:
 http://alien.cern.ch/twiki/bin/view/AliEn/HowToManageVOBoxProxies

Proxy mgmt
[Diagram, repeated over seven animation steps: VO-Box, MyProxy Server, VOMS DB, PRS, AliEn, The Grid(TM) (WMS, FTS Server, CREAM, Resource Broker), LCG User Interface]
Commands shown at successive steps:
 voms-proxy-init --voms alice:/alice/Role=lcgadmin
 myproxy-init -s myproxy.cern.ch -d -n -t 48 -c 720
 gsissh -p 1975 [email protected]
 vobox-proxy --vo alice --voms alice:/alice/Role=lcgadmin register
 /opt/lcg/bin/lcg-proxy-renew -a $file -d -t 72 --cert -o /tmp/tmpfile.$$ $X509_USER_PROXY --key $X509_USER_KEY
Labels on the last step:
 The MyProxy
 The “user” Proxy
 The Certificate
 The Login Proxy
 The UI Proxy

Data storage: xrootd
Uniform protocol for data access
 Developed by SLAC and INFN for BaBar
ALICE is integrating xrootd capability in most SRMs available
 CASTOR2
  Under test at CERN
  Not yet deployed elsewhere
 dCache
  Not a plugin but a Java reimplementation
  Under test at FZK and GSI
 DPM
  Under test in Torino and Catania
 StoRM
  This is still to be developed…

xrootd
Logical File Names:
 alien://alice/cern.ch/users/s/sbagnasc/testfile.1
Physical file names (TURL):
 root://grid008.to.infn.it:1094//dpm/to.infn.it/home/xrootd/…
(and there is of course a GUID)

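A TURL of this shape is an ordinary URL with a `root` scheme, so it can be taken apart with standard tools. The Python sketch below uses the standard-library `urllib.parse`; the function name is invented, and the host and path in the usage note are placeholders, not a real storage element.

```python
from urllib.parse import urlsplit


def split_turl(turl):
    """Split a root:// TURL into (scheme, host, port, path on the server)."""
    parts = urlsplit(turl)
    return parts.scheme, parts.hostname, parts.port, parts.path
```

For a made-up TURL such as `root://xrootd.example.org:1094//data/alice/testfile.1`, the path component keeps its leading double slash, which is how the path on the server is separated from the host and port.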
References
Registration & Certificates:
 http://alien.cern.ch/twiki/bin/view/Alice/UserRegistration
 https://ca.cern.ch/ca/
AliEn:
 http://alien.cern.ch
GAPI:
 http://alien.cern.ch/twiki/bin/view/AliEn/GAPI
User’s guide:
 http://project-arda-dev.web.cern.ch/project-arda-dev/alice/apiservice/AAUserGuide-0.0m.pdf
aliensh Grid Command Online Reference:
 http://project-arda-dev.web.cern.ch/project-arda-dev/alice/apiservice/guide/guide1.0.htm