WP2 - Data Management
L.M.Barone
Università di Roma & INFN

L.M.Barone – INFN Rome   Commissione Nazionale I   13 September 2000

WP Goals
“...to permit the secure access of massive amounts of data...to move and replicate data at high speed from one site to another and to manage the synchronisation of remote data copies” (from the DataGrid Technical Annex)

Keywords
• Automation
• Caching
• Generic Interface
• MetaData
• Data Mover
• Replica Manager
• Security

People
Site     Name           FTE
Bari     L.Silvestris   0.3
         G.Zito         0.5 (0.3)
Pisa     S.Arezzini     0.3 (0.3)
         A.Controzzi    0.5
         F.Donno        0.2 (0.2)
         F.Schifano     0.2
Roma1    L.M.Barone     0.3 (0.3)
         A.Lonardo      0.3
         A.Michelotti   0.3
         G.Organtini    0.2
         D.Rossetti     0.2 (0.2)

Deliverables
• Requirements for Data Location Broker: 5/2001
• Definition of a metadata syntax: 7/2001
• Replica Management at file level: 12/2001

An Example
• Ideas for a Replica Manager:
  – Management of production in a distributed environment:
    • Data produced in many sites
    • Data collected in a single reference site
    • Data analyzed in many sites
    • Data are sometimes moved, sometimes accessed via the network
• A case study with Objectivity/DB
  – can be extended to any kind of file

Cloning federations
[Diagram. Labels: Clone FD; CERN Boot, CERN FD; RC1 Boot, RC1 FD; RC2 Boot, RC2 FD; DB1, DB2, DB3, ..., DBn; DB_a, DB_b]

Productions
[Diagram. Labels: CERN Boot, CERN FD; RC1 Boot, RC1 FD; RC2 Boot, RC2 FD; GDMP on each link; DB1, DB2, DB3, ..., DBn; DBn+1, ..., DBn+m; DBn+m+1, ..., DBn+m+k]

Analysis
[Diagram. Labels: CERN Boot, CERN FD; RC1 Boot, RC1 FD; RC2 Boot, RC2 FD; DB1, DB2, DB3, ..., DBn; DBn+1, ..., DBn+m; DBn+m+1, ..., DBn+m+k]
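As a rough illustration of the production flow listed under "An Example" (data produced in many sites, then collected in a single reference site), the Python sketch below works out which databases a reference site still has to fetch from each producing site. The site names, the database names and the `missing_at_reference` helper are assumptions made for illustration only; they are not part of GDMP or Objectivity/DB.

```python
from typing import Dict, Set


def missing_at_reference(
    produced: Dict[str, Set[str]], reference: Set[str]
) -> Dict[str, Set[str]]:
    """For each producing site, list the databases the reference site lacks."""
    todo: Dict[str, Set[str]] = {}
    for site, dbs in produced.items():
        pending = dbs - reference  # produced remotely, not yet collected
        if pending:
            todo[site] = pending
    return todo


# Hypothetical state: two regional centres have produced new databases.
produced = {
    "RC1": {"DB4", "DB5"},
    "RC2": {"DB6"},
}
# Databases already collected at the (hypothetical) reference site.
reference = {"DB1", "DB2", "DB3", "DB4"}

transfers = missing_at_reference(produced, reference)
for site in sorted(transfers):
    print(site, "->", sorted(transfers[site]))
```

A real data mover would also have to transfer the files and attach them to the target federation; the sketch only covers the bookkeeping step of deciding what is missing.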
Logical vs Physical Datasets
Dataset: H → 2μ
  Hmm.1.hits.DB  id=12345
    pccms1.bo.infn.it::/data1/Hmm1.hits.DB
    shift23.cern.ch::/db45/Hmm1.hits.DB
  Hmm.2.hits.DB  id=12346
    pccms1.bo.infn.it::/data1/Hmm2.hits.DB
    shift23.cern.ch::/db45/Hmm2.hits.DB
    pccms3.pd.infn.it::/data3/Hmm2.hits.DB
  Hmm.3.hits.DB  id=12347
    shift23.cern.ch::/db45/Hmm3.hits.DB
Dataset: H → 2e
  Hee.1.hits.DB  id=5678
    pccms5.roma1.infn.it::/data/Hee1.hits.DB
    shift49.cern.ch::/db123/Hee1.hits.DB
  Hee.2.hits.DB  id=5679
    pccms5.roma1.infn.it::/data/Hee2.hits.DB
    shift49.cern.ch::/db123/Hee2.hits.DB
  Hee.3.hits.DB  id=5680
    pccms5.roma1.infn.it::/data/Hee3.hits.DB
    shift49.cern.ch::/db123/Hee3.hits.DB

Logical vs Physical Datasets
• Each dataset is composed of one or more databases
  – datasets are managed by the application software
• Each DB is uniquely identified by a DBid
  – assigning a DBid creates the logical-db
• The physical-db is the file
  – zero, one or more instances
• The GIS manages the link between a dataset, its logical-dbs and its physical-dbs

Database creation
[Diagram. Labels: shift.cern.ch, pc.rc1.net; RC1 Prod, CERN FD, RC1 Ref; DB1, DB2, DB3, DB4, DB5]
DBid   Name     Locations
0001   DB1.DB   shift.cern.ch::/shift/data
0002   DB2.DB   shift.cern.ch::/shift/data
0003   DB3.DB   shift.cern.ch::/shift/data
0004   DB4.DB   pc.rc1.net::/pc/data
                shift.cern.ch::/shift/data
0005   DB5.DB   pc.rc1.net::/pc/data
                shift.cern.ch::/shift/data

Replica Management
[Diagram. Labels: shift.cern.ch, pc1.bo.infn.it, pc1.pd.infn.it; CERN FD, BO Ref, PD Ref; DB1, DB2, DB3]
DBid   Name     Locations
0001   DB1.DB   shift.cern.ch::/shift/data
                pc1.bo.infn.it::/data
0002   DB2.DB   shift.cern.ch::/shift/data
                pc1.bo.infn.it::/data
0003   DB3.DB   shift.cern.ch::/shift/data

Example Summary
• Basic functionalities of a Replica Manager for production will be tested by the end of 2000 on CMS production (GDMP)
• Next comes an Information Server to allow easy synchronization of federations and optimized data access during analysis
• The same functionalities shown for Objectivity/DB may/should be implemented for other kinds of files

Conclusions
• Data management tools are needed to face the complexity of new-generation experiments (not only LHC)
• The GRID projects (INFN and EU) are already providing solutions to real-life problems
• Milestones and objectives are well defined (meeting them will not be trivial)
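
Looking back at the "Logical vs Physical Datasets" and "Replica Management" slides, the mapping from a DBid to a logical database and on to zero, one or more physical files can be sketched as a small in-memory catalogue. This is a minimal Python sketch under stated assumptions: the `LogicalDB` and `ReplicaCatalogue` classes and their method names are hypothetical and do not reproduce the GIS, GDMP or Objectivity/DB interfaces; only the `host::/path` location notation is taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LogicalDB:
    """A logical database: a DBid and a name, with zero or more physical copies."""
    db_id: int
    name: str
    replicas: List[str] = field(default_factory=list)  # "host::/path" strings


class ReplicaCatalogue:
    """Hypothetical in-memory catalogue, not the actual GDMP or GIS interface."""

    def __init__(self) -> None:
        self._by_id: Dict[int, LogicalDB] = {}

    def create_logical(self, db_id: int, name: str) -> LogicalDB:
        # Assigning a DBid is what creates the logical database.
        if db_id in self._by_id:
            raise ValueError(f"DBid {db_id} already assigned")
        entry = LogicalDB(db_id, name)
        self._by_id[db_id] = entry
        return entry

    def add_replica(self, db_id: int, location: str) -> None:
        # Record one more physical copy of an existing logical database.
        entry = self._by_id[db_id]
        if location not in entry.replicas:
            entry.replicas.append(location)

    def locate(self, db_id: int) -> List[str]:
        # Return every known physical location (possibly none).
        return list(self._by_id[db_id].replicas)


catalogue = ReplicaCatalogue()
catalogue.create_logical(1, "DB1.DB")
catalogue.create_logical(3, "DB3.DB")
catalogue.add_replica(1, "shift.cern.ch::/shift/data")
catalogue.add_replica(1, "pc1.bo.infn.it::/data")

print(catalogue.locate(1))  # both known copies of DB1.DB
print(catalogue.locate(3))  # a logical-db with no physical copy yet
```

Keeping the logical entry separate from its list of locations mirrors the slides' point that a logical-db exists as soon as its DBid is assigned, whether or not any file has been written or replicated yet.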