The official Metadata Access Interface for EGEE
Consorzio COMETA - Progetto PI2S2
AMGA
Official Metadata Service for EGEE
Salvatore Scifo –
Consorzio Cometa - Catania, ITALY
([email protected])
Grid Tutorial per i Laboratori Nazionali del Sud
Catania, Italy 25th – 27th Feb 2008
www.consorzio-cometa.it
FESR
Contents
• Background and Motivation for AMGA
• Interface, Architecture and
Implementation
• Metadata Replication with AMGA
• Web Interface to access AMGA remotely
Grid Tutorial per i Laboratori Nazionali del Sud – Catania, 25th-27th Feb 2008
Why does the Grid need metadata?
• Grids often contain millions of files spread over several
storage sites.
• Users and applications need an efficient mechanism
– to find the files they are interested in
– to discover and query information about their contents
• This is provided
– by associating descriptive attributes (metadata) with files
– by exposing this information in catalogues that users and
client applications can access and search
Metadata service requirements
• The metadata service must expose a complete but simple
interface so that all users can use it
easily.
• It should be flexible and support dynamic schemas in
order to serve many (ideally all) application
domains.
• The service must also allow structured and hierarchical
metadata in order to implement arbitrary logical collections.
• A collection is a set of metadata grouped by some logical
criterion (for example, a collection can describe
all video files, whatever their encoding format).
Metadata service requirements
• It must be designed with scalability in mind in order to
deal with a large number of entries (several million).
• Security is required to provide different access levels
to different users.
• Quality of service has to ensure
– Hidden network latency – Improved performance for WAN clients
– Disconnected computing – Local replicas for off-line access
(laptops)
– DB-independent replication – The Grid environment is
heterogeneous
– Improved reliability and scalability – No single point of failure
What is AMGA?
• AMGA is a metadata service for the Grid
– It is a database access service for Grid applications
which allows users and user jobs to discover data describing
their files in order to access them in the appropriate way.
• AMGA is a service based on an RDBMS.
– It allows metadata schemas to be defined according to user and
application needs
– It provides a replication layer which makes databases locally
available to user jobs and replicates the changes between the
different participating databases.
• AMGA has been designed to integrate as well as possible
with the Grid environment
– The metadata service is a Grid component
– Grid security compliant
– Hides DB heterogeneity
AMGA Features
• Dynamic Schemas
– Schemas can be modified at runtime by the client
▪ Create, delete schemas
▪ Add, remove attributes
• Metadata organised as a hierarchy
– Schemas can contain sub-schemas
– Analogy to a file system:
▪ Schema ⇔ Directory; Entry ⇔ File
• Flexible Queries
– SQL-like query language
– Joins between schemas are supported
Metadata Concepts
• To better understand how AMGA works, think of
– schema ⇔ database schema
– collection ⇔ table
– attribute ⇔ column
– entry ⇔ row
• AMGA metadata is a list of attributes associated with entries
according to a user-defined schema.
• A schema is a set of attributes
• An entry is the abstraction of a directory/file mapped by the
metadata server
• A collection is a set of entries associated with a schema
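The concepts above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `Collection` class and its method names are hypothetical and are not part of the AMGA API; they only mirror the schema/collection/attribute/entry vocabulary of this slide.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical directory-like node: a schema plus the entries that follow it."""
    schema: dict = field(default_factory=dict)    # attribute name -> Python type
    entries: dict = field(default_factory=dict)   # entry name -> {attribute: value}
    children: dict = field(default_factory=dict)  # sub-collection name -> Collection

    def add_attribute(self, name, type_):
        # Dynamic schema: an attribute can be added while entries already exist.
        self.schema[name] = type_
        for values in self.entries.values():
            values.setdefault(name, None)

    def add_entry(self, name, **values):
        unknown = set(values) - set(self.schema)
        if unknown:
            raise KeyError(f"attributes not in schema: {sorted(unknown)}")
        row = {attr: None for attr in self.schema}
        for attr, value in values.items():
            row[attr] = self.schema[attr](value)  # coerce to the declared type
        self.entries[name] = row


root = Collection()
jobs = root.children.setdefault("jobs", Collection())  # like a /jobs directory
jobs.add_attribute("jobStatus", int)
jobs.add_entry("job1", jobStatus=0)
jobs.add_attribute("owner", str)                        # schema changed at runtime
jobs.add_entry("job2", jobStatus="1", owner="alice")
print(jobs.entries)
```

Entries created before an attribute was added simply carry an empty value for it, which is one plausible way to picture a schema that changes at runtime.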
Metadata Concepts
• Attribute – typed key/value pair associated with entries
– Type – The type (int, float, string,…)
– Name/Key – The name of the attribute
– Value - Value of an entry's attribute
• Analogy Examples
>createdir /jobs (create table jobs)
>addattr /jobs jobStatus int (alter table jobs add column jobStatus int)
>addentry /jobs/job1 jobStatus 0 (insert into jobs (jobStatus) values (0))
>updateattr /jobs jobStatus 1 jobID>100 (update jobs set jobStatus=1
where jobID>100)
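The SQL side of this analogy can be tried with Python's built-in `sqlite3` module. This is a minimal sketch of the relational statements only: it does not talk to an AMGA server, and the `jobID` key column and the three sample rows are assumptions added so that the `where` clause has something to filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# createdir /jobs  ->  create table jobs
# (jobID is an assumed key column; the update below filters on it)
cur.execute("CREATE TABLE jobs (jobID INTEGER PRIMARY KEY)")

# addattr /jobs jobStatus int  ->  alter table jobs add column jobStatus int
cur.execute("ALTER TABLE jobs ADD COLUMN jobStatus INT")

# addentry /jobs/<name> jobStatus 0  ->  insert into jobs ... values (0)
for job_id in (50, 150, 250):
    cur.execute("INSERT INTO jobs (jobID, jobStatus) VALUES (?, 0)", (job_id,))

# updateattr /jobs jobStatus 1 jobID>100  ->  update ... where jobID > 100
cur.execute("UPDATE jobs SET jobStatus = 1 WHERE jobID > 100")
conn.commit()

rows = cur.execute("SELECT jobID, jobStatus FROM jobs ORDER BY jobID").fetchall()
print(rows)
```

Only the rows whose `jobID` exceeds 100 have their status changed, which is the effect the `updateattr` condition is meant to convey.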
AMGA Datatypes
• AMGA Datatypes (table not preserved in this transcript)
• If you use only these datatypes, you can be sure that your metadata
can easily be moved to any supported back-end
• If you do not care about DB portability, you can, in
principle, use as an entry attribute type any datatype
supported by the back-end, even the more esoteric ones
(such as the PostgreSQL network address or geometric types)
AMGA Implementation
• C++ multiprocess server
– Backends
▪ Oracle, MySQL, PostgreSQL, SQLite
– Front Ends
▪ TCP Streaming
  • High performance
  • Client API for C++, Java, Python, Perl, Ruby
▪ SOAP (web services)
  • Interoperability
  • Scalability
• Standalone Python Library implementation
– Data stored on file system
[Diagram: clients connect to the metadata (MD) server over SOAP or TCP streaming; the server uses Oracle, PostgreSQL, MySQL or SQLite as back-end. In the standalone variant, a client inside the Python interpreter uses the metadata Python API, which stores data on the filesystem.]
Security
• Access control
– All entries in a directory share the same ACL
– Groups of users are also supported (Unix style permissions)
• Secure connections – SSL
– Provided by web services
• Client Authentication is based on
– Username/password
– General X509 certificates
– Grid-proxy certificates (VOMS - Virtual Organization Management
System is supported)
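The access-control rule above (one ACL per directory, shared by all its entries, with Unix-style permissions for the owner and for groups) can be sketched as follows. The `DirectoryACL` class, the `can_access` helper, and the user and group names are hypothetical and only illustrate the idea; they are not the AMGA implementation.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryACL:
    """Hypothetical per-directory ACL with Unix-style permission strings."""
    owner: str
    owner_perms: str = "rwx"
    group_perms: dict = field(default_factory=dict)  # group name -> e.g. "rx"

    def allows(self, user, user_groups, perm):
        if user == self.owner:
            return perm in self.owner_perms
        return any(perm in self.group_perms.get(g, "") for g in user_groups)


# One ACL per directory; every entry inside the directory shares it.
acls = {"/jobs": DirectoryACL(owner="alice", group_perms={"biomed": "rx"})}


def can_access(entry_path, user, user_groups, perm):
    directory = entry_path.rsplit("/", 1)[0] or "/"
    acl = acls.get(directory)
    return acl is not None and acl.allows(user, user_groups, perm)


print(can_access("/jobs/job1", "bob", ["biomed"], "r"))
print(can_access("/jobs/job1", "bob", ["biomed"], "w"))
```

Because the lookup is keyed on the parent directory, adding a new entry never requires a new ACL, which is the practical consequence of the rule on this slide.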
[Diagram labels: VOMS; Authenticate with X509 Cert; VOMS-Cert with Group & Role information; Resource management; Oracle; AMGA]
Metadata Replication I
• AMGA provides replication/federation mechanisms
• Motivation
– Scalability – Support hundreds/thousands of concurrent users
– Geographical distribution – Hide network latency
– Reliability – No single point of failure
– DB Independent replication – Heterogeneous DB systems
– Disconnected computing – Off-line access (laptops)
• Models
– Asynchronous replication
– Master-slave – writes only allowed on the master
– Application level replication
▪ Replicate Metadata commands
– Proxy
▪ The proxy forwards metadata only (works as a front end)
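The replication model listed above (asynchronous, master-slave, replaying metadata commands at application level) can be sketched as a command log that slaves pull from. The class names and the command tuple format are assumptions made for illustration; the real AMGA replication protocol is not shown in these slides.

```python
class Node:
    """Holds metadata as {entry path: {attribute: value}}."""

    def __init__(self):
        self.store = {}

    def apply(self, command):
        op, path, attr, value = command
        if op == "set":
            self.store.setdefault(path, {})[attr] = value
        elif op == "rm":
            self.store.pop(path, None)


class Master(Node):
    def __init__(self):
        super().__init__()
        self.log = []  # ordered list of metadata commands

    def execute(self, command):
        self.apply(command)
        self.log.append(command)  # what gets replicated is the command itself


class Slave(Node):
    def __init__(self, master):
        super().__init__()
        self.master = master
        self.applied = 0  # number of log entries already replayed

    def execute(self, command):
        raise PermissionError("writes are only allowed on the master")

    def sync(self):
        # Asynchronous: the slave catches up whenever it is connected.
        pending = self.master.log[self.applied:]
        for command in pending:
            self.apply(command)
        self.applied += len(pending)
        return len(pending)


master = Master()
slave = Slave(master)
master.execute(("set", "/jobs/job1", "jobStatus", 0))
master.execute(("set", "/jobs/job1", "jobStatus", 1))
stale = dict(slave.store)  # the slave lags behind until it syncs
replayed = slave.sync()
print(stale, replayed, slave.store)
```

Replaying commands rather than copying database files is what makes the scheme independent of the back-end: each node can run a different DB system and still end up in the same logical state.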
Metadata Replication II
[Diagram panels: Full replication; Partial replication; Federation; Proxy. Arrow labels: Metadata Commands; Redirected Commands]
Conclusion
• AMGA – Metadata Service of gLite
– Part of gLite 3.1
– Useful for implementing simple relational schemas
– Integrated into the Grid environment (security)
• Tests show good performance/scalability
• Already deployed by several Grid Applications
– LHCb, ATLAS, Biomed, …
• AMGA Web Site
http://project-arda-dev.web.cern.ch/project-arda-dev/metadata/
Use case: Biomed
• Medical Data Manager – MDM
– Store and access medical images and associated metadata on the Grid
– Built on top of gLite 1.5 data management system
– Demonstrated at last EGEE conference (October 05, Pisa)
• Strong security requirements
– Patient data is sensitive
– Data must be encrypted
– Metadata access must be restricted to authorized users
• AMGA used as metadata server
– Demonstrates authentication and
encrypted access
– Used as a simplified DB
[Schema diagram; labels include GUID, Images, Date, Patient, ID, Doctor, Name, Hospital]
• More details at
– https://uimon.cern.ch/twiki/bin/view/EGEE/DMEncryptedStorage
gMOD: grid Movie On Demand
• gMOD provides a Video-On-Demand service
• The user chooses from a list of videos, and the chosen one
is streamed in real time to the video client on the user’s
workstation
• For each movie, many details (Title, Runtime, Country,
Release Date, Genre, Director, Cast, Plot Outline) are
stored, and users can search for a particular movie
by querying on one or more attributes
• Two kinds of users can interact with gMOD:
TrailersManagers, who can administer the db of movies
(uploading new ones and attaching metadata to them), and
GILDA VO users (guests), who can browse, search and
choose a movie to be streamed.
gLibrary - Multimedia CMS
• Motivations
– Huge amounts of data can be saved on SEs, but how can we easily find
a file later when we need it?
▪ (if you have a good memory, its GUID could be a solution, but that is not so easy)
▪ File Catalogues only let us arrange files in folders and subfolders, with no way to
query their contents
▪ Metadata Catalogues are a possible solution, but not always “affordable”,
especially for non-expert users (powerful but complex to use)
• Requirements
– easy to use, fast, secure, extensible
– Multimedia files
▪ Images
▪ Movies
▪ Audio Files
▪ Office Documents (Powerpoint, Word, Excel, OpenOffice)
▪ E-Mails, PDFs, HTMLs
▪ Customized versions of well-known document types (e.g. EGEE PPTs)
Use Case: ADAT
AMGA Web Interface
High Level Requirements
• Group Management
– Group list/add/drop
– Group membership list
– Group ownership list
– Add/Remove user and group association
• User Management
– User list/create/delete
– User subject change
• Collection Management
– collection tree browse/create/delete
– collection ACL management
▪ list group, add group, drop group
▪ change mode for owner/change owner
• Metadata management
– entry listing/searching
– entry create/modify/delete
– schema management
▪ attribute listing/create/clear/delete
Standard Multi-layer Architecture
Data Presentation Layer: consists of all the web pages that give users access to all
provided features.
These pages publish dynamic content managed by
DHTML and Ajax (Asynchronous JavaScript And
XML) libraries.
They work both with the logic components, to perform
data manipulation, and with the access components, to
retrieve and publish data.
Logic Application Layer: made up of all the
software modules that encapsulate the implementation of the
provided features (metadata handling and manipulation).
Data Access Layer: implements all the software components that ensure data
extraction from the AMGA server. These components work as services invoked by the
web pages, and they provide a mechanism to retrieve data and publish dynamic content.
[Diagram: the AMGA Web Interface comprises the Data Presentation Layer, the Logic Application Layer and the Data Access Layer (AMGA API), which talks to the AMGA Service and its metadata catalog]
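The separation between the three layers can be sketched in a few lines. The actual application is a Java web application built on the AMGA Java API; the Python classes below are hypothetical stand-ins, and a dictionary replaces the AMGA server so that the example is self-contained.

```python
class DataAccessLayer:
    """Stand-in for the AMGA API layer; a dict replaces the real server."""

    def __init__(self):
        self._catalog = {"/jobs": {"job1": {"jobStatus": 0},
                                   "job2": {"jobStatus": 1}}}

    def list_entries(self, collection):
        return dict(self._catalog.get(collection, {}))


class LogicLayer:
    """Metadata handling: knows how to search, not how to fetch or display."""

    def __init__(self, dal):
        self.dal = dal

    def search(self, collection, attr, value):
        entries = self.dal.list_entries(collection)
        return sorted(name for name, attrs in entries.items()
                      if attrs.get(attr) == value)


def render_page(logic, collection, attr, value):
    """Presentation layer: only formats what the logic layer returns."""
    names = logic.search(collection, attr, value)
    return "\n".join(f"{collection}/{n}" for n in names) or "(no entries)"


page = render_page(LogicLayer(DataAccessLayer()), "/jobs", "jobStatus", 0)
print(page)
```

Swapping the dictionary for a real client would change only `DataAccessLayer`, which is the point of keeping data access behind its own layer.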
Software Architecture
The core of the
application is designed to
be a plug-in for general-purpose
applications that
adopt metadata on the Grid.
Its design uses several
Object-Oriented Design
Patterns (Singleton,
Strategy, Factory
Method, Template
Method, Iterator and
Composite).
This ensures a very clean
and simple software
architecture with a high
degree of cohesion and
decoupling.
The engine is thus generic for any application that
needs to integrate metadata usage.
Every component is built on top of the official
AMGA Java API.
[Diagram: the AMGA Web Interface consists of the Management Web Pages on top of the Collection, Entry, Attribute, ACL and Group Managers, which are built on the AMGA API]
Deployment Plan
The application can be deployed on a
dedicated server machine located
inside or outside the Grid boundaries.
Deployed on the GILDA t-Infrastructure
Currently the GILDA AMGA Server
machine also hosts the web
interface.
J2EE application
Web front-end available at
https://amga.ct.infn.it:8443/amgawi/
The application server runs Apache Tomcat
5.0 on a Fedora Core 5 Linux machine.
Users interact with the catalog
through the functionality provided by
the web interface.
The user only needs a common web
browser.
[Diagram: clients connect over the Internet to the AmgaWi Application Server, which talks to the Metadata Service (AMGA Server) on the Grid]
Collection Management
Modify Schema
Instance
Delete entry
Tool Bars Overview
“Address” bar
Go!
back to parent
add collection
type collection name
new collection
new entry
bulk upload
search entry
Modify Schema
ACL management
Add Entry
Modify Entry
Metadata Schema Management
ACL Management
Group Management
Group Ownership
Group Membership
User Management
User Group Relationship
Use Cases
• ADAT Project
– embeds the engine of AMGA WI within the Digital Archive Software
• Aiuri (Project COPPE/UFRJ - BRAZIL)
– aims to implement a Grid-oriented platform to support data and text
mining applications.
• BM Portal project (Bio-Lab, DIST University of Genoa )
– embeds the engine of AMGAWI as a plug-in
• GILDA Team
– adopts the AMGA Web Interface for dissemination and training
purposes.
• EGEE Respect program
– candidate as recommended external software
Conclusions
• The AMGA WI challenge is to offer
– a flexible, multiplatform, secure, reusable and easy-to-use system to
handle AMGA metadata for files stored on a distributed Grid
infrastructure
• Flexible
– it can handle any kind of metadata schema defined within the
server
• Multiplatform
– implemented as a Java Web Application, it can be used on every
platform
• Secure
– GSI compliant (x509 proxy if required)
• Reusable
– The engine can be embedded into bigger applications
• Easy-to-use
– its intuitive web interface allows metadata to be managed with just a few
mouse clicks
Questions…
Thank you very much for your kind attention!