The official Metadata Access Interface for EGEE
Consorzio COMETA - Progetto PI2S2
AMGA: Official Metadata Service for EGEE
Salvatore Scifo, Consorzio Cometa, Catania, Italy ([email protected])
Grid Tutorial for the Laboratori Nazionali del Sud, Catania, Italy, 25th-27th Feb 2008
www.consorzio-cometa.it
FESR

Contents
• Background and Motivation for AMGA
• Interface, Architecture and Implementation
• Metadata Replication with AMGA
• Web Interface to access AMGA remotely

Why does the Grid need Metadata?
• Grids often contain millions of files spread over several storage sites.
• Users and applications need an efficient mechanism
– to find the files they are interested in
– to discover and query information about their contents
• This is provided
– by associating descriptive attributes (metadata) with files
– by exposing this information in catalogues that users and client applications can access and search

Metadata service requirements
• A metadata service must expose a complete but simple interface, so that all users can use it easily.
• It should be flexible and support dynamic schemas in order to serve many (ideally all) application domains.
• The service must also allow structured and hierarchical metadata in order to implement any logical collection.
• A collection is a set of metadata grouped under a single logical entity (for example, a collection can describe all video files, in any encoding format).

Metadata service requirements (continued)
• It must be designed with scalability in mind in order to deal with a large number of entries (several million).
• Security is required to provide different access levels to different users.
• Quality of service must be ensured:
– Hide network latency: improved performance for WAN clients
– Disconnected computing: local replicas for off-line access (laptops)
– DB-independent replication: the Grid environment is heterogeneous
– Improved reliability and scalability: no single point of failure

What is AMGA?
• AMGA is a metadata service for the Grid
– It is a database access service for Grid applications that lets users and user jobs discover data describing their files, so that they can access those files appropriately.
• AMGA is a service based on an RDBMS
– It allows metadata schemas to be defined according to user and application needs
– It provides a replication layer that makes databases locally available to user jobs and replicates changes between the participating databases.
• AMGA has been designed for tight integration with the Grid environment
– The metadata service is a Grid component
– Compliant with Grid security
– Hides DB heterogeneity

AMGA Features
• Dynamic schemas
– Schemas can be modified at runtime by the client: create and delete schemas; add and remove attributes
• Metadata organised as a hierarchy
– Schemas can contain sub-schemas
– Analogy to a file system: a schema corresponds to a directory, an entry to a file
• Flexible queries
– SQL-like query language
– Joins between schemas are supported

Metadata Concepts
• To better understand how AMGA works, think of
– a schema as a database schema
– a collection as a table
– an attribute as a column
– an entry as a row
• AMGA metadata is a list of attributes associated with entries according to a user-defined schema.
• A schema is a set of attributes
• An entry is the abstraction of a directory or file mapped by the metadata server
• A collection is a set of entries associated with a schema

Metadata Concepts
• Attribute: a typed key/value pair associated with entries
– Type: the type (int, float, string, …)
– Name/Key: the name of the attribute
– Value: the value of an entry's attribute
• Analogy examples (AMGA command, with the equivalent SQL in parentheses)
>createdir /jobs (create table jobs)
>addattr /jobs jobStatus int (alter table jobs add column jobStatus int)
>addentry /jobs/job1 jobStatus 0 (insert into jobs (jobStatus) values (0))
>updateattr /jobs jobStatus 1 jobID>100 (update jobs set jobStatus=1 where jobID>100)

AMGA Datatypes
[Table: AMGA datatypes (not reproduced in this transcript)]
• Using the datatypes above ensures that your metadata can be easily moved to all supported back-ends
• If you do not care about DB portability, you can in principle use as an entry attribute type any datatype supported by the back-end, even the more esoteric ones (such as the PostgreSQL network address or geometric types)

AMGA Implementation
• C++ multiprocess server
– Back-ends: Oracle, MySQL, PostgreSQL, SQLite
– Front end, TCP streaming: high performance; client API for C++, Java, Python, Perl, Ruby
– Front end, SOAP (web services): interoperability; scalability
• Standalone Python library implementation
– Data stored on the file system
[Diagram labels: Client; SOAP; TCP Streaming; MD Server (Metadata Server); Oracle; PostgreSQL; MySQL; SQLite; Client; Metadata Python API; Python Interpreter; filesystem]

Security
• Access control
– All entries in a directory share the same ACL
– Groups of users are also supported (Unix-style permissions)
• Secure connections
– SSL
– Provided by web services
• Client authentication is based on
– Username/password
– General X509 certificates
– Grid-proxy certificates (VOMS, the Virtual Organization Membership Service, is supported)
[Diagram labels: VOMS; Authenticate with X509 Cert; VOMS-Cert with Group & Role information; VOMS-Cert; Resource management; Oracle; AMGA]

Metadata Replication I
• AMGA provides replication/federation mechanisms
• Motivation
– Scalability: support hundreds/thousands of concurrent users
– Geographical distribution: hide network latency
– Reliability: no single point of failure
– DB-independent replication: heterogeneous DB systems
– Disconnected computing: off-line access (laptops)
• Models
– Asynchronous replication
– Master-slave: writes only allowed on the master
– Application-level replication: metadata commands are replicated
– Proxy: the proxy forwards metadata only (works as a front end)

Metadata Replication II
[Figure labels: Full replication; Federation; Partial replication; Proxy; Redirected Commands; Metadata Commands]

Conclusion
• AMGA: the metadata service of gLite
– Part of gLite 3.1
– Useful for implementing simple relational schemas
– Integrated in the Grid environment (security)
• Tests show good performance and scalability
• Already deployed by several Grid applications
– LHCb, ATLAS, Biomed, …
• AMGA web site: http://project-arda-dev.web.cern.ch/project-arda-dev/metadata/

Use case: Biomed
• Medical Data Manager (MDM)
– Stores and accesses medical images and associated metadata on the Grid
– Built on top of the gLite 1.5 data management system
– Demonstrated at the last EGEE conference (October 05, Pisa)
• Strong security requirements
– Patient data is sensitive
– Data must be encrypted
– Metadata access must be restricted to authorized users
• AMGA used as metadata server
– Demonstrates authentication and encrypted access
– Used as a simplified DB
• More details at https://uimon.cern.ch/twiki/bin/view/EGEE/DMEncryptedStorage
[Diagram labels: GUID; Images; Date; Patient ID; Patient; Doctor; Name; Doctor; Hospital]

gMOD: grid Movie On Demand
• gMOD provides a Video-On-Demand service
• The user chooses from a list of videos, and the chosen one is streamed in real time to the video client on the user's workstation
• For each movie many details (Title, Runtime, Country, Release Date, Genre, Director, Cast, Plot Outline) are stored, and users can search for a particular movie by querying on one or more attributes
• Two kinds of users can interact with gMOD: TrailersManagers, who can administer the database of movies (uploading new ones and attaching metadata to them), and GILDA VO users (guests), who can browse, search and choose a movie to be streamed.

gLibrary - Multimedia CMS
• Motivations
– Huge amounts of data can be saved on SEs, but how can we easily find a file later when we need it? (If you have a good memory, its GUID could be a solution, but that is not so easy.)
– File catalogues only let us arrange files in folders and subfolders, with no way to query their contents
– Metadata catalogues are a possible solution, but not always "affordable", especially for non-expert users (powerful but complex to use)
• Requirements
– Easy to use, fast, secure, extensible
– Multimedia files: images; movies; audio files; office documents (PowerPoint, Word, Excel, OpenOffice); e-mails, PDFs, HTML pages; customized versions of well-known document types (e.g. EGEE PPTs)

Use Case: ADAT

AMGA Web Interface

High Level Requirements
• Group management
– Group list/add/drop
– Group membership list
– Group ownership list
– Add/remove user and group association
• User management
– User list/create/delete
– User subject change
• Collection management
– Collection tree browse/create/delete
– Collection ACL management: list group, add group, drop group; change mode for owner; change owner
• Metadata management
– Entry listing/searching
– Entry create/modify/delete
– Schema management: attribute listing/create/clear/delete

Standard Multi-layer Architecture
[Diagram labels: AMGA Web Interface; Data Presentation Layer; Logic Application Layer; Data Access Layer (AMGA API); AMGA Service; AMGA Metadata catalog]
Data Presentation Layer: consists of all the web pages that let users access the provided features. These pages publish dynamic content managed by DHTML and Ajax (Asynchronous JavaScript And XML) libraries. They work with the logic components to perform data manipulation and with the access components to retrieve and publish data.
Logic Application Layer: made up of all the software modules that encapsulate the implementation of the provided features (metadata handling and manipulation).
Data Access Layer: implements all the software components that ensure data extraction from the AMGA server. These components work as services invoked by the web pages, and they provide a mechanism to retrieve data and publish dynamic content.
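The layering described above can be illustrated with a minimal Python sketch. It is not the AMGA Web Interface code (which is a Java web application): the class and function names (InMemoryCatalogue, EntryManager, render_entry_list) are hypothetical, and an in-memory dictionary stands in for the AMGA server so that the example runs on its own. The /jobs collection and the jobStatus attribute are borrowed from the analogy examples earlier in the talk.

```python
"""Three-layer sketch: presentation -> logic -> data access.

All class and function names here are hypothetical; the in-memory
catalogue only stands in for a real AMGA server.
"""
from html import escape


class InMemoryCatalogue:
    """Data access layer stand-in: stores entries per collection path."""

    def __init__(self):
        # path -> {entry name -> {attribute name -> value}}
        self._collections = {}

    def create_collection(self, path):
        self._collections.setdefault(path, {})

    def add_entry(self, path, name, attributes):
        self._collections[path][name] = dict(attributes)

    def list_entries(self, path):
        return dict(self._collections[path])


class EntryManager:
    """Logic layer: metadata handling built only on the data access layer."""

    def __init__(self, catalogue):
        self._catalogue = catalogue

    def search(self, path, attribute, value):
        """Return the sorted names of entries whose attribute equals value."""
        entries = self._catalogue.list_entries(path)
        return sorted(
            name for name, attrs in entries.items()
            if attrs.get(attribute) == value
        )


def render_entry_list(names):
    """Presentation layer: turn logic-layer results into an HTML fragment."""
    items = "".join(f"<li>{escape(name)}</li>" for name in names)
    return f"<ul>{items}</ul>"


catalogue = InMemoryCatalogue()
catalogue.create_collection("/jobs")
catalogue.add_entry("/jobs", "job1", {"jobStatus": 0})
catalogue.add_entry("/jobs", "job2", {"jobStatus": 1})
catalogue.add_entry("/jobs", "job3", {"jobStatus": 1})

manager = EntryManager(catalogue)
page = render_entry_list(manager.search("/jobs", "jobStatus", 1))
print(page)  # <ul><li>job2</li><li>job3</li></ul>
```

Because the presentation function only sees the names returned by the logic layer, and the logic layer only calls the catalogue's methods, the in-memory stand-in could in principle be swapped for a client that talks to a real metadata server without touching the other two layers.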

Software Architecture
The core of the application is designed as a plug-in for general-purpose applications that use metadata on the Grid. Its design draws on several object-oriented design patterns (Singleton, Strategy, Factory Method, Template Method, Iterator and Composite). This gives a clean, simple software architecture with a high degree of cohesion and decoupling.
The AMGA API engine is thus generic for any application that needs to integrate metadata usage. Every component is built on top of the official AMGA Java API.
[Diagram labels: AMGA Web Interface; Management Web Pages; Collection Manager; Entry Manager; Attribute Manager; ACL Manager; Group Manager; AMGA API]

Deployment Plan
The application can be deployed on a dedicated server machine located inside or outside the Grid boundaries.
Deployed on the GILDA t-Infrastructure: currently the GILDA AMGA server machine also hosts the web interface.
J2EE application; web front-end available at https://amga.ct.infn.it:8443/amgawi/
The application server runs Apache Tomcat 5.0 on a Fedora Core 5 Linux machine.
Users interact with the catalogue through the functionality provided by the web interface.
The user needs only a common web browser.
[Diagram labels: Internet Clients; GRID; AmgaWi Application Server; Metadata Service; AMGA Server]

Collection Management
[Screenshot labels: Modify Schema Instance; Delete entry]

Tool Bars Overview
[Screenshot labels: "Address" bar; Go!; back to parent; add collection; type collection name; new collection; new entry; bulk upload; search entry; Modify Schema; ACL management]

Add Entry
Modify Entry
Metadata Schema Management
ACL Management
Group Management
Group Ownership
Group Membership
User Management
User Group Relationship

Use Cases
• ADAT Project
– embeds the engine of AMGA WI within the Digital Archive Software
• Aiuri (COPPE/UFRJ project, Brazil)
– aims to implement a Grid-oriented platform to support data and text mining applications
• BM Portal project (Bio-Lab, DIST, University of Genoa)
– embeds the engine of AMGA WI as a plug-in
• GILDA Team
– adopts the AMGA Web Interface for dissemination and training purposes
• EGEE RESPECT program
– candidate as recommended external software

Conclusions
• The challenge of AMGA WI is to offer a flexible, multiplatform, secure, reusable and easy-to-use system for handling AMGA metadata for files stored on a distributed Grid infrastructure
• Flexible: it can handle any kind of metadata schema defined within the server
• Multiplatform: implemented as a Java web application, it can be used on every platform
• Secure: GSI compliant (X509 proxy if required)
• Reusable: the engine can be embedded into larger applications
• Easy-to-use: its intuitive web interface lets users manage metadata with just a few mouse clicks

Questions…
Thank you very much for your kind attention!