A new Union Catalogue generation in Linked Open Data
34th ADLUG Annual Meeting 2015
American University of Rome, 21-23 October 2015
Tiziana Possemato, @Cult

A new model to create a Union catalogue

The project goals: the conversion to linked open data, the publication of the dataset in RDF, and the production of a single portal with a user-friendly interface for bibliographic and authority data coming from different universities in the south of Italy:

• Università degli Studi di Napoli Federico II (Napoli)
• Università degli Studi di Napoli L’Orientale (Napoli)
• Università degli Studi di Napoli Parthenope (Napoli)
• Università degli Studi di Salerno (Salerno)
• Università degli Studi del Sannio (Benevento)
• Università degli Studi della Basilicata (Potenza)

Copyright 2008 @CULT. All rights reserved

The original data

The original data used in the project come from different ILSs (Aleph and Sebina) and are in MARC format (UNIMARC):

• Università degli Studi di Napoli Federico II => Aleph
• Università degli Studi di Napoli L’Orientale => Sebina
• Università degli Studi di Napoli Parthenope => Aleph
• Università degli Studi di Salerno => Aleph
• Università degli Studi del Sannio => Sebina
• Università degli Studi della Basilicata => Aleph

The project takes into consideration two different types of data:

• Bibliographic records
• Authority records

[Diagram labels: Libraries (Bib1 to Bib6); Identification; Collection; Selection; Processing (processing of data in RDF); Search Engine; RDF Store; LOD Cloud; Linked Open Services Platform; FRBR (Functional Requirements for Bibliographic Records)]

The portal: a three layers architecture

[Diagram labels: Person/Works (Contra academicos, De Beatâ Vitâ, De civitate Dei contra paganos); Manifestations; Item]

1° - Person/Work: the set of data related to Persons and Works, in RDF (linked open data), stored behind a SPARQL endpoint and made available through specific search and presentation functions. The variant forms of Person names and the Work titles come from local authority files and VIAF.

2° - Manifestations: bibliographic data indexed in Solr, which can produce new data aggregations as facets (such as publication date, language, publisher, edition, etc.); this layer gives users a wide range of search and navigation functions.

3° - Item: holdings data, related to copy information, coming from the local OPAC or local system of each library.

The controlled name access point

Production of a unique name access point for Person names, containing all preferred and variant forms of the name used both in the authority files and in the different bibliographic catalogues, finally enriched with VIAF forms.

A) Different name forms for sant’Agostino in the authority file of Università degli Studi di Napoli Federico II:

Augustinus, Aurelius <354-430>
< Augustinus Hipponensis <354-430>
< Aurelius Augustinus Hipponensis <354-430>
< Agostino d’Ippona <santo ; 354-430>
< Agostino, Aurelio <santo ; 354-430>

B) Different name forms for sant’Agostino in the bibliographic data of the catalogue of Università degli Studi di Napoli Federico II:

Augustinus, Aurelius <354-430>
< Augustinus, Aurelius santo
< Augustinus Hipponensis 354-430
< Aurelius Augustinus Hipponensis 354-430
< Agostino d’Ippona santo ; 354-430
< Agostino, Aurelio santo ; 354-430
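Many of the forms listed for sant’Agostino differ only in angle brackets, punctuation, or case. A minimal sketch of how such forms might be reduced to a comparison key before being gathered into one access point is given here; the function names, the key rules, and the source labels `authority` and `bibliographic` are assumptions made for illustration, not the project's actual matching logic.

```python
import re
import unicodedata
from collections import defaultdict


def comparison_key(form: str) -> str:
    """Reduce a name form to a crude key (hypothetical rules):
    straighten apostrophes, drop < > ; : , and ignore case and spacing."""
    text = unicodedata.normalize("NFKC", form).replace("\u2019", "'")
    text = re.sub(r"[<>;:,]", " ", text)
    return " ".join(text.lower().split())


def merge_forms(forms_by_source: dict) -> dict:
    """Map each comparison key to the list of sources that use it."""
    merged = defaultdict(list)
    for source, forms in forms_by_source.items():
        for form in forms:
            key = comparison_key(form)
            if source not in merged[key]:
                merged[key].append(source)
    return dict(merged)


# Forms taken from lists A) and B); the source labels are illustrative.
example = merge_forms({
    "authority": ["Augustinus Hipponensis <354-430>"],
    "bibliographic": ["Augustinus Hipponensis 354-430",
                      "Augustinus, Aurelius santo"],
})
for key, sources in example.items():
    print(key, sources)
```

A key of this kind only catches trivial differences; forms such as "Agostino d’Ippona" and "Augustinus Hipponensis" still need the authority files and VIAF to be recognised as the same person.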
C) Different name forms for sant’Agostino in the bibliographic data of the catalogue of Università degli Studi del Sannio:

Augustinus, Aurelius <santo>
< Augustinus, Aurelius santo ; ; pseudo
< Augustinus santo
< Augustinus, Aurelius
< Augustinus : von Hippo santo
< Agostino : d'Ippona santo
< Agostino santo

D) Different name forms for sant’Agostino in the bibliographic data of the catalogue of Università degli Studi della Basilicata:

Augustinus, Aurelius santo
Augustinus, Aurelius santo ; 354-430
<Agostino, Aurelio santo>
<Augustinus Hipponensis>
<Aurelius Augustinus Hipponensis>
<Agostino d’Ippona santo>

The name cluster for sant’Agostino:

• Augustinus, Aurelius <354-430>
• Augustinus, Aurelius santo
• Augustinus Hipponensis 354-430
• Aurelius Augustinus Hipponensis 354-430
• Agostino d’Ippona santo ; 354-430
• Agostino, Aurelio santo ; 354-430
• Augustinus, Aurelius santo
• Augustinus, Aurelius santo ; ; pseudo
• Augustinus santo
• Augustinus, Aurelius
• Augustinus : von Hippo santo
• Agostino : d'Ippona santo
• Agostino santo
• … (etc.)

VIAF API

The Person cluster for sant’Agostino (Name cluster ID 245):

• Augustinus, Aurelius <354-430> (AUF) (VIAF)
• Augustinus, s., vesc. d’Ippona, 354-430 (VIAF)
• Augustine, Saint, Bishop of Hippo (VIAF)
• Augustinus, Aurelius santo (FED) (BAS)
• Augustinus Hipponensis 354-430 (FED)
• Aurelius Augustinus Hipponensis 354-430 (AUF) (FED) (BAS)
• Agostino d’Ippona santo ; 354-430 (FED)
• Agostino, Aurelio santo ; 354-430 (FED) (BAS)
• Augustinus, Aurelius santo (BAS)
• Augustinus, Aurelius <santo> (ORI) (SAL)
• Augustinus santo (SAN)
• Augustinus, Aurelius (SAN)
• Augustinus : von Hippo santo (SAN)
• Agostino : d'Ippona santo (SAN)
• Agostino santo (SAN)
• Augustinus, Aurelius santo ; 354-430 (BAS)

UNION CATALOGUE: search for Manifestations with Name cluster ID 245.

Retrieving the Works associated with a Person: the controlled title access point

The Title cluster for Varro, Marcus Terentius (ID 69518):

• De lingua latina (ID 986)
• Saturae menippeae (ID 1314)
• De vita populi romani ad Atticum (ID 2135)
• De gente populi romani (ID 2136)
• Antiquitates (ID 2137)
• Logistorika (ID 2138)
• De re rustica (ID 855)

Title cluster ID 855:

• Res rusticae (FED)
• Économie rurale (SAL) (VIAF) (FED)
• Del Camp (SAL)
• Gespräche über die Landwirtschaft (FED)
• Varro the farmer a selection from the Res rusticae
• Rerum rusticarum libri tres (VIAF) (BAS) (FED)

UNION CATALOGUE: search for Manifestations with Title cluster ID 855.

Layer 1 (Person/Works) & Layer 2 (Manifestations)

Association of the Person and Title cluster IDs with the Manifestation record. The Title ID is carried in $9 of tag 500 and the Person ID in $0 of tag 700:

=LDR  01350nam 2200289 450
=001  000005093
=005  20150112111102.0
=010  \\$a2-251-00329-6$bvol. 2
=010  \\$a2-251-01400-4$bvol. 3
=100  \\$a20010205d--------km-y0itay0103----ba
=101  2\$alat$afre
=102  \\$aFR
=200  1\$aÉconomie rurale$fVarron
=210  \\$aParis$c<<Les>> belles lettres
=215  \\$a3 volumi$d20 cm
=225  2\$aCollection des universités de France
=300  \\$aTraduzione francese con testo latino a fronte
=300  \\$aSulla copertina del terzo tomo è erroneamente riportata la dicitura "livre IV"
=410  \0$12001$aCollection des universités de France
=500  11$aDe re rustica / Varro, Marcus Terentius$9855
=676  \\$a630.93$v(22. ed.)$9Agricoltura. Mondo antico
=700  \1$aVarro,$bMarcus Terentius$f<116-27 a.C.>$069518
=702  \1$aHeurgon,$bCharles
=702  \1$aGuiraud,$bCharles
=801  \0$aIT$bUniversità della Basilicata - B.I.A.$gREICAT$2unimarc
=997  \\$aUNIBAS

How to enrich the Manifestations with the Title of the Work

Servomechanism practice in Università Parthenope => Uniform title in tag 500

=LDR  00673nam0 2200217 450
=001  000021998
=005  20090123105405.0
=100  \\$a20090123d1954----km-y0itay50------ba
=101  0\$aeng
=102  \\$aGB
=105  \\$aa-------001yy
=200  1\$aServomechanism practice$fWilliam R. Ahrendt
=210  \\$aLondon [etc.]$cMcGraw-Hill$dc1954
=215  \\$aVII, 349 p.$cill.$d24 cm
=500  10$aServomechanism practice$912013
=610  1\$aServomeccanismi
=676  \\$a629.8$v21$9Ingegneria dei controlli automatici
=700  \1$aAhrendt,$bWilliam Robert$030
=801  \0$aIT$bUNIPARTHENOPE$c20090123$gRICA$2UNIMARC
=951  \\$aS 629.8/6$bS A, 252$cDSA$d2009
=997  \\$aUNIPARTHENOPE

Servomechanism practice in Università Federico II => Uniform title added in tag 995

=LDR  00652nam0 22002291i 450
=001  000000022
=005  20030908103336.0
=100  \\$a20020821d--------km-y0itay50------ba
=101  0\$aita
=105  \\$ay-------001yy
=200  1\$aServomechanism practice$fW.R. Ahrendt, C.J. Savant
=205  \\$a2nd ed.
=210  \\$aNew York$cMcGraw-Hill Book Company$d1960
=215  \\$aXV, 566 p.$cill.$d24 cm
=610  0\$aServomeccanismi
=676  \\$a629.8
=700  \1$aAhrendt,$bW. R.$030
=702  \1$aSavant,$bClement J.$c<jr.>
=801  \0$aIT$bUNINA$gRICA$2UNIMARC
=995  \\$aServomechanism practice$912013
=997  \\$aUNINA

VIAF APIs to retrieve data

An important step of the project is retrieving data from VIAF: using its APIs, we retrieve the information for the Person/Family/Corporate body and Work/Expression layers. For each VIAF ID we retrieve the following information:

a) Variant forms of the name for each ID: they allow end users and the search engine to extend searches to all possible forms of a name, in different text strings and alphabets.

b) Works associated with the VIAF ID of a Person: used to combine a Person with their Works, keeping (filtering) only the work titles for which at least one Manifestation exists in the original catalogues.

The conversion process from MARC to RDF

[Diagram labels: Resources; Metadata creators (librarians, curators); ALIADA; Browsers (Google); Library Management System (ILS); Museum Collection Management System (MMS); Content Management System (CMS); IT companies; Other public and cultural institutions; Linked Data Cloud, http://lod-cloud.net/. © 2015 Aliada Consortium]

Data conversion with ALIADA

[Diagram labels: User interface; Validation of input data; Conversion by RDF-izer (MARCXML2RDF, DublinCore2RDF, LIDO2RDF, other RDF-izer) with the ALIADA ontology; RDF output; RDF validation; RDF triple store endpoint; Linking (links discovery, links creation); Publication (Linked Data Server, CKAN DataHub page); Validation; Linked dataset]

Ontologies used in the project

Other ontologies added to the Aliada ontologies:

• DCMI Metadata Terms
• RDF Schema
• RDA elements
• BIBFRAME
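The Title and Person cluster IDs carried in $9 and $0 of the records shown earlier could be turned into RDF statements roughly as follows. This is only a sketch under stated assumptions: the base URI, the helper names, and the choice of the DCMI properties `isVersionOf` and `creator` are illustrative and are not the mapping used by the project or by ALIADA.

```python
# Hypothetical base URI: the presentation does not give the portal's URI scheme.
BASE = "http://example.org/union-catalogue"
DCTERMS = "http://purl.org/dc/terms/"


def subfields(field_value: str) -> dict:
    """Split a field string such as '11$aDe re rustica$9855' into
    {code: value}. The first two characters are the indicators.
    A repeated subfield code keeps only its last value."""
    parts = field_value[2:].split("$")[1:]
    return {part[0]: part[1:] for part in parts if part}


def manifestation_triples(record_id: str, fields: dict,
                          title_tag: str = "500") -> list:
    """Build N-Triples lines linking a Manifestation to its Work ($9 of
    the uniform title tag) and to its Person ($0 of tag 700)."""
    subject = f"<{BASE}/manifestation/{record_id}>"
    triples = []
    title_id = subfields(fields.get(title_tag, "")).get("9")
    person_id = subfields(fields.get("700", "")).get("0")
    if title_id:
        # Assumed property choice for the Manifestation-to-Work link.
        triples.append(
            f"{subject} <{DCTERMS}isVersionOf> <{BASE}/work/{title_id}> .")
    if person_id:
        triples.append(
            f"{subject} <{DCTERMS}creator> <{BASE}/person/{person_id}> .")
    return triples


# Fields copied from the Université de France edition record (001 = 000005093).
varro = {
    "500": "11$aDe re rustica / Varro, Marcus Terentius$9855",
    "700": r"\1$aVarro,$bMarcus Terentius$f<116-27 a.C.>$069518",
}
for line in manifestation_triples("000005093", varro):
    print(line)
```

For the Federico II record, where the uniform title was added in tag 995, the same function would be called with `title_tag="995"`.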
The final result of conversion:

- RDF dataset for Persons and Works
- RDF dataset for Manifestations

[Diagram labels: Processing of data in RDF; Search Engine; RDF Store; LOD Cloud; Linked Open Services Platform]

Where we are (with the project)…

Task                                                                            Status and date
Definition of project requirements                                              [closed]
Analysis of bibliographic and authority data                                    [closed]
Data modelling (definition of ontologies and mapping)                           [closed]
Conversion of data to RDF                                                       [in progress], by 30.11.2015
Definition of a portal to use as prototype                                      [in progress], by 31.10.2015
Meeting with librarians to better define portal search and discovery functions  By 30.11.2015
Implementation activities                                                       By 31.12.2015
Go live                                                                         January 2016

The project as Union catalogue model

The Union catalogue of the Campania and Basilicata universities in linked open data: the core for a nationwide Union Catalogue project.

Union catalogue: hypotheses for a project

1° Physical union catalogue: one LMS, one catalogue, one OPAC => Trento Union Catalogue model

2° Central database (hub) with download of physical data (shared data, in different original formats)

3° A network of libraries grouped in local hubs (each local hub served by a different LMS), linked to a Central Index system that is the core of the network (local hubs are connected to the Central Index via API)

4° Union catalogue in RDF (linked open data) plus a complex API layer to connect different ILSs

[Diagram labels: RDF Store; LOD Cloud; API layer; RDF Union Catalogue]

Thank you

Tiziana Possemato, @CULT
www.atcult.it
[email protected]
Mobile: 3485810489