Linked data and its role in the semantic web @der42
Linked data and its role in the semantic web
  Dave Reynolds, Epimorphics Ltd
  @der42

Roadmap
  What is linked data?
  Examples
  Modelling
  Access
  Strengths and weaknesses
  other topics
  image: Leo Oosterloo @ flickr.com

Linked data intro

Linked data
  ... publishing data on the web ...
  ... to enable integration, linking and reuse across silos

Can’t we just publish data as files?
  pdf: easy to read and publish
  Excel: allows further processing and analysis
  csv: processing without need for proprietary tools
  But ...
    structure of data not explained
    no connection between different data sets, silos
    static and fixed – can’t retrieve just slices relevant to problem

Linked data
  Apply the principles of the web to publication of data
  The web:
    is a global network of pages
    each identified by a URL
    fetching a URL gives a document
    pages connected by links
    open, anyone can say anything about anything else

Linked data
  Apply the principles of the web to publication of data
  The linked data web:
    is a global network of things
    each identified by a URI
    fetching a URI gives a set of statements
    things connected by typed links
    open, anyone can say anything about anything else
  Linked data is “data you can click on”

Example schools information
  http://education.data.gov.uk/id/school/401874

Example schools information
  http://education.data.gov.uk/id/school/401874
    label     “Cardiff High School”
    phase     “Secondary”
    district  “Cardiff”

Example schools information
  http://education.data.gov.uk/id/school/401874
    label     “Cardiff High School”
    phase     school:PhaseOfEducation_Secondary
    district  http://statistics.data.gov.uk/id/local-authority-district/00PT
  http://statistics.data.gov.uk/id/local-authority-district/00PT
    label     “Cardiff”

Example schools information
  http://education.data.gov.uk/id/school/401874
    label     “Cardiff High School”
    phase     school:PhaseOfEducation_Secondary
    district  http://statistics.data.gov.uk/id/local-authority-district/00PT
  http://statistics.data.gov.uk/id/local-authority-district/00PT
    label     “Cardiff”
  http://data.ordnancesurvey.co.uk/id/7000000000025484
    contains ward
    contains parish
    extent    GML: 310499.4 184176.6 310476.5 ...
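The "fetching a URI gives a set of statements" idea from the slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary `FAKE_WEB` and the functions `fetch` and `crawl` are hypothetical stand-ins for real HTTP lookups, and the property names `label`, `phase` and `district` are the informal ones from the example slide, not real vocabulary terms.

```python
from typing import Dict, List, Tuple

Statement = Tuple[str, str, str]  # (thing, property, value)

SCHOOL = "http://education.data.gov.uk/id/school/401874"
DISTRICT = "http://statistics.data.gov.uk/id/local-authority-district/00PT"

# Hypothetical stand-in for the web: the statements a lookup of each URI
# might return. No network access happens in this sketch.
FAKE_WEB: Dict[str, List[Statement]] = {
    SCHOOL: [
        (SCHOOL, "label", "Cardiff High School"),
        (SCHOOL, "phase", "school:PhaseOfEducation_Secondary"),
        (SCHOOL, "district", DISTRICT),
    ],
    DISTRICT: [
        (DISTRICT, "label", "Cardiff"),
    ],
}


def fetch(uri: str) -> List[Statement]:
    """Simulate dereferencing a URI: return its statements, or none if unknown."""
    return FAKE_WEB.get(uri, [])


def crawl(start: str) -> List[Statement]:
    """Follow links breadth-first from `start`, aggregating every statement found."""
    seen = set()
    queue = [start]
    found: List[Statement] = []
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.add(uri)
        for statement in fetch(uri):
            found.append(statement)
            value = statement[2]
            # A value that is itself an HTTP URI is a link we can "click on".
            if value.startswith("http://"):
                queue.append(value)
    return found


statements = crawl(SCHOOL)
labels = [value for (_, prop, value) in statements if prop == "label"]
print(labels)  # ['Cardiff High School', 'Cardiff']
```

Starting from the school URI alone, the crawl picks up the district's label because the `district` value is a URI rather than the plain string “Cardiff”. That is the difference between the second and third versions of the example slide.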
Example schools information
  http://education.data.gov.uk/id/school/401874
    label     “Cardiff High School”
    phase     school:PhaseOfEducation_Secondary
    district  http://statistics.data.gov.uk/id/local-authority-district/00PT
  http://statistics.data.gov.uk/id/local-authority-district/00PT
    label     “Cardiff”
    same as   http://data.ordnancesurvey.co.uk/id/7000000000025484
  http://data.ordnancesurvey.co.uk/id/7000000000025484
    contains ward
    contains parish
    extent    GML: 310499.4 184176.6 310476.5 ...

Linked data principles
  Use URIs as names for things
  Use HTTP URIs so that people can look up those names
  When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
  Include links to other URIs, so that they can discover more things
  Pattern of application of semantic web stack

Linked open data cloud: 2007
  Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Linked open data cloud: 2009
  Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Linked open data cloud: 2010
  Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Data.gov.uk – linked datasets and APIs

Data.gov.uk visualizations on top of linked data

Ordnance survey

Environment agency - data, API, visualizations

BBC – integration and site design

E-commerce and rich snippets
  Overstock.com
  Peek-cloppenburg.de

Internal use

Open?
  Linked open data = linked data + open data

Modelling

Modelling
  Thing, entity, concept ... resource
  resource being described:
    abstract concept
    real world thing
    data item, particular measurement
    document
  identify by URI
    identifier, NOT a container (c.f. UML)
  provide information by making statements about those resources
    open schema
    critical to open extensibility and integration
    similar to Entity-Attribute-Value modelling

Modelling – RDF – Resource Description Framework
  Statement, triple, logical assertion
  Subject   Predicate   Object

Modelling – RDF
  Statement, triple, logical assertion
  Subject       Predicate          Object
  some school   has a name/label   some literal

Modelling – RDF
  Statement, triple, logical assertion
  Subject                                         Predicate          Object
  http://education.data.gov.uk/id/school/401874   has a name/label   “Cardiff High School”

Modelling – RDF
  Statement, triple, logical assertion
  Subject                                         Predicate                                    Object
  http://education.data.gov.uk/id/school/401874   http://www.w3.org/2000/01/rdf-schema#label   “Cardiff High School”

Modelling – RDF
  Statement, triple, logical assertion
  Subject         Predicate    Object
  school:401874   rdfs:label   “Cardiff High School”
  where
    school: = http://education.data.gov.uk/id/school/
    rdfs:   = http://www.w3.org/2000/01/rdf-schema#

Modelling – RDF
  Statement, triple, logical assertion
  Subject         Predicate                    Object
  school:401874   rdfs:label                   “Cardiff High School”
  school:401874   ont:districtAdministrative   la:00PT
  la:00PT         rdfs:label                   “Cardiff”

Modelling – RDF
  Statement, triple, logical assertion
  Subject         Predicate                    Object
  school:401874   rdfs:label                   “Cardiff High School”
  school:401874   ont:districtAdministrative   la:00PT
  la:00PT         rdfs:label                   “Cardiff”
  As a graph:
    school:401874 --rdfs:label--> “Cardiff High School”
    school:401874 --ont:districtAdministrative--> la:00PT --rdfs:label--> “Cardiff”

Modelling – RDF
  Statement, triple, logical assertion
  Subject         Predicate                    Object
  school:401874   rdfs:label                   “Cardiff High School”
  school:401874   ont:districtAdministrative   la:00PT
  la:00PT         rdfs:label                   “Cardiff”
  la:00PT         rdfs:label                   “Caerdydd”@cy

RDF Syntaxes
  RDF/XML: normative
  Turtle: more human readable/writable, being standardized
  RDFa: embed in (X)HTML
  [others omitted]

Modelling – RDF
  RDF/XML syntax
  Subject         Predicate                    Object
  school:401874   rdfs:label                   “Cardiff High School”
  school:401874   ont:districtAdministrative   la:00PT
  la:00PT         rdfs:label                   “Cardiff”

  <rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:ont="http://education.data.gov.uk/def/school/"
      xmlns:la="http://statistics.data.gov.uk/id/local-authority-district/"
      xmlns:school="http://education.data.gov.uk/id/school/"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
    <rdf:Description rdf:about="http://education.data.gov.uk/id/school/401874">
      <rdfs:label>Cardiff High School</rdfs:label>
      <ont:districtAdministrative>
        <rdf:Description rdf:about="http://statistics.data.gov.uk/id/local-authority-district/00PT">
          <rdfs:label>Cardiff</rdfs:label>
        </rdf:Description>
      </ont:districtAdministrative>
    </rdf:Description>
  </rdf:RDF>

Modelling – RDF
  Turtle syntax
  Subject         Predicate                    Object
  school:401874   rdfs:label                   “Cardiff High School”
  school:401874   ont:districtAdministrative   la:00PT
  la:00PT         rdfs:label                   “Cardiff”

  @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix school: <http://education.data.gov.uk/id/school/> .
  @prefix ont:    <http://education.data.gov.uk/def/school/> .
  @prefix la:     <http://statistics.data.gov.uk/id/local-authority-district/> .

  school:401874
      rdfs:label "Cardiff High School" ;
      ont:districtAdministrative la:00PT .

  la:00PT rdfs:label "Cardiff" .

Modelling
  Vocabularies
  so far no actual models, let alone semantics
  want to define:
    types of thing : Class
    what you can say about them : Property
  encode definitions in more RDF and publish at the corresponding URIs
    link from data to data model
  reuse published vocabularies to enable integration
    freely combine different vocabularies or new ones

Modelling – vocabularies
  Logical modelling
    modelling the domain, not a particular data structure
    what exists
    what is asserted? what can you deduce from that?
    not about constraints as such
    monotonic, open world
  Ontology
    controlled vocabulary, thesaurus, taxonomy, ontology

Modelling – vocabularies
  unfamiliar terminology but related to:
    information architecture and conceptual modelling
    domain-driven design
    ... and yes knowledge representation

Modelling – RDFS
  RDF vocabulary description language
  classes, types and type hierarchy
    ont:School   rdfs:label   “School”
    ont:School   rdf:type     rdfs:Class

Modelling – RDFS
  RDF vocabulary description language
  classes, types and type hierarchy
    ont:School               rdfs:label        “School”
    ont:School               rdf:type          rdfs:Class
    ont:WelshEstablishment   rdfs:subClassOf   ont:School
    ont:WelshEstablishment   rdf:type          rdfs:Class

Modelling – RDFS
  RDF vocabulary description language
  classes, types and type hierarchy
    ont:School               rdfs:label        “School”
    ont:School               rdf:type          rdfs:Class
    ont:WelshEstablishment   rdfs:subClassOf   ont:School
    ont:WelshEstablishment   rdf:type          rdfs:Class
    school:401874            rdf:type          ont:WelshEstablishment

Modelling – RDFS
  RDF vocabulary description language
  classes, types and type hierarchy
    ont:School               rdfs:label        “School”
    ont:School               rdf:type          rdfs:Class
    ont:WelshEstablishment   rdfs:subClassOf   ont:School
    ont:WelshEstablishment   rdf:type          rdfs:Class
    school:401874            rdf:type          ont:WelshEstablishment
    ⇒ school:401874          rdf:type          ont:School

Modelling – RDFS
  RDF vocabulary description language
  properties, property hierarchy
    ont:staffAt          rdf:type             rdf:Property
    ont:headOf           rdfs:subPropertyOf   ont:staffAt
    person:JoeBloggs     ont:headOf           school:401874
    ⇒ person:JoeBloggs   ont:staffAt          school:401874

Modelling – RDFS
  RDF vocabulary description language
  class/property relations
    domain
    range
  Already have power to do some vocabulary mapping
    declare classes or properties from different vocabularies to be equivalent:
      A rdfs:subClassOf B
      B rdfs:subClassOf A

Modelling - OWL
  richer modelling and semantics
  axioms on properties
    transitive, symmetric, inverseOf, ...
    functional, inverse functional
    equivalent property
  axioms on classes
    intersection, union, disjoint, equivalent
  restrictions on classes
    some value from, all values from, cardinality, has value, one of, keys
  axioms on individuals
    same as, different from, all different
  imports

Modelling – OWL
  supports much richer modelling
  consistency checking of model
  consistency checking of data
    some surprises if used to schema languages
    open world, no unique name assumption
    can extend to closed world checking
  inference
    classification
    inferred relationships

Modelling
  Spectrum of goals and styles
  Lightweight vocabularies
    simple modelling
    just enough agreement to get useful work done
    removing boundaries to enable information to be found and connected
    global consistency not possible
    a little semantics goes a long way
  Rich ontological models
    rich domain models need expressivity
    consistency is critical
    make complex inferences you can rely on, across data you trust
    knowledge is power

Modelling
  Ontology reuse
  invest in complete ontology for a domain
    rich but general model, may be modular inside
    strong “ontological commitment”
    e.g. medical ontologies
  reuse small, common vocabularies
    FOAF, SIOC, Dublin Core, Org ...
    pick and choose classes and properties you need
    fill in a few missing links for your domain
  generic reusable vocabularies
    Data cube vocabulary

Accessing all this data
  link following
    HTTP GET, follow links, aggregate relevant statements
  query
    SPARQL

SPARQL
  core idea is pattern matching
    graph patterns with variables
    any subgraph which matches yields row of bindings
      ?school ont:districtAdministrative [ rdfs:label “Cardiff” ]
  syntax based on Turtle syntax for RDF
  web API endpoints
  lots of power
    filters, optionals, named graphs, sub-queries, property chains,
    aggregation, federated query, update, construct

Accessing all this data
  link following
    HTTP GET, follow links, aggregate relevant statements
  query
    SPARQL
  linked data API
    RESTful API onto linked data resources
    simple query, usable without RDF stack, web dev friendly
    easy to layer visualizations and UIs on top
  third parties
    search engines and aggregators e.g. Sindice, sameAs.org

Semantic web layer cake

Strengths and weaknesses
  image: spcbrass @ flickr.com

Strengths
  data integration
    use of global identifiers (URIs)
    composable – statements v. containers, schemaless
    linking, vocabulary mapping
  extensible, incremental, decentralized, resilient
    no global ontology/schema to develop or maintain
    freely add terms from other vocabularies
    open world assumption
  modelling and data entwined
    link data to models, data in context
    use same technology to share, manage, extend models
    supports inference and classification
  rich access routes
    web linking, download, query, web APIs

Weaknesses
  complexity of the stack
    alphabet soup – RDF, RDFS, OWL, SPARQL, RIF ..
    unfamiliar “ontology”, “logical entailment”
    lots of arcane details
    RDF/XML syntax
    • use the parts you need
    • tooling e.g. Linked Data API
    • core ideas not that complex
  performance of schema-less stores
    optimization challenges
    • technology improving steadily
    • hybrid solutions
  limited validation and constraints
    • closed world checkers
  cost of modelling, ontology development
    • ontology reuse
    • generic ontologies (data cube)
    • tools
  no inbuilt notions of time, uncertainty
    • model on top

Wrapping up
  image: erika g. @ flickr.com

Things we missed out
  RDF nuances
    blank nodes, containers and collections
    named graphs
  linked data nuances
    URI for thing v. web page, content negotiation, httprange-14
    URI architecture
  OWL nuances
    OWL species, serializations, lots of details
  Other technologies in the stack
    SPARQL update, rules (RIF), GRDDL, Powder, Geo SPARQL, RDB mapping, triple/quad stores
  Embedding structured data in markup
    RDFa, micro formats, micro data, schema.org and all that

Hot topics
  Government linked data
    transparency, improving services, economic growth
    identifiers to seed linked data
    data publication
  structured data and search engines
    rich snippets, structured results, SEO
    search => question answering
  user interfaces
    visualization, exploration, exploiting linking
  data as a service

fin.
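As a footnote to the SPARQL slide, the "graph patterns with variables, any matching subgraph yields a row of bindings" idea can be sketched in plain Python. This is a toy matcher written for illustration, not how a SPARQL engine works internally. The names `match_term` and `match` are hypothetical, the literals are stored as bare strings, and the blank node in the slide's pattern is replaced by an ordinary variable `?d`.

```python
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]

# The three statements from the RDF modelling slides, in prefixed form.
TRIPLES: List[Triple] = [
    ("school:401874", "rdfs:label", "Cardiff High School"),
    ("school:401874", "ont:districtAdministrative", "la:00PT"),
    ("la:00PT", "rdfs:label", "Cardiff"),
]


def match_term(pattern: str, value: str, bindings: Bindings) -> Optional[Bindings]:
    """Match one pattern term against one triple term, extending the bindings."""
    if pattern.startswith("?"):  # a variable
        if pattern in bindings:
            return bindings if bindings[pattern] == value else None
        return {**bindings, pattern: value}
    return bindings if pattern == value else None  # a constant


def match(
    patterns: Sequence[Triple],
    triples: Sequence[Triple],
    bindings: Optional[Bindings] = None,
) -> Iterator[Bindings]:
    """Yield one row of bindings for every way all patterns match the triples."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        current: Optional[Bindings] = bindings
        for pattern_term, triple_term in zip(first, triple):
            current = match_term(pattern_term, triple_term, current)
            if current is None:
                break
        if current is not None:
            yield from match(rest, triples, current)


# "Which schools are in a district labelled Cardiff?"
query = [
    ("?school", "ont:districtAdministrative", "?d"),
    ("?d", "rdfs:label", "Cardiff"),
]
results = list(match(query, TRIPLES))
print(results)  # [{'?school': 'school:401874', '?d': 'la:00PT'}]
```

The shared variable `?d` is what joins the two patterns: the second pattern only matches triples whose subject is the district already bound by the first.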