Linked data and its role in the semantic web @der42

Dave Reynolds, Epimorphics Ltd
@der42
Roadmap
What is linked data?
Strengths and weaknesses
Modelling
other topics
Examples
Access
image: Leo Oosterloo @ flickr.com
Linked data intro
Linked data ...
publishing data on the web ...
... to enable integration, linking and reuse across silos
Can’t we just publish data as files?
pdf
- easy to read and publish
Excel
- allows further processing and analysis
csv
- processing without need for proprietary tools
But ...
- structure of data not explained
- no connection between different data sets, silos
- static and fixed – can’t retrieve just slices relevant to problem
Linked data
Apply the principles of the web to publication of data
The web:
- is a global network of pages
- each identified by a URL
- fetching a URL gives a document
- pages connected by links
- open, anyone can say anything about anything else
Linked data
Apply the principles of the web to publication of data
The linked data web:
- is a global network of things
- each identified by a URI
- fetching a URI gives a set of statements
- things connected by typed links
- open, anyone can say anything about anything else
Linked data is “data you can click on”


Example schools information
http://education.data.gov.uk/id/school/401874
- label: “Cardiff High School”
- phase: “Secondary”
- district: “Cardiff”
Example schools information
http://education.data.gov.uk/id/school/401874
- label: “Cardiff High School”
- phase: school:PhaseOfEducation_Secondary
- district: http://statistics.data.gov.uk/id/local-authority-district/00PT
http://statistics.data.gov.uk/id/local-authority-district/00PT
- label: “Cardiff”
- same as: http://data.ordnancesurvey.co.uk/id/7000000000025484
http://data.ordnancesurvey.co.uk/id/7000000000025484
- contains ward
- contains parish
- extent: GML: 310499.4 184176.6 310476.5 ...
Linked data principles
- Use URIs as names for things
- Use HTTP URIs so that people can look up those names
- When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
- Include links to other URIs, so that they can discover more things
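Looking up a name can be sketched in a few lines of Python. This is a minimal sketch, assuming the server supports content negotiation and would answer the Accept header shown with Turtle. The helper name build_lookup is hypothetical, and the school URI from the earlier example may no longer resolve, so the request is only built, not sent.

```python
import urllib.request

def build_lookup(uri, media_type="text/turtle"):
    """Build an HTTP GET request that asks for RDF rather than HTML."""
    # The Accept header is how a client asks for a particular representation.
    return urllib.request.Request(uri, headers={"Accept": media_type})

request = build_lookup("http://education.data.gov.uk/id/school/401874")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))

# Sending it would be: urllib.request.urlopen(request).read()
# (left out here because it needs network access and a live server)
```

The same URI can then serve a web page to a browser and statements to a program, depending on what the client asks for.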
Pattern of application of semantic web stack
Linked open data cloud: 2007, 2009, 2010
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Data.gov.uk – linked datasets and APIs
Data.gov.uk – visualizations on top of linked data
Ordnance Survey
Environment Agency – data, API, visualizations
BBC – integration and site design
E-commerce and rich snippets
Overstock.com
Peek-cloppenburg.de
Internal use
Open?
Linked open data = linked data + open data
Modelling
Modelling
Thing, entity, concept ... resource
resource being described
- abstract concept
- real world thing
- data item, particular measurement
- document
identify by URI
provide information by making statements about those resources
identifier NOT a container, c.f. UML
- open schema
- critical to open extensibility and integration
- similar to Entity-Attribute-Value modelling
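The "statements, not containers" idea can be illustrated with plain Python tuples. This is only a sketch: the short names (school:401874, la:00PT, label, district) stand in for full URIs, and the describe helper is hypothetical.

```python
# Each statement is an independent (subject, predicate, object) tuple.
education = {
    ("school:401874", "label", "Cardiff High School"),
    ("school:401874", "district", "la:00PT"),
}
statistics = {
    ("la:00PT", "label", "Cardiff"),
}

# No shared schema or container is needed to combine the two sources:
# merging is set union, and the shared identifier la:00PT joins them up.
merged = education | statistics

def describe(resource, statements):
    """Collect everything said about one resource as predicate -> values."""
    description = {}
    for s, p, o in statements:
        if s == resource:
            description.setdefault(p, []).append(o)
    return description

district = describe("school:401874", merged)["district"][0]
print(describe(district, merged))  # statements about the linked district
```

Because nothing owns the resource, a third source could add further statements about la:00PT without either publisher changing anything.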
Modelling – RDF – Resource Description Framework
Statement, triple, logical assertion
Subject | Predicate | Object
some school | has a name/label | some literal
Modelling – RDF
Statement, triple, logical assertion
Subject | Predicate | Object
http://education.data.gov.uk/id/school/401874 | has a name/label | “Cardiff High School”
Modelling – RDF
Statement, triple, logical assertion
Subject | Predicate | Object
http://education.data.gov.uk/id/school/401874 | http://www.w3.org/2000/01/rdf-schema#label | “Cardiff High School”
Modelling – RDF
Statement, triple, logical assertion
Subject | Predicate | Object
school:401874 | rdfs:label | “Cardiff High School”
where
school: = http://education.data.gov.uk/id/school/
rdfs: = http://www.w3.org/2000/01/rdf-schema#
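Expanding a prefixed name back to its full URI is plain string concatenation. A minimal sketch, using the two prefixes defined on the slide; the expand function is a hypothetical helper and handles prefixed names only, not full URIs.

```python
PREFIXES = {
    "school": "http://education.data.gov.uk/id/school/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def expand(name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'rdfs:label' into a full URI."""
    prefix, _, local = name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix}")
    return prefixes[prefix] + local

print(expand("school:401874"))
print(expand("rdfs:label"))
```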
Modelling – RDF
Statement, triple, logical assertion
Subject | Predicate | Object
school:401874 | rdfs:label | “Cardiff High School”
school:401874 | ont:districtAdministrative | la:00PT
la:00PT | rdfs:label | “Cardiff”
Modelling – RDF
Statement, triple, logical assertion
Subject | Predicate | Object
school:401874 | rdfs:label | “Cardiff High School”
school:401874 | ont:districtAdministrative | la:00PT
la:00PT | rdfs:label | “Cardiff”
As a graph:
school:401874 → rdfs:label → “Cardiff High School”
school:401874 → ont:districtAdministrative → la:00PT → rdfs:label → “Cardiff”
Modelling – RDF
Statement, triple, logical assertion
Subject | Predicate | Object
school:401874 | rdfs:label | “Cardiff High School”
school:401874 | ont:districtAdministrative | la:00PT
la:00PT | rdfs:label | “Cardiff”
la:00PT | rdfs:label | “Caerdydd”@cy
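A resource can carry several labels, one per language tag. The sketch below shows one way a client might pick a label; modelling a literal as a (text, language) pair and the label_for helper are assumptions made for illustration, not part of RDF itself.

```python
# Literals are modelled as (text, language) pairs; None means no language tag.
triples = [
    ("la:00PT", "rdfs:label", ("Cardiff", None)),
    ("la:00PT", "rdfs:label", ("Caerdydd", "cy")),
]

def label_for(subject, triples, lang=None):
    """Return the label in the requested language, else the untagged label."""
    labels = {
        o[1]: o[0]
        for s, p, o in triples
        if s == subject and p == "rdfs:label"
    }
    # Fall back to the untagged label when the requested language is missing.
    return labels.get(lang, labels.get(None))

print(label_for("la:00PT", triples, lang="cy"))
print(label_for("la:00PT", triples))
```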
RDF Syntaxes
RDF/XML
- normative
Turtle
- more human readable/writable
- being standardized
RDFa
- embed in (X)HTML
[others omitted]
Modelling – RDF
RDF/XML syntax
Subject | Predicate | Object
school:401874 | rdfs:label | “Cardiff High School”
school:401874 | ont:districtAdministrative | la:00PT
la:00PT | rdfs:label | “Cardiff”
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ont="http://education.data.gov.uk/def/school/"
    xmlns:la="http://statistics.data.gov.uk/id/local-authority-district/"
    xmlns:school="http://education.data.gov.uk/id/school/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://education.data.gov.uk/id/school/401874">
    <rdfs:label>Cardiff High School</rdfs:label>
    <ont:districtAdministrative>
      <rdf:Description rdf:about="http://statistics.data.gov.uk/id/local-authority-district/00PT">
        <rdfs:label>Cardiff</rdfs:label>
      </rdf:Description>
    </ont:districtAdministrative>
  </rdf:Description>
</rdf:RDF>
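To see that the RDF/XML above really is the same three triples, it can be unpacked with the Python standard library. This is a rough sketch that only understands the subset of RDF/XML used on this slide (rdf:Description, rdf:about, nested descriptions, plain literals); a real application would use a proper RDF parser. The unused namespace declarations are dropped from the embedded copy.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF_NS}}}about"
DESCRIPTION = f"{{{RDF_NS}}}Description"

DOCUMENT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ont="http://education.data.gov.uk/def/school/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://education.data.gov.uk/id/school/401874">
    <rdfs:label>Cardiff High School</rdfs:label>
    <ont:districtAdministrative>
      <rdf:Description rdf:about="http://statistics.data.gov.uk/id/local-authority-district/00PT">
        <rdfs:label>Cardiff</rdfs:label>
      </rdf:Description>
    </ont:districtAdministrative>
  </rdf:Description>
</rdf:RDF>"""

def triples_from(description):
    """Yield (subject, predicate, object) for one rdf:Description element."""
    subject = description.get(ABOUT)
    for prop in description:
        # ElementTree spells qualified names as '{namespace}local'.
        predicate = prop.tag.replace("{", "").replace("}", "")
        nested = prop.find(DESCRIPTION)
        if nested is None:
            yield (subject, predicate, prop.text)          # literal value
        else:
            yield (subject, predicate, nested.get(ABOUT))  # link to a resource
            yield from triples_from(nested)

root = ET.fromstring(DOCUMENT)
triples = [t for d in root.findall(DESCRIPTION) for t in triples_from(d)]
for triple in triples:
    print(triple)
```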
Modelling – RDF
Turtle syntax
Subject | Predicate | Object
school:401874 | rdfs:label | “Cardiff High School”
school:401874 | ont:districtAdministrative | la:00PT
la:00PT | rdfs:label | “Cardiff”
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix school: <http://education.data.gov.uk/id/school/> .
@prefix ont: <http://education.data.gov.uk/def/school/> .
@prefix la: <http://statistics.data.gov.uk/id/local-authority-district/> .
school:401874
    rdfs:label "Cardiff High School" ;
    ont:districtAdministrative la:00PT .
la:00PT rdfs:label "Cardiff" .
Modelling
Vocabularies
so far no actual models, let alone semantics
want to define
- types of thing : Class
- what you can say about them : Property
encode definitions in more RDF and publish at the corresponding URIs
- link from data to data model
- reuse published vocabularies to enable integration
- freely combine different vocabularies or new ones
Modelling – vocabularies
Logical modelling
modelling the domain, not a particular data structure
- what exists
- what is asserted? what can you deduce from that?
- not about constraints as such
- monotonic, open world
Ontology
- controlled vocabulary
- thesaurus
- taxonomy
- ontology
Modelling – vocabularies

unfamiliar terminology but related to
- information architecture and conceptual modelling
- domain-driven design
- ... and yes knowledge representation

Modelling – RDFS
RDF vocabulary description language
classes, types and type hierarchy
Subject | Predicate | Object
ont:School | rdf:type | rdfs:Class
ont:School | rdfs:label | “School”
ont:WelshEstablishment | rdf:type | rdfs:Class
ont:WelshEstablishment | rdfs:subClassOf | ont:School
school:401874 | rdf:type | ont:WelshEstablishment
⇒
school:401874 | rdf:type | ont:School
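The deduction on this slide (a member of a subclass is also a member of the superclass) can be sketched as a small fixed-point loop. This is an illustrative sketch of that one RDFS rule, not a full RDFS reasoner; the function name infer_types is hypothetical.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def infer_types(triples):
    """Apply the RDFS subclass rule until no new rdf:type statements appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, cls in list(inferred):
            if p != TYPE:
                continue
            for sub, q, sup in list(inferred):
                if q == SUBCLASS and sub == cls and (s, TYPE, sup) not in inferred:
                    inferred.add((s, TYPE, sup))
                    changed = True
    return inferred

data = {
    ("ont:School", TYPE, "rdfs:Class"),
    ("ont:WelshEstablishment", TYPE, "rdfs:Class"),
    ("ont:WelshEstablishment", SUBCLASS, "ont:School"),
    ("school:401874", TYPE, "ont:WelshEstablishment"),
}
closure = infer_types(data)
print(("school:401874", TYPE, "ont:School") in closure)
```

Repeating the rule until nothing changes means chains of subclasses are followed as well, since each newly inferred type is itself checked on the next pass.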
Modelling – RDFS
RDF vocabulary description language

properties, property hierarchy
Subject | Predicate | Object
ont:staffAt | rdf:type | rdf:Property
ont:headOf | rdfs:subPropertyOf | ont:staffAt
person:JoeBloggs | ont:headOf | school:401874
⇒
person:JoeBloggs | ont:staffAt | school:401874
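The property hierarchy works the same way: a statement made with a sub-property also holds for its super-property. A minimal sketch, assuming each property has at most one direct super-property (real data may have several); infer_superproperties is a hypothetical name.

```python
def infer_superproperties(triples):
    """Add a statement for every ancestor of each statement's property."""
    sub_to_super = {s: o for s, p, o in triples if p == "rdfs:subPropertyOf"}
    inferred = set(triples)
    for s, p, o in triples:
        # Walk up the property hierarchy, adding one statement per ancestor.
        seen = set()
        while p in sub_to_super and p not in seen:
            seen.add(p)
            p = sub_to_super[p]
            inferred.add((s, p, o))
    return inferred

data = {
    ("ont:staffAt", "rdf:type", "rdf:Property"),
    ("ont:headOf", "rdfs:subPropertyOf", "ont:staffAt"),
    ("person:JoeBloggs", "ont:headOf", "school:401874"),
}
result = infer_superproperties(data)
print(("person:JoeBloggs", "ont:staffAt", "school:401874") in result)
```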
Modelling – RDFS
RDF vocabulary description language

class/property relations
- domain
- range
Already have power to do some vocabulary mapping
- declare classes or properties from different vocabularies to be equivalent:
A rdfs:subClassOf B
B rdfs:subClassOf A
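In RDFS, domain and range are used to deduce types rather than to reject data. The sketch below shows that reading; the class ont:Person and the domain and range declarations for ont:headOf are hypothetical, introduced only for the example.

```python
def infer_from_domain_and_range(triples):
    """Deduce rdf:type statements from rdfs:domain and rdfs:range."""
    domains = {(s, o) for s, p, o in triples if p == "rdfs:domain"}
    ranges = {(s, o) for s, p, o in triples if p == "rdfs:range"}
    inferred = set(triples)
    for s, p, o in triples:
        for prop, cls in domains:
            if prop == p:
                inferred.add((s, "rdf:type", cls))  # subject is in the domain
        for prop, cls in ranges:
            if prop == p:
                inferred.add((o, "rdf:type", cls))  # object is in the range
    return inferred

data = {
    ("ont:headOf", "rdfs:domain", "ont:Person"),  # hypothetical declaration
    ("ont:headOf", "rdfs:range", "ont:School"),   # hypothetical declaration
    ("person:JoeBloggs", "ont:headOf", "school:401874"),
}
result = infer_from_domain_and_range(data)
print(sorted(t for t in result if t[1] == "rdf:type"))
```

Note that nothing is ever flagged as an error here: whatever appears as the subject of ont:headOf is simply concluded to be an ont:Person.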
Modelling – OWL
richer modelling and semantics
axioms on properties
- transitive, symmetric, inverseOf, ...
- functional, inverse functional
- equivalent property
axioms on classes
- intersection, union, disjoint, equivalent
restrictions on classes
- some value from, all values from, cardinality, has value, one of, keys
axioms on individuals
- same as, different from, all different
imports
Modelling – OWL
supports much richer modelling
consistency checking of model
consistency checking of data
- some surprises if used to schema languages
- open world, no unique name assumption
- can extend to closed world checking
inference
- classification
- inferred relationships
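One of the "surprises" can be shown with a functional property. A schema language would reject a second value; under OWL's open world with no unique name assumption, the two names are instead concluded to denote the same individual. The sketch below covers only that case for resource-valued properties; ont:headTeacher and person:JBloggs are hypothetical names.

```python
def functional_property_merges(triples, functional):
    """For a functional property, two objects on one subject must co-refer."""
    values = {}
    same_as = set()
    for s, p, o in sorted(triples):
        if p not in functional:
            continue
        # Remember the first value seen; any later value is equated with it.
        first = values.setdefault((s, p), o)
        if first != o:
            same_as.add((first, "owl:sameAs", o))
    return same_as

data = {
    ("school:401874", "ont:headTeacher", "person:JoeBloggs"),
    ("school:401874", "ont:headTeacher", "person:JBloggs"),
}
merges = functional_property_merges(data, functional={"ont:headTeacher"})
print(merges)
```

An inconsistency would only arise if the data also stated that the two individuals are different from each other, which is where closed world checking or explicit "different from" axioms come in.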
Modelling
Spectrum of goals and styles
Lightweight vocabularies
- simple modelling
- just enough agreement to get useful work done
- removing boundaries to enable information to be found and connected
- global consistency not possible
- a little semantics goes a long way
Rich ontological models
- rich domain models
- need expressivity
- consistency is critical
- make complex inferences you can rely on, across data you trust
- knowledge is power
Modelling
Ontology reuse
invest in complete ontology for a domain
- rich but general model, may be modular inside
- strong “ontological commitment”
- e.g. medical ontologies
reuse small, common, vocabularies
- FOAF, SIOC, Dublin Core, Org ...
- pick and choose classes and properties you need
- fill in a few missing links for your domain
generic reusable vocabularies
- Data cube vocabulary
Accessing all this data
link following
- HTTP GET, follow links, aggregate relevant statements
query
- SPARQL
SPARQL
core idea is pattern matching
- graph patterns with variables
- any subgraph which matches yields row of bindings
?school → ont:districtAdministrative → [] → rdfs:label → “Cardiff”
syntax based on Turtle syntax for RDF
web API endpoints
lots of power
- filters
- optionals
- named graphs
- sub-queries
- property chains
- aggregation
- federated query
- update
- construct
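The core pattern-matching idea can be sketched without a SPARQL engine. The function below matches a list of triple patterns against a list of triples and yields one row of bindings per matching subgraph. It is an illustration of the idea only, not how a real SPARQL engine is implemented, and the blank node [] from the slide is written as an ordinary variable.

```python
def match(patterns, triples, bindings=None):
    """Yield one dict of variable bindings per subgraph matching all patterns."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(bindings)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must keep the value it was first bound to.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, candidate)

triples = [
    ("school:401874", "rdfs:label", "Cardiff High School"),
    ("school:401874", "ont:districtAdministrative", "la:00PT"),
    ("la:00PT", "rdfs:label", "Cardiff"),
]
# The blank node [] in the slide's pattern becomes the variable ?district here.
query = [
    ("?school", "ont:districtAdministrative", "?district"),
    ("?district", "rdfs:label", "Cardiff"),
]
rows = list(match(query, triples))
print(rows)
```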
Accessing all this data
link following
- HTTP GET, follow links, aggregate relevant statements
query
- SPARQL
linked data API
- RESTful API onto linked data resources
- simple query, usable without RDF stack, web dev friendly
- easy to layer visualizations and UIs on top
third parties
- search engines and aggregators e.g. Sindice, sameAs.org
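Link following amounts to a small breadth-first crawl. In the sketch below a dictionary stands in for the web so that it runs offline; in practice fetch would issue an HTTP GET and parse the response. The prefix os: is a hypothetical abbreviation for the Ordnance Survey URI in the earlier example, and treating any value containing a colon as a followable resource is a simplification.

```python
from collections import deque

# Stand-in for the web: each URI maps to the statements a GET would return.
WEB = {
    "school:401874": [
        ("school:401874", "rdfs:label", "Cardiff High School"),
        ("school:401874", "ont:districtAdministrative", "la:00PT"),
    ],
    "la:00PT": [
        ("la:00PT", "rdfs:label", "Cardiff"),
        ("la:00PT", "owl:sameAs", "os:7000000000025484"),
    ],
}

def crawl(start, fetch=WEB.get, max_hops=2):
    """Breadth-first link following: fetch, collect statements, follow objects."""
    collected, seen = set(), {start}
    queue = deque([(start, 0)])
    while queue:
        uri, hops = queue.popleft()
        for triple in fetch(uri) or []:
            collected.add(triple)
            target = triple[2]
            # Only follow objects that look like resources we have not visited.
            if hops < max_hops and target not in seen and ":" in target:
                seen.add(target)
                queue.append((target, hops + 1))
    return collected

for statement in sorted(crawl("school:401874")):
    print(statement)
```

The max_hops limit matters on the real web, where following every link would never terminate.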
Semantic web layer cake
Strengths and weaknesses
image: spcbrass @ flickr.com
Strengths
data integration
- use of global identifiers (URIs)
- composable – statements v. containers, schemaless
- linking, vocabulary mapping
extensible, incremental, decentralized, resilient
- no global ontology/schema to develop or maintain
- freely add terms from other vocabularies
- open world assumption
modelling and data entwined
- link data to models, data in context
- use same technology to share, manage, extend models
supports inference and classification
rich access routes
- web linking, download, query, web APIs
Weaknesses
complexity of the stack
- alphabet soup – RDF, RDFS, OWL, SPARQL, RIF ...
- unfamiliar “ontology”, “logical entailment”
- lots of arcane details
- RDF/XML syntax
• use the parts you need
• tooling e.g. Linked Data API
• core ideas not that complex
performance of schema-less stores
- optimization challenges
• technology improving steadily
• hybrid solutions
limited validation and constraints
• closed world checkers
cost of modelling, ontology development
• ontology reuse
• generic ontologies (data cube)
• tools
no inbuilt notions of time, uncertainty
• model on top
Wrapping up
image: erika g. @ flickr.com
Things we missed out
RDF nuances
- blank nodes, containers and collections
- named graphs
linked data nuances
- URI for thing v. web page, content negotiation, httpRange-14
- URI architecture
OWL nuances
- OWL species, serializations, lots of details
Other technologies in the stack
- SPARQL update, rules (RIF), GRDDL, POWDER, GeoSPARQL, RDB mapping, triple/quad stores
Embedding structured data in markup
- RDFa, microformats, microdata, schema.org and all that
Hot topics
Government linked data
- transparency, improving services, economic growth
- identifiers to seed linked data
- data publication
structured data and search engines
- rich snippets, structured results, SEO
- search => question answering
user interfaces
- visualization, exploration, exploiting linking
data as a service
fin.