SWED Glossary
Community Portal - Controlled Vocabularies - Metadata - N3 - Namespace
- Ontology - OWL - Portal - RDF - RDF/XML - Semantic Web - SKOS
- Thesaurus - Taxonomy - URI - URL - URN - vCard - XML
Community Portal:
Web-based information portals provide a point of access onto an integrated
and structured body of information about some domain. They range from
very broad domains (e.g. all web pages - [http://www.yahoo.com/]
and [http://www.dmoz.com/]), to
topic specific domains (e.g. mathematics [http://www.math-net.de/],
fish species [http://www.fishbase.org/]).
Community portals are information portals which are also designed
to support and facilitate a community of interest. They typically
allow members of the community to contribute news and information
to the pool, either by submitting information to the portal (via some
editing or reviewing process) or by posting the information on some
associated web bulletin board or other collaboration tool.
Controlled Vocabularies:
The majority of cataloguing or indexing systems (e.g. libraries,
product catalogues, etc.) use controlled vocabularies. That is, they
index all items (e.g. library books) using a constrained set of terms
(e.g. in the Dewey Decimal system for classifying books, a book
about zoology will be classified under the term "Zoological Sciences").
Using a controlled vocabulary makes it much easier to classify things
consistently, and so makes it easier to find relevant items.
Most often the controlled vocabularies are
not simply lists but have a structure (e.g. in Dewey the category "Zoological
Sciences" is sub-divided into Zoology, Invertebrates, etc.). There
are various degrees of structure used in classification schemes, moving
from a simple list, to a thesaurus (using broader and narrower terms
to give the structure), to a formal taxonomy which defines the relationships
more specifically (e.g. a cat is a species of animal from a family
of animals called felines), to yet richer and more general ways of
describing relationships between terms (e.g. a cat is a type of feline,
felines have fur, etc.). This last example is generally called a
formal ontology (see ontology below). In general,
the richer the type of classification, the richer
the type of queries that can be asked of the system.
For example a simple list based controlled
vocabulary (assuming it contains the term 'cat') can provide a way
of answering the question 'show me all the pictures in your collection
of cats' and that is about as complex as it could be. However in the
last example it would (in principle) be possible to ask 'show me all
the pictures in your collection of things that have fur'.
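The difference between the two kinds of query can be sketched in a few lines of Python. This is only an illustration: the picture names, the terms and the tiny "ontology" below are made up for the example and are not taken from SWED.

```python
# Hypothetical picture index: each picture is tagged with one term
# from a simple list-based controlled vocabulary.
picture_index = {
    "photo1.jpg": "cat",
    "photo2.jpg": "dog",
    "photo3.jpg": "goldfish",
}

# Hypothetical ontology facts: "is a type of" links and "has" features.
is_a = {"cat": "feline", "dog": "canine", "goldfish": "fish",
        "feline": "animal", "canine": "animal", "fish": "animal"}
has = {"feline": {"fur"}, "canine": {"fur"}, "fish": {"scales"}}


def features_of(term):
    """Collect the features of a term and of every broader type above it."""
    features = set()
    while term is not None:
        features |= has.get(term, set())
        term = is_a.get(term)
    return features


def find_by_term(term):
    """The only query a simple list supports: an exact match on the term."""
    return sorted(p for p, t in picture_index.items() if t == term)


def find_by_feature(feature):
    """A richer query that needs the ontology: match on an inherited feature."""
    return sorted(p for p, t in picture_index.items()
                  if feature in features_of(t))


print(find_by_term("cat"))
print(find_by_feature("fur"))
```

The first query works with the list alone; the second only works because the relationships between terms have been written down explicitly.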
SWED uses a number of thesauri, ontologies
and lists to categorise the organisations and projects listed in the
directory. This means that it is possible to provide much richer ways
to browse and search the directory than is usual.
Metadata:
Metadata simply means data about data. For example metadata about
a book might be its author, title, number of pages - the kind of information
held in library catalogues. For SWED, metadata about organisations
and projects includes name, contact details, their area of operation,
topics that are relevant to them, the types of things that they do,
etc.
So metadata is vitally important to SWED (and
any other index, directory, etc.) since it is by using the metadata that
we can locate organisations that are relevant to a particular query.
The richer the metadata (within limits) the more effective the search
can be.
N3:
See RDF below.
Namespace:
A namespace is a way to distinguish two or more elements (types
of data, e.g. organisation's name, date of formation, etc.) or terms
(e.g. within a controlled vocabulary) that happen to share the same
name (e.g. 'date' as used by one organisation - 1995 - and 'date'
as used by another - 19/07/1995). Namespaces also provide a means of grouping
elements or terms that are related in some way.
This means that data elements and vocabulary terms can be unambiguously
identified. In general the namespaces are maintained by the organisations
that own the associated web domain name.
In concrete terms a namespace is a URI (Uniform
Resource Identifier) that can be used as a prefix to elements
or terms, and that therefore uniquely identifies them as associated with
that namespace. In SWED's case this might be 'http://www.swed.org.uk/2004/5/swed_oa'
- for SWED's controlled vocabulary
(ontology) for operational areas.
So a term from the vocabulary could then have
the URI http://www.swed.org.uk/2004/5/swed_oa#bristol. This longhand
is obviously very unwieldy, so there is a mechanism for using a shorthand
version, replacing the namespace URI (http://www.swed.org.uk/2004/5/swed_oa
plus the '#' separator) with a shorter prefix (e.g. swed_oa:). In RDF/XML
documents the shorter prefix is declared as an attribute of an element
(normally the root element at the top of the document), e.g.
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:swed_oa="http://www.swed.org.uk/2004/5/swed_oa#">
Once it has been declared, URIs can be written in the shorthand (e.g.
swed_oa:bristol), making documents shorter and more readable.
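The shorthand mechanism amounts to a simple string substitution, which the following Python sketch illustrates. It assumes the prefix stands for the namespace URI including the trailing '#'; the function names expand and shorten are hypothetical helpers written for this example, not part of any RDF toolkit.

```python
# Prefix table: the prefix is assumed to stand for the namespace URI
# including its '#' separator.
PREFIXES = {"swed_oa": "http://www.swed.org.uk/2004/5/swed_oa#"}


def expand(short_name, prefixes=PREFIXES):
    """Turn a shorthand name such as 'swed_oa:bristol' into a full URI."""
    prefix, colon, local_name = short_name.partition(":")
    if not colon or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {short_name!r}")
    return prefixes[prefix] + local_name


def shorten(uri, prefixes=PREFIXES):
    """Turn a full URI back into shorthand if a declared namespace matches."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no declared namespace matches, so leave the URI as it is


print(expand("swed_oa:bristol"))
print(shorten("http://www.swed.org.uk/2004/5/swed_oa#bristol"))
```

Note that nothing in this sketch fetches anything from the web: the namespace URI is used purely as an identifier.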
It may seem odd but there is no requirement
that there is actually anything at the URI. It provides a unique ID
regardless of whether there is any resolvable web page or other document
at the end of the URI. In general though it is regarded as good practice
to have a human and/or computer readable definition of the elements
or terms resolvable at the URI.
Ontology:
An ontology in the sense it is used within the Semantic Web and computer
science can be thought of as...
OWL:
OWL stands for Web Ontology Language. It is the computer readable
language in which web-based ontologies that form part of the Semantic
Web are written. This language was designed to attempt to meet the
needs of the Semantic Web. It has to balance the flexibility and richness
of expression (of relationships and constraints) required for the
very wide range of Web based applications against the need to make
automatic logical reasoning possible.
Because of these sometimes conflicting requirements
there are three 'species' of OWL: OWL Full, OWL Lite and OWL DL.
Portal:
See community portal
RDF (Resource Description Framework):
RDF (Resource Description Framework) is to the Semantic Web what HTML
(HyperText Markup Language), the language for writing Web pages, is to
the Web. The Semantic Web uses RDF as the basic language for representing
metadata about any kind of resource on the
Web.
Essentially RDF is, as its name suggests, a
framework for describing resources, and is actually very simple in concept.
It is a way of making statements about things and their relationships.

Figure 1
Figure 1 shows the form of an RDF statement
i.e. subject (Natural History Museum) - predicate (has Date of Formation)
- object (1881). It is composed of 'nodes' (ovals or rectangles) and
'arcs' (lines) that make up what is mathematically called a graph.
RDF is essentially collections of these simple graphs. Figure 2 shows
an example from the SWED project.

Figure 2 - Simplified example of SWED RDF file in graphical form
The information in figure 1 alone is not necessarily
clear, e.g. which 'natural history museum' are we talking about? What
do we mean by 'date of formation'? These questions might not arise
if the data were only to be used within a single organisation where
it was 'obvious' which museum was meant and what the 'date of formation'
means. However when we are talking about information on the Web the
context is very different. These things are not at all clear - go
to Google and you will find many different 'natural history museums'.
RDF overcomes this by requiring that all resources
(things you are talking about) and relationships have unique identifiers
that are written explicitly in the RDF. To illustrate the point: in
plain English, instead of simply writing
the Natural History Museum - has Date of
Formation - 1881
we might write:
the organisation with name 'Natural History
Museum' which has the home page URL of "http://www.nhm.ac.uk/"
has the 'year of formation', of '1881' with all the terms used as
defined by the SWED project which has the homepage URL of http://www.swed.org.uk/2004/02/swed/.
This, albeit very long-winded, way of making
the statement ensures that the meaning of the terms (note the change
from date to year, as 1881 is not a date) and their usage are (much
more) unambiguous. This is effectively what RDF does: it gives those
creating Semantic Web documents a way of writing down statements about
things and their relationships as unambiguously as possible in a Web-based
environment. In fact in SWED this statement would actually look like
this:
@prefix swed: <http://www.swed.org.uk/2004/02/swed#> .
swed_id:prorg102 a swed:prorg;
    swed:has_primary_prorg_name "Natural History Museum";
    swed:has_year_formed "1881";
    swed:has_url "http://www.nhm.ac.uk/";
    .
The @prefix line simply says that 'swed' will
be used to mean 'http://www.swed.org.uk/2004/02/swed#'. The rest
are the actual statements. In SWED we give each organisation a unique
code or identifier for our own usage, in this case 'prorg102'. The
remaining lines state its type, primary name, year of formation and URL.
In the example above we have used a particular
way of writing down the RDF called N3. However RDF can be written
down in other forms (syntaxes); the most common are RDF/XML and
N-Triples. Each of the syntaxes represents the same RDF. For
more information about RDF and the various syntaxes see http://www.w3.org/TR/2004/REC-rdf-primer-20040210/
If Web-based data is produced in RDF it makes
it possible to publish information in a way that can be processed
and linked together automatically by computer programs. For example
if a computer program found two pieces of RDF that talked about an
organisation with the name 'Natural History Museum' and URL 'http://www.nhm.ac.uk/'
it could [automatically] link the pieces of data together.
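As a rough illustration of this kind of automatic linking, the Python sketch below treats each RDF statement as a (subject, predicate, object) tuple and groups subjects that share the same name and URL. The second data source, the 'ex:' identifiers and the 'ex:city' property are invented for the example, and real RDF tools do considerably more than this.

```python
from collections import defaultdict

# Statements from SWED, written as (subject, predicate, object) tuples.
source_a = [
    ("swed_id:prorg102", "swed:has_primary_prorg_name", "Natural History Museum"),
    ("swed_id:prorg102", "swed:has_url", "http://www.nhm.ac.uk/"),
    ("swed_id:prorg102", "swed:has_year_formed", "1881"),
]
# A second, hypothetical source describing the same museum under another ID.
source_b = [
    ("ex:museum7", "swed:has_primary_prorg_name", "Natural History Museum"),
    ("ex:museum7", "swed:has_url", "http://www.nhm.ac.uk/"),
    ("ex:museum7", "ex:city", "London"),
]


def link_by(triples, key_predicates):
    """Group subjects that have identical values for every key predicate."""
    values_by_subject = defaultdict(dict)
    for subject, predicate, obj in triples:
        if predicate in key_predicates:
            values_by_subject[subject][predicate] = obj
    groups = defaultdict(set)
    for subject, values in values_by_subject.items():
        if len(values) == len(key_predicates):
            groups[tuple(values[p] for p in key_predicates)].add(subject)
    # Only groups with more than one subject represent a link.
    return {key: subjects for key, subjects in groups.items() if len(subjects) > 1}


def merged_description(triples, subjects):
    """Pool everything said about any of the linked subjects."""
    return {(p, o) for s, p, o in triples if s in subjects}


all_triples = source_a + source_b
links = link_by(all_triples, ("swed:has_primary_prorg_name", "swed:has_url"))
for key, subjects in links.items():
    print(key, sorted(subjects))
    print(sorted(merged_description(all_triples, subjects)))
```

The sketch only works because both sources happen to use the same property names, which is exactly the assumption that cannot be relied on across the Web.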
However a further step is necessary to make
this possible. In the above example we have explicitly used the terms
as defined by SWED. It is highly unlikely that everyone will use the
swed: namespace or organisation names and URLs. Indeed it would be
virtually impossible for ANY organisation to become the single authority
for terms across the whole Web.
The Semantic Web has been designed with this
in mind and provides mechanisms for dealing with these issues. RDFS
(RDF Schema) and OWL (Web Ontology Language, see above)
have been designed to allow the explicit description of relationships
between terms in different namespaces and other ways of joining data
that is created using RDF.
RDF/XML:
See RDF above.
Semantic Web:
"The Semantic Web is an extension of the current web in which
information is given well-defined meaning, better enabling computers
and people to work in cooperation." -- Tim Berners-Lee, James Hendler,
Ora Lassila, "The Semantic Web", Scientific American, May 2001
The goal of the Semantic Web initiative by the World
Wide Web Consortium (W3C) is to make web-based content
more automatically processable by computer programs, so that they can
locate, link and process information more effectively. It essentially
annotates web content with meaningful 'tags' that computers can interpret,
so that they 'know' what can be done with it and what is related to
it, without human intervention.
Information is produced in a standard format called RDF (Resource Description
Framework, see above) much like web pages are written
in HTML. RDF is based on the idea of statements about how things (real
and conceptual) are related and specifically uses unique identifiers
(such as Web URLs, and URNs) for
objects that statements are about, and the properties that are being
talked about - see RDF above.
Layered on top of RDF are other, richer ways of describing web content
and information about it. For example the Web
Ontology Language (OWL) gives ways to create so-called ontologies
- computer readable descriptions of a domain (area) of knowledge.
An example is the relatively simple SWED ontology about organisations,
which defines the properties that organisations, projects and parts of organisations
(called 'prorgs' by SWED) have and how they are related. It is the combination
of information written in RDF and the associated ontology(ies) (also stored
on the web) that means that computer programs can 'know' what can be
done with information and what is related to it, without human intervention.
SKOS:
SKOS (Simple Knowledge Organisation System) is an RDF
based thesaurus format that has been developed as part of the SWAD-Europe
project. It allows thesauri to be created or converted to RDF and
so become usable as part of the wider Semantic Web. As well as providing
a standard Semantic Web based format, SKOS also explicitly deals with
multilingual thesauri and mechanisms to allow terms in one thesaurus
to be related to those in another.
Thesaurus:
See controlled vocabularies above.
Taxonomy:
See controlled vocabularies above.
URI:
A URI (Uniform Resource Identifier) is "a compact string of characters
for identifying an abstract or physical resource" (RFC
2396). There are different 'URI schemes' for different types of
resource, e.g. web pages are identified by the http: URI scheme and books
can be identified by their ISBN (International Standard Book Number)
using the urn: URI scheme. Anyone can make up their own informal URI scheme for their
own use, however there are many officially registered
URI schemes.
Basically URIs can be thought of as a general term for all URLs
and URNs. URIs are a mechanism for uniquely identifying
something.
For example:
- The URL for the SWED home page is: http://www.swed.org.uk/index.html
- The URN for a book using its ISBN is urn:isbn:0-345-33973-8
(this is the URN for J.R.R. Tolkien's "The Return of the King")
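The two examples can be pulled apart with Python's standard urllib.parse module, which makes the scheme part of each URI explicit. This is only a sketch: the describe function and the shape of the dictionaries it returns are choices made for this illustration.

```python
from urllib.parse import urlparse


def describe(uri):
    """Split a URI into its scheme and the scheme-specific parts."""
    parts = urlparse(uri)
    if parts.scheme == "urn":
        # For URNs the part after 'urn:' is '<namespace>:<name>'.
        namespace, _, name = parts.path.partition(":")
        return {"scheme": "urn", "namespace": namespace, "name": name}
    return {"scheme": parts.scheme, "host": parts.netloc, "path": parts.path}


print(describe("http://www.swed.org.uk/index.html"))
print(describe("urn:isbn:0-345-33973-8"))
```

The URL carries a host and path that say where to fetch the resource, whereas the URN carries only a namespace and a name.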
URL:
URL (Uniform Resource Locator) is an informal term that relates to
widely used URI schemes such as ftp (file transfer protocol) and http
(hypertext transfer protocol) - see URI above.
Broadly speaking a URL points to the primary access mechanism of a
resource, e.g. a web page, as opposed to a URN
(see below) that simply provides a standardized means of uniquely
naming things.
URN:
URN (Uniform Resource Name) is a registered URI scheme (see URI above)
within which names are grouped into URN namespaces.
URN namespaces can be officially
registered. Registered namespaces include ISBN
(International Standard Book Number) and ISSN
(International Standard Serial Number).
E.g. the URN for J.R.R. Tolkien's "The Return of the King" using
the ISBN namespace is:
urn:isbn:0-345-33973-8
The use of URNs gives an effective means of uniquely identifying real
or abstract things.
vCard:
vCard is a widely used standard for storing information about individuals
or corporations. It includes all the information usually associated
with a business card. See the Internet Mail Consortium's vCard
page for more details.
XML:
Put simply XML (eXtensible Markup
Language), is a language for writing and sharing structured information.
A 'mark up language' is simply a way of annotating or 'marking up' data
by entering special textual 'tags' around the content so that it can
be identified by a computer program. A simple piece of XML might look
like:
<book>
<title>The Little Prince</title>
<author>Antoine De Saint-Exupery</author>
<published>1943</published>
</book>
The tags appear in angle brackets i.e. '<' and '>'. All tags
are opened (e.g. <book>) and closed (e.g. </book>), the '/' signifying
the close tag. Tags can contain (nest) other tags e.g. <book>
contains three more sets: <title>, <author> and <published>.
XML has ways of defining specific mark up languages, i.e. using specific
sets of tags, rules about how the tags should be nested and the types
of data that tags should contain. To define the languages we use an
XML Schema language (e.g. W3C
XML Schema) or Document Type Definitions (DTDs). For example in
our book case above we might call the markup language MBML
(My Book Markup Language). The language would describe the structure,
i.e. order and nesting of the tags and contents of the tags (e.g. <author>
tag contents are text and <published> tags are numbers between
1900 and 2010).
Using the XML Schema it would now be possible for anyone else to
write MBML data that could be shared between computer programs that
have been designed to process the data.
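To give a feel for what a program processing MBML might do, the Python sketch below reads the book example with the standard xml.etree.ElementTree module and checks it against the rules described above. MBML is the made-up language from this entry, and the checks are written by hand here rather than driven by a real XML Schema or DTD, so this is an approximation of what a schema validator would do.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """
<book>
  <title>The Little Prince</title>
  <author>Antoine De Saint-Exupery</author>
  <published>1943</published>
</book>
"""

# The order and nesting of tags that our hypothetical MBML requires.
EXPECTED_ORDER = ["title", "author", "published"]


def check_book(xml_text):
    """Return a list of problems; an empty list means the book passed."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "book":
        problems.append(f"root element is <{root.tag}>, expected <book>")
    tags = [child.tag for child in root]
    if tags != EXPECTED_ORDER:
        problems.append(f"children are {tags}, expected {EXPECTED_ORDER}")
    year = (root.findtext("published") or "").strip()
    if not (year.isdigit() and 1900 <= int(year) <= 2010):
        problems.append(
            f"<published> must be a number from 1900 to 2010, got {year!r}")
    return problems


print(check_book(BOOK_XML))
```

In practice the rules would be written once in a schema and any schema-aware XML tool could apply them, instead of each program repeating checks like these.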
XML was created by the World Wide Web Consortium
(W3C) to enable the sharing of structured data over the web.