SWED Glossary

Community Portal - Controlled Vocabularies - Metadata - N3 - Namespace - Ontology - OWL - Portal - RDF - RDF/XML - Semantic Web - SKOS - Thesaurus - Taxonomy - URI - URL - URN - vCard - XML


Community Portal:

Web-based information portals provide a point of access onto an integrated and structured body of information about some domain. They range from very broad domains (e.g. all web pages - [http://www.yahoo.com/] and [http://www.dmoz.com/]), to topic specific domains (e.g. mathematics [http://www.math-net.de/], fish species [http://www.fishbase.org/]).

Community portals are information portals which are also designed to support and facilitate a community of interest. They typically allow members of the community to contribute news and information to the pool, either by submitting information to the portal (via some editing or reviewing process) or by posting the information on some associated web bulletin board or other collaboration tool.

Controlled Vocabularies:

The majority of cataloguing or indexing systems (e.g. libraries, product catalogues, etc.) use controlled vocabularies. That is, they index all items (e.g. library books) using a constrained set of terms (e.g. in the Dewey Decimal system for the classification of books, a book about zoology will be classified under the term "Zoological Sciences"). Using a controlled vocabulary makes it much easier to be consistent in how things are classified, and so makes it easier to find relevant items.

Most often the controlled vocabularies are not simply lists but have a structure (e.g. in Dewey the category 'Zoological Sciences' is sub-divided into Zoology, Invertebrates, etc.). Classification schemes use various degrees of structure, moving from a simple list, to a thesaurus (using broader and narrower terms to give the structure), to a formal taxonomy which defines the relationships more specifically (e.g. a cat is a species of animal from a family of animals called felines), to yet richer and more general ways of describing relationships between terms (e.g. a cat is a type of feline, felines have fur, etc.). This last example is generally called a formal ontology (see ontology below). In general, the richer the type of classification, the richer the type of queries that can be asked of the system.

For example, a simple list-based controlled vocabulary (assuming it contains the term 'cat') can answer the question 'show me all the pictures of cats in your collection', and that is about as complex as a query could be. With an ontology, however, it would (in principle) be possible to ask 'show me all the pictures in your collection of things that have fur'.
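
The difference between the two kinds of query can be sketched in a few lines of Python. This is only an illustration: the picture names, the small 'is a type of' table and the way properties are stored are all made up for the example and are not taken from SWED's own vocabularies.

```python
# Hypothetical picture collection: each picture is indexed with one term.
pictures = {
    "tabby_on_sofa.jpg": "cat",
    "lion_at_dusk.jpg": "lion",
    "goldfish_bowl.jpg": "goldfish",
}

# Ontology-style knowledge (illustrative only): "X is a type of Y".
is_a = {
    "cat": "feline",
    "lion": "feline",
    "feline": "animal",
    "goldfish": "fish",
    "fish": "animal",
}

# Properties attached to terms; narrower terms inherit them.
has_property = {"feline": {"fur"}}


def properties_of(term):
    """Collect the properties of a term and of all its broader terms."""
    found = set()
    while term is not None:
        found |= has_property.get(term, set())
        term = is_a.get(term)
    return found


def pictures_indexed_as(term):
    """Simple list-style lookup: exact match on the indexing term."""
    return sorted(name for name, t in pictures.items() if t == term)


def pictures_of_things_with(prop):
    """Richer query: follow the 'is a type of' links to find the property."""
    return sorted(name for name, t in pictures.items() if prop in properties_of(t))


print(pictures_indexed_as("cat"))
print(pictures_of_things_with("fur"))
```

The first query needs only the list of terms. The second works only because the relationships between terms (cat is a type of feline, felines have fur) have been written down as well.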

SWED uses a number of thesauri, ontologies and lists to categorise the organisations and projects listed in the directory. This means that it is possible to provide much richer ways to browse and search the directory than is usual.

Metadata:

Metadata simply means data about data. For example metadata about a book might be its author, title, number of pages - the kind of information held in library catalogues. In terms of SWED metadata about organisations and projects includes name, contact details, their area of operation, topics that are relevant to them, the types of things that they do, etc...

So metadata is vitally important to SWED (and to any other index, directory, etc.), since it is by using the metadata that we can locate organisations that are relevant to a particular query. The richer the metadata (within limits), the more effective the search can be.

N3:

See RDF below.

Namespace:

A namespace is a way to distinguish two or more elements (types of data, e.g. an organisation's name, date of formation, etc.) or terms (e.g. within a controlled vocabulary) that happen to share the same name (e.g. 'date' as used by one organisation - 1995 - and 'date' as used by another - 19/07/1995). Namespaces also provide a means of grouping elements or terms that are related in some way.

This means that data elements and vocabulary terms can be unambiguously identified. In general, namespaces are maintained by the organisations that own the associated web domain name.

In concrete terms a namespace is a URI (Uniform Resource Identifier) that can be used as a prefix to elements or terms, and that therefore uniquely identifies them as associated with that namespace. In SWED's case this might be 'http://www.swed.org.uk/2004/5/swed_oa' for SWED's controlled vocabulary (ontology) for operational areas.

So a term from the vocabulary could then have the URI http://www.swed.org.uk/2004/5/swed_oa#bristol. This longhand is obviously very unwieldy, so there is a mechanism for using a shorthand version by replacing the namespace URI (http://www.swed.org.uk/2004/5/swed_oa, together with the '#' separator) with a shorter prefix (e.g. swed_oa:). In RDF/XML documents the prefix is declared, normally at the top of the document, as an xmlns attribute on the root element, e.g.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:swed_oa="http://www.swed.org.uk/2004/5/swed_oa#">

Once the prefix has been declared, URIs can be written in the shorthand (e.g. swed_oa:bristol), making documents shorter and more readable.

It may seem odd but there is no requirement that there is actually anything at the URI. It provides a unique ID regardless of whether there is any resolvable web page or other document at the end of the URI. In general though it is regarded as good practice to have a human and/or computer readable definition of the elements or terms resolvable at the URI.
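
The shorthand mechanism amounts to simple string substitution, which can be sketched in Python as follows. The function names expand and shorten are made up for this illustration, and the sketch assumes the declared namespace ends in '#' so that prefix plus term gives the full URI quoted above. Real RDF tools do this for you.

```python
# Prefix table. The namespace ends in '#' so that prefix + local name
# gives the full URI quoted in the text.
PREFIXES = {
    "swed_oa": "http://www.swed.org.uk/2004/5/swed_oa#",
}


def expand(short_name, prefixes=PREFIXES):
    """Turn a shorthand name such as 'swed_oa:bristol' into a full URI."""
    prefix, sep, local = short_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {short_name!r}")
    return prefixes[prefix] + local


def shorten(uri, prefixes=PREFIXES):
    """Turn a full URI back into shorthand if a declared namespace matches."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no matching namespace: leave the URI as it is


print(expand("swed_oa:bristol"))
```

Note that nothing here fetches the URI. The code only builds and compares strings, which matches the point that a namespace URI works as an identifier whether or not anything can be retrieved from it.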

Ontology:

An ontology in the sense it is used within the Semantic Web and computer science can be thought of as...

OWL:

OWL stands for Web Ontology Language. It is the computer readable language in which web-based ontologies that form part of the Semantic Web are written. The language was designed to meet the needs of the Semantic Web: it has to balance the flexibility and richness of expression (of relationships and constraints) required for the very wide range of Web-based applications with the need to make automatic logical reasoning possible.

Because of these sometimes conflicting requirements there are three 'species' of OWL: OWL Lite, OWL DL and OWL Full.

Portal:

See community portal

RDF (Resource Description Framework):

RDF (Resource Description Framework) is to the Semantic Web what HTML (HyperText Markup Language), the language for writing Web pages, is to the Web. The Semantic Web uses RDF as the basic language for representing metadata about any kind of resource on the Web.

Essentially RDF is, as its name suggests, a framework for describing resources, and is actually very simple in concept. It is a way of making statements about things and their relationships.

Figure 1

Figure 1 shows the form of an RDF statement i.e. subject (Natural History Museum) - predicate (has Date of Formation) - object (1881). It is composed of 'nodes' (ovals or rectangles) and 'arcs' (lines) that make up what is mathematically called a graph. RDF is essentially collections of these simple graphs. Figure 2 shows an example from the SWED project.

Figure 2 - Simplified example of SWED RDF file in graphical form
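
For readers who prefer code to diagrams, the statement in figure 1 can be sketched in Python as a three-part tuple, and a graph as a collection of such tuples. This is only an illustration using the plain-English labels from the figure. The 'has URL' label and the objects helper are made up for the example, and real RDF uses unique identifiers rather than plain labels, as explained in the rest of this entry.

```python
# An RDF statement is a (subject, predicate, object) triple.
statement = ("Natural History Museum", "has Date of Formation", "1881")

# A graph is simply a collection of such triples. The second triple uses
# an illustrative 'has URL' label that does not appear in figure 1.
graph = {
    statement,
    ("Natural History Museum", "has URL", "http://www.nhm.ac.uk/"),
}


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


print(objects(graph, "Natural History Museum", "has Date of Formation"))
```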

The information in figure 1 alone is not necessarily clear, e.g. which 'natural history museum' are we talking about? What do we mean by 'date of formation'? These questions might not arise if the data were only to be used within a single organisation where it was 'obvious' which museum was meant and what the 'date of formation' means. However, when we are talking about information on the Web the context is very different. These things are not at all clear - go to Google and you will find many different 'natural history museums'.

RDF overcomes this by requiring that all resources (things you are talking about) and relationships have unique identifiers that are written explicitly in the RDF. To illustrate the point in plain English, instead of simply writing

the Natural History Museum - has Date of Formation - 1881

we might write:

the organisation with name 'Natural History Museum' which has the home page URL of "http://www.nhm.ac.uk/" has the 'year of formation', of '1881' with all the terms used as defined by the SWED project which has the homepage URL of http://www.swed.org.uk/2004/02/swed/.

This way of making the statement, albeit very long-winded, ensures that the meaning of the terms (note the change from date to year, as 1881 is not a date) and their usage is much less ambiguous. This is effectively what RDF does: it gives those creating Semantic Web documents a way of writing down statements about things and their relationships as unambiguously as possible in a Web-based environment. In fact in SWED this statement would actually look like this:

@prefix swed: <http://www.swed.org.uk/2004/02/swed#> .

swed_id:prorg102 a swed:prorg;
        swed:has_primary_prorg_name "Natural History Museum";
        swed:has_year_formed "1881";
        swed:has_url "http://www.nhm.ac.uk/";
.

The @prefix line simply says that 'swed:' will be used as shorthand for 'http://www.swed.org.uk/2004/02/swed#'. (The 'swed_id:' prefix would need a similar declaration, which is not shown here.) The remaining lines are the actual statements. In SWED we give each organisation a unique code or identifier for our own use, in this case 'prorg102'. The statements say that prorg102 is a prorg and give its name, year of formation and URL.

In the example above we have used a particular way of writing down RDF called N3. However, RDF can be written down in other forms (syntaxes), the most common being RDF/XML and N-Triples. Each of the syntaxes represents the same RDF. For more information about RDF and the various syntaxes see http://www.w3.org/TR/2004/REC-rdf-primer-20040210/

If Web-based data is produced in RDF, information can be published in a way that can be processed and linked together automatically by computer programs. For example, if a computer program found two pieces of RDF that talked about an organisation with the name 'Natural History Museum' and URL 'http://www.nhm.ac.uk/', it could automatically link the pieces of data together.
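
A rough sketch of this kind of automatic linking, in Python, is given below. It is not how SWED or any particular RDF tool works. The subject identifiers (a:org1, b:museum7), the short property names and the 'topic' value are made up, and the sketch assumes both sources already use the same property names.

```python
# Two independently published sets of triples. The museum's name, URL and
# year of formation come from the text; everything else is illustrative.
source_a = [
    ("a:org1", "name", "Natural History Museum"),
    ("a:org1", "url", "http://www.nhm.ac.uk/"),
    ("a:org1", "year_formed", "1881"),
]
source_b = [
    ("b:museum7", "name", "Natural History Museum"),
    ("b:museum7", "url", "http://www.nhm.ac.uk/"),
    ("b:museum7", "topic", "natural history"),
]


def key_of(triples, subject):
    """Identify a subject by its (name, url) pair, or None if either is missing."""
    facts = {p: o for s, p, o in triples if s == subject}
    if "name" in facts and "url" in facts:
        return (facts["name"], facts["url"])
    return None


def link(*sources):
    """Group all property values under the shared (name, url) key."""
    merged = {}
    for triples in sources:
        for subject in {s for s, _, _ in triples}:
            key = key_of(triples, subject)
            if key is None:
                continue
            for s, p, o in triples:
                if s == subject:
                    merged.setdefault(key, {}).setdefault(p, set()).add(o)
    return merged


linked = link(source_a, source_b)
record = linked[("Natural History Museum", "http://www.nhm.ac.uk/")]
print(sorted(record))
```

After linking, the year of formation from one source and the topic from the other sit in a single record for the museum.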

However a further step is necessary to make this possible. In the above example we have explicitly used the terms as defined by SWED. It is highly unlikely that everyone will use the swed: namespace or organisation names and URLs. Indeed it would be virtually impossible for ANY organisation to become the single authority for terms across the whole Web.

The Semantic Web has been designed with this in mind and provides mechanisms for dealing with these issues. RDFS (RDF Schema) and OWL (Web Ontology Language, see above) have been designed to allow the explicit description of relationships between terms in different namespaces, and to provide other ways of joining data that is created using RDF.
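
The idea of relating terms in different namespaces can be sketched in Python as a simple lookup table. This is not RDFS or OWL itself. The dictionary merely stands in for the kind of equivalence statements those languages let you publish, and the second namespace (http://example.org/other#) and its term names are entirely hypothetical.

```python
SWED = "http://www.swed.org.uk/2004/02/swed#"
OTHER = "http://example.org/other#"  # hypothetical namespace, for illustration only

# Declared equivalences between terms in different namespaces.
equivalent_to = {
    OTHER + "yearFounded": SWED + "has_year_formed",
    OTHER + "homepage": SWED + "has_url",
}


def normalise(triples):
    """Rewrite predicates to their declared equivalents so data can be joined."""
    return [(s, equivalent_to.get(p, p), o) for s, p, o in triples]


other_data = [("http://www.nhm.ac.uk/", OTHER + "yearFounded", "1881")]
print(normalise(other_data))
```

Once the predicates have been rewritten to a common set of terms, data from the two namespaces can be compared and joined directly.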

RDF/XML:

See RDF above.

Semantic Web:

"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." -- Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001

The goal of the Semantic Web initiative by the World Wide Web Consortium (W3C) is to make web-based content more automatically processable by computer programs, so that they can locate, link and process information more effectively. It essentially annotates web content with meaningful 'tags' that computers can interpret and therefore 'know' what can be done with it and what is related to it, without human intervention.

Information is produced in a standard format called RDF (Resource Description Framework, see above) much like web pages are written in HTML. RDF is based on the idea of statements about how things (real and conceptual) are related and specifically uses unique identifiers (such as Web URLs, and URNs) for objects that statements are about, and the properties that are being talked about - see RDF above.

Layered on top of RDF are other, richer ways of describing web content and information about it. For example the Web Ontology Language (OWL) gives ways to create so-called ontologies - computer readable descriptions of a domain (area) of knowledge. An example is the relatively simple SWED ontology about organisations, which defines the properties that organisations, projects and parts of organisations (called 'prorgs' by SWED) have and how they are related. It is the combination of information written in RDF and the associated ontology or ontologies (also stored on the web) that means that computer programs can 'know' what can be done with information and what is related to it, without human intervention.

SKOS:

SKOS (Simple Knowledge Organisation System) is an RDF-based thesaurus format that has been developed as part of the SWAD-Europe project. It allows thesauri to be created in or converted to RDF and so become usable as part of the wider Semantic Web. As well as providing a standard Semantic Web based format, SKOS also explicitly deals with multilingual thesauri and with mechanisms to allow terms in one thesaurus to be related to those in another.

Thesaurus:

see controlled vocabularies

Taxonomy:

see controlled vocabularies

URI:

A URI (Uniform Resource Identifier) is "a compact string of characters for identifying an abstract or physical resource" (RFC 2396). There are different 'URI schemes' for different types of resource, e.g. web pages are identified by the http: URI scheme and books can be identified by their ISBN (International Standard Book Number) within the urn: scheme. Anyone can make up their own informal URI scheme for their own use; however, there are many officially registered URI schemes.

Basically URIs can be thought of as a general term for all URLs and URNs. URIs are a mechanism for uniquely identifying something.

For example:

  • The URL for the SWED home page is: http://www.swed.org.uk/index.html
  • The URN for a book in the ISBN namespace is: urn:isbn:0-345-33973-8
    (this is the URI for JRR Tolkien's "Return of the King")
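
As a small illustration, Python's standard urllib.parse module can split both of the example URIs above into their scheme and the remainder. The describe function and its dictionary keys are made up for this sketch.

```python
from urllib.parse import urlparse


def describe(uri):
    """Split a URI into its scheme, host (if any) and the remainder."""
    parts = urlparse(uri)
    return {"scheme": parts.scheme, "host": parts.netloc, "rest": parts.path}


for uri in ("http://www.swed.org.uk/index.html", "urn:isbn:0-345-33973-8"):
    print(describe(uri))
```

The URL has a host that can be contacted to retrieve the page. The URN has no host at all, only a name within the isbn namespace.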

URL:

URL (Uniform Resource Locator) is an informal term that relates to widely used URI schemes such as ftp (file transfer protocol) and http (hypertext transfer protocol) - see URI above. Broadly speaking, a URL points to the primary access mechanism of a resource, e.g. a web page, as opposed to a URN (see below), which simply provides a standardized means of uniquely naming things.

URN:

URN (Uniform Resource Name) is a registered URI scheme (see URI above) that has associated URN namespaces (see Namespace above). URN namespaces can be officially registered. Registered namespaces include ISBN (International Standard Book Number) and ISSN (International Standard Serial Number).

For example, the URN for JRR Tolkien's "Return of the King" in the ISBN namespace is:

urn:isbn:0-345-33973-8

The use of URNs gives an effective means of uniquely identifying real or abstract things.

vCard:

vCard is a widely used standard for storing information about individuals or corporations. It includes all the information usually associated with a business card. See the Internet Mail Consortium's vCard page for more details.

XML:

Put simply XML (eXtensible Markup Language), is a language for writing and sharing structured information. A 'mark up language' is simply a way of annotating or 'marking up' data by entering special textual 'tags' around the content so that it can be identified by a computer program. A simple piece of XML might look like:

<book>
    <title>The Little Prince</title>
    <author>Antoine De Saint-Exupery</author>
    <published>1943</published>
</book>

The tags appear in angle brackets, i.e. '<' and '>'. All tags are opened (e.g. <book>) and closed (e.g. </book>), the '/' signifying the close tag. Tags can contain (nest) other tags, e.g. <book> contains three more sets: <title>, <author> and <published>.

XML has ways of defining specific mark up languages, i.e. specific sets of tags, rules about how the tags should be nested and the types of data that tags should contain. To define the languages we use an XML Schema language (e.g. W3C XML Schema) or Document Type Definitions (DTDs). For example, for our book example above we might call the markup language MBML (My Book Markup Language). The language would describe the structure, i.e. the order and nesting of the tags and the contents of the tags (e.g. <author> tag contents are text and <published> tag contents are numbers between 1900 and 2010).

Using the XML Schema it would now be possible for anyone else to write MBML data that could be shared between computer programs that have been designed to process the data.
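
To give a feel for what a program processing MBML data might do, here is a minimal Python sketch that parses the book example and checks it against the rules described above. It is not an XML Schema or a DTD. The rules are written by hand in code, the check_book function is made up for the illustration, and the tag order (title, author, published) is an assumption taken from the example.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
    <title>The Little Prince</title>
    <author>Antoine De Saint-Exupery</author>
    <published>1943</published>
</book>"""


def check_book(xml_text):
    """Return a list of problems found; an empty list means the rules are met."""
    problems = []
    book = ET.fromstring(xml_text)
    if book.tag != "book":
        return ["root element must be <book>"]
    # Rule 1: the tags must appear in this order (assumed from the example).
    if [child.tag for child in book] != ["title", "author", "published"]:
        problems.append("expected <title>, <author>, <published> in that order")
    # Rule 2: <published> must be a number between 1900 and 2010.
    published = (book.findtext("published") or "").strip()
    if not (published.isdecimal() and 1900 <= int(published) <= 2010):
        problems.append("<published> must be a number between 1900 and 2010")
    return problems


print(check_book(BOOK_XML))
```

With a real schema the rules would live in a separate, shareable document rather than in program code, which is what lets different programs agree on the same format.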

XML was created by the World Wide Web Consortium (W3C) to enable the sharing of structured data over the web.