SWAD-E Portal administration

Contents

Introduction

Installation

Administration

Logging

Harvester

Introduction

The SWAD-E portal is a prototype implementation of a semantic web portal which supports one demonstration portal service: the Semantic Web Environmental Directory (SWED). The design is intended to be flexible and easy to customize. This document covers installation and administration issues.

Installation

The portal is implemented as a Java web application and requires a servlet container that supports the Java Servlet API 2.2. The prototype runs under tomcat 4.x and 5.x and would probably run in other containers such as Jetty. This document assumes you have a working tomcat 5.0.25 installation. If you have an older installation, you may need to replace the copy of Xerces in tomcat's common/endorsed directory with a more recent version that is compatible with Jena. Jena currently requires at least Xerces 2.6.0, though 2.6.1 is preferable. A suitable copy can be found in the localLib directory of the full portal distribution.
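If you are unsure whether an installed Xerces meets this minimum, the comparison is easy to automate. The following is a minimal sketch, not part of the portal distribution: the class name XercesVersionCheck is hypothetical, and it assumes a version string of the "Xerces-J 2.6.1" form that Xerces 2.x reports (for example through org.apache.xerces.impl.Version.getVersion()).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the portal: decides whether a Xerces
// version string such as "Xerces-J 2.6.1" meets a minimum version.
class XercesVersionCheck {

    // Compares the first "major.minor.patch" number found in the string
    // with the required minimum; an unrecognised string counts as too old.
    static boolean atLeast(String versionString, int major, int minor, int patch) {
        Matcher m = Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)").matcher(versionString);
        if (!m.find()) {
            return false;
        }
        int[] found = {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
        int[] required = {major, minor, patch};
        for (int i = 0; i < found.length; i++) {
            if (found[i] != required[i]) {
                return found[i] > required[i];
            }
        }
        return true; // exactly the required version
    }

    public static void main(String[] args) {
        // Assumption: pass the string reported by your Xerces jar as the argument;
        // a literal is used when no argument is given.
        String version = args.length > 0 ? args[0] : "Xerces-J 2.6.1";
        boolean ok = atLeast(version, 2, 6, 0);
        System.out.println(version + (ok ? " meets" : " does not meet") + " Jena's 2.6.0 minimum");
    }
}
```

Comparing the numbers component by component avoids the trap of comparing version strings alphabetically, where "2.10.0" would sort before "2.6.0".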

Before installing the portal application you should create a security role which controls access to the portal administration pages. The simplest way to do this is to add the following lines, inside the <tomcat-users> element, to tomcat's conf/tomcat-users.xml file (substituting your own username and password):

    <role rolename="portalAdmin"/>
    <user username="user" password="password" roles="portalAdmin"/>  
    
Tomcat must be restarted before changes to this file take effect.

The portal application caches substantial indexes and parts of the RDF in memory for performance reasons (the size of these caches can be configured in the code). It may therefore be important to run the servlet container with a large heap size. For example, the development system, which also runs other webapps, runs tomcat with the JVM option -Xmx200M.
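To confirm which heap limit a JVM actually received, you can ask the runtime. This is a minimal sketch with a hypothetical class name; for tomcat the option itself is normally passed through the CATALINA_OPTS (or JAVA_OPTS) environment variable read by its startup scripts, and the same Runtime call could be placed in a diagnostic page inside the webapp.

```java
// Minimal sketch: prints the maximum heap available to the JVM it runs in,
// e.g. to confirm that an -Xmx option such as -Xmx200M was picked up.
class HeapCheck {

    // Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() returns the heap ceiling (Long.MAX_VALUE if there is none);
        // some JVMs report slightly less than the -Xmx value.
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("Maximum heap: " + toMegabytes(max) + " MB");
    }
}
```

Run as `java -Xmx200M HeapCheck`, it should print a figure at or somewhat below 200 MB.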

Once you have a working tomcat (or similar container), you need to create or obtain the war file defining the portal application. For the demonstration portal a prebuilt war file is available for download. Alternatively, the sources include an ant build file which can build a new war from the sources (target war-dist). You then load the war into tomcat in the usual way: either drop it into the webapps directory or upload it via the tomcat manager interface. If you are developing a new configuration using an IDE, you may want to place your development area directly in the tomcat context path rather than deploying a war file, but that is outside the scope of this documentation.

The portal can either run entirely in memory, loading its data from files, or run from a database. The demonstration configuration uses a mysql database, but any database supported by Jena can be used. However, to simplify initial installation the prebuilt war file is configured to use only files and memory, so with the distributed war file no further installation is required and the portal should now be working.

If you want to change the portal to run from a database, you first need a working database installation and then need to create an empty database to hold the data. In MySQL this can be done with the SQL commands:

    create database swed;
    grant all on swed.* TO user@localhost IDENTIFIED BY 'password';
    grant all on swed.* TO user@'%' IDENTIFIED BY 'password';
    
The database name, user and password you choose must then be entered in the portal configuration file (portal/WEB-INF/config/sources.n3).

It is useful if your database configuration keeps idle connections open. If the database closes a connection while the portal thinks it is still in use, the portal will attempt to reconnect and continue operating. However, this recovery code is still under development, so it is preferable for the database not to time out. For MySQL the easiest way to ensure this is to add the following line to the [mysqld] section of the my.ini configuration file (my.cnf on Unix-like systems):

    set-variable = wait_timeout=7884000
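MySQL measures wait_timeout in seconds, so this value keeps an idle connection open for 7,884,000 / 86,400 = 91.25 days, roughly three months. The small sketch below (class name hypothetical) performs the same conversion, which may be handy if you choose a different value.

```java
import java.time.Duration;

// Converts a MySQL wait_timeout value (seconds) into whole days and hours.
class WaitTimeout {

    static String describe(long seconds) {
        Duration d = Duration.ofSeconds(seconds);
        long days = d.toDays();                   // whole days
        long hours = d.minusDays(days).toHours(); // whole hours left over
        return days + " days " + hours + " hours";
    }

    public static void main(String[] args) {
        System.out.println(describe(7884000L)); // prints "91 days 6 hours"
    }
}
```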
    
You will also need to install a suitable JDBC driver for your database into the web application's WEB-INF/lib directory. We do not include these drivers in the distribution due to licensing restrictions. See the Jena db documentation for more information on appropriate database versions and drivers. The demonstration system was implemented using MySQL 4.0.18 and the Connector/J database driver.
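A quick way to confirm that a driver jar provides the expected class is to try loading it. This is a minimal sketch with a hypothetical class name. It assumes the driver class com.mysql.jdbc.Driver used by older Connector/J releases (newer releases use com.mysql.cj.jdbc.Driver). Run from the command line it only sees the JVM classpath, not the deployed webapp's WEB-INF/lib, so treat it as a check of the jar itself.

```java
// Hypothetical check: can the named JDBC driver class be loaded?
class DriverCheck {

    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className); // loads and initialises the class if present
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed default: the Connector/J driver class name of that era.
        String driver = args.length > 0 ? args[0] : "com.mysql.jdbc.Driver";
        if (isOnClasspath(driver)) {
            System.out.println(driver + " found");
        } else {
            System.out.println(driver + " not found; copy the driver jar into WEB-INF/lib");
        }
    }
}
```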

Before the portal can use the database, the database must be populated. In the portal configuration file you can specify a number of data file sources, and the ant target dbinit loads these files into the database. You may need to edit the ant build.xml script to give it the correct database name, user and password. Once dbinit has run, the database version of the portal is ready for use.

Administration

Aside from the harvester controls (see below), the portal requires fairly little administration. If the portal crashes for some reason (a bug, a database disconnect, running out of memory), tomcat should automatically restart the application. If that does not seem to be working, use the tomcat manager web page to "reload" the portal application. Note that the first time the portal is used after a reload or restart, the data files need to be loaded, which takes on the order of 10 seconds on a modern PC.

If you change the portal configuration (sources.n3) or the associated data, you can reload the data without restarting the application. Go to the portal administration page (linked from the home page; you will need the username and password you defined for the portalAdmin role). The reload data link causes the portal to reload all the DataSource definitions from sources.n3 and restart.

One maintenance action that may be required is rebuilding the Lucene free text index. When the description of a portal object changes (because the harvester uploaded a changed RDF source file), the new values are added to the text index but the older index keys are not removed, so searches can still match text that is no longer in the description. To fix this, select Rebuild text index on the portal administration page. For the SWED data this takes around 1 minute on a modern PC.
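The effect can be illustrated with a toy inverted index. This is not the portal's code and does not use the Lucene API: ToyTextIndex is a hypothetical class in which incremental updates only add word-to-object entries, as the paragraph above describes, while a rebuild discards everything and re-indexes the current descriptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy inverted index (word -> object URIs) showing why add-only updates
// leave stale keys behind and why a full rebuild removes them.
class ToyTextIndex {
    private final Map<String, Set<String>> index = new HashMap<>();

    // Incremental update: index the words of the new description only.
    void add(String uri, String description) {
        for (String word : description.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                index.computeIfAbsent(word, k -> new HashSet<>()).add(uri);
            }
        }
    }

    Set<String> search(String word) {
        return index.getOrDefault(word.toLowerCase(), Collections.emptySet());
    }

    // Full rebuild: discard everything and re-index the current descriptions.
    void rebuild(Map<String, String> currentDescriptions) {
        index.clear();
        currentDescriptions.forEach(this::add);
    }

    public static void main(String[] args) {
        ToyTextIndex idx = new ToyTextIndex();
        idx.add("ex:org1", "Wetland survey group");
        idx.add("ex:org1", "Woodland survey group"); // description changed
        System.out.println(idx.search("wetland"));    // [ex:org1]: a stale match
        idx.rebuild(Collections.singletonMap("ex:org1", "Woodland survey group"));
        System.out.println(idx.search("wetland"));    // [] after the rebuild
    }
}
```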

Logging

Server access logs should be set up in your tomcat (or other container) configuration in the usual way.

In addition, the portal application itself logs actions such as harvester scans and the loading of data files. This logging uses the Jakarta log4j package. The configuration file that defines how log4j operates is in the webapp at WEB-INF/config/log4j.properties. The default configuration logs events to standard output and to a portal2.log file in WEB-INF/logs. If you modify the log4j configuration, note that any file appender whose name starts with WEBINF is treated specially: you only need to specify the file name, and the path to the WEB-INF/logs directory of the containing web application will be added by the portal startup code.
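log4j 1.x configures a file appender through properties of the form log4j.appender.NAME.File. The WEBINF rule can therefore be sketched as a rewrite of those properties before they are handed to log4j (for example with PropertyConfigurator.configure(Properties)); in a servlet the directory would typically come from ServletContext.getRealPath("/WEB-INF/logs"). This is a hypothetical illustration, not the portal's startup code, and the class name WebInfLogPaths and appender name WEBINFlog are invented for the example.

```java
import java.util.Properties;

// Hypothetical sketch of the WEBINF rule: every log4j appender whose name starts
// with "WEBINF" has its File option prefixed with the webapp's WEB-INF/logs path.
class WebInfLogPaths {

    static final String APPENDER_PREFIX = "log4j.appender.";

    static void resolve(Properties props, String logsDir) {
        // stringPropertyNames() returns a copy, so updating values here is safe.
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(APPENDER_PREFIX)) {
                continue;
            }
            String rest = key.substring(APPENDER_PREFIX.length()); // e.g. "WEBINFlog.File"
            int dot = rest.indexOf('.');
            if (dot < 0) {
                continue; // the line naming the appender class, not an option
            }
            String appender = rest.substring(0, dot);
            String option = rest.substring(dot + 1);
            if (appender.startsWith("WEBINF") && option.equals("File")) {
                props.setProperty(key, logsDir + "/" + props.getProperty(key));
            }
        }
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("log4j.appender.WEBINFlog", "org.apache.log4j.FileAppender");
        p.setProperty("log4j.appender.WEBINFlog.File", "portal2.log");
        resolve(p, "/opt/tomcat/webapps/portal/WEB-INF/logs"); // example path only
        System.out.println(p.getProperty("log4j.appender.WEBINFlog.File"));
    }
}
```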

Harvester

The harvester is the part of the portal application that periodically scans a list of known RDF sources and uploads any changed data. To control the harvester, go to the administration page (as above) and select Harvester controls.
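The scan-and-upload cycle, together with the start/stop control on the harvester page, might be sketched as follows. This is a hypothetical illustration rather than the portal's harvester: ToyHarvester is an invented name, it assumes changes are detected by comparing a SHA-256 hash of each source's content between scans (the real harvester may use another test), and the fetch function is injected so the sketch runs without network access.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical harvester sketch: periodically fetches each known source and
// reports those whose content changed since the previous scan.
class ToyHarvester {
    private final List<String> sources;
    private final Function<String, String> fetch;   // URL -> RDF text (assumed)
    private final Map<String, String> lastHash = new HashMap<>();
    private ScheduledExecutorService timer;          // null while stopped

    ToyHarvester(List<String> sources, Function<String, String> fetch) {
        this.sources = sources;
        this.fetch = fetch;
    }

    // One scan: returns the URLs whose content differs from the previous scan.
    synchronized List<String> scanOnce() {
        List<String> changed = new ArrayList<>();
        for (String url : sources) {
            String hash = sha256(fetch.apply(url));
            if (!hash.equals(lastHash.put(url, hash))) {
                changed.add(url); // a real harvester would upload the new RDF here
            }
        }
        return changed;
    }

    synchronized boolean isRunning() {
        return timer != null;
    }

    // Starts periodic scans (the first runs immediately); no-op if running.
    synchronized void start(long periodMinutes) {
        if (timer == null) {
            timer = Executors.newSingleThreadScheduledExecutor();
            timer.scheduleAtFixedRate(this::scanOnce, 0, periodMinutes, TimeUnit.MINUTES);
        }
    }

    synchronized void stop() {
        if (timer != null) {
            timer.shutdownNow();
            timer = null;
        }
    }

    private static String sha256(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fakeWeb = new HashMap<>();
        fakeWeb.put("http://example.org/a.rdf", "<rdf version 1>");
        ToyHarvester h = new ToyHarvester(
                Collections.singletonList("http://example.org/a.rdf"), fakeWeb::get);
        System.out.println(h.scanOnce()); // first scan: every source counts as changed
        System.out.println(h.scanOnce()); // nothing changed: []
    }
}
```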

Starting/stopping the harvester scans
When the portal application first starts up, the harvester is not running. The current state of the harvester and buttons to start or stop it are shown at the top of the harvester control page.
Status of a datasource
The second region of the harvester control page allows the administrator to view the status of any individual data source or to manually add a new source.
Put the URL of the source in the first text box and use the view button to query the harvester database for information on that source. This shows whether the site is known to the harvester and, if so, updates the "trusted" and "blocked" check boxes. You can trust or block a site by changing these boxes and selecting update. You can also poll a site interactively (rather than wait for the next harvester scan) by selecting poll now.
Newly registered sites
New sources to be included in the harvester scans are normally registered using a web registration form, which in turn uses a NewSite servlet to register the site with the portal's harvester. A site registered this way is marked as "new" and appears on the bottom panel of the harvester control page.
To mark a site as seen, so that it is no longer treated as "new", tick its update check box and click update new entries. At the same time you can choose to block or trust the site, though changes to these properties are only made if update is also ticked.
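The per-source flags and the update new entries rule can be modelled in a few lines. This is a hypothetical sketch, not the portal's classes: SourceStatus, NewEntryRow and NewEntries are invented names, and it assumes each row of the bottom panel carries update, trust and block check boxes as described above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a harvester source entry (not the portal's classes).
class SourceStatus {
    boolean trusted;
    boolean blocked;
    boolean isNew = true; // sites registered via the web form start out as "new"
}

// One row of the "new entries" panel: the update, trust and block check boxes.
class NewEntryRow {
    final String url;
    final boolean update;
    final boolean trust;
    final boolean block;

    NewEntryRow(String url, boolean update, boolean trust, boolean block) {
        this.url = url;
        this.update = update;
        this.trust = trust;
        this.block = block;
    }
}

class NewEntries {

    // Applies "update new entries": only ticked rows change, and a ticked row
    // clears the "new" flag and takes its trusted/blocked values from the form.
    static void apply(Map<String, SourceStatus> registry, List<NewEntryRow> rows) {
        for (NewEntryRow row : rows) {
            SourceStatus s = registry.get(row.url);
            if (s == null || !row.update) {
                continue; // unticked (or unknown) rows are left untouched
            }
            s.isNew = false;
            s.trusted = row.trust;
            s.blocked = row.block;
        }
    }

    public static void main(String[] args) {
        Map<String, SourceStatus> registry = new HashMap<>();
        registry.put("http://example.org/a.rdf", new SourceStatus());
        registry.put("http://example.org/b.rdf", new SourceStatus());
        apply(registry, Arrays.asList(
                new NewEntryRow("http://example.org/a.rdf", true, true, false),
                new NewEntryRow("http://example.org/b.rdf", false, false, true)));
        SourceStatus a = registry.get("http://example.org/a.rdf");
        SourceStatus b = registry.get("http://example.org/b.rdf");
        System.out.println("a: new=" + a.isNew + " trusted=" + a.trusted); // a: new=false trusted=true
        System.out.println("b: new=" + b.isNew + " blocked=" + b.blocked); // b: new=true blocked=false
    }
}
```

Row b shows the point of the rule: its block box was ticked, but because update was not, nothing changes.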