SIRS-3 was set up to develop support in three different areas: security, fault tolerance, and management tools.
For security, we wanted to design and implement an architecture that would allow producers of illicit content to be traced. In addition, several authorization mechanisms needed to be provided. The following three subgoals were formulated:
These choices were mainly based on the observation that SIRS should provide guarantees with respect to the secure distribution of software packages. An alternative that we only briefly considered was to provide the means to check for the distribution of illicit content. In general, however, content verification by technical means alone is virtually impossible.
In the current solution, we have set up the means to trace exactly where an uploaded package came from. In essence, when uploading a package to SIRS, the client is requested to digitally sign that package with a key that has been provided by a trusted Access Granting Organization (AGO). The AGO does not verify the content of a package. However, because every uploaded package is signed, a downloading client can always check where a package came from. Downloading unsigned packages is at the client's own risk.
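As an illustration, the following Java sketch shows how such a signing scheme could operate. It is a minimal sketch based on the standard java.security API; the class and method names are ours, not the actual SIRS interfaces, and the locally generated key pair merely stands in for a key issued by a trusted AGO.

    import java.security.*;

    // Sketch of AGO-style package signing; hypothetical names, not SIRS code.
    public class PackageSigning {

        // The uploading client signs the package bytes with its AGO-issued key.
        static byte[] sign(byte[] pkg, PrivateKey agoIssuedKey) throws GeneralSecurityException {
            Signature sig = Signature.getInstance("SHA1withRSA");
            sig.initSign(agoIssuedKey);
            sig.update(pkg);
            return sig.sign();
        }

        // A downloading client verifies the signature to check where the
        // package came from.
        static boolean verify(byte[] pkg, byte[] signature, PublicKey uploaderKey)
                throws GeneralSecurityException {
            Signature sig = Signature.getInstance("SHA1withRSA");
            sig.initVerify(uploaderKey);
            sig.update(pkg);
            return sig.verify(signature);
        }

        public static void main(String[] args) throws Exception {
            // Stand-in for a key pair provided by a trusted AGO.
            KeyPair pair = KeyPairGenerator.getInstance("RSA").generateKeyPair();
            byte[] pkg = "example package contents".getBytes("UTF-8");
            byte[] signature = sign(pkg, pair.getPrivate());
            System.out.println("signature valid: " + verify(pkg, signature, pair.getPublic()));
        }
    }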
It is not the goal of this report to describe the security infrastructure for SIRS in any detail. The details can be found in [2], which has been added as an appendix to this report. Practical issues concerning security in SIRS (such as key management, signing, and verification) are covered in the Globe Operations Guide (GOG), which is provided as part of the GDN distribution.
The software resulting from SIRS-1 and SIRS-2 had only minimal support for tolerating faults. We wanted to add the following functionality:
An object server provides fault tolerance by periodically storing the complete in-memory state related to a package object to disk. If the server crashes during operation, the most recently saved state is restored, effectively recovering packages that were being accessed at the time of the crash. No checkpoint is made if the state has not been altered since the last checkpoint.
We decided to implement periodic checkpointing for performance reasons. The alternative is to checkpoint the state at each update operation, but this was felt to be too expensive, as it requires a synchronous disk operation per update. Further research is needed to see whether and how we can improve this situation. In particular, as part of our long-term research, we intend to explore the promising combination of object-specific fault tolerance and replication for performance. Some initial work on this matter is described in [3].
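The resulting policy amounts to checkpointing on a timer while skipping the disk write when nothing changed. The following Java fragment sketches this dirty-flag scheme; the class and file layout are invented for illustration and do not reflect the object server's actual code.

    import java.io.*;

    // Illustrative sketch of dirty-flag checkpointing; all names are hypothetical.
    class CheckpointedObject {
        private byte[] state = new byte[0];   // in-memory state of the package object
        private boolean dirty = false;        // changed since the last checkpoint?
        private final File checkpointFile;

        CheckpointedObject(File checkpointFile) { this.checkpointFile = checkpointFile; }

        synchronized void update(byte[] newState) {
            state = newState;
            dirty = true;
        }

        // Called periodically by the object server; the (synchronous) disk write
        // is skipped entirely when nothing changed since the last checkpoint.
        synchronized void checkpoint() throws IOException {
            if (!dirty) return;
            try (FileOutputStream out = new FileOutputStream(checkpointFile)) {
                out.write(state);
                out.getFD().sync();           // force the checkpoint to stable storage
            }
            dirty = false;
        }

        public static void main(String[] args) throws Exception {
            CheckpointedObject obj = new CheckpointedObject(new File("pkg.ckpt"));
            obj.update("version 1".getBytes("UTF-8"));
            obj.checkpoint();                 // state changed: written to disk
            obj.checkpoint();                 // state unchanged: no disk operation
        }
    }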
When recovering from a server crash, the object server normally fetches a fresh state from a master replica or another source that is guaranteed to hold an up-to-date copy; this synchronization is necessary to ensure consistency. Before doing so, however, the server first checks whether any updates occurred while it was down, so that needless state transfer across the network is avoided.
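In outline, the recovery decision can be sketched as follows; the master interface and the version comparison are our own simplifications, not the actual protocol.

    // Hypothetical recovery logic: restore the local checkpoint, and fetch the
    // full state from a master replica only if updates happened during the crash.
    interface MasterReplica {
        long latestVersion();                 // version of the authoritative state
        byte[] fetchState();                  // full state transfer (expensive)
    }

    class Recovery {
        static byte[] recover(byte[] checkpoint, long checkpointVersion, MasterReplica master) {
            if (master.latestVersion() == checkpointVersion) {
                // No updates occurred while the server was down: the local
                // checkpoint is fresh, and no state needs to cross the network.
                return checkpoint;
            }
            return master.fetchState();       // checkpoint is stale: synchronize
        }
    }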
Recovery from a communication error is aimed at avoiding having to restart the uploading or downloading of a package from the beginning. We have implemented a simple client-side scheme that handles interrupted downloads. A download occurs in blocks of data that are first stored at the client's side and later assembled into a package. In this way, when the connection between a client and an object server breaks, the downloading client can later continue where it left off. This mechanism is transparent to the user. There is no support for recovery from a communication error during uploads. This problem was considered less important, as a client already uploads individual files instead of complete packages.
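A minimal sketch of such a block-based, resumable download is given below; the BlockSource interface and the file layout are invented for illustration only.

    import java.io.*;
    import java.nio.file.Files;

    // Sketch of client-side resumable downloading; BlockSource stands in for the
    // connection to an object server, and all names here are illustrative.
    interface BlockSource {
        byte[] fetchBlock(int index) throws IOException;  // may fail on a broken connection
    }

    class ResumableDownload {
        // Blocks already stored on disk are not fetched again: after a
        // communication error the client continues with the first missing block.
        static void download(BlockSource source, File blockDir, int totalBlocks) throws IOException {
            for (int i = 0; i < totalBlocks; i++) {
                File block = new File(blockDir, "block-" + i);
                if (block.exists()) continue;             // fetched before the error
                try (FileOutputStream out = new FileOutputStream(block)) {
                    out.write(source.fetchBlock(i));
                }
            }
            // All blocks are present: assemble them into the final package.
            try (FileOutputStream pkg = new FileOutputStream(new File(blockDir, "package"))) {
                for (int i = 0; i < totalBlocks; i++) {
                    pkg.write(Files.readAllBytes(new File(blockDir, "block-" + i).toPath()));
                }
            }
        }
    }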
Fault tolerance is described in detail in an upcoming dissertation by Arno Bakker.
One of the problems with the SIRS-2 variant was the considerable effort it took to install a site before it could become fully operational. We planned to develop the following tools and mechanisms to alleviate these problems:
The management tools have only been partly realized. An important extension to SIRS is a special directory service, called GIDS, that stores all the necessary configuration information for a site. GIDS stands for the Globe Infrastructure Directory Service. The service can be queried regarding configuration settings and is designed to be extended for other purposes, such as finding a suitable object server to host a replica. A description of GIDS and its integration into GDN and Globe can be found in [4], which has been added as an appendix to this report.
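To give an impression of the intended use, the fragment below sketches a configuration query against such a directory service; the interface and attribute names are invented and do not reflect the actual GIDS API.

    import java.util.*;

    // Invented illustration of querying a site directory service such as GIDS.
    interface DirectoryService {
        // Configuration attributes stored for the named site.
        Map<String, String> lookupSiteConfig(String siteName);
        // Example of a richer query the service could be extended to support,
        // such as finding a suitable object server to host a replica.
        List<String> findObjectServers(String region);
    }

    class SiteBootstrap {
        static void configure(DirectoryService gids, String site) {
            Map<String, String> cfg = gids.lookupSiteConfig(site);
            System.out.println("object server port: " + cfg.get("objectServerPort"));
        }
    }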
A special utility, called grunt, has been developed to assist in bringing up and shutting down a site. Grunt keeps track of dependencies between different servers and ensures that these servers are started in the correct order, thus making it easier for a user to start a site.
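At its core, starting servers in dependency order amounts to a topological sort of the dependency graph. The sketch below illustrates the idea; the server names and dependencies are made up, and grunt itself may of course differ in detail.

    import java.util.*;

    // Minimal sketch of dependency-ordered startup, the idea behind grunt; the
    // dependency graph is made up, and no cycle detection is done here.
    class StartupOrder {
        static List<String> order(Map<String, List<String>> deps) {
            List<String> result = new ArrayList<>();
            Set<String> visited = new HashSet<>();
            for (String server : deps.keySet()) visit(server, deps, visited, result);
            return result;
        }

        static void visit(String server, Map<String, List<String>> deps,
                          Set<String> visited, List<String> result) {
            if (!visited.add(server)) return;       // already scheduled
            for (String dep : deps.getOrDefault(server, Collections.emptyList())) {
                visit(dep, deps, visited, result);  // schedule dependencies first
            }
            result.add(server);
        }

        public static void main(String[] args) {
            Map<String, List<String>> deps = new LinkedHashMap<>();
            deps.put("naming-service", Collections.emptyList());
            deps.put("object-server", Arrays.asList("naming-service"));
            deps.put("gids", Arrays.asList("naming-service"));
            System.out.println(order(deps));        // [naming-service, object-server, gids]
        }
    }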
Finally, our globify tool has been enhanced so that it can now be successfully applied to various organizations of Web sites. Globify assists in converting a collection of files into either a GDN package or a GlobeDoc. The latter is a persistent Globe object designed to store a collection of related Web files, and is also known as a Globe Web document.
We have not implemented monitoring tools, nor special tools for removing replicas. System management is mostly handled by the combination of GIDS and grunt, which so far seems to be sufficient. The Globe Operations Guide (GOG) has been updated in such a way that it should now be easier to bring up a Globe site.
It is as yet unclear whether the lack of monitoring tools, or of tools for removing replicas, is a serious omission from GDN. The main reason for not implementing these tools is lack of time, caused by having spent more effort on disseminating the results of the project, as well as on the integration of GIDS and GDN. I return to the dissemination of results below. Integration of GIDS and GDN was more difficult than we expected. In hindsight, we were simply too optimistic. When formulating the SIRS-3 proposal, we had a prototype version of GIDS running that we felt was almost complete. However, this version needed to be adapted to handle the site configuration information stored in the site.cfg file, while still allowing a fall-back to that file if so required (e.g., for overruling the configuration information stored in GIDS).
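The required lookup rule boils down to: a value in site.cfg overrules GIDS, and GIDS supplies whatever the file does not. A minimal sketch, with a java.util.Properties object standing in for the parsed site.cfg; the helper is hypothetical, not GDN code.

    import java.util.*;

    // Sketch of the lookup rule described above; hypothetical helper, not GDN code.
    class ConfigResolver {
        // A value present in the local site.cfg overrules GIDS; for any other
        // key we fall back to the configuration stored in the directory service.
        static String get(String key, Properties siteCfg, Map<String, String> gids) {
            String local = siteCfg.getProperty(key);
            return (local != null) ? local : gids.get(key);
        }
    }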
At this point, we feel that more practical, day-to-day experience with GDN is needed before deciding on what the appropriate management tools should be. However, we do intend to continue putting effort into some simple monitoring tools, and to include these tools in future releases. I return to these issues below.
At present, GDN has seen the release of version 1.0, a month later than the originally planned date of 1 January 2002.
The software for release 1.0 has been tested in various ways. Tests have been conducted with the installation and replication of more than 20 Gbytes of the SourceForge database. These tests have led to the identification and repair of numerous faults. In addition, we have asked several non-Globe users to bring up a site, giving them only the release 1.0 software and the operations guide. Practice now shows that installation can be done within a few hours by an experienced UNIX user, giving us confidence that management of a site has indeed been improved. We do not expect that a site can be brought up easily by an average Internet user.
It should also be noted that the Globe Web site, as well as various home pages of its developers, has been hosted by the SIRS/GDN software for many months now. The software distribution itself is, by default, downloaded from the GDN system, thereby redirecting a client to the nearest replica. This other form of testing has revealed several smaller errors and has led to a number of functional improvements.