SIRS-3 Project Proposal

December 19, 2000

1 Current Status

The previous SIRS projects have resulted in what is called the Globe Distribution Network (GDN). GDN is a collection of servers and client-side components that allow a user to distribute and replicate packages of files in a way that is specific to each package. Each package is constructed as a Globe distributed shared object. These objects are physically distributed across multiple machines.

The server-side software consists of a Globe object server, which is capable of hosting a replica of an object. The object server provides contact points that allow other servers or clients to access an object's replica. A server provides the facilities to store a replica on a local disk. Whether or not a replica makes use of such a facility is up to the object to decide. Persistence of replicas and contact points has been implemented in such a way that a graceful shutdown and subsequent restart of the server restores each replica to the state it had before the shutdown.

Contact points can be looked up by means of the Globe location service. This location service is also implemented and forms an integral part of GDN. To look up an object, a client supplies a globally unique object handle to the location service. Object handles are location independent and not designed to be processed by human beings.

Human-friendly naming of objects in the form of character strings is supported by the Globe naming service. A DNS-based implementation of this service is available and is part of GDN. The naming service accepts a UNIX-like file name as input, and returns an object handle that can be used to look up a contact address of the identified object.
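The two-step resolution described above (name to object handle, then object handle to contact address) can be sketched as follows. This is a minimal in-memory illustration, not the GDN implementation: the class names, the `find_replicas` helper, and the example name, handle, and address are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContactAddress:
    """Where a replica can be reached (hypothetical shape)."""
    host: str
    port: int


class NamingService:
    """Maps human-friendly names to location-independent object handles."""

    def __init__(self):
        self._handles = {}

    def register(self, name, handle):
        self._handles[name] = handle

    def resolve(self, name):
        return self._handles[name]


class LocationService:
    """Maps object handles to the contact addresses of replicas."""

    def __init__(self):
        self._addresses = {}

    def insert(self, handle, address):
        self._addresses.setdefault(handle, []).append(address)

    def lookup(self, handle):
        return list(self._addresses.get(handle, []))


def find_replicas(name, naming, location):
    """Resolve a name to a handle, then the handle to contact addresses."""
    handle = naming.resolve(name)
    return location.lookup(handle)
```

Keeping the two mappings separate means a replica can move or be added without touching the name, since only the handle-to-address mapping changes.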

GDN can be accessed using the Web. For this purpose, two additional components are needed. First, because naming in GDN does not conform to URLs (which are location dependent), Globe names are embedded in URLs so that existing Web browsers can be used. Such an embedding requires a translation between Globe names and embedded URLs. This translation is done by a separate process called gtrans.

The translator communicates with another process called the Globe gateway. As its name suggests, this process operates as a gateway between the Web and GDN. It is capable of holding a (simple) representative of a distributed object and communicates with Globe object servers that host a replica. The gateway extracts files from an object, and passes these to the translator using HTTP.

GDN comes with a set of simple tools to manipulate objects, access the location service and naming service, and to access object servers. This toolset has deliberately been kept primitive in order to focus only on the things that really matter, namely the distribution of objects that encapsulate files.

GDN currently provides no support for security or fault tolerance, and lacks convenient management tools. In this proposal, we outline solutions to these problems and propose to develop these solutions as project SIRS-3.

2 Security

We currently have a paper design for a secure version of GDN. The basic design principle is that GDN does not provide any guarantees concerning what is distributed, but provides mechanisms to guarantee that distribution of content is fully traceable.

2.1 Tracing the Owner

Consider a GDN that is spread across multiple organizations. Suppose Alice uploads a package of files to the GDN containing illicit content (illegal MP3 files, pornography, etc.). GDN provides no tools to moderate the content: Alice can upload anything she wants. However, GDN does require that Alice digitally sign the package with a signature that can be verified by an Access-Granting Organization (AGO) such as VeriSign.

The AGO must be well known and trusted by all GDN users. Signing a package with an AGO-certified signature says nothing about the content, but only that the person signing has been registered, and as such is traceable. This traceability is needed to ban a malicious user and his/her content from the entire GDN. Note that this is an important distinction from signing content using, for example, PGP keys. In that case, the signature is used to verify the content, not the producer of that content. For that purpose, PGP's web of trust works. It cannot, however, be used to trace users when they need to be banned, because that would require every user and organization to trust the keys used for signing.

Placing this signature can be done automatically by the GDN tools for uploading content. Signing takes place by means of a private key. The corresponding public key that is used for verification of the signature is placed in a certificate (i.e., a message signed by the AGO), and added to the package. An uploaded package will now consist of signed content, and a certificate containing the public key for verifying the signature.

When Bob downloads the package from Alice, he does two things. First, he verifies the signature against the content. If the two do not match, Bob knows the content has been modified during its distribution by GDN. In that case, he should reject it. Second, if the signature matches the content, he knows who produced that content. If he finds Alice's package to contain material that should not have been distributed, he can complain about this because Alice is traceable. Bob will also have to check whether Alice's key has been revoked, that is, whether it is still valid.
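The upload and download checks described above can be sketched as follows. This is an illustration under loud assumptions: the signature scheme is toy textbook RSA built from small Mersenne primes so that the example runs with the standard library only, and it is not secure. The package layout, the function names, and the revocation set are hypothetical, and the sketch additionally checks the AGO's signature on the certificate before trusting the public key inside it.

```python
import hashlib

# Toy textbook RSA. NOT secure: it only stands in for a real signature
# scheme so the sketch runs without third-party libraries.
E = 65537


def make_keypair(p, q):
    """Return (public, private) with public=(n, E) and private=(n, d)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (n, E), (n, d)


def digest(data, n):
    """Hash the data and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data, private):
    n, d = private
    return pow(digest(data, n), d, n)


def verify(data, signature, public):
    n, e = public
    return pow(signature, e, n) == digest(data, n)


def cert_body(pseudonym, public):
    """Bytes the AGO signs: the registered pseudonym and the public key."""
    return pseudonym.encode() + b"|" + repr(public).encode()


def upload(content, owner_public, owner_private, ago_private, pseudonym):
    """Build a package: content, owner signature, AGO-signed certificate."""
    return {
        "content": content,
        "signature": sign(content, owner_private),
        "certificate": {
            "pseudonym": pseudonym,
            "public_key": owner_public,
            "ago_signature": sign(cert_body(pseudonym, owner_public), ago_private),
        },
    }


def check_download(package, ago_public, revoked_keys):
    """Return the traceable pseudonym, or raise ValueError if a check fails."""
    cert = package["certificate"]
    body = cert_body(cert["pseudonym"], cert["public_key"])
    if not verify(body, cert["ago_signature"], ago_public):
        raise ValueError("certificate was not issued by the AGO")
    if cert["public_key"] in revoked_keys:
        raise ValueError("owner key has been revoked")
    if not verify(package["content"], package["signature"], cert["public_key"]):
        raise ValueError("content does not match its signature")
    return cert["pseudonym"]
```

A real deployment would use an established signature scheme and certificate format; the point here is only the order of the checks: certificate, revocation, then content.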

Note that Alice may have used a pseudonym instead of her real identity. In that case, Bob can complain that someone known to the AGO as Alice has placed illicit content. Only the AGO knows who Alice is in real life. In this way, we can also protect Alice's privacy.

The reaction to finding illicit content is to remove replicas and ban the identified person from the GDN. This approach requires setting up and maintaining a blacklist. Accusations about illicit content are accepted only from users registered as uploaders. This restriction protects against false accusations: because accusers are themselves registered and traceable, a malicious accuser can in turn be banned from the GDN.

2.2 Protecting Object Servers

Another problem we need to solve is that of malicious object servers. Such a server may modify content, but this can easily be detected using the signing mechanism just described. Another potential problem is that an object server sends fake updates to another server. Again, we protect against such attacks by demanding that updates are also signed, and in such a way that the modified package is still traceable to its owner.

One obvious attack that requires separate measures is the following. A package may be replicated across the GDN. To locate the nearest replica, Bob issues a lookup request at the location service. However, a malicious object server may have inserted an address for an object O, while in fact it provides nothing or a replica of something completely different (but traceable). In other words, Bob should be assured that the address returned actually is that of a replica of O.

To partially solve this problem, we introduce a delegation mechanism, by which the owner of a package grants an object server the right to insert a contact address. This permission may be granted to an object server that is trusted by the package owner. This solution prevents an object server from inserting an address for something it doesn't have. It does not solve the problem that a server inserts an address for a fake replica, although in that case, Bob will be able to detect that he is dealing with a fake by checking signatures.
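The delegation mechanism can be sketched as a location service that accepts a contact address only when it comes with a grant signed by the package owner. This is a minimal sketch under assumptions: the signatures again use toy textbook RSA (not secure), the location service is assumed to learn the owner's public key when the object is registered, and `grant`, `register_object`, and the `handle|server_id` message format are hypothetical. The sketch does not authenticate the server presenting the grant, which a real design would also need.

```python
import hashlib

E = 65537  # toy textbook RSA exponent; NOT a secure signature scheme


def make_keypair(p, q):
    """Return (public, private) with public=(n, E) and private=(n, d)."""
    n = p * q
    return (n, E), (n, pow(E, -1, (p - 1) * (q - 1)))


def _digest(data, n):
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data, private):
    n, d = private
    return pow(_digest(data, n), d, n)


def verify(data, signature, public):
    n, e = public
    return pow(signature, e, n) == _digest(data, n)


def grant(handle, server_id, owner_private):
    """Owner states that server_id may insert addresses for handle."""
    return sign(f"{handle}|{server_id}".encode(), owner_private)


class LocationService:
    def __init__(self):
        self._owners = {}     # handle -> owner public key (assumed known)
        self._addresses = {}  # handle -> list of (server_id, address)

    def register_object(self, handle, owner_public):
        self._owners[handle] = owner_public

    def insert(self, handle, server_id, address, delegation):
        """Accept the address only if the owner delegated to this server."""
        message = f"{handle}|{server_id}".encode()
        if not verify(message, delegation, self._owners[handle]):
            raise PermissionError("no valid delegation from the package owner")
        self._addresses.setdefault(handle, []).append((server_id, address))

    def lookup(self, handle):
        return list(self._addresses.get(handle, []))
```

As the text notes, this only prevents unauthorized insertions; a delegated server could still serve a fake replica, which the client detects by checking the package signature.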

3 Fault Tolerance

There are currently two main fault-tolerance problems to solve. First, we need to recover from server crashes. Second, we need to recover from communication failures when uploading or downloading huge amounts of data. For this project, we assume that partial failures lead to crash failures.

3.1 Crash Recovery

At present, servers support only a graceful shutdown and restart. Gracefully shutting down implies that a server saves critical state to persistent storage and reloads that state when restarted. For crash failures, we need to provide mechanisms that make critical state recoverable.

Because GDN replicates packages for performance only, the only effect a server crash has is a potential performance decrease. We initially assume that a package is sufficiently replicated to warrant that there is always at least one copy available. As a consequence, a server can recover from a crash by requesting its peers to transfer the necessary state to create a new replica for each object it hosted before the crash. This approach requires that a server keeps track of the objects it hosts, and which object server it can request a new replica from.
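The bookkeeping this approach requires can be sketched as a small crash-safe record of hosted objects and the peer each one can be re-fetched from. This is a sketch under assumptions, not the object server's actual design: the JSON file format, the class and function names, and the `fetch_replica` callable are all hypothetical.

```python
import json
import os
import tempfile


class HostedObjectLog:
    """Crash-safe record of hosted objects and the peer to re-fetch each from."""

    def __init__(self, path):
        self.path = path

    def load(self):
        """Return {object handle: peer}, or an empty record if none exists."""
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)

    def record(self, handle, peer):
        """Remember that this server hosts handle and can re-fetch it from peer."""
        entries = self.load()
        entries[handle] = peer
        # Write to a temporary file and rename it over the old record, so a
        # crash in the middle of a write leaves the previous record intact.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(entries, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)


def recover(log, fetch_replica):
    """Rebuild every replica that was hosted before the crash.

    fetch_replica(peer, handle) is a hypothetical callable that returns the
    object's state as held by the given peer.
    """
    return {
        handle: fetch_replica(peer, handle)
        for handle, peer in log.load().items()
    }
```

After a crash, a restarted server would construct `HostedObjectLog` on the same path and call `recover` before accepting requests.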

3.2 Communication Recovery

To solve the problem of recovering from a broken connection during the transfer of huge amounts of data, we adopt the following approach. A transfer is internally split into a number of chunks. This splitting is transparent to clients. Chunks of data are transferred one after the other, while a record is kept of the last successful transmission. When a connection breaks and is recovered, transfer continues with the chunk following the most recent successful one. This approach is similar to downloading large files using Netscape's FTP mechanism.
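The chunked transfer with resumption can be sketched as follows. This is a minimal in-memory illustration: the chunk size, the `Receiver` class, and the `send` callable (which stands in for the network and may raise `ConnectionError`) are hypothetical. The record of the last successful transmission is kept here as the number of chunks the receiver has accepted.

```python
CHUNK_SIZE = 4  # bytes; deliberately tiny for illustration


def split(data, size=CHUNK_SIZE):
    """Split data into consecutive chunks of at most size bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Receiver:
    """Accepts chunks in order and remembers how many have arrived."""

    def __init__(self):
        self.chunks = []

    def next_index(self):
        """Index of the first chunk that has not yet been received."""
        return len(self.chunks)

    def accept(self, index, chunk):
        if index != len(self.chunks):
            raise ValueError("chunk received out of order")
        self.chunks.append(chunk)

    def data(self):
        return b"".join(self.chunks)


def transfer(data, receiver, send):
    """Send every chunk the receiver does not have yet.

    send(index, chunk) may raise ConnectionError if the link breaks.
    """
    chunks = split(data)
    for index in range(receiver.next_index(), len(chunks)):
        send(index, chunks[index])


def transfer_with_retry(data, receiver, send, max_attempts=5):
    """Retry after a broken connection, resuming at the first missing chunk."""
    for _ in range(max_attempts):
        try:
            transfer(data, receiver, send)
            return True
        except ConnectionError:
            continue  # reconnect, then resume from receiver.next_index()
    return False
```

Because each attempt starts at `receiver.next_index()`, chunks that already arrived are never sent again.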

The above solution is adopted only for uploading or downloading packages. When communication between two object servers fails, the two will simply restart the transfer, although we envisage that object-specific solutions may be applied.

4 Management Tools

At present, we have a number of tools for managing the GDN and its content. This set of tools needs to be improved.

4.1 Site Management

To participate in GDN, a local administrator needs to manage several processes:

- One or more Globe object servers
- A leaf node of the Globe location service
- A leaf node of the Globe naming service
- A Globe translator
- A Globe gateway

Each of these processes is currently managed separately using several support tools. Practice shows that this management hinders the daily use of Globe. Therefore, a better approach is needed to manage a single site. For example, preferably there would be a single tool that allows a local administrator to automatically start all necessary processes in the correct order, and to subsequently monitor those processes. In addition, it should be easy to configure a site, or change certain configuration parameters, without having to shut down all running processes.
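Starting the processes of a site in the correct order can be sketched as a topological sort over their dependencies. This is only a sketch: the process names and, in particular, the dependencies between them are assumptions made for illustration, and the `start` and `stop` callables stand in for whatever actually launches and terminates a process. It uses `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter

# process -> processes that must already be running.
# These dependencies are assumed for illustration only.
DEPENDENCIES = {
    "location-service": set(),
    "naming-service": set(),
    "object-server": {"location-service"},
    "gateway": {"object-server", "naming-service", "location-service"},
    "translator": {"gateway"},
}


def start_order(dependencies):
    """Return an order in which every process follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def start_site(dependencies, start):
    """Start all processes in dependency order; return the order used."""
    started = []
    for name in start_order(dependencies):
        start(name)
        started.append(name)
    return started


def stop_site(started, stop):
    """Shut down in reverse start order, so dependents stop first."""
    for name in reversed(started):
        stop(name)
```

Returning the start order from `start_site` lets the same tool perform the later shutdown in exactly the reverse order.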

4.2 Global System Management

GDN also requires global system management. The Globe naming service is distributed as a tree across the various GDN sites. If a site is added or removed, this may affect the configuration or availability of the naming service. Likewise, the Globe location service is also distributed across the various GDN nodes. At present, globally configuring and managing these services is cumbersome.

Another global system management issue is that GDN servers need to know about each other. This awareness is necessary to enable manual and automatic replication. At present, we simply keep track of servers at a centralized site. Location information on servers is not recorded. What is needed is support for information on the collection of servers, where servers are located, what their capabilities are, and so on.

4.3 Content Management

There are currently several tools available for manipulating the content of distributed shared objects. One tool is used to add documents or files to a package. It has been deliberately kept simple, but has the drawback that manually filling objects with content is quite laborious. We have also developed a tool that can take an entire Web subtree as input and store the files directly into a Globe object.

5 Project Proposal

For SIRS-3, we propose to work on security, fault tolerance, and management tools.

5.1 Security

For the next release of GDN, we will assume that there is only a single maintainer per package. We propose to implement the following functionality:

- Traceability of the maintainer of a package. This will require that we build in the mechanisms to sign packages and updates, and to let (object servers and) clients identify a maintainer by a signature. It also requires implementing interfaces to an AGO such as VeriSign.
- Mechanisms that can enforce authorization of updates so that only the maintainer can modify a package. A possible solution is to let uploads take place only through a single machine on which the maintainer has an account, and use the access control mechanisms of that machine's operating system (UNIX). Note that each maintainer may have his/her own associated "upload machine." We assume for SIRS-3 that object servers can be trusted. This model comes close to the one adopted in Globus.
- Mechanisms that can enforce that only the maintainer can issue requests for creating new replicas. The problem we need to solve is that malicious users can request replica creation at any server, while actually only the maintainer is held responsible for the package. (In the future, object servers will initiate replica creation.)

5.2 Fault Tolerance

For fault tolerance, we propose to work on two issues:

- A simple scheme to allow object servers to recover from crashes. In a simple scheme, we introduce a "main" machine which is responsible for keeping a fresh copy of a package on persistent storage. In many cases, this machine will also run as master in a master/slave replication scenario. Assuming that a server keeps a recoverable account of the objects it was hosting, recovery can be done by starting anew, and asking the main machine of each object for a fresh copy.
  In a slightly more sophisticated scheme, a server can recover (part of) an object's state locally, and contacts the main machine only for consistency so that it can also get the latest updates. We intend to incorporate this functionality as well.
- Recovery from a communication error. We propose to handle only the situation in which sender and receiver are running, but the transfer failed due to a broken connection. In this case, the two can set up a new connection and continue where they left off. If either the sender or receiver crashes, recovery becomes more difficult to handle and may require object-specific solutions. Such solutions are not part of SIRS-3.

5.3 Management Tools

The bulk of our work will concentrate on tools for managing a site. In particular, we propose to develop the following tools:

- Tools to start and shut down an entire site in a user-friendly way. We need to distinguish several situations.
  - In the simplest case, a GDN user will only be running our tika tool, which is used to modify the content of an object server. Scripts will be provided to easily configure a site to run tika.
  - A GDN user is running an object server and perhaps tika to fill it with content. A slightly more complicated situation is when the GDN user also runs a translator and gateway. For these situations, we will provide scripts that allow a user to gracefully shut down a site and restart it later.
  - A GDN user is also running the Globe name service or Globe location service. In this case, shutting down is not really an option because we assume that these services are always available. Nevertheless, when a shutdown is needed, special measures need to be taken to ensure that part of the service's administration is either taken over by other nodes, or temporarily disabled. This is particularly the case for the location service. We will develop tools to support this graceful shutdown and subsequent restart.
- Tools to monitor the resource usage within a site, and that allow a user to inspect which objects the site is currently running. In essence, we need a simple tool by which a GDN user can see what his/her Globe processes are doing. Monitoring may include resource consumption (CPU, disk, memory), uptime, object identification, and so on. Whether or not a simple graphical interface is provided is left open for now. We let servers generate standard log reports.
- User-friendly installation and configuration scripts. These scripts are aimed at new users so that they can easily start a new GDN site and fill that site with content. Installation may require setting environment variables, compilation, registration of servers, liveness checks, and so on. Once installation has succeeded, starting and shutting down the site should be done with the tools mentioned before.
- A tool to remove a replica. In our model, the owner of an object server has the final word on which objects are hosted. Letting a server remove a replica requires adaptations to our objects. The actual removal can be integrated with the monitoring tool mentioned above.
- Adaptation of globify. Several aspects of globify currently need improvement to make it more usable. We propose to enhance its capabilities accordingly.
- Global system management tools. We need facilities to globally manage a network of GDN sites. Part of this management involves global registration of servers. We will provide a tool to support this registration.

6 Plan Of Work

Initially, we will concentrate on the management tools, and start working on security and communication recovery in the second half of 2001. The reason for this approach has to do with the availability of personnel.

The plan of work has been set up to provide four releases, one every three months. We will announce new releases on our own Globe site, but also on newsgroups. Each release will consist of a complete development tree, including the sources. Documentation will be updated as necessary.

The following plan of work is based on a reasonable amount of feedback from new Globe users. Reasonable means that most feedback can be processed in 1-4 hours per week. More feedback may require adjustment of the schedule, which will be done after consulting NLnet.

We also recognize that security is important, for which reason we will attempt to incorporate basic security mechanisms as soon as possible into our releases, possibly at the risk of delaying other activities. Again, adjustments to the plan of work are done only after consulting NLnet.

The following table shows what each release will consist of. The schedule sets a high priority on management tools to make it easier for new GDN users to start using GDN. Although it can be argued that security should have higher priority, we are faced with the situation that an expert for implementing security into GDN will not be able to join the project team before June 2001.

Release 2001.1:	Enhanced globify
	User-friendly installation and configuration tools
	Global registration of various servers
Release 2001.2:	Start and shutdown tools
	Crash recovery for object server
Release 2001.3:	Traceability
	Communication recovery
Release 2001.4:	Authorization of updates
	Authorization of replica creation
	Site monitoring

RCSID: $Id: sir3-proposal.html,v 1.1 2001/03/08 09:41:46 wytze Exp $