
Last update: 2003-01-02

Grant end: 2002-01

LoCoCa (LCC) Final project status: as of October 2002

A local content caching system for a new search engine architecture

The LoCoCa project and its associated CVS archive have existed on SourceForge for several months. A single package release (version 1.0.0), containing several updates, is available from the project page.

The following sections describe updates to the information in the report dated 14 August 2002.

1. Major status points

1.1 What's done

Development has pretty much stopped, and the system is in a usable and reasonably tested state.

As mentioned above, the system is complete enough that a package release has been made, installed and used on several machines. The package contains the sources, which must be compiled and installed; on Linux systems this is straightforward, with the README giving a step-by-step guide.
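
The exact steps are in the README; on a typical Linux system a conventional source build would look something like the following (the archive name and make targets here are assumptions, not taken from the package):

    # Hypothetical build flow; consult the package README for
    # the actual steps and archive name.
    tar xzf lococa-1.0.0.tar.gz
    cd lococa-1.0.0
    make                 # compile the sources
    make install         # install, usually as root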

It was briefly noted in the last development status report that there would be no "magic application" that knows how to produce <url> containers as part of a Content Provider (CP). In fact, the final sources contain a reasonably complete toolset for implementing a CP on Unix systems:

  • lcccp; this has always existed and provides low-level communication with the UNS. It accepts a set of <url> containers to transmit.
  • lcccpstate; a filter that accepts <url> containers and a state file, and produces a set of <url> containers representing changed content and updates the state file.
  • file2url.sh; a simple shell script that can produce <url> containers describing files when given the filenames.

These tools can be used individually or together to form a CP.
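
As a sketch of how the tools might be composed (the argument forms shown here are assumptions; each tool's actual usage is documented in the package), a simple CP covering a directory of local files could be run as a pipeline:

    # Hypothetical CP pipeline: describe local files as <url>
    # containers, keep only changed content against a state file,
    # and transmit the result to the UNS.
    find /var/www/htdocs -type f \
        | xargs file2url.sh \
        | lcccpstate cp-state \
        | lcccp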

The UNS, QM, BM and BOT processes, as noted in the previous report, are all functionally complete.

The LCC is being integrated with the existing search.nl search engine.

1.2 What remains to be done

A TODO list is included with the source download; it contains the technical items that should eventually be actioned. Most are very minor or 'nice to haves', noted simply so they are not forgotten.

A few of the more significant points are the following:

  • The LCC currently collects content only via HTTP. Hooks exist internally in some parts of the LCC for other request types, such as FTP, but the implementation remains to be done.
  • The security of the communication channel to the UNS needs to be examined (e.g. using SSL), and HTTPS support for collecting content should perhaps be added. The provider password is also currently transmitted and stored in the clear.
  • It should be investigated whether using transactions at the MySQL level provides any extra robustness; if there are performance implications, transaction use could be made a run-time option. (A sketch follows this list.)
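
As an illustration of the transaction point (the table and column names here are hypothetical; the real schema is defined in the LCC sources), related updates could be grouped so that a crash mid-update cannot leave the tables half-written:

    # Hypothetical example: group related LCC table updates in one
    # MySQL transaction (requires a transactional table type such
    # as InnoDB).
    mysql lccdb <<'EOF'
    BEGIN;
    UPDATE urls SET state = 'fetched' WHERE url_id = 42;
    INSERT INTO content (url_id, path) VALUES (42, '/cache/42');
    COMMIT;
    EOF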

Also, the "Content User" (CU) end of the system is not represented by an API. Actions are performed manually by issuing SELECTs directly against the LCC tables and gathering content directly from local files. A CU API was not deemed useful at this stage; the system should first be used by several different parties to give direction to the functionality required.
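
As a sketch of this manual CU workflow (table and column names are hypothetical; the actual schema ships with the sources), a CU might list recently changed content and then read the files directly from the local cache:

    # Hypothetical CU query: URLs cached in the last day, with the
    # local file holding each one's content.
    mysql lccdb -e "SELECT url, local_path FROM urls
                    WHERE fetched > DATE_SUB(NOW(), INTERVAL 1 DAY);"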

Another recent idea that could have far-reaching effects is to associate category attributes with collections of URLs. A set of assigned numbers would identify category systems, for example "1" for dmoz, "2" for the Dewey system, and so on. Clusters of URLs could then have one or more categories, in one or more category systems, associated with them. This could provide a grass-roots category system for the Internet, libraries, and so on.
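
To make the idea concrete, a cluster of URLs might carry category attributes along these lines (this markup is purely illustrative; the actual <url> container format is defined by the LCC protocol and does not currently include categories):

    <!-- Purely illustrative: category system 1 (dmoz) attached
         to a cluster of URLs. -->
    <urls catsystem="1" category="Computers/Internet/Searching">
      <url>http://www.example.com/index.html</url>
      <url>http://www.example.com/about.html</url>
    </urls>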

2. Further publicity and securing a future

Towards further publicity, we have written a document giving a technical overview, deemed more suitable for publication as an article on technically oriented websites. It should be released at an appropriate time.

Secondly, and more significantly, we have arranged for future development of the LCC protocol within the "Ingenuus Institute", a Belgian non-profit organisation whose social goal is:

"The promotion of the use of Open Standards based Software and IT solutions in education, government, health care and business"

The institute has given jobs to both Gordon and myself to further develop the LCC project and NexTrieve; the latter will also be released as open source under the GNU GPL licence.

In addition, the institute will take over hosting and maintenance of Search.NL and turn it into an example/experimental search engine whose sole purpose is the promotion of the LCC protocol. It will be marketed as such and will only accept further submissions using the LCC protocol. Despite its origins as a Dutch search engine, submissions will be opened to anyone, reflecting its change in mission. This opens doors for further press releases to the international community.

The founders of the institute are very enthusiastic about its prospects and have several additional funding tracks in progress, some oriented around other synergistic projects, such as an open source content management system. Integrating the LCC protocol into a content management system may well be the glue needed to aid widespread adoption of the LCC protocol, as it removes the difficult stage of defining the websites.

The institute has an experienced marketing person designated to handle press releases and external communications for both the LCC protocol and NexTrieve. We will be targeting a press release day of November 15.

Securing a developmental future for the LCC protocol, adding the synergistic NexTrieve component to open source, and the marketing and university contacts of the institute should together provide a powerful boost to the serious introduction of the LCC protocol into the world. A better conclusion would be hard to imagine at this stage :-)
