Last update: 2003-01-02

LCC

Local content caching system for a new search engine architecture

Introduction

Centralized search engines, which cover only a fraction of the entire Internet, are encountering significant scalability problems as their servers struggle to keep up with the exponential growth in the number of content providers and in the amount of content they provide. The main problem for a search engine, or for any other content "user", is keeping an up-to-date (processed) copy of the content of each content provider. Because there is no "protocol" by which content providers can tell search engines that there is new content, or that old content has been deleted or updated, search engines periodically "visit" each content provider, often fetching content that they have fetched before. This process consumes valuable resources, and the results are still inherently out of date.

Another problem is that content providers on the Internet publish content in a form that suits the human reader but is not ideal for the kind of processing that a search engine or similar service needs.

This six-month pilot-project will investigate what is needed to create a system of local content caching, in which a content provider can notify a Local Content Cache of new (or updated or deleted) content. The Local Content Cache will then collect this content, possibly in a form more suitable for content processing than the form in which it is presented to the human reader. Such a Local Content Cache can then be used by a search engine, or by any other content "user" such as an intelligent agent, for its own purposes. A proof-of-concept implementation of the software needed for a Content Provider, a Local Content Cache and Content Users such as search engines and intelligent agents will be part of this pilot-project.

Architectural Overview

[Figure: overview picture]

The goal of this pilot-project is to create a functioning proof-of-concept in which:

  • one or more Content Providers can notify a Local Content Cache of new, updated or deleted content.
  • a Local Content Cache can then fetch the indicated content from the Content Provider in the manner and at the time indicated by the Content Provider.
  • the fetched data can provide different "views" of the same underlying data (i.e., the data can be provided in several different forms).
  • the content fetched by the Local Content Cache is stored on the server on which the Local Content Cache software modules are running.
  • a Content User is able to interrogate the Local Content Cache for a list of new, updated or deleted content.
  • a Content User is able to obtain this content from the Local Content Cache.
  • a Content User can be shown to work on the content as provided by the Local Content Cache.
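
The goals listed above can be illustrated with a minimal in-memory sketch. It is not the project's implementation: the names `Notification`, `Change`, `LocalContentCache`, `notify`, `changes_since` and `get` are all hypothetical, the notification format is an assumption, and the cache fetches immediately on notification, whereas the project intends the Content Provider to indicate the manner and time of fetching.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Change(Enum):
    NEW = "new"
    UPDATED = "updated"
    DELETED = "deleted"


@dataclass
class Notification:
    """Message a Content Provider sends to the cache (hypothetical format)."""
    content_id: str
    change: Change


# A fetcher returns every view of one item, e.g. {"html": ..., "text": ...}.
Fetcher = Callable[[str], Dict[str, str]]


@dataclass
class LocalContentCache:
    fetch: Fetcher
    store: Dict[str, Dict[str, str]] = field(default_factory=dict)
    log: List[Tuple[int, str, Change]] = field(default_factory=list)

    def notify(self, note: Notification) -> None:
        """Handle a provider notification: fetch or drop, then log the change."""
        if note.change is Change.DELETED:
            self.store.pop(note.content_id, None)
        else:
            # Simplification: fetch right away instead of at a provider-chosen time.
            self.store[note.content_id] = self.fetch(note.content_id)
        self.log.append((len(self.log) + 1, note.content_id, note.change))

    def changes_since(self, seq: int) -> List[Tuple[int, str, Change]]:
        """Let a Content User list changes with a sequence number above seq."""
        return [entry for entry in self.log if entry[0] > seq]

    def get(self, content_id: str, view: str) -> str:
        """Return one stored view of an item to a Content User."""
        return self.store[content_id][view]


# Usage: a stand-in provider offering two views of the same item.
provider = {"/a": {"html": "<p>Hello</p>", "text": "Hello"}}
cache = LocalContentCache(fetch=lambda cid: dict(provider[cid]))
cache.notify(Notification("/a", Change.NEW))
changes = cache.changes_since(0)   # what a search engine would poll for
plain = cache.get("/a", "text")    # the view better suited to processing
```

A Content User that remembers the last sequence number it saw only has to ask for changes after that number, so it never re-fetches content it has already processed.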
