Grant
Theme fund: NGI0 Discovery
Period: 2021-08 — 2022-09

AREXERA Crawler

C++ based web crawler

The AREXERA web crawler dates back to the early 2000s, when AREXERA GmbH (formerly TECOMAC GmbH) wrote it as part of a toolset for running public search engines such as Seekport in Germany and some other European countries. The crawler is written in C++ and was designed from the ground up for speed. It supports the common features, such as TLS, robots.txt, politeness rules and WARC file output. The tool was in full production use until the company went out of business, after which development stopped for a while. Recently the code resurfaced, and AREXERA was reborn as a free and open source project. Initial tests still showed promising performance compared to other widely used crawlers. The aim of the project is to bring the crawler up to date with modern requirements and clean up the code, so it can be properly benchmarked with a representative workload - after all, high crawling speed means higher throughput and lower power consumption per fetched web page.


This project was funded through the NGI0 Discovery Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 825322.