
Binary Analysis Fund

Derive knowledge from binary blobs such as firmware

This page contains a concise overview of projects funded by the NLnet Foundation that belong to the Binary Analysis Fund (see the thematic index). More information is available on each of the projects listed on this page: just click on the title, or on the link at the bottom of each project's section. If a description on this page is a bit technical and terse, don't despair: the dedicated page has a more user-friendly description that should be intelligible for 'normal' people as well. If you cannot find a specific project you are looking for, please check the alphabetic index or simply search for it (or for a specific keyword).

Automated clearing of source code files — More efficient retrieval of security and license compliance contextual information

A common task for companies is to clear software source code files for legal or security reasons before software developers may use them. The clearing process is tool-driven, using code clone detectors/snippet matchers, license scanners and security scanners. Typically, the clearing process starts from scratch for each new file that is analyzed. This ignores the fact that open source software usually changes incrementally, so the software being scanned is likely to be nearly identical to software seen before. For a (large) subset of files, this characteristic can be used to (semi-)automate the process: when scanning a new file, first find the closest match in a set of known files, compute the difference between the two, determine where in the file the differences are, and apply rules that decide what action to take depending on that location.
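A minimal sketch of the lookup, diff and rule steps, using Python's standard `difflib` module. The line-based similarity measure, the 0.6 threshold, treating the first `header_lines` lines as the license header, and the returned action strings are all hypothetical choices for illustration, not the project's actual method.

```python
import difflib


def closest_match(new_lines: list[str], known: dict[str, list[str]]) -> tuple[str, float]:
    """Return the name of the most similar known file and its similarity ratio (0.0 to 1.0)."""
    best_name, best_ratio = "", 0.0
    for name, lines in known.items():
        # ratio() is 2*M/T: M matching lines, T the total line count of both files.
        ratio = difflib.SequenceMatcher(None, lines, new_lines).ratio()
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name, best_ratio


def changed_lines(old: list[str], new: list[str]) -> list[int]:
    """Return 0-based line numbers in `new` where it differs from `old`."""
    changed = []
    for tag, _i1, _i2, j1, j2 in difflib.SequenceMatcher(None, old, new).get_opcodes():
        if tag in ("replace", "insert"):
            changed.extend(range(j1, j2))
        elif tag == "delete":
            changed.append(j1)  # lines were removed just before line j1 of `new`
    return changed


def decide(new_lines: list[str], known: dict[str, list[str]],
           header_lines: int = 10, threshold: float = 0.6) -> str:
    """Hypothetical rule set: reuse earlier conclusions unless the changes require a rescan."""
    name, ratio = closest_match(new_lines, known)
    if ratio < threshold:
        return "full scan"  # nothing close enough: fall back to the normal process
    changed = changed_lines(known[name], new_lines)
    if not changed:
        return f"copy conclusions from {name}"
    if any(i < header_lines for i in changed):
        return "rescan license header"  # a change touches the assumed license header
    return f"copy license conclusions from {name}; review code changes"
```

Comparing against every known file is the simplest possible lookup; with a large corpus, a real system would likely need an index to find candidate matches quickly.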

When scanning source code, people typically treat each file as an isolated unit and never look at its lifecycle: how much was changed, and where. For license compliance, it makes no sense to rescan a file if the header containing the license text has not changed; earlier conclusions can simply be copied. For security, changes that touch only comments and no code do not matter. This project tries to tackle this by investigating how to find the closest match for a file (is there already a known file that is close enough?), how to determine the structure of the file (which parts are comments and which are code), and how to compare the two files to see where changes were made. Depending on the scenario (license compliance or security), the user can then take different actions.
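The comment/code distinction and the two scenarios can be sketched as follows. The hypothetical `triage` function labels lines of C-like source as comment, code or blank, assumes that the leading comment block holds the license header, and derives one decision per scenario from the lines a `difflib` diff reports as changed. The naive comment detection is an assumption made to keep the example short.

```python
import difflib


def classify_lines(lines: list[str]) -> list[str]:
    """Label each line of C-like source as "comment", "code" or "blank".

    Deliberately naive: a line counts as a comment if it starts with // or /*,
    or lies inside a /* ... */ block; comment markers inside string literals
    and lines mixing code and comments are not handled.
    """
    labels, in_block = [], False
    for line in lines:
        s = line.strip()
        if in_block:
            labels.append("comment")
            in_block = "*/" not in s
        elif not s:
            labels.append("blank")
        elif s.startswith("//"):
            labels.append("comment")
        elif s.startswith("/*"):
            labels.append("comment")
            in_block = "*/" not in s[2:]
        else:
            labels.append("code")
    return labels


def leading_comment_len(labels: list[str]) -> int:
    """Number of lines in the leading comment block (assumed to hold the license header)."""
    count = 0
    for label in labels:
        if label != "comment":
            break
        count += 1
    return count


def triage(old: list[str], new: list[str]) -> dict[str, str]:
    """Hypothetical per-scenario decision based on where `new` differs from `old`."""
    old_labels, new_labels = classify_lines(old), classify_lines(new)
    old_hdr, new_hdr = leading_comment_len(old_labels), leading_comment_len(new_labels)
    touched, header_changed = set(), False
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, old, new).get_opcodes():
        if tag == "equal":
            continue
        # Collect labels of removed lines (from old) and added lines (from new).
        touched.update(old_labels[i1:i2])
        touched.update(new_labels[j1:j2])
        if i1 < old_hdr or j1 < new_hdr:
            header_changed = True
    return {
        "license": "rescan" if header_changed else "copy earlier conclusions",
        "security": "rescan" if "code" in touched else "no rescan needed",
    }
```

Labelling both removed and added lines means that deleting code also triggers a security rescan, not only adding or editing it.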

>> Read more about Automated clearing of source code files

binary-analysis-ng improvements — Integrate Kaitai in binary-analysis-ng

Firmware is one of the most opaque components of our technology stack. Firmware analysis is critical to making our appliances more secure, but only a very limited set of tools is available. BANG (binary-analysis-ng) is a tool to analyse firmware and other binary files. The code and complexity of the tool have grown significantly over time, making it challenging to maintain.

Most of BANG's parsers are written by hand. Meanwhile, the reverse engineering community has invested significant effort in tools for analyzing binaries, such as the Kaitai Struct framework (http://kaitai.io). The project will integrate this work into BANG, and will additionally optimise performance based on measurements of realistic workloads.
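BANG's own parser code is not reproduced here. As a hypothetical illustration of what a hand-written firmware parser has to do, the sketch below scans a blob for known magic bytes and then confirms a gzip candidate by actually decompressing it, which also yields the length of the embedded file so that it can be carved out. The `SIGNATURES` table and the `scan` and `carve_gzip` names are invented for this example.

```python
import zlib
from typing import Optional

# Hypothetical signature table: magic bytes that mark the start of a known format.
SIGNATURES = {
    "gzip": b"\x1f\x8b\x08",  # gzip member using the deflate method
    "zip": b"PK\x03\x04",     # ZIP local file header
    "elf": b"\x7fELF",        # ELF executable/object file
}


def scan(blob: bytes) -> list[tuple[int, str]]:
    """Return (offset, format) for every occurrence of a known signature, sorted by offset."""
    hits = []
    for name, magic in SIGNATURES.items():
        pos = blob.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = blob.find(magic, pos + 1)
    return sorted(hits)


def carve_gzip(blob: bytes, offset: int) -> Optional[tuple[int, bytes]]:
    """Validate a gzip candidate by fully decompressing it.

    Returns (length of the gzip member in bytes, decompressed payload), or None
    if the data at `offset` is not a complete, valid gzip member. A magic-byte
    match alone is only a hint; parsing the data confirms it.
    """
    decomp = zlib.decompressobj(wbits=31)  # 16 + 15: expect a gzip header and trailer
    try:
        payload = decomp.decompress(blob[offset:])
    except zlib.error:
        return None
    if not decomp.eof:
        return None  # truncated: the member never ended
    return len(blob) - offset - len(decomp.unused_data), payload
```

Every such check is format-specific code that has to be written and maintained by hand; a declarative description of each format, as Kaitai Struct offers, is one way to reduce that burden.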

>> Read more about binary-analysis-ng improvements

Serialization in Kaitai Struct for Java and Python — Declaratively modify and create complex binary file formats

Kaitai Struct (KS) is a tool for working with binary formats. It introduces a declarative domain-specific language for describing the structure of arbitrary binary formats, and over 170 formats are already described in the official format gallery. From any specification, KS can automatically generate a ready-to-use parsing module in one of 11 programming languages (C++/STL, C#, Go, Java, JavaScript, Lua, Nim, Perl, PHP, Python, Ruby).

Currently, KS only allows you to extract data from binary files (parsing). In many cases, however, the opposite direction is also needed: modifying the data in binary files or creating new ones (serialization). Serialization is a logical extension of KS that opens up new uses for existing format specifications. It has long been by far the most requested feature in KS, and its absence prevents many users from using KS to its full potential. The goal is to add stable serialization support to the KS project. This will involve extending the compiler, adding serialization support to the runtime libraries, and building an automated testing infrastructure for serialization. This project will implement serialization for Java and Python.
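To illustrate why a single declarative description can serve both directions, here is a toy sketch using only Python's standard `struct` module. The `HEADER_SPEC` table, the `parse` and `serialize` functions and the field names are hypothetical; this is neither `.ksy` syntax nor the API of KS-generated code, only the underlying idea that one field list can drive both reading and writing.

```python
import struct

# Hypothetical toy "specification": ordered (field name, struct format code) pairs.
# It mimics the idea of a declarative format description; it is not .ksy syntax.
HEADER_SPEC = [
    ("magic", "4s"),       # 4 raw bytes
    ("version", "H"),      # unsigned 16-bit integer
    ("num_entries", "I"),  # unsigned 32-bit integer
]


def _layout(spec: list[tuple[str, str]]) -> str:
    # "<" selects little-endian byte order with no alignment padding.
    return "<" + "".join(code for _, code in spec)


def parse(spec: list[tuple[str, str]], data: bytes) -> dict:
    """Read direction: bytes -> dict of field values."""
    values = struct.unpack_from(_layout(spec), data)
    return dict(zip((name for name, _ in spec), values))


def serialize(spec: list[tuple[str, str]], obj: dict) -> bytes:
    """Write direction: dict of field values -> bytes, driven by the same spec."""
    return struct.pack(_layout(spec), *(obj[name] for name, _ in spec))
```

Real formats are harder: fields often depend on one another (for example, a length field must match the size of the data it describes), so a serializer has to keep such fields consistent, which this toy ignores.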

>> Read more about Serialization in Kaitai Struct for Java and Python

ZIP file format description — Documenting the ZIP file format for reverse engineers and developers

The ZIP file format was originally a compression format, but it is now widely used in many other projects as well. Although there is a historical specification (dating back to 1990), there are plenty of edge cases, as well as files that do not follow the specification. Some files, for instance, add extra data such as electronic signatures, keys or padding (Android APK files are an example), while others change the headers (Dahua firmware files). Information about these variants is scattered across various web pages and can be hard to decipher. The goal is to gather this information in one place and to describe the format properly, with examples. Because ZIP files are used so broadly, by many different actors and for many purposes, this will be an ongoing effort, as new exceptions and extensions continue to be uncovered.
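One recurring edge case is data placed in front of an archive (as in self-extracting archives), which leaves the offsets stored in the archive off by the length of the prefix. The sketch below uses only Python's standard library to locate the End of Central Directory (EOCD) record, whose fixed 22-byte layout is defined in PKWARE's APPNOTE specification, and computes how many bytes were prepended, similar to the correction Python's own `zipfile` module applies when reading. The `find_eocd` name is hypothetical, and the sketch deliberately ignores ZIP64 and multi-disk archives.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
# Fixed part of the EOCD record (little-endian): signature, disk number,
# disk where the central directory starts, entries on this disk, total entries,
# central directory size, central directory offset, comment length.
EOCD_FMT = "<4sHHHHIIH"
EOCD_SIZE = struct.calcsize(EOCD_FMT)  # 22 bytes
MAX_COMMENT = 0xFFFF  # the comment length is a 16-bit field


def find_eocd(data: bytes) -> dict:
    """Locate the EOCD record and report where the central directory really is."""
    search_start = max(0, len(data) - EOCD_SIZE - MAX_COMMENT)
    end = len(data)
    while True:
        pos = data.rfind(EOCD_SIG, search_start, end)
        if pos < 0:
            raise ValueError("no valid EOCD record found")
        if pos + EOCD_SIZE <= len(data):
            fields = struct.unpack_from(EOCD_FMT, data, pos)
            # Strict check: the comment must end exactly at the end of the file.
            # This rejects signature bytes that merely appear inside a comment
            # (and also files with trailing junk, which a real tool must handle).
            if pos + EOCD_SIZE + fields[7] == len(data):
                break
        end = pos  # keep searching further back
    _, _, _, _, total_entries, cd_size, cd_offset, comment_len = fields
    return {
        "eocd_offset": pos,
        "entries": total_entries,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
        "comment_len": comment_len,
        # The central directory ends right before the EOCD record; any
        # difference from the stored offset is data prepended to the archive.
        "prepended": pos - cd_size - cd_offset,
    }


if __name__ == "__main__":
    # Build a small archive in memory, then prepend a fake 100-byte "stub".
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    print(find_eocd(buf.getvalue())["prepended"])                  # 0
    print(find_eocd(b"\x00" * 100 + buf.getvalue())["prepended"])  # 100
```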

>> Read more about ZIP file format description