
Project Mail::Box final report

[Mark Overmeer, MARKOV Solutions, 4 January 2004]

Introduction

In early 2002, the Stichting NLnet foundation offered Mark Overmeer a grant to improve the development of, and user support for, the Mail::Box software he had developed. The project started in May 2002 and was extended with a follow-up in January 2003. In December 2003, NLnet's involvement ends with this final project report.

Mail::Box is an Open Source software library for the Perl programming language. It is designed to help users, mainly system administrators, implement automatic e-mail processing. For instance, it can be used to create automatic replies to incoming e-mail, do spam and virus filtering, and implement web-based e-mail clients. The software is released under the GPL.

This document describes the contributions to Mail::Box realized with the generous grant of Stichting NLnet. After completion of this sponsored project, the module will continue to be supported on a voluntary basis.

Software Development

One of the areas covered by the project plan was pure software development, aimed at giving the library more "weight". Development covered several aspects, of which the most important are listed below.

Main concerns during the development were

These criteria are the main difference between Mail::Box and the other e-mail related Perl modules: other software only provides low-level operations on simple messages, and expects users to build the smarter functionality themselves.

Releases

The software is released under the GNU General Public License (GPL), which means that everybody is free to use and redistribute it. Fixes and extensions that are distributed must be made public as well, which in this case actually resulted in some contributions.

A "release often" policy was chosen as the release model: new releases are made frequently. Each release contains bug-fixes and new features. Feedback about a release is handled swiftly to improve the next one.

For a project with only one developer, it is not practical to use a system with a development branch and a stable branch. When a new release doesn't break existing code, the need for a stable branch is even lower. Mail::Box therefore used a single-track release schedule.

When the NLnet project took off, Mail::Box 2.012 was current. At the end of the supported period, release 2.052 was reached. This gives an average of one release every two weeks, forty releases in total. Only a few of these releases were emergency fixes, which is a consequence of the release policy.

New features

The next sections will give some details about major developments during the time span of the project funded by NLnet.

New folder types

Folders (also named mailboxes) are used to store groups of e-mail messages. When the project started, only the Mbox and MH types were supported. These are the most popular types on Unix systems. However, the target of Mail::Box is to create a folder type independent interface, which is usable on as many different platforms as possible.

Maildir support was the first to be added. A rough implementation already existed at the start of the project, and it was finished in one of the early releases: version 2.015, June 2002.

In a joint effort with Liz Mattijsen, a POP3 connector was implemented. This full implementation of the POP3 protocol has nice extra features, like invisibly re-establishing lost connections to the server. POP3 was first released with version 2.027, October 2002.

Likewise, based on software from Tassilo von Parseval, access to Outlook DBX folders was implemented. The closed nature of this Microsoft format makes it impossible to provide write access to these folders, but read access is enough to move e-mail archives away from Outlook into Open Source folder types. Included since version 2.042, May 2003.

David J. Kernen wrote an IMAP4 protocol handler. That handler was used to create a back-end for that popular folder type, where the messages reside on a remote server. Not all features are implemented yet, but an alpha version was released in December 2003.

The main complication of the implementation is the level of portability and auto-configuration that is required. In general, with Mail::Box the user's program doesn't need to know how the messages are stored. At the same time, much effort is put into optimizing performance in folder type dependent ways.

Message construction

New ways to create and process messages were added. In all these extensions, the implementation tries to protect the user from violating the official rules of the RFCs. It is hard to grasp the content of the (at least) five e-mail related standards, so people should be shielded from them where possible.

Next to the reply, forward, and build actions, which already existed but saw some improvements, new read and rebuild actions were added. "Rebuild" can be used to add plain text alternatives to HTML message parts, remove structural complexity, and such. "Rebuild" was added in release 2.041, May 2003.

Unicode

Due to its origin in the United States, e-mail uses 7-bit ASCII. However, the current e-mail standards support the encoding of character sets used in other parts of the world. Mail::Box received a simple way to decode and encode these "foreign" characters, although this has a negative effect on performance. Unicode features have been added over time, and cannot be pinpointed to a single release.
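The encoding step described above can be illustrated with a short sketch. It uses Python's standard email package as a stand-in, not the Mail::Box API, and only shows the general RFC 2047 mechanism by which non-ASCII text travels in a 7-bit header. The function names and the sample subject are hypothetical.

```python
from email.header import Header, decode_header, make_header


def encode_header_value(text):
    """Return a 7-bit ASCII header value that carries non-ASCII text."""
    return Header(text, charset="utf-8").encode()


def decode_header_value(raw):
    """Turn an RFC 2047 encoded header value back into readable text."""
    return str(make_header(decode_header(raw)))


# Hypothetical sample subject containing characters outside ASCII.
subject = "Grüße aus München"
wire_form = encode_header_value(subject)     # an "=?utf-8?...?=" encoded word
round_trip = decode_header_value(wire_form)  # the original text again
```

Every header that may contain such characters has to pass through a step like this on reading and on writing, which gives an idea of where the performance cost mentioned above comes from.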

Field groups

Each (MIME compliant) e-mail message starts with a set of lines describing the content of the message and the transportation process. Mail::Box got ways to handle sets of these lines which are related.

For instance, a header may contain multiple Received lines, showing the intermediate computers which were used to transport the message from the sender to the destination. These lines can be accompanied by a few more lines. The lines are not very useful, and can hence be removed per group or all together, saving disk space. But they can also easily be inspected and constructed. ResentGroups are included since release 2.023, September 2002.
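As a rough illustration of treating related header lines as one group, the sketch below collects all Received lines of a message and removes them together. It is written with Python's standard email package, not with Mail::Box; the sample message, host names, and the helper name strip_received are made up for the example.

```python
from email import message_from_string

# Hypothetical message that passed through two mail servers.
RAW = (
    "Received: from relay.example.net by mx.example.org; "
    "Mon, 1 Sep 2003 10:00:02 +0200\n"
    "Received: from sender.example.com by relay.example.net; "
    "Mon, 1 Sep 2003 10:00:01 +0200\n"
    "From: alice@example.com\n"
    "To: bob@example.org\n"
    "Subject: test\n"
    "\n"
    "Hello\n"
)


def strip_received(msg):
    """Remove every Received line at once and return the removed values."""
    hops = msg.get_all("Received", [])
    del msg["Received"]  # deletes all occurrences of the field
    return hops


msg = message_from_string(RAW)
hops = strip_received(msg)  # one entry per intermediate computer
```

The returned list can still be inspected after removal, which mirrors the two uses named above: looking at the transport path, or dropping it to save space.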

Header fields added by mailing-list software can be recognized (and removed), as can fields produced by spam-fighting software. A connection to the popular SpamAssassin software was made. More use in the area of spam-fighting software is expected, as a result of the ever-growing amount of unsolicited e-mail. ListGroups were introduced with release 2.044 in July 2003. SpamGroups were added in 2.048, released in August 2003.

Web-based clients

Perhaps the most important application of Perl-based e-mail scripting is web-based mail. To simplify this task for system developers, the module HTML::FromMail was created. It provides template systems to implement a web-mail client, taking care of many complicated tasks. What is left for the mail-client developer is to add interactivity and layout. HTML::FromMail saw daylight in October 2003.

Performance

Various approaches were taken to improve performance. For one: many methods in the library implement alternatives, and the user can decide which alternative is fastest in his or her case. Usually, the automatic decision is the best.

A good example of this smart behavior is the way a message's body is stored while the program runs. It can be stored as one large string, as a set of lines, or as a temporary file on disk. Functionally it is all the same, but which form of storage performs best depends on the size of the message and the way it is used. The best way to keep the body data is simply to remember where to find it, and not to fetch it at all unless needed. This lazy behavior has been implemented in many of the library's features.
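The lazy approach can be sketched in a few lines. This is a minimal illustration of the idea in Python, not the way Mail::Box implements it: the class LazyBody and the demo function are hypothetical, and the sketch assumes the body's position and length in the folder file are already known.

```python
import os
import tempfile


class LazyBody:
    """Remember where a message body lives; read it only on first use."""

    def __init__(self, path, offset, length):
        self.path = path
        self.offset = offset
        self.length = length
        self._data = None
        self.loads = 0  # how many times the file was actually read

    @property
    def loaded(self):
        return self._data is not None

    def data(self):
        # Fetch from disk on the first request, then serve from memory.
        if self._data is None:
            with open(self.path, "rb") as fh:
                fh.seek(self.offset)
                self._data = fh.read(self.length)
            self.loads += 1
        return self._data


def demo():
    """Write a tiny one-message file and access its body lazily."""
    header = b"Subject: hello\n\n"
    body = b"This is the body.\n"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(header + body)
        path = tmp.name
    try:
        lazy = LazyBody(path, len(header), len(body))
        before = lazy.loaded   # nothing has been read yet
        first = lazy.data()    # triggers the one and only read
        second = lazy.data()   # served from memory
        return before, first, first == second, lazy.loads
    finally:
        os.unlink(path)
```

A program that only inspects headers never calls data(), so the body is never read from disk at all.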

Besides the lazy implementation, there have been some real performance improvements. The two best gains (up to 20% gain each) were reached by rewriting existing Perl components. An optional message parser, implemented in C, has a benefit as well. The C parser's first release became available in December 2002.

Mail::Box was developed for automated e-mail handling, but in its current state, and on new PCs, it is fast enough for user applications. Some functionality required for interactive applications could be added to simplify the task of user client developers.

Documentation

The best way to improve the acceptance of software by a community is better documentation. Mail::Box has grown quite large, and it will never be easy to learn how to use such a large library. But without good documentation you are lost for sure.

The standard way of documenting Perl was not sufficient for a code base of this size. Therefore, Perl's documentation system was extended with new syntax and a new tool to produce a homogeneous set of manual pages. These pages are indexed in various ways, and contain many examples and detailed explanations of concepts. The statistics at the end of the project:

Classes (and manuals)    128
Documented methods       931
Documented diagnostics   165
Shown examples           228

Probably the easiest way to find the right methods for a certain task is to browse through the HTML version of the documentation. This was all realized during the project.

The documentation tool has been released as a separate product, named OODoc: Object Oriented Documentation.

Promotion

Providing a good implementation is one side of the story, but growing a user community is even more important for a successful product. Therefore, quite a few promotional activities were initiated.

Mailing-list

In an attempt to get people to react to each other's use of Mail::Box, a mailing-list was started. There is quite some activity on the list, but this is mainly related to mistakes in user code and bugs in the library. It did not really contribute to joint development.

The mailing-list started in May 2002. In February 2003, it had 70 members. At the end of the project in December 2003, 118 people followed the list.

Month   2002                        2003
Jan     -                           132 messages
Feb     -                           138 messages
Mar     -                           92 messages
Apr     -                           104 messages
May     7 messages (from May 20)    69 messages
Jun     73 messages                 22 messages
Jul     25 messages                 68 messages
Aug     79 messages                 45 messages
Sep     53 messages                 34 messages
Oct     76 messages                 26 messages
Nov     57 messages                 47 messages
Dec     64 messages                 33 messages (till Dec 6)
Messages posted on the mailing-list.

The usage statistics of the list's archive show a very irregular pattern, mainly driven by major changes in the library.

On the other hand, the mailing-list statistics show only part of all traffic: to avoid boring other list members, bug hunts usually took place off-list. The mailing-list received 1244 messages, while my personal archive adds up to 3500 messages (about half of them written by me). It is clear that answering messages consumed a lot of time: for the 19 months of the project, this means 8.8 messages per workday, or 22 per sponsored workday.
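The figures above can be checked with a few lines of arithmetic. The sketch below assumes roughly 21 workdays per month, which is not stated in this report; the number of sponsored workdays is not given either, so it is only derived backwards from the quoted rate of 22 messages per sponsored workday.

```python
# Monthly mailing-list counts as reported (May 2002 through 6 December 2003).
LIST_2002 = [7, 73, 25, 79, 53, 76, 57, 64]
LIST_2003 = [132, 138, 92, 104, 69, 22, 68, 45, 34, 26, 47, 33]
list_total = sum(LIST_2002) + sum(LIST_2003)

ARCHIVE_MESSAGES = 3500   # personal archive, as reported
PROJECT_MONTHS = 19       # as reported
WORKDAYS_PER_MONTH = 21   # assumption, not stated in the report

# Messages per ordinary workday over the whole project.
per_workday = ARCHIVE_MESSAGES / (PROJECT_MONTHS * WORKDAYS_PER_MONTH)

# Sponsored workdays implied by the quoted 22 messages per sponsored workday.
implied_sponsored_days = ARCHIVE_MESSAGES / 22

print(list_total)                     # 1244
print(round(per_workday, 1))          # 8.8
print(round(implied_sponsored_days))  # 159
```

Under these assumptions the monthly counts add up to the reported 1244 list messages, and the rate of 8.8 messages per workday is reproduced.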

Conferences

Mail::Box was promoted in various ways, but mainly by giving talks at various (Perl) conferences. Preparing the abstracts, papers, and slides consumed more time than planned for the project. NLnet mainly sponsored travel and accommodation.

SANE 2002 in Maastricht, The Netherlands
A 45-minute contribution on the use of Mail::Box for system administration, entitled "E-mail with Perl". (UNIX system and network administrators conference)
YAPC::Europe 2002 in Munich, Germany
A three-hour tutorial "E-mail programming with Mail::Box", a 45-minute talk on software development of large libraries, and 7 minutes about the Mail::Box spin-off module Object::Realize::Later. (European Perl conference)
German Perl Workshop 2003 in Bonn
A 45-minute talk on the Mail::Box spin-off User::Identity, and 15 minutes about Unicode e-mail headers.
YAPC::NA 2003 in Boca Raton, Florida, USA
A 90-minute tutorial on Mail::Box, 20 minutes on OODoc, and 5 minutes on Object::Realize::Later. (North-American Perl conference)
YAPC::EU 2003 in Paris, France
A new 95-minute tutorial on Mail::Box.

Attracting external developers

Open Source projects must have a community to be successful. They do not only require a group of users, supporting each other in the use of the software, but should also have a group of developers who can supplement each other in the development process. A project which is developed by only one person, like Mail::Box, may collapse when that one person stops development, for instance because of illness or lack of time.

In the ideal situation, a group of active developers with comparable influence is in control. This can be found in the FreeBSD, Gnome, and KDE development teams. However, Linux kernel development has only a very small group of people at the top, who do not call themselves a team. But that also proves to work. Many smaller applications depend on the effort of one person. When that developer stops working, the product starts to fade away, which may take many years. For instance, the XV image displayer hasn't been changed since 1994, but is still distributed with the latest SuSE Linux.

During this Mail::Box project, effort was made to attract developers for the module, to try to shape a team. Time was reserved to encourage people to participate in development. However, this did not succeed.

One way to get people to lend a hand is by explicitly tackling their problems with the existing code. That way, a personal relationship is built, which may grow into active development. Every few months, the members of the mailing-list were asked about their needs. This always brought some life to the list, and some ideas to work on, but no code contributions.

Furthermore, each time someone spoke about their own application using Mail::Box, that person was invited to contribute the code as part of the module. People were not unwilling, but the step from an application which suits personal needs to code which is usable by other people is huge: much higher requirements on configuration, documentation, and automated testing. In some cases, the employer did not permit the contribution.

As a test case, it was planned to find someone to implement IMAP4 support. No fewer than four people offered to implement this, over time. Still, each time the good intentions faded when the complexity of the required code became clear to the volunteer. The POP3 protocol was much easier. Liz Mattijsen offered to implement it, and (once started) there was a full implementation within two weeks.

Another complication in getting spontaneous code contributions is the size of Mail::Box. Combined with its Object Oriented coding style, with up to 5 levels of inheritance, it is not easy to get a good feeling for the internals. It is hard to figure out what the best spot for new functionality is, and often some existing functionality has to be rewritten, redesigned, or relocated.

Many programmers do not feel capable enough to write code which is usable by other people: they hesitate to show their programs. To be honest: usually they are right. Getting them to release code requires a lot of guidance; many long e-mails explaining how to produce better code. Only a few reach a publishable level.

After 19 months, the number of received code patches has increased, but these are all quite small patches: never more than a few lines. No one has offered to join core code development, which is a shame.

Deployment

Mail::Box has found deployment in different areas. Most of these applications are hidden from the outside world: in most cases they are part of a company's internal infrastructure. Often, Mail::Box is used to clean up e-mail archives or handle databases containing messages.

To name a few applications:


Spin-offs

The Mail::Box development has resulted in a few modules which can also be used in applications that are not purely e-mail related. These modules are:

MIME::Types
is a collection of knowledge about MIME types, which can be used to map file-name extensions to types, and vice versa.
OODoc
is a system to document complex (probably large, often Object Oriented) modules.
Object::Realize::Later
is a tricky module to implement lazy (delayed) creation of objects, which improves performance. This attracted a lot of attention from hard-core Perl programmers.
User::Identity
plays smart about user information, like deriving someone's probable language preference from an e-mail address, or discovering a person's gender from a full name description in multiple languages.

Acknowledgements

Special gratitude to Stichting NLnet for offering me the chance to work on this free software package. With the help of NLnet, the Perl software base is enriched with a powerful library which, in time, may become the basis of the next generation of e-mail applications.

The following people contributed. Some contributed documentation, others sent in patches or bug reports. Major contributors are marked with (*).

Adam Augustine           Gilles Darold        Mike Cudmore
Adam Byrtek 'alpha'      Greg Matheson*       Mike Mimic
Alan Kelm*               James Sanford        Nick Ing-Simmons
Albert Schueller         Jan Stapel           Nik Clayton
Alex Liberman            Jason Woodward       Paul Simons
alex                     Jeff Squyres         Phil Hagen
Alexander Bauer          Jeffrey Friedl       Phil Holden
Andre Schultze           Jeremy Banks         Pjotr Prins
Andreas Fitzner          Jerrad Pierce        Rob Holland
Andreas M. Riechert      Joe Junkin           Robin Berjon
Andreas Piper            John B Batzel        Ron Savage
Anthony D. Urso          Jon Thomason         Ronnie Paskin
asta                     Jost Krieger         Sebastian Krahmer
Beirne Konarski          Karen Craven         Sebastian Willert
Benjamin Pineau          Kees Dekker          Shagren
Bernd Patolla            Kingpin              Simon Cozens
Bill Moseley             Liz Mattijsen*       Simon von Janowsky
Blair Zajac*             Lutz Gehlen          Slaven Rezic
Brian Grossman           Marcel Gruenauer     Stefan Wolfsheimer
Christoph Dahl           Marcel de Boer       Steve Lewis
Conrad Heiney            Mark Ethan Trostler  Steven Benson
Constantin Khatsckevich  Mark Weiler          Supriya Jagadeesh
Cory Johns               Martin Thurn         Swapnil Khabiya
Darrell Fuhriman         Marty J. Riley       Tassilo von Parseval*
David A. Golden*         Marty Pauley         Terrence Brennon
David Coppit*            Matthew Darwin       Tim Sellar
David Favor              Matthew Lockner      Todd Richmond
Dimitris Glynos          Matthew Walker       Tom Allison
Edward Wildgoose*        Max Maischein        Tony Bowden
Emmet Cailfield          Max Poduhoroff       Walery Studennikov
Eric Wheeler             Melvyn Sopacua       Wiggins d'Anconia
Eugene Eric Kim          Michael D Richards   Yuval Kojman
Evan Borgstrom           Michael Reece
Francois Petillon        Michael de Beer
