Sepand Haghighi, Arash Zolanvari & Sadra Sabouri - PyCM

Evaluate the performance of ML algorithms

Data and AI

Image: a cartoon of a friendly, smiling man with a spyglass observing a robot with a human-like brain.

Can you introduce yourself and your project?

We are living in the machine learning era, so it is important to have solid methods for evaluating trained machine learning models. Sound evaluation methods make it easier to compare different models. The confusion matrix is a representation of a classification model’s performance. In the simplest (binary) case, it counts how many samples were correctly or incorrectly labelled as positive or negative. Several metrics can be derived from the confusion matrix, each assessing one aspect of the model’s performance. Unfortunately, popular machine learning libraries like `scikit-learn` support only a small portion of these metrics.
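To make the idea concrete, here is a minimal sketch in plain Python of a binary confusion matrix and a few of the metrics that can be derived from it. It does not use PyCM's API; the function names `binary_confusion_matrix` and `derived_metrics` and the sample labels are made up for illustration.

```python
def binary_confusion_matrix(actual, predicted, positive=1):
    """Count the four cells of a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def derived_metrics(cm):
    """Derive a few common metrics from the four counts."""
    def safe_div(num, den):
        # Return None instead of raising when a metric is undefined.
        return num / den if den else None

    tp, fp, fn, tn = cm["TP"], cm["FP"], cm["FN"], cm["TN"]
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical labels: 1 is the positive class, 0 the negative class.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

cm = binary_confusion_matrix(actual, predicted)
metrics = derived_metrics(cm)
print(cm)
print(metrics)
```

Every metric here is computed from the same four counts, which is why a single confusion matrix can feed a long list of statistics.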

Our project, PyCM, emerged to bridge this gap

PyCM (Confusion Matrix in Python) gathers a comprehensive set of metrics and tools for assessing and comparing machine learning classification models. Our team includes senior software developers and PhD researchers who have devoted six years to software development and research for PyCM.

The PyCM team includes:

Sepand Haghighi is the initiator and maintainer of the project, which was part of his master’s thesis. Not only does he have a solid theoretical background, but he is also a one-of-a-kind software architect and developer. Sepand has built multiple open-source software packages in Python, some of which have been downloaded millions of times via pip. Sepand enjoys rock music in his free time and loves swimming.
Arash Zolanvari is another core developer of PyCM. He is a PhD candidate in Optimization and Decision Systems at the University of Groningen. Arash is a valuable researcher who helped PyCM grow far beyond its original goal. Thanks to Arash’s efforts, PyCM is now the only Python library that fills the gap in machine learning model evaluation for the academic research community. Arash likes table tennis and cooking in his free time.
Sadra Sabouri is the final core developer of PyCM. Since joining the team in 2019, he has refreshed the development process. He is now pursuing a PhD in computer science at the University of Southern California. He is a solid software developer who proposed new features and took PyCM to the next level. His hobbies are skateboarding, gardening, and reading books.

What are the key issues you see with the state of the internet today?

With the emergence of AI, the internet is at a historic turning point. OpenAI and other big companies are locked in a frantic race to serve “the best” large language model (LLM) over the internet through APIs. Evaluating these LLMs is difficult because the models must be assessed on many different tasks and the results then aggregated.

Therefore, benchmarking LLMs will be a major direction for future research, and many funding agencies are dedicating resources to it. One of the classic tasks used for evaluation is classification, in which an LLM must assign given objects to classes. Existing tools for evaluating classification results are designed naively, which leaves much room for improvement.

How does your project contribute to correcting some of those issues?

PyCM emerged as the first and most complete tool for evaluating AI classification tools, filling this gap in both academia and industry. The PyCM paper has been cited more than 170 times by researchers from different domains, such as computer science and health care, as a tool for evaluating machine learning models. Course creators have added it to their programmes as a standard post-training evaluation tool. Companies that integrate LLMs into their workflows use PyCM to evaluate them on different classification tasks, such as text summarization.

What do you like most about (working on) your project?

We are always super excited to see the impact of our work on society. Our project has affected many stakeholders in different ways. PyCM has helped medical researchers build better tools for disease detection and has therefore helped humanity. It has also helped many AI practitioners compare different models and tune the best one for their use case. Contributing to such an impactful project is not an everyday opportunity. We are truly proud of our library and its contribution to society.

Where will you take your project next?

We are currently working on making PyCM more accessible to less tech-savvy users, and we plan to design a website for easier interaction with the library. We are also working on structural enhancements that will make it easier to add new features to the project. We hope to build a sustainable environment around the library, which can increase its impact even further.

How did NGI Assure help you reach your goals for your project?

NLnet Foundation supported the PyCM project from version 3.6 to 4.0 through the NGI Assure Fund. Using this grant, we developed a new feature supporting multi-label classification scenarios. We also implemented a new structure for trade-off curves, which are frequently used in machine learning evaluation settings. We added new metrics to the library and improved the “compare” feature, which enables users to compare the output of several machine learning models. Finally, we published a preprint literature review on rater agreement.

Do you have advice for people who are considering applying for NGI funding?

“Dream big”. With each release, NGI funding and support helped us immensely in pursuing our mission. If you have an idea that aligns with NGI’s funding goals, no matter how big, go for it. You would be surprised what you can achieve with dedication to the work!

Do you have any recommendations to improve future NGI programmes or the wider NGI initiative?

Our main recommendation for NGI programmes is to increase their focus on artificial intelligence and machine learning infrastructure projects. Given that machine learning is one of the most important trends on the internet, this would be a good use of effort. In the past, we were ready to apply for NGI grants for several projects that might have been considered outside the programme’s scope. There are surely other vibrant teams who could benefit from NGI support for their projects.

The other area where we saw room for improvement is the communication process during the application, which was a bit slow. We have applied for another round of NLnet grants for the PyCM project, and we hope to hear back soon.

Acknowledgements

Image: courtesy of Sepand Haghighi, Arash Zolanvari & Sadra Sabouri.

Published on October 31, 2024

PyCM received funding through the NGI Assure Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 957073.