
Interview With Viktor Lofgren from Marginalia Search

"Let's take alternative paths away from big tech websites"

screenshot from the Marginalia search page interface.

Viktor Lofgren is the creator of Marginalia Search, a search engine that takes you off the beaten track by letting you find small, quality web pages. These pages barely surface in commercial search engines because they are snowed under by larger commercial websites and marketing. We interviewed Viktor for FreeWebSearchDay. You can listen to the recording of the interview or read the edited transcript below. The interview was conducted by Tessel Renzenbrink, comms officer at NLnet.

"The internet seems a lot smaller than it used to be"

NLnet: Can you tell us something about Marginalia Search?

Viktor: Marginalia Search is really a bit of a COVID baby. Like many people, I had a lot of time on my hands during the pandemic. I spent a bunch of it online and I was frustrated with the state of the internet. I noticed that in recent years something had changed: it seemed a lot smaller than it was before. I was never really seeing anything new, and I couldn’t find any blogs, forums and so on. So I wanted to investigate what was going on. It was certainly possible that the internet had changed and all these websites had vanished. But it also seemed possible that the way we are interacting with the web had changed.

And that is difficult to verify without having something to compare it with. So I just started working on a search engine, intended more or less to work like Google used to in the late 1990s. It is a very traditional, by-the-book search engine, a keyword search engine. I am doing my own crawling and my own indexing, basically on PC hardware. What I found was a bunch of websites that were completely different from what I would find in the big search engines or on social media. Which is fascinating. It left me no other option than to build this search engine, because it was such a breath of fresh air. Basically, I have kept going ever since and adding to it. Running it on fairly low-powered hardware as well.

"Without software diversity, you get a one-sided view of the world"

NLnet: I understood that you want to make the crawling data public?

Viktor: That is an ambition, at least. We will have to see how far that is possible in terms of logistics, but it may also be a legal gray zone that is difficult to navigate. My ambition is to collaborate on the crawling bit in the future. Possibly run various versions of this search engine or something similar.

NLnet: How would it benefit people if the crawling data was public?

Viktor: The thing about search engines in general, and this is the big problem with having just a few of them, is that they are censorable. If you have too few search engines, someone can come and intimidate you into removing a website or hiding some fact. But also, if we are talking about having different pieces of search software that use the same crawling data: when you design a ranking algorithm, you basically encode your own values and perspectives into the software. So if you don’t have enough software diversity, in the sense that you have multiple search engines built by multiple people, then you get a very one-sided view of the world. If someone else comes and builds a search engine with their own ranking algorithm, for example, they would promote different types of content. And that would benefit people in general. Just to be able to find different types of websites.

NLnet: And you also want to crowdsource the search sets?

Viktor: I’ve experimented a bit with that. I have a GitHub repository where, if you want to add a website, you can make a pull request. If it isn’t a terrible website in some way, I will approve it and then it will eventually be crawled. I haven’t actually rejected any entries yet. But maybe one day someone will try to add something awful.

"If you can’t find something, it will not grow"

NLnet: Do you want to give people more possibilities to find their own way on the web with Marginalia, as opposed to how the big search engines work?

Viktor: Yes, and that is an important point. Because you can think about search engines as bringing websites to people, but you can also think of them as bringing people to websites. In terms of growing communities and fostering creative content and information sharing and so on, having search engines and discovery mechanisms for this stuff is critically important. Because if you can’t find something, then it will not grow. A lot of stuff is out there, but it is really struggling to find an audience because it is just displaced by so much search engine marketing. If you are an ad tech company, then it is fairly hard to penalize ads on the internet in the way that I can. But it does not harm me if you don’t see ads in my search results.

NLnet: Can you expand on that? What do you mean with penalizing ads?

Viktor: I can, for example, look at the HTML of a document, and if it has too many ads or too many tracking elements, I can downrank the website. Or give the user a check box to say: I don’t want too many ads, I prefer content that does not have ads. It is hard to get it perfectly right, but even removing 75% of the ads is still a huge improvement.
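As an editorial illustration of the idea Viktor describes, here is a minimal sketch of downranking a page based on ad-like and tracker-like elements in its HTML. It is not Marginalia's actual implementation: the hint list `AD_HINTS`, the function names and the penalty values are all assumptions chosen for the example.

```python
from html.parser import HTMLParser

# Hypothetical substrings hinting at ad or tracking resources
# (an assumption for illustration, not Marginalia's actual list).
AD_HINTS = ("doubleclick", "adservice", "googlesyndication", "analytics")


class AdCounter(HTMLParser):
    """Counts script/iframe/img tags whose src looks ad- or tracker-related."""

    def __init__(self):
        super().__init__()
        self.ad_like = 0

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "iframe", "img"):
            return
        # Attribute values can be None (e.g. <script async>), hence the "or".
        src = dict(attrs).get("src") or ""
        if any(hint in src.lower() for hint in AD_HINTS):
            self.ad_like += 1


def ad_penalty(html, per_element=0.1, max_penalty=0.5):
    """Return a penalty in [0, max_penalty] that grows with ad-like elements."""
    counter = AdCounter()
    counter.feed(html)
    return min(max_penalty, counter.ad_like * per_element)


def adjusted_score(base_score, html):
    """Downrank a page by scaling its base score with the ad penalty."""
    return base_score * (1.0 - ad_penalty(html))
```

A user-facing "fewer ads" check box could then simply switch between `base_score` and `adjusted_score` when ordering results. A simple substring heuristic like this will miss some ads and flag some harmless resources, which matches the point that it is hard to get perfectly right.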

"Having fresh eyes on the problem is refreshing"

NLnet: Moving on from Marginalia to search in general, what do you think are the big issues with how search works today?

Viktor: I think I’ve gone over most of my key gripes with search in general. The big problem is the limited number of indexes that are available. There are a lot of search engines out there, but most of them use Google or Bing as their back end. There are a few other indexes online as well, but there are not really a lot of them. Having this limited set of sources to pull from when you are building a search engine really limits what you can accomplish.

NLnet: With an index you mean mapping the web?

Viktor: Yeah, basically. You can conceptualize a search engine as consisting of a database that you fetch results from and then you can do some re-ranking of them. Both Google and Bing and even my search engine offer an API where you can ask me to do the search. I will give you a machine readable list of results and then you can do something with them yourself. For instance if you want to build a front end of a search engine. Most search engines aren’t doing this. They are using a combination of what is available and that is a bit limiting. I would like to see more different takes on this. It would be really helpful to have other people dabble in search without having to build an entire search engine from scratch.
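To make the "fetch results, then re-rank them" idea concrete, here is a small editorial sketch of a front end that takes a machine-readable result list and applies its own ranking preference. The JSON layout, the field names (`url`, `title`, `score`) and the function names are hypothetical; they do not describe the actual response format of Marginalia, Google or Bing.

```python
import json

# Hypothetical JSON payload standing in for a search API response.
SAMPLE_RESPONSE = """
{"results": [
  {"url": "https://example.com/big-portal", "title": "Big portal", "score": 0.9},
  {"url": "https://example.org/~alice/notes.html", "title": "Alice's notes", "score": 0.7},
  {"url": "https://example.net/blog/post", "title": "A blog post", "score": 0.8}
]}
"""


def rerank(results, boost):
    """Re-order results by the backend score multiplied by a front-end boost."""
    return sorted(results, key=lambda r: r["score"] * boost(r), reverse=True)


def prefer_personal_pages(result):
    """Example value judgement: favour URLs that look like personal pages."""
    return 1.5 if "/~" in result["url"] else 1.0


results = json.loads(SAMPLE_RESPONSE)["results"]
reranked = rerank(results, prefer_personal_pages)
for r in reranked:
    print(r["title"], r["url"])
```

Swapping in a different `boost` function gives a different view of the same index, which is the kind of software diversity discussed earlier in the interview.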

NLnet: Are there other efforts that you know of that are working on this?

Viktor: I don’t know if I can or want to mention any particular projects. But as I said, there are a lot of small projects, especially in the last couple of years. It may be a COVID effect. Maybe a lot of developers had a lot of time on their hands to explore this area and build independent search engines. Some of them have stagnated and some of them haven’t. But it is refreshing to see people who are not coming from an academic background and formal information retrieval look into this. Because there are a lot of assumptions that have been around since the ‘80s, or the ‘70s even, on how to build a search engine. So having fresh eyes on the problem, even if it does mean occasionally reinventing the wheel, is refreshing. And there is interesting stuff coming out of it. And not everything is turning into search engines, but just in general, discovery. It feels like for the last ten years not a lot of stuff has happened, but in the last two years there has been a large number of these tiny projects showing up. That is exciting to me.

"The Linux of internet search engines"

NLnet: If you think about search five or ten years from now, what would you like to see?

Viktor: I just want to start by noting that we’re at an interesting inflection point right now in terms of hardware. Because computers have gotten super powerful in the last ten years. And we are at a point where operating a search engine isn’t necessarily that expensive anymore. Ten, fifteen years ago you needed a large budget to be able to play in this space. You needed to demonstrate that you were going to be making a profit. Because nobody is going to throw tens of millions of dollars at something just for fun. But now we are at the point where regular human beings can dabble in this space. I hope this means a lot of developers and programmers and other people will seize the opportunity to experiment and approach the problem. Because if you have more eyes looking at the problem, then hopefully more solutions will be found and old conventions will be challenged.

I am hopeful for the future that something good will come out of this and something like the Linux of internet search engines will emerge. Where people can collaborate and build something great together, open source.

"Venture outside of the big mainstream websites"

NLnet: Is there anything that people can do today to make this better future of search a reality?

Viktor: I think by just participating in the web and not just consuming it. There is a chicken-and-egg situation where smaller websites are kind of dying because people aren’t finding them, and people aren’t looking for them because they are difficult to find. I think maybe look more actively off the beaten path and outside the big social media websites. Even though that is a bit hard right now. The more people venture outside of the biggest mainstream websites, the more stuff they will find. And the more people are finding this stuff, the better these websites will get, and the more alternative paths away from the gigantic big tech websites will be built. By walking them.

NLnet: And if someone wants advice on how to get there, I can recommend your search engine, Marginalia, which is built to get people there.

Viktor: Yeah, that is the big goal for the project right now. Just to show people what is out there. Perhaps it is not the most useful search engine right now. There are some improvements necessary before it can be more than it is. But I think we will probably get there at some point.

NLnet: The last question is about FreeWebSearch Day. Is there anything you hope people will take away from it?

Viktor: I think they should take away that websites aren’t fixed. We don’t have to have a Google and a Twitter and a Facebook. That does not need to be the forever status quo. Even if you think back ten years, the web was different. And it is possible to build new stuff. The stuff we have now was built by someone. And we can still do that. For some reason I think we stopped trying to build new websites and new web services. But that is still doable and, if anything, easier than before.



Funding

Marginalia Search received funding through the Entrust Fund for Trustworthiness and data sovereignty. The funds are established by NLnet.nl with financial support from the European Commission's Next Generation Internet programme.

Do you also have an open source project that needs funding? You can apply for one of the theme funds of NLnet.

