Website Planet had the chance to interview Avi Press, a software developer with over a decade of experience at companies like Pandora, who has been the founder and CEO of Scarf since 2019.
He explained how he started Scarf, a disruptive platform that empowers developers to unlock previously inaccessible, and investible, insights around open source project adoption.
We will discover the secrets behind the success of Scarf, and what opportunities he sees in the near future for digital entrepreneurs.
What is your story before Scarf?
I had my first experience with computing and coding in college. After graduating, I became a software engineer at Pandora and found myself building open source tools in my spare time. Like everyone else, I put them on GitHub to show what I was working on. Then people at bigger companies started using those tools, and this looked like a potential business opportunity.
However, I found it very challenging to navigate what came next. It wasn’t just about finding people who might be interested in these tools; it was about finding people who might be willing to pay for them or for support, so I could build revenue and create a sustainable company. It was hard to tell who was actually getting value and using the tools in production, as opposed to the noise of people liking the project or saying nice things about it.
Today, a lot of my work in and around the open source community is about how we can help open source projects and maintainers be more financially successful. When you produce a valuable project, you may have an opportunity to build a business alongside it and concentrate on that project full time. The opportunity here is that open source maintainers can be paid more fairly for their work and for the value they create.
What inspired you to start Scarf, and how did you validate the business idea?
Scarf started from my own pain points as an open source maintainer. I had projects that were out there and being used, but I didn’t know how they were being used, how much they were being adopted, or who was using them, unless those users actually got in touch and told me.
Why was this problematic?
This lack of visibility into the usage of my own software made it difficult to make informed decisions. To prioritize any given bug fix, it’s useful to know how many users are affected by it. To make the software better, it’s useful to know where people get stuck and simply give up without ever getting in touch with you. If building a sustainable business around your open source is a goal, it’s crucial to know which companies rely on the software and how they use it.
No matter what a project’s goals might be, making decisions in the dark is never optimal for anyone.
I created Scarf as a company to help open source creators understand how their work is being used so they can be more successful, whatever that means to them. Open source is so pervasive that commercial opportunities to build sustainable businesses around it are actually quite common, but creators have to know who is using their tools in order to proactively build those commercial relationships. At Scarf, we see commercial opportunities for many open source projects that are not being taken advantage of because maintainers have no idea who’s using their code.
I believe that the OSS community as a whole benefits from this approach to data and creating more data-sharing initiatives amongst ourselves. When end users share info with maintainers about how they are using that software, maintainers can see where real-life users see problems and then take steps to fix those issues. Over time we’re seeing that maintainers benefit from having better usage metrics about their software, and that’s something that can benefit everyone downstream from them.
This data exists, but it’s often held by other organizations rather than by the open source project owner. Maintainers have to rely on other signals as proxies for real-world usage, and those proxies are not reliable enough. My take is that the OSS community would see tremendous benefits from this kind of data, depending on how we share anonymous usage data with each other. This discussion is just starting in the industry as a whole, and we need more conversations about how to share data in the right ways so that everyone benefits.
How do you ensure that Scarf stands out from the competition?
It’s all about connecting open source developers to their users. Getting usage data into the hands of developers and project owners helps them make better decisions. For instance, you can ask questions that help you concentrate your efforts where they are needed most, like which versions of your software are in use and how that changes over time. This information is very useful for maintenance, even separately from monetization. Better-informed maintainers lead to better open source software for everyone.
People need to see the impact of their work. Until they can, open source maintainers will be at a disadvantage compared to those building proprietary software, where this kind of data is gathered as standard. It’s my goal to help maintainers be more successful. Scarf can correlate data so that developers can learn which companies use their libraries and get more insight into the impact their projects have in the real world.
Whether you have commercial ambitions or not, having the tools to see the impact of your work helps everyone find more success. It helps us all maintain projects more effectively. Before, every time I got a GitHub notification, I wasn’t sure how to prioritize it. But when a user comes to you and says, “hey, this thing is broken in a subtle way on Windows,” and you can then see that this might affect a significant percentage of your overall user base, you can prioritize accordingly. That single report might lead to work that helps dozens or even hundreds of other users.
Being able to make data-driven decisions is a powerful way to help create an open source community where maintainers are less burnt out because they spend their time on the right things.
What challenges have you faced in starting out, and how did you overcome them?
The trade-off to accessing this data is that we need to be very responsible and respectful around data privacy and data collection. We have put a huge amount of effort into this, such as making sure that we don’t collect personally identifiable information (PII), detailing our approach to metadata, and ensuring there are ample ways to opt out.
There are some trade-offs to think about so that we can provide more useful data to maintainers and protect end-user privacy at the same time. We’ve prioritized getting the basics to maintainers that can help them with maintenance and help them identify commercial companies that are using their work, without sacrificing any of that privacy.
This work has enabled us to make the privacy guarantees that projects require when they operate within established open source foundations like The Apache Software Foundation, the Cloud Native Computing Foundation, and others.
The immediate problem we are looking at is something that no one else is working on in the way that we are. The best registries in terms of sharing data with maintainers are probably crates.io, run by the Rust Foundation, and PyPI. Crates.io provides information like version adoption to package maintainers, but there’s much more data available that could be useful to maintainers too. These distribution channels can be a black box for the people building the software, so providing more data can help, and that is where Scarf comes in.
We are building on this with more information sources and ways to help maintainers. For example, documentation is essential to developers. Great docs help adoption, but how do you know that your resources are actually being used? Equally, how do you know when you have a gap in your resources, or which places are confusing people the most? We built Documentation Insights to help solve this problem, so that maintainers can see how their docs are being used.
Similarly, we have worked on helping developers manage software distribution and repositories. You might use only one service to host your project, or you may want to host your software on multiple sites. How can you find out who is downloading what, and get that information in one place, when you have packages or container images in multiple repositories? Our own gateway makes it easier to correlate all that data and serve your customers or users, and we have added services like automatic registry mirroring and failover. We can keep copies of all your containers cached and ready to go, and serve them should your registry go down; that’s a service we’ll charge for.
What strategies have been most effective to promote and grow your business so far?
Our growth is based on data. We have started publishing a quarterly report that goes into trends in open source usage over time, and that is popular with a lot of different audiences. It shares information on best practices that are developing, as well as where the demand for open source is growing fastest over time. Alongside this, we create our own podcast and video content with some of the leading figures in the world of open source, sharing their knowledge and their experiences around how to create and run sustainable and successful businesses that are open source at heart.
Those efforts help us grow and reach more open source maintainers and project owners, helping them see where they might benefit from looking at data in more detail. Our goal is to make all the processes around software easier, from delivery and maintenance to support and commercialization of any given piece of software.
The Scarf toolchain will be the platform facilitating every aspect of distribution, and that’s how Scarf will make money in the long run.
What tools in your tech stack are helping you the most, and how?
Most of it is open source!
One tool that may surprise you… Our entire backend is written in Haskell. I could emphasize its type safety, the ability to refactor quickly and confidently to meet our needs, or even its hiring advantages, but at the end of the day it’s just a language that brings us joy to write.
AWS could be argued to be helping us the most; we are certainly paying them the most. Our infrastructure is globally distributed, secured, and ready to grow with us.
There are too many other tools to name, but they include Docker, Terraform, Nix, Kubernetes, GitHub, Vercel, Honeycomb…
What opportunities and challenges do you see in the future of your industry?
I think our biggest opportunity has also been our biggest challenge. We want to empower developers to unlock previously inaccessible insights around open source project adoption, and assist those maintainers to find support or investment when it would help the community grow or build a company. However, we have to understand the attitudes that open source developers have around gathering and using data, and what they think about data privacy.
People hold lots of opinions about data privacy, and those opinions can be inconsistent depending on who is gathering data, how it’s being gathered, and what it will be used for. For instance, npm collects useful data about who is downloading packages; scarf-js collects the same data, but people did not respond positively because the data collection happened at an unexpected time, during an `npm install`. However, when we talked to those who expressed concerns about scarf-js, they were comfortable with tracking techniques they were already used to, such as web analytics and pixel tracking. Those techniques go far beyond what scarf-js does.
What did this teach us? We learned never to make assumptions about people’s expectations when it comes to privacy. Still, it was surprising to learn that people were more comfortable with what became our Documentation Insights product than with scarf-js. This was definitely something we didn’t anticipate.
We are pushing the envelope with data-sharing initiatives in open source, and we will continue to listen to what people say. That means we have to be very, very careful with the data we collect, the data we store, and the data we expose. At the same time, I believe this data can be a force for good in open source, helping maintainers understand how their projects are being used.
Over time, I think the rise of no-code and low-code development will see more participation from non-engineers in software development. To help in this, we have to get designers and other creators more integrated into the actual development of software and how that process can work more efficiently for everyone.
This has to be available for everyone too, so that developers and line of business teams can deliver what they need. This links into the overall requirement for open source software in general – there is a lot more to open source than just the source code, and integrating more of the people involved with the actual life cycle of the software is necessary for this to be successful. That’s definitely the direction we’re headed, and we need data to help this take place.
Any last piece of advice for aspiring online entrepreneurs?
My advice? Just go for it! From an open source perspective, keep contributing to projects and learn how things work for communities over time. Even if you’re submitting a fix to a typo in a readme, just send the pull request!
One of the best parts of working in open source is bringing new developers in and helping them with their first contribution. People are happy to help when you want to learn and get involved. Even if you are not a developer, you can get involved in open source. The barrier to entry for fixing documentation or making updates is relatively low, and it’s a lightweight way to get started that people appreciate. Getting in and talking to people, showing up at a new project that interests you, and saying, “I’m new, is there anything I can help with?” can quickly get you embedded in a community and providing value.
It’s also important to think about how to make this work sustainable. Work should always be compensated, one way or another. People should own their work and be compensated for it, and the same goes for open source: even if you are providing your software for free, you will still need to support your community and pay the bills, so that you can keep things going and keep innovation happening.
Roberto has spent over a decade helping affiliate blogs and cybersecurity companies increase revenue through Digital PR and conversion-focused content marketing. Somehow he’s still able to live the digital nomad life in between linkbuilding campaigns, content audits and SEO strategy calls.