If you read security news, you may have heard about a recent attack that resulted in 144,000, that’s one hundred and forty four THOUSAND packages being uploaded to NuGet, PyPI, and NPM. That’s a mind boggling number, it seems like with all the supply chain news it would be all anyone is talking about. Except it seems to have flared up quickly then died right down.

The discovery of this attack was made by Checkmarx. Essentially what happened was attackers created a number of accounts in the NuGet, PyPI, and NPM packaging ecosystems. Those fake accounts then uploaded a huge number of packages that linked to phishing sites in the package description. The intention seems to have been to improve search ranking of those sites as well as track users that enter sensitive details.

Supply chain security is an overused term

This concept called “supply chain security” is a term that  is very overused these days. What we tend to call software supply chain security is many other things that are sometimes hard to describe. Reproducible builds, attestation, source code control, slim containers are a few examples. An attack like this can’t be solved with the current toolset we have which is almost certainly why it’s not getting the attention it deserves. It’s easy to talk about something an exciting project or startup can help with. It’s much harder to understand and fix systemic problems.

Why this one is different

To understand why this is so different and hard, let’s break this problem down into its pieces. The first part of the attack is the packaging ecosystems. The accounts in question were valid, they weren’t hacked or trying to impersonate someone else. The various packaging ecosystems have low barriers to entry, this is why we all use them and why they are so incredible. In these packaging ecosystems we see new accounts and packages all the time. In fact there are thousands of new packages added every day. There’s nothing unexpected if an attacker creates many accounts, no alarm bells would be expected. Once an account exists, it can start adding packages.

The second piece of this attack is someone has to download the package in question. It should be pointed out that in this particular instance the actual package content isn’t malicious, but it’s safe to say nobody wants any of these packages in their application. The volume of bad packages are important for this part of the attack. Developers will accidentally typo package names, or they might stumble on a package thinking it solves whatever their problem is. Or they might just have bad luck and install something by accident. Again, this is working within the constraints of the system. So far nothing happening is outside of everyday operations.

Then the last part of this attack is how it gets cleaned up. The packaging ecosystems have stellar security teams working behind the scenes. As soon as they find a bad package it gets delisted. It’s rare for these malicious packages to last more than a few days once they are discovered. Quickly removing packages is the best course of action. But again, the existing supply chain security solutions won’t pick up any of these happenings at this time. When a package is delisted, it just vanishes. How do you know if any of the packages you already installed are a problem? What if your artifact registry just cached a malicious package? It can be difficult to understand if you have a malicious package installed.

How should this work?

How we detect these problems is where things start to get really hard. There will be calls for the packaging ecosystems to lock down their environments, that’s probably a bad idea. The power of open source is how fast and easy it is to collaborate. Putting up walls won’t solve this, it just moves the problem somewhere else, often in a way that hides the real issues.

We have existing databases that track vulnerabilities and bad packages, but they can’t handle this scale today. There are examples of malicious packages listed in OSV and GitHub’s vulnerability database. Other databases like CVE have explicitly stated they don’t want to track this sort of malware. Just knowing where to look and how to catalog these malicious packages isn’t simple, yet it’s an ongoing problem. There have been several instances of malicious packages just this year.

To understand the scale of this data, the CVE project has existed since 1999 and there are about 200,000 IDs total at the end of 2022. Adding 144,000 new IDs would be significant.

At the end of the day, the vulnerability databases are where this data needs to exist. Creating a new way to track malicious packages and expecting everyone to watch it just creates new problems. We are good at finding and fixing vulnerabilities in our software, this is fundamentally the same problem. Malicious packages are no different than vulnerabilities. We also need to keep in mind this will continue to happen.

There are a huge number of tools that exist and parse vulnerability databases, then alert developers. Alerting developers is exactly what these datasets and tools were built for, but none of them are picking up this type of supply chain problem today. If we add this data to the existing data all the pieces can fall into place with minimal disruption.

What can we do right now?

A knee jerk reaction to an event like this is to create constraints on developers in an attempt to only use trusted packages. While that can work, it’s always important to remember that when you create constraints for a person, they become more creative. Using curated open source repositories will need ongoing maintenance. If you just make pulling new packages harder without the ability to quickly add new packages, the developers will find another way.

At the moment there’s no good solution for detecting these packages. The best option is to generate a software bill of materials (SBOM) for all of your software, then look for the list of known bad packages against what’s in the SBOMs. In this particular case even if you have one of these packages in your environment, it will be harmless. But the purpose of this post is to explain the problem so the community can have informed conversations. This is about starting to work together to solve hard problems.

In the future we need to see lists of these known malicious packages cataloged somewhere. It’s boring and difficult work though, so it’s unlikely to get much attention. This is the equivalent of buried utilities that let modern society function. Extremely important, but not something that turns many heads unless it goes terribly wrong.

There’s no way any one group can solve this problem. We will need a community effort. Everyone from the packaging ecosystems, to the vulnerability databases, to the tool manufacturers, and even the security researchers all need to be on the same page. There are efforts underway to help with this. OSV and GitHub allow community contributions. The OpenSSF has a Securing Software Repos working group. The Cloud Security Alliance has the Global Security Database. These are some of the places to find or generate productive and collaborative conversations that can drive progress that hinders use of malicious packages in the software supply chain.