Open Source is Bigger Than You Can Imagine

If we pay attention to the news lately, we hear that supply chain security is the most important topic ever and that we need to start doing something right now. But the term “supply chain security” isn’t well defined. The real challenge is understanding open source. Open source is in everything now. There is no supply chain problem; there is an understanding-open-source problem.

Log4Shell was our Keyser Söze moment

There’s a scene in the movie “The Usual Suspects” where the detective realizes everything he was just told has been a lie. His entire world changed in an instant. It was a plot twist not even the audience saw coming. Humans love plot twists and surprises in our stories, but not in real life. Log4Shell was a plot twist in real life. It was not a fun time.

Open source didn’t take over the world overnight. It took decades. It was a silent takeover that only the developers knew about. Until Log4Shell. When Log4Shell happened everyone started looking for Log4j and they found it, everywhere they looked. But while finding Log4j, we also found a lot more open source. And I mean A LOT more. Open source was in everything, both the software acquired from other vendors and the software built in house. Everything from what’s running on our phones to what’s running the toaster. It’s all full of open source software.

Now that we know open source is everywhere, we should start to ask what open source really is. It’s not what we’ve been told. There’s often talk of “the community”, but there is no community. Open source is a vast collection of independent projects. Some of these projects are worked on by Fortune 100 companies, some by scrappy startups, and some are just a person in their basement who only can work on their project from 9:15pm to 10:05pm every other Wednesday. And open source is big. We can steal a quote from Douglas Adams’ Hitchhiker’s Guide to the Galaxy to properly capture the magnitude of open source:

“[Open source] … is big. Really big. You just won’t believe how vastly hugely mind-bogglingly big it is. I mean, you may think it’s a long way down the road to the chemist, but that’s just peanuts to [open source].”

The challenge for something like open source isn’t just claiming it’s big. We all know it’s big. The challenge is showing how mind-bogglingly big it is. Imagine the biggest thing we can; open source is bigger.

Let’s do some homework.

The size of NPM

For the rest of this post we will focus on NPM, the Node Package Manager. NPM is how we would install dependencies for our Node.js applications. The reason this data was picked is it’s very easy to work with, it has good public data, and it’s the largest package ecosystem in the world today.

It should be said that NPM isn’t special in the context of the data below. If we compare these graphs to Python’s PyPI, for example, we see very similar shapes, just not as large. In the future we may explore other packaging ecosystems, but fundamentally it’s going to look a lot like this. All of this data was generated using the scripts stored in GitHub; the repo is aptly named npm-analysis.

Let’s start with the sheer number of NPM package releases over time. It’s a very impressive and beautiful graph.

This is an incredible number of packages. At the time of capturing data, there were 32,600,904 packages. There are of course far more now, just look at the growth. By packages, we mean every version of every package released. There are about 2.3 million unique packages, but when we take those packages times all the released versions, we end up with over 32 million.

It’s hard to imagine how big this really is. There was a proposal recently that suggested we could try to conduct a security review on 10,000 open source projects per year. That number alone would need thousands of people to accomplish. But even at 10,000 projects per year, it would take more than 3,000 years to get through just NPM at its current size. And that ignores the fact that we’ve been adding more than 1 million packages per year. So doing some math … we will be done … never. The answer is never.
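The arithmetic behind that estimate is easy to check. Here is a minimal sketch that uses only the figures quoted in this post. The variable names are ours, and the growth figure is the rough “more than 1 million per year” lower bound, not a measured value.

```python
# Figures quoted in this post; the names are illustrative.
TOTAL_RELEASES = 32_600_904         # every version of every package at capture time
REVIEWS_PER_YEAR = 10_000           # the proposed review rate
NEW_RELEASES_PER_YEAR = 1_000_000   # rough lower bound on yearly growth

# Years needed to review the existing backlog if nothing new were published.
years_for_backlog = TOTAL_RELEASES / REVIEWS_PER_YEAR

# Net growth of the unreviewed backlog each year once new releases are counted.
backlog_growth_per_year = NEW_RELEASES_PER_YEAR - REVIEWS_PER_YEAR

print(f"Backlog alone: {years_for_backlog:,.0f} years")
print(f"Backlog still grows by {backlog_growth_per_year:,} releases per year")
```

The backlog alone works out to roughly 3,260 years, and because the backlog grows by about 990,000 releases in each of those years, the review never catches up.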

The people

As humans, we love to start creating reasons for this sort of growth. Maybe it’s all malicious packages, or spammers using NPM to sell vitamins. “It’s probably big projects publishing lots of little packages”, or “have you ever seen the amount of stuff in the React framework”? It turns out almost all of NPM is single maintainer projects. The graph below shows the number of maintainers for a given project. We see there are more than 18 million releases that list a single maintainer in their package.json file. That’s over half of all NPM releases ever having just one person maintaining them.
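To make the counting concrete, here is a small sketch of how releases could be grouped by the number of maintainers they list. It is not the code from the npm-analysis repo. The record layout (a `maintainers` list on each release) and the sample data are assumptions made up for illustration.

```python
from collections import Counter

def maintainer_histogram(releases):
    """Count releases by how many maintainers each one lists."""
    return Counter(len(release.get("maintainers", [])) for release in releases)

# Hypothetical sample records, not real registry data.
sample_releases = [
    {"name": "example-a", "version": "1.0.0", "maintainers": [{"name": "alice"}]},
    {"name": "example-a", "version": "1.0.1", "maintainers": [{"name": "alice"}]},
    {"name": "example-b", "version": "2.3.0",
     "maintainers": [{"name": "bob"}, {"name": "carol"}, {"name": "dave"}]},
]

histogram = maintainer_histogram(sample_releases)
for count, releases in sorted(histogram.items()):
    print(f"{count} maintainer(s): {releases} release(s)")
```

Note that this counts releases, not people, which is why the same maintainer can show up many times in a graph built this way.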

This graph shows a ridiculous amount of NPM is one person, or a small team. If we look at the graph on a logarithmic scale we can see what the larger projects look like, the linear graph is sort of useless because of the sheer number of one person projects.

These graphs contain duplicate entries when it comes to maintainers. There are many maintainers who have more than one project, it’s quite common in fact. If we filter the graph by the number of unique maintainers, we see this chart.

It’s a lot fewer maintainers, but we see the data is still dominated by single maintainer projects. In this data set we see 727,986 unique NPM maintainers. This is an amazing number of developers. This is a true testament to the power and reach of open source.

New packages

Now that we see there are a lot of people doing an enormous amount of work, let’s talk about how things are growing. We mentioned earlier that more than one million packages and versions are being added per year.

If this continues we’re going to be adding more than one million new packages per month soon.

Now, it should be noted this graph isn’t new packages, it’s new releases, so if an existing project releases five updates, it shows up in this graph all five times.

If we only look at brand new packages being added, we get the below graph. A moving average was used here because this graph is a bit jumpy otherwise. New projects don’t get added very consistently.

This shows us we’re adding fewer than 500,000 new projects per year, which is way better than one million! But still a lot more than 10,000.

The downloads

We unfortunately don’t have an impressive graph of downloads to show. The most NPM download data we can get covers one year, and it’s a single number per package; it’s not spread out by date.

In the last year, there were 130,046,251,733,027 NPM downloads. That feels like a fake number, 15 digits. That’s 130 TRILLION downloads. Now, that’s not spread out very evenly. The median downloads of a package are only 217. The bottom 5% are 71 downloads, and the top 5% are more than 16,000 downloads. It’s pretty clear the downloads are very uneven. The most popular projects are getting most of the downloads.

Here is a graph of the top 100 projects by downloads. It follows a very common power law distribution curve.

We probably can’t imagine what this download data over all time must look like. It’s almost certainly even more mind boggling than the current data set.

Most of these don’t REALLY matter

Nobody would argue if someone said that the vast majority of NPM packages will never see widespread use. Using the download data we can show 95% of NPM packages aren’t widely used. But the sheer scale is what’s important. 5% of NPM is still more than 100,000 unique packages. That’s a massive number. Even at our review rate of 10,000 packages a year, that’s more than ten years of work, and this is just NPM.

If we filter our number of maintainers graph to only include the top 5% of downloaded packages, it basically looks the same, just with smaller numbers.

Every way we look at this data, these trends seem to hold.

Now that we know how incredibly huge this all really is, we can start to talk about this supposed supply chain and what comes next.

What we can actually do about this

First, don’t panic. Then the most important thing we can do is to understand the problem. Open source is already too big to manage and growing faster than we can keep up. It is important to have realistic expectations. Before now many of us didn’t know how huge NPM was. And that’s just one ecosystem. There is a lot more open source out there in the wild.

There’s another quote from Douglas Adams’ Hitchhiker’s Guide to the Galaxy that seems appropriate right now:

‘“I thought,” he said, “that if the world was going to end we were meant to lie down or put a paper bag over our head or something.”

“If you like, yes,” said Ford.

“Will that help?” asked the barman.

“No,” said Ford and gave him a friendly smile.’

Open source isn’t a force we command, it is a resource for us to use. Open source also isn’t one thing, it’s a collection of individual projects. Open source is more like a natural resource. A recent report from the Atlantic Council titled Avoiding the success trap: Toward policy for open-source software as infrastructure compares open source to water. It’s an apt analogy on many levels, especially when we realize most of the surface of the planet is covered in water.

The first step to fixing a problem is understanding it. It’s hard to wrap our heads around just how huge open source is; humans are bad at grasping exponential growth. We can’t have an honest conversation about the challenges of using open source without first understanding how big it is and how fast it is growing. The intent of this article isn’t to suggest open source is broken, or bad, or should be avoided. It’s to set the stage to understand what our challenge looks like.

The importance and overall size of open source will only grow as we move forward. Trying to use the ideas of the past can’t work at this scale. We need new tools, ideas, and processes to face our new software challenges. There are many people, companies, and organizations working on this but not always with a grasp of the true scale of open source. We can and should help existing projects, but the easiest first step is to understand how big our open source use is. Do we know what open source we’re using?

Anchore is working on this problem every day. Come help with our open source projects Syft and Grype, or have a chat with us about our enterprise solution.

Josh Bressers
Josh Bressers is vice president of security at Anchore where he guides security feature development for the company’s commercial and open source solutions. He serves on the Open Source Security Foundation technical advisory council and is a co-founder of the Global Security Database project, which is a Cloud Security Alliance working group that is defining the future of security vulnerability identifiers.

Breaking Down NIST SSDF: Spotlight on PW.6 Compilers and Interpreter Security

In this part of the long-running series breaking down the NIST Secure Software Development Framework (SSDF), also known as the standard NIST 800-218, we are going to discuss PW.6. This control is broken into two parts, PW.6.1 and PW.6.2. These two controls are related and defined as:

PW.6.1: Use compiler, interpreter, and build tools that offer features to improve executable security.
PW.6.2: Determine which compiler, interpreter, and build tool features should be used and how each should be configured, then implement and use the approved configurations.

We’re going to lump both of these together for the purpose of this post. It doesn’t make sense to split these two controls apart when we are reviewing what this actually means, but there will be two posts for PW.6, and this is part one. Let’s start by looking at the examples for some hints on what the standard is looking for:

Examples for PW.6.1:
Example 1: Use up-to-date versions of compiler, interpreter, and build tools.
Example 2: Follow change management processes when deploying or updating compiler, interpreter, and build tools, and audit all unexpected changes to tools.
Example 3: Regularly validate the authenticity and integrity of compiler, interpreter, and build tools. See PO.3.

Examples for PW.6.2:
Example 1: Enable compiler features that produce warnings for poorly secured code during the compilation process.
Example 2: Implement the “clean build” concept, where all compiler warnings are treated as errors and eliminated except those determined to be false positives or irrelevant.
Example 3: Perform all builds in a dedicated, highly controlled build environment.
Example 4: Enable compiler features that randomize or obfuscate execution characteristics, such as memory location usage, that would otherwise be predictable and thus potentially exploitable.
Example 5: Test to ensure that the features are working as expected and are not inadvertently causing any operational issues or other problems.
Example 6: Continuously verify that the approved configurations are being used.
Example 7: Make the approved tool configurations available as configuration-as-code so developers can readily use them.

If we review the references, we find there’s a massive swath of suggestions. Everything from code signing to obfuscating binaries, to handling compiler warnings, to threat modeling. The net was cast wide on this one. Every environment is different. Every project or product uses its own technology. There’s no way to “one size fits all” this control. This is one of the challenges that has made compliance for developers so very difficult in the past. We have to determine how this applies to our environment, and the way we apply this control will be drastically different than the way someone else applies it.

We’re going to split this topic along the lines of build environments and compiler/interpreter security. For this blog, we are going to focus on using modern protection technology, specifically in compiler security and runtimes. Of course, you will have to review the guidance and understand what makes sense for your environment. Everything we discuss here is for example purposes only.

Compiler security
When we think about the security of applications, we tend to focus on the code itself. Security vulnerabilities are the result of attackers causing unexpected behavior in the code. Printing an unescaped string, adding or subtracting a very large integer. Maybe even getting the application to open a file it shouldn’t. We’ve all heard about memory safety problems and how hard they are to avoid in certain languages. C and C++ are legendary for their lack of memory protection. Our intent should be to write code that doesn’t have security vulnerabilities. The NSA and even Consumer Reports have recently come out against using memory unsafe languages. We can also lean on technology to help reduce the severity of memory safety bugs when we can’t abandon memory unsafe languages just yet, maybe never. There’s still a lot of COBOL out there, after all.

While attackers can exploit some bugs in ways that cause unexpected behavior, there are technologies, especially in compilers, that can lower the severity or even eliminate the danger of certain bug classes. For example, stack buffer overflows in C used to be a huge problem. Then we created stack canaries, which have reduced the severity of these bugs substantially.

Every compiler is different, every operating system is different, and every application is different, so all of this has to be decided for each individual application. For the purposes of simplicity, we will use gcc to show how some of these technologies work and how to enable them. The Debian Wiki Hardening page has a huge amount of detail; we’ll just cover some of the quick, easy things.

user@debian:~/test$ gcc -o overflow test-overflow.c
user@debian:~/test$ ./overflow
Segmentation fault
user@debian:~/test$ gcc -fstack-protector -o overflow test-overflow.c
user@debian:~/test$ ./overflow
*** stack smashing detected ***: terminated

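In the above example, we can see that enabling the gcc stack protector changes what happens at run time: instead of crashing with a segmentation fault after the stack has been corrupted, the program detects the overflow and terminates itself.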

Most of these protections will only reduce the severity of a very narrow group of bugs. These languages still have many other problems and moving away from a memory unsafe language is the best path forward. Not everyone can move to a memory safe language, so compiler flags can help.
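One way to approach “continuously verify that the approved configurations are being used” is a small check in the build pipeline. The sketch below is only an illustration: the set of approved flags is a hypothetical policy, and the function name is ours. It simply reports which approved flags are absent from a compiler command line.

```python
import shlex

# Hypothetical approved configuration; pick the flags that fit your toolchain.
APPROVED_FLAGS = frozenset({"-fstack-protector", "-Wall", "-Werror"})

def missing_flags(command, approved=APPROVED_FLAGS):
    """Return the approved flags that do not appear in a compiler command line."""
    used = set(shlex.split(command))
    return set(approved) - used

command = "gcc -fstack-protector -o overflow test-overflow.c"
print(sorted(missing_flags(command)))
```

For the command shown, the check reports that `-Wall` and `-Werror` are missing. A real pipeline would more likely read the command lines from build logs or a build system configuration file, which is outside the scope of this sketch.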

Compiler warnings are bugs
There was once a time when compiler warnings were ignored because they were just warnings. It didn’t really matter, or so we thought. Compiler warnings were just suggestions from the compiler; if there was time later, those warnings could be fixed. Except there is never time later. It turns out that sometimes those warnings are really important. They can be hints that a serious bug is waiting to be exploited. It’s hard to know which warnings are harmless and which are serious, so the current best practice is to fix them all to minimize vulnerabilities in your code.

If we use our example code, we can see:

user@debian:~/test$ gcc -o overflow test-overflow.c
test-overflow.c: In function 'function':
test-overflow.c:6:2: warning: '__builtin_memcpy' writing 24 bytes into a region of size 9 overflows the destination [-Wstringop-overflow=]
    6 |  strcpy(s, "This string is too long");
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We see a warning telling us our string is too long. The build doesn’t fail, but that’s not a warning you should ignore.
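The “clean build” idea can be enforced by the compiler itself (gcc’s `-Werror` turns warnings into errors), but it can also be checked after the fact. As a rough sketch, the following scans a build log for gcc-style warning lines like the one above. The regular expression and function name are our own and only cover the `file:line:column: warning:` shape shown in this post.

```python
import re

# Matches gcc-style diagnostics such as:
#   test-overflow.c:6:2: warning: ... [-Wstringop-overflow=]
WARNING_RE = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+):(?P<col>\d+): warning: (?P<msg>.*)$"
)

def find_warnings(build_log):
    """Return (file, line, message) for every warning line in a build log."""
    found = []
    for line in build_log.splitlines():
        match = WARNING_RE.match(line.strip())
        if match:
            found.append((match["file"], int(match["line"]), match["msg"]))
    return found

log = """test-overflow.c: In function 'function':
test-overflow.c:6:2: warning: '__builtin_memcpy' writing 24 bytes into a region of size 9 overflows the destination [-Wstringop-overflow=]
"""

warnings_found = find_warnings(log)
for file_name, line_number, message in warnings_found:
    print(f"{file_name}:{line_number}: {message}")
```

A build step could fail whenever the returned list is not empty, leaving room for an explicit allow list of warnings the team has judged to be false positives.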

Interpreted languages

The suggestion in the SSDF for interpreted languages is to use the latest interpreter. These languages are memory safe, but they are still vulnerable to logic bugs. Many of the interpreters are written in C or C++, so you could double check they are built with the various compiler hardening features enabled.

There aren’t often protections built into the interpreter itself. This goes back to the wide swath of guidance for this control. Programming languages have an infinite number of possible use cases, the problem set is too large to accurately protect. Memory safety is a very narrow set of problems that we still can’t get right. General purpose programming is an infinitely wide set of problems.

There were some attempts to secure interpreted languages in the past, but the hardening proved to be too easy to break to rely on as a security feature. PHP and Ruby used to have safe mode, but it turned out they weren’t actually safe. Compiler and interpreter protections are hard to make effective in meaningful ways.

The best way to secure interpreted languages is to run code in sandboxes using things like virtualization and containers. Such guidance won’t be covered in this post. In fact, SSDF doesn’t have guidance on how to run applications securely; SSDF focuses on development. There is plenty of other guidance on that, and we’ll make sure to cover it once the SSDF series is complete.

This complexity and difficulty are almost certainly why the SSDF guidance is to just run the latest interpreter. The latest interpreter version will include the fixes for known bugs, security or otherwise.
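A simple guard can catch an outdated interpreter before any code runs. This is a minimal sketch for Python only, and it checks against a minimum version, which is an approximation of “latest”. The minimum shown is a hypothetical policy value, not a recommendation.

```python
import sys

# Hypothetical policy value; set it to the oldest release your team approves.
MINIMUM_VERSION = (3, 10)

def is_approved(version, minimum=MINIMUM_VERSION):
    """True when the (major, minor) part of a version meets the policy minimum."""
    return tuple(version[:2]) >= tuple(minimum)

if not is_approved(sys.version_info):
    print("Interpreter is older than the approved minimum")
```

The same idea applies to other interpreters; only the way the version is read changes.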

Wrapping up
As we can see from this post, optimizing compiler and runtime security isn’t a simple task. It’s one of those things that can feel easy, but it’s really not. The devil is in the details. The only real guidance here is to figure out what works best in your environment and go with that.

If you missed the first post in this series, you can view it here. Next time we will discuss build systems. Build systems have been a popular topic over the last few years as they have been targets for attackers. Luckily for us there is some solid guidance we can draw upon for securing a build system.

Josh Bressers

Anchore Adds Support for NIST 800-218 SSDF

This blog post has been archived and replaced by the supporting pillar page that can be found here:


Why is this massive supply chain attack being ignored?

If you read security news, you may have heard about a recent attack that resulted in 144,000, that’s one hundred and forty four THOUSAND, packages being uploaded to NuGet, PyPI, and NPM. That’s a mind boggling number. With all the supply chain news, it seems like it would be all anyone is talking about. Except it seems to have flared up quickly then died right down.

The discovery of this attack was made by Checkmarx. Essentially what happened was attackers created a number of accounts in the NuGet, PyPI, and NPM packaging ecosystems. Those fake accounts then uploaded a huge number of packages that linked to phishing sites in the package description. The intention seems to have been to improve search ranking of those sites as well as track users that enter sensitive details.

Supply chain security is an overused term

“Supply chain security” is a very overused term these days. What we tend to call software supply chain security is many other things that are sometimes hard to describe. Reproducible builds, attestation, source code control, and slim containers are a few examples. An attack like this can’t be solved with the current toolset we have, which is almost certainly why it’s not getting the attention it deserves. It’s easy to talk about something an exciting project or startup can help with. It’s much harder to understand and fix systemic problems.

Why this one is different

To understand why this is so different and hard, let’s break this problem down into its pieces. The first part of the attack is the packaging ecosystems. The accounts in question were valid, they weren’t hacked or trying to impersonate someone else. The various packaging ecosystems have low barriers to entry, this is why we all use them and why they are so incredible. In these packaging ecosystems we see new accounts and packages all the time. In fact there are thousands of new packages added every day. There’s nothing unexpected if an attacker creates many accounts, no alarm bells would be expected. Once an account exists, it can start adding packages.

The second piece of this attack is someone has to download the package in question. It should be pointed out that in this particular instance the actual package content isn’t malicious, but it’s safe to say nobody wants any of these packages in their application. The volume of bad packages is important for this part of the attack. Developers will accidentally typo package names, or they might stumble on a package thinking it solves whatever their problem is. Or they might just have bad luck and install something by accident. Again, this is working within the constraints of the system. So far nothing happening is outside of everyday operations.

Then the last part of this attack is how it gets cleaned up. The packaging ecosystems have stellar security teams working behind the scenes. As soon as they find a bad package it gets delisted. It’s rare for these malicious packages to last more than a few days once they are discovered. Quickly removing packages is the best course of action. But again, the existing supply chain security solutions won’t pick up any of these happenings at this time. When a package is delisted, it just vanishes. How do you know if any of the packages you already installed are a problem? What if your artifact registry just cached a malicious package? It can be difficult to understand if you have a malicious package installed.

How should this work?

How we detect these problems is where things start to get really hard. There will be calls for the packaging ecosystems to lock down their environments, that’s probably a bad idea. The power of open source is how fast and easy it is to collaborate. Putting up walls won’t solve this, it just moves the problem somewhere else, often in a way that hides the real issues.

We have existing databases that track vulnerabilities and bad packages, but they can’t handle this scale today. There are examples of malicious packages listed in OSV and GitHub’s vulnerability database. Other databases like CVE have explicitly stated they don’t want to track this sort of malware. Just knowing where to look and how to catalog these malicious packages isn’t simple, yet it’s an ongoing problem. There have been several instances of malicious packages just this year.

To understand the scale of this data, the CVE project has existed since 1999 and there were about 200,000 IDs total at the end of 2022. Adding 144,000 new IDs would be significant.

At the end of the day, the vulnerability databases are where this data needs to exist. Creating a new way to track malicious packages and expecting everyone to watch it just creates new problems. We are good at finding and fixing vulnerabilities in our software, this is fundamentally the same problem. Malicious packages are no different than vulnerabilities. We also need to keep in mind this will continue to happen.

There are a huge number of tools that exist and parse vulnerability databases, then alert developers. Alerting developers is exactly what these datasets and tools were built for, but none of them are picking up this type of supply chain problem today. If we add this data to the existing data all the pieces can fall into place with minimal disruption.

What can we do right now?

A knee jerk reaction to an event like this is to create constraints on developers in an attempt to only use trusted packages. While that can work, it’s always important to remember that when you create constraints for a person, they become more creative. Using curated open source repositories will need ongoing maintenance. If you just make pulling new packages harder without the ability to quickly add new packages, the developers will find another way.

At the moment there’s no good solution for detecting these packages. The best option is to generate a software bill of materials (SBOM) for all of your software, then check the list of known bad packages against what’s in the SBOMs. In this particular case even if you have one of these packages in your environment, it will be harmless. But the purpose of this post is to explain the problem so the community can have informed conversations. This is about starting to work together to solve hard problems.
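Checking an SBOM against a list of known bad packages can be a very small amount of code once both pieces of data exist. The sketch below assumes an SPDX-style JSON document with a top-level `packages` list whose entries carry `name` and `versionInfo` fields. The package names and the bad list are invented for illustration.

```python
def find_known_bad(sbom, bad_names):
    """Return (name, version) for SBOM packages whose name is on the bad list."""
    hits = []
    for package in sbom.get("packages", []):
        if package.get("name") in bad_names:
            hits.append((package["name"], package.get("versionInfo", "unknown")))
    return hits

# Hypothetical data: a trimmed SPDX-style document and an invented bad list.
sbom = {
    "packages": [
        {"name": "example-http-lib", "versionInfo": "2.28.1"},
        {"name": "free-gift-cards-example", "versionInfo": "1.0.0"},
    ]
}
known_bad = {"free-gift-cards-example"}

print(find_known_bad(sbom, known_bad))
```

In practice the SBOM would be loaded with `json.load` from a generated file, and the hard part is the one this post describes: getting a trustworthy, current list of bad package names in the first place.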

In the future we need to see lists of these known malicious packages cataloged somewhere. It’s boring and difficult work though, so it’s unlikely to get much attention. This is the equivalent of buried utilities that let modern society function. Extremely important, but not something that turns many heads unless it goes terribly wrong.

There’s no way any one group can solve this problem. We will need a community effort. Everyone from the packaging ecosystems, to the vulnerability databases, to the tool manufacturers, and even the security researchers all need to be on the same page. There are efforts underway to help with this. OSV and GitHub allow community contributions. The OpenSSF has a Securing Software Repos working group. The Cloud Security Alliance has the Global Security Database. These are some of the places to find or generate productive and collaborative conversations that can drive progress that hinders use of malicious packages in the software supply chain.

Josh Bressers

Breaking Down NIST SSDF: Spotlight on PS.3.2

This is the second post in a long running series to explain the details of the NIST Secure Software Development Framework (SSDF), also known as the standard NIST 800-218. You can find more details about the SSDF on the NIST website.

Today we’re going to cover control PS.3.2, which is defined as:

PS.3.2: Collect, safeguard, maintain, and share provenance data for all components of each software release (e.g., in a software bill of materials [SBOM]).

This one sounds really simple, we just need an SBOM, right? But nothing is ever that easy, especially in the world of cybersecurity compliance.

Let’s break this down into multiple parts. Nearly every word in this framework is important for a different reason. The short explanation is we need data that describes our software release. Then we need to safely store that data. It sounds simple, but like many things in our modern world of technology, the devil is in the details.

Start with the SBOM

Let’s start with an SBOM. Yes, you need an SBOM. That’s the provenance data. There are many ways to store release data, but the current expectation across the industry is that SBOMs will be the primary document. The intent is we have the ability to receive and give out SBOMs. For the rest of this post we will put a focus on how to meet this control using an SBOM and SBOM management.

It doesn’t matter how fast or slow the release process is, every time you ship or deploy software, you need an SBOM. For most of us the days of putting out a release every few years are long gone; almost everyone is releasing software at a breakneck pace. Humans cannot be a part of this process, because humans are slow and make mistakes. To solve the challenge of SBOM automation, we need, well, automation. SBOMs should be generated automatically during the development process. There are many different ways to accomplish this; here at Anchore we’re pretty partial to the Syft SBOM generator. We will be using Syft in our examples, but there are many ways to create this data.

Breaking it Down

Creating an SBOM is the easiest step of meeting this control. Let’s say we have a container we need an SBOM for, and use the Grype container as our example. It can be as easy as running

syft -o spdx-json anchore/grype:latest

and we have an SBOM of the Grype container image in the SPDX format. In this example we generated an SBOM from a container in the Docker registry, but there’s no reason to wait for a container to be pushed to the registry to generate an SBOM. You can add Syft into the build process. For example, you can see a Syft GitHub action that does this step automatically on every build. There are even ways to include the SBOM in the registry metadata now.

Once we have our SBOMs generated, keep in mind that the ‘s’ is important: you are going to have a lot of SBOMs. Some applications will have one, some will have multiple. For example, if you ship three container images for the application you will end up with at least three SBOMs. This is why the word “collect” exists in the control. Collecting all the SBOMs for a release is important. Collecting really just means making sure you can find the SBOMs that were automatically generated. In our case, we would collect and store the SBOMs in Anchore Enterprise. It’s a tool that does a great job of keeping track of a lot of SBOMs. More details can be found on the Anchore Enterprise website.

Protect the Data Integrity

After the SBOMs are collected, we have to safeguard the SBOMs’ contents. The word safeguard isn’t very clear. One of the examples states “Example 3: Protect the integrity of provenance data, and provide a way for recipients to verify provenance data integrity.” This seems pretty straightforward. It would be dishonest to make the claim “just sign the SBOM and you’re done” because digital signatures are still hard.

It’s probably best to use whatever mechanisms you use to safeguard your application artifacts to also safeguard the SBOM. This could be digital signatures. It could be a read only bucket storage over HTTPS. It could be checksum data available out of band. Maybe just a system that provides audit logs of when data changes. There’s no single way to do this and unfortunately there’s no good advice that can be handed out for this step. Be wary of anyone claiming this is a solved problem today. The smart folks working on Syft have some ideas on how to deal with this.
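As one illustration of the “checksum data available out of band” option, here is a minimal Python sketch that computes a SHA-256 digest for an SBOM file and compares it to a checksum the recipient obtained through some other channel. The function names are hypothetical, and this is only a sketch: a checksum detects changes to the file, but it is only as trustworthy as the channel the checksum itself came from.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sbom_is_intact(path, expected_hex):
    """Compare the file's digest to a checksum obtained out of band."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())
```

The publisher would record `sha256_of("app.spdx.json")` somewhere separate from the SBOM itself, and a recipient would call `sbom_is_intact` with that value after downloading the file.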

We are also expected to maintain the SBOMs we are now collecting and safeguarding. This one seems easy since, in theory, an SBOM is a static document, but “maintain” could be interpreted in several ways. The NIST glossary doesn’t define maintain, but it does define maintenance as “Any act that either prevents the failure or malfunction of equipment or restores its operating capability.” It’s safe to say the intent is to make sure the SBOMs are available now and in perpetuity. In a fast moving industry it’s easy to forget that two or more years from now the data in an SBOM could be needed by customers, auditors, or even forensic investigators. On the other side of that coin, it’s just as possible that in a few years what passes as an SBOM today won’t be considered an SBOM. Maintaining SBOMs should not be disregarded as unimportant or simple. You should find an SBOM management system that can store SBOMs and convert between SBOM formats as a way to future proof the documents.

There are new products coming to market that can help with this maintain stage. They are being touted as SBOM management platforms. Anchore Enterprise is a product that does this. There are also open source alternatives such as Dependency Track. There will no doubt be even more of these tools in the future as SBOM use increases and the market matures.

Lastly, and possibly most importantly, we have to share the SBOMs.

One idea about SBOMs that keeps coming up is that every SBOM needs to be available to the public. It comes up on a pretty regular basis and is a point of confusion, and CISA specifically covers it in their SBOM FAQ. You get to decide who can access an SBOM. You can distribute an SBOM only to your customers, you can distribute it to the public, or you can keep it internal only. Today there isn’t a well defined way to distribute SBOM data. Many ecosystems have their own ways of including SBOM data. For example, in the world of containers, registries are putting them in metadata. Even GoReleaser lets you create SBOMs. Depending on how your product or service is accessed, there may not be a simple answer to this question.

One solution could be having customers email support asking for a specific SBOM. Maybe you have the SBOM available in the same place customers download your application or log in to your service. You can even just package the SBOM up into the application, like a file in a zip archive. Once again, the guidance does not specifically tell us how to accomplish this.
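The “file in a zip archive” option could look something like the following Python sketch, which writes the application files and the SBOM into a single archive. The function name `bundle_with_sbom` and the default file name `sbom.spdx.json` are hypothetical choices for this example; the guidance does not prescribe any particular name or location.

```python
import zipfile


def bundle_with_sbom(archive_path, app_files, sbom_bytes, sbom_name="sbom.spdx.json"):
    """Write application files ({name: bytes}) plus the SBOM into one zip archive."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        for name, data in app_files.items():
            bundle.writestr(name, data)
        # The SBOM travels alongside the files it describes.
        bundle.writestr(sbom_name, sbom_bytes)
    return archive_path
```

Anyone who downloads the archive then gets the SBOM without having to ask for it, which is the simplest form of “share”.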

Pro Tip: Make sure you include instructions for anyone downloading the SBOM on how to verify the integrity of your application and your SBOM. PS.3.1 talks about how to secure the integrity of your application and we’ll cover that in a future blog post.

Final Thoughts

This is one control out of 42. It’s important to remember this is a journey, not a one and done sort of event. We have many more blog posts to share on this topic, and a lot of SBOMs to generate. Like any epic journey, there’s not one right way to get to the destination.

Everyone has to figure out how they want to meet each NIST SSDF control, ideally in a way that is valuable to the organization as well as customers. Processes that create unnecessary burden will always end up worked around, and processes integrated into existing workflows are far less cumbersome. Let’s aim high and produce verifiable components that not only meet NIST compliance, but also ease the process for downstream consumers.

To sum it all up, you need to create SBOMs for every release, safeguard them the same way you safeguard your application, store them in a future proof manner, and be able to share the SBOMs. There’s no one way to do any of this. If you have any questions, subscribe to our newsletter for monthly updates on software supply chain security insights and trends.

Josh Bressers
Josh Bressers is vice president of security at Anchore where he guides security feature development for the company’s commercial and open source solutions. He serves on the Open Source Security Foundation technical advisory council and is a co-founder of the Global Security Database project, which is a Cloud Security Alliance working group that is defining the future of security vulnerability identifiers.

Ask Me Anything: SBOMs and the Executive Order

The software supply chain is under intense pressure and scrutiny with the rise of malicious attacks that target open source software and components. Over the past year the industry has received guidance from the government with the Executive Order on Improving the Nation’s Cybersecurity and the most recent M-22-18 Enhancing the Security of the Software Supply Chain through Secure Software Development Practices. Now, perhaps more than ever before, it’s critical to have a firm understanding of the integrity of your software supply chain to ensure a strong security posture. This webinar will provide you with open access to a panel of Anchore experts who can discuss the role of a software bill of materials (SBOM) and answer questions about how to understand and tackle government software supply chain requirements.

An Introduction to the Secure Software Development Framework

It’s very likely you’ve heard of a new software supply chain memo from the US White House that came out in September 2022. The content of the memo has been discussed at length by others. The actual memo is quite short and easy to read, so you wouldn’t regret just reading it yourself.

The very quick summary of this document is that everyone working with the US Government will need to start following NIST 800-218, also known as the NIST Secure Software Development Framework, or SSDF. This is a good opportunity to talk about how we can start to do something with SSDF today. For the rest of this post we’re going to review the actual SSDF standard and start creating a plan for tackling what’s in it. The memo isn’t the interesting part; SSDF is.

This is going to be the first of many, many blog posts as there’s a lot to cover in the SSDF. Some of the controls are dealt with by policy. Some are configuration management, some are even software architecting. Depending on each control, there will be many different ways to meet the requirements. No one way is right, but there are solutions that are easier than others. This series will put extra emphasis on the portions of SSDF that deal with SBOM specifically, but we are not going to ignore the other parts.

An Introduction to the Secure Software Development Framework (SSDF)

If this is your first time trying to comply with a NIST standard, keep in mind this will be a marathon. Nobody starts following the entire compliance standard on day one. Make sure to set expectations with yourself and your organization appropriately. Complying with a standard will often take months. There’s also no end state: these standards need to be thought about as continuous projects, not one and done.

If you’re looking to start this journey I would suggest you download a spreadsheet NIST has put together that details the controls and standards for SSDF. It looks a little scary the first time you load it up, but it’s really not that bad. There are 42 controls. That’s actually a REALLY small number as far as NIST standards go. Usually you will see hundreds or even thousands.

An Overview of the NIST SSDF Spreadsheet

There are 4 columns: Practices, Tasks, Notional Implementation Examples, and References.

If we break it down further we see there are 19 practices and 42 tasks. While this can all be intimidating, 19 practices and 42 tasks are numbers we can work with. The practices are the logical groupings of tasks, and the tasks are the actual controls we have to meet. The SSDF document covers all this in greater detail, but the spreadsheet makes everything more approachable and easy to group together.

The Examples Column

The examples column is where the spreadsheet really shines. The examples are how we can better understand the intent of a given control. Every control has multiple examples and they are written in a way anyone can understand. The idea here isn’t to force a rigid policy on anyone, but to show there are many ways to accomplish these tasks. Most of us learn better from examples than we do from technical control text, so be sure to refer to the examples often.

The References Section

The references sections are scary looking. There are a lot of references, and anyone who tries to read them all will be stuck for weeks or months. It’s OK though: they aren’t something you have to actively read; they are there to give us additional guidance if something isn’t clear. There’s already a lot of security guidance out there, and it can be easier to cross reference work that already exists than it is to make up all new content. This is how you can get clarifying guidance on the tasks. It’s also possible you are already following one or more of these standards, which means you’ve already started your SSDF journey.

The Tasks

Every task has a certain theme. There’s no product you can buy that will solve all of these requirements. Some themes can only be met with policy. Some are secure software development processes. Most will have multiple ways to meet them. Some can be met with commercial tools, some can be met with open source tools.

Interpreting the Requirements

Let’s cover a very brief example (we will cover this in far more detail in a future blog post): PO.1.3, third-party requirements. The text of this reads

PO.1.3: Communicate requirements to all third parties who will provide commercial software components to the organization for reuse by the organization’s own software. [Formerly PW.3.1]

This requirement revolves around communicating your own requirements to your suppliers. But today the definition of supplier isn’t always obvious. You could be working with a company. But what if you’re working with open source? What if the company you’re working with is using open source? The important part of this is better explained in the examples: Example 3: Require third parties to attest that their software complies with the organization’s security requirements.

It’s easier to understand this in the context of having your supplier prove they are in compliance with your requirements. Proving compliance can be difficult in the best situations. Keep in mind you can’t just do this in one step. You probably first just need to know what you have (SBOM is a great way to do this.) Once you know what you have, you can start to define expectations for others. And once you have expectations and an SBOM you can hand out an attestation.

One of the references for this one is NIST 800-160. If we look at section 3.1.1, there are multiple pages that explain the expectations. There isn’t a simple solution as you will see if you read through NIST 800-160. This is an instance where a combination of policy, technology, and process will all come together to ensure the components used are held to a certain standard.

This is a lot to try to take in all at once, so we should think about how to break this down. Many of us already have existing components. How we tackle this with existing components is not the same approach we would take with a brand new application security project. One way to think about this is you will first need an inventory of your components before you can even try to create expectations for your suppliers.

We could go on explaining how to meet this control, but for now let’s just leave this discussion here. The intent was to show what this challenge looks like, not to try to solve it today. We will revisit this in another blog post when we can dive deep into the requirements and some ideas on how to meet the control requirements, and even define what those requirements are!

Your Next Steps

Make sure you check back for the next post in this series where we will take a deep dive into every control specified by the SSDF. New compliance requirements are a challenge, but they exist to help us improve what we are already doing in terms of secure software development practices. Securing the software supply chain is not just a popular topic, it’s a real challenge we all have to meet now. It’s easy to talk about securing the software supply chain, but it’s a lot of hard work to actually secure it. Luckily for us there is more information, and there are more examples, to build on than ever before. Open source isn’t about code, it’s about sharing information and building communities. Anchore has several ways to help you on this journey. You can contact us, join our community Slack, and check out our open source projects: Syft and Grype.

Josh Bressers

NSA Securing the supply chain for developers: the past, present, and future of supply chain security

Last week the NSA, CISA, and ODNI released a guide that lays out supply chain security, but with a focus on developers. This was a welcome break from much of the existing guidance we have seen which mostly focuses on deployment and integration rather than the software developers. The software supply chain is a large space, and that space includes developers.

The guide is very consumable. It’s short and written in a way anyone can understand. The audience on this one is not compliance professionals. They also provide fantastic references. Re-explaining the document isn’t needed, just go read it.

However, even though the guide is very readable, it could be considered immature compared to much of the other guidance we have seen come from the government recently. That immaturity likely comes through because developer focused supply chain security is in fact an immature space. Developer compliance has never been successful outside of some highly regulated industries and this guide reminds us why. Much of the guidance presented has themes of the old heavy handed way of trying to do security, while also attempting to incorporate some new and interesting concepts being pioneered by groups such as the Open Source Security Foundation (OpenSSF).

For example, the guide suggests that developer systems not be connected to the Internet. This was the sort of guidance that was common a decade ago, but no developer could imagine trying to operate a development environment without Internet access now. This is a non-starter in most organizations. The old way of security was to create heavy handed rules developers would find ways to work around. The new way is to empower developers while avoiding catastrophic mistakes.

But next to outdated guidance, we see modern guidance such as using Supply chain Levels for Software Artifacts, or SLSA. SLSA is a series of levels that can be attained when creating software to help ensure the integrity of the built artifacts. It is an open source project within the OpenSSF that is working to create controls to help secure our software artifacts.

If we look at SLSA Level 1 (there are 4 levels), it’s clearly the first step in a journey. All we need to do for SLSA Level 1 is keep metadata about how an artifact was built and what is in it. Many of us are already doing that today! The levels then get increasingly structured and strict until we have a build system that cannot connect to the Internet, is version controlled, and signs artifacts. This gradual progress makes SLSA very approachable.

There are also modern suggestions that are very bleeding edge and aren’t quite ready yet. Reproducible builds are mentioned, but there is a lack of actionable guidance on how to accomplish this. Reproducible builds are an idea where you can build the source code for a project on two different systems and get the exact same output, bit for bit. Today everyone doing reproducible builds does so through enormous effort, not because the build systems make it easy. It’s not realistic guidance for the general public yet.
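To show what “bit for bit” means in practice, here is a small Python sketch that packs the same files into a tar archive twice and compares the digests. It only matches because file ordering, timestamps, and ownership are pinned to fixed values; real build systems leak far more than this (compiler versions, paths, locale), which is a large part of why reproducible builds take so much effort. The function names are hypothetical and this is a toy, not a build system.

```python
import hashlib
import io
import tarfile


def build_archive(files, mtime=0):
    """Pack {name: bytes} into an uncompressed tar with normalized metadata."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for name in sorted(files):  # fixed ordering, independent of input order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = mtime  # fixed timestamp instead of "now"
            info.uid = info.gid = 0  # no builder-specific ownership
            info.uname = info.gname = ""
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()


def digest(data):
    """Return the hex SHA-256 digest of the archive bytes."""
    return hashlib.sha256(data).hexdigest()
```

Two “builders” that call `build_archive` with the same file contents get the same digest, while changing only the `mtime` argument is enough to produce a different one.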

The guide expands on the current integrator guidance around SBOMs and verifying components, which is an important point. It seems to be pretty well accepted at this point that generating and consuming SBOMs are table stakes in the software world. The guide reflects this new reality.

Overall, this guide has an enormous amount of advice contained in it. Nobody could do all of this even if they wanted to, so don’t feel like this is an all or none effort. This is a great starting point for developer supply chain security. We need to better define the guidance we can give to developers to secure the supply chain. This guide is the first step. The first draft is never perfect, but the first draft is where the journey begins.

Understand what you are doing today, figure out what you can easily do tomorrow, and plan for some of the big things well into the future. And most importantly, ignore the guidance that doesn’t fit into your environment. When guidance doesn’t match with what you’re doing it doesn’t mean you’re doing it wrong. Sometimes the guidance needs to be adjusted. The world often changes faster than compliance does.

The most important takeaway is not to view this guide as an end state. This guide is the start of something much bigger. We have to start somewhere, and developer supply chain security starts here. Both how we protect the software supply chain and how we create guidance are part of this journey. As we grow and evolve our supply chain security, we will grow and evolve the guidance and best practices.

3 Myths of Open Source Software Risk and the One Nobody Is Discussing

Open source software is being vilified once again and, in some circles, even considered a national security threat. Open source software risk has been a recurring theme: First it was classified as dangerous because anyone could work on it and then it was called insecure because nobody was in charge. After that, the concern was that open source licenses were risky because they would require you to make your entire product open source.

Let’s consider where open source stands today. It’s running at minimum 80% of the world. Probably more. Some of the most mission-critical applications and services on the planet (and on Mars) are open source. The reality is, open source software isn’t inherently more risky than anything else. It’s simply misunderstood, so it’s easy to pick on.

Myth 1: Open source software is a risk because it isn’t secure

Open source software may not be as risky as you have been led to believe, but that doesn’t mean it gets a free pass either.

The most recent and top-of-mind example is the Log4Shell vulnerability in Log4j. It’s easy to put the blame on open source, but it’s the lack of proper insight into our infrastructure that is the fundamental issue.

The question, “Are we running Log4j?” took many of us weeks to answer when we needed that answer in a few minutes. The key to managing our software risk (and that’s all software, not just open source) is to have the ability to know what is running and where it’s running. This is the literal purpose for a software bill of materials (SBOM).

The foundation for managing open source risk begins with knowing what we have in our software supply chain. Any software can be a potential risk if you don’t know you’re running it. You should be generating and receiving an SBOM for every piece of software used and have the capability to store and search the data. Not knowing what you’re running in your software supply chain is a far greater risk than actually running it.
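As a rough illustration of why stored SBOMs turn “Are we running Log4j?” into a quick lookup, here is a minimal Python sketch that searches a folder of SPDX JSON documents for packages whose name contains a given string. It relies on the `packages`, `name`, and `versionInfo` fields of the SPDX JSON format; the folder layout and the function name `find_package` are assumptions for this example, and a real SBOM store would index the data rather than re-reading every file.

```python
import json
from pathlib import Path


def find_package(sbom_dir, needle):
    """Return (sbom file, package name, version) for SPDX packages matching `needle`."""
    hits = []
    for sbom_path in sorted(Path(sbom_dir).glob("*.json")):
        document = json.loads(sbom_path.read_text(encoding="utf-8"))
        for package in document.get("packages", []):
            name = package.get("name", "")
            if needle.lower() in name.lower():
                version = package.get("versionInfo", "unknown")
                hits.append((sbom_path.name, name, version))
    return hits
```

With one SBOM per deployed application in the folder, `find_package("sboms", "log4j")` answers both what is running and where.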

The reality is that open source software is just software. It’s doing a poor job of incorporating it into our products, deploying it, and tracking it that creates this mythic “security risk” we often hear about.

Myth 2: Open source software is a risk because it isn’t high quality

It was easier a decade ago to claim that open source software was inferior because there wasn’t a lot of open source in use. Today too much of the world runs on top of open source software to make the claim that it is low quality — the idea is simply laughable.

The real message behind the claim that open source software is not suitable for enterprise use, which you’ll often hear from legacy software vendors, is that open source software is inferior to commercially developed software.

In actuality, we’re not in a place to measure the quality of any of our software. While work is ongoing to fill this need, your best option today is to find the open source software that solves your problem and then make sure that it is up to date and has no major bugs that can leave your software supply chain susceptible to vulnerabilities.

Myth 3: Open source software is a risk because you can’t trust the people writing it

Myth 3 is loosely tied to the first myth that open source software is not secure. There are efforts to measure open source quality, which is a noble cause. Not all open source is created equal. It’s a common misconception that open source projects with only one maintainer are of lower quality (see myth 2) and that you can’t trust the people who build them.

There are plenty of projects in wide use where nobody really knows who is working on them. It’s a GitHub ID and that’s about it. So it’s possible the maintainer is an adversary. It’s also possible the intern that your endpoint vendor just hired is an adversary. The only difference is that in the open source world, we can at least figure it out.

Although there are open source projects that are nefarious, there are also many people working to uncover the malicious activity. They include a wide range of individuals from end users pointing out strange behavior to researchers scanning repositories and endpoint teams looking for active threats. The global community is a mighty power when it turns its attention to finding malicious open source software.

Again, open source software risk is less about trust than it is about having insight into what we are using and how we are using it. Trying to find malicious code is not realistic for many of us, but when it does get found, we need the ability to quickly pinpoint it in our software and remove it.

The true risk of open source software

In an era where the use of open source software is only increasing, the true risk in using open source — or any software for that matter — is failing to understand how it works. In the early days of open source, we could only understand our software by creating it. There wasn’t a difference between being an open source user and an open source contributor.

Open source is very different today. The number of open source users is huge (the population of the world to be exact), while the number of open source contributors is much smaller. And this is OK because not everyone should be expected to be an open source contributor. There’s nothing wrong with taking in open source packages and using them to build something else. That’s the whole point!

If there’s one piece of advice I can give, it’s that consuming open source can help you create better software faster as long as you manage risk. There are many good tools that scan for vulnerabilities and there are SBOM-driven solutions to help you identify security issues in all your software components. Open source is an experience where we will all have a different journey. But like any journey, we have to pay attention along the way or we could find ourselves off course.

Josh Bressers

Grype now supports CycloneDX and SPDX

In the world of software bills of materials (SBOM) there are currently two major standards: Software Package Data Exchange (SPDX) and CycloneDX. SPDX is a product of the Linux Foundation. It’s been a standard for over ten years now. CycloneDX is brought to us by the OWASP project. It’s a bit newer than SPDX, and just as capable. If you’re following the SBOM news, these two formats are often topics of discussion.

It is expected that anyone who is creating or consuming SBOMs will probably use one of these two formats to ensure a certain amount of interoperability. If you expect the consumers of your software to keep track of your SBOM, you need a standard way of communicating. Likewise, if we are expecting an SBOM from our vendors, we want to make sure it’s in a format we can actually use. This is one of those cases where more isn’t better; two is plenty.

If you’re familiar with Anchore’s open source projects Syft and Grype, there’s also another format you’ve probably seen, known as the Syft lossless SBOM. This format was tailored specifically to the needs of Syft and Grype when the projects were just starting out. It’s a great format and contains a huge amount of information, but there aren’t a lot of tools out there that can generate or consume this SBOM format today.

When we think about vulnerability scanners, we tend to think about pointing a scanner at a container, or directory, or even a source repo, then scanning that location to find vulnerabilities in the dependencies. Grype has a neat trick though: it can scan an SBOM for vulnerabilities. Instead of having to first scan the files to identify them and then figure out if any have vulnerabilities, Grype can skip over that identification step by using an SBOM. Most of the time a vulnerability scanner spends is in this identification stage, so scanning an SBOM for vulnerabilities is incredibly fast.
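To illustrate why skipping identification makes things fast, here is a toy Python sketch of the matching step alone: once an SBOM has already told us the package names and versions, finding known vulnerabilities is a lookup rather than a filesystem scan. The advisory table and the function name `match_packages` are made up for this example, and this is not how Grype is implemented; a real scanner matches version ranges and ecosystems against a vulnerability database instead of exact pairs.

```python
# Toy advisory table keyed by exact (name, version); real scanners match
# version ranges from a vulnerability database instead.
ADVISORIES = {
    ("log4j-core", "2.14.1"): ["CVE-2021-44228"],
}


def match_packages(packages, advisories=ADVISORIES):
    """Return {(name, version): [advisory ids]} for SBOM packages with known issues."""
    findings = {}
    for name, version in packages:
        ids = advisories.get((name, version))
        if ids:
            findings[(name, version)] = list(ids)
    return findings
```

The input here is just a list of `(name, version)` pairs, which is exactly the kind of data an SBOM already carries.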

Initially Grype was only able to use a Syft format SBOM to scan for vulnerabilities. This is awesome, but we come back to the problem of what happens when a vendor gives us an SBOM in SPDX or CycloneDX format. The easy answer is to support those formats too, of course. The next obvious question is which format Grype should support next: SPDX or CycloneDX? Making a decision is hard, and SBOM formats are like children in that you can’t really pick a favorite, so it was decided to support both!

If you download the latest version of Grype you can now use it to scan your SPDX and CycloneDX SBOMs for vulnerabilities. If a vendor ships you an SBOM, it can be fed directly into Grype. We’re pretty sure Grype is the first open source vulnerability scanner that supports both SPDX and CycloneDX at the time of writing this. We think that’s a pretty big deal!
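For a sense of what supporting both formats involves, here is a minimal Python sketch that reads either an SPDX or a CycloneDX JSON document and returns the same list of name and version pairs. It relies on the top-level `spdxVersion` and `packages` fields in SPDX JSON and the `bomFormat` and `components` fields in CycloneDX JSON. The function name `load_packages` is hypothetical, nested CycloneDX components and the non-JSON encodings of both formats are ignored, and this is not Grype’s actual parsing code.

```python
import json


def load_packages(sbom_text):
    """Return sorted (name, version) pairs from an SPDX or CycloneDX JSON SBOM."""
    document = json.loads(sbom_text)
    if document.get("bomFormat") == "CycloneDX":
        entries = document.get("components", [])
        version_key = "version"
    elif "spdxVersion" in document:
        entries = document.get("packages", [])
        version_key = "versionInfo"
    else:
        raise ValueError("unrecognized SBOM format")
    # Both formats reduce to the same shape once the field names are mapped.
    return sorted((entry.get("name", ""), entry.get(version_key, "")) for entry in entries)
```

Whichever format a vendor ships, the rest of a tool can then work with one common list of packages.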

Now, it should be noted that this functionality is very new. There are going to be bugs and difficulties scanning SPDX and CycloneDX SBOMs. We would be fools to pretend the features are perfect. However, Grype is also an open source project, so you don’t have to sit on the sidelines and watch. Open source is a team sport. If you scan an SBOM with Grype and run into any problems, please file a bug here. You can even submit a patch if that’s more your style; we love pull requests from our community.

Stay tuned for even more awesome features coming soon. We’re just getting started!