We migrated from S3 to R2. Thankfully nobody noticed

Sometimes the best changes are the ones you don’t notice. Some of you reading this may not have noticed this one, but there’s a good chance that many of you noticed a hiccup or two in Grype database availability, followed by things suddenly becoming a lot more stable.

One of the greatest things about Anchore is that we are empowered to make changes quickly when needed. This is the story of doing just that: identifying issues in our database distribution mechanism and making a change to improve the experience for all our users.

A Heisenbug is born

It all started a long time ago, in a galaxy far, far away: as early as 2022, we received reports that some users were having trouble downloading the Grype database. These issues included general slowness and timeouts, with users receiving the dreaded "context deadline exceeded" error. Manually downloading the database from a browser could show similar behavior.

Debugging these isolated, transient failures among thousands of successful downloads was difficult for the team. No one could reproduce them reliably, so the cause remained unclear. A few more reports trickled in here and there, but everything seemed to work well whenever we tested it ourselves. Without further information, we had to chalk this up to something like unreliable network transfers in specific regions or under certain conditions, exacerbated by the moderately large size of the database: about 200 MB compressed.

To find any patterns, and to give our CDN provider concrete feedback that users were having trouble downloading the files, we set up a job to download the database periodically. We also added Datadog monitoring that did the same thing from many regions. We noticed two things: downloads failed on a regular basis, and the failures seemed to correlate with high-volume periods, such as just after a new database was built. We kept monitoring, but the intermittent failures didn’t seem frequent enough to cause great concern.
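Below is a minimal sketch of what a periodic download check like this might look like, using only Python's standard library. It is not the job Anchore actually runs. The function name `probe_download`, the 30-second read timeout, the overall deadline, and the choice to hash the payload are all assumptions for illustration. The idea is to record whether the whole file arrived, how long it took, and what went wrong if it did not.

```python
import hashlib
import http.client
import time
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    ok: bool                # True if the whole file was read before the deadline
    bytes_read: int         # bytes received, even on failure
    seconds: float          # wall-clock time spent
    sha256: Optional[str]   # hex digest of the payload on success
    error: Optional[str]    # description of the failure, if any


def probe_download(url: str, read_timeout: float = 30.0,
                   deadline: float = 300.0, chunk_size: int = 1 << 16) -> ProbeResult:
    """Download `url` once (hypothetical helper), hashing it as it streams.

    read_timeout bounds connecting and each individual read; deadline bounds
    the whole transfer, similar in spirit to a Go context deadline.
    """
    start = time.monotonic()
    digest = hashlib.sha256()
    total = 0
    try:
        with urllib.request.urlopen(url, timeout=read_timeout) as resp:
            while True:
                if time.monotonic() - start > deadline:
                    raise TimeoutError(f"deadline of {deadline}s exceeded")
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                digest.update(chunk)
                total += len(chunk)
    except (OSError, http.client.HTTPException) as exc:
        # URLError, HTTPError, socket timeouts and TimeoutError are all OSError subclasses
        return ProbeResult(False, total, time.monotonic() - start, None, repr(exc))
    return ProbeResult(True, total, time.monotonic() - start, digest.hexdigest(), None)
```

A scheduler (cron, or a monitoring agent in several regions) could call this every few minutes against the database URL advertised in the listing file, then chart `ok` and `seconds` over time to spot correlations with publishing events.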

Small things matter

At some point leading up to August, we also began to get reports of users having trouble downloading the Grype database listing file. When Grype downloads the database, it first downloads a listing file to determine whether a newer database exists. At the time, this file contained 450 databases’ worth of historical metadata (90 days × 5 Grype database schema versions), so the listing file clocked in at around 200 KB.

Grype only really needs the latest database, so the first thing we did was trim this file down to only the last few days. Once we shrank it to under 5 KB, the problems downloading the listing file itself went away. This was our first clue about the underlying problem: smaller files worked fine.
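Below is a sketch of the trimming idea, under loudly stated assumptions. It uses a hypothetical listing layout: a top-level `"available"` object that maps a schema version to a list of entries, each with a `"built"` timestamp and a `"url"`. The field names and the helpers `trim_listing` and `newest` are illustrative, not a specification of Grype's listing format or the code Anchore used.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional

Entry = Dict[str, Any]
Listing = Dict[str, Dict[str, List[Entry]]]


def parse_built(value: str) -> datetime:
    # fromisoformat only accepts a trailing "Z" from Python 3.11, so normalize it
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def trim_listing(listing: Listing, keep_days: int,
                 now: Optional[datetime] = None) -> Listing:
    """Keep only entries built in the last `keep_days` days, newest first.

    The newest entry per schema version is always kept, so a client can still
    find a database even if publishing stalls for longer than keep_days.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=keep_days)
    trimmed: Dict[str, List[Entry]] = {}
    for schema, entries in listing["available"].items():
        ordered = sorted(entries, key=lambda e: parse_built(e["built"]), reverse=True)
        recent = [e for e in ordered if parse_built(e["built"]) >= cutoff]
        trimmed[schema] = recent or ordered[:1]
    return {"available": trimmed}


def newest(listing: Listing, schema: str) -> Optional[Entry]:
    """Return the most recently built entry for one schema version, if any."""
    entries = listing["available"].get(schema, [])
    return max(entries, key=lambda e: parse_built(e["built"]), default=None)
```

A client only ever needs `newest(...)` for its own schema version, which is why dropping months of history costs nothing functionally while shrinking the file by orders of magnitude.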

Fast forward to August 16, 2024: we awoke to multiple reports from people worldwide who were having the same trouble downloading the database. After many months of being unable to reproduce the failures meaningfully, we finally started to see them ourselves. What happened? Traffic had reached an inflection point at which the CDN could no longer deliver these files reliably to end users. Interestingly, the traffic was not from Grype but from Syft invocations checking for application updates: about 1 million per hour, approximately double what we had seen previously. Because both were served from the same endpoint, this volume began to affect Grype users adversely, possibly because the CDN provider was throttling requests.

The right tool for the job

As a team, we had each investigated these database failures individually, but we decided it was time for all of us to strap on our boots and solve this together. The clue from shrinking the listing file was crucial to understanding what was going on. We were using a standard CDN offering backed by AWS S3 storage.

The documentation we found about the CDN’s usage was vague and didn’t tell us whether we were doing something wrong. However, much of it focused on web traffic, and given that the smaller, more web-friendly listing file had worked fine, we could assume that is what the service was optimized for. After much reading, it started to sound like larger files should be served using the Cloudflare R2 Object Storage offering instead…

So that’s what we did. Over a long, caffeine-fuelled Zoom call that lasted an entire day, the team updated our database publishing jobs to also publish databases and updated listing files to a second location. That location is backed by the Cloudflare R2 Object Storage service and served from grype.anchore.io instead of toolbox-data.anchore.io/grype.

We verified this was working as expected with Grype and finally updated the main listing file to point to this new location. The traffic load moved to the new service precisely as expected. This was completely transparent for Grype end-users, and our monitoring jobs have been green since!

While scrambling to fix this wasn’t fun, it’s great to know that our tools are popular enough to cause problems for a really good CDN service. Thanks to our automated testing, our autonomy to operate independently, and our robust publishing jobs, we were able to move quickly to address these issues. After letting the change run over the weekend, we posted a short announcement on our community Discourse to keep everyone informed.

Many projects experience growing pains as their usage grows, and our tools are no exception. Still, we were able to quickly and almost seamlessly give everyone a more reliable experience, and we have had reports that the change solved issues for them. Hopefully, we won’t have to make any more changes even when usage grows another 100x…

If you have any feedback for the Syft & Grype developers, head over to our community Discourse.

How to Generate an SBOM with Free Open Source Tools

Generating a Software Bill of Materials (SBOM) as part of your DevOps process is an essential technique for securing your software supply chain. SBOMs are becoming critical due to the growing prominence of supply chain attacks such as SolarWinds, malware intentionally added by maintainers (as with node-ipc), and severe vulnerabilities such as Log4Shell.

SBOMs can help identify the software components used within a system, along with their licenses and vulnerabilities. SBOMs can also be used to comply with Executive Order 14028, Improving the Nation’s Cybersecurity.

Fortunately, there are a number of tools that can help create SBOMs, and generating your first one takes just a few easy steps:

  1. Choose your SBOM generation tool – we’ll use Syft here
  2. Download and install Syft
  3. Determine the SBOM output format you need
  4. Run Syft against the desired source: syft <source> -o <format>
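If you want to script these steps, for example in a build job, here is a minimal sketch in Python. It assumes the `syft` binary is already installed and on `PATH`, and it reads the SBOM from Syft's standard output. The helper names `syft_command` and `generate_sbom` are hypothetical, and only the `syft <source> -o <format>` invocation comes from the steps above.

```python
import shutil
import subprocess
from typing import List


def syft_command(source: str, output_format: str = "table") -> List[str]:
    """Build the argv for step 4: syft <source> -o <format>."""
    return ["syft", source, "-o", output_format]


def generate_sbom(source: str, output_format: str = "spdx-json") -> str:
    """Run Syft (hypothetical wrapper) and return what it prints on stdout."""
    if shutil.which("syft") is None:
        raise FileNotFoundError("syft is not on PATH; install it first (step 2)")
    result = subprocess.run(
        syft_command(source, output_format),
        capture_output=True,  # SBOM is read from stdout; stderr is kept for diagnostics
        text=True,
        check=True,           # raise CalledProcessError if syft exits non-zero
    )
    return result.stdout
```

Passing an argument list rather than a shell string avoids quoting problems with image names and paths. `generate_sbom("alpine:latest")` would return the SPDX JSON document as a string.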

Hold on! Before you jump into using open source tools for SBOMs, note that you can get instant access to a free trial of the Anchore Enterprise platform.

Open Source Tools for Generating SBOMs

There are many tools available for generating SBOMs, so the first thing you’ll need to do is pick one to use. SBOM generators are often specific to a particular ecosystem such as Python or Go. Some are capable of generating SBOMs for a number of different ecosystems and environments. Some of the more popular SBOM tools are:

  1. Syft by Anchore
  2. Tern
  3. Kubernetes BOM tool
  4. spdx-sbom-generator

For this example, we’ll focus on Syft, since it is easy to use in many different scenarios and supports a variety of ecosystems. Syft can run on your desktop, in CI systems, or as a Docker container, and it can scan a wide variety of ecosystems, from Linux distributions to many types of build dependency specifications.

Getting Syft

The first thing to do is download Syft. There are a number of ways to do this:

Using curl

The recommended method to get Syft for macOS and Linux is by using curl:

curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b <SOME_BIN_PATH> <RELEASE_VERSION>

For example:

curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin

Homebrew

For macOS, you can install Syft using Homebrew:

brew tap anchore/syft
brew install syft

Direct Download

You can directly download Syft binaries for many platforms including Windows from the GitHub releases page.

Docker

There is also a Syft Docker image with every release: anchore/syft, which can be run like this:

docker run -it --rm anchore/syft <args>

Validate the Syft Installation

To confirm Syft was installed correctly, simply run:

syft version

You should see output similar to:

Application:        syft
Version:            0.43.2
JsonSchemaVersion:  3.2.2
BuildDate:          2022-04-06T21:49:04Z
GitCommit:          e415bb21e7a609c12dc37a2d6395796fb675e3fe
GitDescription:     v0.43.2
Platform:           linux/amd64
GoVersion:          go1.18
Compiler:           gc

Note: Syft was at version 0.43.2 at the time of this writing.
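In CI you may want to check the installed version before relying on newer flags. Below is a minimal sketch that parses the `Key: value` lines shown above. It assumes the text layout of the sample output, which may change between releases. The helpers `parse_syft_version` and `version_tuple` are hypothetical.

```python
from typing import Dict, Tuple


def parse_syft_version(text: str) -> Dict[str, str]:
    """Turn the 'Key:   value' lines printed by `syft version` into a dict."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


def version_tuple(version: str) -> Tuple[int, ...]:
    """'0.43.2' or 'v0.43.2' -> (0, 43, 2); any pre-release suffix is ignored."""
    core = version.strip().lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))
```

Splitting on the first colon only matters for values such as `BuildDate`, which contain colons themselves. Comparing tuples, as in `version_tuple(info["Version"]) >= (0, 43, 0)`, avoids the classic string-comparison bug where "0.9" sorts after "0.43".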

Generating Your First SBOM

Once you have Syft available, creating your first SBOM is simple. Syft can generate SBOMs from multiple kinds of sources, including container images and the local filesystem.

Scanning Images

To generate an SBOM for a Docker or OCI image (even without a Docker daemon), simply run:

syft <image>

By default, the output includes only software present in the final, squashed image filesystem, that is, what you would see inside a running container. To include software from all image layers, even if it was removed in a later layer, use the --scope all-layers option:

syft --scope all-layers <image>

Scanning the Filesystem

To generate an SBOM for the local filesystem, use the dir: and file: prefixes with either absolute or relative paths. For example, to scan the current directory:

syft dir:.

Or a specific file:

syft file:/my-go-binary

Syft can generate SBOMs from a variety of other sources, such as Podman, tar archives, or directly from an OCI registry even when Docker is not available. Check out the full list of sources.

Basic Example

For example, to scan the latest Alpine image, simply run:

syft alpine:latest

You should see output similar to this:

 ✔ Loaded image            
 ✔ Parsed image            
 ✔ Cataloged packages      [14 packages]
NAME                    VERSION      TYPE 
alpine-baselayout       3.2.0-r18    apk   
alpine-keys             2.4-r1       apk   
apk-tools               2.12.7-r3    apk   
busybox                 1.34.1-r3    apk   
ca-certificates-bundle  20191127-r7  apk   
libc-utils              0.7.2-r3     apk   
libcrypto1.1            1.1.1l-r7    apk   
libretls                3.3.4-r2     apk   
libssl1.1               1.1.1l-r7    apk   
musl                    1.2.2-r7     apk   
musl-utils              1.2.2-r7     apk   
scanelf                 1.3.3-r0     apk   
ssl_client              1.34.1-r3    apk   
zlib                    1.2.11-r3    apk
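The table is meant for people, and the standardized formats below are the better choice for machines. Still, for a quick package count in a shell session, the table can be parsed. The sketch below assumes the whitespace-separated NAME/VERSION/TYPE layout shown above, with TYPE as the last column. The `TableRow` type and `parse_syft_table` helper are hypothetical.

```python
from typing import List, NamedTuple


class TableRow(NamedTuple):
    name: str
    version: str
    type: str


def parse_syft_table(text: str) -> List[TableRow]:
    """Parse the default table output, skipping the progress lines above it."""
    rows: List[TableRow] = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[:3] == ["NAME", "VERSION", "TYPE"]:
            in_table = True  # everything after the header is package data
            continue
        if in_table and len(fields) >= 3:
            # columns are whitespace-separated; TYPE is assumed to be last
            rows.append(TableRow(fields[0], fields[1], fields[-1]))
    return rows
```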

By default, the SBOM you’ll see will be a nicely formatted table rather than any standardized SBOM format, which leads us to…

Choose Your SBOM Format

Depending on your use cases, it may be important to use a particular SBOM format. The most common ones are Software Package Data Exchange (SPDX) and CycloneDX, both of which Syft supports. Syft also has a format which interoperates losslessly with the Grype vulnerability scanner.

While Syft supports these different formats, they have slightly different goals and features. It may be important to pick SPDX or CycloneDX for interoperability with other tools or as a standardized format to distribute to downstream consumers.

Generating an SBOM in SPDX format

If your use case requires an SBOM in SPDX format, Syft has you covered. SPDX has been around the longest of the formats mentioned here, and it comes in multiple variants; Syft supports SPDX Tag-value (spdx-tag-value) and SPDX JSON (spdx-json). For SPDX JSON, simply add the -o spdx-json argument. For example, to run this against a Docker image, again using the latest Alpine:

syft alpine:latest -o spdx-json

You’ll see there is a lot more data than the table view allows! You should see something resembling:

{
 "SPDXID": "SPDXRef-DOCUMENT",
 "name": "alpine-latest",
 "spdxVersion": "SPDX-2.2",
 "creationInfo": {
  "created": "2022-04-12T01:47:03.011148Z",
  "creators": [
   "Organization: Anchore, Inc",
   "Tool: syft-0.42.4"
  ],
  "licenseListVersion": "3.16"
 },
 "dataLicense": "CC0-1.0",
 "documentNamespace": "https://anchore.com/syft/image/alpine-latest-31e0e940-da83-4ea2-8a0c-fbba76371667",
 "packages": [
  {
   "SPDXID": "SPDXRef-8039c8621bcc1383",
   "name": "alpine-baselayout",
   "licenseConcluded": "GPL-2.0-only",
   "description": "Alpine base dir structure and init scripts",
   "downloadLocation": "https://git.alpinelinux.org/cgit/aports/tree/main/alpine-baselayout",
   "externalRefs": [
    {
     "referenceCategory": "SECURITY",
     "referenceLocator": "cpe:2.3:a:alpine:alpine-baselayout:3.2.0-r18:*:*:*:*:*:*:*",
     "referenceType": "cpe23Type"
    },
    {
     "referenceCategory": "PACKAGE_MANAGER",
     "referenceLocator": "pkg:alpine/alpine-baselayout@3.2.0-r18?arch=x86_64&upstream=alpine-baselayout&distro=alpine-3.15.0",
     "referenceType": "purl"
    }
   ],
   "filesAnalyzed": false,
   "licenseDeclared": "GPL-2.0-only",
   "originator": "Person: Natanael Copa <[email protected]>",
   "sourceInfo": "acquired package info from APK DB: /lib/apk/db/installed",
   "versionInfo": "3.2.0-r18"
  }
 ],
 "files": [
  {
   "SPDXID": "SPDXRef-2eaa15c5fc625ebe",
   "comment": "layerID: sha256:8d3ac3489996423f53d6087c81180006263b79f206d3fdec9e66f0e27ceb8759",
   "licenseConcluded": "NOASSERTION",
   "fileName": "/etc/crontabs/root"
  }
 ],
 "relationships": [
  {
   "spdxElementId": "SPDXRef-8039c8621bcc1383",
   "relationshipType": "CONTAINS",
   "relatedSpdxElement": "SPDXRef-2eaa15c5fc625ebe"
  }
 ]
}

Not only does this format contain the package names, but also Package URLs (purls), license information, and a host of other details, such as the files Syft associated with each package.
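To show how these pieces fit together, here is a minimal sketch that reads the fields visible in the example above. The purl lives in `externalRefs` with `referenceType` "purl", and files are linked to packages through `CONTAINS` relationships. The helper names `spdx_packages` and `files_in_package` are hypothetical. They only touch keys present in the sample, and missing keys are tolerated.

```python
from typing import Dict, List, Optional


def spdx_packages(doc: dict) -> List[Dict[str, Optional[str]]]:
    """Summarize each package: name, version, concluded license and purl."""
    summary = []
    for pkg in doc.get("packages", []):
        # the purl, if any, is the externalRef whose referenceType is "purl"
        purl = next((ref.get("referenceLocator") for ref in pkg.get("externalRefs", [])
                     if ref.get("referenceType") == "purl"), None)
        summary.append({
            "name": pkg.get("name"),
            "version": pkg.get("versionInfo"),
            "license": pkg.get("licenseConcluded"),
            "purl": purl,
        })
    return summary


def files_in_package(doc: dict, package_id: str) -> List[str]:
    """Follow CONTAINS relationships from a package SPDXID to file names."""
    names = {f["SPDXID"]: f.get("fileName") for f in doc.get("files", [])}
    return [names[rel["relatedSpdxElement"]]
            for rel in doc.get("relationships", [])
            if rel.get("spdxElementId") == package_id
            and rel.get("relationshipType") == "CONTAINS"
            and rel.get("relatedSpdxElement") in names]
```

Load the document with `json.load` on the file Syft wrote (or `json.loads` on captured output) and pass the resulting dict in.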

Generating an SBOM in CycloneDX format

Similarly, if you need to generate an SBOM in CycloneDX format, use one of the CycloneDX output options. Syft supports CycloneDX XML (cyclonedx-xml) and JSON (cyclonedx-json). For CycloneDX XML:

syft <source> -o cyclonedx-xml

To run this against the same latest Alpine image, run:

syft alpine:latest -o cyclonedx-xml

And you should see a result resembling this:

<?xml version="1.0" encoding="UTF-8"?>
<bom xmlns="http://cyclonedx.org/schema/bom/1.4" serialNumber="urn:uuid:fb2a4dac-b62b-4d78-b209-40bd09388022" version="1">
  <metadata>
    <timestamp>2022-04-11T22:01:51-04:00</timestamp>
    <tools>
      <tool>
        <vendor>anchore</vendor>
        <name>syft</name>
        <version>0.42.4</version>
      </tool>
    </tools>
    <component bom-ref="27f24e002ab47c1b" type="container">
      <name>alpine:latest</name>
      <version>sha256:a3f8ca28888378e4880b3f73504c78278a9038dccf906760a1afd4a08c81c1c1</version>
    </component>
  </metadata>
  <components>
    <component type="library">
      <publisher>Natanael Copa &lt;[email protected]&gt;</publisher>
      <name>alpine-baselayout</name>
      <version>3.2.0-r18</version>
      <description>Alpine base dir structure and init scripts</description>
      <licenses>
        <license>
          <id>GPL-2.0-only</id>
        </license>
      </licenses>
      <cpe>cpe:2.3:a:alpine-baselayout:alpine-baselayout:3.2.0-r18:*:*:*:*:*:*:*</cpe>
      <purl>pkg:alpine/alpine-baselayout@3.2.0-r18?arch=x86_64&amp;upstream=alpine-baselayout&amp;distro=alpine-3.15.0</purl>
      <externalReferences>
        <reference type="distribution">
          <url>https://git.alpinelinux.org/cgit/aports/tree/main/alpine-baselayout</url>
        </reference>
      </externalReferences>
      <properties>
        <property name="syft:package:foundBy">apkdb-cataloger</property>
        <property name="syft:package:metadataType">ApkMetadata</property>
        <property name="syft:package:type">apk</property>
        <property name="syft:cpe23">cpe:2.3:a:alpine:alpine-baselayout:3.2.0-r18:*:*:*:*:*:*:*</property>
        <property name="syft:location:0:layerID">sha256:8d3ac3489996423f53d6087c81180006263b79f206d3fdec9e66f0e27ceb8759</property>
        <property name="syft:location:0:path">/lib/apk/db/installed</property>
        <property name="syft:metadata:gitCommitOfApkPort">dfa1379357a321e638feef1cd8d55ab03d020f45</property>
        <property name="syft:metadata:installedSize">413696</property>
        <property name="syft:metadata:originPackage">alpine-baselayout</property>
        <property name="syft:metadata:pullChecksum">Q1EymS6rAgmGs7XYhqdyEoiWgEZ6A=</property>
        <property name="syft:metadata:pullDependencies">/bin/sh so:libc.musl-x86_64.so.1</property>
        <property name="syft:metadata:size">21101</property>
      </properties>
    </component>
    <component type="operating-system">
      <name>alpine</name>
      <version>3.15.0</version>
      <description>Alpine Linux v3.15</description>
      <swid tagId="alpine" name="alpine" version="3.15.0"></swid>
      <externalReferences>
        <reference type="issue-tracker">
          <url>https://bugs.alpinelinux.org/</url>
        </reference>
        <reference type="website">
          <url>https://alpinelinux.org/</url>
        </reference>
      </externalReferences>
      <properties>
        <property name="syft:distro:id">alpine</property>
        <property name="syft:distro:prettyName">Alpine Linux v3.15</property>
        <property name="syft:distro:versionID">3.15.0</property>
      </properties>
    </component>
  </components>
</bom>

Again, there is a lot more data than the table shows, but it is a different set of data than in the SPDX output, because there is no one-to-one mapping of properties between the two formats.
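One practical difference when consuming CycloneDX XML is that every element lives in the schema's XML namespace. Below is a minimal sketch using Python's built-in ElementTree. Rather than hardcoding the 1.4 URI from the example, it reads the namespace from the root `bom` element. The helper name `cyclonedx_components` is hypothetical. It only looks at top-level `components`, so the container described under `metadata` is not listed.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union


def cyclonedx_components(xml_text: str) -> List[Dict[str, Union[str, List[str]]]]:
    """List top-level components from a CycloneDX XML BOM."""
    root = ET.fromstring(xml_text)
    # the root tag looks like "{http://cyclonedx.org/schema/bom/1.4}bom";
    # reuse its namespace so other spec versions parse the same way
    uri = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    ns = {"cdx": uri}
    components = []
    for comp in root.findall("cdx:components/cdx:component", ns):
        components.append({
            "type": comp.get("type", ""),
            "name": comp.findtext("cdx:name", default="", namespaces=ns),
            "version": comp.findtext("cdx:version", default="", namespaces=ns),
            "purl": comp.findtext("cdx:purl", default="", namespaces=ns),
            "licenses": [lic.text or "" for lic in
                         comp.findall("cdx:licenses/cdx:license/cdx:id", ns)],
        })
    return components
```

Note that the parser decodes `&amp;` in the purl back to `&`, so the value matches the purl in the SPDX output.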

Generating an SBOM in Syft Lossless format

The last format we’ll talk about is Syft’s own JSON format. If you don’t need to provide the SBOM to other tools and you may be scanning it with Grype, the Syft JSON format offers the highest fidelity. Both SPDX and CycloneDX lose some information from Syft’s internal data model, whereas the Syft format does not.

Although Grype works great with SPDX and CycloneDX, some of the data lost when converting to those formats could be data that Grype uses for matching, so the Syft JSON format may make the most sense. To use it, pass the -o json argument.
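Below is a small sketch of consuming Syft JSON. It assumes the document has a top-level `"artifacts"` array whose entries carry `"type"` and `"purl"` fields. That matches Syft's JSON output as we understand it, but check it against the schema version your Syft release emits. The helper `syft_json_summary` is hypothetical.

```python
import json
from collections import Counter
from typing import Dict, List


def syft_json_summary(text: str) -> Dict[str, object]:
    """Count packages by type and collect purls from Syft JSON output.

    Assumes a top-level "artifacts" array with "type" and "purl" fields.
    """
    doc = json.loads(text)
    artifacts = doc.get("artifacts", [])
    by_type = Counter(a.get("type", "unknown") for a in artifacts)
    purls: List[str] = sorted(a["purl"] for a in artifacts if a.get("purl"))
    return {"count": len(artifacts), "by_type": dict(by_type), "purls": purls}
```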

Additional Syft Features

There’s a lot more that Syft can do, with quite a few configuration options. A few things to note include:

  • Output the SBOM to a file using --file path/to/file
  • Exclude paths from scanning using --exclude path/**/*.txt
  • Specify configuration in a .syft.yaml file
  • Connect to private OCI registries
  • Cryptographically sign and attest SBOMs

Next Steps

Now that you’ve got an SBOM, what’s next? A logical next step is to integrate Syft into your build pipeline so that SBOMs are generated automatically. There may be more than one point where generating SBOMs makes sense, such as at build time, after a container image is built, or during a release process.

The SBOMs can then be scanned for license compliance and continuously monitored for vulnerabilities. If you are using GitHub Actions, there are a couple of actions to do just that: sbom-action to generate SBOMs using Syft and scan-action to perform vulnerability scanning. These are simple to set up for a few repositories, but keeping track of many repositories can become challenging.
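As a rough illustration of a license compliance check, here is a sketch that flags packages in an SPDX JSON SBOM whose licenses are not on an allowlist. It is a deliberately naive stand-in for a real SPDX license-expression parser. It treats every identifier in an expression as required, so `MIT OR Apache-2.0` passes only if both are allowed. It skips exception identifiers that follow `WITH`. It also flags `NOASSERTION` so those packages get manual review. The helper names and the allowlist are hypothetical.

```python
import re
from typing import Dict, List, Set


def license_ids(expression: str) -> Set[str]:
    """Extract license identifiers from a simple SPDX license expression."""
    ids: Set[str] = set()
    skip_next = False
    for token in re.findall(r"[A-Za-z0-9.+:\-]+", expression or ""):
        if skip_next:
            skip_next = False      # token after WITH is an exception id, not a license
        elif token == "WITH":
            skip_next = True
        elif token not in ("AND", "OR"):
            ids.add(token)
    return ids


def license_violations(spdx_doc: dict, allowed: Set[str]) -> Dict[str, List[str]]:
    """Map 'name@version' to the license ids that are not in the allowlist."""
    problems: Dict[str, List[str]] = {}
    for pkg in spdx_doc.get("packages", []):
        expr = pkg.get("licenseConcluded") or "NOASSERTION"
        if expr in ("NOASSERTION", "NONE"):
            # fall back to the declared license when nothing was concluded
            expr = pkg.get("licenseDeclared") or "NOASSERTION"
        disallowed = sorted(i for i in license_ids(expr) if i not in allowed)
        if disallowed:
            # "NOASSERTION" lands here too, flagging the package for manual review
            problems[f"{pkg.get('name')}@{pkg.get('versionInfo')}"] = disallowed
    return problems
```

Treating `OR` conservatively errs toward false alarms rather than missed ones, which is usually the safer default for a CI gate. A production policy would evaluate the expression properly.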

Managing SBOMs at scale

As we’ve talked about, using SBOMs as a central part of securing your software supply chain is increasingly important. Integrating automated SBOM generation into your DevOps process is vital. Storing, managing, and analyzing those SBOMs to inform security measures should be an important consideration for you and your organization.

For more comprehensive SBOM management, an enterprise-level solution like Anchore Enterprise will enable you to generate comprehensive SBOMs with every build, detect drift from one build to the next, share SBOMs internally or externally, and quickly identify risks such as vulnerabilities, secrets, malware, and misconfigurations. To learn more about Anchore Enterprise, schedule a demo with one of our specialists.
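To make the idea of drift concrete, here is a naive sketch that compares two SPDX JSON SBOMs from consecutive builds by package name and version. This is not how Anchore Enterprise implements drift detection. It ignores files, licenses, and package types, and the helper names `package_versions` and `sbom_drift` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def package_versions(spdx_doc: dict) -> Dict[str, Set[str]]:
    """Map package name -> set of versions (a name can appear more than once)."""
    versions: Dict[str, Set[str]] = {}
    for pkg in spdx_doc.get("packages", []):
        versions.setdefault(pkg.get("name", ""), set()).add(pkg.get("versionInfo", ""))
    return versions


def sbom_drift(old: dict, new: dict) -> Dict[str, list]:
    """Report packages added, removed, or changed in version between two SBOMs."""
    before, after = package_versions(old), package_versions(new)
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed: List[Tuple[str, List[str], List[str]]] = sorted(
        (name, sorted(before[name]), sorted(after[name]))
        for name in set(before) & set(after)
        if before[name] != after[name]
    )
    return {"added": added, "removed": removed, "changed": changed}
```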

Conclusion

Whatever your reasons for generating SBOMs, whether compliance or vulnerability analysis, Syft makes generating them a flexible and simple process, with many options to tailor SBOMs to your specific use cases.

If you’d like to explore Anchore Enterprise for robust features like continuous visibility, SBOM monitoring, drift detection, and policy enforcement, you can access a free 15-day trial.