When we were invited to participate in Carnegie Mellon University’s Software Engineering Institute (SEI) SBOM Harmonization Plugfest 2024, we saw an opportunity to contribute to SBOM generation standardization efforts and thoroughly exercise our open-source SBOM generator, Syft.
While the Plugfest only required two SBOM submissions, we decided to go all in – and learned some valuable lessons along the way.
The Plugfest Challenge
The SBOM Harmonization Plugfest aims to understand why different tools generate different SBOMs for the same software. It’s not a competition but a collaborative study to improve SBOM implementation harmonization. The organizers selected eight diverse software projects, ranging from Node.js applications to C++ libraries, and asked participants to generate SBOMs in standard formats like SPDX and CycloneDX.
Going Beyond the Minimum
Instead of just submitting two SBOMs, we decided to:
- Generate SBOMs for all eight target projects
- Create both source and binary analysis SBOMs where possible
- Output in every format Syft supports
- Test both enriched and non-enriched versions
- Validate everything thoroughly
This comprehensive approach would give us (and the broader community) much more data to work with.
Automation: The Key to Scale
To handle this expanded scope, we created a suite of scripts to automate the entire process:
- Target acquisition
- Source SBOM generation
- Binary building
- Binary SBOM generation
- SBOM validation
The entire pipeline runs in about 38 minutes on a well-connected server, generating nearly three hundred SBOMs across different formats and configurations.
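To illustrate how quickly the combinations multiply, here is a minimal Go sketch that only builds the Syft command lines for each target, format, and enrichment setting. It is not the Plugfest scripts themselves. The directory layout, the target and format lists, and the helper name buildSyftArgs are assumptions made for illustration, as is the `all` value passed to --enrich (check `syft --help` for the values your version accepts).

```go
package main

import (
	"fmt"
	"path/filepath"
)

// buildSyftArgs returns one syft argument list per (target, format, enrichment)
// combination. The directory layout and output naming are illustrative
// assumptions, not the layout used by the actual Plugfest scripts.
func buildSyftArgs(srcRoot, outRoot string, targets, formats []string) [][]string {
	var cmds [][]string
	for _, target := range targets {
		for _, format := range formats {
			for _, enrich := range []bool{false, true} {
				name := format
				if enrich {
					name += "_enriched"
				}
				out := filepath.Join(outRoot, target, name+".json")
				args := []string{
					"scan", "dir:" + filepath.Join(srcRoot, target),
					"-o", format + "=" + out,
				}
				if enrich {
					// Assumption: "all" enables every available enrichment source.
					args = append(args, "--enrich", "all")
				}
				cmds = append(cmds, args)
			}
		}
	}
	return cmds
}

func main() {
	// Two targets and two JSON formats yield 2 x 2 x 2 = 8 invocations.
	cmds := buildSyftArgs("targets", "sboms",
		[]string{"hexyl", "jq"},
		[]string{"cyclonedx-json", "spdx-json"})
	for _, c := range cmds {
		fmt.Println("syft", c)
	}
}
```

Building the argument lists separately from executing them keeps the matrix easy to inspect before anything runs. Each list could then be handed to os/exec.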
The Power of Enrichment
One of Syft’s interesting features is its --enrich option, which can enhance SBOMs with additional metadata from online sources. Here’s a real example showing the difference in a CycloneDX SBOM for Dependency-Track:
$ wc -l dependency-track/cyclonedx-json.json dependency-track/cyclonedx-json_enriched.json
5494 dependency-track/cyclonedx-json.json
6117 dependency-track/cyclonedx-json_enriched.json
The enriched version contains additional information like license URLs and CPE identifiers:
{
"license": {
"name": "Apache 2",
"url": "http://www.apache.org/licenses/LICENSE-2.0"
},
"cpe": "cpe:2.3:a:org.sonatype.oss:JUnitParams:1.1.1:*:*:*:*:*:*:*"
}
These additional identifiers are crucial for security and compliance teams – license URLs help automate legal compliance checks, while CPE identifiers enable consistent vulnerability matching across security tools.
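A line count is only a rough measure. One way to quantify what enrichment added is to count the components that carry a CPE or a license URL. The Go sketch below does that for a CycloneDX JSON document, assuming the standard top-level components array with cpe and licenses[].license.url fields. The countEnriched helper and the sample document are hand-written for illustration, not real Syft output, and nested components are not walked.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bom models only the CycloneDX JSON fields this sketch needs.
type bom struct {
	Components []component `json:"components"`
}

type component struct {
	Name     string `json:"name"`
	CPE      string `json:"cpe"`
	Licenses []struct {
		License struct {
			Name string `json:"name"`
			URL  string `json:"url"`
		} `json:"license"`
	} `json:"licenses"`
}

// countEnriched reports how many top-level components carry a CPE and how
// many carry at least one license URL.
func countEnriched(doc []byte) (withCPE, withLicenseURL int, err error) {
	var b bom
	if err = json.Unmarshal(doc, &b); err != nil {
		return 0, 0, err
	}
	for _, c := range b.Components {
		if c.CPE != "" {
			withCPE++
		}
		for _, l := range c.Licenses {
			if l.License.URL != "" {
				withLicenseURL++
				break // count each component once
			}
		}
	}
	return withCPE, withLicenseURL, nil
}

func main() {
	// Hand-built sample based on the fragment shown earlier, not real output.
	sample := []byte(`{"components":[
	  {"name":"JUnitParams",
	   "cpe":"cpe:2.3:a:org.sonatype.oss:JUnitParams:1.1.1:*:*:*:*:*:*:*",
	   "licenses":[{"license":{"name":"Apache 2","url":"http://www.apache.org/licenses/LICENSE-2.0"}}]},
	  {"name":"example-lib"}]}`)
	cpes, urls, err := countEnriched(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("components with CPE: %d, with license URL: %d\n", cpes, urls)
}
```

Running the same count over the plain and enriched files for a target would show where the extra lines come from.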
SBOM Generation of Binaries
While source code analysis is valuable, many Syft users analyze built artifacts and containers. This reflects real-world usage where organizations must understand what’s being deployed, not just what’s in the source code. We built and analyzed binaries for most target projects:
Package | Build Method | Key Findings |
---|---|---|
Dependency-Track | Docker | The container SBOMs included ~1000 more items than source analysis, including base image components like Debian packages |
HTTPie | pip install | Binary analysis caught runtime Python dependencies not visible in source |
jq | Docker | Python dependencies contributed significant additional packages |
Minecolonies | Gradle | Java runtime archives (JARs) appeared in binary analysis, but not in the source |
OpenCV | CMake | Binary and source SBOMs were largely the same |
hexyl | Cargo build | Rust static linking meant minimal difference from source |
nodejs-goof | Docker | Node.js runtime and base image packages significantly increased the component count |
Some projects, like gin-gonic (a library) and PHPMailer, weren’t built as they’re not typically used as standalone binaries.
The differences between source and binary SBOMs were striking. For example, the Dependency-Track container SBOM revealed:
- Base image operating system packages
- Runtime dependencies not visible in source analysis
- Additional layers of dependencies from the build process
- System libraries and tools included in the container
This perfectly illustrates why both source and binary analysis are important:
- Source SBOMs show some direct development dependencies
- Binary/container SBOMs show the complete runtime environment
- Together, they provide a full picture of the software supply chain
Organizations can leverage these differences in their CI/CD pipelines – using source SBOMs for early development security checks and binary/container SBOMs for final deployment validation and runtime security monitoring.
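A simple way to surface these differences in a pipeline is a set difference over component identifiers taken from the two SBOMs. The Go sketch below assumes the components have already been reduced to name@version strings. The onlyIn helper and the sample entries are hypothetical placeholders, not data from the Plugfest targets.

```go
package main

import (
	"fmt"
	"sort"
)

// onlyIn returns the entries of b that do not appear in a, de-duplicated and
// sorted. With a = source components and b = container components, the result
// is what the deployed artifact adds on top of the source tree.
func onlyIn(a, b []string) []string {
	seen := make(map[string]struct{}, len(a))
	for _, k := range a {
		seen[k] = struct{}{}
	}
	var extra []string
	for _, k := range b {
		if _, ok := seen[k]; !ok {
			extra = append(extra, k)
			seen[k] = struct{}{} // avoid reporting duplicates twice
		}
	}
	sort.Strings(extra)
	return extra
}

func main() {
	// Placeholder component lists for illustration only.
	source := []string{"app-lib@1.0.0", "json-lib@2.3.1"}
	container := []string{"app-lib@1.0.0", "json-lib@2.3.1", "base-os-pkg@12.1", "runtime@17.0.2"}
	for _, c := range onlyIn(source, container) {
		fmt.Println("only in container:", c)
	}
}
```

Matching on name@version is deliberately naive. A real comparison would more likely key on package URLs (purl), since names alone can collide across ecosystems.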
Unexpected Discovery: SBOM Generation Bug
One of the most valuable outcomes wasn’t planned at all. During our comprehensive testing, we discovered a bug in Syft’s SPDX document generation. The SPDX validators were flagging our documents as invalid due to absolute file paths:
file name must not be an absolute path starting with "/", but is:
/.github/actions/bootstrap/action.yaml
file name must not be an absolute path starting with "/", but is:
/.github/workflows/benchmark-testing.yaml
file name must not be an absolute path starting with "/", but is:
/.github/workflows/dependabot-automation.yaml
file name must not be an absolute path starting with "/", but is:
/.github/workflows/oss-project-board-add.yaml
The SPDX specification requires relative file paths in the SBOM, but Syft used absolute paths. Our team quickly developed a fix, which involved converting absolute paths to relative ones in the format model logic:
// spdx requires that the file name field is a relative filename
// with the root of the package archive or directory
func convertAbsoluteToRelative(absPath string) (string, error) {
	// Ensure the absolute path is absolute (although it should already be)
	if !path.IsAbs(absPath) {
		// already relative
		log.Debugf("%s is already relative", absPath)
		return absPath, nil
	}
	// we use "/" here given that we're converting absolute paths from root to relative
	relPath, found := strings.CutPrefix(absPath, "/")
	if !found {
		return "", fmt.Errorf("error calculating relative path: %s", absPath)
	}
	return relPath, nil
}
The fix was simple but effective – stripping the leading “/” from absolute paths while maintaining proper error handling and logging. This change was incorporated into Syft v1.18.0, which we used for our final Plugfest submissions.
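To try the behavior in isolation, the following standalone Go sketch reproduces the same logic with the call to Syft's internal logger removed, so it compiles outside the Syft codebase. It assumes Go 1.20 or later, since strings.CutPrefix was added in that release. The input paths are taken from the validator messages above, plus one made-up relative path.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// convertAbsoluteToRelative mirrors the fix shown above, minus the logging:
// relative paths pass through unchanged, absolute ones lose the leading "/".
func convertAbsoluteToRelative(absPath string) (string, error) {
	if !path.IsAbs(absPath) {
		return absPath, nil // already relative
	}
	relPath, found := strings.CutPrefix(absPath, "/")
	if !found {
		return "", fmt.Errorf("error calculating relative path: %s", absPath)
	}
	return relPath, nil
}

func main() {
	paths := []string{
		"/.github/actions/bootstrap/action.yaml",
		"/.github/workflows/benchmark-testing.yaml",
		"already/relative.txt", // made-up example of a relative path
	}
	for _, p := range paths {
		rel, err := convertAbsoluteToRelative(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %q\n", p, rel)
	}
}
```

Because path.IsAbs only reports true for paths that begin with "/", the error branch acts as a defensive guard rather than a path normal input is expected to reach.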
This discovery highlights the value of comprehensive testing and community engagement. What started as participation in the Plugfest ended up improving Syft for all users by ensuring more standards-compliant SPDX documents. It’s a perfect example of how collaborative efforts like the Plugfest can benefit the entire SBOM ecosystem.
SBOM Validation
We used multiple validation tools to verify our SBOMs:
- CycloneDX’s sbom-utility
- SPDX’s pyspdxtools
- NTIA’s online validator
Interestingly, we found some disparities between validators. For example, some enriched SBOMs that passed sbom-utility validation failed with pyspdxtools. Further, the NTIA online validator gave yet another result in many cases. This highlights the ongoing challenges in SBOM standardization – even the tools that check SBOM validity don’t always agree!
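When several validators are run over hundreds of files, it helps to record each verdict and list only the files where they disagree. The Go sketch below shows one way to do that. The disagreements helper, the file names, and the verdicts are hypothetical placeholders, not our actual validation results.

```go
package main

import (
	"fmt"
	"sort"
)

// disagreements returns the SBOM files for which the recorded validators did
// not all reach the same verdict. results maps file -> validator -> passed.
func disagreements(results map[string]map[string]bool) []string {
	var files []string
	for file, verdicts := range results {
		pass, fail := 0, 0
		for _, ok := range verdicts {
			if ok {
				pass++
			} else {
				fail++
			}
		}
		if pass > 0 && fail > 0 {
			files = append(files, file)
		}
	}
	sort.Strings(files) // map iteration order is random, so sort for stable output
	return files
}

func main() {
	// Placeholder verdicts for illustration only.
	results := map[string]map[string]bool{
		"example-a.json": {"validator-1": true, "validator-2": false},
		"example-b.json": {"validator-1": true, "validator-2": true},
	}
	for _, f := range disagreements(results) {
		fmt.Println("validators disagree on:", f)
	}
}
```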
Key Takeaways
- Automation is crucial: Our scripted approach allowed us to efficiently generate and validate hundreds of SBOMs.
- Real-world testing matters: Building and analyzing binaries revealed insights (and bugs!) that source-only analysis might have missed.
- Enrichment adds value: Additional metadata can significantly enhance SBOM utility, though support varies by ecosystem.
- Validation is complex: Different validators can give different results, showing the need for further standardization.
Looking Forward
The SBOM Harmonization Plugfest results will be analyzed in early 2025, and we’re eager to see how different tools handled the same targets. Our comprehensive submission will help identify areas where SBOM generation can be improved and standardized.
More importantly, this exercise has already improved Syft for our users through the bug fix and given us valuable insights for future development. We’re committed to continuing this thorough testing and community participation to make SBOM generation more reliable and consistent for everyone.
The final SBOMs are published in the plugfest-sboms repo, with the scripts in the plugfest-scripts repository. Consider using Syft for SBOM generation against your code and containers, and let us know how you get on in our community Discourse.