In our latest Grype release, we've updated the DB schema to v6. This update isn't just a cosmetic change; it's a thoughtful redesign that optimizes data storage and matching performance. For you, this means faster database updates (65 MB vs 210 MB downloads), quicker scans, and more comprehensive vulnerability detection, all while maintaining the familiar output format and user experience you rely on.
The Past: Schema v5
Originally, Grype’s vulnerability data was managed using two main tables:
- VulnerabilityModel: This table stored package-specific vulnerability details. Each affected package version required a separate row, which led to significant metadata duplication.
- VulnerabilityMetadataModel: To avoid duplicating large strings (like detailed vulnerability descriptions), metadata was separated into its own table.
This v1 design was born out of necessity. Early CGO-free SQLite drivers didn’t offer SQLite's full range of features. In later releases we were able to switch to the newly available modernc.org/sqlite driver and use GORM for general access.
However, v2 through v5 kept the same basic design. This led to space inefficiencies: the on-disk footprint grew to roughly 1.6 GB, and the cost was notable even after compression (210 MB as a tar.gz).
When it came to searching the database, we organized rows into “namespaces”: strings that indicated the ecosystem a record applied to (a specific distro name and version, a language name, and so on), for instance redhat:distro:redhat:7 or nvd:cpe.
When searching for matches, Grype would cast a wide net with an initial database query by namespace and package name, then refine the results using additionally parsed attributes, effectively casting a smaller net at each step. As the database grew, we came across more cases where the idea of a “namespace” just didn’t make sense. For instance, if you weren’t certain which namespace your software artifact belonged to, should you simply search all namespaces? We clearly needed to remove namespaces as a core input into searching the database.
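The two-step search described above can be sketched in a few lines. This is only an illustration of the "wide net, then refine" idea, not Grype's implementation: the `V5Row` type, the `example-pkg` package, the `EXAMPLE-*` IDs, and the tuple-based version comparison are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class V5Row:
    namespace: str   # e.g. "redhat:distro:redhat:7"
    package: str
    vuln_id: str     # hypothetical IDs below, not real advisories
    fixed_in: tuple  # simplified: versions below this are affected


ROWS = [
    V5Row("redhat:distro:redhat:7", "example-pkg", "EXAMPLE-0001", (1, 2, 0)),
    V5Row("redhat:distro:redhat:7", "example-pkg", "EXAMPLE-0002", (1, 0, 5)),
    V5Row("redhat:distro:redhat:5", "example-pkg", "EXAMPLE-0003", (2, 0, 0)),
]


def wide_net(rows, namespace, package):
    """Step 1: broad candidate search by namespace + package name."""
    return [r for r in rows if r.namespace == namespace and r.package == package]


def refine(candidates, version):
    """Step 2: narrow the candidates using a parsed attribute (here, the version)."""
    return [r for r in candidates if version < r.fixed_in]


candidates = wide_net(ROWS, "redhat:distro:redhat:7", "example-pkg")
matches = refine(candidates, (1, 1, 0))
print([m.vuln_id for m in matches])
```

Note that a caller who cannot produce the exact namespace string gets no candidates from step 1 at all, which is the limitation that motivated dropping namespaces as a search input.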
Something else changed after the early Grype DB schemas were released: the Open Source Vulnerability (OSV) schema was published. It gave vulnerability data providers a rich, machine-readable format for publishing advisories, which meant tools could more easily consume data from a broad set of vulnerability sources and provide more accurate results for end users. We knew we wanted to ingest this format more natively, and maybe even express records internally in a similar manner.
The Present: Schema v6
To address these challenges, we've entirely reimagined how Grype stores and accesses vulnerability data.
At a high level, the new DB is primarily a JSON blob store for the bulk of the data, with specialized indexes for efficient searching. The stored JSON blobs are heavily inspired by the OSV schema, but tailored to meet Grype's specific needs. Each entity we want to search by gets its own table with optimized indexes, and these rows point to the OSV-like JSON blob snippets.
Today, we have three primary search tables:
- AffectedPackages: These are packages that exist in a known language, packaging ecosystem, or specific Linux distribution version.
- AffectedCPEs: These are entries from NVD which do not have a known packaging ecosystem.
- Vulnerabilities: These contain core vulnerability information without any packaging information.
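As a rough sketch of the "JSON blob store plus indexed search tables" idea, the following uses Python's built-in `sqlite3` module. The table names, column names, and blob fields here are assumptions chosen for illustration and do not reflect Grype's actual v6 schema; the sample record is taken from the search output shown later in this post.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Bulk data lives in one place as JSON text.
CREATE TABLE blobs (
    id    INTEGER PRIMARY KEY,
    value TEXT NOT NULL
);
-- A narrow search table holds only what we filter on, plus a pointer to the blob.
CREATE TABLE affected_packages (
    id           INTEGER PRIMARY KEY,
    package_name TEXT NOT NULL,
    ecosystem    TEXT NOT NULL,
    blob_id      INTEGER NOT NULL REFERENCES blobs(id)
);
CREATE INDEX idx_affected_pkg ON affected_packages (package_name, ecosystem);
""")


def add_affected_package(conn, name, ecosystem, record):
    """Store the record as a JSON blob and index it by package name and ecosystem."""
    cur = conn.execute("INSERT INTO blobs (value) VALUES (?)", (json.dumps(record),))
    conn.execute(
        "INSERT INTO affected_packages (package_name, ecosystem, blob_id) VALUES (?, ?, ?)",
        (name, ecosystem, cur.lastrowid),
    )


def find_affected(conn, name):
    """Search the indexed table, then decode the blobs the matching rows point to."""
    rows = conn.execute(
        "SELECT b.value FROM affected_packages a "
        "JOIN blobs b ON b.id = a.blob_id "
        "WHERE a.package_name = ? ORDER BY a.id",
        (name,),
    )
    return [json.loads(value) for (value,) in rows]


add_affected_package(
    conn, "log4j", "rpm",
    {"vulnerability": "ALAS-2022-1739", "constraint": "< 1.2.17-17.amzn2"},
)
print(find_affected(conn, "log4j"))
```

The point of the split is that the search table stays small and index-friendly, while the blob can carry as much OSV-like detail as needed without widening the rows being searched.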
One of the most significant improvements is removing "namespaces" entirely from within the DB. Previously, client-side changes were needed to craft the correct namespace for database searches. This meant shipping software updates for what were essentially data corrections. In v6, we've shifted these cases to simple lookup tables in the DB, normalizing search input. We can now fix or add search queries through database updates alone, with no client update required.
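To make the lookup-table idea concrete, here is a minimal sketch in which search input is normalized by a table that ships inside the database. The `distro_aliases` table, its columns, and the specific alias rows are hypothetical and only illustrate the approach; they are not Grype's actual tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical lookup table: maps what a client might report to the name used in the data.
CREATE TABLE distro_aliases (
    alias     TEXT PRIMARY KEY,
    canonical TEXT NOT NULL
);
INSERT INTO distro_aliases (alias, canonical) VALUES
    ('rhel', 'redhat'),
    ('amzn', 'amazonlinux');
""")


def normalize_distro(conn, name):
    """Return the canonical distro name, falling back to the cleaned input."""
    key = name.strip().lower()
    row = conn.execute(
        "SELECT canonical FROM distro_aliases WHERE alias = ?", (key,)
    ).fetchone()
    return row[0] if row else key


print(normalize_distro(conn, "RHEL"))
print(normalize_distro(conn, "debian"))
```

Because the mapping is data, correcting or adding an alias is an `INSERT` in the next database build rather than a new client release.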
Moreover, the v6 schema's modular design simplifies extending functionality. Integrating additional vulnerability feeds or other external data sources is now far more straightforward, ensuring that Grype remains flexible and future-proof.
The Benefits: What's New in the Database
In terms of content, v6 includes everything from v5 plus important additions:
- Withdrawn vulnerabilities: We now persist "withdrawn" vulnerabilities. While this doesn't affect matching, it improves reference capabilities for related vulnerability data.
- Enhanced datasets: We've added the CISA Known Exploited Vulnerabilities (KEV) and EPSS (Exploit Prediction Scoring System) datasets to the database.
The best way to explore this data is with the grype db search and grype db search vuln commands.
search allows you to discover affected packages by a wide array of parameters (package name, CPE, purl, vulnerability ID, provider, ecosystem, Linux distribution, added or modified since a particular date, etc.):
$ grype db search --pkg log4j
VULNERABILITY   PACKAGE                             ECOSYSTEM  NAMESPACE                       VERSION CONSTRAINT
ALAS-2021-003   log4j                               rpm        amazon:distro:amazonlinux:2022  < 2.15.0-1.amzn2022.0.1
ALAS-2021-004   log4j                               rpm        amazon:distro:amazonlinux:2022  < 2.16.0-1.amzn2022
ALAS-2021-008   log4j                               rpm        amazon:distro:amazonlinux:2022  < 2.17.0-1.amzn2022.0.1
ALAS-2022-011   log4j                               rpm        amazon:distro:amazonlinux:2022  < 2.17.1-1.amzn2022.0.1
ALAS-2022-1739  log4j                               rpm        amazon:distro:amazonlinux:2     < 1.2.17-17.amzn2
ALAS-2022-1750  log4j                               rpm        amazon:distro:amazonlinux:2     < 1.2.17-18.amzn2
ALAS-2022-225   log4j                               rpm        amazon:distro:amazonlinux:2022  < 2.17.2-1.amzn2022.0.3
CVE-2017-5645   log4j                               rpm        redhat:distro:redhat:5
CVE-2017-5645   cpe:2.3:a:apache:log4j:*:*:*:*:*:*             nvd:cpe                         >= 2.0, < 2.8.2
...
search vuln lets you search only the vulnerability records:
$ grype db search vuln CVE-2021-44228
ID              PROVIDER                                      PUBLISHED   SEVERITY                                      REFERENCE
CVE-2021-44228  debian (10, 11, 12, 13, unstable)                         negligible                                    https://security-tracker.debian.org/tracker/CVE-2021-44228
CVE-2021-44228  debian (9)                                                critical                                      https://security-tracker.debian.org/tracker/CVE-2021-44228
CVE-2021-44228  nvd                                           2021-12-10  CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H  https://nvd.nist.gov/vuln/detail/CVE-2021-44228
CVE-2021-44228  sles (15.4, 15.5, 15.6)                                   critical                                      https://www.suse.com/security/cve/CVE-2021-44228
CVE-2021-44228  ubuntu (14.4, 16.4, 18.4, 20.4, 21.10, 21.4)              high                                          https://ubuntu.com/security/CVE-2021-44228
As with all of our tools, -o json is available with these commands so you can explore the raw affected package, affected CPE, and vulnerability records:
$ grype db search vuln CVE-2021-44228 -o json --provider nvd
[
  {
    "id": "CVE-2021-44228",
    "assigner": [
      "[email protected]"
    ],
    "description": "Apache Log4j2 2.0-beta9 through 2.15.0 (excluding security releases 2.12.2, 2.12.3, and 2.3.1) JNDI features...",
    "refs": [...],
    "severities": [...],
    "provider": "nvd",
    "status": "active",
    "published_date": "2021-12-10T10:15:09.143Z",
    "modified_date": "2025-02-04T15:15:13.773Z",
    "known_exploited": [
      {
        "cve": "CVE-2021-44228",
        "vendor_project": "Apache",
        "product": "Log4j2",
        "date_added": "2021-12-10",
        "required_action": "For all affected software assets for which updates exist, the only acceptable remediation actions are: 1) Apply updates; OR 2) remove affected assets from agency networks. Temporary mitigations using one of the measures provided at https://www.cisa.gov/uscert/ed-22-02-apache-log4j-recommended-mitigation-measures are only acceptable until updates are available.",
        "due_date": "2021-12-24",
        "known_ransomware_campaign_use": "known",
        "urls": [
          "https://nvd.nist.gov/vuln/detail/CVE-2021-44228"
        ],
        "cwes": [
          "CWE-20",
          "CWE-400",
          "CWE-502"
        ]
      }
    ],
    "epss": [
      {
        "cve": "CVE-2021-44228",
        "epss": 0.97112,
        "percentile": 0.9989,
        "date": "2025-03-03"
      }
    ]
  }
]
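Since the JSON output carries the KEV and EPSS data alongside each record, it is easy to post-process. The sketch below summarizes records shaped like the example above; the `summarize` helper and the trimmed `SAMPLE` input are illustrative assumptions, and only field names that appear in the output above are used.

```python
import json

# A trimmed copy of the record shown above (elided fields left out).
SAMPLE = """
[
  {
    "id": "CVE-2021-44228",
    "provider": "nvd",
    "status": "active",
    "known_exploited": [
      {"cve": "CVE-2021-44228", "date_added": "2021-12-10", "due_date": "2021-12-24"}
    ],
    "epss": [
      {"cve": "CVE-2021-44228", "epss": 0.97112, "percentile": 0.9989, "date": "2025-03-03"}
    ]
  }
]
"""


def summarize(records):
    """Reduce each vulnerability record to its ID, KEV membership, and latest EPSS entry."""
    summary = []
    for rec in records:
        epss_entries = rec.get("epss") or []
        # ISO dates sort correctly as strings, so max() picks the most recent entry.
        latest = max(epss_entries, key=lambda e: e["date"], default=None)
        summary.append({
            "id": rec["id"],
            "provider": rec.get("provider"),
            "in_kev": bool(rec.get("known_exploited")),
            "epss": latest["epss"] if latest else None,
            "percentile": latest["percentile"] if latest else None,
        })
    return summary


result = summarize(json.loads(SAMPLE))
print(result)
```

To run this against live data, one option is to pipe the output of `grype db search vuln CVE-2021-44228 -o json --provider nvd` into a script that calls `summarize(json.load(sys.stdin))`.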
Dramatic Size Reduction: The Technical Journey
One of the standout improvements of v6 is the dramatic size reduction:
| Metric | Schema v5 | Schema v6 | Improvement |
|---|---|---|---|
| Raw DB Size | 1.6 GB | 900 MB | 44% smaller |
| Compressed Archive | 210 MB | 65 MB | 69% smaller |
This means you'll experience significantly faster database updates and reduced storage requirements.
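The improvement percentages follow directly from the sizes in the table. As a quick check, treating 1.6 GB as 1600 MB (an assumption about the units used):

```python
def percent_smaller(before_mb, after_mb):
    """Relative size reduction, as a percentage of the original size."""
    return (before_mb - after_mb) / before_mb * 100


raw = percent_smaller(1600, 900)     # raw DB: 1.6 GB -> 900 MB
archive = percent_smaller(210, 65)   # compressed archive: 210 MB -> 65 MB

print(f"raw DB: {raw:.2f}% smaller")
print(f"archive: {archive:.2f}% smaller")
```

Rounded to whole percentages, these give the 44% and 69% figures shown in the table.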
We build and distribute Grype database archives daily to provide users with the most up-to-date vulnerability information. Over the past five years, we've added more vulnerability sources, and the database has more than doubled in size, significantly impacting users who update their databases daily.
Our optimization strategy included:
- Switching to zstandard compression: This yields better compression ratios compared to gzip, providing immediate space savings.
- Database layout optimization: We prototyped various database layouts, experimenting with different normalization levels (database design patterns that eliminate data redundancy). While higher normalization saved space in the raw database, it sometimes yielded worse compression results. We found the optimal balance between normalization and leaving enough unnormalized data for compression algorithms to work effectively.
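Comparing candidate layouts means measuring both the raw size and the compressed size of each. The sketch below shows one way to set up that kind of measurement on synthetic data. Everything in it is an assumption for illustration: the data is made up, JSON stands in for a real database file, and `zlib` (DEFLATE, the algorithm behind gzip) stands in for the compressor because Zstandard is not in the Python standard library on most versions. This toy data is not expected to reproduce the trade-off described above; it only shows the shape of the experiment.

```python
import json
import zlib


def build_layouts(n_vulns=50, versions_per_vuln=20):
    """Build the same synthetic data in a denormalized and a normalized layout."""
    descriptions = {
        f"VULN-{i}": f"Hypothetical advisory {i}: " + "details about the flaw. " * 8
        for i in range(n_vulns)
    }
    # Denormalized: every row repeats the full description.
    denormalized = [
        {"vuln": vid, "package": "example-pkg", "version": f"1.{v}.0", "description": text}
        for vid, text in descriptions.items()
        for v in range(versions_per_vuln)
    ]
    # Normalized: descriptions are stored once and rows refer to them by ID.
    normalized = {
        "descriptions": descriptions,
        "rows": [
            {"vuln": vid, "package": "example-pkg", "version": f"1.{v}.0"}
            for vid in descriptions
            for v in range(versions_per_vuln)
        ],
    }
    return denormalized, normalized


def sizes(layout):
    """Return (raw_bytes, compressed_bytes) for a layout serialized as JSON."""
    raw = json.dumps(layout, sort_keys=True).encode("utf-8")
    return len(raw), len(zlib.compress(raw, 9))


denormalized, normalized = build_layouts()
den_raw, den_zip = sizes(denormalized)
norm_raw, norm_zip = sizes(normalized)
print(f"denormalized: raw={den_raw} compressed={den_zip}")
print(f"normalized:   raw={norm_raw} compressed={norm_zip}")
```

With a real database, the same two numbers would be collected per candidate schema, since the layout with the smallest raw size is not necessarily the one with the smallest archive.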
Real-World Impact
These improvements directly benefit several common scenarios:
- CI/CD Pipelines: With a 69% smaller download size, your CI/CD pipelines will update vulnerability databases faster, reducing build times and costs.
- Air-gapped Environments: If you're working in air-gapped environments and need to transport the database, its significantly smaller size makes this process much more manageable.
- Resource-constrained Systems: The smaller database footprint means Grype needs less disk space and can run more efficiently on systems with limited resources.
Conclusion
The evolution of the Grype database schema from v5 to v6 marks a significant milestone. By rethinking our database structure and using the OSV schema as inspiration, we've created a more efficient, scalable, and feature-rich database that directly benefits your vulnerability management workflows.
We encourage you to update to the latest version of Grype to take advantage of these improvements. If you have feedback on the new schema or ideas for further enhancements, please share them with us on Discourse, and if you spot a bug, let us know on GitHub.
If you’d like to get updates about the Anchore Open Source Community, sign up for our low-traffic community newsletter. Stay tuned for more updates as we refine Grype and empower your security practices!