In our latest Grype release, we've updated the DB schema to v6. This update isn't just a cosmetic change; it's a redesign that optimizes data storage and matching performance. For you, this means faster database updates (a 65 MB download instead of 210 MB), quicker scans, and more comprehensive vulnerability detection, all while keeping the familiar output format and user experience you rely on.

The Past: Schema v5

Originally, Grype’s vulnerability data was managed using two main tables:

  • VulnerabilityModel: This table stores package-specific vulnerability details. Each affected package version required a separate row, which led to significant metadata duplication.
  • VulnerabilityMetadataModel: To avoid duplicating large strings (like detailed vulnerability descriptions), metadata was separated into its own table.

This v1 design was born out of necessity. Early CGO-free SQLite drivers didn’t offer SQLite's full range of features. In later releases we were able to switch to the newly available modernc.org/sqlite driver and use GORM for general access.

However, v2 through v5 kept the same basic design. This led to space inefficiencies: the on-disk footprint grew to roughly 1.6 GB, and the cost was notable even after compression (210 MB as a tar.gz).

To search the database, we organized rows into “namespaces”: strings that indicated the ecosystem a record applied to, such as a specific distro name and version or a language name (for instance, redhat:distro:redhat:7 or nvd:cpe).

When searching for matches in Grype, we would cast a wide net with an initial database search by namespace and package name, then refine the results using additional parsed attributes, effectively casting a smaller net as we progressed. As the database grew, we came across more cases where the idea of a “namespace” just didn’t make sense. For instance, if you weren’t certain which namespace your software artifact landed in, should you simply search all namespaces? We clearly needed to remove namespaces as a core input into searching the database.
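To make the problem concrete, here is a minimal sketch of a namespace-first lookup. The table and column names are simplified, hypothetical stand-ins rather than the real v5 schema, and the sample rows are taken from the search output shown later in this post.

```python
import sqlite3

# Hypothetical, simplified stand-in for a v5-style table (not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vulnerability ("
    "id TEXT, namespace TEXT, package_name TEXT, version_constraint TEXT)"
)
conn.executemany(
    "INSERT INTO vulnerability VALUES (?, ?, ?, ?)",
    [
        ("CVE-2017-5645", "redhat:distro:redhat:5", "log4j", ""),
        ("CVE-2017-5645", "nvd:cpe", "log4j", ">= 2.0, < 2.8.2"),
        ("ALAS-2022-1739", "amazon:distro:amazonlinux:2", "log4j", "< 1.2.17-17.amzn2"),
    ],
)


def search_by_namespace(namespace, package_name):
    """Wide net: return every row for this exact namespace + package name."""
    rows = conn.execute(
        "SELECT id, version_constraint FROM vulnerability "
        "WHERE namespace = ? AND package_name = ?",
        (namespace, package_name),
    )
    return rows.fetchall()


# The caller has to know the exact namespace string up front.
candidates = search_by_namespace("amazon:distro:amazonlinux:2", "log4j")
# A namespace the client guessed wrong silently matches nothing.
missed = search_by_namespace("amazon:distro:amazonlinux", "log4j")
print(candidates, missed)
```

In this sketch the second lookup returns no rows even though relevant data exists, which is the kind of failure that pushed us away from namespaces as a search input.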

Something else changed after the early Grype DB schemas were released: the Open Source Vulnerability (OSV) schema was published. It gave vulnerability data providers a rich, machine-readable format for publishing advisories, which meant tools could more easily consume data from a broad set of vulnerability sources and provide more accurate results for end users. We knew that we wanted to ingest this format more natively, and maybe even express records internally in a similar manner.

The Present: Schema v6

To address these challenges, we've entirely reimagined how Grype stores and accesses vulnerability data.

At a high level, the new DB is primarily a JSON blob store for the bulk of the data, with specialized indexes for efficient searching. The stored JSON blobs are heavily inspired by the OSV schema, but tailored to meet Grype's specific needs. Each entity we want to search by gets its own table with optimized indexes, and these rows point to the OSV-like JSON blob snippets.
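The "indexed search table pointing at a JSON blob" idea can be sketched in a few lines. This is an illustration only: the table names, columns, and blob fields below are assumptions made for the example and do not mirror the actual v6 schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Bulk data lives in one place as JSON text.
    CREATE TABLE blobs (id INTEGER PRIMARY KEY, value TEXT NOT NULL);
    -- A narrow search table holds only what we query by, plus a blob pointer.
    CREATE TABLE affected_packages (
        package_name TEXT NOT NULL,
        ecosystem    TEXT NOT NULL,
        blob_id      INTEGER NOT NULL REFERENCES blobs(id)
    );
    CREATE INDEX idx_affected_packages
        ON affected_packages (package_name, ecosystem);
    """
)


def add_affected_package(name, ecosystem, detail):
    """Store the detail as a JSON blob and index it by package name + ecosystem."""
    cur = conn.execute("INSERT INTO blobs (value) VALUES (?)", (json.dumps(detail),))
    conn.execute(
        "INSERT INTO affected_packages VALUES (?, ?, ?)",
        (name, ecosystem, cur.lastrowid),
    )


def find_affected(name, ecosystem):
    """Use the indexed columns to find rows, then decode the blobs they point to."""
    rows = conn.execute(
        "SELECT b.value FROM affected_packages a "
        "JOIN blobs b ON b.id = a.blob_id "
        "WHERE a.package_name = ? AND a.ecosystem = ? ORDER BY b.id",
        (name, ecosystem),
    )
    return [json.loads(value) for (value,) in rows]


add_affected_package("log4j", "rpm", {"vulnerability": "ALAS-2022-1739", "constraint": "< 1.2.17-17.amzn2"})
add_affected_package("log4j", "rpm", {"vulnerability": "ALAS-2022-1750", "constraint": "< 1.2.17-18.amzn2"})
results = find_affected("log4j", "rpm")
print(results)
```

The search table stays small and index-friendly, while the shape of the blob can evolve without adding columns.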

Today, we have three primary search tables:

  • AffectedPackages: These are packages that exist in a known language, packaging ecosystem, or specific Linux distribution version.
  • AffectedCPEs: These are entries from NVD which do not have a known packaging ecosystem.
  • Vulnerabilities: These contain core vulnerability information without any packaging information.

One of the most significant improvements is removing "namespaces" entirely from within the DB. Previously, client-based changes were needed to craft the correct namespace for database searches. This meant shipping software updates for what were essentially data corrections. In v6, we've shifted these cases to simple lookup tables in the DB, normalizing search input. We can fix or add search queries through database updates alone, no client update required.
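As a rough illustration of why a lookup table helps, consider the sketch below. The `os_alias` table, its columns, and the alias values are all invented for this example; they are not the tables Grype actually ships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical lookup table mapping raw search input to a canonical name.
    CREATE TABLE os_alias (alias TEXT PRIMARY KEY, canonical TEXT NOT NULL);
    INSERT INTO os_alias VALUES ('rhel', 'redhat'), ('amzn', 'amazonlinux');
    """
)


def normalize_os(name):
    """Map user or scanner input to the canonical name, falling back to the input."""
    key = name.strip().lower()
    row = conn.execute(
        "SELECT canonical FROM os_alias WHERE alias = ?", (key,)
    ).fetchone()
    return row[0] if row else key


before_fix = normalize_os("ol")
# A data-only correction: the next database build adds a row, the client is untouched.
conn.execute("INSERT INTO os_alias VALUES ('ol', 'oraclelinux')")
after_fix = normalize_os("ol")
print(before_fix, after_fix)
```

Because the mapping is data rather than client code, the correction travels with the daily database update.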

Moreover, the v6 schema's modular design simplifies extending functionality. Integrating additional vulnerability feeds or other external data sources is now far more straightforward, ensuring that Grype remains flexible and future-proof.

The Benefits: What's New in the Database

In terms of content, v6 includes everything from v5 plus important additions:

  • Withdrawn vulnerabilities: We now persist "withdrawn" vulnerabilities. While this doesn't affect matching, it improves reference capabilities for related vulnerability data.
  • Enhanced datasets: We've added the CISA Known Exploited Vulnerabilities and EPSS (Exploit Prediction Scoring System) datasets to the database.

The best way to explore this data is with the grype db search and grype db search vuln commands.

search lets you discover affected packages by a wide array of parameters (package name, CPE, purl, vulnerability ID, provider, ecosystem, Linux distribution, added or modified since a particular date, etc.):

$ grype db search --pkg log4j
VULNERABILITY   PACKAGE                             ECOSYSTEM  NAMESPACE                       VERSION CONSTRAINT
ALAS-2021-003   log4j                               rpm        amazon:distro:amazonlinux:2022  < 2.15.0-1.amzn2022.0.1
ALAS-2021-004   log4j                               rpm        amazon:distro:amazonlinux:2022  < 2.16.0-1.amzn2022
ALAS-2021-008   log4j                               rpm        amazon:distro:amazonlinux:2022  < 2.17.0-1.amzn2022.0.1
ALAS-2022-011   log4j                               rpm        amazon:distro:amazonlinux:2022  < 2.17.1-1.amzn2022.0.1
ALAS-2022-1739  log4j                               rpm        amazon:distro:amazonlinux:2     < 1.2.17-17.amzn2
ALAS-2022-1750  log4j                               rpm        amazon:distro:amazonlinux:2     < 1.2.17-18.amzn2
ALAS-2022-225   log4j                               rpm        amazon:distro:amazonlinux:2022  < 2.17.2-1.amzn2022.0.3
CVE-2017-5645   log4j                               rpm        redhat:distro:redhat:5
CVE-2017-5645   cpe:2.3:a:apache:log4j:*:*:*:*:*:*             nvd:cpe                         >= 2.0, < 2.8.2
...

search vuln lets you search just for vulnerability records:

$ grype db search vuln CVE-2021-44228
ID              PROVIDER                                      PUBLISHED   SEVERITY                                      REFERENCE
CVE-2021-44228  debian (10, 11, 12, 13, unstable)                         negligible                                    https://security-tracker.debian.org/tracker/CVE-2021-44228
CVE-2021-44228  debian (9)                                                critical                                      https://security-tracker.debian.org/tracker/CVE-2021-44228
CVE-2021-44228  nvd                                           2021-12-10  CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H  https://nvd.nist.gov/vuln/detail/CVE-2021-44228
CVE-2021-44228  sles (15.4, 15.5, 15.6)                                   critical                                      https://www.suse.com/security/cve/CVE-2021-44228
CVE-2021-44228  ubuntu (14.4, 16.4, 18.4, 20.4, 21.10, 21.4)              high                                          https://ubuntu.com/security/CVE-2021-44228

As with all of our tools, -o json is available with these commands, so you can explore the raw affected package, affected CPE, and vulnerability records:

$ grype db search vuln CVE-2021-44228 -o json --provider nvd
[
 {
  "id": "CVE-2021-44228",
  "assigner": [
   "[email protected]"
  ],
  "description": "Apache Log4j2 2.0-beta9 through 2.15.0 (excluding security releases 2.12.2, 2.12.3, and 2.3.1) JNDI features...",
  "refs": [...],
  "severities": [...],
  "provider": "nvd",
  "status": "active",
  "published_date": "2021-12-10T10:15:09.143Z",
  "modified_date": "2025-02-04T15:15:13.773Z",
  "known_exploited": [
   {
    "cve": "CVE-2021-44228",
    "vendor_project": "Apache",
    "product": "Log4j2",
    "date_added": "2021-12-10",
    "required_action": "For all affected software assets for which updates exist, the only acceptable remediation actions are: 1) Apply updates; OR 2) remove affected assets from agency networks. Temporary mitigations using one of the measures provided at https://www.cisa.gov/uscert/ed-22-02-apache-log4j-recommended-mitigation-measures are only acceptable until updates are available.",
    "due_date": "2021-12-24",
    "known_ransomware_campaign_use": "known",
    "urls": [
     "https://nvd.nist.gov/vuln/detail/CVE-2021-44228"
    ],
    "cwes": [
     "CWE-20",
     "CWE-400",
     "CWE-502"
    ]
   }
  ],
  "epss": [
   {
    "cve": "CVE-2021-44228",
    "epss": 0.97112,
    "percentile": 0.9989,
    "date": "2025-03-03"
   }
  ]
 }
]
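JSON output like this is straightforward to post-process. The sketch below is one possible way to summarize a record, assuming the output has been captured to a string; it uses a trimmed copy of the record above (elided fields are left out) and only field names that appear in that output. The `summarize` helper is hypothetical.

```python
import json

# Trimmed copy of the record shown above; in practice this would be the
# captured output of the `-o json` command.
sample_output = """
[
 {
  "id": "CVE-2021-44228",
  "provider": "nvd",
  "status": "active",
  "known_exploited": [
   {"cve": "CVE-2021-44228", "date_added": "2021-12-10", "known_ransomware_campaign_use": "known"}
  ],
  "epss": [
   {"cve": "CVE-2021-44228", "epss": 0.97112, "percentile": 0.9989, "date": "2025-03-03"}
  ]
 }
]
"""


def summarize(record):
    """Reduce one vulnerability record to its ID, KEV presence, and highest EPSS score."""
    epss_entries = record.get("epss") or []
    return {
        "id": record["id"],
        "known_exploited": bool(record.get("known_exploited")),
        "epss": max((entry["epss"] for entry in epss_entries), default=None),
    }


summaries = [summarize(record) for record in json.loads(sample_output)]
print(summaries)
```

A summary like this could feed a prioritization step, for example surfacing known-exploited vulnerabilities first.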

Dramatic Size Reduction: The Technical Journey

One of the standout improvements of v6 is the dramatic size reduction:

Metric              Schema v5  Schema v6  Improvement
Raw DB Size         1.6 GB     900 MB     44% smaller
Compressed Archive  210 MB     65 MB      69% smaller

This means you'll experience significantly faster database updates and reduced storage requirements.
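The percentages above follow directly from the sizes quoted. As a quick check, treating 1.6 GB as 1600 MB (an assumption about the rounding used):

```python
def percent_smaller(before_mb, after_mb):
    """Size reduction relative to the original, rounded to a whole percent."""
    return round(100 * (before_mb - after_mb) / before_mb)


raw_db = percent_smaller(1600, 900)   # 1.6 GB -> 900 MB
archive = percent_smaller(210, 65)    # 210 MB -> 65 MB
print(raw_db, archive)  # 44 69
```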

We build and distribute Grype database archives daily to provide users with the most up-to-date vulnerability information. Over the past five years, we've added more vulnerability sources, and the database has more than doubled in size, significantly impacting users who update their databases daily.

Our optimization strategy included:

  1. Switching to zstandard compression: This yields better compression ratios compared to gzip, providing immediate space savings.
  2. Database layout optimization: We prototyped various database layouts, experimenting with different normalization levels (database design patterns that eliminate data redundancy). While higher normalization saved space in the raw database, it sometimes yielded worse compression results. We found the optimal balance between normalization and leaving enough unnormalized data for compression algorithms to work effectively.
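The interaction between normalization and compression can be measured with a small experiment along these lines. This is a toy sketch, not our actual tooling: the data is synthetic, and zlib from the Python standard library stands in for zstandard, so the numbers only illustrate the method of comparing raw and compressed sizes per layout.

```python
import zlib

# Synthetic data: 1000 rows that share 5 long description strings.
DESCRIPTIONS = [
    f"Hypothetical advisory text number {n}: " + "details " * 20 for n in range(5)
]
ROWS = [(f"pkg-{i}", i % len(DESCRIPTIONS)) for i in range(1000)]


def denormalized():
    """Every row repeats its full description inline."""
    return "\n".join(f"{name}|{DESCRIPTIONS[d]}" for name, d in ROWS).encode()


def normalized():
    """Descriptions are stored once; rows reference them by index."""
    lookup = "\n".join(f"{n}|{text}" for n, text in enumerate(DESCRIPTIONS))
    rows = "\n".join(f"{name}|{d}" for name, d in ROWS)
    return (lookup + "\n" + rows).encode()


def measure(payload):
    """Return (raw size, compressed size) in bytes."""
    return len(payload), len(zlib.compress(payload, 9))


sizes = {"denormalized": measure(denormalized()), "normalized": measure(normalized())}
for layout, (raw, packed) in sizes.items():
    print(f"{layout:>12}: raw={raw} bytes, compressed={packed} bytes")
```

Normalizing always shrinks the raw size here, but the compressor already removes much of the repetition in the denormalized layout, so the gap after compression is far smaller than the raw numbers suggest. With real data and real storage overhead, that is where the balance has to be found empirically.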

Real-World Impact

These improvements directly benefit several common scenarios:

  • CI/CD Pipelines: With a 69% smaller download size, your CI/CD pipelines will update vulnerability databases faster, reducing build times and costs.
  • Air-gapped Environments: If you're working in air-gapped environments and need to transport the database, its significantly smaller size makes this process much more manageable.
  • Resource-constrained Systems: The smaller on-disk footprint means Grype can now run more efficiently on systems with limited resources.

Conclusion

The evolution of the Grype database schema from v5 to v6 marks a significant milestone. By rethinking our database structure and using the OSV schema as inspiration, we've created a more efficient, scalable, and feature-rich database that directly benefits your vulnerability management workflows.

We encourage you to update to the latest version of Grype to take advantage of these improvements. If you have feedback on the new schema or ideas for further enhancements, please share them with us on Discourse, and if you spot a bug, let us know on GitHub.

If you’d like to get updates about the Anchore Open Source Community, sign up for our low-traffic community newsletter. Stay tuned for more updates as we refine Grype and empower your security practices!