The software industry faces a growing problem: we have far more open issues than contributor hours to address them. Every project maintainer knows this pain, and we certainly recognize it across our open source tools Syft, Grype, and Grant.

The backlogs grow faster than teams can address them, and “good first issues” can sit untouched for months. What if we could use AI tools not just to write code, but to tackle this contributor time shortage systematically?

Friends and podcast presenters frequently share their perspective that LLMs are terrible at coding tasks. Rather than accepting this at face value, I wanted to conduct a controlled experiment using our own open source tools at Anchore.

My specific question: can an LLM take me from start to finish (selecting a bug to work on, implementing a fix, and submitting a pull request that gets merged) while also helping me learn something valuable about the codebase?



Finding the Right Issue to Work On

Most repositories tag specific issues as “good-first-issue”, a label typically assigned by the core developers. They tend to know the project well enough to identify work items suitable for newcomers. These issues represent the sweet spot: meaningful contributions that may not require deep architectural knowledge, which is why I think they might be suitable for this test.

Rather than manually browsing through dozens of issues, I wrote a quick script that uses gh to gather all the relevant data systematically. The idea is that an LLM can then help me pick an appropriate issue from this list.

#!/bin/bash
# Script to find and save open issues with 
# a specific label from a GitHub repository.
# 
# Usage: ./find-labelled-issues.sh [org/repo] [label] [limit]
set -e

repo="${1:-anchore/syft}"
label="${2:-good-first-issue}" 
limit="${3:-50}"
tmpfile=$(mktemp)
results="./results/$repo"

cleanup() {
    rm -f "$tmpfile"
}
trap cleanup EXIT

mkdir -p "$results"

# Grab the issues with the specified label
echo "Fetching issues from repo with label 'label'..."
gh issue list -R "repo" --label "$label" --state "open" --limit "$limit" --json number --jq '.[] | .number' > "tmpfile"

while read -r issue_number; do
    echo "Processing repo issue #issue_number"
    filename="(echo $repo | tr '/' '_')_issue_issue_number.json"
    gh issue view "issue_number" -R "$repo" --json title,body,author,createdAt,updatedAt,comments,labels --jq '. | {title: .title, body: .body, author: .author.login, createdAt: .createdAt, updatedAt: .updatedAt, comments: .comments, labels: [.labels[].name]}' | jq . > "$results/filename"
done < "$tmpfile"

echo "All issues processed. Results saved in the $results directory."

This script does the heavy lifting of gathering not just the initial bug reports, but all the comments and discussions that often contain crucial implementation hints from the project maintainers.

I ran this across multiple Anchore repositories, gathering up to fifty issues from each:

for repo in syft grype grant stereoscope; do 
    ./find-labelled-issues.sh anchore/$repo good-first-issue 50
done

Letting the LLM Choose

With all the data collected, I presented the entire set to Claude and asked it to recommend which issue I should work on. I deliberately provided minimal criteria, allowing the LLM to develop its own evaluation framework.

Claude devised an evaluation rubric based on the following factors and weights (a short sketch of how the weights combine into a single score follows the list):

Impact & User Value (Weight: 30%)

  • High: Critical functionality, affects many users, or enables new important use cases
  • Medium: Useful improvements or bug fixes affecting moderate user base
  • Low: Nice-to-have features or edge cases

Implementation Complexity (Weight: 25%)

  • Easy: Clear requirements, well-defined scope, straightforward implementation
  • Medium: Some complexity but manageable with good planning
  • Hard: Complex architecture changes, unclear requirements, or extensive testing needed

Information Quality (Weight: 20%)

  • Excellent: Clear problem description, reproduction steps, examples, context
  • Good: Adequate information with some details
  • Poor: Vague description, missing context or examples

LLM Assistance Potential (Weight: 15%)

  • High: Pattern matching, code generation, template work, documentation
  • Medium: Some assistance possible for research or boilerplate
  • Low: Requires deep domain expertise or complex architectural decisions

Community Need (Weight: 10%)

  • High: Many comments, long-standing issue, or frequently requested
  • Medium: Some community interest
  • Low: Limited engagement
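
Claude applied this rubric qualitatively rather than numerically, but to make the weights concrete, here is a small sketch in Go (the language Syft itself is written in) of how they could combine into a single score if each factor were rated from 1 to 3. The type and the ratings below are my own illustration, not Claude's output.

package main

import "fmt"

// issueScores holds a 1-3 rating per rubric factor
// (Low/Poor = 1, Medium/Good = 2, High/Excellent = 3).
// "simplicity" is the inverse of implementation complexity,
// so that a higher number is always better.
type issueScores struct {
    impact      float64 // weight 30%
    simplicity  float64 // weight 25%
    infoQuality float64 // weight 20%
    llmHelp     float64 // weight 15%
    community   float64 // weight 10%
}

// weightedScore combines the ratings using the rubric's weights,
// producing a value between 1 and 3.
func (s issueScores) weightedScore() float64 {
    return 0.30*s.impact +
        0.25*s.simplicity +
        0.20*s.infoQuality +
        0.15*s.llmHelp +
        0.10*s.community
}

func main() {
    // Hypothetical ratings for an issue shaped like #2250: high impact,
    // simple, well documented, very LLM-friendly, modest community interest.
    issue := issueScores{impact: 3, simplicity: 3, infoQuality: 3, llmHelp: 3, community: 2}
    fmt.Printf("weighted score: %.2f\n", issue.weightedScore())
}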

Using the above criteria, Claude grouped the current open issues into three "Tiers", plus an "Avoid" category for those that might be too complex or lack enough information to complete. The "Top Tier" issues included Syft #675 ("Markdown Table Output Support") and #2555 ("Windows .NET6 Dependency Parsing Fix"), but Claude finally landed on Syft issue #2250 – "package.json authors keyword parsing", which involves parsing the authors field in package.json files. Quoting the LLM's response, the rationale was somewhat compelling:

"This issue stands out as the best choice because it's ideal for AI assistance—it involves JSON parsing, regex patterns, and code generation—all areas where LLMs excel. The problem has crystal clear requirements with exact examples of expected JSON structures, a manageable scope contained in a single file. It provides real user impact by improving npm package metadata parsing accuracy."

The issue was well-documented with specific examples:

"authors": [
   "Harry Potter <[email protected]> (http://youknowwho.com/)",
   "John Smith <[email protected]> (http://awebsite.com/)"
]
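
To make the shape of the fix concrete, here is a minimal Go sketch of how a single entry in that format ("Name <email> (url)") could be pulled apart. This is my own illustration with made-up names, not the code that was merged into Syft, and real package.json author strings have more edge cases than this handles.

package main

import (
    "fmt"
    "regexp"
    "strings"
)

// Author holds the pieces of an npm-style author string:
// "Name <email> (url)". The field names here are illustrative,
// not Syft's actual types.
type Author struct {
    Name  string
    Email string
    URL   string
}

// authorPattern captures a leading name, an optional "<email>",
// and an optional "(url)".
var authorPattern = regexp.MustCompile(`^([^<(]+)?(?:<([^>]+)>)?\s*(?:\(([^)]+)\))?$`)

// parseAuthorString splits one entry of the "authors" array into its parts.
func parseAuthorString(s string) Author {
    m := authorPattern.FindStringSubmatch(strings.TrimSpace(s))
    if m == nil {
        return Author{Name: strings.TrimSpace(s)}
    }
    return Author{
        Name:  strings.TrimSpace(m[1]),
        Email: strings.TrimSpace(m[2]),
        URL:   strings.TrimSpace(m[3]),
    }
}

func main() {
    a := parseAuthorString("Harry Potter <harry@example.com> (http://youknowwho.com/)")
    fmt.Printf("%+v\n", a)
}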

Starting the Development Work

With the issue selected, I moved into VS Code and enabled GitHub Copilot’s agent mode, which allows the AI to edit code and run commands rather than just make suggestions. My initial prompt was deliberately simple:

#codebase

This is the syft codebase. Syft is an SBOM generator.

I have analyzed all the open bugs in syft that are tagged "good first issue" and found 2250 a good one to start with.

The rationale is in #file:anchore_syft_issue_2250_rationale.md and the bug itself is detailed in #file:anchore_syft_issue_2250.json

Please formulate a plan for implementing the fix. Do not start working on the code. I would like you to break down the fix into the necessary steps and explain them. If you need more information, ask questions.

The key was treating this as a collaborative process. I read every response, examined all generated code, and made sure I understood each step. Working in a feature branch meant I could experiment freely, abandon approaches that weren’t working, and restart with different prompts when needed. I was under no obligation to accept any of the suggestions from the LLM.

The Iterative Process

The most valuable part of this experiment was the back-and-forth dialog. When the LLM-generated code was unclear to me, I asked questions. When it made assumptions about the codebase structure, I corrected them. When it needed more context about contributing guidelines, I provided that information by directing it to the CONTRIBUTING.md and DEVELOPING.md files from the repository.

This iterative approach allowed me to learn about the Syft codebase structure, Go programming patterns, and the project’s testing conventions throughout the process. The LLM worked as a knowledgeable pair-programming partner rather than a black-box code generator.

Testing and Validation

The LLM automatically detected the project’s existing test structure and generated appropriate test cases for the new functionality. It was understood that any changes needed to maintain backward compatibility and avoid breaking existing package.json parsing behavior.

Running the test suite confirmed that the implementation worked correctly and didn’t introduce regressions, a crucial step that many rushed “vibe-coded” AI-assisted contributions skip.
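
Syft's parser tests commonly follow Go's table-driven style, and the generated tests did the same. For illustration, here is a hedged sketch that exercises the parser from the earlier example (assuming both sit in the same package); the cases and names are mine, not the tests that were actually generated or merged.

package main

import "testing"

// TestParseAuthorString exercises the sketch parser from earlier in the post.
func TestParseAuthorString(t *testing.T) {
    tests := []struct {
        name  string
        input string
        want  Author
    }{
        {
            name:  "name, email and url",
            input: "Harry Potter <harry@example.com> (http://youknowwho.com/)",
            want:  Author{Name: "Harry Potter", Email: "harry@example.com", URL: "http://youknowwho.com/"},
        },
        {
            name:  "name only",
            input: "Harry Potter",
            want:  Author{Name: "Harry Potter"},
        },
        {
            name:  "empty string",
            input: "",
            want:  Author{},
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            got := parseAuthorString(tt.input)
            if got != tt.want {
                t.Errorf("parseAuthorString(%q) = %+v, want %+v", tt.input, got, tt.want)
            }
        })
    }
}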

Pull Request Creation

When the code was ready, I asked the LLM to draft a pull request description using the project’s template. I edited this slightly to match my writing style before submitting, but the generated description covered all the key points: what the change does, why it’s needed, and how it was tested.

The pull request was submitted like any other contribution and entered the standard review process.

Results and Lessons Learned

The experiment succeeded: the pull request was merged after review and feedback from the maintainers. But the real insights came from what happened during the process:

Speed Gains: The development process was somewhere around 3-5 times faster than if I had tackled this issue manually. The LLM handled the routine parsing logic while I focused on understanding the broader codebase architecture.

Learning Acceleration: Rather than just producing code, the process accelerated my understanding of how Syft’s package parsing works, Go testing patterns, and the project’s contribution workflow.

Maintainer Perspective: The project maintainers could tell the code was AI-assisted (interesting in itself), but this wasn’t a significant problem. They provided thoughtful feedback that I was able to incorporate with the LLM’s help.

Room for Improvement: I should have explicitly pointed the LLM to the contributing guidelines instead of relying on the codebase to infer conventions. This would have saved some iteration cycles.

When This Approach Makes Sense

I wouldn’t use this process for every contribution. Consuming all the good-first-issues would leave nothing for human newcomers who want to learn through direct contribution. The sweet spot seems to be:

  • Straightforward issues with clear requirements.
  • Learning-focused development where you want to understand a new codebase.
  • Time-constrained situations where you need to move faster than usual.
  • Problems that involve routine parsing or data transformation logic.

Future Refinements

For my next contributions, I plan several improvements:

  • Add explicit prompts to match my writing style for pull request descriptions.
  • Point the LLM directly to the contributing guidelines and coding standards, which are in the repository, but sometimes require explicit mention.
  • Consider working on issues that aren’t tagged as “good-first-issue” to preserve those seemingly “easier” ones for other human newcomers.
  • Add a note in the pull request acknowledging the use of a tool-assisted approach.

The goal isn’t to replace human contributors, but to find ways that AI tools can help us tackle the growing backlog of open issues while genuinely accelerating our learning and understanding of the codebases we work with.

This experiment suggests that with the right approach, LLMs can be valuable partners in open source contribution, not just for generating code, but for navigating unfamiliar codebases and understanding project conventions. The key is maintaining active engagement with the entire process, rather than treating AI as a one-click magic solution. 

After conducting this experiment, I discussed the outcomes with the Anchore Open Source team, which welcomes contributions to all of its projects. They were quick to point out the quality difference between a well-curated AI-assisted pull request and a “vibe-coded” one, thrown over the wall.

What similar experiments would you want to see? The intersection of AI tools and open-source contributions feels like fertile ground for making our development workflows both faster and more educational.

