To Update or Not to Update

In the previous blog, we presented our analysis of image update frequency for official DockerHub images and the implications for application images built on top of these base images. In a Reddit reply, /u/AskOnDock29 pointed out that users can update the operating system packages in their images themselves, independently of the official image, so the frequency (or infrequency) of base image updates is not a concern because end-users can easily manage it. This Redditor is indeed correct: users can update operating system packages when building on top of an official or other base image. Whether this happens in reality is an interesting question that we will get to shortly.

When the Anchore Navigator downloads images from Docker Hub, we derive the Dockerfile from metadata contained in the image. The Anchore Navigator’s Build Summary pane on the overview tab displays this information by showing the commands run in each layer of the Dockerfile. Using library/mysql as an example, we see that new files and packages are added to the image, but the base packages are not updated.

This is the view that the Navigator gives of the mysql Dockerfile. The derived Dockerfile is not identical to the Dockerfile used to construct the image, since metadata such as the names of copied files or the image that was used as a base is lost during the build. But the derived Dockerfile does include the commands used and image metadata. In this example, searching through each layer, we do not find package update instructions.

Running some quick analysis against our dataset, out of the 22,413 non-official images tagged as ‘latest’ since September of last year, 6,099 (27%) included package update commands in their Dockerfiles. Grouping by repository instead of image, 80 out of 559 (14%) non-official repositories had update commands at some point over the last year. The absence of update commands does not mean that an image has outdated packages or known CVEs, since its base image may be up to date and there are other ways to include the latest packages, for example starting from scratch and adding in files and packages manually. Anchore’s dataset includes file and package manifests for all of these images, so we can verify the package set to look for updates without requiring analysis of the Dockerfile.
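As a rough illustration of this kind of check, the sketch below scans Dockerfile text for package upgrade commands and reports the share of Dockerfiles that contain one. The regular expressions, the requirement that the command appears in a RUN instruction, and the function names are all assumptions made for this example; the exact matching rules used in our analysis are not described here.

```python
import re

# Assumed patterns: which commands count as a "package update" is a guess
# made for this sketch, not the rule set used in the analysis above.
UPDATE_PATTERNS = [
    r"\bapt(?:-get)?\s+(?:-\S+\s+)*(?:upgrade|dist-upgrade|full-upgrade)\b",
    r"\b(?:yum|dnf)\s+(?:-\S+\s+)*(?:update|upgrade)\b",
    r"\bapk\s+(?:-\S+\s+)*upgrade\b",
]
UPDATE_RE = re.compile("|".join(UPDATE_PATTERNS))


def has_update_command(dockerfile_text):
    """Return True if any RUN instruction contains a package upgrade command."""
    # Join backslash-continued lines so each instruction sits on one line.
    joined = re.sub(r"\\\s*\n", " ", dockerfile_text)
    for line in joined.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("RUN ") and UPDATE_RE.search(stripped):
            return True
    return False


def share_with_updates(dockerfiles):
    """Fraction of the given Dockerfile texts that contain an upgrade command."""
    if not dockerfiles:
        return 0.0
    return sum(has_update_command(d) for d in dockerfiles) / len(dockerfiles)
```

Note that "apt-get update" on its own only refreshes the package index, so this sketch deliberately does not count it as an upgrade. Applied to the figures above, the same ratio is simply 6,099 / 22,413 for images and 80 / 559 for repositories.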

So should a user upgrade operating system packages when they build their images? Ideally no. Docker’s best practices recommend that the user should not run apt-get upgrade or dist-upgrade within their Dockerfile, but should instead contact the maintainer of the parent image used. Minimizing package changes within the image also helps to improve the reproducibility of the build.

If there are no upgrade/update commands or manual package management, then there are two ways to keep the base image up to date: the user can update it manually, or the maintainers can push a new image whenever the base image is updated, which the user then has to use when rebuilding their image. As previously covered, the second method is preferable. Since the majority of repositories do not use upgrade commands and therefore depend on the maintainer or user updating the base image, it would be interesting to see how well repositories handle this responsibility and keep up to date with base image updates.

Starting with about 10,000 public community images, we see that a new, updated image is pushed close to a week (six days and 20 hours) after an updated operating system image is made available. We have excluded the first image of each repository from the analysis since our focus is the frequency and timing of updates. The analysis does include many images that are just small side projects that served a single purpose and aren’t actively updated. At the same time, images that get updated nightly are also in there, so there is a balance.
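One plausible way to compute such a lag is sketched below: for each base image update, find the next push of the derived image and average the delays. The matching rule (the first push at or after each base update) and the function names are assumptions for illustration only; the sketch does follow the text in dropping the first image of each repository.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def update_lags(base_updates, image_pushes):
    """Delay from each base image update to the next derived-image push."""
    # Drop the repository's first image, as the analysis above does.
    pushes = sorted(image_pushes)[1:]
    lags = []
    for updated in sorted(base_updates):
        i = bisect_left(pushes, updated)  # first push at or after the update
        if i < len(pushes):
            lags.append(pushes[i] - updated)
    return lags


def mean_lag(lags):
    """Average of a list of timedeltas, or None if the list is empty."""
    return sum(lags, timedelta()) / len(lags) if lags else None


# Hypothetical timestamps, for illustration only.
base = [datetime(2017, 6, 1), datetime(2017, 7, 5)]
pushes = [datetime(2017, 5, 1), datetime(2017, 6, 3), datetime(2017, 7, 5, 0, 50)]
average = mean_lag(update_lags(base, pushes))
```

With this rule, two base updates that arrive before the same push are both measured against that push, which is one of several reasonable choices.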

However, looking at a number of the most popular community images offers a view into repositories that have a following and need to be maintained more actively than other images.

Here, the average time to update is just a little over five days, a good bit lower than the average for all of the images.

As mentioned earlier, no official images include update commands, so updates have to come in the form of image updates. Due to their elevated standard and visibility, these updates need to be timely; an update to the base image should be responded to quickly. We see that this is, in fact, the case: the average update time across the official images is about one day and 10 hours.

Taking a similar look into popular images, things to note are that none of these respond in longer than three days, and there is no real correlation between popularity and update times among these images. A repo like library/node takes nearly three days on average, while library/mysql takes a little over half a day. There is certainly a correlation on a larger scale – more popular images have quicker update times – but there is quite a bit of variance along the way.

To see why these updates matter, we’ll go through the life cycle of a security flaw, RHSA-2017:1481. This flaw affected the glibc package in Red Hat Enterprise Linux (RHEL) and could allow a user to increase their privileges. Because CentOS is compiled from RHEL sources, any images that are built on top of RHEL or CentOS carry this flaw. To focus on just one, we will look at jboss/wildfly, which is built on top of CentOS. Knowledge of the flaw was made public on June 19th of this year, and a fix was published by Red Hat almost immediately, with the fix for CentOS being made available on June 20th. The CentOS image was then updated to include the fix 15 days later, on July 5th.

Using the security tab for that image, you can see that RHSA-2017:1481 is not present.

However, clicking on the previous image button will take you to the image that was pushed on June 5th, which was affected by the glibc flaw.

The maintainers of the jboss/wildfly image have a really good update schedule, so a new image that included the fix was made available within only 50 minutes of the CentOS image being released. However, since the parent image was only updated after 15 days, the wildfly image was vulnerable during that period.
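The arithmetic behind this timeline can be worked through in a few lines. This is only a sketch: the year 2017 is inferred from the advisory ID, the times of day are not known and are assumed to be midnight, and the variable names are made up for the example.

```python
from datetime import datetime, timedelta

# Dates from the timeline above; year and time of day are assumptions.
fix_available = datetime(2017, 6, 20)        # CentOS package fix published
base_image_updated = datetime(2017, 7, 5)    # CentOS image rebuilt with the fix
derived_image_updated = base_image_updated + timedelta(minutes=50)  # jboss/wildfly

base_lag = base_image_updated - fix_available            # wait on the base image
derived_lag = derived_image_updated - base_image_updated  # wait on the derived image
total_exposure = derived_image_updated - fix_available

# Share of the exposure window spent waiting for the base image.
base_share = base_lag / total_exposure
```

Under these assumptions, nearly all of the exposure window is spent waiting for the base image, which is why the maintenance of the base image matters more here than the responsiveness of the derived image.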

There are a number of key points to take away from this analysis:

  • Choose your base images carefully. Ensure that the base image you are using is well maintained. If not, consider maintaining your own base image or picking a different base image to use.
  • An image being official does not mean that it is frequently updated or that it is necessarily the best image to build from. You may find other repos that have images suited to your needs.
  • Keep track of updates to the base images that you use. One method for tracking updates and receiving notifications is covered in this blog.

The frequency of updates is not the only metric to consider when looking at images; you also need to know what has changed. Was the image just rebuilt based on a schedule? Or were files and packages changed? Users often focus solely on CVE (security) updates and do not consider other package updates that include bug fixes. In our next blog, we will take a deeper look into what changes in an image update.