To Update or Not to Update

In the previous blog, we presented our analysis of image update frequency for official Docker Hub images and the implications for application images built on top of these base images. A Reddit reply by /u/AskOnDock29 pointed out that users can update the operating system packages in the images themselves, independently of the official image, and so the frequency, or infrequency, of base image updates is not a concern since end users can easily manage this. This Redditor is indeed correct: users can update operating system packages when building on top of an official or other base image. Whether this happens in reality is an interesting question that we will get to shortly.

When the Anchore Navigator downloads images from Docker Hub, we derive the Dockerfile from metadata contained in the image. The Anchore Navigator’s Build Summary pane on the overview tab displays this information by showing the commands run in each layer of the Dockerfile. Using library/mysql as an example, we see that new files and packages are added to the image; however, the base packages are not updated.

This is the view that the Navigator gives of the mysql Dockerfile. The derived Dockerfile is not identical to the Dockerfile used to construct the image, since metadata such as the names of copied files or the image that was used as a base is lost during the build. But the derived Dockerfile does include the commands used and image metadata. In this example, searching through each layer, we do not find package update instructions.
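As a rough illustration of that search, the sketch below scans a list of layer commands for package upgrade instructions. The pattern list and the sample commands are our own assumptions for illustration and are not the rules the Navigator applies. Note that `apt-get update` on its own only refreshes the package index, so this sketch does not count it as an upgrade.

```python
import re

# Hypothetical patterns for package-upgrade commands on common distributions.
# This list is an assumption for illustration, not Anchore's actual rule set.
UPDATE_PATTERNS = [
    r"\bapt-get\s+(?:-\S+\s+)*(?:upgrade|dist-upgrade)\b",
    r"\bapt\s+(?:-\S+\s+)*(?:upgrade|full-upgrade)\b",
    r"\byum\s+(?:-\S+\s+)*(?:update|upgrade)\b",
    r"\bdnf\s+(?:-\S+\s+)*(?:update|upgrade)\b",
    r"\bapk\s+(?:-\S+\s+)*upgrade\b",
]
UPDATE_RE = re.compile("|".join(UPDATE_PATTERNS))


def has_update_command(layer_commands):
    """Return True if any layer command upgrades installed OS packages."""
    return any(UPDATE_RE.search(cmd) for cmd in layer_commands)


# Made-up layer commands: the first list only installs new packages,
# the second list upgrades packages that are already in the base image.
install_only = [
    "apt-get update && apt-get install -y --no-install-recommends perl pwgen",
    "mkdir /docker-entrypoint-initdb.d",
]
upgrading = [
    "apt-get update && apt-get -y upgrade",
    "yum -y update && yum clean all",
]

print(has_update_command(install_only))  # False
print(has_update_command(upgrading))     # True
```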

Running some quick analysis against our dataset, out of the 22,413 non-official images tagged as ‘latest’ since September of last year, 6,099 (27%) included package update commands in their Dockerfiles. Grouping by repository instead of image, 80 out of 559 (14%) non-official repositories at some point over the last year had update commands. This does not mean that all of the remaining images have outdated packages or known CVEs, since their base images may be up to date and there are other ways to include the latest packages, for example starting from scratch and adding in files and packages manually. Anchore’s dataset includes file and package manifests for all of these images, so we can verify the package set to look for updates without requiring analysis of the Dockerfile.
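The two percentages differ because they count different things. The sketch below shows one way to compute both, assuming a hypothetical list of (repository, flag) pairs with one entry per image. The repository names and flags are invented. A repository that pushes often and uses update commands raises the image-level share far more than the repository-level share, which is one possible reason for the gap between 27% and 14%.

```python
from collections import defaultdict


def update_command_shares(images):
    """images: iterable of (repository, has_update_command) pairs, one per image.

    Returns (image_share, repo_share) as fractions between 0 and 1.
    A repository counts as soon as any one of its images had an update command.
    """
    images = list(images)
    repo_flag = defaultdict(bool)
    for repo, flag in images:
        repo_flag[repo] = repo_flag[repo] or flag
    image_share = sum(flag for _, flag in images) / len(images)
    repo_share = sum(repo_flag.values()) / len(repo_flag)
    return image_share, repo_share


# Invented sample: one busy repository with update commands, three without.
sample = [
    ("example/nightly", True), ("example/nightly", True), ("example/nightly", True),
    ("example/web", False), ("example/db", False), ("example/cache", False),
]
image_share, repo_share = update_command_shares(sample)
print(f"{image_share:.0%} of images, {repo_share:.0%} of repositories")

# The figures quoted in the text, rounded to whole percentages.
print(round(6099 / 22413 * 100), round(80 / 559 * 100))  # 27 14
```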

So should a user upgrade operating system packages when they build their images? Ideally no. Docker’s best practices recommend that the user should not run apt-get upgrade or dist-upgrade within their Dockerfile, but should instead contact the maintainer of the parent image used. Minimizing package changes within the image also helps to improve the reproducibility of the build.

If there are no upgrade/update commands or manual package management, then there are two ways to keep the base image up to date: the user can update it manually, or the maintainers can push a new image whenever the base image is updated, which the user then has to use as they rebuild their image. As previously covered, the second method is preferable. Since the majority of repositories do not use upgrade commands and therefore depend on the maintainer or user updating the base image, it would be interesting to see how well repositories handle that responsibility and keep up to date with base image updates.

Starting with about 10,000 public community images, we see that a new, updated image is pushed close to a week (six days and 20 hours) after an updated operating system image is made available. We have excluded the first image of each repository from the analysis since our focus is the frequency and timing of updates. The analysis does include many images that are just small side projects that served a single purpose and aren’t actively updated. At the same time, images that get updated nightly are also in there, so there is a balance.
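To make the measurement concrete, here is a minimal sketch of how such an average could be computed. It assumes we know when the base image was updated and when each repository pushed; the repository names and timestamps are invented, and the matching rule (the first push after each base update counts) is our assumption rather than a description of the actual analysis.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def average_update_lag(base_updates, repo_pushes):
    """Average delay between a base image update and a repository's next push.

    base_updates: sorted list of datetimes when the base image was updated.
    repo_pushes: dict of repository name -> sorted list of push datetimes.
    The first push of every repository is excluded from the average.
    """
    lags = []
    for pushes in repo_pushes.values():
        seen_base = set()
        for i, pushed_at in enumerate(pushes):
            idx = bisect_right(base_updates, pushed_at) - 1
            if idx < 0:
                continue  # pushed before any known base update
            base = base_updates[idx]
            if base in seen_base:
                continue  # only the first push after a base update counts
            seen_base.add(base)
            if i > 0:  # skip the first image of each repository
                lags.append(pushed_at - base)
    return sum(lags, timedelta()) / len(lags) if lags else None


# Invented timeline for illustration only.
base_updates = [datetime(2017, 5, 1), datetime(2017, 6, 1)]
repo_pushes = {
    "example/app": [datetime(2017, 4, 20), datetime(2017, 5, 3), datetime(2017, 6, 8)],
    "example/tool": [datetime(2017, 5, 2), datetime(2017, 6, 4, 12)],
}

average = average_update_lag(base_updates, repo_pushes)
print(average)  # 4 days, 4:00:00
```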

However, looking at a number of the most popular community images offers a view into repositories that have a following and need to be maintained more actively than other images.

Here, the average time to update is just a little over five days, a good bit lower than the average for all of the images.

As mentioned earlier, no official images include update commands, so updates have to come in the form of image updates. Due to their elevated standard and visibility, these updates need to be timely; an update to the base image should be responded to quickly. We see that this is, in fact, the case: the average update time across the official images is about one day and 10 hours.

Taking a similar look at popular images, two things stand out: none of them takes longer than three days to respond, and there is no real correlation between popularity and update times among these images. A repo like library/node takes nearly three days on average, while library/mysql takes a little over half a day. There is certainly a correlation on a larger scale (more popular images have quicker update times), but there is quite a bit of variance along the way.

To see fully why these updates matter, we’ll go through the life cycle of a security flaw, RHSA-2017:1481. This flaw affected the glibc package in Red Hat Enterprise Linux (RHEL) and could allow a user to escalate their privileges. Because CentOS is compiled from RHEL sources, any images that are built on top of RHEL or CentOS carry this flaw. To focus on just one, we will be looking at jboss/wildfly, which is built on top of CentOS. Knowledge of the flaw was made public on June 19th of this year, and a fix was published by Red Hat almost immediately, with the fix for CentOS being made available on June 20th. The CentOS image was then updated to include the fix 15 days later on July 5th.

Using the security tab for that image, you can see that RHSA-2017:1481 is not present.

However, clicking on the previous image button will take you to the image that was pushed on June 5th, which was affected by the glibc flaw.

The maintainers of the jboss/wildfly image have a really good update schedule, so a new image that included the fix was made available only 50 minutes after the CentOS image was released. However, since the parent image was only updated after 15 days, the wildfly image was vulnerable during that period.
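The timeline above can be written down as a small calculation. The dates come from the text and the year from the advisory ID; the times of day are not given, so midnight is assumed and the 50 minutes is simply added to the CentOS image date.

```python
from datetime import datetime, timedelta

# Dates taken from the timeline above; the year comes from the advisory ID.
# Times of day are not given in the text, so midnight is an assumption.
disclosed = datetime(2017, 6, 19)
centos_fix_available = datetime(2017, 6, 20)
centos_image_updated = datetime(2017, 7, 5)
wildfly_image_updated = centos_image_updated + timedelta(minutes=50)

# How long the fix existed before the base image picked it up.
base_image_delay = centos_image_updated - centos_fix_available
# How long the application image took to follow the base image.
app_image_delay = wildfly_image_updated - centos_image_updated
# Time from public disclosure to a fixed application image.
total_exposure = wildfly_image_updated - disclosed

print(base_image_delay.days)  # 15
print(app_image_delay)        # 0:50:00
print(total_exposure)         # 16 days, 0:50:00
```

Almost all of the exposure comes from the base image delay, not from the application image maintainers.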

There are a number of key points to take away from this analysis:

  • Choose your base images carefully. Ensure that the base image you are using is well maintained. If not, consider maintaining your own base image or pick a different base image to use.
  • Just because an image is official does not mean that it is frequently updated or necessarily the best image to build from. You may find other repos that have images suited to your needs.
  • Keep track of updates to the base images that you use. One method for tracking updates and receiving notifications is covered in this blog.

The frequency of updates is not the only metric to consider when looking at images; you need to know what has changed. Was the image just rebuilt based on a schedule? Or were files and packages changed? Users often focus solely on CVE (security) updates but do not consider other package updates that include bugfixes. In our next blog, we will take a deeper look into what changes in an image update.

A Look at How Often Docker Images are Updated

In our last blog, we reported on operating systems usage on Docker Hub, focusing on official base images.

Most users do not build their container images from scratch; they build on top of these base images, for example, extending an image such as library/alpine:latest with their own application content. Whenever one of these base operating system images is updated, images built on top are typically rebuilt in order to inherit the fixes included in the base image. In this blog, we will be looking at base image updates: their frequency, the changes made and how that impacts end users.

If you want to check the update history of a particular image, the Navigator makes that simple.

For example, take debian:latest, currently the most popular base operating system among official images.

Here you can see the date that the image was created and when it was analyzed by Anchore.
The Anchore service continually polls Docker Hub, and when an update to a repository is detected, the list of tags and images is retrieved and any new images are downloaded and analyzed. Anchore maintains a database with image and tag history, so at any point in time previous versions of a tag may be inspected. Clicking on the Previous Image button navigates to the image that was tagged library/debian:latest previously. The Next Image button is disabled because, at the time the screenshot was acquired, this image ID was the latest image tagged as library/debian:latest.

By clicking on Previous Image in the top left, you can explore the Navigator’s analysis of older images of the same tag. In this case, the previous version is only a week old, but for the most part, you will see that Debian is updated every two to five weeks.

Putting these dates onto a timeline, we see that debian:latest is updated roughly every month. Looking at the update frequency of other popular official operating system images, once a month is just about average. While this might seem ideal, what really matters is the timeliness of updates and the content of the update. For example, if a new critical vulnerability is discovered the day after the scheduled image update, then a user should not wait another month for an update. Users can certainly update these images with fixes; in fact, this should be part of the due diligence that is performed in creating images. However, the content published in public registries should be secure off-the-shelf.
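Given a list of image creation dates like the ones the Navigator shows, the gaps between updates are easy to compute. The dates below are invented for illustration and are not the real debian:latest history. The longest gap is worth watching as well as the average, since it bounds how long a fix disclosed just after an update could wait.

```python
from datetime import date


def update_intervals(created_dates):
    """Return the number of days between consecutive image updates."""
    ordered = sorted(created_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical creation dates for successive images of one tag.
hypothetical_history = [
    date(2017, 3, 21),
    date(2017, 4, 24),
    date(2017, 5, 8),
    date(2017, 6, 6),
    date(2017, 6, 20),
]

gaps = update_intervals(hypothetical_history)
print(gaps)                   # [34, 14, 29, 14]
print(sum(gaps) / len(gaps))  # 22.75
print(max(gaps))              # 34
```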

This timeline compares the update frequency of some major operating system repositories. Of these repos, none have a fixed update schedule. Ubuntu and Debian are pretty consistent, while the rest are quite varied. For example, CentOS sticks to about an update a month now, but before would have large gaps, up to 3 months long, between updates. On the flip side, Oracle Linux has clusters where multiple updates will come out in a short time period. What is interesting is that there are 4 that all have had 8 or 9 updates over the last year. Is that the number where exposure to security issues and pushing too many updates is balanced? Something else to consider is that having more packages means that there are more things to keep up to date, so lightweight operating systems like Alpine and BusyBox do not need as much maintenance. However, this doesn’t explain why CentOS and Fedora are updated infrequently, as they are both much larger than Debian and Ubuntu.

Moving on to popular non-OS images, the difference in update frequency is striking. NGINX, the repo with the fewest updates here, has more updates than Oracle Linux, which had the most updates of all the operating systems. Calling back to the fact that more complexity means more maintenance, this increase makes sense. In future blogs, we will dig into what is changing between image updates.

Because many of the application images are built on top of official base operating system images, in theory, they should be rebuilt when the underlying base image is updated. Sadly, that is often not the case: we see a base operating system image updated with a fix while the application image is not rebuilt for several weeks, and in some cases it is rebuilt on top of an older base operating system image.

While all official images should follow Docker Hub best practices and should, therefore, be well maintained, it is clear from our historic data that many images are updated infrequently and carry security vulnerabilities for many weeks.

If you are trying to choose a non-official image, it is important that you look into its update history, since many images on Docker Hub are one-offs that were built by an engineer to ‘scratch an itch’, pushed to Docker Hub and never maintained. While that image may seem to have exactly what you are looking for, it’s important to note that you are in effect adopting the image and you are then responsible for its care and feeding!

One last interesting piece of information is that there are a few days (10/21, 1/17, 2/28, 4/25, …) where many of the repos push updates at the same time. In many cases this occurs the day after their base image, debian:latest, was updated. This backs up the idea that these images update more frequently because they have to keep up with updates of their base image.
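A pattern like this can be found by counting how many repositories push on each day and then checking whether those days fall right after a base image update. The sketch below uses invented repository names and an invented base image history; the year and the threshold of three repositories are also assumptions.

```python
from collections import Counter
from datetime import date


def shared_update_days(repo_pushes, min_repos=3):
    """Return the days on which at least min_repos repositories pushed an update."""
    counts = Counter()
    for push_days in repo_pushes.values():
        counts.update(set(push_days))  # count each repository once per day
    return sorted(day for day, n in counts.items() if n >= min_repos)


def follows_base_update(day, base_update_days, within_days=1):
    """True if day falls on, or up to within_days after, a base image update."""
    return any(0 <= (day - base).days <= within_days for base in base_update_days)


# Hypothetical repositories and push days, for illustration only.
repo_pushes = {
    "example/web": [date(2017, 2, 28), date(2017, 3, 9)],
    "example/db": [date(2017, 2, 28), date(2017, 4, 25)],
    "example/cache": [date(2017, 2, 28), date(2017, 4, 25)],
    "example/queue": [date(2017, 4, 25)],
}
# Invented base image update days, one day before each cluster.
base_update_days = [date(2017, 2, 27), date(2017, 4, 24)]

busy_days = shared_update_days(repo_pushes)
print(busy_days)
print([follows_base_update(day, base_update_days) for day in busy_days])
```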

As we alluded to earlier the content of an image update is just as, if not more, important than the timing of an update. In the next blog, we’ll dig into a more detailed timeline of updates, starting with the disclosure of a vulnerability, when the operating system vendor patched it, when that patch was included in a container image and when an application image pulled in the update.

Just Because They Pushed Doesn’t Mean You Need to Pull

While that may sound like advice your mother gave you after you got into a fight at school we are actually talking about Docker images.

Yesterday we started to notice a lot of activity on our worker nodes on anchore.io which were analyzing a large number of images that were updated on Docker Hub.

The Anchore service monitors Docker Hub looking for changes made to our customers’ private images, official images and thousands of other tags of popular images on Docker Hub.

We poll Docker Hub and when images are updated our workers pull down the new images and perform analysis and policy evaluations. Users can also subscribe to images to get notifications when images they use are updated.

Since yesterday we’ve seen over a thousand images get updated including official OS based images such as Alpine, CentOS, Debian, Oracle, and Ubuntu.

What was odd was that, looking at these images, we saw no changes in files or package manifests. As part of Anchore’s analysis, we look at all the files in the image down to the checksum level and all the package data. This allows us to perform policy checks that go beyond the usual CVE checks that you see with most tools.

We show a brief changelog summary on the overview page for an image, showing how many files and packages were added, removed or changed.

What had us scratching our heads yesterday was the high number of images with no apparent changes. The image metadata, such as the ID and digest, had changed but the underlying content was the same.

Digging deeper, it appears that while the actual content of the images has not changed, the manifests have been updated. This seems to have been driven by a change to the bashbrew utility, which is used to build official images. Bashbrew now defaults to using the manifest list format, which allows for multi-arch images, so even if an image has been built only for a single architecture it will now use the manifest list.

We will continue to dig into this but in the meantime, we’d recommend that you look to see what, if anything, changed in an image before you rebuild all your application images on top of a new base image.
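The check we recommend can be sketched as a comparison of file and package manifests that ignores the digest. The data layout below (a dict with "digest", "files" and "packages" keys) is a hypothetical structure chosen for this example, not the format Anchore stores, and all values are placeholders.

```python
def summarize_changes(old, new):
    """Compare two {name: value} manifests (path -> checksum, or package -> version)."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


def content_changed(old_image, new_image):
    """True only if file or package content differs, regardless of the digest."""
    for kind in ("files", "packages"):
        diff = summarize_changes(old_image[kind], new_image[kind])
        if any(diff.values()):
            return True
    return False


# Placeholder images: same content republished under a new digest,
# and an image whose content really did change.
old_image = {
    "digest": "sha256:aaaa",
    "files": {"/bin/sh": "c0ffee", "/etc/os-release": "1234"},
    "packages": {"libc6": "2.24-11", "bash": "4.4-5"},
}
republished = dict(old_image, digest="sha256:bbbb")
patched = {
    "digest": "sha256:cccc",
    "files": {"/bin/sh": "c0ffee", "/etc/os-release": "5678"},
    "packages": {"libc6": "2.24-11+deb9u1", "bash": "4.4-5"},
}

print(content_changed(old_image, republished))  # False
print(content_changed(old_image, patched))      # True
print(summarize_changes(old_image["packages"], patched["packages"]))
```

Only the second case would justify rebuilding application images on top of the new base image.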

Introducing the Anchore Engine

Today Anchore announced a new open source project that allows users to install a local copy of the powerful container analysis and policy engine that powers the Anchore Navigator service.

The Anchore Engine is an open source project that provides a centralized service for inspection, analysis and certification of container images. The Anchore Engine is provided as a Docker container image that can be run standalone or on an orchestration platform such as Kubernetes, Docker Swarm, Rancher or Amazon ECS.

Using the Anchore Engine, container images can be downloaded from Docker V2 compatible container registries, analyzed and evaluated against user-defined policies. The Anchore Engine can integrate with Anchore’s Navigator service allowing you to define policies and whitelists using a graphical editor that is automatically synchronized to the Anchore Engine.

The Anchore Engine can be integrated into CI/CD tools such as Jenkins to secure your pipeline by adding image scanning, including not just CVE-based security scans but also policy-based scans that can include checks around security, compliance and operational best practices.

The Anchore Engine can be accessed directly through a RESTful API or via the Anchore CLI. Adding an image to be analyzed is a simple one-line command:

$ anchore-cli image add docker.io/library/nginx:latest

The Anchore Engine will now download the image from the registry and perform deep inspection collecting data on packages, files, software artifacts and image metadata.

Once analyzed we can retrieve information about the image. For example, retrieving a list of packages:

$ anchore-cli image content docker.io/library/nginx:latest os

This will return a list of operating system (os) packages found in the image. In addition to operating system packages, we can retrieve details about files, Ruby GEMs and Node.JS NPMs.

$ anchore-cli image content docker.io/library/rails:latest gem
Package Version Location
actioncable 5.0.1 /usr/local/bundle/specifications/actioncable-5.0.1.gemspec
actionmailer 5.0.1 /usr/local/bundle/specifications/actionmailer-5.0.1.gemspec
actionpack 5.0.1 /usr/local/bundle/specifications/actionpack-5.0.1.gemspec
actionview 5.0.1 /usr/local/bundle/specifications/actionview-5.0.1.gemspec
activejob 5.0.1 /usr/local/bundle/specifications/activejob-5.0.1.gemspec
activemodel 5.0.1 /usr/local/bundle/specifications/activemodel-5.0.1.gemspec
activerecord 5.0.1 /usr/local/bundle/specifications/activerecord-5.0.1.gemspec
activesupport 5.0.1 /usr/local/bundle/specifications/activesupport-5.0.1.gemspec
arel 7.1.4 /usr/local/bundle/specifications/arel-7.1.4.gemspec

And if we want to see the security vulnerabilities in an image, we can run the following command:

$ anchore-cli image vuln docker.io/library/ubuntu:latest os
Vulnerability ID Package Severity Fix Vulnerability URL
CVE-2013-4235 login-1:4.2-3.1ubuntu5.3 Low None http://people.ubuntu.com/~ubuntu-security/cve/CVE-2013-4235
CVE-2013-4235 passwd-1:4.2-3.1ubuntu5.3 Low None http://people.ubuntu.com/~ubuntu-security/cve/CVE-2013-4235
CVE-2015-5180 libc-bin-2.23-0ubuntu9 Low None http://people.ubuntu.com/~ubuntu-security/cve/CVE-2015-5180
CVE-2015-5180 libc6-2.23-0ubuntu9 Low None http://people.ubuntu.com/~ubuntu-security/cve/CVE-2015-5180
CVE-2015-5180 multiarch-support-2.23-0ubuntu9 Low None http://people.ubuntu.com/~ubuntu-security/cve/CVE-2015-5180

As with the content sub-command, we pass a parameter for the type of content we want to analyze, in this case os for operating system packages. Future releases will add support for non-package vulnerability data.
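If you capture that output in a script, a few lines are enough to summarize it. This sketch parses the sample rows exactly as printed above, assuming five whitespace-separated columns per row; the real CLI output may be aligned differently, so treat the parsing rule as an assumption.

```python
from collections import Counter

# Sample output copied from the listing above.
SAMPLE_OUTPUT = """\
Vulnerability ID Package Severity Fix Vulnerability URL
CVE-2013-4235 login-1:4.2-3.1ubuntu5.3 Low None http://people.ubuntu.com/~ubuntu-security/cve/CVE-2013-4235
CVE-2013-4235 passwd-1:4.2-3.1ubuntu5.3 Low None http://people.ubuntu.com/~ubuntu-security/cve/CVE-2013-4235
CVE-2015-5180 libc-bin-2.23-0ubuntu9 Low None http://people.ubuntu.com/~ubuntu-security/cve/CVE-2015-5180
CVE-2015-5180 libc6-2.23-0ubuntu9 Low None http://people.ubuntu.com/~ubuntu-security/cve/CVE-2015-5180
CVE-2015-5180 multiarch-support-2.23-0ubuntu9 Low None http://people.ubuntu.com/~ubuntu-security/cve/CVE-2015-5180
"""


def parse_vuln_rows(text):
    """Parse rows of the form: ID, package, severity, fix, URL."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        parts = line.split()
        if len(parts) != 5:
            continue  # ignore blank or unexpected lines
        vuln_id, package, severity, fix, url = parts
        rows.append({"id": vuln_id, "package": package,
                     "severity": severity, "fix": fix, "url": url})
    return rows


rows = parse_vuln_rows(SAMPLE_OUTPUT)
by_severity = Counter(row["severity"] for row in rows)
unique_ids = sorted({row["id"] for row in rows})

print(len(rows))          # 5
print(dict(by_severity))  # {'Low': 5}
print(unique_ids)         # ['CVE-2013-4235', 'CVE-2015-5180']
```

Note that the five rows cover only two distinct CVEs, since one CVE can affect several packages.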

Next, we can evaluate the image against a policy that was defined either manually on the command line or using the Anchore Navigator:

$ anchore-cli evaluate check registry.example.com/webapps/frontend:latest
Image Digest: sha256:86774cefad82967f97f3eeeef88c1b6262f9b42bc96f2ad61d6f3fdf54475ac3
Full Tag: registry.example.com/webapps/frontend:latest
Status: pass
Last Eval: 2017-09-09T18:30:22
Policy ID: 715a6056-87ab-49fb-abef-f4b4198c67bf

Here we can see that the image passed. To see the details of the evaluation you can add the --detail parameter. For example:

$ anchore-cli evaluate check registry.example.com/webapps/broker:latest --detail
Image Digest: sha256:7f97f3eeeef88c1b6262f9b42bc96f2ad61d6f3fdf54475ac354475ac
Full Tag: registry.example.com/webapps/broker:latest
Status: fail
Last Eval: 2017-09-09T17:30:22
Policy ID: 715a6056-87ab-49fb-abef-f4b4198c67bf

Gate                   Trigger              Detail                                                          Status        
DOCKERFILECHECK        NOHEALTHCHECK        Dockerfile does not contain any HEALTHCHECK instructions        warn
ANCHORESEC             VULNHIGH             HIGH Vulnerability found in package - libmount1 (CVE-2016-2779 - https://security-tracker.debian.org/tracker/CVE-2016-2779)                    stop          
ANCHORESEC             VULNHIGH             HIGH Vulnerability found in package - libncurses5 (CVE-2017-10684 - https://security-tracker.debian.org/tracker/CVE-2017-10684)                stop          
ANCHORESEC             VULNHIGH             HIGH Vulnerability found in package - libncurses5 (CVE-2017-10685 - https://security-tracker.debian.org/tracker/CVE-2017-10685)                stop

Here you can see that the broker image failed the policy evaluation due to 3 high severity vulnerabilities.
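One way to read the table is that each row carries an action and the worst action decides the result. The sketch below assumes that a single stop fails the image while warn alone does not. That rule matches the output shown here, but it is our assumption and not a statement of how the Engine computes the final status.

```python
# Assumed ordering of row actions, from least to most severe.
ACTION_ORDER = {"go": 0, "warn": 1, "stop": 2}


def overall_status(rows):
    """Return 'fail' if any row has the action 'stop', otherwise 'pass'."""
    worst = max((ACTION_ORDER[row["status"]] for row in rows), default=0)
    return "fail" if worst == ACTION_ORDER["stop"] else "pass"


# Gate, trigger and status columns copied from the detail output above.
broker_rows = [
    {"gate": "DOCKERFILECHECK", "trigger": "NOHEALTHCHECK", "status": "warn"},
    {"gate": "ANCHORESEC", "trigger": "VULNHIGH", "status": "stop"},
    {"gate": "ANCHORESEC", "trigger": "VULNHIGH", "status": "stop"},
    {"gate": "ANCHORESEC", "trigger": "VULNHIGH", "status": "stop"},
]

stop_count = sum(1 for row in broker_rows if row["status"] == "stop")
print(stop_count)                       # 3
print(overall_status(broker_rows))      # fail
print(overall_status(broker_rows[:1]))  # pass (a warning alone does not fail)
```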

We can subscribe to an image to receive webhook notifications when the image is updated, when new security vulnerabilities are found, or when the image’s policy status changes, for example going from Fail to Pass.

$ anchore-cli subscription activate image tag_update registry.example.com/webapps/broker:latest

A Breakdown of Operating Systems of Docker Hub

While containers are thought of as “micro-services” or applications, if you open up the image you will see more than just an application – more often than not, you’ll see an entire operating system image along with the application. If you dig into the image you will find that certain parts of the operating system are missing such as kernel and hardware-specific modules and often, but sadly not always, the package list is reduced. If you are deploying a pre-packaged container built by a 3rd party you may not even know what operating system has been used to build the container let alone what packages are inside.

As part of the analysis that Anchore performs on the container, it identifies the underlying operating system. To check this out go to the Anchore Navigator and search for the image that you wish to inspect. Halfway down on the overview tab you’ll see the operating system name and version listed. For example, searching for library/nginx:latest will show that it is built on top of Debian 9, Stretch.

Let’s take a look at what operating systems are used on Docker Hub:

  • Which operating system gets used the most?
  • How has the choice of operating system changed over time?
  • Are there different usage patterns for official images compared to public images?

To get our feet wet, here is the breakdown of the operating systems that official images are built on.

It is clear that Debian is the most popular, with Alpine taking second place, and then a number of others each taking a smaller share. Raspbian doesn’t appear in this chart because it is not used as a base OS by any official images, but it will also be analyzed: when looking at public images’ usage of operating systems, we will see that Raspbian gets used a fair bit. These make up the 7 most popular operating systems amongst Docker repositories, with all others taking up a little less than 2% of the share, so they will be excluded to keep things uncluttered. A notable exclusion here is Red Hat Enterprise Linux. The license agreement prohibits redistribution, which is likely why we see CentOS but no RHEL in the list of official images. However, our data shows many public RHEL images from users.

The repositories that are included in our dataset are those that have been analyzed by Anchore. This means all official repos, the most popular (based on a combination of pulls and stars) public community repos, and user-requested images. Right now Anchore is pulling data only from Docker Hub, but soon we will be expanding to include images on Amazon EC2 Container Registry (Amazon ECR).

From these repositories, we looked at only the latest tag so that the information was pulled from tags that were being consistently updated. Also, different repositories have different update schedules; where one will push updates every other month, another might update every week. If we counted each update, it would skew results towards operating systems that have a couple of repositories that update multiple times a day. For this reason, we only counted a repository’s use of an operating system on its latest tag once, unless it switched to a different operating system later on.
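The counting rule can be sketched as follows. We read "counted once unless it switched" as collapsing consecutive repeats of the same operating system in a repository's history; that reading, and the repository names, are assumptions made for the example.

```python
from collections import Counter
from itertools import groupby


def count_os_usage(repo_history):
    """Count operating system usage per repository, ignoring update frequency.

    repo_history: dict of repository -> chronological list of OS names
    observed on its latest tag, one entry per update.
    """
    counts = Counter()
    for os_sequence in repo_history.values():
        # groupby collapses consecutive repeats: A, A, B, B -> A, B
        counts.update(os_name for os_name, _ in groupby(os_sequence))
    return counts


# Hypothetical histories: a daily updater, a slow updater and a switcher.
repo_history = {
    "example/daily": ["debian"] * 30,
    "example/monthly": ["ubuntu", "ubuntu"],
    "example/switcher": ["ubuntu", "ubuntu", "alpine"],
}

usage = count_os_usage(repo_history)
print(dict(usage))  # {'debian': 1, 'ubuntu': 2, 'alpine': 1}
```

Without the collapsing step, the daily updater alone would contribute 30 of the 35 observations and Debian would appear far more dominant than it is. Under this reading, a repository that switches away and later switches back would be counted twice for the same operating system.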

Something else to note is the “Unknown” on the chart. If you look at library/swarm:latest, for example, you will see that the operating system is listed as “Unknown.” What this means is that swarm doesn’t have a standard operating system install and so the system cannot recognize what it is built on top of. Images like these are often statically compiled binaries, and so don’t require anything extra beyond what’s needed to run the application. With Docker’s recent improvements to multi-stage builds, images like these might become more common in the near future as developers become more familiar with the process and use it to greatly decrease their image size.

Image size is often used as a criterion for the selection of base images, so we performed some quick analysis to see the average size of official images broken down by the operating system distribution.

To get some context, here are the sizes of the images of popular operating systems.

The difference in image size is striking: the range goes from BusyBox at 1MB all the way up to Fedora at 230MB. It’s interesting to see the clustering happening. Alpine and BusyBox are lightweight and right near 0MB, then the midweights like Debian and Ubuntu are around 100MB, and the largest are heavyweights such as CentOS and Oracle Linux up by 200MB.

Shown here is the size of official images split by the underlying OS they use. Do note that the OS image itself is not excluded from the average, so for lesser-used operating systems, the average is brought down.

You can see that as application images are built on top of these base images, their size grows as dependencies are added, for example required runtimes such as Python or Java.

The pie chart above showing official OS distribution only covers the creation of images in the last three months, but our data extends further back.

Taking a look at the distribution of operating systems over the course of the past year, we see that Debian has always held its popularity among official repositories. It had a peak of over 80% back in February, and since then appears to have been ever so slowly tailing off. It looks like Alpine is gradually growing, but it is difficult to see any sure trends due to the fluctuation of the data, especially during the summer months which are traditionally slower. We will continue to monitor and report on this trend.

Digging more into Debian’s two-thirds share, we can look at the distribution of versions of Debian. Debian 8, Jessie, has held near 100% of the share until July amongst official images, with only a small number of images being built on Wheezy (7) and Stretch (9). This, of course, makes sense as Debian 9 was only released halfway through June, and has since been adopted by more than a third of images and growing. Before its stable release, a few repositories were using the unstable release, presumably favoring new features enough to make the jump ahead of everyone else.

Docker Hub official repositories make up only a small number of the total repositories on Docker Hub. They follow best practices, are often base images that users build their own apps on top of, and are updated frequently. These standards don’t apply to community images. However, the most popular ones, those that we analyze, only narrowly miss that mark. Despite that, there are quite a few differences in operating system usage between the community and official images.

Debian still holds the largest share, but only just. Both Alpine and Ubuntu see their percentage nearly double, with Raspbian emerging and taking a small percentage, focused on IoT use cases. Ubuntu’s popularity might be explained by the fact that it is the most commonly used Linux distribution by users, and people like to work with something they are familiar with, especially as they learn new technology. For Alpine, it’s possible that community repos are quicker to change tech, and the appeal of the security and tiny size of Alpine is pulling more developers towards it.

To counter that willingness to change, Stretch doesn’t see as much adoption amongst community images as official ones, getting about half as much usage. What is interesting, however, is that unstable Stretch received more usage here than among official images, which may come from some users experimenting with it to see new features.

The graph of community operating system usage over time is much more interesting than the graph for official images, as there are a few trends to see. At the end of 2016, the distribution of operating systems was much more spread. Although Debian was leading then, it had a smaller share than it does now. Starting in February we saw a reduction in the usage of Ubuntu, and now it only has half of the usage of the leaders. Alpine started growing shortly after to take Ubuntu’s place at the top, joining Debian. The other four operating systems all have steadily tailed off, as developers choose to use one of the main three. Going forward, it will be interesting to see if Ubuntu’s recent uptick will continue at the expense of Debian.

Official images are typically smaller than public images since they are used as a foundation to build an application image. However, Alpine contradicts this trend, and public images using it are half the size of official images on average.

In our next blog, we will dig deeper into updates – looking at how frequently images are updated and the relationship between operating system patches, base image patches and updates to end-user images.