While containers are often thought of as “micro-services” or applications, if you open up an image you will see more than just an application - more often than not, you’ll see an entire operating system image alongside it. If you dig into the image you will find that certain parts of the operating system are missing, such as the kernel and hardware-specific modules, and often, but sadly not always, the package list is reduced. If you are deploying a pre-packaged container built by a third party, you may not even know which operating system was used to build the container, let alone what packages are inside.
As part of the analysis that Anchore performs on the container, it identifies the underlying operating system. To check this out go to the Anchore Navigator and search for the image that you wish to inspect. Halfway down on the overview tab you’ll see the operating system name and version listed. For example, searching for library/nginx:latest will show that it is built on top of Debian 9, Stretch.
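How Anchore detects the operating system is not described here. One common convention, though, is that Linux distributions identify themselves in an `/etc/os-release` file. The sketch below assumes that convention; the function names and the sample text are illustrative only and are not part of Anchore.

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key.strip()] = value.strip().strip("\"'")
    return fields


def describe_os(text):
    """Return (id, version_id), or ("Unknown", None) when no ID is present."""
    fields = parse_os_release(text)
    if "ID" not in fields:
        return ("Unknown", None)
    return (fields["ID"], fields.get("VERSION_ID"))


# Illustrative sample resembling a Debian 9 os-release file.
SAMPLE = '''PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
ID=debian
'''

print(describe_os(SAMPLE))
print(describe_os(""))
```

An image without such a file, for example one holding only a single binary, falls through to the “Unknown” case in this sketch.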
Let’s take a look at what operating systems are used on Docker Hub:
- Which operating system gets used the most?
- How has the choice of operating system changed over time?
- Are there different usage patterns for official images compared to public images?
To get our feet wet, here is the breakdown of which operating systems official images are being built on.
It is clear that Debian is the most popular, with Alpine taking second place, and then a number of others each taking a smaller share. Raspbian doesn’t appear in this chart because it is not used as a base OS by any official images, but it will also be analyzed: when looking at public images’ usage of operating systems, we will see that Raspbian gets used a fair bit. These make up the 7 most popular operating systems amongst Docker repositories; all others together take up a little less than 2% of the share, so they are excluded to keep things uncluttered. A notable exclusion here is Red Hat Enterprise Linux. The license agreement prohibits redistribution, which is likely why we see CentOS but no RHEL in the list of official images. However, our data shows many public RHEL images from users.
The repositories that are included in our dataset are those that have been analyzed by Anchore. This means all official repos, the most popular (based on a combination of pulls and stars) public community repos, and user-requested images. Right now Anchore is pulling data only from Docker Hub, but soon we will be expanding to include images on Amazon EC2 Container Registry (Amazon ECR).
From these repositories, we looked at only the latest tag so that the information was pulled from tags that were being consistently updated. Also, different repositories have different update schedules; where one will push updates every other month, another might update every week. If we counted each update, it would skew results towards operating systems that have a couple of repositories that update multiple times a day. For this reason, we only counted a repository’s use of an operating system on its latest tag once, unless it switched to a different operating system later on.
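The counting rule above can be sketched as follows. This is a minimal illustration, not Anchore's actual pipeline: the function name, the input shape, and the sample observations are all hypothetical. It also assumes that a repository which switches operating system is counted once for each distinct OS it has used.

```python
from collections import Counter


def count_os_usage(observations):
    """Count one use per repository per distinct operating system.

    `observations` is an iterable of (repository, operating_system) pairs,
    one per analyzed update of the repository's latest tag.
    """
    seen = set()
    counts = Counter()
    for repo, os_name in observations:
        if (repo, os_name) in seen:
            continue  # repeat update on the same OS: already counted
        seen.add((repo, os_name))
        counts[os_name] += 1
    return counts


# Hypothetical observations, not real Anchore data.
observations = [
    ("repo-a", "debian"),
    ("repo-a", "debian"),
    ("repo-a", "debian"),
    ("repo-b", "ubuntu"),
    ("repo-b", "alpine"),  # repo-b switched OS, so both are counted
    ("repo-c", "alpine"),
]

print(count_os_usage(observations))
```

Here repo-a contributes a single Debian count despite three updates, so a frequently updated repository does not outweigh one that updates rarely.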
Something else to note is the “Unknown” on the chart. If you look at library/swarm:latest, for example, you will see that the operating system is listed as “Unknown.” This means that swarm doesn’t have a standard operating system install, so the system cannot recognize what it is built on top of. Images like these often contain statically compiled binaries, and so don’t require anything beyond what’s needed to run the application. With Docker’s recent improvements to multi-stage builds, such binary-only images might become more common in the near future as developers become more familiar with the process and use it to greatly decrease their image size.
Image size is often used as a criterion for the selection of base images, so we performed some quick analysis to see the average size of official images broken down by the operating system distribution.
To get some context, here are the sizes of the images of popular operating systems.
The difference in image size is striking: the range goes from BusyBox at 1MB all the way up to Fedora at 230MB. It’s interesting to see the clustering happening. Alpine and BusyBox are lightweight and right near 0MB, then the midweights like Debian and Ubuntu are around 100MB, and the largest are heavyweights such as CentOS and Oracle Linux, up around 200MB.
Shown here is the average size of official images, grouped by the underlying OS they use. Do note that the base OS image itself is not excluded from the average, so for lesser-used operating systems, where it makes up a larger fraction of the images counted, the average is brought down.
You can see that as application images are built on top of these base images their size grows as dependencies are added. For example, adding required runtimes such as Python or Java.
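To illustrate how including the base OS image pulls the average down when only a few images use that OS, here is a small sketch. The 230MB Fedora base size comes from the figures above; the application image names and sizes are hypothetical, chosen only to show the effect.

```python
from collections import defaultdict


def average_size_by_os(images):
    """Return {os_name: mean size in MB} for (name, os_name, size_mb) tuples."""
    totals = defaultdict(lambda: [0.0, 0])  # os_name -> [sum of sizes, count]
    for _name, os_name, size_mb in images:
        totals[os_name][0] += size_mb
        totals[os_name][1] += 1
    return {os_name: total / n for os_name, (total, n) in totals.items()}


# Hypothetical image list; only the 230MB base size is taken from the text.
images = [
    ("base-os", "fedora", 230),  # the OS base image itself
    ("app-1", "fedora", 500),
    ("app-2", "fedora", 470),
]

with_base = average_size_by_os(images)
without_base = average_size_by_os(images[1:])
print(with_base["fedora"], without_base["fedora"])
```

With only two application images, the single base image lowers the average noticeably; with hundreds of application images its effect would be negligible.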
The pie chart above showing official OS distribution only covers the creation of images in the last three months, but our data extends further back.
Taking a look at the distribution of operating systems over the course of the past year, we see that Debian has always held its popularity among official repositories. It had a peak of over 80% back in February, and since then appears to have been ever so slowly tailing off. It looks like Alpine is gradually growing, but it is difficult to see any sure trends due to the fluctuation of the data, especially during the summer months which are traditionally slower. We will continue to monitor and report on this trend.
Digging more into Debian’s two-thirds share, we can look at the distribution of Debian versions. Debian 8, Jessie, held near 100% of the share amongst official images until July, with only a small number of images being built on Wheezy (7) and Stretch (9). This, of course, makes sense as Debian 9 was only released halfway through June; it has since been adopted by more than a third of images, and that share is growing. Before its stable release, a few repositories were using the unstable release, presumably favoring new features enough to make the jump ahead of everyone else.
Docker Hub official repositories make up only a small number of the total repositories on Docker Hub. They follow best practices, are often base images that users build their own apps on top of, and are updated frequently. These standards don’t apply to community images. However, the most popular ones - those that we analyze - fall only slightly short of that mark. Despite that, there are quite a few differences in operating system usage between the community and official images.
Debian still holds the largest share, but only just. Both Alpine and Ubuntu see their percentage nearly double, with Raspbian emerging and taking a small percentage, focused on IoT use cases. Ubuntu’s popularity might be explained by the fact that it is the Linux distribution most commonly used by users, and people like to work with something they are familiar with, especially as they learn new technology. For Alpine, it’s possible that community repos are quicker to change tech, and the appeal of the security and tiny size of Alpine is pulling more developers towards it.
Counter to that willingness to change, Stretch doesn’t see as much adoption amongst community images as among official ones, getting about half as much usage. What is interesting, however, is that unstable Stretch received more usage here than among official images, which may come from some users experimenting with it to see new features.
The graph of community operating system usage over time is much more interesting than the graph for official images, as there are a few trends to see. At the end of 2016, the distribution of operating systems was much more spread out. Although Debian was leading then, it had a smaller share than it does now. Starting in February we saw a reduction in the usage of Ubuntu, and now it only has half of the usage of the leaders. Alpine started growing shortly after to take Ubuntu’s place at the top, joining Debian. The other four operating systems have all steadily tailed off, as developers choose to use one of the main three. Going forward, it will be interesting to see if Ubuntu’s recent uptick will continue at the expense of Debian.
Official images are typically smaller than public images since they are used as a foundation to build an application image. However, Alpine contradicts this trend, and public images using it are half the size of official images on average.
In our next blog post, we will dig deeper into updates - looking at how frequently images are updated and the relationship between operating system patches, base image patches and updates to end-user images.