Dockerfile anti-patterns and best practices
I've been using Docker for some time now. There is already a lot of documentation available online but I recently saw the same "anti-patterns" several times, so I thought it was worth writing a post about it.
I won't repeat all the Best practices for writing Dockerfiles here. You should definitively read that page.
I want to emphasize some things that took me some time to understand.
Avoid invalidating the cache
Let's take a simple example with a Python application:
FROM python:3.6 COPY . /app WORKDIR /app RUN pip install -r requirements.txt ENTRYPOINT ["python"] CMD ["ap.py"]
It's actually an example I have seen several times online. This looks fine, right?
The problem is that the COPY . /app command will invalidate the cache as soon as any file in the current directory is updated. Let's say you just change the README file and run docker build again. Docker will have to re-install all the requirements because the RUN pip command is run after the COPY that invalidated the cache.
The requirements should only be re-installed if the requirements.txt file changes:
FROM python:3.6 WORKDIR /app COPY requirements.txt /app/requirements.txt RUN pip install -r requirements.txt COPY . /app ENTRYPOINT ["python"] CMD ["ap.py"]
With this Dockerfile, the RUN pip command will only be re-run when the requirements.txt file changes. It will use the cache otherwise.
This is much more efficient and will save you quite some time if you have many requirements to install.
Minimize the number of layers
What does that really mean?
Each Docker image references a list of read-only layers that represent filesystem differences. Every command in your Dockerfile will create a new layer.
Let's use the following Dockerfile:
FROM centos:7 RUN yum update -y RUN yum install -y sudo RUN yum install -y git RUN yum clean all
Build the docker image and check the layers created with the docker history command:
$ docker build -t centos-test . ... $ docker images REPOSITORY TAG IMAGE ID CREATED SIZE centos-test latest 1fae366a2613 About a minute ago 470 MB centos 7 98d35105a391 24 hours ago 193 MB $ docker history centos-test IMAGE CREATED CREATED BY SIZE COMMENT 1fae366a2613 2 minutes ago /bin/sh -c yum clean all 1.67 MB 999e7c7c0e14 2 minutes ago /bin/sh -c yum install -y git 133 MB c97b66528792 3 minutes ago /bin/sh -c yum install -y sudo 81 MB e0c7b450b7a8 3 minutes ago /bin/sh -c yum update -y 62.5 MB 98d35105a391 24 hours ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0 B <missing> 24 hours ago /bin/sh -c #(nop) LABEL name=CentOS Base ... 0 B <missing> 24 hours ago /bin/sh -c #(nop) ADD file:29f66b8b4bafd0f... 193 MB <missing> 6 months ago /bin/sh -c #(nop) MAINTAINER https://gith... 0 B
There are two problems with this Dockerfile:
We added too many layers for nothing.
The yum clean all command is meant to reduce the size of the image but it actually does the opposite by adding a new layer!
Let's check that by removing the latest command and running the build again:
FROM centos:7 RUN yum update -y RUN yum install -y sudo RUN yum install -y git # RUN yum clean all
$ docker build -t centos-test . ... $ docker images REPOSITORY TAG IMAGE ID CREATED SIZE centos-test latest 999e7c7c0e14 11 minutes ago 469 MB centos 7 98d35105a391 24 hours ago 193 MB
The new image without the yum clean all command is indeed smaller than the previous image (1.67 MB smaller)!
If you want to remove files, it's important to do that in the same RUN command that created those files. Otherwise there is no point.
Here is the proper way to do it:
FROM centos:7 RUN yum update -y \ && yum install -y \ sudo \ git \ && yum clean all
Let's build this new image:
$ docker build -t centos-test . ... $ docker images REPOSITORY TAG IMAGE ID CREATED SIZE centos-test latest 54a328ef7efd 21 seconds ago 265 MB centos 7 98d35105a391 24 hours ago 193 MB $ docker history centos-test IMAGE CREATED CREATED BY SIZE COMMENT 54a328ef7efd About a minute ago /bin/sh -c yum update -y && yum install ... 72.8 MB 98d35105a391 24 hours ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0 B <missing> 24 hours ago /bin/sh -c #(nop) LABEL name=CentOS Base ... 0 B <missing> 24 hours ago /bin/sh -c #(nop) ADD file:29f66b8b4bafd0f... 193 MB <missing> 6 months ago /bin/sh -c #(nop) MAINTAINER https://gith... 0 B
The new image is only 265 MB compared to the 470 MB of the original image. There isn't much more to say :-)
If you want to know more about images and layers, you should read the documentation: Understand images, containers, and storage drivers.
Conclusion
Avoid invalidating the cache:
start your Dockerfile with commands that should not change often
put commands that can often invalidate the cache (like COPY .) as late as possible
only add the needed files (use a .dockerignore file)
Minimize the number of layers:
put related commands in the same RUN instruction
remove files in the same RUN command that created them
Comments
Comments powered by Disqus