This lesson is in the early stages of development (Alpha version)

Creating a Docker container

Overview

Teaching: 10 min
Exercises: 20 min
Questions
  • What is the difference between a Docker image and a container?

  • How do I create my own Docker image?

  • What are the most important components of a Dockerfile?

Objectives
  • Create your own Docker image using a Dockerfile

When starting to work with containers you will soon notice that existing images may not always satisfy your needs. In these situations you want to create your own custom image.

Docker images are defined by a text file called Dockerfile. A Dockerfile contains the instructions that tell Docker how to create a custom image, which then serves as the basis for Docker containers.

Let’s build and run our first image

We start by creating a text file called Dockerfile in the folder ~/using-containers-in-science/.

$ cd ~
$ mkdir using-containers-in-science
$ cd using-containers-in-science
$ nano Dockerfile

Now, we add the content below into the Dockerfile:

FROM python:3.9
LABEL maintainer="support@hifis.net"

RUN pip install ipython numpy

ENTRYPOINT ["ipython"]

After that, we can save the file and leave the editor (in the case of nano: Ctrl+O, Enter, then Ctrl+X). Congratulations, it is that simple. The image can be built using the docker build command as shown below.

Note that the command below has to be run in the folder containing the Dockerfile. The trailing dot passes the current folder to Docker as the build context, and the Dockerfile found there is implicitly used as the input for the build. The -t option specifies the name of the image to be built.

$ docker build -t my-ipython-image .

This should yield output along the lines of the following. (Details may vary.)

Sending build context to Docker daemon  5.861MB
Step 1/4 : FROM python:3.9
3.9: Pulling from library/python
0ecb575e629c: Pull complete
7467d1831b69: Pull complete
feab2c490a3c: Pull complete
f15a0f46f8c3: Pull complete
937782447ff6: Pull complete
e78b7aaaab2c: Pull complete
06c4d8634a1a: Pull complete
42b6aa65d161: Pull complete
f7fc0748308d: Pull complete
Digest: sha256:ca8bd3c91af8b12c2d042ade99f7c8f578a9f80a0dbbd12ed261eeba96dd632f
Status: Downloaded newer image for python:3.9
 ---> 2a93c239d591
Step 2/4 : LABEL maintainer="support@hifis.net"
 ---> Running in 05ae980fe8f8
Removing intermediate container 05ae980fe8f8
 ---> d7fd298563bb
Step 3/4 : RUN pip install ipython numpy
 ---> Running in 88aec2275e64
Collecting ipython
  Downloading ipython-7.20.0-py3-none-any.whl (784 kB)
Collecting numpy
  Downloading numpy-1.20.1-cp39-cp39-manylinux2010_x86_64.whl (15.4 MB)
Collecting prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0
  Downloading prompt_toolkit-3.0.16-py3-none-any.whl (366 kB)
Collecting pygments
  Downloading Pygments-2.8.0-py3-none-any.whl (983 kB)
Collecting traitlets>=4.2
  Downloading traitlets-5.0.5-py3-none-any.whl (100 kB)
Collecting pickleshare
  Downloading pickleshare-0.7.5-py2.py3-none-any.whl (6.9 kB)
Collecting jedi>=0.16
  Downloading jedi-0.18.0-py2.py3-none-any.whl (1.4 MB)
Requirement already satisfied: setuptools>=18.5 in /usr/local/lib/python3.9/site-packages (from ipython) (53.0.0)
Collecting backcall
  Downloading backcall-0.2.0-py2.py3-none-any.whl (11 kB)
Collecting decorator
  Downloading decorator-4.4.2-py2.py3-none-any.whl (9.2 kB)
Collecting pexpect>4.3
  Downloading pexpect-4.8.0-py2.py3-none-any.whl (59 kB)
Collecting parso<0.9.0,>=0.8.0
  Downloading parso-0.8.1-py2.py3-none-any.whl (93 kB)
Collecting ptyprocess>=0.5
  Downloading ptyprocess-0.7.0-py2.py3-none-any.whl (13 kB)
Collecting wcwidth
  Downloading wcwidth-0.2.5-py2.py3-none-any.whl (30 kB)
Collecting ipython-genutils
  Downloading ipython_genutils-0.2.0-py2.py3-none-any.whl (26 kB)
Installing collected packages: wcwidth, ptyprocess, parso, ipython-genutils, traitlets, pygments, prompt-toolkit, pickleshare, pexpect, jedi, decorator, backcall, numpy, ipython
Successfully installed backcall-0.2.0 decorator-4.4.2 ipython-7.20.0 ipython-genutils-0.2.0 jedi-0.18.0 numpy-1.20.1 parso-0.8.1 pexpect-4.8.0 pickleshare-0.7.5 prompt-toolkit-3.0.16 ptyprocess-0.7.0 pygments-2.8.0 traitlets-5.0.5 wcwidth-0.2.5
Removing intermediate container 88aec2275e64
 ---> 7415cc5bf8d9
Step 4/4 : ENTRYPOINT ["ipython"]
 ---> Running in 9f4990ec4fa3
Removing intermediate container 9f4990ec4fa3
 ---> bf26c28ba752
Successfully built bf26c28ba752
Successfully tagged my-ipython-image:latest

Let’s try out the newly created image by running it.

$ docker run --rm -it my-ipython-image
Python 3.9.1 (default, Feb  9 2021, 07:42:03)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.20.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]:

We end up in an IPython shell that behaves like an IPython shell installed in the usual manner. Once we exit the shell, the container also stops running. Let’s see how this works by disassembling the Dockerfile.

Disassembling the Dockerfile

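The Dockerfile used above contains four different instructions:

  • FROM sets the base image on which the new image is built, here the official Python image with the tag 3.9.
  • LABEL adds metadata to the image, here the contact address of the maintainer.
  • RUN executes a command while the image is being built, here the installation of IPython and NumPy with pip.
  • ENTRYPOINT defines the command that is executed when a container is started from the image, here ipython.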
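To make the structure of the file explicit, the following minimal sketch splits the Dockerfile from this lesson into its instruction keywords and their arguments. The helper list_instructions is a hypothetical name introduced here for illustration. It is deliberately simplified (no line continuations, no parser directives) and is not how Docker itself parses the file.

```python
# The Dockerfile from this lesson, embedded as a string for illustration.
DOCKERFILE = """\
FROM python:3.9
LABEL maintainer="support@hifis.net"

RUN pip install ipython numpy

ENTRYPOINT ["ipython"]
"""


def list_instructions(text):
    """Return (INSTRUCTION, arguments) pairs for a simple Dockerfile.

    Simplification: blank lines and comment lines are skipped, and
    backslash line continuations are not handled.
    """
    pairs = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # The first word is the instruction, the rest are its arguments.
        keyword, _, arguments = stripped.partition(" ")
        pairs.append((keyword.upper(), arguments.strip()))
    return pairs


for keyword, arguments in list_instructions(DOCKERFILE):
    print(f"{keyword:<10} {arguments}")
```

Each non-empty line yields exactly one instruction, which matches the four steps reported by docker build.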

Let’s build the image again and see what happens.

$ docker build -t my-ipython-image .
Sending build context to Docker daemon  9.305MB
Step 1/4 : FROM python:3.9
 ---> 2a93c239d591
Step 2/4 : LABEL maintainer="support@hifis.net"
 ---> Using cache
 ---> d7fd298563bb
Step 3/4 : RUN pip install ipython numpy
 ---> Using cache
 ---> 7415cc5bf8d9
Step 4/4 : ENTRYPOINT ["ipython"]
 ---> Using cache
 ---> bf26c28ba752
Successfully built bf26c28ba752
Successfully tagged my-ipython-image:latest

This time, the output is much shorter than in our initial run of the docker build command. For each step, Docker reports that it used the cache. As each instruction is processed, Docker looks in its cache for an existing image that was created in the same manner. If there is such an image, Docker re-uses it instead of creating a duplicate. If you do not want Docker to use its cache, provide the --no-cache=true option to the docker build command.
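The caching behaviour can be illustrated with a toy model. This is a sketch under simplifying assumptions, not Docker's actual implementation: each step is identified only by a hash of its instruction text and the identifier of the step before it. The function names layer_ids and build are hypothetical. Real Docker also takes other inputs into account, for example the checksums of files added with COPY or ADD.

```python
import hashlib


def layer_ids(instructions):
    """Derive one identifier per build step from the step and its parent."""
    ids = []
    parent = ""
    for instruction in instructions:
        text = parent + "\n" + instruction
        digest = hashlib.sha256(text.encode()).hexdigest()[:12]
        ids.append(digest)
        parent = digest
    return ids


def build(instructions, cache):
    """Simulate a build and return 'cached' or 'built' for each step."""
    results = []
    for layer in layer_ids(instructions):
        if layer in cache:
            results.append("cached")
        else:
            cache.add(layer)
            results.append("built")
    return results


steps = [
    "FROM python:3.9",
    'LABEL maintainer="support@hifis.net"',
    "RUN pip install ipython numpy",
    'ENTRYPOINT ["ipython"]',
]
cache = set()
print(build(steps, cache))  # first build: every step is built
print(build(steps, cache))  # second build: every step comes from the cache

# Changing the RUN step invalidates that step and every step after it.
changed = steps[:2] + ["RUN pip install ipython numpy scipy"] + steps[3:]
print(build(changed, cache))
```

In this model, a changed instruction also changes the identifiers of all later steps, because each identifier depends on its parent. This mirrors why it is common to place instructions that change rarely near the top of a Dockerfile.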

Create and run your own data science Docker image

Your goal in this exercise is to create your own custom data science image as follows:

  1. Build your image on top of the latest Python image of release series 3.8.
  2. Mark yourself as the maintainer of the image.
  3. Install numpy, scipy, pandas, scikit-learn and jupyterlab using pip install.
  4. Create a custom user using the command useradd -ms /bin/bash jupyter.
  5. Tell the image to automatically start as the jupyter user and to use the working directory /home/jupyter.
  6. Make sure the image starts with the command jupyter lab --ip=0.0.0.0 by default.

Hint: Use the instructions USER and WORKDIR for task 5.

Once you have built the image, test it by running it and opening JupyterLab in your browser. You should now be able to execute any Python command in a notebook, e.g.

import numpy as np
np.__config__.show()
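To confirm that all requested libraries are available in the container, a small check like the following sketch can be run in a notebook cell. The helper name missing_packages is hypothetical and introduced here for illustration. Note that scikit-learn is imported under the module name sklearn.

```python
import importlib.util


def missing_packages(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the packages requested in the exercise.
REQUIRED = ["numpy", "scipy", "pandas", "sklearn", "jupyterlab"]

missing = missing_packages(REQUIRED)
print("missing:", missing if missing else "none")
```

The check only tests whether the modules can be located. It does not import them, so it runs quickly even for large libraries.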

Solution

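  • Create a Dockerfile with the content below.
FROM python:3.8
# Replace the placeholder address with your own contact details
LABEL maintainer="your.name@example.org"

RUN pip install ipython jupyterlab numpy scipy pandas scikit-learn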

# Create a custom user under which the application runs
RUN useradd -ms /bin/bash jupyter

# Use this user by default for all subsequent operations
USER jupyter
# Default to start the container in the home directory of the jupyter user
WORKDIR /home/jupyter

# Document that the application listens on port 8888. EXPOSE does not publish
# the port; publishing happens with the -p option of docker run.
EXPOSE 8888

ENTRYPOINT ["jupyter", "lab", "--ip=0.0.0.0"]
  • Build the Docker image.
    $ docker build -t my-datascience-image .
    
  • Run the image and bind port 8888.
    $ docker run -p 8888:8888 -it --rm my-datascience-image
    

This yields output as shown below. (Details may vary.)

[I 2021-02-24 10:44:06.465 ServerApp] jupyterlab | extension was successfully linked.
[I 2021-02-24 10:44:06.485 ServerApp] Writing notebook server cookie secret to /home/jupyter/.local/share/jupyter/runtime/jupyter_cookie_secret
[I 2021-02-24 10:44:06.891 ServerApp] nbclassic | extension was successfully linked.
[I 2021-02-24 10:44:06.929 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.8/site-packages/jupyterlab
[I 2021-02-24 10:44:06.929 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
[I 2021-02-24 10:44:06.935 ServerApp] jupyterlab | extension was successfully loaded.
[I 2021-02-24 10:44:06.941 ServerApp] nbclassic | extension was successfully loaded.
[I 2021-02-24 10:44:06.941 ServerApp] Serving notebooks from local directory: /home/jupyter
[I 2021-02-24 10:44:06.941 ServerApp] Jupyter Server 1.4.1 is running at:
[I 2021-02-24 10:44:06.941 ServerApp] http://6e2f223e7a69:8888/lab?token=5d01365f726a90b6eb94f798fe6ecefb87e3fcaf642a38bd
[I 2021-02-24 10:44:06.941 ServerApp]  or http://127.0.0.1:8888/lab?token=5d01365f726a90b6eb94f798fe6ecefb87e3fcaf642a38bd
[I 2021-02-24 10:44:06.941 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 2021-02-24 10:44:06.946 ServerApp] No web browser found: could not locate runnable browser.
[C 2021-02-24 10:44:06.946 ServerApp] 
    
    To access the server, open this file in a browser:
        file:///home/jupyter/.local/share/jupyter/runtime/jpserver-1-open.html
    Or copy and paste one of these URLs:
        http://6e2f223e7a69:8888/lab?token=5d01365f726a90b6eb94f798fe6ecefb87e3fcaf642a38bd
     or http://127.0.0.1:8888/lab?token=5d01365f726a90b6eb94f798fe6ecefb87e3fcaf642a38bd

Key Points

  • A Dockerfile is a text file containing instructions for building a Docker image.

  • Use the command docker build to build an image from a Dockerfile.

  • Specify your build instructions in a file called Dockerfile.