Docker and Containers: Understanding the Basics and Common Doubts Explained

  • Thread starter: fog37
  • Tags: Python
  • #1
fog37
TL;DR Summary
Docker base image...
Hello,

Is anyone on Physics Forums using Docker? I understand the overarching idea behind Docker (packaging code, dependencies, etc. into a container that can be shared across OSs), but I have some doubts:

I am confused about the base image line of code in the Dockerfile (which is a simple text file): FROM python:3.9


  • Why is that instruction called an "image"?
  • That first line specifies the Python interpreter to use inside the container... Does that mean that when the container is built, using the instructions in the Dockerfile, a Python 3.9 interpreter also gets downloaded and included in the built container?
  • Where exactly does that specified interpreter get downloaded from? When creating the Docker container, the code and the dependencies are on my local machine... Is the Python interpreter instead downloaded from the internet?
  • The last instruction, CMD ["python, iris_classification.py"], specifies what to do at the command line to launch the program, i.e. type python followed by the name of the script, correct? Would it be possible to launch the program as a Jupyter notebook (ipynb) or does the code need to be in a .py file?
Thank you!!!!
 

  • #2
fog37 said:
  • Why is that instruction called an "image"?
Have you read the Docker documentation? It explains what an image is. See, for example, these pages:

https://docs.docker.com/build/building/packaging/

https://docs.docker.com/engine/reference/builder/

fog37 said:
  • That first line specifies the Python interpreter to use inside the container
No, it specifies a base image that has that Python interpreter installed.

fog37 said:
  • ...Does that mean that when the container is built, using the instructions in the Dockerfile, a Python 3.9 interpreter also gets downloaded and included in the built container?
No, it means that the Python interpreter is already installed in the base image. That's why you use that base image instead of saying "FROM scratch" and having to install the Python interpreter yourself.
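To make that concrete, here is a minimal Dockerfile along the lines of what fog37 describes. The file names (requirements.txt, iris_classification.py) are placeholders taken from the original post or invented for illustration; the point is that FROM python:3.9 gives you an image that already contains the interpreter, and the remaining instructions only layer your own files on top of it:

Code:
# Base image that already contains a Python 3.9 interpreter
FROM python:3.9

# Work in a dedicated directory inside the image
WORKDIR /app

# Copy your own files from the build context (your local machine) into the image
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY iris_classification.py .

# Default command executed when a container is started from this image
CMD ["python", "iris_classification.py"]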

fog37 said:
  • Where exactly does that specified interpreter get downloaded from? When creating the Docker container, the code and the dependencies are on my local machine... Is the Python interpreter instead downloaded from the internet?
It depends on how that base image was built, which in turn depends on where you are getting that base image from.
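To add a bit of detail I'm fairly confident about: if the image named in FROM is not already cached on your machine, docker build (or an explicit docker pull) downloads it from the configured registry, which is Docker Hub by default. For example:

Code:
# Download the base image from the default registry (Docker Hub) if it isn't cached yet
docker pull python:3.9

# List the images now stored locally
docker images python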

fog37 said:
  • The last instruction, CMD ["python, iris_classification.py"], specifies what to do at the command line to launch the program, i.e. type python followed by the name of the script, correct?
Basically, yes. However, your invocation probably won't work because you are using the form that doesn't invoke a shell, but you don't give full paths to either the python executable or the .py file you are trying to run. See the dockerfile reference page I linked to above.
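A side note on the exact CMD quoted above: as written, ["python, iris_classification.py"] is a single array element, so Docker would look for an executable literally named "python, iris_classification.py". A sketch of the two usual ways to write it, assuming the script was copied into the image's working directory:

Code:
# Exec (JSON) form: each argument is a separate array element; no shell is involved
CMD ["python", "iris_classification.py"]

# Shell form: the whole line is run via /bin/sh -c
CMD python iris_classification.py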

fog37 said:
  • Would it be possible to launch the program as a Jupyter notebook (ipynb)
I don't see why not, if you installed all the required support for Jupyter notebooks with the appropriate Dockerfile commands.
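For instance, a hedged sketch of a Dockerfile that serves the code as a notebook instead of running a script; the package name, notebook file, and port are the usual Jupyter defaults, but treat this as an illustration rather than a tested recipe:

Code:
FROM python:3.9

# Install Jupyter alongside whatever else the notebook needs
RUN pip install notebook scikit-learn

WORKDIR /app
COPY iris_classification.ipynb .

# Start the notebook server, listening on all interfaces so the host can reach it
EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]

You would then run the container with something like docker run -p 8888:8888 <image> and open the URL that Jupyter prints in a browser on the host.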
 
  • #3
One thing that tripped me up when I was learning Docker was that your base image and the commands in your Dockerfile only guarantee that you'll have the exact same starting point each time you start a container. Whatever you do inside the container is gone once the container is removed, though. However, you can save a container's state as a new image. You'll also learn the joy of having local volumes that your container connects to for long-term storage.
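A quick sketch of the volume idea (the host path and volume name here are placeholders): anything written under /data inside the container actually lives on the host, so it survives the container being removed:

Code:
# Bind-mount a host directory into the container at /data
docker run --rm -it -v /home/me/docker_data:/data python:3.9 bash

# Or let Docker manage a named volume instead of a host path
docker volume create mydata
docker run --rm -it -v mydata:/data python:3.9 bash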
 
  • #4
Here's a Dockerfile example that I used to create a personal image that runs a shell script stored locally on Windows when the container starts. It achieves this using the volumes that I mentioned.

Code:
# Dockerfile Example:

# Build this from the Docker_Project directory using one of these commands:
#    docker build . -t my_image_name:0.1 -f Dockerfile
# Run as follows:
#    docker run --rm -it my_image_name:0.1 bash
#    docker run --rm -it -v D:/JupyterPrograms/0-Playground/Docker_Tests/Docker_Project/my_test_github_project/:/opt/ml/code/ my_image_name:0.1 bash

FROM python:3.8-slim-buster

# Install the necessary python libraries
# (Each install is chained with && so the build fails fast if one of them fails;
#  scikit-learn is the actual PyPI package name, the bare "sklearn" alias is deprecated.)
RUN pip install matplotlib && \
    pip install mlflow && \
    pip install pandas && \
    pip install ruamel.yaml && \
    pip install scikit-learn && \
    pip install tensorflow
    
RUN mkdir -p /opt/ml/code
RUN mkdir -p /opt/ml/output

    
# Don't actually include the code so that you can load it at run time through a volume setting
# Notice how the volume setting on line 7 points to a local drive on Windows and maps it to the /opt/ml/code directory in my container.
#  - this allows me to run various tests on the run_service.sh script that is on Windows and see how it runs when the container starts.
# Similarly, I can put anything else in my Windows directory and see it from the container.  Whatever change I make in Windows is seen in the container.
CMD ["/opt/ml/code/run_service.sh"]
 
  • #5
Docker is a pretty cool technology. It reminds me of how we used to build bootable diskettes: we'd copy over all the PC DOS commands we needed plus our application, set up an autoexec.bat and config.sys, and we had a bootable diskette.

In Docker, one often starts with a minimal Linux image like Alpine or a base Ubuntu or Fedora image. If you need Python, you add it to the mix. In the past, I used the popular Anaconda Python distro and then copied in my application to run. Anaconda can bring in a lot of extra unneeded stuff which you may want to delete.

One of the first issues you can run into is image bloat, where the image is much larger than you need, and so you have to go in and pare it down by either not copying in some stuff or deleting stuff. Some of my early functional images were up to 2 GB and were taking a fair amount of time to load and initialize. Later I learned to start with a developer base, build my application, and then switch to a smaller Alpine footprint.
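That "build with a big developer base, then switch to a smaller footprint" approach is what Docker calls a multi-stage build. A hedged sketch for a Python project (file names are placeholders; I use the slim variant rather than Alpine for the final stage here, because wheels built against Debian's glibc generally won't run on Alpine's musl libc):

Code:
# Stage 1: full image with build tools; build wheels for all dependencies
FROM python:3.9 AS builder
WORKDIR /wheels
COPY requirements.txt .
RUN pip wheel -r requirements.txt

# Stage 2: slim runtime image; install only the prebuilt wheels and the app
FROM python:3.9-slim
WORKDIR /app
COPY --from=builder /wheels /wheels
COPY requirements.txt .
RUN pip install --no-index --find-links=/wheels -r requirements.txt && rm -rf /wheels
COPY . .
CMD ["python", "main.py"]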

I would use the ldd command to discover what libraries were needed and delete the others. I chose not to install large data files in the image if they were used by other applications. Basically, you must know your application and its needs and build your image accordingly.

With respect to Python, I've heard some shops will convert their Python code to Go applications, which are self-contained binaries that run fast and take up a lot less space.

One other thing is that some folks use Podman over Docker because of the permissions required by Docker. Podman can apparently run in user space and does not require superuser mode.

As you work with Docker for a while, you may run into image repository issues. I remember having to clear the local image store from time to time to make space for new images. A couple of times I had to delete the repo and rebuild it with base images again. This happened to me a few years ago, so Docker may have fixed the problem. In my case, it was exacerbated by my large 2 GB images, which I eventually pared down to 100-200 MB.
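For what it's worth, current Docker versions ship cleanup commands for exactly this kind of housekeeping, so manually deleting the whole repo should rarely be needed nowadays:

Code:
# Show how much space images, containers, and volumes are using
docker system df

# Remove stopped containers, dangling images, unused networks, and build cache
docker system prune

# More aggressive: also remove all images not used by any container
docker image prune -a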
 
  • #6
Borg said:
Here's a Dockerfile example that I used to create a personal image that runs a shell script stored locally on Windows when the container starts. It achieves this using the volumes that I mentioned.

[Dockerfile example quoted from post #4]
Hello Borg, thank you for your reply... Just to be clear, my understanding is that once an image is used to create a container, we can run the container... That is a read-only operation: we cannot view or change the code inside the container and rerun it again... Or can we? Can we just look at the code, modify it, create a new image, and run a new container?

If an instructor wanted to share code plus dependencies with students, a Docker container would not be the optimal choice, since the students would only be able to run the application but not apply changes and rerun it...
 
  • #7
Good morning fog. Think about this for a moment. At some point, someone created a blank container, modified it and then saved it. So yes, they can be modified and saved. When you start one or more containers, you can list them with the command "docker ps -a". You will see a CONTAINER_ID associated with each container, which you can then save using a command like "docker commit [CONTAINER_ID] [new_image_name]". In my example above, someone did exactly that and saved the new_image_name as python:3.8-slim-buster.

With respect to modifying the insides of a container, you can log into it (for example with PuTTY, or simply with docker exec -it <container> bash), make your changes, and then, while the container with those changes is running, save it using the docker commit command.
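Putting that workflow into concrete commands (the container name and image tag below are made up for illustration; committing also works after the container has exited):

Code:
# Start a container from an existing image and make changes inside it
docker run -it --name scratchpad python:3.8-slim-buster bash
# ... inside the container: pip install things, edit files, etc., then exit ...

# List containers, including stopped ones, to confirm the name/ID
docker ps -a

# Snapshot the container's filesystem as a new image
docker commit scratchpad my_modified_image:0.1

# New containers started from that image include the changes
docker run --rm -it my_modified_image:0.1 bash
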
fog37 said:
If an instructor wanted to share code+dependencies with the students, docker container would not be the optimal choice since the students would only be able to run the application but not able to apply changes, etc. and rerun it....
Actually, it's the opposite. It is a good platform because the instructor can give each student the exact same starting environment. The students make their changes in their containers, save them, and return them to the instructor.
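One concrete way to run that exchange without a shared registry (the image and file names are placeholders): the instructor exports the prepared image to a file, each student loads it, works in a container, and then commits and exports their own version back:

Code:
# Instructor: save the prepared image to a tar file and distribute it
docker save -o course_env.tar course_env:1.0

# Student: load the image, work inside a container, then snapshot and export it
docker load -i course_env.tar
docker run -it --name homework course_env:1.0 bash
docker commit homework homework_done:1.0
docker save -o homework_done.tar homework_done:1.0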
 
