Lab 5: Containerizing ML Models with Docker

Overview

In this lab, you will containerize a machine learning training pipeline and inference server using Docker. You will train a Wine classifier, serve it via Flask, and orchestrate both containers with Docker Compose. A key focus of this lab is Docker volume management — you will use both named volumes and bind mounts to share data between containers and persist artifacts on the host.

Deliverables

  • Deliverable 1: Run the training script in a container and save the resulting model to a shared volume. Be able to explain why Docker is useful for reproducibility and portability in ML training scenarios.

  • Deliverable 2: Containerize the inference service so it serves predictions on a specific port, and show the ./logs/predictions.log file on your host to the TA. Explain what a Dockerfile is and how it helps containerize the inference service.

  • Deliverable 3: Call the inference service health endpoint before and after destroying the named volume to demonstrate how model availability changes. Explain the difference between named volumes and bind mounts in Docker.

Step 0: Setup Docker

Install Docker by following the instructions for your operating system (https://docs.docker.com/get-docker/), then verify your installation:

docker run hello-world

Step 1: Containerize the Training Pipeline

Create a Docker container for the training code. When launched, the container should train a Wine classifier and save the model file to a shared volume. You can use the partially completed Dockerfile and code in docker/training/.

1a. Complete train.py

  • Create a RandomForestClassifier and train it on the Wine dataset
  • Save the trained model using joblib.dump() to /app/models/wine_model.pkl
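
A minimal sketch of what train.py could look like, assuming the standard scikit-learn Wine loader (the starter code in docker/training/ may structure this differently):

import os
import joblib
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the Wine dataset and hold out a test split
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the classifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
print(f"Test accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")

# Persist the model to the shared volume path
os.makedirs("/app/models", exist_ok=True)
joblib.dump(model, "/app/models/wine_model.pkl")
print("Model saved to /app/models/wine_model.pkl")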

1b. Complete the Training Dockerfile

Open docker/training/Dockerfile. It is nearly complete. Fill in the TODO to set the command that runs the training script.
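
For reference, the missing line is typically a CMD instruction, assuming the training script is copied into the image as train.py:

CMD ["python", "train.py"]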

1c. Build and Run with a Named Volume

Build the training image and run it, mounting a named volume for model storage:

docker build -t mlip-training -f docker/training/Dockerfile .
docker run --rm -v wine_model_storage:/app/models mlip-training

You should see output showing the test accuracy and a message that the model was saved.

What is a named volume? When you use -v wine_model_storage:/app/models, Docker creates a named volume called wine_model_storage that is managed by Docker. The data in this volume persists even after the container exits.

Step 2: Containerize the Inference Server

Create a Docker container that loads the trained model from the shared volume and serves predictions via a Flask API. The server also logs predictions to a bind-mounted directory so you can inspect them from the host.

2a. Complete server.py

Open docker/inference/server.py. You need to:

  • Load the trained model from the shared volume
  • Extract features from the incoming JSON request and run inference
  • Log each prediction to the host-mounted log file (/app/logs/predictions.log)

The server includes a /health endpoint that reports whether the model file exists — this is useful for debugging volume issues.
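
A minimal sketch of server.py under these assumptions (the paths from this lab, a JSON body with an "input" key, and container port 8080 as used in Step 2c); the starter file will differ in its details:

import datetime
import os

import joblib
import numpy as np
from flask import Flask, jsonify, request

MODEL_PATH = "/app/models/wine_model.pkl"
LOG_PATH = "/app/logs/predictions.log"

app = Flask(__name__)
# Load the model once at startup if the shared volume contains it
model = joblib.load(MODEL_PATH) if os.path.exists(MODEL_PATH) else None

@app.route("/health")
def health():
    # Reports whether the model file is visible on the shared volume
    return jsonify({"status": "healthy" if os.path.exists(MODEL_PATH) else "model missing"})

@app.route("/predict", methods=["POST"])
def predict():
    if model is None:
        return jsonify({"error": "model not loaded"}), 503
    data = request.get_json(silent=True) or {}
    if "input" not in data:
        return jsonify({"error": "expected JSON with an 'input' key"}), 400
    features = np.array(data["input"]).reshape(1, -1)
    prediction = int(model.predict(features)[0])
    # Append a timestamped entry to the bind-mounted log file
    with open(LOG_PATH, "a") as f:
        f.write(f"{datetime.datetime.now().isoformat()} input={data['input']} prediction={prediction}\n")
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the port is reachable from outside the container
    app.run(host="0.0.0.0", port=8080)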

2b. Create the Inference Dockerfile

Create a new file docker/inference/Dockerfile from scratch. It should:

  • Use python:3.11-slim as the base image
  • Set the working directory to /app
  • Copy and install dependencies from a requirements file
  • Copy server.py to the working directory
  • Expose port 8080 (the container port; Step 2c maps host port 8081 onto it)
  • Set the command to run server.py

HINT: Look at the training Dockerfile for reference. Note that the build context is the project root, so paths should be docker/inference/....

You will also need to create docker/inference/requirements.txt with the necessary packages (Flask, scikit-learn, joblib, numpy).
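
Putting those requirements together, a sketch of docker/inference/Dockerfile (the paths assume the project-root build context mentioned in the hint):

FROM python:3.11-slim
WORKDIR /app

# Install dependencies first so this layer is cached across code changes
COPY docker/inference/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the server code
COPY docker/inference/server.py .

# Document the port the Flask server listens on inside the container
EXPOSE 8080
CMD ["python", "server.py"]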

2c. Build and Run with Both Volume Types

Create a local directory for logs, then run the inference container with both a named volume (for the model) and a bind mount (for logs):

mkdir -p ./logs
docker build -t mlip-inference -f docker/inference/Dockerfile .
docker run --rm -p 8081:8080 \
  -v wine_model_storage:/app/models \
  -v $(pwd)/logs:/app/logs \
  mlip-inference

Notice the two -v flags:

  • wine_model_storage:/app/models is a named volume (Docker-managed, shared with the training container)
  • $(pwd)/logs:/app/logs is a bind mount (maps your local ./logs/ directory into the container)

2d. Test the Inference Server

Check the health endpoint:

curl http://localhost:8081/health

Send a prediction request (13 Wine features):

curl -X POST http://localhost:8081/predict \
  -H 'Content-Type: application/json' \
  -d '{"input": [13.2, 1.78, 2.14, 11.2, 100, 2.65, 2.76, 0.26, 1.28, 4.38, 1.05, 3.40, 1050]}'

Test error handling with a bad request:

curl -X POST http://localhost:8081/predict \
  -H 'Content-Type: application/json' \
  -d '{"bad_key": [1,2,3]}'

After sending predictions, check your local ./logs/ directory — you should see a predictions.log file with timestamped entries. This is the bind mount in action: the container writes to /app/logs/ and the file appears on your host filesystem.
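
With the logging format from the server.py sketch in 2a, an entry would look roughly like the following (the exact format depends on your implementation):

2026-02-06T14:32:10.123456 input=[13.2, 1.78, 2.14, ...] prediction=0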

Step 3: Docker Compose

Docker Compose lets you define and manage multi-container applications in a single file instead of repeating long docker run commands. Complete the docker-compose.yml file to set up both services.

You need to fill in:

  • Build context and Dockerfile path for each service
  • Named volume wine_model_storage mounted to /app/models on both services (for sharing the model)
  • Bind mount ./logs mapped to /app/logs on the inference service (for prediction logs)
  • Port mapping for the inference service
  • Named volume definition in the volumes: section at the bottom
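
A sketch of the completed file, assuming the service names training and inference (match whatever names the starter docker-compose.yml uses):

services:
  training:
    build:
      context: .
      dockerfile: docker/training/Dockerfile
    volumes:
      - wine_model_storage:/app/models   # named volume shared with inference

  inference:
    build:
      context: .
      dockerfile: docker/inference/Dockerfile
    ports:
      - "8081:8080"                      # host port 8081 -> container port 8080
    volumes:
      - wine_model_storage:/app/models   # named volume (model)
      - ./logs:/app/logs                 # bind mount (prediction logs)
    depends_on:
      - training                         # orders startup only; does not wait for training to finish

volumes:
  wine_model_storage: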

Then run:

docker compose up --build

After both services start, test with the same curl commands from Step 2d. Verify that:

  • The prediction endpoint returns a valid wine class
  • The ./logs/predictions.log file on your host is being written to

Shut it down:

docker compose down

Step 4: Volume Lifecycle

This step demonstrates the lifecycle of Docker volumes: where named volumes live on the host, how they persist data across container restarts, and what happens when they are removed.

4a. Inspect the Named Volume

Check where Docker physically stores your model on the host:

docker volume ls
docker volume inspect wine_model_storage
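
On Linux the output looks roughly like this (trimmed; exact fields vary by Docker version):

[
    {
        "Driver": "local",
        "Mountpoint": "/var/lib/docker/volumes/wine_model_storage/_data",
        "Name": "wine_model_storage",
        "Scope": "local"
    }
]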

Observe the Mountpoint field; this is the Docker-managed path where the volume's data lives. (On Docker Desktop for macOS or Windows, this path is inside the Docker VM rather than directly on your host filesystem.)

4b. Persistence Verification

Verify that the trained model persists across container destruction:

  1. Start both services to run training and inference: docker compose up --build (confirm health with curl http://localhost:8081/health)
  2. Stop and remove the containers while retaining the named volume: docker compose down
  3. Restart only the inference service: docker compose up inference
  4. Run the health check again and confirm it still reports healthy, demonstrating that the model persisted in the named volume.

4c. Removing Volumes

To fully reset the environment and delete the model:

docker compose down -v

The -v flag removes the named volume along with the containers, deleting the saved model. Now start the inference service again (docker compose up inference), check the health endpoint once more, and discuss your results with the TA.

Additional Resources

  1. Docker For Beginners
  2. Docker Volumes Documentation
  3. Docker Bind Mounts Documentation

Troubleshooting

If you encounter issues:

  • Check Docker daemon status
  • Verify port availability (is port 8081 already in use?)
  • Review service logs with docker compose logs
  • Ensure the training service completes before the inference service starts
  • Use docker compose exec to inspect container file systems
  • If the model file is missing, check that your named volume is correctly mounted with docker volume ls
  • If logs are not appearing on the host, verify your bind mount path
