Jupyter + Spark with Docker Compose

- 2 mins read

Prerequisite

Make sure you have Docker Compose installed on your machine. I assume you have Docker Compose V2, but V1 should also do the job. We both know you are just as lazy as I am, so I will start with the most important stuff:
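
To double-check which flavor you have:

docker compose version     # Compose V2 (docker plugin)
docker-compose version     # Compose V1 (standalone binary)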

How to run

1. Create a compose.yml file and add the content below

vim compose.yml

compose.yml:

services:
  spark-scala-notebook:
    build:
      context: .
      dockerfile: Dockerfile
      args:
        NB_UID: 1000
        NB_GID: 1000
    image: spark-scala-notebook:latest
    ports:
      - "8888:8888"
    environment:
      - JUPYTER_TOKEN=
    volumes:
      - ./work:/home/jovyan/work
    user: root
    command: >
      bash -c "
        chown -R jovyan:users /home/jovyan/work &&
        start-notebook.sh --NotebookApp.token=''
      "
    networks:
      - spark_network

networks:
  spark_network:
    driver: bridge
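
The build: section expects a Dockerfile sitting next to compose.yml, so create one before composing. Here is a minimal sketch, assuming the official jupyter/all-spark-notebook base image (the jovyan user and start-notebook.sh in the compose file come from the Jupyter Docker Stacks project):

Dockerfile:

# Minimal sketch: build on the official all-spark image,
# which bundles Apache Spark with the Jupyter stack.
FROM jupyter/all-spark-notebook:latest

# Build args forwarded from compose.yml. Declaring them here lets
# the build accept your values; the base image already creates the
# jovyan user, so remap it only if your host IDs differ.
ARG NB_UID=1000
ARG NB_GID=1000

If you want extra libraries baked into the image, this Dockerfile is also the place to add them (for example a RUN pip install ... layer).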

2. Compose it

Execute the following command in the same directory where you created your compose.yml file:

docker compose up
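
The first run builds the image, so it can take a while. Two flags worth knowing:

docker compose up -d        # run in the background (detached)
docker compose up --build   # force a rebuild after editing the Dockerfile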

3. Start using Jupyter

The notebook will be available at http://localhost:8888.
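
To check that Spark actually works, run a quick sanity check in a new notebook. A minimal sketch in Python, assuming pyspark is available in the image (it is in jupyter/all-spark-notebook):

# create a local SparkSession and run a trivial job
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sanity-check").getOrCreate()
print(spark.range(5).count())   # expect: 5
spark.stop()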


About the compose file

Good to know:

When you are done, you can take the container down with this command:

docker compose down 

All your notebooks are stored in the work directory, which is created the first time you run docker compose. Because it is a bind mount on your host, docker compose down removes the container and the network but leaves your notebooks untouched.

Explaining the weird stuff in compose.yml

NB_UID: 1000 — user ID for the notebook user.
NB_GID: 1000 — group ID for the notebook user.
These should match your host user so that files created in ./work end up owned by you; you can check your own IDs with the commands below.
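
id -u    # your user ID (UID)
id -g    # your group ID (GID)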
The command: keyword overrides the container's default command with a shell script:

    command: >
      bash -c "
        chown -R jovyan:users /home/jovyan/work &&
        start-notebook.sh --NotebookApp.token=''
      "

First, it ensures that the work directory is owned by the jovyan user (the default user inside Jupyter Docker Stacks images). Then it starts the Jupyter server with an empty token, meaning no authentication at all. That is fine for local tinkering, but do not expose port 8888 beyond your machine like this.
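
If you do want authentication back, here is a minimal tweak, assuming the stock start-notebook.sh from Jupyter Docker Stacks, which honors the JUPYTER_TOKEN environment variable (the token value below is a placeholder; pick your own):

    environment:
      - JUPYTER_TOKEN=pick-something-secret
    command: >
      bash -c "
        chown -R jovyan:users /home/jovyan/work &&
        start-notebook.sh
      "

Jupyter will then ask for that token when you open http://localhost:8888.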