SQR-064: The sciplat-lab build process

  • Adam Thornton

Latest Revision: 2021-12-17

This technote documents how sciplat-lab (the JupyterLab RSP container) is built.

1   Repository

The Lab container resides at the LSST SQuaRE GitHub sciplat-lab repository.

1.1   Layout

There are several different categories of files in the repository directory.

  1. Makefile and Dockerfile.template directly control the build process; GNU Make is used to generate a Dockerfile from the template and arguments, and then docker build generates the sciplat-lab container. bld provides compatibility with our old build system and is a wrapper for make; it will be removed soon.
  2. The stage shell files are executed during the docker build, and each controls a fairly large section of the container build. texlive.profile is used to control the build of TeX in the container.
  3. The other executable files, except for lsstlaunch.bash, are used during JupyterLab startup. The most important, and most likely to need modification, is runlab.sh, which sets up the JupyterLab environment prior to launching the Lab.
  4. Everything else is copied into the container during build and controls various runtime behaviors of the Lab.

1.2   Branch Conventions

Standard Lab containers (that is, dailies, weeklies, release candidates, and releases) are built from the prod branch. Experimental containers may be built from any branch. The build process enforces this condition, and will force the tag to an experimental one when building from a non-prod branch.

Note that from the GitHub perspective, prod rather than main is the default branch.

1.3   Updating the Default Branch

  1. Do your work in a ticket branch, as with any other repository.
  2. PR that ticket branch into main. Note that the default branch to PR into is going to be prod and you will have to change the selection to main.
  3. Rebase (if possible) or cherry-pick the changes from main into prod_update. At the time of writing, there’s no difference between main and prod_update, but as we migrate between major versions of JupyterLab, it is possible for the two branches to diverge significantly (as they did in the JL2-JL3 transition).
  4. Merge prod_update into prod.

It is worth noting that the only place we use a PR in this process is getting changes into main. Typically you would build an experimental container from your branch, test that, and once satisfied, proceed with the PR.

Once your changes are on main, in the usual case where main and prod_update do not differ, the following incantation will suffice:

git checkout main && \
git pull && \
git checkout prod_update && \
git rebase main && \
git push && \
git checkout prod && \
git merge prod_update && \
git push

2   Build Process

GNU Make is used to drive the build process. The Makefile accepts three arguments and has four useful targets.

The arguments are as follows:

  1. tag – mandatory: this is the tag on the input DM Stack container, e.g. w_2021_50. If it starts with a v that v becomes an r in the output version.
  2. image – optional: this is the name of the image you’re building and pushing. It defaults to docker.io/lsstsqre/sciplat-lab.
  3. supplementary – optional: if specified, this turns the build into an experimental build where the tag starts with exp_ and ends with _<supplementary>.
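
The tag rules above can be sketched in shell. This is an illustration under stated assumptions, not the logic the Makefile actually contains: the function name output_version is hypothetical, and the sketch assumes that the only thing between the exp_ prefix and the _<supplementary> suffix is the (possibly v-to-r rewritten) input tag.

```shell
# Hypothetical helper: derive the output version from the documented arguments.
# $1 = tag (mandatory), $2 = supplementary (optional).
output_version() {
    tag="$1"
    supplementary="${2:-}"
    # A leading "v" on the input tag becomes "r" in the output version.
    case "$tag" in
        v*) version="r${tag#v}" ;;
        *)  version="$tag" ;;
    esac
    # Assumed layout for experimental builds: exp_<version>_<supplementary>.
    if [ -n "$supplementary" ]; then
        version="exp_${version}_${supplementary}"
    fi
    printf '%s\n' "$version"
}

output_version w_2021_50
output_version w_2021_50 newnumpy
```

Keeping the rule in one small function makes it easy to check a tag by hand before starting a long build.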

The targets are as follows:

  1. clean – remove the generated Dockerfile. Not terribly useful on its own, but a good first step before running the next target (because the template rarely changes, make cannot tell on its own that the Dockerfile needs rebuilding when the arguments change).
  2. dockerfile – just generate the Dockerfile from the template and the arguments. Do not build or push.
  3. image – build the Lab container, but do not push it.
  4. push – build and push the container.

push is the default, and all is a synonym for it. build is a synonym for image. Note that we assume that the building user already has appropriate push credentials for the repository to which the image is pushed, and that any necessary docker login has already been performed.

If the image is built from a branch that is not prod, and the supplementary tag is not specified, the supplementary tag will be set to a value derived from the branch name. This prevents building standard containers from branches other than prod.
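
A minimal sketch of this fallback follows. The text does not say how the value is derived from the branch name, so the sanitization here (replacing every non-alphanumeric character with an underscore) is purely an assumption, and the function name default_supplementary is hypothetical. In a real checkout the branch name could be obtained with git rev-parse --abbrev-ref HEAD.

```shell
# Hypothetical helper: pick the supplementary tag for a build.
# $1 = current branch name, $2 = supplementary given by the user (optional).
default_supplementary() {
    branch="$1"
    supplementary="${2:-}"
    if [ "$branch" != "prod" ] && [ -z "$supplementary" ]; then
        # Assumed derivation: keep letters and digits, turn the rest into "_".
        supplementary=$(printf '%s' "$branch" | tr -c 'A-Za-z0-9' '_')
    fi
    printf '%s\n' "$supplementary"
}

default_supplementary tickets/DM-12345
```

On prod with no supplementary argument the function prints an empty line, which corresponds to a standard (non-experimental) build.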

2.1   Dockerfile template substitution

Dockerfile.template contains the placeholders {{TAG}}, {{IMAGE}}, and {{VERSION}}. Despite the double curly brackets, the substitution is nothing as sophisticated as Jinja2: instead, we just run sed in the dockerfile target of the Makefile.
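
A sed-based substitution of this kind might look like the sketch below. It is not copied from the Makefile: the function name render_dockerfile and the one-line sample template are made up for illustration, and the sketch assumes the substituted values contain no |, & or backslash characters, which would be special to sed here.

```shell
# Hypothetical helper: fill the three placeholders, reading the template on
# stdin and writing the result to stdout.
# $1 = tag, $2 = image, $3 = version.
render_dockerfile() {
    tag="$1"
    image="$2"
    version="$3"
    # "|" is used as the sed delimiter because image names contain "/".
    sed -e "s|{{TAG}}|${tag}|g" \
        -e "s|{{IMAGE}}|${image}|g" \
        -e "s|{{VERSION}}|${version}|g"
}

# Made-up sample line, not taken from the real Dockerfile.template.
printf 'LABEL example="{{IMAGE}}:{{VERSION}} from {{TAG}}"\n' |
    render_dockerfile w_2021_50 docker.io/lsstsqre/sciplat-lab w_2021_50
```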

2.2   Examples

Build and push the weekly 2021_50 container:

make tag=w_2021_50

Build and push an experimental container with a newnumpy supplementary tag:

make tag=w_2021_50 supplementary=newnumpy

Just create the Dockerfile for w_2021_49:

make dockerfile tag=w_2021_49

Build the newnumpy container, but don’t push it:

make image tag=w_2021_50 supplementary=newnumpy

Build and push w_2021_50 to ghcr.io:

make tag=w_2021_50 image=ghcr.io/lsst-sqre/sciplat-lab

3   Modifying Lab container Contents

This is probably why you’re reading this document.

You will need to understand the structure of Dockerfile.template a little. It is very likely that the piece you need to modify is in one of the stage*.sh scripts, although it is plausible that what you want is actually one of the container setup-at-runtime pieces.

3.1   stage*.sh scripts

Most of the action in the Dockerfile comes from five shell scripts executed by docker build as RUN actions.

These are, in order:

  1. stage1-rpm.sh – we will always be building on top of CentOS in the current regime. This stage first reinstalls all the system packages but with man pages this time (the Stack container isn’t really designed for interactive use, but ours is), and then adds some RPM packages we require, or at least find helpful, for our user environment.
  2. stage2-os.sh – this installs OS-level packages that are not packaged via RPM. Currently the biggest and hairiest of these is TeXLive: the conda TeX packaging story is not good, and if we don’t install TeXLive a bunch of the export-as options in JupyterLab will not work.
  3. stage3-py.sh – this is probably where you’re going to be spending your time. Mamba is faster and reports errors better than conda, so we install it and then use it. Anything available as a Conda package should be installed from conda-forge, so the first thing we do is add all the Conda packages we need. Not everything we need is packaged that way, so we then do a pip install of the rest, and a little bit of bookkeeping to create a kernel for the Stack Python. It is likely that what you need to do will be done by inserting (or pinning versions of) Python packages in the mamba or pip sections.
  4. stage4-jup.sh – this is for installation of Jupyter packages: mostly Lab extensions, but there are also server and notebook extensions we rely upon. Use pre-built Lab extensions if at all possible, which will mean they are packaged as conda-forge or pip-installable packages and handled in the previous Python stage.
  5. stage5-ro.sh – this is Rubin Observatory-specific setup. This, notably, creates quite a big layer because, among other things, it checks out the tutorial notebooks as they existed at build time, and people keep checking large figure outputs into these notebooks.

3.2   Other files

The rest of the files in this directory are either things copied to various well-known locations (for example, all the local*.sh files end up in /etc/profile.d) or they control various aspects of the Lab startup process. For the most part they are moved into the container by COPY statements in the Dockerfile. They do not often need modification.

runlab.sh is the other file you are likely to need to modify. It is executed as the target user, and the last thing it does is start JupyterLab (well, almost: it also knows whether it is a dask worker or a noninteractive container, and does something different in those cases).
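
The three-way decision at the end of startup can be pictured with the sketch below. Everything in it is hypothetical: the variable names EXAMPLE_DASK_WORKER and EXAMPLE_NONINTERACTIVE, the function name select_mode, and the order of the checks are assumptions made for illustration and are not what runlab.sh actually reads or does. The sketch only prints what it would do.

```shell
# Hypothetical sketch: choose a startup mode from (made-up) environment variables.
select_mode() {
    if [ -n "${EXAMPLE_DASK_WORKER:-}" ]; then
        echo "dask-worker"
    elif [ -n "${EXAMPLE_NONINTERACTIVE:-}" ]; then
        echo "noninteractive"
    else
        # The usual case: an interactive Lab for the target user.
        echo "lab"
    fi
}

# Dispatch on the selected mode; real startup commands are deliberately omitted.
case "$(select_mode)" in
    dask-worker)    echo "would start a dask worker" ;;
    noninteractive) echo "would run the noninteractive command" ;;
    lab)            echo "would start JupyterLab" ;;
esac
```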

3.3   Indentation conventions

There’s a lot of shell scripting in here. Please use four-space indentations, and convert tabs to spaces, if you’re working on the scripts.
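
One way to convert tabs is the standard expand utility, sketched here. The wrapper name detab is hypothetical. Note that expand pads to tab stops every four columns, which matches four spaces per tab for leading indentation.

```shell
# Hypothetical wrapper: expand tabs to four-column tab stops.
# Reads the named files, or stdin when no file is given, and writes to stdout.
detab() {
    expand -t 4 "$@"
}

# Sample input with tab indentation, printed with spaces instead.
printf '\tif true; then\n\t\techo ok\n\tfi\n' | detab
```

Because the wrapper writes to stdout, the converted text can be reviewed before it replaces the original script.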