SQR-064: The sciplat-lab build process

  • Adam Thornton

Latest Revision: 2021-12-09

Note

This technote is not yet published.

This technote documents how sciplat-lab (the JupyterLab RSP container) is built.

1   Repository

1.1   Layout

The Lab container resides in the LSST SQuaRE GitHub sciplat-lab repository.

1.2   Branch Conventions

Development of the Lab container contents typically happens on a ticket branch, which is then PRed into main. Once the resulting containers have been built and tested and we feel they are ready to be promoted to production container builds, the changes are fed into prod_update, either by rebasing or by cherry-picking. (At the moment, main and prod_update should generally be identical; if we find ourselves in another situation like the one near the end of JupyterLab 2.x, it is possible that main will have diverged severely and cherry-picking changes will be necessary.)

Once prod_update is ready, changes from that branch are merged into prod. Our current (Jenkins) CI process builds containers from prod rather than main.

2   Build Process

GNU Make is used to drive the build process. The Makefile accepts three arguments and has three useful targets.

The arguments are as follows:

  1. tag – mandatory: this is the tag on the input DM Stack container, e.g. w_2021_50. If it starts with a v, that v becomes an r in the output version.
  2. image – optional: this is the name of the image you’re building and pushing. It defaults to docker.io/lsstsqre/sciplat-lab.
  3. supplementary – optional: if specified, this turns the build into an experimental build where the tag starts with exp_ and ends with _<supplementary>.
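The naming rules above can be sketched in shell. This is an illustration of the scheme as described, not the actual Makefile logic; in particular, the function name output_version is made up, and whether the v-to-r rewrite also applies inside experimental tags is an assumption here:

```shell
# Hypothetical sketch of the tag-to-version naming rules; the authoritative
# logic lives in the Makefile itself.
output_version() {
    tag="$1"
    supplementary="${2:-}"
    if [ -n "${supplementary}" ]; then
        # Experimental build: exp_<tag>_<supplementary>
        echo "exp_${tag}_${supplementary}"
    else
        # A leading "v" (release tag) becomes "r" in the output version.
        echo "${tag}" | sed -e 's/^v/r/'
    fi
}

output_version w_2021_50            # -> w_2021_50
output_version v23_0_0              # -> r23_0_0
output_version w_2021_50 newnumpy   # -> exp_w_2021_50_newnumpy
```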

The targets are:

  1. dockerfile – just generate the Dockerfile from the template and the arguments. Do not build or push.
  2. image – build the Lab container, but do not push it.
  3. push – build and push the container.

“push” is a synonym for “all” and is the default; “build” is a synonym for “image”. Note that we assume the building user already has appropriate push credentials for the repository to which the image is pushed, so no docker login is needed.
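The argument-and-target structure above can be sketched as a minimal Makefile. This is an illustrative skeleton only; the real recipes (the Dockerfile generation in particular) differ:

```makefile
# Illustrative skeleton only -- the real Makefile's recipes differ.
image = docker.io/lsstsqre/sciplat-lab

all: push                # "make" with no target builds and pushes

dockerfile:              # generate Dockerfile from Dockerfile.template
	# (sed substitution of tag/image/version goes here)

build: image             # "build" is a synonym for "image"
image: dockerfile        # build the Lab container, but do not push
	docker build -t $(image):$(tag) .

push: image              # build and push; assumes push credentials exist
	docker push $(image):$(tag)
```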

2.1   Dockerfile template substitution

Dockerfile.template looks like it’s ready for Jinja2: we’re substituting {{TAG}}, {{IMAGE}}, and {{VERSION}}. It’s nothing that sophisticated; we just run sed in the dockerfile target of the Makefile.
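The substitution can be reproduced by hand. The template content below is a made-up one-liner for illustration; only the {{TAG}}, {{IMAGE}}, and {{VERSION}} placeholders match the real Dockerfile.template:

```shell
# Recreate the sed substitution by hand.  The template line is invented;
# only the placeholder names match the real Dockerfile.template.
tag="w_2021_50"
image="docker.io/lsstsqre/sciplat-lab"
version="w_2021_50"

printf 'FROM example.org/stack:{{TAG}}\nLABEL image="{{IMAGE}}" version="{{VERSION}}"\n' \
    > template.demo
sed -e "s|{{TAG}}|${tag}|g" \
    -e "s|{{IMAGE}}|${image}|g" \
    -e "s|{{VERSION}}|${version}|g" \
    template.demo > Dockerfile.demo
cat Dockerfile.demo
```

Using | rather than / as the sed delimiter avoids escaping the slashes in the image name.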

2.2   Examples

Build and push the weekly 2021_50 container: make tag=w_2021_50.

Build and push an experimental container with a newnumpy supplementary tag: make tag=w_2021_50 supplementary=newnumpy.

Just create the Dockerfile for w_2021_49: make dockerfile tag=w_2021_49.

Build the newnumpy container, but don’t push it: make image tag=w_2021_50 supplementary=newnumpy.

Build and push w_2021_50 to ghcr.io: make tag=w_2021_50 image=ghcr.io/lsst-sqre/sciplat-lab.

3   Modifying Lab container Contents

This is probably why you’re reading this document.

You will need to understand the structure of Dockerfile.template a little. It is very likely that the piece you need to modify is in one of the stage*.sh scripts, although it is plausible that what you want is actually one of the container setup-at-runtime pieces.

3.1   stage*.sh scripts

Most of the action in the Dockerfile comes from five shell scripts executed by docker build as RUN actions.

These are, in order:

  1. rpm – we will always be building on top of CentOS in the current regime. This stage first reinstalls all the system packages, but with man pages this time (the Stack container isn’t really designed for interactive use, but ours is), and then adds some RPM packages we require, or at least find helpful, for our user environment.
  2. os – this installs OS-level packages that are not packaged via RPM. Currently the biggest and hairiest of these is TeXLive: the conda TeX packaging story is not good, and if we don’t install TeXLive, a bunch of the export-as options in JupyterLab will not work.
  3. py – this is probably where you’re going to be spending your time. Mamba is faster and reports errors better than conda, so we install it and then use it. Anything that is packaged as a Conda package should be installed from conda-forge; however, that doesn’t cover everything we need. Thus, the first thing we do is add all the Conda packages we need; then we do a pip install of the rest, plus a little bit of bookkeeping to create a kernel for the Stack Python. It is likely that whatever you need to do can be done by inserting (or pinning versions of) Python packages in the mamba or pip sections.
  4. jup – this is for installation of Jupyter packages: mostly Lab extensions, but there are also server and notebook extensions we rely upon. Use pre-built Lab extensions if at all possible; that way they are packaged as conda-forge or pip-installable packages and handled in the previous Python stage.
  5. ro – this is Rubin Observatory-specific setup. Notably, it creates quite a big layer because, among other things, it checks out the tutorial notebooks as they existed at build time, and people keep checking large figure outputs into those notebooks.
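Putting the stages together, the shape of Dockerfile.template is roughly as follows. This is an illustrative skeleton only: the base image, destination path, and exact script file names are placeholders, not the real contents of the template:

```dockerfile
# Illustrative skeleton only; base image, paths, and file names are
# placeholders, not the real Dockerfile.template.
FROM docker.io/example/stack:{{TAG}}

COPY stage*.sh /opt/build/
RUN /opt/build/stage1-rpm.sh   # reinstall system RPMs with man pages, add extras
RUN /opt/build/stage2-os.sh    # non-RPM OS packages, e.g. TeXLive
RUN /opt/build/stage3-py.sh    # mamba/conda-forge packages, then pip, then kernel
RUN /opt/build/stage4-jup.sh   # JupyterLab, server, and notebook extensions
RUN /opt/build/stage5-ro.sh    # Rubin-specific setup, tutorial notebook checkout
```

Each RUN line produces its own layer, which is why the ro stage's notebook checkout shows up as a large layer of its own.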

3.2   Other files

The rest of the files in this directory are either things copied to various well-known locations (for example, all the local*.sh files end up in /etc/profile.d) or they control various aspects of the Lab startup process. For the most part they are moved into the container by COPY statements in the Dockerfile.

runlab.sh is the other file you are likely to need to modify. It is executed as the target user, and the last thing it does is start JupyterLab (well, almost: it also knows whether it is a Dask worker or a noninteractive container, and does something different in those cases).
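The dispatch at the end of runlab.sh can be sketched like this. The environment-variable names and the echoed commands are hypothetical stand-ins for illustration, not the real interface:

```shell
# Hypothetical sketch of runlab.sh's final dispatch; the variable names
# (DASK_WORKER, NONINTERACTIVE) and the echoed commands are stand-ins.
start_lab() {
    if [ -n "${DASK_WORKER:-}" ]; then
        echo "dask-worker"      # stand-in for exec'ing a Dask worker
    elif [ -n "${NONINTERACTIVE:-}" ]; then
        echo "noninteractive"   # stand-in for running the supplied command
    else
        echo "jupyterlab"       # stand-in for exec'ing JupyterLab itself
    fi
}
```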

3.3   Indentation conventions

There’s a lot of shell scripting in here. If you’re working on the scripts, please use four-space indentation and convert tabs to spaces.