You can run CMS analysis code in a Docker container provided together with the CMS open data. If you have not already installed Docker, instructions for installation are provided by Docker. For an introduction and for getting started, you can follow the links provided in the CMS Open data guide.
The quickest way to get started analyzing CMS open data is to use data in the NanoAOD format, either derived data from Run 1 or NanoAOD from Run 2. These file formats can be analyzed using ROOT and Python tools. The container images linked here contain the packages required to access open data files over the network.
ROOT container: this image contains ROOT 6 coupled with Python 3 and a VNC server for access to the graphical user interface.
Python container: this image contains Python 3 with a HEP software stack and a VNC server for access to the graphical user interface. HEP-related Python packages in this image include awkward, uproot, numpy, matplotlib, and jupyterlab.
Software | Container image (dockerhub) | Alternative image location (GitLab) |
---|---|---|
ROOT | cmscloud/root-vnc | gitlab-registry.cern.ch/cms-cloud/root-vnc:latest |
Python | cmscloud/python-vnc | gitlab-registry.cern.ch/cms-cloud/python-vnc:latest |
For the first access of each set of CMS open data, you will need a specific container image containing the software corresponding to that particular set of data. The following images are available:
CMS open data | CMSSW version | Container image (dockerhub) | Alternative image location (GitLab) |
---|---|---|---|
2016 proton-proton | CMSSW_10_6_30 | cmsopendata/cmssw_10_6_30-slc7_amd64_gcc700 | gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/cmssw_10_6_30-slc7_amd64_gcc700 |
2015 proton-proton | CMSSW_7_6_7 | cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493 | gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/cmssw_7_6_7-slc6_amd64_gcc493 |
2015 proton-proton heavy-ion reference data at 5.02 TeV | CMSSW_7_5_8_patch3 | cmsopendata/cmssw_7_5_8_patch3-slc6_amd64_gcc491 | gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/cmssw_7_5_8_patch3-slc6_amd64_gcc491 |
2013 proton-lead and proton-proton heavy-ion reference data | CMSSW_5_3_20 | cmsopendata/cmssw_5_3_20-slc6_amd64_gcc472 | gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/cmssw_5_3_20-slc6_amd64_gcc472 |
2011-2012 proton-proton | CMSSW_5_3_32 | cmsopendata/cmssw_5_3_32-slc6_amd64_gcc472 | gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/cmssw_5_3_32-slc6_amd64_gcc472 |
2011 heavy-ion | CMSSW_4_4_7 | cmsopendata/cmssw_4_4_7-slc5_amd64_gcc434 | gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/cmssw_4_4_7-slc5_amd64_gcc434 |
2010 proton-proton | CMSSW_4_2_8 | cmsopendata/cmssw_4_2_8-slc5_amd64_gcc434 | gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/cmssw_4_2_8-slc5_amd64_gcc434 |
2010 proton-proton with CASTOR calorimeter | CMSSW_4_2_8_lowpupatch1 | cmsopendata/cmssw_4_2_8_lowpupatch1-slc5_amd64_gcc434 | gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/cmssw_4_2_8_lowpupatch1-slc5_amd64_gcc434 |
2010 heavy-ion | CMSSW_3_9_2_patch5 | cmsopendata/cmssw_3_9_2_patch5-slc5_amd64_gcc434 | gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/cmssw_3_9_2_patch5-slc5_amd64_gcc434 |
In the following instructions, make sure to replace the example container image name according to the table above.
These commands are for 2016 proton-proton data, with the CMSSW version 10_6_30 and the cmssw_10_6_30-slc7_amd64_gcc700 container image.
Once you have installed Docker on your computer, you can fetch a container image, and create and start a container using the docker run command:
docker run -it --name my_od -P -p 5901:5901 -p 6080:6080 cmsopendata/cmssw_10_6_30-slc7_amd64_gcc700 /bin/bash
Here we fetch the cmssw_10_6_30-slc7_amd64_gcc700 Docker image from dockerhub and name the container my_od.
The docker run command downloads a stand-alone CMSSW image of several gigabytes, so it may take a while. However, the image only has to be downloaded once. The following will appear in your terminal, with messages changing during the download:
$ docker run -it --name my_od -P -p 5901:5901 -p 6080:6080 cmsopendata/cmssw_10_6_30-slc7_amd64_gcc700 /bin/bash
Unable to find image 'cmsopendata/cmssw_10_6_30-slc7_amd64_gcc700:latest' locally
latest: Pulling from cmsopendata/cmssw_10_6_30-slc7_amd64_gcc700
8e644b3666d3: Already exists
945e96025c00: Pull complete
41a70f52f56f: Pull complete
77c4aea19d7c: Pull complete
3e40d434bd23: Pull complete
52d966019a75: Pull complete
913ddaff535b: Pull complete
2a41aaf2ef99: Pull complete
6e773ee02fe9: Pull complete
878de2d80b06: Pull complete
c59f44225a9d: Pull complete
f0782ac1f652: Pull complete
4d506d893fa2: Pull complete
3f1785fba3dc: Pull complete
Digest: sha256:56ef1955c399912f4cdf53e91b39d66aca04a084d8a3a1002a7e27500ac1efa0
Status: Downloaded newer image for cmsopendata/cmssw_10_6_30-slc7_amd64_gcc700:latest
Setting up CMSSW_10_6_30
CMSSW should now be available.
This is a standalone image for CMSSW_10_6_30 slc7_amd64_gcc700.
Once done, in a CMSSW container you should see the command prompt for the CMSSW instance within Docker:
(/code/CMSSW_10_6_30/src)
If you are using a Linux distribution on WSL2 and, instead of this prompt, you are returned to your local terminal prompt, see the instructions below under "Running CMS open data containers on WSL2".
Non-CMSSW containers: in the ROOT or Python containers the pull output will look very similar to the example above, but without any messages about setting up CMSSW. After the image has been downloaded, you will see a command prompt in the /code directory:
(/code)
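The image names in the table above follow a regular pattern: the lower-case CMSSW version, a hyphen, and the architecture string, prefixed by the registry location. As a minimal sketch, assuming that pattern holds for the release you need, the following Python helper assembles the docker run command used in this guide. The names image_name and build_run_command are hypothetical and serve only as an illustration; the flags are the ones from the example above.

```python
import shlex

# Registry prefixes taken from the container image table in this guide.
DOCKERHUB_PREFIX = "cmsopendata/"
GITLAB_PREFIX = "gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/"


def image_name(cmssw_version, arch, use_gitlab=False):
    """Return the image name for a release, e.g. CMSSW_10_6_30 and slc7_amd64_gcc700."""
    prefix = GITLAB_PREFIX if use_gitlab else DOCKERHUB_PREFIX
    return f"{prefix}{cmssw_version.lower()}-{arch}"


def build_run_command(cmssw_version, arch, container_name="my_od", use_gitlab=False):
    """Assemble the docker run command from this guide as a list of arguments."""
    return [
        "docker", "run", "-it",
        "--name", container_name,
        "-P", "-p", "5901:5901", "-p", "6080:6080",
        image_name(cmssw_version, arch, use_gitlab),
        "/bin/bash",
    ]


if __name__ == "__main__":
    # Print the command for 2016 proton-proton data; paste it into a terminal to run it.
    print(shlex.join(build_run_command("CMSSW_10_6_30", "slc7_amd64_gcc700")))
```

The helper only prints the command rather than running it, so you can check the image name against the table before starting a multi-gigabyte download.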
In the following, some useful commands are given. For a complete list of commands, see the docker command line documentation.
When you want to exit the container, simply type exit.
If you want to restart the container (e.g. the one named my_od) and return to your work, use the command
docker start -i my_od
You can remove the container my_od with
docker rm my_od
This does not remove the image, which took a long time to download. You can create a new container from that image with the same docker run ... command as above, and it will be much faster than the first time.
If the container was created and started using the --rm option (e.g. docker run --rm ...), then the container will be removed when you exit.
You can copy a file out of a running container to your local computer. Create a file in the container (for example) with
echo $SHELL > $HOME/example.txt
In order to copy this file out of a running container, open another terminal of your local computer and run one of the following commands:
docker cp my_od:/home/cmsusr/example.txt . # for CMSSW container
docker cp my_od:/code/example.txt . # ROOT or Python container
Likewise, in order to copy a file into a running container:
docker cp <my file> my_od:/home/cmsusr/ # for CMSSW container
docker cp <my file> my_od:/code # for ROOT or Python container
It is possible to create a directory on your local file system and mount it in the container, so that files are shared automatically without needing to copy them in and out of the container. If you already have a container, exit from the container and remove it using the docker rm command. For CMSSW containers in particular, it is important to create the working area with the proper permissions before creating the container.
export workpath=$PWD
mkdir cms_open_data_work
chmod -R 777 cms_open_data_work
Then create a container that includes mounting information through the -v option.
docker run -it --name my_od -P -p 5901:5901 -p 6080:6080 -v ${workpath}/cms_open_data_work:/code cmsopendata/cmssw_10_6_30-slc7_amd64_gcc700 /bin/bash
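The preparation steps above can also be scripted. The following is a minimal sketch in Python, assuming a Linux or macOS host; the function names prepare_work_dir and volume_argument are hypothetical. It creates the shared directory with open permissions and returns the value to pass to the -v option.

```python
import os


def prepare_work_dir(base_dir, name="cms_open_data_work"):
    """Create the shared directory and open its permissions, like mkdir followed by chmod 777."""
    work_dir = os.path.join(os.path.abspath(base_dir), name)
    os.makedirs(work_dir, exist_ok=True)
    # os.chmod sets the mode exactly; the mode argument of makedirs would be masked by the umask.
    # Unlike chmod -R, this is not recursive, which is equivalent for a newly created directory.
    os.chmod(work_dir, 0o777)
    return work_dir


def volume_argument(work_dir, container_path="/code"):
    """Return the value for docker's -v option in the form <host path>:<container path>."""
    return f"{work_dir}:{container_path}"
```

For example, volume_argument(prepare_work_dir(os.getcwd())) gives the string that follows -v in the docker run command above.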
You may need to submit a command from your local host into a running container. For example, to see the running processes in the my_od container, run:
docker exec my_od ps -ef
For opening graphics windows, the container image has a VNC application installed. Start the VNC application in the container with
start_vnc
You can either install a VNC viewer (e.g. TigerVNC) on your local computer (Linux, macOS or Windows) and start the viewer there, or open the graphics window in your browser with the http address given in the message.
Connect with the default VNC password cms.cern.
Each time you exit from the container, close the VNC application with
stop_vnc
You can find more details on the configuration and usage of VNC in the CMS open data containers in the image repository.
If you are running on a Linux computer, you can also use X11 forwarding. If you already started a container named my_od and now decide to use X11 forwarding instead of VNC, exit from the container shell with exit, then remove the existing container with docker rm my_od. Then start a new container with
docker run -it --name my_od --net=host --env="DISPLAY" -v $HOME/.Xauthority:/home/cmsusr/.Xauthority:rw cmsopendata/cmssw_10_6_30-slc7_amd64_gcc700 /bin/bash
You can test if the graphics window opens by typing in the container shell
root
In the root [0] prompt, type
TBrowser t
Exit ROOT by typing
.q
in the root [..] prompt, or from the browser window menu.
If you are new to ROOT, take a quick look at the Getting started page, or follow the links in the CMS open data guide.
If the container prompt causes trouble with line wrapping, increase the size of the terminal. If that does not help, you can change the prompt with
export PS1="(\w) "
To change it permanently, add this line to the file /home/cmsusr/.bashrc in the container.
The Python container supports Jupyter notebooks. To use them, create a new Python container with port 8888 enabled in addition to the ports in the original example:
docker run --rm -it -P -p 5901:5901 -p 6080:6080 -p 8888:8888 gitlab-registry.cern.ch/cms-cloud/python-vnc:latest
Inside the container, launch jupyter-lab with the following command. You will see output that includes a link:
$ jupyter-lab --ip=0.0.0.0 --no-browser
[I 2024-02-15 16:14:59.384 ServerApp] jupyterlab | extension was successfully linked.
[I 2024-02-15 16:14:59.395 ServerApp] nbclassic | extension was successfully linked.
[I 2024-02-15 16:14:59.397 ServerApp] Writing Jupyter server cookie secret to /home/cmsusr/.local/share/jupyter/runtime/jupyter_cookie_secret
[I 2024-02-15 16:14:59.953 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-02-15 16:14:59.990 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-02-15 16:14:59.992 LabApp] JupyterLab extension loaded from /usr/local/venv/lib/python3.10/site-packages/jupyterlab
[I 2024-02-15 16:14:59.992 LabApp] JupyterLab application directory is /usr/local/venv/share/jupyter/lab
[I 2024-02-15 16:14:59.997 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-02-15 16:15:00.010 ServerApp] nbclassic | extension was successfully loaded.
[I 2024-02-15 16:15:00.010 ServerApp] Serving notebooks from local directory: /code
[I 2024-02-15 16:15:00.010 ServerApp] Jupyter Server 1.18.1 is running at:
[I 2024-02-15 16:15:00.010 ServerApp] http://ae4189a5ed44:8888/lab?token=bf6db43d28d8073b3859885d8ffcf8693785cc0d59a146ca
[I 2024-02-15 16:15:00.010 ServerApp] or http://127.0.0.1:8888/lab?token=bf6db43d28d8073b3859885d8ffcf8693785cc0d59a146ca
[I 2024-02-15 16:15:00.010 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2024-02-15 16:15:00.015 ServerApp]
To access the server, open this file in a browser:
file:///home/cmsusr/.local/share/jupyter/runtime/jpserver-9-open.html
Or copy and paste one of these URLs:
http://ae4189a5ed44:8888/lab?token=bf6db43d28d8073b3859885d8ffcf8693785cc0d59a146ca
or http://127.0.0.1:8888/lab?token=bf6db43d28d8073b3859885d8ffcf8693785cc0d59a146ca
Paste the http link that starts with http://127.0.0.1:8888 into your browser to access a jupyter-lab session where you can create notebooks and see your existing notebooks.
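If you capture the server output in a file or a script, the link can be picked out automatically. The following is a minimal sketch, assuming the output has the form shown above; the function name find_local_url is hypothetical.

```python
import re

# Matches the localhost URL printed by jupyter-lab, including the access token.
LOCAL_URL = re.compile(r"http://127\.0\.0\.1:\d+/lab\?token=[0-9a-f]+")


def find_local_url(log_text):
    """Return the first http://127.0.0.1 URL found in the server output, or None."""
    match = LOCAL_URL.search(log_text)
    return match.group(0) if match else None
```

The sketch deliberately skips the link that contains the container host name (ae4189a5ed44 in the example), since that name is usually not resolvable from the host computer.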
If you have read the instructions above, you can now follow the getting started instructions for CMS AOD data (Run 1), MiniAOD data (Run 2), or NanoAOD data (Run 2).
The CMSSW open data containers, or any CentOS6-based containers, may fail if Docker is run on WSL2. This problem is fixed by adding a new file .wslconfig with the following contents
[wsl2]
kernelCommandLine = vsyscall=emulate
in the \Users\<username> folder (make sure that it is saved without extension), then shutting down with wsl --shutdown in the Windows command prompt and restarting again.
Test that the settings are properly passed by running the following in the WSL2 Linux installation:
docker run -ti ubuntu cat /proc/cmdline
The output should contain vsyscall=emulate, e.g.:
initrd=\initrd.img panic=-1 pty.legacy_count=0 nr_cpus=4 vsyscall=emulate
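The check can also be done programmatically. The following is a minimal sketch, assuming you pass in the text printed by the command above; the function name has_vsyscall_emulate is hypothetical.

```python
def has_vsyscall_emulate(cmdline):
    """Return True if the kernel command line contains the vsyscall=emulate option."""
    # Kernel options are separated by whitespace, so compare whole tokens
    # rather than searching for a substring.
    return "vsyscall=emulate" in cmdline.split()
```

Comparing whole tokens avoids a false match on a different setting such as vsyscall=emulate_off, should one ever appear.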
The CMS open data container images contain the software needed for analysis, and the CMS condition database can be accessed from predefined locations. In the container images for standard proton-proton data, the software and condition data are stored in a local /cvmfs file system. Therefore, when using these containers, access to the /cvmfs (CernVM File System) namespace at CERN for software and condition data is not mandatory.
If desired, it is possible to "see" the full cvmfs space by installing the cvmfs client following the official instructions. In essence, there are two basic ways to achieve this:
The preferred option is to install the cvmfs client locally, on the host machine, and mount it on the container:
docker run --name my_od -it -v "/cvmfs/cms-opendata-conddb.cern.ch:/cvmfs/cms-opendata-conddb.cern.ch:shared" cmsopendata/cmssw_10_6_30-slc7_amd64_gcc700 /bin/bash
Do not mount the full /cvmfs or /cvmfs/cms.cern.ch areas, as that will overwrite necessary settings in the local /cvmfs area of the container.
The other option is to install the cvmfs client directly in the container after it is created (this works only for the slc6-based containers). For this, the container must be started in privileged mode, for example:
docker run --privileged --name my_od -it cmsopendata/cmssw_10_6_30-slc7_amd64_gcc700 /bin/bash
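For the preferred option (mounting from the host), the restriction on which areas may be mounted can be enforced in a small helper. The following is a minimal sketch; the function name cvmfs_volume_argument is hypothetical, and it assumes the repository is mounted at the same path on the host and in the container, as in the example command above.

```python
# Areas that this guide says must not be mounted from the host.
FORBIDDEN_MOUNTS = {"/cvmfs", "/cvmfs/cms.cern.ch"}


def cvmfs_volume_argument(repository):
    """Return the -v value that mounts one cvmfs repository at the same path in the container."""
    path = ("/cvmfs/" + repository.strip("/")).rstrip("/")
    if path in FORBIDDEN_MOUNTS:
        raise ValueError(
            f"do not mount {path}: it would overwrite settings in the container's local /cvmfs"
        )
    return f"{path}:{path}:shared"
```

Calling cvmfs_volume_argument("cms-opendata-conddb.cern.ch") returns the string used after -v in the host-mount example, while "cms.cern.ch" or an empty name raises a ValueError.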