If you are interested in step-by-step instructions to start working with CMS Open Data, please consult these pages:
This page offers hints, tips and guidance for conducting a research-oriented analysis using CMS Open Data. More detailed information can be found in the CMS Open Data Guide.
I want to get a general introduction to HEP and CMS software and terminology, with a simplified event format.
I want to learn about the terms under which I can access and use the CMS Open Data, and publish results obtained from them.
I want to get inspiration for some potential physics topics.
I want to learn about the nature of the CMS physics objects and the corresponding variables and terminology.
I want to follow a set of detailed tutorials to learn how to analyze CMS Open Data.
CMS has released proton-proton collision data from Run 1 and Run 2, as well as heavy-ion collision data from Run 1.
High-energy proton collisions:
Collisions | Energy (TeV) | Simulation | Getting Started | CMSSW version |
---|---|---|---|---|
proton-proton 2010 | 7 | 2010 simulation | AOD data | CMSSW_4_2_8 |
proton-proton 2011 | 7 | 2011 simulation | AOD data | CMSSW_5_3_32 |
proton-proton 2012 | 8 | 2012 simulation | AOD data | CMSSW_5_3_32 |
proton-proton 2015 | 13 | 2015 simulation | MiniAOD data | CMSSW_7_6_7 |
proton-proton 2016 | 13 | 2016 simulation | MiniAOD data, NanoAOD data | CMSSW_10_6_30 (MiniAOD), not required (NanoAOD) |
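If you script your analysis setup, it can help to encode the table above as data so that the CMSSW release follows from the year and data format you pick. The following is a minimal sketch: the dictionary and the function `pp_release` are hypothetical names, and the only facts they contain are the release names copied from the proton-proton table.

```python
from typing import Optional

# Hypothetical lookup built from the proton-proton table above.
# (year, data format) -> CMSSW release; None means no CMSSW release is required.
PP_RELEASES = {
    (2010, "AOD"): "CMSSW_4_2_8",
    (2011, "AOD"): "CMSSW_5_3_32",
    (2012, "AOD"): "CMSSW_5_3_32",
    (2015, "MiniAOD"): "CMSSW_7_6_7",
    (2016, "MiniAOD"): "CMSSW_10_6_30",
    (2016, "NanoAOD"): None,
}


def pp_release(year: int, data_format: str) -> Optional[str]:
    """Return the CMSSW release for a proton-proton dataset, or None if none is needed."""
    key = (year, data_format)
    if key not in PP_RELEASES:
        raise KeyError(f"No proton-proton {data_format} data listed for {year}")
    return PP_RELEASES[key]


print(pp_release(2012, "AOD"))      # CMSSW_5_3_32
print(pp_release(2016, "NanoAOD"))  # None
```

Returning `None` rather than an empty string makes the "not required" case for 2016 NanoAOD explicit, while an unlisted combination raises an error instead of silently picking a release.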
Among the Run 1 data, the 2010 datasets are smaller and offer a better environment for low-momentum, low-pileup studies. The 2011-2012 datasets are suitable for replicating CMS Run 1 physics results or performing new searches and measurements at 7-8 TeV collision energy. In Run 2, the 2015 dataset is smaller than the 2011-2012 Run 1 datasets, but it offers the first look at 13 TeV collisions and a much broader array of simulation. The 2016 13 TeV dataset (available as of 2024) has a luminosity similar to that of the Run 1 datasets, and offers a more advanced computing environment and new identification algorithms for Run 2. Information on the respective luminosities and pileup conditions as a function of time can be found in the public CMS luminosity information.
Heavy-ion program:
Collisions | Energy (TeV) | Simulation | Getting Started | CMSSW version |
---|---|---|---|---|
lead-lead 2010 | 2.76 | 2010-2011 Pb-Pb simulation | Pb-Pb 2010 | CMSSW_3_9_2_patch5\* |
lead-lead 2011 | 2.76 | 2010-2011 Pb-Pb simulation | Pb-Pb 2011 | CMSSW_4_4_7\* |
proton-proton 2011 | 2.76 | N/A | Pb-Pb 2011 | CMSSW_4_4_7 |
proton-proton 2013 | 2.76 | 2013 p-p simulation | p-Pb data | CMSSW_5_3_20 |
proton-lead 2013 | 5.02 | 2013 p-Pb simulation | p-Pb data | CMSSW_5_3_20 |
proton-proton 2015 | 5.02 | N/A | p-Pb data | CMSSW_7_5_8_patch3 |
* The Pb-Pb simulation linked in these rows was produced later and should be analyzed using CMSSW_5_3_20.
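The footnote means that, for Pb-Pb, the release depends on whether you are reading collision data or simulation. A minimal sketch of that rule, with hypothetical names (`hi_release`, `HI_DATA_RELEASES`), could look as follows. It assumes that, apart from the Pb-Pb case covered by the footnote, the release listed in a row also applies to that row's simulation, and that rows marked "N/A" have no simulation.

```python
# Hypothetical lookup built from the heavy-ion table above.
# (collision system, year) -> CMSSW release listed for the collision data
HI_DATA_RELEASES = {
    ("Pb-Pb", 2010): "CMSSW_3_9_2_patch5",
    ("Pb-Pb", 2011): "CMSSW_4_4_7",
    ("p-p", 2011): "CMSSW_4_4_7",
    ("p-p", 2013): "CMSSW_5_3_20",
    ("p-Pb", 2013): "CMSSW_5_3_20",
    ("p-p", 2015): "CMSSW_7_5_8_patch3",
}

# Footnote: the 2010-2011 Pb-Pb simulation was produced later with this release.
PB_PB_SIMULATION_RELEASE = "CMSSW_5_3_20"

# Rows whose "Simulation" column is N/A.
NO_SIMULATION = {("p-p", 2011), ("p-p", 2015)}


def hi_release(system: str, year: int, simulation: bool = False) -> str:
    """Return the CMSSW release for heavy-ion program data or simulation."""
    key = (system, year)
    if key not in HI_DATA_RELEASES:
        raise KeyError(f"No {system} data listed for {year}")
    if simulation:
        if key in NO_SIMULATION:
            raise ValueError(f"No simulation listed for {system} {year}")
        if system == "Pb-Pb":
            return PB_PB_SIMULATION_RELEASE
    return HI_DATA_RELEASES[key]


print(hi_release("Pb-Pb", 2010))                   # CMSSW_3_9_2_patch5
print(hi_release("Pb-Pb", 2010, simulation=True))  # CMSSW_5_3_20
```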
The Pb-Pb collisions from 2010 and 2011 are accompanied by "reference" proton-proton collisions at the same energy, collected during 2011 and 2013. The p-Pb collisions from 2013 are accompanied by reference proton-proton collisions collected during 2015. Simulation has also been released for the heavy-ion collisions and for some of the reference collision data.
Visualizing CMS events is a very helpful way to get acquainted with the CMS detector and the features of different datasets. Software is provided to produce event display files from the Run 1 datasets, and many events are already available in this format for viewing on the web. In the web-based event display, select:
Open File → Open Files from Web → choose a year, and then choose a dataset.
Example analyses demonstrate how analysts can process CMS data files to accomplish a real physics goal. Examples range from data-validation exercises to full searches. The "Getting Started" pages linked in the tables above all offer links to example analyses.
I want to find out how to use the trigger and trigger prescale information in the dataset I am interested in.
I want to find out how to access the luminosity information for the dataset I am interested in and how to select "good data" only.
I want to find the luminosity of my dataset, possibly constrained by using specific triggers.
I want to find out whether I need conditions database information, and if so, how to access it.
How do I interpret the simulated dataset names?
I want to find the generator cross section of a particular simulation.
I want to find the effective luminosity of my simulated dataset.
I want information that is not documented here or elsewhere on the CERN Open Data portal.
I ran into a problem and need help!
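Two of the questions above, selecting "good data" and finding the effective luminosity of a simulated dataset, come down to small calculations. The sketch below assumes the validated-runs JSON format used with CMS data, which maps run numbers (as strings) to lists of inclusive [first, last] luminosity-section ranges, and it uses the relation L_eff = N / σ for a simulated sample of N events with cross section σ. If the generator applied a filter, the filtered cross section (σ times the filter efficiency) should be used instead. The function names and all numbers, including the run number, are illustrative only.

```python
import json


def load_certified_lumis(path):
    """Read a validated-runs JSON file: {"run": [[first_lumi, last_lumi], ...]}."""
    with open(path) as f:
        return json.load(f)


def is_good_lumi(certified, run, lumi):
    """True if (run, lumi) falls inside one of the certified inclusive ranges."""
    ranges = certified.get(str(run), [])
    return any(first <= lumi <= last for first, last in ranges)


def effective_luminosity(n_generated, cross_section_pb):
    """Effective integrated luminosity (pb^-1) of a simulated sample: N / sigma."""
    return n_generated / cross_section_pb


def event_weight(cross_section_pb, n_generated, data_lumi_pb):
    """Per-event weight scaling the simulation to the data luminosity: sigma * L / N."""
    return data_lumi_pb / effective_luminosity(n_generated, cross_section_pb)


# Illustrative certified ranges for a made-up run number.
certified = json.loads('{"100001": [[1, 50], [60, 120]]}')
print(is_good_lumi(certified, 100001, 60))         # True
print(is_good_lumi(certified, 100001, 55))         # False
print(effective_luminosity(1_000_000, 50.0))       # 20000.0 (pb^-1, i.e. 20 fb^-1)
print(event_weight(50.0, 1_000_000, 10_000.0))     # 0.5
```

In this example the simulated sample corresponds to twice the luminosity of the data, so each simulated event gets a weight of 0.5.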