CMSDAS Pre-Exercise 3: ROOT and python basics

Overview

Teaching: 0 min
Exercises: 60 min
Objectives
  • Learn how to use ROOT and python

  • Use ROOT and python to inspect a CMS NanoAOD file

Questions

For this lesson, please submit your answers using the CMSDAS@LPC2025 Google Form 3.

Python and ROOT

Python and ROOT are two of the most important software tools in HEP. If you have never used python before, we highly recommend you go through a tutorial, for example the HSF python lesson. If you are comfortable with basic python, feel free to proceed, as you will learn by doing in the following exercises.

For ROOT, please follow the lesson on the HEP Software Foundation website for ROOT, up through at least the fifth lesson, “05-tfile-read-write-ttrees.ipynb” (of course, you can keep going and learn about RDataFrames, but we won’t use them here). We recommend you click the “SWAN” button, which opens a session in CERN’s “Service for Web-based ANalysis.” From the service, you can open and run code through Jupyter notebooks, all inside the web browser.

Inspect a NanoAOD file with ROOT

Once you’re comfortable with python and ROOT, let’s go back to cmslpc and look at some real CMS data. Log in to the cluster again from your computer:

kinit <YourUsername>@FNAL.GOV
ssh -Y <YourUsername>@cmslpc-el8.fnal.gov

By default, your shell environment will not have ROOT, and will have a very old version of python. A quick way to set up ROOT is to use the LCG (LHC Computing Grid) releases. All you have to do is execute the following script (once per login):

source /cvmfs/sft.cern.ch/lcg/views/LCG_106a/x86_64-el8-gcc11-opt/setup.sh

First, let’s inspect a ROOT file using ROOT’s built-in C++ interpreter, Cling (the successor to CINT, which older ROOT documentation still refers to). Run the following to launch ROOT and simultaneously open a file (the same file we copied from EOS in the previous exercise).

cd ~/nobackup/cmsdas
root -l DYJetsToLL_M50_NANOAOD.root
# Note: the -l option stops ROOT from displaying its logo image, which is very slow over SSH

The ROOT prompt is a quick way to inspect files (to exit ROOT, just type .q). First, let’s see what’s in the file. Enter _file0->ls() into the interpreter:

root [1] _file0->ls()
TFile**		DYJetsToLL_M50_NANOAOD.root	
 TFile*		DYJetsToLL_M50_NANOAOD.root	
  KEY: TTree	Events;1	Events

The file contains just one interesting object, a TTree named Events. Next, let’s check the basic contents of Events. For the number of events in the TTree:

root [4] Events->GetEntries()
(long long) 10000

For the branches in Events, try:

root [6] Events->Print()
******************************************************************************
*Tree    :Events    : Events                                                 *
*Entries :    10000 : Total =        65055196 bytes  File  Size =   19389904 *
*        :          : Tree compression factor =   3.33                       *
******************************************************************************
*Br    0 :run       : run/i                                                  *
*Entries :    10000 : Total  Size=      40621 bytes  File Size  =        410 *
*Baskets :        2 : Basket Size=      29696 bytes  Compression=  97.91     *
*............................................................................*
*Br    1 :luminosityBlock : luminosityBlock/i                                *
*Entries :    10000 : Total  Size=      40693 bytes  File Size  =        493 *
*Baskets :        2 : Basket Size=      29696 bytes  Compression=  81.48     *
*............................................................................*
*Br    2 :event     : event/l                                                *
*Entries :    10000 : Total  Size=      80641 bytes  File Size  =      20690 *
*Baskets :        2 : Basket Size=      58880 bytes  Compression=   3.87     *
...
...
...
*............................................................................*
*Br 1626 :HLTriggerFinalPath : HLTriggerFinalPath/O                          *
*Entries :    10000 : Total  Size=      10705 bytes  File Size  =        280 *
*Baskets :        2 : Basket Size=       8192 bytes  Compression=  36.34     *
*............................................................................*
*Br 1627 :L1simulation_step : L1simulation_step/O                            *
*Entries :    10000 : Total  Size=      10699 bytes  File Size  =        279 *
*Baskets :        2 : Basket Size=       8192 bytes  Compression=  36.46     *
*............................................................................*

This prints out a super long list of branches, because NanoAOD assigns a branch for every trigger in CMS. You can filter the list to look at, e.g., only muon branches:

root [8] Events->Print("Muon*")
******************************************************************************
*Tree    :Events    : Events                                                 *
*Entries :    10000 : Total =        65055196 bytes  File  Size =   19389904 *
*        :          : Tree compression factor =   3.33                       *
******************************************************************************
*Br    0 :Muon_dxy  : Float_t dxy (with sign) wrt first PV, in cm            *
*Entries :    10000 : Total  Size=      67214 bytes  File Size  =      29269 *
*Baskets :        4 : Basket Size=      32000 bytes  Compression=   2.28     *
*............................................................................*
*Br    1 :Muon_dxyErr : Float_t dxy uncertainty, in cm                       *
*Entries :    10000 : Total  Size=      67222 bytes  File Size  =      21835 *
*Baskets :        4 : Basket Size=      32000 bytes  Compression=   3.05     *
...

Question 3.1

The method Long64_t TTree::GetEntries(const char *selection) accepts a selection string (note: written in C++ syntax) and returns the number of events passing the selection criteria. Use this method to get the number of events with at least two reconstructed muons:

root [0] Events->GetEntries("nMuon >= 2")

Write the number of events with at least 2 muon candidates in the Google form.

Plotting with pyROOT

You can also use ROOT in python, almost identically to the ROOT prompt except with python instead of C++. (This is possible because of pyROOT, a wrapper around ROOT that creates a nearly 1-to-1 map of all the C++ classes to python classes.) Make sure you’re logged into cmslpc, that you have called the LCG setup script in the session, and that you are in the directory containing the file (~/nobackup/cmsdas). Then, let’s reopen the NanoAOD file in python. Start a python interactive session by entering python3 (type ctrl-d or exit() to quit), then enter the following into the python interpreter:

import ROOT
f = ROOT.TFile("DYJetsToLL_M50_NANOAOD.root", "READ")
f.ls()

You should see the same contents of the file as before.
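The TTree methods used at the ROOT prompt earlier are also available on the python object. As a minimal sketch, the helper below calls GetEntries with and without a selection string and returns the fraction of entries that pass. The name selection_fraction is made up for this example and is not part of ROOT.

```python
def selection_fraction(tree, selection):
    """Return the fraction of entries in `tree` that pass `selection`.

    `tree` is expected to be a ROOT TTree (or any object with the same
    GetEntries interface); `selection` is a C++-style cut string.
    """
    total = tree.GetEntries()            # all entries in the tree
    passed = tree.GetEntries(selection)  # entries passing the cut
    if total == 0:
        return 0.0                       # avoid dividing by zero for an empty tree
    return passed / total
```

In the session above, you could call it as selection_fraction(f.Get("Events"), "nMuon >= 2").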

Long64_t TTree::Draw(const char * varexp, const char * selection, ...) is a fast way to make a plot from a TTree. The first argument, varexp, is what you want to plot (it accepts single branches as well as expressions; ROOT parses the string on the fly). The second argument, selection, allows you to filter entries before plotting; for array branches such as the GenPart collection, the selection is applied to each element of the array. Let’s use this function to plot the generator-level Z boson mass distribution, i.e., the mass of the truth-level Z bosons generated by MadGraph (as opposed to the particles reconstructed after the CMS detector simulation has been run). The branch GenPart_mass contains the masses of all generator-level particles. The branch GenPart_pdgId contains the so-called PDG ID of the generator-level particles; Z bosons are assigned a PDG ID of 23.

events = f.Get("Events")
c = ROOT.TCanvas()
events.Draw("GenPart_mass", "GenPart_pdgId==23")
c.Draw()

A plot of the Z boson mass should appear, with a mean value close to the Z boson mass of 91.1876 GeV. (If no plot appears, something is probably wrong with the X Window System connection that displays graphical windows over SSH. Following the cmslpc instructions, make sure you have an X server installed and running on your computer, and that you logged in using ssh -Y; ask on Mattermost for more help.)
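If the X connection cannot be fixed quickly, one possible workaround is to skip the graphical window and write the plot to an image file, which you can then copy to your own computer. The sketch below assumes ROOT has been set up with the LCG script. The function name save_tree_plot and the output file name in the usage note are invented for this example.

```python
def save_tree_plot(tree, varexp, selection, out_name):
    """Draw `varexp` from `tree` with `selection` and save the canvas to `out_name`."""
    import ROOT  # imported here so the function can be defined before ROOT is loaded

    ROOT.gROOT.SetBatch(True)   # batch mode: do not open any graphical windows
    canvas = ROOT.TCanvas()     # new canvas that receives the plot
    n_selected = tree.Draw(varexp, selection)  # Draw returns the number of selected rows
    canvas.SaveAs(out_name)     # the file format is chosen from the extension
    return n_selected
```

For the Z mass plot, this could be called as save_tree_plot(events, "GenPart_mass", "GenPart_pdgId==23", "zmass.png").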

Question 3.2: Z mass plot

The plot includes a “stat box” with basic information about the plotted histogram. Please fill in the mean of the distribution in the Google form.
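As a cross-check of the number read from the stat box, the mean can also be retrieved in code. This is a minimal sketch, not the only way to do it: TTree::Draw can fill a named histogram using the ">>" syntax, and the histogram can then be fetched from the current directory. The function name tree_mean and the histogram name h_tmp are invented for this example. The result should agree with the stat box up to the displayed rounding.

```python
def tree_mean(tree, varexp, selection, hist_name="h_tmp"):
    """Fill a named histogram from `tree` and return its mean."""
    import ROOT  # imported here so the function can be defined before ROOT is loaded

    # ">>name" sends the result to a named histogram; "goff" skips the graphics
    tree.Draw(f"{varexp}>>{hist_name}", selection, "goff")
    hist = ROOT.gDirectory.Get(hist_name)  # look the histogram up by name
    if not hist:
        raise RuntimeError(f"histogram {hist_name!r} was not created")
    return hist.GetMean()
```

With the events tree from above, the call would be tree_mean(events, "GenPart_mass", "GenPart_pdgId==23").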

Key Points

  • ROOT and python are two key software tools in HEP.

  • Many CMS analyses use the NanoAOD format; NanoAOD files are simple ROOT ntuples that can be analyzed with standalone ROOT or pyROOT.

  • There are numerous ways to use ROOT, including the built-in command line interface (based on Cling, a C++ interpreter), pyROOT, Jupyter notebooks, compiled C++, and more.