Python How Can I Improve Nested For Loops in Python for Combining Cut Expressions?

ChrisVer · Mar 10, 2017

Hello. I'm not sure how to search for this online, so I'm raising the question here:

Suppose that I have several bins of, let's say, cuts. Here I list three, but the main idea is to make their number tunable by the user:

Python:

cut1 = {"p10": "p <= 10",
        "11p20": "10<p && p<=20",
        "21p": "p>20",
        }

cut2 = {"BoolPass": "bool",
        "BoolFail": "!bool",
        }

cut3 = {"High": "T>=0",
        "Low": "T<0",
        }

And I want to combine all of them:

Python:

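cut_comb = dict()
# Loop variables get their own names: reusing cut1/cut2/cut3 would rebind
# the dicts to strings, and the next .items() call would raise AttributeError.
for cutname1, expr1 in cut1.items():
    cut_comb[cutname1] = dict()
    for cutname2, expr2 in cut2.items():
        cut_comb[cutname1][cutname2] = dict()
        for cutname3, expr3 in cut3.items():
            cut_comb[cutname1][cutname2][cutname3] = add_cuts(
                [expr1, expr2, expr3],
                name="{}_{}_{}".format(cutname1, cutname2, cutname3))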

Here add_cuts is just a function that loops over the cut expressions and concatenates them appropriately, e.g.:
cut_comb["p10"]["BoolPass"]["High"] would be "(p<=10)&&(bool)&&(T>=0)".
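The body of add_cuts is not shown in the post. For plain-string dictionaries like the ones above, a minimal sketch of what it might do could look like this. It rests on two assumptions: each expression keeps its own spacing, and the name argument is ignored because a plain string has nowhere to store it.

```python
def add_cuts(cuts, name=None):
    """Hypothetical sketch: wrap each cut string in parentheses and AND them.

    `name` is accepted only to match the call in the loop; plain strings
    cannot carry a name, so it is ignored here.
    """
    return "&&".join("({})".format(c) for c in cuts)

print(add_cuts(["p <= 10", "bool", "T>=0"], name="p10_BoolPass_High"))
# (p <= 10)&&(bool)&&(T>=0)
```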
How could I make this ugly for-in-for-in-for loop look better, and also be able to change the cuts (e.g. remove cut3) without rewriting the code? I was thinking about using recursion on a list that contains [cut1, cut2, cut3], but I get stuck in the logic somewhere, in particular with the growing nesting of the cut_comb dict. Maybe I could change the key itself: instead of three keys, have a single key made by concatenating the names, e.g.:
cut_comb["p10_BoolPass_High"] would be "(p<=10)&&(bool)&&(T>=0)".
But still I'm not sure.
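The single-key idea fits well with itertools.product, which stands in for any number of nested loops. Below is a minimal sketch. It assumes plain-string cut values; with Cut objects, the "&&".join line would become a call to add_cuts. The helper name combine_flat is hypothetical, and it needs at least one dict in the list.

```python
from itertools import product

def combine_flat(cut_dicts, sep="_"):
    """Return {"name1_name2_...": "(expr1)&&(expr2)&&..."} for every combination.

    `cut_dicts` is a non-empty list of {name: expression} dicts; its length
    is free, so adding or removing a cut group only means editing that list.
    """
    combined = {}
    # Each combo holds one (name, expression) pair per group.
    for combo in product(*(d.items() for d in cut_dicts)):
        names, exprs = zip(*combo)
        combined[sep.join(names)] = "&&".join("({})".format(e) for e in exprs)
    return combined

cut1 = {"p10": "p <= 10", "11p20": "10<p && p<=20", "21p": "p>20"}
cut2 = {"BoolPass": "bool", "BoolFail": "!bool"}
cut3 = {"High": "T>=0", "Low": "T<0"}

cut_comb = combine_flat([cut1, cut2, cut3])
print(len(cut_comb))                  # 12 = 3 * 2 * 2
print(cut_comb["p10_BoolPass_High"])  # (p <= 10)&&(bool)&&(T>=0)
```

Removing cut3 then becomes combine_flat([cut1, cut2]), with no loop to delete. If nested-style lookups are still wanted, the key could be the tuple of names instead of a joined string.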

Mark44 · Mar 10, 2017

It's not clear to me what you're trying to do, especially the creation of a dictionary that you would have to drill into so deeply. Can you provide an example of a scenario that would require what you're trying to do?

ChrisVer · Mar 10, 2017

For example, suppose you want to draw the efficiencies in different detector regions (e.g. in $\eta$), so you need a cut that separates the detector into $|\eta| < X$ and $|\eta| > X$.
Then you might also want to look at different $p_T$ ranges, so you will have the cuts $p_T < Y$, $Y < p_T < Z$ and $p_T > Z$.
This already means $2 \times 3 = 6$ efficiency plots, namely $h(|\eta|<X,\ p_T<Y)$, $h(|\eta|>X,\ p_T<Y)$, ..., which (in my case) are obtained by applying the cuts to my ntuples.
In a similar manner you may want to keep adding or removing cuts, which is why the number of for loops in the code I wrote should be a variable.
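The number of plots grows multiplicatively with the number of cut groups, and that is why the nesting depth cannot be hard-coded. With $k$ groups, where group $i$ has $n_i$ bins:

$$
N_{\text{plots}} = \prod_{i=1}^{k} n_i, \qquad \text{e.g.}\quad N = \underbrace{2}_{\eta} \times \underbrace{3}_{p_T} = 6 .
$$

Adding a two-bin track-multiplicity group gives $2 \times 3 \times 2 = 12$, which matches the 12 combinations in the full script later in the thread.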

The histograms are obtained mainly with the TTree::Draw() method. You pass it the variable to take from the tree (e.g. jet_phi, if you want the efficiency vs. $\phi$), the pre-set histogram it is streamed into, and the cut selection. For example:

Python:

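# `selection` is the combined cut string for this bin (from cut_comb)
mytree.Draw("jet_phi>>hpass", selection + " && pass")
mytree.Draw("jet_phi>>htotal", selection)
hpass, htotal = ROOT.gDirectory.Get("hpass"), ROOT.gDirectory.Get("htotal")
h_eff = ROOT.TGraphAsymmErrors(hpass, htotal)  # pass/total efficiency graph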

The selection will of course be different every time (specified by my cut_comb).

One possibly unclear point: the values of my cut_i dictionaries are not strings but objects of some type, call it Cut (pretty similar to ROOT's TCut). That's why I said that add_cuts deals with them appropriately.
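The Cut class and add_cuts are not shown, so the following is only a guess at their shape. This minimal sketch assumes a Cut holds a name and an expression string, and that add_cuts ANDs the expressions and names the result. The class and function names mirror the post, but the fields name and expression are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cut:
    """Hypothetical stand-in for the poster's Cut class (TCut-like)."""
    name: str
    expression: str

def add_cuts(cuts, name):
    """Combine Cut objects into one Cut whose expression ANDs all inputs."""
    expression = " && ".join("({})".format(c.expression) for c in cuts)
    return Cut(name, expression)

barrel = Cut("barrel", "abs(jet_eta)<1.5")
low_pt = Cut("low_pt_less100", "jet_pt<100.")
bjet = Cut("bjet", "jet_b_tag==1")

total = add_cuts([barrel, low_pt], "Barrel_pt_low")
passed = add_cuts([total, bjet], total.name + "_pass")
print(passed.name)        # Barrel_pt_low_pass
print(passed.expression)  # ((abs(jet_eta)<1.5) && (jet_pt<100.)) && (jet_b_tag==1)
```

Because add_cuts returns another Cut, combined cuts can be combined again, as the "_pass" selection above shows.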

Mark44 · Mar 10, 2017

ChrisVer said:

One maybe unclear thing is that the values of my cut_i dictionaries are not strings

They sure look like strings to me.

Python:

cut2 = { "BoolPass" : "bool" ,
        "BoolFail" : "!bool" ,
        }

The first key is "BoolPass", a string, and its value is "bool", another string. So cut2["BoolPass"] evaluates to "bool".

It will take me some time to digest the rest of your post.

ChrisVer · Mar 10, 2017

Mark44 said:

The first key is "BoolPass", a string, and its value is "bool", another string. So cut2["BoolPass"] evaluates to "bool".

Well, the full-glory version is:

Python:

import ROOT

# Cut, add_cuts and mysame are defined elsewhere in the analysis code (not shown)
fout = ROOT.TFile("Save_Eff_Histos.root", "RECREATE")

# list of cuts
cut_lowTrkMultiplicity  = Cut("low_track_multi", "jet_n_tracks<3")
cut_highTrkMultiplicity = Cut("high_track_multi", "jet_n_tracks>=3")
cut_eta_endcap          = Cut("endcap", "abs(jet_eta)>1.5")
cut_eta_barrel          = Cut("barrel", "abs(jet_eta)<1.5")
cut_low_pt              = Cut("low_pt_less100", "jet_pt<100.")
cut_med_pt              = Cut("med_pt_100_300", "jet_pt>=100. && jet_pt<300.")
cut_high_pt             = Cut("high_pt_greater300", "jet_pt>=300.")
cut_passBjet            = Cut("bjet", "jet_b_tag==1")

# group them
cuts_pt = {"pt_high": cut_high_pt,
           "pt_med": cut_med_pt,
           "pt_low": cut_low_pt,
           }
cuts_eta = {"Barrel": cut_eta_barrel,
            "Endcap": cut_eta_endcap,
            }
cuts_trackMultiplicity = {"High": cut_highTrkMultiplicity,
                          "Low": cut_lowTrkMultiplicity,
                          }

cuts_comb = dict()
h_pass = dict()
h_total = dict()
gr_eff = dict()
for eta_name, eta in cuts_eta.items():  # Barrel and Endcap
    cuts_comb[eta_name] = dict()
    h_pass[eta_name] = dict()
    h_total[eta_name] = dict()
    gr_eff[eta_name] = dict()
    for ntracks_name, ntracks in cuts_trackMultiplicity.items():  # Low and High Ntracks
        cuts_comb[eta_name][ntracks_name] = dict()
        h_pass[eta_name][ntracks_name] = dict()
        h_total[eta_name][ntracks_name] = dict()
        gr_eff[eta_name][ntracks_name] = dict()
        for ptRange_name, ptRange in cuts_pt.items():  # Low, Med or High pT
            # obtain cuts to use
            cuts_total = add_cuts([eta, ntracks, ptRange],
                                  "{}_{}_{}".format(eta_name, ntracks_name, ptRange_name))
            cuts_comb[eta_name][ntracks_name][ptRange_name] = cuts_total
            cuts_pass = add_cuts([cuts_total, cut_passBjet], cuts_total.name + "_pass")

            # get histograms
            h_pass[eta_name][ntracks_name][ptRange_name] = mysame.getHistogram(
                var="jet_phi", cuts=cuts_pass, name="h_" + cuts_pass.name)
            h_total[eta_name][ntracks_name][ptRange_name] = mysame.getHistogram(
                var="jet_phi", cuts=cuts_total, name="h_total_" + cuts_total.name)

            # get efficiency graph
            gr = ROOT.TGraphAsymmErrors(h_pass[eta_name][ntracks_name][ptRange_name],
                                        h_total[eta_name][ntracks_name][ptRange_name],
                                        "cl=0.683 b(1,1) mode")
            gr.SetTitle("h_eff_" + cuts_total.name)
            gr_eff[eta_name][ntracks_name][ptRange_name] = gr

            # save them in an output file
            fout.WriteTObject(gr)

fout.Close()

On top of that, suppose I want to add or remove one of my separations. For example, if pt has no effect, it is better to remove it, which means removing a for loop and adjusting everything in the remaining ones. Or the number of vertices might affect the efficiency, so I would add bins of Nvertex, which means adding an extra for loop. Either way it is tiring, error-prone and inefficient.
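One way to make adding or removing a separation a one-line change is to keep the cut groups in a list and put the per-bin work in a single function. This is a minimal sketch under several assumptions. The cuts are plain strings rather than Cut objects. Results are keyed by a tuple of bin names. The functions run_all_bins and process_bin are hypothetical, and the ROOT calls are left as a comment rather than reproduced.

```python
from itertools import product

def run_all_bins(groups, process_bin):
    """Call process_bin(name, selection) once per combination of bins.

    `groups` is a list of {bin_name: selection_string} dicts; results are
    stored under a tuple key such as ("Barrel", "High", "pt_low").
    """
    results = {}
    for combo in product(*(g.items() for g in groups)):
        names = tuple(name for name, _ in combo)
        selection = " && ".join("({})".format(sel) for _, sel in combo)
        results[names] = process_bin("_".join(names), selection)
    return results

def process_bin(name, selection):
    # Hypothetical stand-in: the real script would make the pass/total
    # histograms, build the TGraphAsymmErrors and write it to fout here.
    return selection

cuts_eta = {"Barrel": "abs(jet_eta)<1.5", "Endcap": "abs(jet_eta)>1.5"}
cuts_ntrk = {"High": "jet_n_tracks>=3", "Low": "jet_n_tracks<3"}
cuts_pt = {"pt_low": "jet_pt<100.", "pt_med": "jet_pt>=100. && jet_pt<300.",
           "pt_high": "jet_pt>=300."}

# Dropping pt means removing cuts_pt from this list; an Nvertex group is appended.
gr_eff = run_all_bins([cuts_eta, cuts_ntrk, cuts_pt], process_bin)
print(len(gr_eff))                           # 12
print(gr_eff[("Barrel", "Low", "pt_high")])  # (abs(jet_eta)<1.5) && (jet_n_tracks<3) && (jet_pt>=300.)
```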

wle · Mar 10, 2017

I didn't follow all the details, but does this do something close to what you want?

Python:

from itertools import product
from functools import reduce

def tree_set(tree, indices, value):
    """Set 'tree[i0][i1]...' to 'value' for i0, i1, ... in 'indices'."""
    deepest = reduce(lambda t, i: t[i], indices[:-1], tree)
    deepest[indices[-1]] = value

def build_tree(concatenator, *cuts):
    """Build tree of cuts using supplied concatenator function."""
    tree = {}

    for cuts_head in (cuts[:n] for n in range(1, len(cuts))):
        for keys in product(*cuts_head):
            tree_set(tree, keys, {})

    for keys in product(*cuts):
        item = concatenator(cut[key] for cut, key in zip(cuts, keys))
        tree_set(tree, keys, item)

    return tree

# Example:

def concat_items(items):
    """Surround items in () and concatenate them separated by ' && '."""
    return ' && '.join('({})'.format(i) for i in items)

cut1 = {"p10": "p <= 10",
        "11p20": "10 < p && p <= 20",
        "21p": "p > 20"}

cut2 = {"BoolPass": "bool",
        "BoolFail": "!bool"}

cut3 = {"High": "T >= 0",
        "Low":  "T < 0"}

cut_tree = build_tree(concat_items, cut1, cut2, cut3)

print(cut_tree["p10"]["BoolPass"]["High"])

# ...this prints "(p <= 10) && (bool) && (T >= 0)".

This way, nothing depends on there being exactly three "cuts" dictionaries in the example.

EDIT: Put the tree-building stuff in a function that accepts a concatenator function and an arbitrary number of "cuts" dictionaries.

EDIT 2: It's also possible to build up the tree recursively, as has been mentioned a couple of times, e.g. something like:

Python:

def build_helper(items, concatenator, cuts):
    if len(cuts) == 0:
        return concatenator(items)
    else:
        tree = {}

        for k, v in cuts[0].items():
            tree[k] = build_helper(items + [v], concatenator, cuts[1:])

        return tree

def build_tree(concatenator, *cuts):
    """Build tree of cuts using supplied concatenator function."""
    return build_helper([], concatenator, cuts)

In this case the two imports aren't necessary. You also need to accumulate the values you've found as you go deeper into the recursion, as @SlowThinker mentioned.

I don't know if one method is significantly better than the other.
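A third variant, offered here as an alternative technique that was not proposed in the thread, skips the pass that pre-creates the intermediate levels. It does this with an "autovivifying" dict built on collections.defaultdict. This is a minimal sketch; the names nested_dict and build_tree_autoviv are hypothetical, and it needs at least one cuts dictionary.

```python
from collections import defaultdict
from itertools import product

def nested_dict():
    """A dict that creates missing sub-dicts on first access."""
    return defaultdict(nested_dict)

def build_tree_autoviv(concatenator, *cuts):
    """One-pass tree build: missing levels are created on the fly."""
    tree = nested_dict()
    for keys in product(*cuts):
        node = tree
        for key in keys[:-1]:
            node = node[key]  # creates the level if it does not exist yet
        node[keys[-1]] = concatenator(cut[key] for cut, key in zip(cuts, keys))
    return tree

def concat_items(items):
    """Surround items in () and concatenate them separated by ' && '."""
    return ' && '.join('({})'.format(i) for i in items)

cut1 = {"p10": "p <= 10", "11p20": "10 < p && p <= 20", "21p": "p > 20"}
cut2 = {"BoolPass": "bool", "BoolFail": "!bool"}
cut3 = {"High": "T >= 0", "Low": "T < 0"}

cut_tree = build_tree_autoviv(concat_items, cut1, cut2, cut3)
print(cut_tree["p10"]["BoolPass"]["High"])  # (p <= 10) && (bool) && (T >= 0)
```

The trade-off is that a mistyped key silently creates an empty branch instead of raising KeyError, so the explicit build_tree above is the stricter option.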

SlowThinker · Mar 10, 2017

First, try to make all levels look the same. Instead of doing nothing at the higher levels and accessing cut_comb[cut1][cut2][cut3] only at the lowest level, use a temporary variable dict1 = cut_comb[cut1], another one dict2 = dict1[cut2], etc. Same with the strings: don't do all of the concatenation at the bottom level.
Once the structure is more regular, it should be easier to transform it into a recursive function.
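Applied to the original three-level example with plain-string cuts, this advice might look like the sketch below. The names dict1, dict2, sel1, sel2 and combine are illustrative, not from the thread. First comes the regularized loop, where every level makes its container and extends the selection. Then comes the recursive function that replaces it once the levels all look alike.

```python
cut1 = {"p10": "p <= 10", "11p20": "10<p && p<=20", "21p": "p>20"}
cut2 = {"BoolPass": "bool", "BoolFail": "!bool"}
cut3 = {"High": "T>=0", "Low": "T<0"}

# Regularized loops: each level does the same two steps.
cut_comb = {}
for name1, expr1 in cut1.items():
    dict1 = cut_comb[name1] = {}          # level-1 container
    sel1 = "({})".format(expr1)           # selection built so far
    for name2, expr2 in cut2.items():
        dict2 = dict1[name2] = {}         # level-2 container
        sel2 = sel1 + "&&({})".format(expr2)
        for name3, expr3 in cut3.items():
            dict2[name3] = sel2 + "&&({})".format(expr3)

def combine(cut_dicts, selection=""):
    """Recursive form of the loops above: one call per nesting level."""
    first, rest = cut_dicts[0], cut_dicts[1:]
    level = {}
    for name, expr in first.items():
        sel = selection + ("&&" if selection else "") + "({})".format(expr)
        level[name] = combine(rest, sel) if rest else sel
    return level

print(combine([cut1, cut2, cut3])["p10"]["BoolPass"]["High"])  # (p <= 10)&&(bool)&&(T>=0)
```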
