How Can I Improve Nested For Loops in Python for Combining Cut Expressions?

ChrisVer · Mar 10, 2017

Hello, I'm not sure how to search for this online, so I'll ask here:

Suppose I have several groups of cuts (bins). I list three here, but the idea is that the number of groups should be tunable by the user:

Python:

cut1 = { "p10" : "p <= 10" ,
         "11p20" : "10<p && p<=20" ,
         "21p" : "p>20" ,
         }

cut2 = { "BoolPass" : "bool" ,
         "BoolFail" : "!bool" ,
         }
cut3 = { "High" : "T>=0" ,
         "Low" :  "T<0",
         }

and I want to combine all of them together

Python:

cut_comb = dict()
for cutname1, c1 in cut1.items():
    cut_comb[cutname1] = dict()
    for cutname2, c2 in cut2.items():
        cut_comb[cutname1][cutname2] = dict()
        for cutname3, c3 in cut3.items():
            # loop variables are named c1, c2, c3 so the cut1/cut2/cut3 dicts are not overwritten
            cut_comb[cutname1][cutname2][cutname3] = add_cuts( [c1,c2,c3] , name="{}_{}_{}".format(cutname1,cutname2,cutname3) )

where add_cuts is just a function that loops over the cut expressions and concatenates them appropriately, e.g.:
cut_comb["p10"]["BoolPass"]["High"] would be: "(p<=10)&&(bool)&&(T>=0)"
How could I make this ugly for-in-for-in-for loop look better, and also be able to change the cuts (e.g. remove cut3) without having to rewrite the code? I was thinking about using recursion on a list that contains [cut1,cut2,cut3], but I get stuck in the logic somewhere, in particular with the growing nesting depth of the cut_comb dict. Maybe I could change the keys instead: rather than having 3 nested keys, have a single key built from the concatenated names, e.g.:
cut_comb["p10_BoolPass_High"] would be: "(p<=10)&&(bool)&&(T>=0)"
But still I'm not sure.
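
The single-key idea at the end of the post can be sketched with itertools.product, which iterates over every combination of keys for any number of dictionaries. This is only a sketch: join_cuts is a hypothetical stand-in for add_cuts (which the post does not show), and it assumes the values are plain strings.

```python
from itertools import product

# The three example groups from the post.
cut1 = {"p10": "p <= 10", "11p20": "10 < p && p <= 20", "21p": "p > 20"}
cut2 = {"BoolPass": "bool", "BoolFail": "!bool"}
cut3 = {"High": "T >= 0", "Low": "T < 0"}


def join_cuts(expressions):
    """Hypothetical stand-in for add_cuts: parenthesise and AND the expressions."""
    return "&&".join("({})".format(e) for e in expressions)


def combine(groups):
    """Return {"name1_name2_...": combined expression} for every combination."""
    combined = {}
    # Iterating a dict yields its keys, so product(*groups) yields one key per group.
    for names in product(*groups):
        expressions = [group[name] for group, name in zip(groups, names)]
        combined["_".join(names)] = join_cuts(expressions)
    return combined


cut_comb = combine([cut1, cut2, cut3])
print(cut_comb["p10_BoolPass_High"])
```

Dropping cut3 then only means calling combine([cut1, cut2]); no loop has to be rewritten.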

Mark44 · Mar 10, 2017

It's not clear to me what you're trying to do, especially the creation of a dictionary that you would have to drill into so deeply. Can you provide an example of a scenario that would require what you're trying to do?

ChrisVer · Mar 10, 2017

For example, suppose you want to plot the efficiencies in different detector regions (e.g. in eta), so you need a cut that separates the detector into [itex]|\eta|< X[/itex] and [itex]|\eta| >X[/itex].
Then you might also want to look at different pT ranges, so you will have the cuts [itex]p_T < Y[/itex] , [itex]Y<p_T < Z[/itex] , [itex]p_T>Z[/itex]...
This already gives 2*3 = 6 efficiency plots, namely [itex]h(|\eta|<X , p_T < Y) ~,~h(|\eta|>X, p_T < Y)~,~... [/itex], which (in my case) are obtained by applying the cuts to my ntuples.
In a similar manner you may want to keep adding or removing cuts (that's why the number of for loops in the piece of code I wrote should be variable).

The histograms are obtained mainly with the TTree::Draw() method, where you specify which variable to take from the tree (e.g. to calculate the efficiency vs. phi), the pre-set histogram it is streamed into, and the cut selection, e.g.:

Python:

mytree.Draw( "jet_phi>>hpass", "selection"+"&&pass" )
mytree.Draw( "jet_phi>>htotal", "selection")
h_eff = some_method_to_get_eff_from(hpass,htotal)

The selection will of course be different every time (it is specified by my cut_comb).

One thing that may be unclear: the values of my cut_i dictionaries are not strings but objects of some type, call it Cut (pretty similar to TCut). That's why I said that add_cuts deals with them appropriately.

Mark44 · Mar 10, 2017

ChrisVer said:

One maybe unclear thing is that the values of my cut_i dictionaries are not strings

They sure look like strings to me.

Python:

cut2 = { "BoolPass" : "bool" ,
        "BoolFail" : "!bool" ,
        }

The first key is "BoolPass", a string, and its value is "bool", another string. So cut2["BoolPass"] evaluates to "bool".

It will take me some time to digest the rest of your post.

ChrisVer · Mar 10, 2017

Mark44 said:

The first key is "BoolPass", a string, and its value is "bool", another string. So cut2["BoolPass"] evaluates to "bool".

Well, the full glory one is:

Python:

fout= ROOT.TFile("Save_Eff_Histos.root","RECREATE")

#list of cuts
cut_lowTrkMultiplicity  = Cut("low_track_multi", "jet_n_tracks<3")
cut_highTrkMultiplicity= Cut("high_track_multi", "jet_n_tracks >=3")
cut_eta_endcap      = Cut("endcap", "abs(jet_eta)>1.5")
cut_eta_barrel        = Cut("barrel", "abs(jet_eta)<1.5")
cut_low_pt              = Cut("low_pt_less100", "jet_pt<100.")
cut_med_pt            = Cut("med_pt_100_300", "jet_pt>=100. && jet_pt<300.")
cut_high_pt            = Cut("high_pt_greater300", "jet_pt>=300.")
cut_passBjet           = Cut("bjet", "jet_b_tag==1")

#group them
cuts_pt = { "pt_high" : cut_high_pt ,
        "pt_med" : cut_med_pt ,
        "pt_low" : cut_low_pt ,
        }
cuts_eta = { "Barrel" :  cut_eta_barrel,
        "Endcap" : cut_eta_endcap  ,
        }
cuts_trackMultiplicity = { "High" :  cut_highTrkMultiplicity,
        "Low" : cut_lowTrkMultiplicity  ,
        }

cuts_comb = dict()
h_pass       = dict()
h_total       = dict()
gr_eff         = dict()
for eta_name, eta in cuts_eta.items(): # Barrel and Endcap
     cuts_comb[eta_name]=dict()
     h_pass[eta_name]=dict()
     h_total[eta_name]=dict()
     gr_eff[eta_name] =dict()
     for ntracks_name, ntracks in cuts_trackMultiplicity.items(): #Low and High Ntracks
           cuts_comb[eta_name][ntracks_name]=dict()
           h_pass[eta_name][ntracks_name]=dict()
           h_total[eta_name][ntracks_name]=dict()
           gr_eff[eta_name][ntracks_name]=dict()
           for ptRange_name, ptRange in cuts_pt.items(): # Low,Med or High Pt's

                  #obtain cuts to use
                  cuts_comb[eta_name][ntracks_name][ptRange_name] = add_cuts( [eta,ntracks,ptRange] , "{}_{}_{}".format(eta_name, ntracks_name, ptRange_name) )
                  cuts_total = cuts_comb[eta_name][ntracks_name][ptRange_name]
                  cuts_pass = add_cuts( [cuts_total , cut_passBjet] , cuts_total.name + "_pass" )

                  #get histograms
                  h_pass[eta_name][ntracks_name][ptRange_name] = mysame.getHistogram(var="jet_phi", cuts=cuts_pass, name="h_"+cuts_pass.name )
                  h_total[eta_name][ntracks_name][ptRange_name] = mysame.getHistogram(var="jet_phi", cuts=cuts_total, name="h_total_"+cuts_total.name )
                  #get efficiency graph
                  gr_eff[eta_name][ntracks_name][ptRange_name] =  ROOT.TGraphAsymmErrors( h_pass[eta_name][ntracks_name][ptRange_name] , h_total[eta_name][ntracks_name][ptRange_name], "cl=0.683 b(1,1) mode" )
                  gr_eff[eta_name][ntracks_name][ptRange_name].SetTitle("h_eff_"+cuts_total.name)

                  #save them in an output file
                  fout.WriteTObject( gr_eff[eta_name][ntracks_name][ptRange_name] )

Now suppose I want to add or remove one of my binnings. For example, if pt turns out to have no effect, I would remove it, which means removing a for loop and changing everything in the remaining ones accordingly. Or the number of vertices might affect the efficiency, so I would add bins in Nvertex, which means adding an extra for loop. That makes it tiring, error-prone and inefficient.
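
One way to make the binnings in the code above configurable is to keep them in a list and store everything in flat dictionaries keyed by the tuple of bin names. The sketch below runs without ROOT. Its Cut class and add_cuts function are minimal stand-ins written for illustration (the real ones are not shown in the thread), and the histogram and graph calls are left as a comment.

```python
from itertools import product


class Cut:
    """Minimal stand-in for the thread's Cut type: a name plus an expression."""

    def __init__(self, name, expression):
        self.name = name
        self.expression = expression


def add_cuts(cuts, name):
    """Stand-in for add_cuts: AND the parenthesised expressions together."""
    expression = " && ".join("({})".format(c.expression) for c in cuts)
    return Cut(name, expression)


def combine_groups(groups):
    """Map each tuple of bin names to the combined Cut for that bin."""
    combined = {}
    for names in product(*groups):
        cuts = [group[name] for group, name in zip(groups, names)]
        combined[names] = add_cuts(cuts, "_".join(names))
    return combined


cuts_eta = {"Barrel": Cut("barrel", "abs(jet_eta)<1.5"),
            "Endcap": Cut("endcap", "abs(jet_eta)>1.5")}
cuts_pt = {"pt_low": Cut("low_pt_less100", "jet_pt<100."),
           "pt_med": Cut("med_pt_100_300", "jet_pt>=100. && jet_pt<300."),
           "pt_high": Cut("high_pt_greater300", "jet_pt>=300.")}
cut_passBjet = Cut("bjet", "jet_b_tag==1")

# Adding or dropping a binning only changes this list.
groups = [cuts_eta, cuts_pt]

cuts_total = combine_groups(groups)
cuts_pass = {names: add_cuts([total, cut_passBjet], total.name + "_pass")
             for names, total in cuts_total.items()}

for names, total in cuts_total.items():
    # The histogram and efficiency-graph calls from the post would go here,
    # storing results in flat dicts such as h_pass[names] and gr_eff[names].
    print(names, "->", total.expression, "|", cuts_pass[names].name)
```

With tuple keys, a lookup such as cuts_total[("Barrel", "pt_low")] replaces the nested cuts_comb[...][...] access, and none of the per-level dict() initialisation is needed.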

wle · Mar 10, 2017

I didn't follow all the details, but does this do something close to what you want?

Python:

from itertools import product
from functools import reduce

def tree_set(tree, indices, value):
    """Set 'tree[i0][i1]...' to 'value' for i0, i1, ... in 'indices'."""
    deepest = reduce(lambda t, i: t[i], indices[:-1], tree)
    deepest[indices[-1]] = value

def build_tree(concatenator, *cuts):
    """Build tree of cuts using supplied concatenator function."""
    tree = {}

    for cuts_head in (cuts[:n] for n in range(1, len(cuts))):
        for keys in product(*cuts_head):
            tree_set(tree, keys, {})

    for keys in product(*cuts):
        item = concatenator(cut[key] for cut, key in zip(cuts, keys))
        tree_set(tree, keys, item)

    return tree


# Example:

def concat_items(items):
    """Surround items in () and concatenate them separated by ' && '."""
    return ' && '.join('({})'.format(i) for i in items)

cut1 = {"p10": "p <= 10",
        "11p20": "10 < p && p <= 20",
        "21p": "p > 20"}

cut2 = {"BoolPass": "bool",
        "BoolFail": "!bool"}

cut3 = {"High": "T >= 0",
        "Low":  "T < 0"}

cut_tree = build_tree(concat_items, cut1, cut2, cut3)

print(cut_tree["p10"]["BoolPass"]["High"])

# ...this prints "(p <= 10) && (bool) && (T >= 0)".

This way there's nothing special in there being three "cuts" dictionaries in the example.

EDIT: Put the tree-building stuff in a function that accepts a concatenator function and an arbitrary number of "cuts" dictionaries.

EDIT 2: It's also possible to build up the tree recursively, as has been mentioned a couple of times, e.g. something like:

Python:

def build_helper(items, concatenator, cuts):
    if len(cuts) == 0:
        return concatenator(items)
    else:
        tree = {}

        for k, v in cuts[0].items():
            tree[k] = build_helper(items + [v], concatenator, cuts[1:])

        return tree

def build_tree(concatenator, *cuts):
    """Build tree of cuts using supplied concatenator function."""
    return build_helper([], concatenator, cuts)

In this case the two imports aren't necessary. You also need to accumulate the values you've found as you go deeper into the recursion, as @SlowThinker mentioned.

I don't know if one method is significantly better than the other.

SlowThinker · Mar 10, 2017

First try to make all levels look equal, i.e. instead of doing nothing on higher levels and accessing cut_comb[cut1][cut2][cut3] at the lowest level, make a temporary variable to hold dict1=cut_comb[cut1], another to hold dict2=dict1[cut2] etc. Same with the strings, don't do all of the concatenation at the bottom level.
Once the structure is more regular, it should be easier to transform it into a recursive function.
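
The two steps described above might look like the following sketch, which assumes plain string cuts and '&&' concatenation; the names dict1, partial1 and build are illustrative only. The first part is the regularised triple loop, where each level keeps its own sub-dictionary and partial expression. The second part is the recursive function that the regular structure suggests.

```python
cut1 = {"p10": "p <= 10", "11p20": "10 < p && p <= 20", "21p": "p > 20"}
cut2 = {"BoolPass": "bool", "BoolFail": "!bool"}
cut3 = {"High": "T >= 0", "Low": "T < 0"}

# Step 1: every level does the same kind of work on its own temporaries.
cut_comb = {}
for name1, expr1 in cut1.items():
    dict1 = cut_comb[name1] = {}
    partial1 = "({})".format(expr1)
    for name2, expr2 in cut2.items():
        dict2 = dict1[name2] = {}
        partial2 = partial1 + "&&({})".format(expr2)
        for name3, expr3 in cut3.items():
            # Deepest level: store the finished expression instead of a dict.
            dict2[name3] = partial2 + "&&({})".format(expr3)


# Step 2: the same pattern written once, for any number of groups.
def build(groups, partial=""):
    """Return nested dicts of combined expressions for a list of cut dicts."""
    if not groups:
        return partial  # no groups left: the accumulated expression is the leaf
    prefix = partial + "&&" if partial else ""
    return {name: build(groups[1:], prefix + "({})".format(expr))
            for name, expr in groups[0].items()}


print(build([cut1, cut2, cut3]) == cut_comb)
print(cut_comb["p10"]["BoolPass"]["High"])
```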
