Python How Can I Improve Nested For Loops in Python for Combining Cut Expressions?

AI Thread Summary
The discussion revolves around improving the efficiency and readability of nested for loops in Python used for combining cut expressions. The user seeks a way to dynamically manage the number of cuts without rewriting code, suggesting recursion as a potential solution. A proposed method involves using a tree structure to organize the cuts, allowing for easier modification and concatenation of expressions. The conversation also touches on the implementation of a function that can build this tree structure and concatenate cut expressions effectively. Ultimately, the goal is to simplify the code while maintaining flexibility in managing cut combinations.
ChrisVer
Hello, I am not sure how to search for this online, so I am raising the question here:

Suppose that I have several groups of, let's say, cuts. Here I list 3, but the main idea is to make their number tunable by the user:
Python:
cut1 = {"p10": "p <= 10",
        "11p20": "10<p && p<=20",
        "21p": "p>20",
        }

cut2 = {"BoolPass": "bool",
        "BoolFail": "!bool",
        }
cut3 = {"High": "T>=0",
        "Low": "T<0",
        }

and I want to combine all of them together
Python:
cut_comb = dict()
# loop variables are named c1, c2, c3 so they do not rebind the cut1, cut2, cut3 dictionaries
for cutname1, c1 in cut1.items():
    cut_comb[cutname1] = dict()
    for cutname2, c2 in cut2.items():
        cut_comb[cutname1][cutname2] = dict()
        for cutname3, c3 in cut3.items():
            cut_comb[cutname1][cutname2][cutname3] = add_cuts([c1, c2, c3], name="{}_{}_{}".format(cutname1, cutname2, cutname3))

Here add_cuts is just a function which loops over the cut expressions and concatenates them appropriately, e.g.:
cut_comb["p10"]["BoolPass"]["High"] would be: "(p<=10)&&(bool)&&(T>=0)"
How could I make this ugly for-in-for-in-for loop look better [and also be able to change the cuts, e.g. remove cut3, without having to rewrite the code]? I was thinking about using recursion on a list that contains [cut1, cut2, cut3], but I get stuck in the logic somewhere, in particular with the ever-deeper nesting of the cut_comb dict. Maybe I could change the key instead: rather than having 3 keys, have only one built from the concatenated names, e.g.:
cut_comb["p10_BoolPass_High"] would be: "(p<=10)&&(bool)&&(T>=0)"
But still I'm not sure.
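A minimal sketch of the single-key idea, assuming the cut values are plain strings as in the example above. The helper name combine_flat and its sep parameter are hypothetical, not something from the thread; itertools.product takes the place of the nested loops, so the number of cut dictionaries is no longer fixed.

```python
from itertools import product


def combine_flat(*groups, sep="_"):
    """Return {joined_name: joined_expression} for every combination of cuts.

    Hypothetical helper: each argument is a dict mapping a cut name to a
    selection string.
    """
    combined = {}
    # product over the .items() of each group yields one (name, expr) pair per group
    for combo in product(*(group.items() for group in groups)):
        names = [name for name, _ in combo]
        exprs = [expr for _, expr in combo]
        combined[sep.join(names)] = "&&".join("({})".format(e) for e in exprs)
    return combined


cut1 = {"p10": "p<=10", "11p20": "10<p && p<=20", "21p": "p>20"}
cut2 = {"BoolPass": "bool", "BoolFail": "!bool"}
cut3 = {"High": "T>=0", "Low": "T<0"}

cut_comb = combine_flat(cut1, cut2, cut3)
print(cut_comb["p10_BoolPass_High"])  # (p<=10)&&(bool)&&(T>=0)

# Dropping cut3 only changes the call, not the function
print(len(combine_flat(cut1, cut2)))  # 6
```

The trade-off is that a flat dictionary cannot be sliced by one cut (say, everything under "p10") without parsing the key, which the nested version gives for free.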
 
It's not clear to me what you're trying to do, especially the creation of a dictionary that you would have to drill into so deeply. Can you provide an example of a scenario that would require what you're trying to do?
 
For example, suppose you want to plot the efficiencies in different detector regions (e.g. in eta), so you need a cut that separates the detector into $|\eta| < X$ and $|\eta| > X$.
Then you might also want to look at different $p_T$ ranges, so you will have the cuts $p_T < Y$, $Y < p_T < Z$, $p_T > Z$...
This already tells you that you will have 2*3 = 6 efficiency plots, namely $h(|\eta| < X, p_T < Y)$, $h(|\eta| > X, p_T < Y)$, ..., which are obtained (in my case) by applying the cuts to my ntuples.
In a similar manner you may want to keep adding or removing cuts (that's why the number of for loops in the piece of code I wrote should be variable).

The histograms are obtained mainly with the TTree::Draw() method, where you specify the variable you want to take from the tree (e.g. phi, if you want the efficiency vs. phi), streamed into a pre-set histogram, together with the cut selection, e.g.:
Python:
mytree.Draw( "jet_phi>>hpass", "selection"+"&&pass" )
mytree.Draw( "jet_phi>>htotal", "selection")
h_eff = some_method_to_get_eff_from(hpass,htotal)
The selection will of course be different every time (it is specified by my cut_comb).

One thing that may be unclear is that the values of my cut_i dictionaries are not strings but objects of some type, call it Cut (pretty similar to TCut); that's why I said that add_cuts deals with them appropriately.
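To make that concrete, here is a rough sketch of what such a type and add_cuts could look like. Both are hypothetical stand-ins written for illustration: the real Cut class is not shown in the thread, and the only assumptions are that it carries a name and a selection string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cut:
    """Hypothetical stand-in for the Cut type: a name plus a selection string."""
    name: str
    expr: str


def add_cuts(cuts, name):
    """AND together the expressions of several Cut objects into one new Cut."""
    expr = "&&".join("({})".format(c.expr) for c in cuts)
    return Cut(name, expr)


barrel = Cut("barrel", "abs(jet_eta)<1.5")
low_pt = Cut("low_pt_less100", "jet_pt<100.")

total = add_cuts([barrel, low_pt], "Barrel_pt_low")
passed = add_cuts([total, Cut("bjet", "jet_b_tag==1")], total.name + "_pass")

print(passed.name)  # Barrel_pt_low_pass
print(passed.expr)  # ((abs(jet_eta)<1.5)&&(jet_pt<100.))&&(jet_b_tag==1)
```

Because add_cuts returns another Cut, its result can be fed back in, which is what makes the "total" and "pass" selections easy to derive from each other.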
 
ChrisVer said:
One maybe unclear thing is that the values of my cut_i dictionaries are not strings
They sure look like strings to me.
Python:
cut2 = { "BoolPass" : "bool" ,
        "BoolFail" : "!bool" ,
        }
The first key is "BoolPass", a string, and its value is "bool", another string. So cut2["BoolPass"] evaluates to "bool".

It will take me some time to digest the rest of your post.
 
Mark44 said:
The first key is "BoolPass", a string, and its value is "bool", another string. So cut2["BoolPass"] evaluates to "bool".
Well, the version in its full glory is:
Python:
import ROOT

# Cut, add_cuts and mysame are defined elsewhere (not shown here)
fout = ROOT.TFile("Save_Eff_Histos.root", "RECREATE")

# list of cuts
cut_lowTrkMultiplicity = Cut("low_track_multi", "jet_n_tracks<3")
cut_highTrkMultiplicity = Cut("high_track_multi", "jet_n_tracks >=3")
cut_eta_endcap = Cut("endcap", "abs(jet_eta)>1.5")
cut_eta_barrel = Cut("barrel", "abs(jet_eta)<1.5")
cut_low_pt = Cut("low_pt_less100", "jet_pt<100.")
cut_med_pt = Cut("med_pt_100_300", "jet_pt>=100. && jet_pt<300.")
cut_high_pt = Cut("high_pt_greater300", "jet_pt>=300.")
cut_passBjet = Cut("bjet", "jet_b_tag==1")

# group them
cuts_pt = {"pt_high": cut_high_pt,
           "pt_med": cut_med_pt,
           "pt_low": cut_low_pt,
           }
cuts_eta = {"Barrel": cut_eta_barrel,
            "Endcap": cut_eta_endcap,
            }
cuts_trackMultiplicity = {"High": cut_highTrkMultiplicity,
                          "Low": cut_lowTrkMultiplicity,
                          }

cuts_comb = dict()
h_pass = dict()
h_total = dict()
gr_eff = dict()
for eta_name, eta in cuts_eta.items():  # Barrel and Endcap
    cuts_comb[eta_name] = dict()
    h_pass[eta_name] = dict()
    h_total[eta_name] = dict()
    gr_eff[eta_name] = dict()
    for ntracks_name, ntracks in cuts_trackMultiplicity.items():  # Low and High Ntracks
        cuts_comb[eta_name][ntracks_name] = dict()
        h_pass[eta_name][ntracks_name] = dict()
        h_total[eta_name][ntracks_name] = dict()
        gr_eff[eta_name][ntracks_name] = dict()
        for ptRange_name, ptRange in cuts_pt.items():  # Low, Med or High pt

            # obtain cuts to use
            cuts_comb[eta_name][ntracks_name][ptRange_name] = add_cuts([eta, ntracks, ptRange], "{}_{}_{}".format(eta_name, ntracks_name, ptRange_name))
            cuts_total = cuts_comb[eta_name][ntracks_name][ptRange_name]
            cuts_pass = add_cuts([cuts_total, cut_passBjet], cuts_total.name + "_pass")

            # get histograms
            h_pass[eta_name][ntracks_name][ptRange_name] = mysame.getHistogram(var="jet_phi", cuts=cuts_pass, name="h_" + cuts_pass.name)
            h_total[eta_name][ntracks_name][ptRange_name] = mysame.getHistogram(var="jet_phi", cuts=cuts_total, name="h_total_" + cuts_total.name)
            # get efficiency graph
            gr_eff[eta_name][ntracks_name][ptRange_name] = ROOT.TGraphAsymmErrors(h_pass[eta_name][ntracks_name][ptRange_name], h_total[eta_name][ntracks_name][ptRange_name], "cl=0.683 b(1,1) mode")
            gr_eff[eta_name][ntracks_name][ptRange_name].SetTitle("h_eff_" + cuts_total.name)

            # save them in an output file
            fout.WriteTObject(gr_eff[eta_name][ntracks_name][ptRange_name])

On top of that, suppose I want to add or remove one of my separations (e.g. pt does nothing, so better to remove it: drop a for loop and change everything in the remaining ones accordingly; or maybe Nvertices affects the efficiency: add Nvertex bins, i.e. an extra for loop). That makes it tiring, error-prone and inefficient.
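One possible way around this, sketched under the assumption that a tuple of names is an acceptable key: keep the groups in a list and store one record per combination, instead of four parallel nested dictionaries. The names run_all and process are hypothetical, plain strings stand in for the Cut objects, and the histogram and TGraphAsymmErrors steps are replaced by a placeholder so the sketch runs without ROOT.

```python
from itertools import product


def run_all(groups, process):
    """Call process(names, cuts) once per combination; key results by the name tuple."""
    results = {}
    for combo in product(*(group.items() for group in groups)):
        names = tuple(name for name, _ in combo)
        cuts = [cut for _, cut in combo]
        results[names] = process(names, cuts)
    return results


def process(names, cuts):
    """Placeholder for the per-combination work (histograms, efficiency graph)."""
    total = "&&".join("({})".format(c) for c in cuts)
    return {"name": "_".join(names),
            "total": total,
            "pass": total + "&&(jet_b_tag==1)"}


cuts_eta = {"Barrel": "abs(jet_eta)<1.5", "Endcap": "abs(jet_eta)>1.5"}
cuts_ntracks = {"High": "jet_n_tracks>=3", "Low": "jet_n_tracks<3"}
cuts_pt = {"pt_low": "jet_pt<100.",
           "pt_med": "jet_pt>=100. && jet_pt<300.",
           "pt_high": "jet_pt>=300."}

# Adding or removing a separation means editing this list only
groups = [cuts_eta, cuts_ntracks, cuts_pt]
results = run_all(groups, process)

print(len(results))  # 12
print(results[("Barrel", "Low", "pt_med")]["name"])  # Barrel_Low_pt_med
```

Everything belonging to one combination then lives in a single record, so there is only one place to touch when a new quantity needs to be stored.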
 
I didn't follow all the details, but does this do something close to what you want?
Python:
from itertools import product
from functools import reduce

def tree_set(tree, indices, value):
    """Set 'tree[i0][i1]...' to 'value' for i0, i1, ... in 'indices'."""
    deepest = reduce(lambda t, i: t[i], indices[:-1], tree)
    deepest[indices[-1]] = value

def build_tree(concatenator, *cuts):
    """Build tree of cuts using supplied concatenator function."""
    tree = {}

    for cuts_head in (cuts[:n] for n in range(1, len(cuts))):
        for keys in product(*cuts_head):
            tree_set(tree, keys, {})

    for keys in product(*cuts):
        item = concatenator(cut[key] for cut, key in zip(cuts, keys))
        tree_set(tree, keys, item)

    return tree


# Example:

def concat_items(items):
    """Surround items in () and concatenate them separated by ' && '."""
    return ' && '.join('({})'.format(i) for i in items)

cut1 = {"p10": "p <= 10",
        "11p20": "10 < p && p <= 20",
        "21p": "p > 20"}

cut2 = {"BoolPass": "bool",
        "BoolFail": "!bool"}

cut3 = {"High": "T >= 0",
        "Low":  "T < 0"}

cut_tree = build_tree(concat_items, cut1, cut2, cut3)

print(cut_tree["p10"]["BoolPass"]["High"])

# ...this prints "(p <= 10) && (bool) && (T >= 0)".
This way there's nothing special in there being three "cuts" dictionaries in the example.

EDIT: Put the tree-building stuff in a function that accepts a concatenator function and an arbitrary number of "cuts" dictionaries.

EDIT 2: It's also possible to build up the tree recursively, as has been mentioned a couple of times, e.g. something like:
Python:
def build_helper(items, concatenator, cuts):
    if len(cuts) == 0:
        return concatenator(items)
    else:
        tree = {}

        for k, v in cuts[0].items():
            tree[k] = build_helper(items + [v], concatenator, cuts[1:])

        return tree

def build_tree(concatenator, *cuts):
    """Build tree of cuts using supplied concatenator function."""
    return build_helper([], concatenator, cuts)
In this case the two imports aren't necessary. You also need to accumulate the values you've found as you go deeper into the recursion, as @SlowThinker mentioned.

I don't know if one method is significantly better than the other.
 
First try to make all levels look equal, i.e. instead of doing nothing on higher levels and accessing cut_comb[cut1][cut2][cut3] at the lowest level, make a temporary variable to hold dict1=cut_comb[cut1], another to hold dict2=dict1[cut2] etc. Same with the strings, don't do all of the concatenation at the bottom level.
Once the structure is more regular, it should be easier to transform it into a recursive function.
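As a rough illustration of that first step, using the string cuts from the original question (join_cuts is a hypothetical stand-in for add_cuts), the three loops can be rewritten so that every level does the same two things: grab its own sub-dictionary and extend the list of expressions collected so far.

```python
def join_cuts(exprs):
    """Hypothetical stand-in for add_cuts: parenthesise and AND the expressions."""
    return "&&".join("({})".format(e) for e in exprs)


cut1 = {"p10": "p<=10", "11p20": "10<p && p<=20", "21p": "p>20"}
cut2 = {"BoolPass": "bool", "BoolFail": "!bool"}
cut3 = {"High": "T>=0", "Low": "T<0"}

cut_comb = {}
for name1, expr1 in cut1.items():
    dict1 = cut_comb[name1] = {}   # sub-dictionary for this level
    exprs1 = [expr1]               # expressions collected so far
    for name2, expr2 in cut2.items():
        dict2 = dict1[name2] = {}
        exprs2 = exprs1 + [expr2]
        for name3, expr3 in cut3.items():
            exprs3 = exprs2 + [expr3]
            # only the last level stores a value instead of a new dictionary
            dict2[name3] = join_cuts(exprs3)

print(cut_comb["p10"]["BoolPass"]["High"])  # (p<=10)&&(bool)&&(T>=0)
```

Each loop body now depends only on its parent's dictionary and expression list, which are exactly the two things a recursive function would take as arguments.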
 