How Can I Improve Nested For Loops in Python for Combining Cut Expressions?

In summary, the programmer is trying to create a dictionary of cut expressions to combine together, and also to modify existing cut expressions without having to rewrite the code.
  • #1
ChrisVer
Hello, well I am not sure how to search for this online, but I raise this question here:

Suppose that I have several bins of let's say cuts, here I list 3 but the main idea is to make their numbers tunable by the user:
Python:
cut1 = { "p10" : "p <= 10" ,
        "11p20" : "10<p && p<=20" ,
        "21p" : "p>20" ,
        }

cut2 = { "BoolPass" : "bool" ,
        "BoolFail" : "!bool" ,
        }
cut3 = { "High" : "T>=0" ,
        "Low" :  "T<0",
         }

and I want to combine all of them together
Python:
cut_comb = dict()
# loop variables are named c1, c2, c3 so they do not overwrite the cut1, cut2, cut3 dictionaries
for cutname1, c1 in cut1.items():
    cut_comb[cutname1] = dict()
    for cutname2, c2 in cut2.items():
        cut_comb[cutname1][cutname2] = dict()
        for cutname3, c3 in cut3.items():
            cut_comb[cutname1][cutname2][cutname3] = add_cuts( [c1,c2,c3] , name="{}_{}_{}".format(cutname1,cutname2,cutname3) )

where add_cuts is just a function that loops over the cut expressions and concatenates them appropriately, e.g.:
cut_comb["p10"]["BoolPass"]["High"] would be: "(p<=10)&&(bool)&&(T>=0)"
How could I make this ugly for-in-for-in-for loop look better, and also be able to change the cuts (e.g. remove cut3) without having to rewrite the code? I was thinking about using recursion on a list that contains [cut1,cut2,cut3], but I get stuck in the logic somewhere, in particular with the growing nesting depth of cut_comb. Maybe I could change the key instead? Rather than having 3 keys, I could have a single key made of the concatenated names, e.g.:
cut_comb["p10_BoolPass_High"] would be: "(p<=10)&&(bool)&&(T>=0)"
But I'm still not sure.
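The single-key idea can be sketched with itertools.product, which yields one item from each dictionary for every combination, so the number of dictionaries no longer appears in the loop structure. This is only a sketch: add_cuts below is a stand-in that works on plain strings (not the real function from the post), and combine is a hypothetical helper name.

```python
from itertools import product

def add_cuts(exprs):
    # Stand-in for the post's add_cuts: wrap each expression in () and join with &&.
    return "&&".join("({})".format(e) for e in exprs)

cut1 = {"p10": "p<=10", "11p20": "10<p && p<=20", "21p": "p>20"}
cut2 = {"BoolPass": "bool", "BoolFail": "!bool"}
cut3 = {"High": "T>=0", "Low": "T<0"}

def combine(groups):
    """Return {joined_name: joined_expression} for every choice of one cut per group."""
    comb = {}
    for picks in product(*(g.items() for g in groups)):
        names = [name for name, _ in picks]
        exprs = [expr for _, expr in picks]
        comb["_".join(names)] = add_cuts(exprs)
    return comb

cut_comb = combine([cut1, cut2, cut3])
print(cut_comb["p10_BoolPass_High"])   # (p<=10)&&(bool)&&(T>=0)

# Removing cut3 only changes the list that is passed in.
print(len(combine([cut1, cut2])))      # 6
```

With a flat dictionary there is no nested structure to build level by level, at the cost of having to split the key again if the individual names are needed later.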
 
  • #2
It's not clear to me what you're trying to do, especially the creation of a dictionary that you would have to drill into so deeply. Can you provide an example of a scenario that would require what you're trying to do?
 
  • #3
For example suppose you want to draw the efficiencies in different detector parts (eg in eta), so you'll have to have a cut that separates the detector into [itex]|\eta|< X[/itex] and [itex]|\eta| >X[/itex].
Then again you might want to look into different pt ranges, so you will have the cuts [itex]p_T < Y[/itex] , [itex]Y<p_T < Z[/itex] , [itex]p_T>Z[/itex]...
This already tells you that you will have 2*3 = 6 efficiency plots... namely the [itex]h(|\eta|<X , p_T < Y) ~,~h(|\eta|>X, p_T < Y)~,~... [/itex] which are taken (in my case) by applying the cut on my ntuples.
and in a similar manner you want to keep adding or maybe removing cuts... (that's why the number of the for loops in the piece of code I wrote should be a variable).

The histograms are obtained mainly by applying the TTree::Draw() method, where you specify which variable you want to take from the tree (eg you want to calculate the efficiency vs the phi) streamed into a pre-set histogram and the cut selection... so eg:
Python:
mytree.Draw( "jet_phi>>hpass", "selection"+"&&pass" )
mytree.Draw( "jet_phi>>htotal", "selection")
h_eff = some_method_to_get_eff_from(hpass,htotal)
The selection will of course be different every time (it is specified by my cut_comb).

One thing that may be unclear is that the values of my cut_i dictionaries are not strings but objects of some type, call it Cut (pretty similar to TCut). That's why I said that add_cuts deals with them appropriately.
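To make the discussion concrete, here is a minimal sketch of what such a Cut type and add_cuts could look like. Both are assumptions for illustration: the real Cut class is not shown in the thread, and the only things taken from the posts are that a Cut carries a name and a selection expression and that add_cuts takes a list of cuts plus a name.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cut:
    # Hypothetical stand-in for the Cut type: a name plus a selection string.
    name: str
    expr: str

def add_cuts(cuts, name):
    """Return a new Cut whose expression is the logical AND of the given cuts."""
    expr = "&&".join("({})".format(c.expr) for c in cuts)
    return Cut(name, expr)

# Illustrative cuts; the expressions are example selection strings.
barrel = Cut("barrel", "abs(jet_eta)<1.5")
low_pt = Cut("low_pt_less100", "jet_pt<100.")

selection = add_cuts([barrel, low_pt], "barrel_low_pt")
print(selection.name)   # barrel_low_pt
print(selection.expr)   # (abs(jet_eta)<1.5)&&(jet_pt<100.)
```

The expr string of the combined cut is what would be passed as the selection argument when drawing from the tree.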
 
  • #4
ChrisVer said:
One maybe unclear thing is that the values of my cut_i dictionaries are not strings
They sure look like strings to me.
Python:
cut2 = { "BoolPass" : "bool" ,
        "BoolFail" : "!bool" ,
        }
The first key is "BoolPass", a string, and its value is "bool", another string. So cut2["BoolPass"] evaluates to "bool".

It will take me some time to digest the rest of your post.
 
  • #5
Mark44 said:
The first key is "BoolPass", a string, and its value is "bool", another string. So cut2["BoolPass"] evaluates to "bool".
Well, the full glory one is:
Python:
fout= ROOT.TFile("Save_Eff_Histos.root","RECREATE")

#list of cuts
cut_lowTrkMultiplicity  = Cut("low_track_multi", "jet_n_tracks<3")
cut_highTrkMultiplicity= Cut("high_track_multi", "jet_n_tracks >=3")
cut_eta_endcap      = Cut("endcap", "abs(jet_eta)>1.5")
cut_eta_barrel        = Cut("barrel", "abs(jet_eta)<1.5")
cut_low_pt              = Cut("low_pt_less100", "jet_pt<100.")
cut_med_pt            = Cut("med_pt_100_300", "jet_pt>=100. && jet_pt<300.")
cut_high_pt            = Cut("high_pt_greater300", "jet_pt>=300.")
cut_passBjet           = Cut("bjet", "jet_b_tag==1")

#group them
cuts_pt = { "pt_high" : cut_high_pt ,
        "pt_med" : cut_med_pt ,
        "pt_low" : cut_low_pt ,
        }
cuts_eta = { "Barrel" :  cut_eta_barrel,
        "Endcap" : cut_eta_endcap  ,
        }
cuts_trackMultiplicity = { "High" :  cut_highTrkMultiplicity,
        "Low" : cut_lowTrkMultiplicity  ,
        }

cuts_comb = dict()
h_pass       = dict()
h_total       = dict()
gr_eff         = dict()
for eta_name, eta in cuts_eta.items(): # Barrel and Endcap
     cuts_comb[eta_name]=dict()
     h_pass[eta_name]=dict()
     h_total[eta_name]=dict()
     gr_eff[eta_name] =dict()
     for ntracks_name, ntracks in cuts_trackMultiplicity.items(): #Low and High Ntracks
           cuts_comb[eta_name][ntracks_name]=dict()
           h_pass[eta_name][ntracks_name]=dict()
           h_total[eta_name][ntracks_name]=dict()
           gr_eff[eta_name][ntracks_name]=dict()
           for ptRange_name, ptRange in cuts_pt.items(): # Low,Med or High Pt's

                  #obtain cuts to use
                  cuts_comb[eta_name][ntracks_name][ptRange_name] = add_cuts( [eta,ntracks,ptRange] , "{}_{}_{}".format(eta_name, ntracks_name, ptRange_name) )
                  cuts_total = cuts_comb[eta_name][ntracks_name][ptRange_name]
                  cuts_pass = add_cuts( [cuts_total , cut_passBjet] , cuts_total.name + "_pass" )

                  #get histograms
                  h_pass[eta_name][ntracks_name][ptRange_name] = mysame.getHistogram(var="jet_phi", cuts=cuts_pass, name="h_"+cuts_pass.name )
                  h_total[eta_name][ntracks_name][ptRange_name] = mysame.getHistogram(var="jet_phi", cuts=cuts_total, name="h_total_"+cuts_total.name )
                  #get efficiency graph
                  gr_eff[eta_name][ntracks_name][ptRange_name] =  ROOT.TGraphAsymmErrors( h_pass[eta_name][ntracks_name][ptRange_name] , h_total[eta_name][ntracks_name][ptRange_name], "cl=0.683 b(1,1) mode" )
                  gr_eff[eta_name][ntracks_name][ptRange_name].SetTitle("h_eff_"+cuts_total.name)

                  #save them in an output file
                  fout.WriteTObject( gr_eff[eta_name][ntracks_name][ptRange_name] )

Now suppose I want to add or remove one of my separations. For example, if pt turns out to do nothing, it is better to remove it, which means removing a for loop and changing everything in the remaining ones accordingly. Or maybe the number of vertices affects the efficiency, so I add bins of Nvertex, which means adding an extra for loop. That makes the code tiring to maintain, error-prone and inefficient to work with.
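One possible way to restructure the loop above is to keep the groups of cuts in a list and key every result dictionary by a tuple of names, so that adding or removing a separation only changes that list. The sketch below is not the original code: Cut and add_cuts are simplified stand-ins, combined_selections is a hypothetical helper, and the ROOT histogram calls are left out and only indicated by a comment.

```python
from collections import namedtuple
from itertools import product

Cut = namedtuple("Cut", ["name", "expr"])  # hypothetical stand-in for the Cut type

def add_cuts(cuts, name):
    # Stand-in: AND the expressions together under a new name.
    return Cut(name, "&&".join("({})".format(c.expr) for c in cuts))

def combined_selections(groups, pass_cut):
    """Yield (keys, cuts_total, cuts_pass) for each choice of one cut per group."""
    for picks in product(*(g.items() for g in groups)):
        keys = tuple(k for k, _ in picks)
        cuts_total = add_cuts([c for _, c in picks], "_".join(keys))
        cuts_pass = add_cuts([cuts_total, pass_cut], cuts_total.name + "_pass")
        yield keys, cuts_total, cuts_pass

cuts_eta = {"Barrel": Cut("barrel", "abs(jet_eta)<1.5"),
            "Endcap": Cut("endcap", "abs(jet_eta)>1.5")}
cuts_trackMultiplicity = {"High": Cut("high_track_multi", "jet_n_tracks>=3"),
                          "Low": Cut("low_track_multi", "jet_n_tracks<3")}
cut_passBjet = Cut("bjet", "jet_b_tag==1")

# Adding or removing a separation means editing only this list.
groups = [cuts_eta, cuts_trackMultiplicity]

selections = {}
for keys, cuts_total, cuts_pass in combined_selections(groups, cut_passBjet):
    # The histogram and efficiency-graph calls from the post would go here,
    # stored under the same tuple key, e.g. h_pass[keys] = ...
    selections[keys] = (cuts_total, cuts_pass)

print(sorted(selections))
```

Tuple keys such as ("Barrel", "Low") replace the chain of [eta_name][ntracks_name][ptRange_name] lookups, so the body of the loop does not change when the number of groups does.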
 
  • #6
I didn't follow all the details, but does this do something close to what you want?
Python:
from itertools import product
from functools import reduce

def tree_set(tree, indices, value):
    """Set 'tree[i0][i1]...' to 'value' for i0, i1, ... in 'indices'."""
    deepest = reduce(lambda t, i: t[i], indices[:-1], tree)
    deepest[indices[-1]] = value

def build_tree(concatenator, *cuts):
    """Build tree of cuts using supplied concatenator function."""
    tree = {}

    for cuts_head in (cuts[:n] for n in range(1, len(cuts))):
        for keys in product(*cuts_head):
            tree_set(tree, keys, {})

    for keys in product(*cuts):
        item = concatenator(cut[key] for cut, key in zip(cuts, keys))
        tree_set(tree, keys, item)

    return tree

# Example:

def concat_items(items):
    """Surround items in () and concatenate them separated by ' && '."""
    return ' && '.join('({})'.format(i) for i in items)

cut1 = {"p10": "p <= 10",
        "11p20": "10 < p && p <= 20",
        "21p": "p > 20"}

cut2 = {"BoolPass": "bool",
        "BoolFail": "!bool"}

cut3 = {"High": "T >= 0",
        "Low":  "T < 0"}

cut_tree = build_tree(concat_items, cut1, cut2, cut3)

print(cut_tree["p10"]["BoolPass"]["High"])

# ...this prints "(p <= 10) && (bool) && (T >= 0)".
This way there's nothing special in there being three "cuts" dictionaries in the example.

EDIT: Put the tree-building stuff in a function that accepts a concatenator function and an arbitrary number of "cuts" dictionaries.

EDIT 2: It's also possible to build up the tree recursively, as has been mentioned a couple of times, e.g. something like:
Python:
def build_helper(items, concatenator, cuts):
    if len(cuts) == 0:
        return concatenator(items)
    else:
        tree = {}

        for k, v in cuts[0].items():
            tree[k] = build_helper(items + [v], concatenator, cuts[1:])

        return tree

def build_tree(concatenator, *cuts):
    """Build tree of cuts using supplied concatenator function."""
    return build_helper([], concatenator, cuts)
In this case the two imports aren't necessary. You also need to accumulate the values you've found as you go deeper into the recursion, as @SlowThinker mentioned.

I don't know if one method is significantly better than the other.
 
  • #7
First try to make all levels look equal, i.e. instead of doing nothing on higher levels and accessing cut_comb[cut1][cut2][cut3] at the lowest level, make a temporary variable to hold dict1=cut_comb[cut1], another to hold dict2=dict1[cut2] etc. Same with the strings, don't do all of the concatenation at the bottom level.
Once the structure is more regular, it should be easier to transform it into a recursive function.
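As a rough illustration of that advice, the sketch below makes every level do the same two things: work on its own sub-dictionary and extend a partial expression handed down from the level above. The function name fill and the plain-string cuts are assumptions for the example, and the list of groups is assumed to be non-empty.

```python
def fill(node, groups, partial=""):
    """Fill 'node' in place; 'partial' is the expression built by the outer levels."""
    head, rest = groups[0], groups[1:]
    for name, expr in head.items():
        piece = "({})".format(expr)
        so_far = piece if not partial else partial + "&&" + piece
        if rest:
            # Not the last level: create this level's dict and recurse into it.
            node[name] = {}
            fill(node[name], rest, so_far)
        else:
            # Last level: the accumulated expression is the final value.
            node[name] = so_far

cut1 = {"p10": "p<=10", "11p20": "10<p && p<=20", "21p": "p>20"}
cut2 = {"BoolPass": "bool", "BoolFail": "!bool"}
cut3 = {"High": "T>=0", "Low": "T<0"}

cut_comb = {}
fill(cut_comb, [cut1, cut2, cut3])
print(cut_comb["p10"]["BoolPass"]["High"])   # (p<=10)&&(bool)&&(T>=0)
```

Because node always refers to the dictionary of the current level, no level ever needs the full cut_comb[...][...][...] index chain.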
 

1. What is an abstract number of for loops?

An abstract number of for loops means that the nesting depth of the loops is not fixed in the source code. In this thread, each dictionary of cuts would need its own for loop, and the number of dictionaries should be adjustable without adding or removing loops by hand.

2. How is an abstract number of for loops different from a fixed number of for loops?

With a fixed number of for loops, each level of nesting is written out explicitly, so adding or removing a level means editing the loop structure and everything inside it. With an abstract number, the levels come from a list (here, a list of cut dictionaries), so changing that list is enough.

3. What are some benefits of using an abstract number of for loops?

The code no longer depends on how many levels there are, so adding or removing a level is a small, local change. It also avoids the repeated, deeply indexed bookkeeping of hand-written nested loops, which is easy to get wrong.

4. How do you implement an abstract number of for loops in code?

Two approaches appear in this thread. One is to iterate over all combinations with itertools.product applied to the list of dictionaries. The other is a recursive function that handles one dictionary per call and passes the remaining ones to the next call. In both cases the values chosen at the outer levels have to be carried along to the innermost level.

5. Are there any limitations to using an abstract number of for loops?

The generic version can be harder to read than a short, explicit nested loop. The number of combinations is the product of the sizes of all levels, so it grows quickly as levels are added. When there are only two or three levels that never change, explicit nested loops may be the simpler choice.
