Python: How to count rows that have values in at least 3 columns

  • Thread starter: msn009
  • Tags: Columns, Python
AI Thread Summary
To count the number of rows with values in at least three specified columns (a, b, c, d) while ignoring others, a user attempted to use a DataFrame filtering method but encountered issues with counting and indexing. The initial code produced incorrect results due to not properly filtering the columns of interest. Suggestions included explicitly defining the columns and using a loop to count occurrences of "no_label" while ensuring the correct indexing of DataFrame rows. The discussion highlighted the importance of clearly defining the task and using appropriate methods to access DataFrame elements without errors. Ultimately, the user sought clarification on how to effectively implement the column selection while avoiding confusion with additional columns.
msn009
I am trying to count the number of rows that have values in at least 3 columns, so the output based on the attached image should be 4. I tried the code below, but the result has the same number of rows as the whole table, which is 7.

Python:
counts = df[(df[['a', 'b', 'c', 'd']] != 'no_label').count(axis=1) >= 3]
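
For reference, a minimal sketch of why this attempt keeps every row, assuming `df` is a pandas DataFrame. The table here is made up for illustration; the attachment's actual values are not reproduced in the thread. `count(axis=1)` counts non-missing cells, and the boolean comparison has none missing, so every row scores 4. `sum(axis=1)` counts only the `True` cells.

```python
import pandas as pd

# Hypothetical 7-row table standing in for the attachment (values invented).
df = pd.DataFrame({
    "a": ["x", "no_label", "x", "x", "no_label", "x", "no_label"],
    "b": ["x", "no_label", "x", "x", "no_label", "x", "x"],
    "c": ["x", "x", "no_label", "no_label", "x", "x", "no_label"],
    "d": ["no_label", "x", "x", "x", "no_label", "x", "no_label"],
    "e": ["no_label"] * 7,  # extra column that must be ignored
})
cols = ["a", "b", "c", "d"]

# True where the cell holds a real value, False where it is "no_label".
mask = df[cols] != "no_label"

# count() tallies non-missing cells: always 4 here, so no row is filtered out.
kept_by_count = df[mask.count(axis=1) >= 3]

# sum() tallies the True cells, which is the number of real values per row.
kept_by_sum = df[mask.sum(axis=1) >= 3]
n_rows = len(kept_by_sum)
```

With this invented data, `kept_by_count` still has all 7 rows, while `n_rows` is 4.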
 

Attachments

  • p3.png
sum(row.count("no_label")<2 for row in df)
If your table includes the header but you don't want to count that: subtract 1.
 
Thank you for this suggestion. But what if I only want to check columns a, b, c, and d? I have other columns e, f, g that also have the 'no_label' value, but I don't want to consider them.
 
sum(row[0:4].count("no_label")<2 for row in df)
 
I wanted to do it this way so that I can select columns that could be in other positions in the dataframe:

Python:
cols = ['a', 'b', 'c', 'd']

sum(row[cols].count("no_label") < 2 for row in df)

But it gives me this error:

TypeError: string indices must be integers, not list

What can I do to use the column names explicitly instead? Thanks.
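
The error can be reproduced with a small sketch, assuming `df` is a pandas DataFrame (the sample values are invented). Iterating over a DataFrame yields its column labels, so `row` in the loop is a string such as `'a'`, and indexing a string with a list fails. Selecting the columns first and then iterating over rows with `itertuples` keeps the same counting idea.

```python
import pandas as pd

# Invented sample; column "e" sits between the columns of interest.
df = pd.DataFrame({
    "a": ["x", "no_label", "x"],
    "e": ["no_label", "no_label", "no_label"],
    "b": ["x", "no_label", "x"],
    "c": ["no_label", "x", "x"],
    "d": ["x", "x", "no_label"],
})
cols = ["a", "b", "c", "d"]

# "for row in df" walks over column labels, not rows.
labels_seen = [row for row in df]

# Restrict to the wanted columns, then iterate over actual rows.
# Each row is a namedtuple, so the tuple method count() is available.
n_rows = sum(
    row.count("no_label") < 2
    for row in df[cols].itertuples(index=False)
)
```

Here `labels_seen` is the list of column names, and `n_rows` counts rows with fewer than two `"no_label"` cells among a, b, c, d.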
 
Where do the column labels come from? Are they the first row of your array? I wouldn't do that; it is bad style. Select them based on integers.
If you absolutely need strings, convert them to integers for the sum.
 
Yes, they are, but in this case there are more than 10 columns, which makes it difficult to determine the location of each column using integers.
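
If `df` is a pandas DataFrame (an assumption; the replies in this thread treat the table as a list of rows), the integer positions do not have to be worked out by hand. A hedged sketch using `Index.get_indexer` to look them up from the names, with invented column names and values:

```python
import pandas as pd

# Invented two-row table with the wanted columns scattered among others.
df = pd.DataFrame(
    [["x", 1, "x", "no_label", 2, "x"],
     ["no_label", 3, "no_label", "x", 4, "x"]],
    columns=["a", "e", "b", "c", "f", "d"],
)
cols = ["a", "b", "c", "d"]

# Integer position of each name; a name that is absent would map to -1.
positions = df.columns.get_indexer(cols)

# Positional selection, equivalent here to df[cols].
subset = df.iloc[:, positions]

# Rows with fewer than two "no_label" cells in the selected columns.
n_rows = int((subset.eq("no_label").sum(axis=1) < 2).sum())
```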
 
You keep changing the task. Why don't you show what exactly you want to do?

cols={'a':0,'b':1,'c':2,'d':3}
sum(row[cols['a']:cols['d']+1].count("no_label")<2 for row in df)

Or, if the first row of df has the column names:
Code:
cols2={}
for x,y in enumerate(df[0]):
  cols2[y]=x
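
Putting the two pieces together, a sketch for a table stored as a list of rows whose first row holds the column names (the data is invented). The name-to-position dictionary lets the wanted columns sit anywhere in the row, so no contiguous slice is needed.

```python
# Invented table: first row is the header, column "e" is to be ignored.
table = [
    ["a", "e", "b", "c", "d"],
    ["x", "no_label", "x", "x", "x"],
    ["x", "no_label", "no_label", "x", "x"],
    ["no_label", "x", "no_label", "x", "x"],
    ["x", "no_label", "x", "no_label", "no_label"],
]
header, rows = table[0], table[1:]

# Map each column name to its integer position.
index_of = {name: i for i, name in enumerate(header)}

wanted = ["a", "b", "c", "d"]

# Count rows with fewer than two "no_label" cells among the wanted columns.
n_rows = sum(
    [row[index_of[name]] for name in wanted].count("no_label") < 2
    for row in rows
)
```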
 
Thanks. Sorry for not explaining the scenario in detail.
 
