How to count rows that have values in at least 3 columns

  • Context: Python
  • Thread starter: msn009
  • Tags: Columns, Python

Discussion Overview

The discussion revolves around how to count the rows of a DataFrame that contain real values in at least three of four specified columns ('a', 'b', 'c', and 'd'), where the placeholder 'no_label' marks a missing value, while ignoring other columns that may also contain 'no_label'. Participants share code snippets and ask for clarification of the requirements.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Homework-related

Main Points Raised

  • One participant attempts to count rows with values in at least three columns using a specific code snippet but encounters issues with the output shape.
  • Another participant suggests a different approach using a sum function to count occurrences of 'no_label', but notes the need to adjust for headers.
  • A participant seeks to limit the count to specific columns ('a', 'b', 'c', and 'd') and expresses concern about other columns affecting the count.
  • Another code suggestion is made to count 'no_label' occurrences in a specified range of columns, but it does not address the need for explicit column names.
  • The original poster tries to index each row with a list of column names and gets a TypeError.
  • One participant suggests selecting columns by integer position, or mapping column names to positions with a dictionary; the original poster objects that with more than 10 columns the positions are hard to track.
  • One participant expresses frustration over changing requirements and requests clarification on the desired outcome.
  • Another participant apologizes for not providing enough detail about the scenario initially.

Areas of Agreement / Disagreement

Participants are exploring various methods to achieve the desired row count, but there is no consensus on the best approach. Different strategies are proposed, and some participants express confusion or frustration over the task requirements.

Contextual Notes

Some participants mention the presence of additional columns that may interfere with the counting process, and there are unresolved issues regarding the handling of column indices and the structure of the DataFrame.

msn009
I am trying to count the number of rows that have values in at least 3 columns, so the output based on the image shared should be 4. I tried the code below, but the result has the same number of rows as the whole table, which is 7.

Python:
counts = df[(df[['a', 'b', 'c', 'd']] != 'no_label').count(axis=1) >= 3]
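
A likely reason the filter keeps all 7 rows: comparing with `!=` produces a table of True/False values, and `count(axis=1)` counts non-missing cells, so every row scores 4 whatever it contains. Summing the booleans counts only the True cells. Below is a minimal sketch, assuming `df` is a pandas DataFrame; the sample data is made up for illustration, because the attached image is not reproduced here.

```python
import pandas as pd

# Hypothetical data: 7 rows, 4 of which have real values in at least 3 of a-d.
df = pd.DataFrame({
    "a": ["x", "x", "no_label", "x", "no_label", "x", "no_label"],
    "b": ["y", "no_label", "no_label", "y", "y", "y", "no_label"],
    "c": ["z", "z", "z", "no_label", "no_label", "z", "no_label"],
    "d": ["w", "w", "no_label", "w", "no_label", "no_label", "w"],
    "e": ["no_label"] * 7,  # extra column that should be ignored
})

cols = ["a", "b", "c", "d"]
has_value = df[cols] != "no_label"  # boolean DataFrame

# count() counts non-missing cells; True and False both qualify,
# so every row gets 4 and the filter ">= 3" keeps everything.
always_four = has_value.count(axis=1)

# sum() adds the booleans (True counts as 1), giving real values per row.
values_per_row = has_value.sum(axis=1)

selected = df[values_per_row >= 3]
n_rows = len(selected)
print(n_rows)  # 4 for the sample data above
```

Here `selected` holds the matching rows and `n_rows` is the count; if only the count is needed, `(values_per_row >= 3).sum()` should give the same number.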
 

sum(row.count("no_label")<2 for row in df)
If your table includes the header but you don't want to count that: subtract 1.
 
Thank you for this suggestion. But what if I only want to check columns a, b, c, and d? I have other columns e, f, g that also have the 'no_label' value, but I don't want to consider them.
 
sum(row[0:4].count("no_label")<2 for row in df)
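
The two one-liners above appear to assume that the table is a plain list of rows rather than a pandas DataFrame. A minimal sketch under that assumption follows; the name `table` and its contents are hypothetical.

```python
# Hypothetical table stored as a list of rows, with the header row first.
table = [
    ["a", "b", "c", "d", "e"],
    ["x", "y", "z", "w", "no_label"],
    ["x", "no_label", "z", "w", "no_label"],
    ["no_label", "no_label", "z", "no_label", "no_label"],
]

# Look only at the first four entries of each row. Fewer than 2 "no_label"
# entries among 4 columns means at least 3 real values.
# The header row contains no "no_label", so it is counted too: subtract 1.
with_header = sum(row[0:4].count("no_label") < 2 for row in table) - 1

# Equivalent: skip the header row up front with table[1:].
n_rows = sum(row[0:4].count("no_label") < 2 for row in table[1:])
print(n_rows)  # 2
```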
 
I wanted to do it this way so that I can select columns that could be in other positions in the dataframe:

Python:
cols = ['a', 'b', 'c', 'd']

sum(row[cols].count("no_label") < 2 for row in df)

But it gives me this error:

TypeError: string indices must be integers, not list

What can I do to use the column names explicitly instead? Thanks.
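
This error is consistent with `df` being a pandas DataFrame: iterating over a DataFrame yields its column labels (strings), not its rows, so `row[cols]` ends up indexing a string with a list. The sketch below assumes pandas and uses made-up data; it selects the columns by name first, so their positions do not matter.

```python
import pandas as pd

# Hypothetical data; column "e" sits between the columns of interest.
df = pd.DataFrame({
    "a": ["x", "no_label", "x"],
    "e": ["no_label", "no_label", "no_label"],
    "b": ["y", "no_label", "y"],
    "c": ["z", "z", "no_label"],
    "d": ["w", "no_label", "w"],
})

# Iterating a DataFrame gives column labels, which explains the TypeError.
labels = [item for item in df]
print(labels)  # ['a', 'e', 'b', 'c', 'd']

cols = ["a", "b", "c", "d"]

# Vectorised: count "no_label" per row within the chosen columns only.
no_label_per_row = (df[cols] == "no_label").sum(axis=1)
n_rows = int((no_label_per_row < 2).sum())

# Row-by-row variant, closer to the generator expression in the post.
n_rows_loop = int(sum((row[cols] == "no_label").sum() < 2
                      for _, row in df.iterrows()))
print(n_rows, n_rows_loop)  # 2 2
```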
 
Where do the column labels come from? Are they the first row of your array? I wouldn't do that, it is bad style. Select them based on integers.
If you absolutely need strings convert them to integers for the sum.
 
Yes, they are, but in this case there are more than 10 columns, which makes it difficult to determine the location of a column using integers.
 
You keep changing the task. Why don't you show what exactly you want to do?

cols={'a':0,'b':1,'c':2,'d':3}
sum(row[cols['a']:cols['c']].count("no")>0 for row in df)

Or, if the first row of df has the column names:
Code:
cols2={}
for x,y in enumerate(df[0]):
  cols2[y]=x
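
The name-to-position mapping built above can then be used to pick the wanted columns out of each row, even when they are not adjacent. This is a sketch only, assuming a list-of-rows table whose first row holds the column names; `table`, `positions`, and `wanted` are illustrative names, and the data is made up.

```python
# Hypothetical table: header row first, column "e" between the wanted ones.
table = [
    ["a", "e", "b", "c", "d"],
    ["x", "no_label", "y", "z", "w"],
    ["no_label", "no_label", "no_label", "z", "w"],
    ["x", "no_label", "y", "no_label", "w"],
]

# Map each column name to its position, as in the loop above.
positions = {name: index for index, name in enumerate(table[0])}

wanted = ["a", "b", "c", "d"]

# For each data row, collect the wanted cells by name and count "no_label".
n_rows = sum(
    [row[positions[name]] for name in wanted].count("no_label") < 2
    for row in table[1:]  # skip the header row
)
print(n_rows)  # 2
```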
 
Thanks. Sorry for not explaining the scenario in detail.
 
