How to compute the percentage of values based on multiple columns

SUMMARY

This discussion focuses on calculating the percentage of occurrences based on values in two columns of a Pandas DataFrame. The user's initial code returned zero instead of the expected 0.5. A reply suggested replacing the & operator with 'and', but that raises a ValueError on a Pandas Series; & is the correct element-wise operator for combining boolean Series. The zero most likely came from integer division of the two counts (the behavior of / on integers in Python 2), since converting the counts to floats before dividing produces the expected output of 0.5.

PREREQUISITES
  • Understanding of Pandas DataFrame operations
  • Familiarity with logical operators in Python
  • Knowledge of string methods in Pandas
  • Basic knowledge of percentage calculations
NEXT STEPS
  • Explore Pandas DataFrame filtering techniques
  • Learn about logical operations in Python, specifically using 'and' vs '&'
  • Investigate the use of value_counts() in Pandas for counting occurrences
  • Study data type conversions in Python to avoid ambiguity in calculations
USEFUL FOR

Data analysts, data scientists, and Python developers who are working with Pandas for data manipulation and analysis.

msn009
I have a dataframe as shown in the picture. I am trying to count the number of occurrences based on the values in two columns and then calculate the percentage of those occurrences. I have tried the following code, but it gives me a zero value in the end and I don't know why.

Code:
count_a2_x = (df['a1'].str.contains('b') & df['a2'].str.contains('x')).value_counts()[True]
count_a2_y = (df['a1'].str.contains('b') & df['a2'].str.contains('y')).value_counts()[True]
acc  = float(count_a2_x/ (count_a2_x + count_a2_y))

The expected output is 3/6 = 0.5.
 

Attachments

  • p2.png
Use 'and' instead of '&'. The '&' operator is the bitwise AND operator.
 
When I changed it to 'and', it gave me this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
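
For context on this error, here is a minimal sketch of how the two operators behave on boolean Series. The DataFrame is hypothetical, since the original attachment is not reproduced here; it is only chosen so that 3 of the 6 'b' rows have 'x'. The names has_b, has_x, mask and raised are illustrative, not from the thread.

```python
import pandas as pd

# Hypothetical data standing in for the attachment (p2.png), which is not available.
df = pd.DataFrame({
    "a1": ["b", "b", "b", "b", "b", "b", "c"],
    "a2": ["x", "x", "x", "y", "y", "y", "x"],
})

has_b = df["a1"].str.contains("b")
has_x = df["a2"].str.contains("x")

# '&' combines the two boolean Series element by element.
mask = has_b & has_x
print(mask.tolist())  # [True, True, True, False, False, False, False]

# 'and' asks for one truth value of a whole Series, which pandas refuses to give.
try:
    has_b and has_x
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

So for row-wise conditions on Series, '&' is the operator that does what the original code intended; 'and' only works on single True/False values.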

I then tried converting the counts to float at the beginning, and that seems to work:

Code:
count_a2_x = float((df['a1'].str.contains('b') & df['a2'].str.contains('x')).value_counts()[True])
count_a2_y = float((df['a1'].str.contains('b') & df['a2'].str.contains('y')).value_counts()[True])
acc  = count_a2_x/ (count_a2_x + count_a2_y)
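
The thread does not say which Python version was used, but the float conversion mattering suggests integer division: in Python 2, 3/6 on integers gives 0. As a hedged alternative sketch (not the poster's method), the counts can also be taken by summing the boolean mask, which returns 0 when nothing matches instead of failing on value_counts()[True]. The DataFrame below is hypothetical and the name is_b is illustrative.

```python
import pandas as pd

# Hypothetical data standing in for the attachment (p2.png), which is not available.
df = pd.DataFrame({
    "a1": ["b", "b", "b", "b", "b", "b", "c"],
    "a2": ["x", "x", "x", "y", "y", "y", "x"],
})

is_b = df["a1"].str.contains("b")

# Summing a boolean mask counts its True entries (0 if there are none).
count_a2_x = int((is_b & df["a2"].str.contains("x")).sum())
count_a2_y = int((is_b & df["a2"].str.contains("y")).sum())

total = count_a2_x + count_a2_y
# float() on the numerator forces true division on any Python version;
# the guard avoids dividing by zero when neither pattern matches.
acc = float(count_a2_x) / total if total else float("nan")
print(acc)  # 0.5
```

With this sample data both counts are 3, so acc is 3/6 = 0.5, matching the expected output in the question.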
 
