Testing whether one mean is higher than another

  • Context: Undergrad 
  • Thread starter: Tom McCurdy
  • Tags: Mean, Testing
SUMMARY

The discussion centers on testing the hypothesis that the average quantity sold with a promotion is significantly higher than without one, using regression analysis. The professor suggested a linear regression approach with promotion as the independent variable and quantity sold as the dependent variable, but the participant identified flaws in this method, particularly the need to categorize promotion levels into discrete groups. The sample sizes were 201 for the promotion group and 17 for the non-promotion group, raising concerns about population variance equality and the validity of the regression results. The participant questioned whether the professor's graphical approach using Excel could adequately represent the data.

PREREQUISITES
  • Understanding of hypothesis testing, specifically for two sample means
  • Familiarity with linear regression analysis and its assumptions
  • Knowledge of Excel for statistical calculations and graphing
  • Basic concepts of population variance and sample size implications
NEXT STEPS
  • Learn about hypothesis testing for two sample means using R or Python
  • Study the assumptions of linear regression and how to check them
  • Explore methods for categorizing continuous variables for regression analysis
  • Investigate the use of ANOVA for comparing means across different groups
USEFUL FOR

Statisticians, data analysts, and anyone involved in sales performance analysis who needs to understand the implications of regression analysis and hypothesis testing in promotional contexts.

Tom McCurdy
We discussed a problem in class where one column held the quantity sold and another held the promotion level. The promotion took values between 0 and 0.88, with a number of the values being exactly zero.

The problem discussed was to test the following hypothesis
The average quantity sold by Bob when there is a promotion (p>0) is significantly higher than when there is no promotion.

My professor claimed the problem should be solved using regression. He used the promotion value as the independent variable and the quantity sold as the dependent variable. His plan was then to use the regression results in Excel to test whether or not the slope was equal to zero.

The biggest mistake I see right away is that he would need to recode promotion as a discrete variable with two groups: 1) promotion and 2) no promotion. The second problem I see is what arbitrary value to assign to group 1) promotion.


It's been a while since I have taken a statistics class, but from what I remember, when you are doing hypothesis testing for two sample means you set up
[tex]H_0 : \mu_{\text{promo}} = \mu_{\text{no-promo}}[/tex]
[tex]H_1 : \mu_{\text{promo}} > \mu_{\text{no-promo}}[/tex]

The sample sizes are different:
the promo category had 201 samples
the non-promo category had 17 samples

Next you would need to decide whether the population variances can be considered equal.
If they are equal you test the means one way (a pooled-variance test), and if they are not equal you have to test the means another way, which was a substantial amount of work.
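As a rough illustration of the two cases described above, here is a minimal sketch of the pooled (equal-variance) and Welch (unequal-variance) t statistics using only the Python standard library. The numbers in `promo` and `no_promo` are made up for illustration and are not the class data; the function names are hypothetical. The sketch stops at the statistic and degrees of freedom, which would then be compared against a one-sided t critical value.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Equal-variance (pooled) two-sample t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    # Pooled estimate of the common variance
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

def welch_t(a, b):
    """Unequal-variance (Welch) t statistic with Welch-Satterthwaite df."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Made-up illustrative quantities sold (assumption, NOT the class data)
promo = [12.0, 15.0, 14.0, 18.0, 20.0, 16.0, 17.0, 13.0]
no_promo = [10.0, 11.0, 9.0, 12.0]

t_pooled, df_pooled = pooled_t(promo, no_promo)
t_welch, df_welch = welch_t(promo, no_promo)
print("pooled:", t_pooled, "df =", df_pooled)
print("welch: ", t_welch, "df =", df_welch)
```

With group sizes as unbalanced as 201 versus 17, the two statistics can differ noticeably when the sample variances differ, which is why the equal-variance question matters here.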

Is my professor right... can you just bypass this all by simply putting the numbers into two categories and doing a linear regression and checking the p-value for their slope?
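One way to check this question numerically: if promotion is recoded as a 0/1 indicator, the least-squares slope is the difference between the two group means, and the t statistic for the slope is the same as the pooled (equal-variance) two-sample t statistic. The sketch below demonstrates that with made-up numbers (not the class data) and hypothetical function names. Two caveats under this setup: the regression inherits the equal-variance assumption, and a regression printout normally reports a two-sided p-value, so for the one-sided alternative with a positive slope you would halve it.

```python
import math
from statistics import mean, variance

def ols_slope_t(x, y):
    """Least-squares slope of y = a + b*x and the t statistic for H0: b = 0."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    # Residual sum of squares and the standard error of the slope
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return b, b / se_b

def pooled_t(a, b):
    """Equal-variance (pooled) two-sample t statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

# Made-up illustrative quantities sold (assumption, NOT the class data)
promo = [12.0, 15.0, 14.0, 18.0, 20.0, 16.0, 17.0, 13.0]
no_promo = [10.0, 11.0, 9.0, 12.0]

# Indicator coding: 1 = promotion (p > 0), 0 = no promotion
x = [1.0] * len(promo) + [0.0] * len(no_promo)
y = promo + no_promo

slope, t_reg = ols_slope_t(x, y)
t_pool = pooled_t(promo, no_promo)
print("slope:", slope, "difference of means:", mean(promo) - mean(no_promo))
print("regression t:", t_reg, "pooled t:", t_pool)
```

So the 0/1 coding removes the "arbitrary value" problem: any two distinct codes give the same t statistic, and only the scale of the slope changes.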
 
For the professor's approach, which is essentially a graphical approach with Excel doing the calculations, it's not clear to me how you would plot the points representing the different samples. For the 17 samples where there was no promotion, you would have points scattered along the vertical axis.

If by "promotion" you mean something like "10% price drop" and "20% price drop," the graph would have many points scattered along vertical lines at the locations on the horizontal axis for the various price drops. I suppose you could find the line of best fit (i.e., use a regression line), but you'd want to take the calculated slope of the line with a grain of salt if the magnitude of the correlation coefficient R was small.
 
