Solving Discrepancy Events - What Statistical Measure to Use?

  • Context: Undergrad
  • Thread starter: BiologyGirl
  • Tags: Events

Discussion Overview

The discussion revolves around selecting an appropriate statistical measure to quantify the discrepancy between ideal and actual production percentages in a factory setting. Participants explore various metrics and their implications, focusing on the context of statistical analysis rather than specific applications or conclusions.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant suggests using the average discrepancy by taking absolute values of differences, but questions if this approach is too simplistic.
  • Another participant proposes the sum of squared differences (SSE) as an alternative measure, and notes that because the percentages must sum to 100%, only three of the four values are needed.
  • A participant asks about the advantages of using the sum of squared differences over the sum of absolute differences.
  • Responses highlight that the sum of squared differences is mathematically easier to analyze, weights larger discrepancies more heavily, and is more standard in statistical practice.
  • Conversely, the sum of absolute differences is noted for its simplicity in calculation and equal weighting of all values.
  • One participant visualizes the production data in a 3-dimensional space, relating the production errors to geometric metrics, comparing SSE to the Euclidean metric and the sum of absolute distances to the Manhattan metric.
  • A later reply suggests a method to normalize the SSE to fit a scale where perfect production equals 1 and poor production equals 0.

Areas of Agreement / Disagreement

Participants express differing opinions on the best statistical measure to use, with some favoring the sum of squared differences and others supporting the sum of absolute differences. The discussion remains unresolved regarding which metric is definitively superior.

Contextual Notes

Participants acknowledge the need to consider the constraint that production percentages must sum to 100%, which influences the choice of statistical measures. There are also discussions about the implications of weighting discrepancies differently.

BiologyGirl
I am having problems deciding on which statistical measure to use. Although this problem is of the simplest type, none of my books seem to address exactly what I need.

Let me describe a typical example:

Suppose that I have a factory that produces four types of products, say boats, cars, planes, and trains. (Big factory, I know). For ideal production, I want 20% of the factory's output to be boats, 25% to be cars, 40% to be planes, and 15% to be trains.

Suppose the factory instead produces 15% boats, 20% cars, 40% planes, and 25% trains. How would I express, using one statistical measure, how far off the factory is producing from the ideal?

I want the answer to be a percentage so that 100% is perfect alignment with the targeted goals, 0% would be no alignment (the factory produces spaceships instead).

My first inclination was simply to find the average discrepancy, that is, take the absolute value of each difference and average them. If needed, I could weight the result to produce a value between 0 and 100%, but something tells me that my plan is too unsophisticated. Is there a form of linear regression that I could use on data that is not described by a function but given as a finite set of values? What about weighting the standard deviation?

As you can tell, I am not a statistician (all of my experience is with using statistics on functional data), but if I were just told the name of the statistical measure to use, I could figure out the rest on my own.

Thanks in advance.
 
You are looking for a metric (distance function) so the sum of absolute differences is fine. An alternative measure is the sum of squared differences (or errors, i.e. SSE). You may have to take into account that the percentages always add up to 100%, so you only need to know 3 out of 4. In a regression the errors are based on the difference between an actual value and a projected (estimated) value -- as far as I understand, you are not trying to project anything; whether or how regression might help is not obvious.
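For concreteness, here is a minimal Python sketch of the two measures named above, applied to the factory numbers from the first post. The function names `sum_abs_diff` and `sum_sq_diff` are just illustrative choices, and the values are kept in whole percentage points so the arithmetic stays exact.

```python
# Ideal and actual output shares, in percentage points (from the first post).
ideal = {"boats": 20, "cars": 25, "planes": 40, "trains": 15}
actual = {"boats": 15, "cars": 20, "planes": 40, "trains": 25}


def sum_abs_diff(ideal, actual):
    """Sum of absolute differences between actual and ideal shares."""
    return sum(abs(actual[k] - ideal[k]) for k in ideal)


def sum_sq_diff(ideal, actual):
    """Sum of squared differences (SSE) between actual and ideal shares."""
    return sum((actual[k] - ideal[k]) ** 2 for k in ideal)


sad = sum_abs_diff(ideal, actual)  # |-5| + |-5| + |0| + |10|
sse = sum_sq_diff(ideal, actual)   # 25 + 25 + 0 + 100
print("sum of absolute differences:", sad)
print("sum of squared differences:", sse)
```

Because the four shares add up to 100, the four differences always add up to zero, which is why knowing three of them is enough.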
 
Thanks for the response. What advantage does the sum of squared differences have over the sum of absolute differences?
 
BiologyGirl said:
Thanks for the response. What advantage does the sum of squared differences have over the sum of absolute differences?

1. It's often easier to analyze mathematically
2. The worst values are weighted more heavily, so "0% boats, 20% cars" is worse than "10% boats, 15% cars" in your example.
3. There aren't continuous ranges of values that are considered 'equally bad', which would make it hard to decide what to prefer.
4. It's more standard -- you can find more information about it

There are some advantages of absolute differences vs sum of squared differences:
1. It's easier to calculate by hand
2. All values are equally weighted -- the opposite of #2 above
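The weighting point in the two lists above can be checked numerically. This is only a sketch: it looks at boats and cars alone, uses the two cases quoted in point 2, and adds a third case (`case_c`) that is a hypothetical of mine, chosen so that it ties with the second case under absolute differences.

```python
# Ideal shares for boats and cars only, in percentage points.
ideal = (20, 25)


def errors(actual):
    """Per-product difference between actual and ideal shares."""
    return [a - i for a, i in zip(actual, ideal)]


def sad(actual):
    """Sum of absolute differences."""
    return sum(abs(e) for e in errors(actual))


def sse(actual):
    """Sum of squared differences."""
    return sum(e * e for e in errors(actual))


case_a = (0, 20)   # "0% boats, 20% cars": errors (-20, -5)
case_b = (10, 15)  # "10% boats, 15% cars": errors (-10, -10)
case_c = (0, 25)   # hypothetical: errors (-20, 0)

for name, case in [("a", case_a), ("b", case_b), ("c", case_c)]:
    print(name, "abs:", sad(case), "squared:", sse(case))
```

Cases b and c tie under absolute differences, but squaring separates them, because one large miss counts for more than two medium ones.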
 
Visually, think of each day's observed production as a point P in a 3-dimensional space, located according to the 3 coordinates "boats," "cars," "planes." Put the origin (of the coordinates) at "the ideal production" (20%, 25%, 40%). In this setup, the three "production errors" with respect to the ideal (= the origin) along the 3 coordinates exactly describe the location of P. Suppose Monday's production was 10%, 10%, 60% (implicitly, trains = 20%). Then the 3 coordinates are -10%, -15%, +20%. The (square root of) SSE is the equivalent of the Euclidean metric. The sum of absolute distances is the equivalent of the Manhattan metric.
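As a quick check of the Monday example, here is a small Python sketch that shifts the origin to the ideal production and computes both distances. The variable names are my own, and values are in percentage points.

```python
import math

# Ideal and Monday's production for (boats, cars, planes), in percentage points.
ideal = (20, 25, 40)
monday = (10, 10, 60)

# Coordinates of P once the origin is placed at the ideal production.
coords = [m - i for m, i in zip(monday, ideal)]

sse = sum(c * c for c in coords)              # sum of squared errors
euclidean = math.sqrt(sse)                    # Euclidean distance from the origin
manhattan = sum(abs(c) for c in coords)       # Manhattan distance from the origin

print("coordinates:", coords)
print("SSE:", sse)
print("Euclidean distance:", euclidean)
print("Manhattan distance:", manhattan)
```

The Euclidean distance comes out a little under 27 percentage points, against 45 for the Manhattan distance.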
 
Thanks for the responses. I think the SSE is the best measure and I will go with it.
 
If you take the Euclidean style, you won't get perfect production = 1, awful production = 0, though. So what you need to do is take your metric value and divide it by the value for the worst possible production (square root of .15² + .2², etc.). Then take 1 minus that value, so that 1 is the best possible production and 0 is the worst.

That's how I would do it anyway. I'm sure somebody knows a better way.
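A rough Python sketch of this normalization follows. It assumes that "worst possible production" means the factory makes none of the four products (the spaceships case from the first post), so the worst distance is the square root of the sum of the squared ideal shares. The function name `alignment_score` is hypothetical.

```python
import math

# Ideal and actual shares (boats, cars, planes, trains), in percentage points.
ideal = (20, 25, 40, 15)
actual = (15, 20, 40, 25)


def euclidean(a, b):
    """Euclidean distance between two production vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def alignment_score(ideal, actual):
    """1 for perfect production, 0 for the assumed worst case (all zeros)."""
    # Assumption: the worst case is producing none of the target products.
    worst = euclidean(ideal, (0,) * len(ideal))
    return 1 - euclidean(ideal, actual) / worst


score = alignment_score(ideal, actual)
print("alignment score:", round(score, 3))
```

For the numbers in the first post this gives a score of roughly 0.77. Note that under this assumption a factory making only trains would land below 0, so the choice of worst case matters.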
 
