Solving Discrepancy Events - What Statistical Measure to Use?

  • Thread starter BiologyGirl
In summary, the original poster is looking for a statistical measure of the discrepancy between their ideal production goals and their factory's actual output. They start from the average absolute difference, replies suggest the sum of squared differences as an alternative, and they are unsure which is more suitable. They also ask about linear regression but are unsure how it would apply in this scenario. They ultimately decide to go with the sum of squared differences; a later reply notes that the result has to be divided by the value for the worst possible production and subtracted from 1 to get a perfect score of 1 and a worst score of 0.
  • #1
BiologyGirl
I am having trouble deciding which statistical measure to use. Although this problem is of the simplest type, none of my books seems to address exactly what I need.

Let me describe a typical example:

Suppose that I have a factory that produces four types of products, say boats, cars, planes, and trains. (Big factory, I know). For ideal production, I want 20% of the factory's output to be boats, 25% to be cars, 40% to be planes, and 15% to be trains.

Suppose the factory instead produces 15% boats, 20% cars, 40% planes, and 25% trains. How would I express, using one statistical measure, how far the factory's production is from the ideal?

I want the answer to be a percentage, so that 100% is perfect alignment with the targeted goals and 0% is no alignment (the factory produces spaceships instead).

My first inclination was simply to find the average discrepancy, that is, take the absolute value of each difference and average them. If needed, I could scale the result to lie between 0 and 100%, but something tells me that my plan is too unsophisticated. Is there a form of linear regression that I could use on data that is not described by a function but given as a finite set of values? What about weighting the standard deviation?
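A minimal Python sketch of this first plan, using the shares from the example above. The scaling step is an assumption rather than something stated in the thread: when both sets of shares add up to 100%, their absolute differences can total at most 2 (reached only when the two mixes have no products in common, as in the spaceship case), so `1 - total / 2` lands between 0 and 1. The function names are hypothetical.

```python
# Shares in the order boats, cars, planes, trains (values from the post).
IDEAL = [0.20, 0.25, 0.40, 0.15]
ACTUAL = [0.15, 0.20, 0.40, 0.25]


def mean_absolute_discrepancy(ideal, actual):
    """Average of |ideal - actual| over the product categories."""
    return sum(abs(i - a) for i, a in zip(ideal, actual)) / len(ideal)


def alignment_score(ideal, actual):
    """Map the total absolute discrepancy onto [0, 1].

    Assumption: both lists are shares that each sum to 1, so the sum of
    absolute differences is at most 2 (no overlap at all).
    """
    total = sum(abs(i - a) for i, a in zip(ideal, actual))
    return 1.0 - total / 2.0


print(f"mean absolute discrepancy: {mean_absolute_discrepancy(IDEAL, ACTUAL):.3f}")
print(f"alignment: {alignment_score(IDEAL, ACTUAL):.0%}")
```

It prints a mean absolute discrepancy of 0.050 and an alignment of 90%. Treating spaceships as a fifth category, `alignment_score([0.20, 0.25, 0.40, 0.15, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0])` comes out at 0, up to floating-point rounding.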

As you can tell, I am not a statistician (all of my experience is with statistics on functional data), but if I were just told the name of the statistical measure to use, I could figure out the rest on my own.

Thanks in advance.
 
  • #2
You are looking for a metric (distance function), so the sum of absolute differences is fine. An alternative measure is the sum of squared differences (or errors, i.e. SSE). You may have to take into account that the percentages always add up to 100%, so you only need to know 3 out of 4. In a regression, the errors are based on the difference between an actual value and a projected (estimated) value. As far as I understand, you are not trying to project anything, so whether or how regression might help is not obvious.
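Both measures mentioned in this reply can be computed directly on the example from the first post. A rough Python sketch; the helper `complete` is hypothetical and only illustrates the remark that three shares determine the fourth.

```python
IDEAL = [0.20, 0.25, 0.40, 0.15]   # boats, cars, planes, trains
ACTUAL = [0.15, 0.20, 0.40, 0.25]


def complete(three_shares):
    """Hypothetical helper: the fourth share is whatever is left of 100%."""
    return list(three_shares) + [1.0 - sum(three_shares)]


def sum_abs_diff(ideal, actual):
    """Sum of absolute differences."""
    return sum(abs(i - a) for i, a in zip(ideal, actual))


def sse(ideal, actual):
    """Sum of squared differences (squared errors)."""
    return sum((i - a) ** 2 for i, a in zip(ideal, actual))


actual = complete(ACTUAL[:3])      # trains recovered from the other three
print(f"sum of absolute differences: {sum_abs_diff(IDEAL, actual):.3f}")
print(f"sum of squared differences:  {sse(IDEAL, actual):.4f}")
```

For this example the two measures come out at 0.200 and 0.0150.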
 
  • #3
Thanks for the response. What advantage does the sum of squared differences have over the sum of absolute differences?
 
  • #4
BiologyGirl said:
Thanks for the response. What advantage does the sum of squared differences have over the sum of absolute differences?

1. It's often easier to analyze mathematically
2. The worst values are weighted more heavily, so "0% boats, 25% cars" is worse than "10% boats, 15% cars" in your example, even though both miss the boat and car targets by 20 points in total.
3. There aren't whole continua of outcomes that are considered 'equally bad', which would make it hard to decide which to prefer.
4. It's more standard -- you can find more information about it

There are also some advantages of the sum of absolute differences over the sum of squared differences:
1. It's easier to calculate by hand
2. All values are equally weighted -- the opposite of #2 above
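Points #2 and #3 can be checked with two made-up production mixes (hypothetical numbers, not from the thread) that miss the target by the same total amount: mix A misses boats and trains by 20 points each, while mix B misses every product by 10 points. The sum of absolute differences cannot tell them apart; the sum of squared differences rates the concentrated miss as worse. A short Python sketch:

```python
IDEAL = [0.20, 0.25, 0.40, 0.15]   # boats, cars, planes, trains
MIX_A = [0.00, 0.25, 0.40, 0.35]   # hypothetical: two large misses
MIX_B = [0.10, 0.15, 0.50, 0.25]   # hypothetical: four small misses


def sum_abs_diff(ideal, actual):
    """Sum of absolute differences: every point of error counts the same."""
    return sum(abs(i - a) for i, a in zip(ideal, actual))


def sse(ideal, actual):
    """Sum of squared differences: large individual errors count more."""
    return sum((i - a) ** 2 for i, a in zip(ideal, actual))


for name, mix in [("A", MIX_A), ("B", MIX_B)]:
    print(f"mix {name}: abs = {sum_abs_diff(IDEAL, mix):.2f}, "
          f"squared = {sse(IDEAL, mix):.2f}")
```

Both mixes score 0.40 on absolute differences, while the squared measure gives 0.08 for A and 0.04 for B.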
 
  • #5
Visually, think of each day's observed production as a point P in a 3-dimensional space, located according to the 3 coordinates "boats," "cars," and "planes." Put the origin of the coordinates at the ideal production (20%, 25%, 40%). In this setup, the three "production errors" with respect to the ideal (= the origin) along the 3 coordinates exactly describe the location of P. Suppose Monday's production was 10%, 10%, 60% (implicitly, trains = 20%). Then the 3 coordinates are -10%, -15%, 20%. The square root of the SSE is the equivalent of the Euclidean metric (the straight-line distance from P to the origin). The sum of absolute differences is the equivalent of the Manhattan metric.
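The Monday example can be worked through numerically. A small Python sketch, using `math.dist` (Python 3.8+) for the Euclidean distance and only the three coordinates boats, cars, planes, as in the post:

```python
import math

# Ideal production, used as the origin: boats, cars, planes.
IDEAL = (0.20, 0.25, 0.40)
# Monday's production (trains = 20% is implied by the other three).
MONDAY = (0.10, 0.10, 0.60)

# Production errors = coordinates of P relative to the ideal.
errors = tuple(m - i for m, i in zip(MONDAY, IDEAL))

euclidean = math.dist(MONDAY, IDEAL)       # square root of the SSE
manhattan = sum(abs(e) for e in errors)    # sum of absolute differences

print("errors:", [round(e, 2) for e in errors])
print(f"Euclidean: {euclidean:.4f}, Manhattan: {manhattan:.2f}")
```

The errors come out as -0.10, -0.15 and 0.20, giving a Euclidean distance of about 0.269 and a Manhattan distance of 0.45.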
 
  • #6
Thanks for the responses. I think the SSE is the best measure and I will go with it.
 
  • #7
If you take the Euclidean approach, though, you won't get perfect production = 1 and awful production = 0. What you need to do is take your metric value and divide it by the value for whatever the worst possible production is (the square root of $.15^2 + .2^2$, etc.). Then take 1 minus that ratio, so that 1 is the best possible production and 0 is the worst.

That's how I would do it, anyway. I'm sure somebody knows a better way.
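A sketch of this normalization in Python. Which production counts as "worst possible" is an assumption the post leaves open, so the sketch shows two candidates: the spaceship case from the first post (none of the four products, which appears to be what the post has in mind) and the worst mix of the four products, which for Euclidean distance means putting everything into the product with the smallest target share (trains here). The names `SPACESHIPS`, `ALL_TRAINS` and `normalized_score` are hypothetical.

```python
import math

IDEAL = [0.20, 0.25, 0.40, 0.15]    # boats, cars, planes, trains
ACTUAL = [0.15, 0.20, 0.40, 0.25]

# Assumed reference 1: the factory makes none of the four products.
SPACESHIPS = [0.0, 0.0, 0.0, 0.0]
# Assumed reference 2: all production goes to the smallest target (trains).
ALL_TRAINS = [0.0, 0.0, 0.0, 1.0]


def normalized_score(ideal, actual, worst):
    """1 - d(actual, ideal) / d(worst, ideal), with d the Euclidean distance.

    1 means perfect production; 0 means as far off as the chosen `worst`.
    """
    return 1.0 - math.dist(ideal, actual) / math.dist(ideal, worst)


print(f"vs spaceships: {normalized_score(IDEAL, ACTUAL, SPACESHIPS):.0%}")
print(f"vs all trains: {normalized_score(IDEAL, ACTUAL, ALL_TRAINS):.0%}")
```

With the first post's numbers the two references give roughly 77% and 88%, so the choice of "worst" matters. With the spaceship reference, a mix such as 100% trains lies farther from the ideal than the reference itself and would receive a negative score.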
 

Related to Solving Discrepancy Events - What Statistical Measure to Use?

1. What is a discrepancy event?

A discrepancy event is a situation where there is a difference between the expected (or ideal) outcome and the actual outcome. It can occur with any kind of data or experiment, and identifying and addressing such events improves the accuracy and reliability of results.

2. How do you identify discrepancy events?

Discrepancy events are identified by comparing the expected or ideal outcome with the actual outcome, whether by analyzing data, conducting experiments, or performing statistical tests. Any difference between the expected and actual results can be considered a discrepancy event.

3. What is a statistical measure?

A statistical measure is a numerical value or calculation that summarizes a set of data. It is used to describe the characteristics or patterns of the data, and can be used to compare different datasets or make predictions. Common statistical measures include mean, median, mode, standard deviation, and correlation coefficient.

4. How do you choose the appropriate statistical measure for discrepancy events?

The appropriate statistical measure depends on the type of data and the research question being addressed. For example, if the discrepancy involves numerical data, measures such as the mean or standard deviation may be used. If it involves categorical data, measures such as the mode or the chi-square statistic may be used. Consider the data and the research question carefully before choosing a measure.
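For categorical data like the thread's product mix, Pearson's chi-square statistic compares observed counts with the counts expected under the target mix. The thread gives only percentages, so the total of 1,000 units below is a hypothetical figure; the statistic scales with that total and needs real counts to be meaningful. A plain-Python sketch that computes the statistic but no p-value:

```python
IDEAL = [0.20, 0.25, 0.40, 0.15]    # target shares: boats, cars, planes, trains
ACTUAL = [0.15, 0.20, 0.40, 0.25]   # observed shares
TOTAL_UNITS = 1000                  # hypothetical production total


def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over categories; expected counts must be > 0."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [share * TOTAL_UNITS for share in ACTUAL]
expected = [share * TOTAL_UNITS for share in IDEAL]
statistic = pearson_chi_square(observed, expected)
dof = len(IDEAL) - 1   # shares sum to 100%, so one category is not free

print(f"chi-square = {statistic:.2f} with {dof} degrees of freedom")
```

Comparing the statistic against a chi-square distribution with 3 degrees of freedom (for instance with `scipy.stats.chisquare`, given real counts) would turn it into a goodness-of-fit test.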

5. How can statistical measures help to solve discrepancy events?

Statistical measures provide quantitative evidence for identifying and understanding the cause of a discrepancy. They can also be used to make predictions and to evaluate the effectiveness of potential solutions. By analyzing and interpreting data with statistical measures, scientists can identify patterns and trends and make informed decisions about how to address discrepancy events.
