How to read Z-score histogram for variability?

  • Context: Undergrad
  • Thread starter: merci
  • Tags: Histogram

Discussion Overview

The discussion revolves around interpreting Z-score normalized histograms of nutritional values from various brands of muesli bars. Participants explore how to analyze variability in the data represented by these histograms, addressing issues such as gaps in the data, outliers, and the appropriateness of normalization methods.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant seeks guidance on interpreting histograms created from Z-score normalized nutritional data, questioning how to assess variability.
  • Another participant notes that gaps in histograms are acceptable and indicate unlikely values, suggesting that the shape of the histogram can provide insights into variance.
  • A different viewpoint emphasizes the importance of context when standardizing data, cautioning against losing dimensionality and scale information.
  • Some participants discuss the appropriateness of using Z-scores for normalization, with one expressing confusion about whether to revert to original data or continue with the normalized approach.
  • There are suggestions that box plots might be a better choice for comparison, although the original poster is required to use histograms.
  • Concerns are raised about the interpretation of outliers and their relationship to variance, with a need for case-by-case analysis.

Areas of Agreement / Disagreement

Participants express differing opinions on the effectiveness of normalization methods and the interpretation of histograms. There is no consensus on whether the Z-score normalization is the best approach or if reverting to original data would be more appropriate.

Contextual Notes

Participants highlight limitations related to the interpretation of histograms, including the potential loss of context when normalizing data and the challenges posed by varying scales of measurement across different nutritional variables.

merci
Hi all,

I would be most grateful for some pointers on this question.
Question: There is a range of different brands of muesli bars, each with nutritional information. E.g. muesli bar A has values for variables such as vitamin, fat, potassium and so on. I have been asked to plot a histogram for each variable. Since the variables have different scales, I normalised them using Z-scores. I got a few histograms with outliers, one or two with gaps between the columns, and a few jumbled-up charts. I have tried to reclassify the charts with gaps, to no avail.
Next, how do I read these histograms to find the variable with the largest variability?

My understanding is that a data point with a Z-score above 0 is above average, and that having outliers in a chart probably means the standard deviation is large? Besides these guidelines, how do I interpret the graphs correctly for variability?

Many thanks
 
Hi Merci,

I'm not sure I quite follow you, but if I understand correctly, you have normalized some data and want to study the resulting histograms.

  1. You say you tried to reclassify the gaps in your histograms. Why would you do that? If your histograms have gaps, that's OK; there is nothing wrong with gaps in a histogram. It just means those values are not possible or are unlikely.
  2. If you want a rough idea of the variability in a histogram, just check how flat or sharp the "bell" is: the flatter it is, the larger the variance.
  3. An outlier is an unusual data point in the set. It does not mean the value is wrong, and you cannot immediately conclude that an outlier means a bigger variance either; you need to analyze outliers case by case.

But anyway, since you are talking about brands of muesli, and I don't think you have a large number of those, the histograms might not be very informative to begin with.
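To see how gaps arise with only a handful of brands, here is a minimal sketch in Python. The Z-scores, the bin range and the helper name `bin_counts` are made up for illustration and are not from the thread.

```python
from collections import Counter

def bin_counts(values, lo, hi, n_bins):
    """Count how many values fall into each of n_bins equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = Counter()
    for v in values:
        # Clamp so that v == hi lands in the last bin instead of overflowing.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [counts.get(i, 0) for i in range(n_bins)]

# Hypothetical Z-scores for 8 brands (made-up numbers, not from the thread).
z_scores = [-1.2, -0.9, -0.8, -0.1, 0.0, 0.3, 0.4, 2.3]

counts = bin_counts(z_scores, lo=-1.5, hi=2.5, n_bins=8)

# Print a crude text histogram; empty rows are the "gaps".
for i, c in enumerate(counts):
    left = -1.5 + i * 0.5
    print(f"[{left:+.1f}, {left + 0.5:+.1f})  {'#' * c}")
```

With eight values spread over eight bins, several bins stay empty and one isolated value sits far to the right. That is the "gaps plus outlier" picture, and it says more about the small sample than about the nutrient itself.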
 
Hey merci and welcome to the forums.

In terms of the raw variance of the sample, you calculate this in the standard way. Then, if your assumptions are correct, you can estimate the underlying population variance using either classical methods (chi-square distribution) or Bayesian methods. To understand the difference, you need to consider ideas from hypothesis testing, which deal with getting the right answer as well as Type I and Type II errors and what they mean qualitatively and quantitatively.
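For reference, the classical (chi-square) route usually takes the following form. This assumes the $n$ observations are independent draws from a normal population, which is an assumption added here and not something stated in the thread. $s^2$ is the sample variance, $\sigma^2$ the population variance, and $\chi^2_{p,\,n-1}$ the $p$-quantile of the chi-square distribution with $n-1$ degrees of freedom.

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2,
\qquad
\frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2,\,n-1}}
\;\le\; \sigma^2 \;\le\;
\frac{(n-1)\,s^2}{\chi^2_{\alpha/2,\,n-1}}
```

The interval on the right is a $100(1-\alpha)\%$ confidence interval for $\sigma^2$. With only a few brands it will be wide.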

If all variables are considered independent and can't be related to each other, then you will just have to work in the natural units for vitamins, fat and potassium. There is nothing else you can really do, since you have an apples-and-oranges comparison.

The general standardization procedure (subtracting the mean and dividing the result by the standard deviation) will always give you a mean of zero and a variance of one, whatever the distribution, and you can show this using the properties of expectation and variance. It is not only used for normally distributed data.
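A short derivation of what standardization guarantees, namely mean zero and variance one, for any variable $X$ with finite mean $\mu$ and standard deviation $\sigma > 0$ (the symbols are introduced here for illustration):

```latex
Z = \frac{X-\mu}{\sigma}
\quad\Longrightarrow\quad
\mathrm{E}[Z] = \frac{\mathrm{E}[X]-\mu}{\sigma} = 0,
\qquad
\mathrm{Var}(Z) = \frac{\mathrm{Var}(X)}{\sigma^{2}} = \frac{\sigma^{2}}{\sigma^{2}} = 1
```

Nothing in these steps uses the shape of the distribution, so the result holds whether or not $X$ is normal. The shape itself (skew, gaps, outliers) is unchanged by the transformation.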

Because you have standardized the new mean to be zero, zero corresponds to 'average', and values greater than zero are greater than average, just as you suspected.

I would be careful about standardizing your data, though: you want to interpret your data for what it is and keep the context of its dimensions and scale, so that you don't lose this important information. If you transform your data in the wrong way, or use transformed data in the wrong way, then your analysis will get screwed up.
 
Hi Viraltux & Chiro

Thanks for helping. =)
Viraltux: I have to use histograms; there is no other choice based on the question. Supposedly, a box plot would be a better choice for the comparison? A flatter bell means a lower curve, correct (lower frequencies, wider spread)?

Chiro: Yes, that thought has occurred to me before. Am I using the right method by normalising first, or should I leave the data alone? The problem is that after plotting one or two brands, I noticed that the scales are all different, e.g. the Vitamin 1 histogram versus the Calories 1 histogram. From these, I have to analyse variability. I then read somewhere on the net that if you need to compare different scales, it is better to normalise using the Z-score method for easy reference. Am I doing the wrong thing now? Should I revert to the original data and plot them? ~ confused.
 
Standardizing is useful for plotting when you want to get some of the features of your distribution visually.
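One thing worth checking before comparing variability across Z-scored histograms: every standardized variable has a standard deviation of exactly one, so the spread of the Z-scores cannot rank the variables. The sketch below illustrates this with made-up numbers (none of the values come from the thread). It also prints the coefficient of variation, which is one unit-free measure of spread offered here as an option, not something the thread prescribes.

```python
import statistics

def z_scores(values):
    """Standardize values: subtract the sample mean, divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return [(v - mean) / sd for v in values]

# Hypothetical values for six brands (made-up numbers, not from the thread).
fat_g = [2.0, 3.5, 4.0, 4.5, 5.0, 11.0]
potassium_mg = [80.0, 95.0, 100.0, 105.0, 110.0, 120.0]

for name, raw in [("fat (g)", fat_g), ("potassium (mg)", potassium_mg)]:
    z = z_scores(raw)
    raw_sd = statistics.stdev(raw)
    cv = raw_sd / statistics.mean(raw)  # coefficient of variation: unit-free spread
    print(f"{name}: raw SD={raw_sd:.2f}, CV={cv:.2f}, "
          f"SD of Z-scores={statistics.stdev(z):.2f}")
```

In this toy data the raw standard deviation is larger for potassium simply because it is measured in bigger numbers, while the spread relative to the mean is larger for fat. The Z-scored histograms still show shape (skew, gaps, outliers), but any statement about which variable varies most has to come from the original units or from a relative measure.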

In terms of the scaling used, though, you would need to put your data into context. For example, instead of the sample's own mean and variance, you might standardize your results using a mean and variance that correspond to particular characteristics, such as the 'average' intake per day for some class of people (children, adults, athletes, etc.).

Doing this will put your data into perspective relative to something, which in this case is the recommended or actual intake for different classes of people.
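A minimal sketch of that idea, assuming you have an external reference mean and standard deviation for each nutrient. The function name `reference_z` and all numbers are hypothetical placeholders, not real dietary figures.

```python
def reference_z(value, ref_mean, ref_sd):
    """Score a value against an external reference distribution
    instead of the sample's own mean and standard deviation."""
    if ref_sd <= 0:
        raise ValueError("ref_sd must be positive")
    return (value - ref_mean) / ref_sd

# Hypothetical reference daily intakes (placeholders, NOT real dietary guidance).
reference = {
    "fat_g": {"mean": 70.0, "sd": 15.0},
    "potassium_mg": {"mean": 3500.0, "sd": 700.0},
}

# Hypothetical nutritional content of one bar.
bar = {"fat_g": 5.0, "potassium_mg": 100.0}

scores = {
    nutrient: reference_z(amount, reference[nutrient]["mean"], reference[nutrient]["sd"])
    for nutrient, amount in bar.items()
}

for nutrient, score in scores.items():
    print(f"{nutrient}: {score:+.2f} reference SDs from the reference mean")
```

Unlike Z-scores computed from the sample itself, these scores do not have unit variance by construction, so their spread across brands still carries information.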

Before you actually use this data in further analysis, you need to check with someone with statistical knowledge how to use it in that analysis. You could ask here for some advice on this, but ultimately you will have to make sure that your analyses, your data, and your assumptions and interpretations are sound.
 
