How to read a Z-score histogram for variability?

  • Thread starter merci
  • Tags
    Histogram
In summary, the conversation discusses the use of histograms to plot nutritional values of different brands of muesli bars. The variables were normalized using Z-score due to different scales, resulting in some histograms with outliers and gaps between columns. The discussion also touches on interpreting histograms for variability and the appropriateness of using standardized data.
  • #1
merci
Hi All

I would be most grateful for some pointers on this question.
Question: There is a range of different brands of muesli bars, each with nutritional information. E.g. muesli bar A has variables such as vitamin, fat and potassium values. I have been asked to plot histograms of the variables. Since the variables have different scales, I normalised them using Z-scores. I got a few histograms with outliers, one or two with gaps between the columns, and a few jumbled-up charts. I have tried to reclassify the charts with gaps, to no avail.
Next, how do I read these histograms to find which variable has the largest variability?

My understanding is that a data point with a Z-score above 0 is above average, and that having outliers in the charts would probably mean that the standard deviation is large? Besides these guidelines, how do I interpret the graphs correctly for variability?

Many thanks
 
  • #2
Hi Merci,

I'm not sure I quite follow you, but if I understand correctly, you normalized some data and want to study the resulting histograms.

  1. You say you tried to reclassify the gaps in your histograms. Why would you do that? If your histograms have gaps, that's OK; there is nothing wrong with gaps in a histogram. It just means no observations fell in those ranges, either because such values are impossible or unlikely, or because the sample is small.
  2. If you want a rough idea of the variability in a histogram, check how flat or sharp the "bell" is: on the same axis scale, the flatter the bell, the larger the variance.
  3. An outlier is an unusual data point in the set. It does not mean the value is wrong, and you cannot immediately conclude that an outlier means a bigger variance; you need to analyze outliers case by case.

But anyway, since you are talking about brands of muesli, and I don't think you have a large number of those, the histograms might not be very informative to begin with.
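To see why gaps are expected with only a handful of brands, here is a minimal sketch. The fat values are made up for illustration and `empty_bins` is a hypothetical helper, not something from the original question. It counts empty histogram bins for the same data at several bin counts.

```python
import numpy as np

# Hypothetical fat content (g per bar) for 12 brands; the values are made up.
fat_g = np.array([3.1, 3.4, 3.5, 3.9, 4.0, 4.2, 4.4, 4.8, 5.0, 5.1, 5.6, 9.8])

def empty_bins(values, n_bins):
    """Count histogram bins that contain no observations."""
    counts, _edges = np.histogram(values, bins=n_bins)
    return int(np.sum(counts == 0))

# The same data looks gap-free or full of gaps depending on the bin count.
for n_bins in (3, 6, 12):
    counts, _edges = np.histogram(fat_g, bins=n_bins)
    print(f"{n_bins:>2} bins: counts={counts.tolist()}, "
          f"empty={empty_bins(fat_g, n_bins)}")
```

With few observations and one unusually high value, narrow bins leave empty stretches between the bulk of the data and the extreme point. Using fewer, wider bins hides the gaps but also hides detail, so the gaps are a property of the sample size and binning rather than a fault in the chart.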
 
  • #3
Hey merci and welcome to the forums.

The raw variance of the sample is calculated in the standard way. If your assumptions are correct, you can then get an estimate of the underlying population variance using either classical methods (chi-square distribution) or Bayesian methods. To understand the difference, you need to consider ideas from hypothesis testing, which deal with getting the right answer as well as with Type I and Type II errors and what they mean qualitatively and quantitatively.

If all variables are considered independent and can't be related to each other then you will just have to deal in the natural units for Vitamins, Fat and Potassium and there is nothing else you can really do since you really have an apples and oranges comparison.

The general standardization procedure (subtracting the mean and dividing the result by the standard deviation) will always give your data a mean of zero and a variance of one, no matter what the original distribution is, and you can show this using the properties of expectation and variance. It is not just used for normally distributed data. Note that it does not change the shape of the distribution, so it does not make the data normal.
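A small numerical check of this, as a sketch only: the data values are invented, and `zscore` and `skewness` are illustrative helpers using the population standard deviation (NumPy's default `ddof=0`).

```python
import numpy as np

def zscore(values):
    """Standardize: subtract the mean, divide by the population standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

def skewness(values):
    """Third standardized moment, a simple measure of asymmetry."""
    return float(np.mean(zscore(values) ** 3))

# Made-up values on very different scales (assumption, not data from the thread).
potassium_mg = np.array([80.0, 95.0, 100.0, 110.0, 120.0, 135.0, 150.0, 320.0])
fat_g = np.array([3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5])

for name, raw in (("potassium", potassium_mg), ("fat", fat_g)):
    z = zscore(raw)
    print(f"{name}: raw sd={raw.std():.2f}, "
          f"z mean is zero: {abs(z.mean()) < 1e-12}, z sd={z.std():.2f}, "
          f"skew raw={skewness(raw):.2f}, skew z={skewness(z):.2f}")
```

Both variables end up with mean 0 and standard deviation 1, while the skewness is the same before and after. That is why histograms of Z-scores can be compared for shape but cannot be used to rank the variables by variability.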

Because the standardized data has mean zero, zero corresponds to 'average', and thus values greater than zero are greater than average, just as you suspected.

I would be careful though about standardizing your data because you want to interpret your data for what it is and keep the context of your dimensionality and scale in a way that you don't lose this important information. If you transform your data in the wrong way or use transformed data in the wrong way, then your analysis will get screwed up.
 
  • #4
Hi Viraltux & Chiro

Thanks for helping. =)
Viraltux: I have to use histograms. There is no other choice based on the question. Supposedly, a box plot would be a better choice for the comparison? A flatter bell means the curve is lower in height, correct (lower frequencies, wider spread)?

Chiro: Yes, the thought has occurred to me before. Am I right to normalise the data first, or should I leave it alone? The problem is that after plotting one or two brands, I noticed that the scales are all different, e.g. the Vitamin 1 histogram and the Calories 1 histogram. From these, I have to analyse variability. I then read somewhere on the net that if you need to compare variables on different scales, it is better to normalise using the Z-score method for easy reference. Am I doing the wrong thing now? Should I revert to the original data and plot that? ~ confused.
 
  • #5
Standardizing is useful for plotting when you want to get some of the features of your distribution visually.

In terms of the scaling used, though, you would need to put your data into context. For example, instead of the sample's own mean and variance, you might standardize your results using means and variances that correspond to particular characteristics, such as the 'average' intake per day for some class of people (children, adults, athletes etc.).

Doing this puts your data into perspective relative to something, in this case the recommended or actual intake for different classes of people.
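One way this could look in practice is sketched below. The reference means and standard deviations are placeholders invented for illustration, not real dietary figures, and `reference_score` is a hypothetical helper.

```python
import numpy as np

# Hypothetical reference values for one class of people (e.g. adults):
# mean daily intake and its standard deviation. Placeholder numbers only,
# not real dietary guidelines.
REFERENCE = {
    "fat_g": {"mean": 70.0, "sd": 15.0},
    "potassium_mg": {"mean": 3000.0, "sd": 600.0},
}

def reference_score(values, nutrient, reference=REFERENCE):
    """Score values against an external reference, not the sample's own mean and sd."""
    ref = reference[nutrient]
    return (np.asarray(values, dtype=float) - ref["mean"]) / ref["sd"]

# Made-up per-bar values for eight brands.
fat_g = np.array([3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5])
potassium_mg = np.array([80.0, 95.0, 100.0, 110.0, 120.0, 135.0, 150.0, 320.0])

fat_scores = reference_score(fat_g, "fat_g")
potassium_scores = reference_score(potassium_mg, "potassium_mg")

# Unlike sample Z-scores, these spreads are not forced to 1, so they can differ.
print("fat spread:", round(float(fat_scores.std()), 4))
print("potassium spread:", round(float(potassium_scores.std()), 4))
```

Because the same external yardstick is applied to every brand, the scaled variables share a common unit while still keeping their own spread, which sample Z-scores do not.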

Before you actually use this data in further analysis, you need to check with someone with statistical knowledge how to use it in that analysis. You could ask here for some advice on this, but ultimately you will have to make sure that your analyses, your data, your assumptions, and your interpretations are sound.
 

1. What is a Z-score histogram?

A Z-score histogram is a type of graph used to visualize the distribution of data and identify any outliers or extreme values. It plots the frequency of data points against their corresponding Z-scores, which are standardized values that represent how many standard deviations a data point is from the mean.

2. How do I interpret a Z-score histogram?

To interpret a Z-score histogram, look at the shape of the distribution and at where individual points fall. The shape can be symmetric, skewed left, or skewed right, and standardizing does not change it. The center and spread carry no information on their own, because Z-scores have mean 0 and standard deviation 1 by construction. Outliers show up as points with a large absolute Z-score, far from the bulk of the data.
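As a rough sketch of flagging outliers by Z-score: the cutoff of 2 is an arbitrary assumption chosen for this small example, the data are made up, and `flag_outliers` is a hypothetical helper.

```python
import numpy as np

def flag_outliers(values, threshold=2.0):
    """Return a boolean mask of points whose |Z-score| exceeds the threshold.

    The threshold is a convention, not a rule; 2.0 is assumed here.
    """
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return np.abs(z) > threshold

# Made-up potassium values (mg per bar) for eight brands.
potassium_mg = np.array([80.0, 95.0, 100.0, 110.0, 120.0, 135.0, 150.0, 320.0])

mask = flag_outliers(potassium_mg)
print("flagged values:", potassium_mg[mask].tolist())
```

With small samples the choice of cutoff matters: when Z-scores use the population standard deviation, no point among n observations can have an absolute Z-score above sqrt(n - 1), so a cutoff of 3 can never be exceeded with 10 or fewer values. A flagged point still needs to be examined case by case before it is treated as an error.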

3. What does a high variability in a Z-score histogram indicate?

Z-scores computed from a dataset's own mean and standard deviation always have a standard deviation of 1, so a Z-score histogram cannot by itself show that one variable is more variable than another. High variability has to be read from the original values: it means the data points are spread widely around their mean. What a Z-score histogram can show is how that spread is distributed, for example whether a few extreme values account for most of it.

4. How can I use a Z-score histogram to compare two datasets?

To compare two datasets using Z-score histograms, compare the shapes of the two distributions: symmetry, skew, gaps, and outliers. The means and standard deviations of the Z-scores are 0 and 1 for both datasets, so to see which dataset has the higher average or the wider spread you need the means and standard deviations of the original values.
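When the datasets are in different units, one common option for comparing spread is the coefficient of variation (standard deviation divided by the mean) of the original values. This is a technique introduced here for illustration, not one named in the thread, and it only makes sense for positive quantities measured from a true zero. The data below are made up.

```python
import numpy as np

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    values = np.asarray(values, dtype=float)
    return float(values.std(ddof=1) / values.mean())

# Made-up per-bar values for eight brands, in different units.
potassium_mg = np.array([80.0, 95.0, 100.0, 110.0, 120.0, 135.0, 150.0, 320.0])
fat_g = np.array([3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5])

for name, raw in (("potassium (mg)", potassium_mg), ("fat (g)", fat_g)):
    print(f"{name}: mean={raw.mean():.2f}, sd={raw.std(ddof=1):.2f}, "
          f"cv={coefficient_of_variation(raw):.2f}")
```

The raw standard deviations are not directly comparable because one is in milligrams and the other in grams. The coefficient of variation removes the units, so the variable with the larger value is the more variable one relative to its own average.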

5. Why is it important to consider variability when analyzing data using Z-scores?

Variability is important to consider when analyzing data using Z-scores because it tells us how much the data points deviate from the mean. This can provide important insights into the distribution of the data and help identify any unusual or extreme values. Additionally, variability is a key component in calculating the probability of certain events occurring within a dataset.
