How do I compare a model to logarithmic data?

In summary, the conversation discusses the best way to measure the fit of a model to a large set of data on galaxy cluster masses. The model in question is quadratic, and the data span several Log10 decades. The conversation also considers the role of measurement errors in both the x and y values, and whether to use a chi-square or r-square method after converting the errors and values with a Log10 operation. There is no universally agreed-upon method for measuring fit, so it is suggested to look at how other papers in the field report fit and to consider multiple ways of …
  • #1
Jules Winnfield
I have a model which is quadratic (e.g. ##y = k x^2##). I'm comparing it against a large set of data (galaxy cluster masses) which spans several Log10 decades (e.g. ##10^{11}## to ##10^{15}## solar masses). What is the right way to say how well the model fits the data? Obviously the errors in the heavier galaxy clusters are going to count for more than the errors in the lighter clusters if I compare on a linear scale.

Is it valid to convert the errors and values using a Log10 operation and perform a chi-square or r-square on those values?
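A minimal sketch of what this question is asking about, using made-up masses (the arrays below are hypothetical, not data from the thread). It converts values and errors to Log10 with first-order error propagation, ##\sigma_{\log_{10} M} \approx \sigma_M / (M \ln 10)##, which is only a reasonable approximation when the fractional errors are small, and then shows how much each cluster contributes to the summed squared residual on each scale.

```python
import numpy as np

def log10_with_error(value, sigma):
    """Return log10(value) and its first-order propagated 1-sigma error."""
    value = np.asarray(value, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return np.log10(value), sigma / (value * np.log(10.0))

# Hypothetical observed masses (solar masses) with 20% fractional errors.
m_obs = np.array([1.2e11, 0.8e12, 1.1e13, 0.9e14, 1.3e15])
m_err = 0.2 * m_obs
# Hypothetical model predictions for the same clusters.
m_model = np.array([1e11, 1e12, 1e13, 1e14, 1e15])

log_m, log_err = log10_with_error(m_obs, m_err)

# Squared residuals on the linear scale and on the Log10 scale.
lin_sq = (m_obs - m_model) ** 2
log_sq = (log_m - np.log10(m_model)) ** 2

# Fraction of the total squared residual contributed by each cluster.
lin_share = lin_sq / lin_sq.sum()
log_share = log_sq / log_sq.sum()

print("linear-scale shares:", np.round(lin_share, 4))
print("log-scale shares:   ", np.round(log_share, 4))
print("errors in dex:      ", np.round(log_err, 4))
```

With these invented numbers the heaviest cluster supplies almost all of the linear-scale sum, while on the Log10 scale no single cluster dominates. A constant fractional error also becomes a constant error in dex, which is one reason the log transform is attractive here.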
 
  • #2
Jules Winnfield said:
I have a model which is quadratic (e.g. ##y = k x^2##).
Is ##k## a given constant, or is part of the question how to find ##k## by fitting that form of model to the data?

I'm comparing it against a large set of data (galaxy cluster masses) which spans several Log10 decades (e.g. ##10^{11}## to ##10^{15}## solar masses). What is the right way to say how good the data fits the model?
That isn't a well-posed mathematical question unless you also specify a model for how the data deviate from the model ##y = kx^2##.

For example, one model is that the ##y = kx^2## is always correct and that deviations of the model ##y = kx^2## from the data are due to measurement errors in ##x## or ##y##. If you use the usual sort of regression to fit the model to the data, you assume there are errors in measuring ##y##, but no errors in measuring ##x##.

Another model would be that there are no errors in measuring ##x## and ##y## and that the deviations of ##y## from the observed data are due to the fact that the model is designed to predict the mean value of a population. So the differences between the observed values of ##y## and the predicted mean values of ##y## are not "errors"; they are just "deviations" that are actually present in the population.

A simplified model of the "errors" or "deviations" in ##y## would be that they have a normal distribution with mean 0 and a standard deviation ##\sigma## that is the same for all values of ##x##. A different model would be that the logs of the "errors" or "deviations" have a normal distribution with similar parameters. There is obviously a difference between those two models. What does the science behind the model say about the nature of the errors or deviations?
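One way to write the two noise models down, as a sketch only: the symbols ##\epsilon_i##, ##\eta_i## and ##\sigma_{\log}## are introduced here for illustration, and the second line is one common reading of "normal in the log" (a multiplicative scatter), not necessarily the exact model the post has in mind.

$$y_i = k x_i^2 + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2)$$

$$\log_{10} y_i = \log_{10} k + 2 \log_{10} x_i + \eta_i, \qquad \eta_i \sim N(0, \sigma_{\log}^2)$$

The second form is equivalent to ##y_i = k x_i^2 \cdot 10^{\eta_i}##, so the scatter is a roughly constant fraction of the predicted value rather than a constant absolute amount. Under the first model a least-squares comparison on the linear scale is the natural one; under the second, the natural comparison is least squares on the Log10 values.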
 
  • #3
Stephen Tashi said:
Is ##k## a given constant, or is part of the question how to find ##k## by fitting that form of model to the data?
That isn't a well-posed mathematical question unless you also specify a model for how the data deviate from the model ##y = kx^2##.
The exact form of the model is: $$M(r)=\frac{k r^2}{G}$$ Both ##k## and ##G## are known. I have a bunch of data from galaxy clusters that gives me the observed mass. I want to know how my model compares to the observations. The observed masses cover a range of ##10^{11}## to ##10^{15}## solar masses, so ##\chi^2## on a linear scale won't work because the more massive clusters will weigh more than the lighter ones. There are errors in both the measured ##r## and the measured mass. What is the best way to say how good a fit my model is to the observed data?
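For readers who want something concrete, here is a hedged sketch of one possible statistic, not a recommendation made in the thread: a chi-square computed in Log10 space, with the error in ##r## folded into the error in ##M## through the slope of the model (often called an "effective variance" approach). Since ##\log_{10} M = \log_{10}(k/G) + 2 \log_{10} r##, an error ##\sigma_{\log r}## contributes ##2\sigma_{\log r}## to the predicted ##\log_{10} M##. The function name, the argument `k_over_g`, and the sample numbers are all hypothetical, and the sketch assumes small, independent, roughly Gaussian errors.

```python
import numpy as np

LN10 = np.log(10.0)

def chi2_log_space(r, r_err, m_obs, m_err, k_over_g):
    """Chi-square of log10(M_obs) against log10(k r^2 / G).

    Errors in r are propagated through the model slope (2 in log-log
    space) and added in quadrature to the errors in M.
    Returns (chi2, chi2 per data point); no parameters are fitted.
    """
    r, r_err, m_obs, m_err = (
        np.asarray(a, dtype=float) for a in (r, r_err, m_obs, m_err)
    )
    sig_log_r = r_err / (r * LN10)      # first-order error in log10(r)
    sig_log_m = m_err / (m_obs * LN10)  # first-order error in log10(M)
    log_m_model = np.log10(k_over_g) + 2.0 * np.log10(r)
    resid = np.log10(m_obs) - log_m_model
    var_eff = sig_log_m ** 2 + (2.0 * sig_log_r) ** 2
    chi2 = float(np.sum(resid ** 2 / var_eff))
    return chi2, chi2 / r.size

# Hypothetical data in arbitrary units where k/G = 1.
r = np.array([1.0, 10.0, 100.0])
m_obs = r ** 2 * np.array([1.1, 0.9, 1.05])
chi2, chi2_per_point = chi2_log_space(r, 0.05 * r, m_obs, 0.10 * m_obs, 1.0)
print(chi2, chi2_per_point)
```

Because ##k## and ##G## are fixed in advance, the sketch divides by the number of points rather than subtracting fitted parameters. Whether this statistic actually follows a chi-square distribution depends on the error model discussed earlier in the thread.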
 
  • #4
Jules Winnfield said:
What is the best way to say how good a fit my model is to the observed data?

How do you define "good"? If you can't define "good" precisely, perhaps you can give examples of how one way can be "better" than another.

The discipline of mathematical statistics does not define or provide a universally "good" or best way to measure the fit of data to a model.

If you are asking for a way to measure the fit that is good in the sense of being effective human-to-human communication, then look in scientific journals that are read by the audience for your work and see how papers in those journals report the fit of data to models. If the purpose is human-to-human communication, you can report the fit of the data to the model in ways that are traditional. For example, you can report the mean square error between M-observed and M-predicted. You can also report the fit in more than one way: for instance, the mean square error in M together with the mean square error in log(M).
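As a small illustration of reporting the fit in both ways, here is a sketch with invented numbers (the arrays `m_obs` and `m_pred` are hypothetical). It computes the mean square error in M and in Log10(M), and also the root of the latter, which reads as a typical scatter in dex.

```python
import numpy as np

def mse(observed, predicted):
    """Mean square error between two equal-length sequences."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((observed - predicted) ** 2))

# Hypothetical observed and model-predicted masses (solar masses).
m_obs = np.array([1.2e11, 0.8e12, 1.1e13, 0.9e14, 1.3e15])
m_pred = np.array([1.0e11, 1.0e12, 1.0e13, 1.0e14, 1.0e15])

mse_linear = mse(m_obs, m_pred)                   # dominated by the largest masses
mse_log = mse(np.log10(m_obs), np.log10(m_pred))  # every decade counts equally
rms_dex = np.sqrt(mse_log)                        # typical scatter in dex

print(f"MSE in M:        {mse_linear:.3e}")
print(f"MSE in log10(M): {mse_log:.3e}")
print(f"RMS scatter:     {rms_dex:.3f} dex")
```

Quoting both numbers side by side lets a reader see at a glance whether the agreement is driven by a few massive clusters or holds across the whole mass range.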

Since you mention chi-square, are you intending to do a hypothesis test of some sort?
 

1. How do I compare a model to logarithmic data?

To compare a model to logarithmic data, you first need to plot the logarithmic data on a graph. Then, you can overlay your model's predicted values on the same graph. This will allow you to visually compare the fit of your model to the actual data. Additionally, you can use statistical methods such as calculating the correlation coefficient or performing a regression analysis to quantify the relationship between the model and the logarithmic data.

2. What is the best way to represent logarithmic data?

The best way to represent logarithmic data is by using a logarithmic scale on a graph. This allows for a more accurate representation of the data, as it shows the magnitude of change on a consistent scale. It also helps to visualize any patterns or trends in the data that may not be apparent on a linear scale.

3. Can I use a linear model to compare to logarithmic data?

It depends on which scale you mean. A straight line fitted to the raw values is usually a poor choice when the data span several decades and follow a power law. After a log transform, however, a power law becomes linear: ##y = k x^2## is equivalent to ##\log_{10} y = \log_{10} k + 2 \log_{10} x##, a straight line with slope 2 in log-log space. A linear model applied to the logged values, or a non-linear regression on the original values, can therefore both be appropriate.
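A quick numerical check of the log-log relationship, as a sketch with synthetic noise-free data (the value of `k` and the grid of `x` are arbitrary choices for illustration). It fits a straight line to the logged values with `numpy.polyfit` and recovers the exponent and the constant.

```python
import numpy as np

# Synthetic, noise-free quadratic data spanning three decades in x.
k = 3.0                     # arbitrary constant chosen for the example
x = np.logspace(0, 3, 20)
y = k * x ** 2

# Degree-1 fit in log-log space: log10(y) = slope * log10(x) + intercept.
slope, intercept = np.polyfit(np.log10(x), np.log10(y), 1)

print("fitted exponent:", slope)          # close to 2 for a quadratic law
print("fitted constant:", 10 ** intercept)  # close to k
```

With real, noisy data a fitted slope far from 2 would suggest that the quadratic form itself is in doubt, which is a different question from how well a fixed quadratic model matches the observations.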

4. How do I determine if my model is a good fit for logarithmic data?

There are several ways to determine if your model is a good fit for logarithmic data. One way is to visually compare the predicted values of your model to the actual data on a graph. Another way is to calculate the mean squared error (MSE) or root mean squared error (RMSE) between the predicted values and the actual data. A lower MSE or RMSE indicates a better fit for the model.

5. What are the limitations of comparing a model to logarithmic data?

One limitation of comparing a model to logarithmic data is that it can be challenging to interpret the results. This is because the relationship between the variables in logarithmic data is non-linear, making it more difficult to understand and explain. Additionally, extrapolating beyond the range of the data can be unreliable and should be done with caution.
