How do I compare a model to logarithmic data?

1. Dec 7, 2016

Jules Winnfield

I have a model which is quadratic (e.g. $y = k x^2$). I'm comparing it against a large set of data (galaxy cluster masses) which spans several decades (e.g. $10^{11}$ to $10^{15}$ solar masses). What is the right way to say how well the model fits the data? Obviously, if I compare on a linear scale, the errors in the heavier galaxy clusters are going to count for more than the errors in the lighter clusters.

Is it valid to take the Log10 of the values and their errors and compute a chi-square or R-squared on those transformed values?

Last edited: Dec 7, 2016
2. Dec 7, 2016

Stephen Tashi

Is $k$ a given constant, or is part of the question how to find $k$ by fitting that form of model to the data?

That isn't a well-posed mathematical question unless you also specify a model for how the data deviate from the model $y = kx^2$.

For example, one model is that the $y = kx^2$ is always correct and that deviations of the model $y = kx^2$ from the data are due to measurement errors in $x$ or $y$. If you use the usual sort of regression to fit the model to the data, you assume there are errors in measuring $y$, but no errors in measuring $x$.

Another model would be that there are no errors in measuring $x$ and $y$ and that the deviations of $y$ from the observed data are due to the fact that the model is designed to predict the mean value of a population. So the differences between the observed values of $y$ and the predicted mean values of $y$ are not "errors"; they are just "deviations" that are actually present in the population.

A simple model of the "errors" or "deviations" in $y$ would be that they have a normal distribution with mean 0 and a standard deviation $\sigma$ that is the same for all values of $x$. A different model would be that the logs of the "errors" or "deviations" have a normal distribution with similar parameters. There is obviously a difference between those two models. What does the science behind the model say about the nature of the errors or deviations?
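To make the difference between those two models concrete, here is a rough sketch on synthetic data. Everything in it is an assumption for illustration: the value of `k`, the range of `x`, and the two scatter sizes `sigma_add` and `sigma_log` are made up, and the log-type model is written as normal scatter in $\log_{10} y$ (multiplicative scatter), which is one common reading of it.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

k = 2.0                          # hypothetical constant, illustration only
x = np.logspace(0, 2, 2000)      # x spans two decades, sorted ascending
y_model = k * x**2

# Model A: additive normal deviations with the same sigma at every x.
# Note: at small x this can produce negative y, so no log is taken of it.
sigma_add = 5.0
y_additive = y_model + rng.normal(0.0, sigma_add, size=x.size)

# Model B: normal deviations in log10(y), i.e. multiplicative scatter.
sigma_log = 0.1                  # scatter in dex (assumed)
y_multiplicative = y_model * 10 ** rng.normal(0.0, sigma_log, size=x.size)

def spread_by_half(residuals):
    """Standard deviation of residuals in the low-x half and the high-x half."""
    half = residuals.size // 2
    return residuals[:half].std(), residuals[half:].std()

lin_a = spread_by_half(y_additive - y_model)              # linear residuals, model A
lin_b = spread_by_half(y_multiplicative - y_model)        # linear residuals, model B
log_b = spread_by_half(np.log10(y_multiplicative) - np.log10(y_model))

print("Model A, linear residual spread (low x, high x):", lin_a)
print("Model B, linear residual spread (low x, high x):", lin_b)
print("Model B, log10 residual spread  (low x, high x):", log_b)
```

Under model A the linear residuals have about the same spread at both ends of the range. Under model B the linear spread grows with $y$, while the spread in $\log_{10} y$ stays roughly constant. Which picture the real data resemble is the question the science has to answer.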

3. Dec 8, 2016

Jules Winnfield

The exact form of the model is: $$M(r)=\frac{k r^2}{G}$$ where $k$ and $G$ are known. I have a bunch of data from galaxy clusters that gives me the observed mass. I want to know how my model compares to the observations. The observed masses cover a range of $10^{11}$ to $10^{15}$ solar masses, so $\chi^2$ won't work because the more massive clusters will weigh more than the lighter ones. There are errors in both the measured $r$ and the measured $M$. What is the best way to say how good a fit my model is to the observed data?
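For reference, one way a chi-square in log space could be set up for this model is sketched below. It is not a claim that this is the right statistic for the data. It assumes the per-cluster uncertainties `sigma_r` and `sigma_m` are known, small relative to the values, independent, and roughly normal, and it folds the error in $r$ into the error in $\log_{10} M$ by first-order propagation (since $\log_{10} M = \log_{10}(k/G) + 2\log_{10} r$). The function name and argument names are hypothetical, and `k`, `G`, `r`, and the masses must be in mutually consistent units.

```python
import numpy as np

def log_chi_square(r, m_obs, sigma_r, sigma_m, k, G):
    """Chi-square of log10(M_obs) against log10(k r^2 / G).

    Uses first-order error propagation (assumed valid only for small
    relative errors):
        sigma_log10(M) ~= sigma_M / (M ln 10)
        sigma_log10(r) ~= sigma_r / (r ln 10)
    The model has slope 2 in log-log space, so the error in r enters
    the variance with a factor of 2.
    """
    r = np.asarray(r, dtype=float)
    m_obs = np.asarray(m_obs, dtype=float)
    sigma_r = np.asarray(sigma_r, dtype=float)
    sigma_m = np.asarray(sigma_m, dtype=float)

    log_m_pred = np.log10(k * r**2 / G)
    residual = np.log10(m_obs) - log_m_pred

    sigma_log_m = sigma_m / (m_obs * np.log(10))
    sigma_log_r = sigma_r / (r * np.log(10))
    variance = sigma_log_m**2 + (2.0 * sigma_log_r) ** 2

    chi2 = np.sum(residual**2 / variance)
    dof = r.size  # no parameters are fitted, because k and G are given
    return chi2, dof

# Made-up check: every point sits exactly 0.1 dex above the model,
# with a 0.1 dex mass uncertainty and no uncertainty in r.
r_demo = np.array([1.0, 10.0, 100.0])
m_demo = r_demo**2 * 10**0.1
sig_m_demo = m_demo * np.log(10) * 0.1
chi2_demo, dof_demo = log_chi_square(r_demo, m_demo, np.zeros(3), sig_m_demo, k=1.0, G=1.0)
print(chi2_demo, dof_demo)
```

In the made-up check each point is one standard deviation from the model, so each contributes about 1 to the sum regardless of how large its mass is. That is the sense in which the log-space statistic stops the heaviest clusters from dominating.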

Last edited: Dec 8, 2016
4. Dec 8, 2016

Stephen Tashi

How do you define "good"? If you can't define "good" precisely, perhaps you can give examples of how one way can be "better" than another.

The discipline of mathematical statistics does not define or provide a universally "good" or best way to measure the fit of data to a model.

If you are asking for a way to measure the fit that is good in the sense of being effective human-to-human communication, then look in the scientific journals that are read by the audience for your work and see how papers in those journals report the fit of data to models. If the purpose is human-to-human communication, you can report the fit of the data to the model in ways that are traditional. For example, you can report the mean square error between M-observed and M-predicted. You can also report the fit of the model to data in more than one way: for instance, the mean square error in M together with the mean square error in log(M).
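As a small illustration of reporting both numbers, the sketch below computes the mean square error in M and in log(M) for made-up masses. The function name and the example values are hypothetical: five predicted masses, one per decade, with every observation exactly a factor of 2 above its prediction.

```python
import numpy as np

def mse_linear_and_log(m_obs, m_pred):
    """Return (MSE in M, MSE in log10 M) for observed and predicted masses."""
    m_obs = np.asarray(m_obs, dtype=float)
    m_pred = np.asarray(m_pred, dtype=float)
    mse_lin = np.mean((m_obs - m_pred) ** 2)
    mse_log = np.mean((np.log10(m_obs) - np.log10(m_pred)) ** 2)
    return mse_lin, mse_log

# Made-up data: one cluster per decade, each observed at twice the prediction.
m_pred = np.array([1e11, 1e12, 1e13, 1e14, 1e15])
m_obs = 2.0 * m_pred

mse_lin, mse_log = mse_linear_and_log(m_obs, m_pred)

# Fraction of the linear sum of squares contributed by the heaviest cluster.
squared_errors = (m_obs - m_pred) ** 2
share_of_largest = squared_errors[-1] / squared_errors.sum()

print("MSE in M:        ", mse_lin)
print("MSE in log10(M): ", mse_log)
print("Share of linear squared error from heaviest cluster:", share_of_largest)
```

Every cluster here is off by the same factor, so each contributes equally to the error in log(M), while nearly all of the error in M comes from the single heaviest cluster. Reporting both makes that difference visible to the reader.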

Since you mention chi-square, are you intending to do a hypothesis test of some sort?