How do I compare a model to logarithmic data?

Discussion Overview

The discussion revolves around comparing a quadratic model of galaxy cluster masses to observational data that spans several logarithmic decades. Participants explore methods for assessing the goodness of fit between the model and the data, considering the implications of measurement errors and the definition of "good" fit.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants question whether the constant ##k## in the model is predetermined or if it needs to be determined through fitting the model to the data.
  • There is a discussion about the nature of errors in the measurements of ##x## and ##y##, with some proposing that deviations from the model could be due to measurement errors, while others suggest they may represent natural deviations in the population.
  • One participant suggests that a normal distribution of errors in ##y## could be a simplified model, while another proposes considering the logarithm of the errors as potentially having a normal distribution.
  • Concerns are raised about the applicability of chi-square tests due to the varying weights of errors across different mass scales, particularly for heavier galaxy clusters.
  • Participants discuss the ambiguity of defining what constitutes a "good" fit, noting that mathematical statistics does not provide a universal standard for this measurement.
  • Suggestions are made to report the fit of the model using traditional measures found in the scientific literature, such as the mean square error, computed both in mass and in the logarithm of mass.

Areas of Agreement / Disagreement

Participants express differing views on how to define and measure the goodness of fit for the model against the data. There is no consensus on a single method or definition of "good" fit, and multiple competing approaches are discussed.

Contextual Notes

The discussion highlights limitations in defining measurement errors and the assumptions underlying different models of deviations from the quadratic model. The applicability of statistical methods like chi-square is also questioned due to the nature of the data.

Jules Winnfield
I have a model which is quadratic (e.g. ##y = k x^2##). I'm comparing it against a large set of data (galaxy cluster masses) which spans several decades in Log10 (e.g. ##10^{11}## to ##10^{15}## solar masses). What is the right way to say how well the data fit the model? Obviously the errors in the heavier galaxy clusters are going to count for more than the errors in the lighter clusters if I compare on a linear scale.

Is it valid to convert the errors and values using a Log10 operation and perform a chi-square or r-square on those values?
 
Jules Winnfield said:
I have a model which is quadratic (e.g. ##y = k x^2##).
Is ##k## a given constant, or is part of the question how to find ##k## by fitting that form of model to the data?

I'm comparing it against a large set of data (galaxy cluster masses) which spans several Log10 decades (e.g. ##10^{11}## to ##10^{15}## solar masses). What is the right way to say how good the data fits the model?
That isn't a well-posed mathematical question unless you also specify a model for how the data deviate from the model ##y = kx^2##.

For example, one model is that ##y = kx^2## is always correct and that deviations of the data from ##y = kx^2## are due to measurement errors in ##x## or ##y##. If you use the usual sort of regression to fit the model to the data, you assume there are errors in measuring ##y##, but no errors in measuring ##x##.

Another model would be that there are no errors in measuring ##x## and ##y## and that the deviations of the observed data from the model are due to the fact that the model is designed to predict the mean value of a population. So the differences between the observed values of ##y## and the predicted mean values of ##y## are not "errors"; they are just "deviations" that are actually present in the population.

A simplified model of the "errors" or "deviations" in ##y## would be that they have a normal distribution with mean 0 and a standard deviation ##\sigma## that is the same for all values of ##x##. A different model would be that the "errors" or "deviations" in ##\log y## have a normal distribution with similar parameters. There is obviously a difference between those two models. What does the science behind the model say about the nature of the errors or deviations?
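
To make the difference between those two deviation models concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the data are synthetic, the value ##k = 1## and the 0.1 dex scatter are made up, and `residuals` is a hypothetical helper, not anything from the original question. It generates data with multiplicative (log-normal) scatter and shows that the additive residuals grow with ##x## while the log residuals keep roughly the same spread.

```python
import numpy as np

def residuals(x, y, k):
    """Return (additive, log10) residuals of y against the model y = k * x**2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    model = k * x**2
    additive = y - model                          # natural under a constant-sigma normal model
    logarithmic = np.log10(y) - np.log10(model)   # natural under a log-normal model
    return additive, logarithmic

# Synthetic data, made up for illustration: multiplicative scatter of 0.1 dex.
rng = np.random.default_rng(0)
k = 1.0
x = np.logspace(0, 2, 2000)   # x spans two decades, so y spans four
y = k * x**2 * 10 ** rng.normal(0.0, 0.1, size=x.size)

add_res, log_res = residuals(x, y, k)
low, high = x < 10, x >= 10

print("std of additive residuals (low x, high x):", add_res[low].std(), add_res[high].std())
print("std of log10 residuals    (low x, high x):", log_res[low].std(), log_res[high].std())
```

If real data behave like this synthetic set, the constant-##\sigma## normal model is a poor description on the linear scale and a reasonable one on the log scale. If the additive residuals have a constant spread instead, the opposite conclusion applies.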
 
Stephen Tashi said:
Is ##k## a given constant, or is part of the question how to find ##k## by fitting that form of model to the data?
That isn't a well-posed mathematical question unless you also specify a model for how the data deviate from the model ##y = kx^2##.
The exact form of the model is:
$$M(r)=\frac{k r^2}{G}$$
##k## and ##G## are known. I have a set of galaxy cluster data that gives me the observed masses, and I want to know how my model compares to the observations. The observed masses cover a range of ##10^{11}## to ##10^{15}## solar masses, so ##\chi^2## won't work because the more massive clusters will weigh more than the lighter ones. There are errors in both the measured ##r## and the measured mass. What is the best way to say how good a fit my model is to the observed data?
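
One possible way to set up a ##\chi^2## that does not favor the massive clusters is to work in ##\log_{10}## space and fold both measurement errors into one variance per point. The sketch below is only that, a sketch, and rests on assumptions that are not stated in the thread: the errors in ##r## and ##M## are independent, small enough for first-order propagation (##\sigma_{\log M} \approx \sigma_M / (M \ln 10)##), and roughly Gaussian in log space. Because ##\log_{10} M = \log_{10}(k/G) + 2\log_{10} r##, the error in ##\log_{10} r## enters with a factor of 2. The function name and the numbers in the example are hypothetical.

```python
import numpy as np

LN10 = np.log(10.0)

def log_chi_square(r, m_obs, sigma_r, sigma_m, k, G):
    """Chi-square of log10(m_obs) against log10(k r^2 / G), with k and G fixed.

    Assumes independent errors in r and M and first-order error propagation.
    Returns (chi2, chi2 per data point); no parameters are fitted.
    """
    r, m_obs, sigma_r, sigma_m = (
        np.asarray(a, dtype=float) for a in (r, m_obs, sigma_r, sigma_m)
    )
    log_model = np.log10(k * r**2 / G)
    resid = np.log10(m_obs) - log_model
    sig_log_m = sigma_m / (m_obs * LN10)         # error in log10(M)
    sig_log_r = sigma_r / (r * LN10)             # error in log10(r)
    var = sig_log_m**2 + (2.0 * sig_log_r)**2    # slope of 2 in log-log space
    chi2 = float(np.sum(resid**2 / var))
    return chi2, chi2 / r.size

# Made-up numbers in arbitrary units: one point sitting 0.1 dex above the
# model, with a 0.1 dex mass error and no radius error.
k_demo, G_demo = 2.0, 1.0
r_demo = np.array([10.0])
m_demo = (k_demo * r_demo**2 / G_demo) * 10**0.1
sigma_m_demo = 0.1 * m_demo * LN10
chi2_demo, chi2_per_point = log_chi_square(
    r_demo, m_demo, np.zeros(1), sigma_m_demo, k_demo, G_demo
)
print(chi2_demo, chi2_per_point)
```

In this form a cluster at ##10^{11}## solar masses and one at ##10^{15}## contribute equally when they miss the model by the same number of their own standard deviations in dex. Whether that weighting is the right one still depends on which deviation model is appropriate for the data.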
 
Jules Winnfield said:
What is the best way to say how good a fit my model is to the observed data?

How do you define "good"? If you can't define "good" precisely, perhaps you can give examples of how one way can be "better" than another.

The discipline of mathematical statistics does not define or provide a universally "good" or best way to measure the fit of data to a model.

If you are asking for a way to measure the fit that is good in the sense of being effective human-to-human communication, then look in the scientific journals read by the audience for your work and see how papers in those journals report the fit of data to models. If the purpose is human-to-human communication, you can report the fit of the data to the model in ways that are traditional. For example, you can report the mean square error between M-observed and M-predicted. You can also report the fit in more than one way: the mean square error in M together with the mean square error in log(M).
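
As a rough illustration of reporting both numbers, here is a short Python sketch. The masses are made up (each "observation" is simply 10% above its prediction), and `mse_linear_and_log` is a hypothetical helper name.

```python
import numpy as np

def mse_linear_and_log(m_obs, m_pred):
    """Return (MSE in M, MSE in log10 M) between observed and predicted masses."""
    m_obs = np.asarray(m_obs, dtype=float)
    m_pred = np.asarray(m_pred, dtype=float)
    mse_lin = float(np.mean((m_obs - m_pred) ** 2))
    mse_log = float(np.mean((np.log10(m_obs) - np.log10(m_pred)) ** 2))
    return mse_lin, mse_log

# Made-up predictions spanning four decades; every observation is 10% high.
m_pred = np.array([1e11, 1e13, 1e15])
m_obs = 1.1 * m_pred

mse_lin, mse_log = mse_linear_and_log(m_obs, m_pred)
print("MSE in M:       ", mse_lin)
print("MSE in log10(M):", mse_log)
```

With these numbers the mean square error in M comes almost entirely from the heaviest cluster, while the mean square error in log(M) treats the three identical fractional errors equally. Reporting both lets the reader see each view.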

Since you mention chi-square, are you intending to do a hypothesis test of some sort?
 
