Mathematica: ListPlot & nonlinear fitting

Verdict · May 20, 2012

Good day,
I am relatively new to Mathematica, and I am using it to fit a formula to a dataset that I have. To begin with, I am having trouble with a very simple part: ListPlot.

I have an .xls file with 2 columns and 26 rows, holding x and y values. Using Import, I neatly get these into Mathematica. However, when I use ListPlot, all I get is an empty plot with both axes ranging from -1 to 1. I must be doing something simple wrong, but I honestly can't figure out what it is.
If needed, I can upload the .xls somewhere, but I am not allowed to post links yet.

I haven't tried much with the next part yet, but it makes me cringe just thinking about it. I have a formula of the form
W^((A/a)*(1-e^(-a*t))), where A and a are parameters to be fitted, t is the x from the data list and the output is the y from the data list. W is a constant that is determined outside of Mathematica. Could anyone point me to a tutorial or guide to get me started with this?

Thanks in advance

Bill Simpson · May 20, 2012

Verdict said:

I have an .xls file with 2 columns and 26 rows, holding x and y values. Using Import, I neatly get these into Mathematica. However, when I use ListPlot, all I get is an empty plot with both axes ranging from -1 to 1. I must be doing something simple wrong, but I honestly can't figure out what it is.
If needed, I can upload the .xls somewhere, but I am not allowed to post links yet.

Since you only have a small number of data points, and I assume your notebook is small and simple, just evaluate the notebook so that it shows your Import, the resulting data points and your ListPlot command (but highlight and delete the empty graph).

Then select everything in the notebook, copy it to the clipboard and paste it into a reply here. That should show me exactly what you have, and within a few seconds you should have an answer showing you where to add one command or where to get rid of a comma.

Then we can look at the Fit problem.

Verdict · May 20, 2012

I hope this is what you mean!

In[2]:= tumorgrowthdata=Import["C:\Users\Arno\Documents\Tumor Growth Data.xls"]
Out[2]= {{{"11.3", "0.95"}, {"13.0", "0.96"}, {"16.0", "1.25"}, {"30.1",
"3.88"}, {"31.3", "3.6"}, {"32.0", "3.7"}, {"34.3",
"5.13"}, {"50.0", "9.1"}, {"51.0", "8.3"}, {"51.2",
"8.54"}, {"51.5", "9.2"}, {"68.8", "20.4"}, {"71.1",
"16.1"}, {"71.1", "20.6"}, {"72.5", "18.4"}, {"92.8",
"16.8"}, {"96.0", "22.1"}, {"97.3", "29.7"}, {"100.3",
"39.6"}, {"115.3", "49.1"}, {"116.5", "43.2"}, {"119.6",
"37.4"}, {"128.0", "69.8"}, {"144.0", "73.5"}, {"148.7",
"62.8"}, {"192.0", "103.1"}}}

In[4]:= ListPlot[tumorgrowthdata]

Edit: I don't know where all the quotation marks are coming from; they are not in the output of In[2] as it appears in my notebook.

Bill Simpson · May 20, 2012

Perfect! That provides exactly what is needed.

OK, first: Import thinks you are importing strings, not numbers. Mathematica hides those quotes in its displayed output; sometimes that is a good thing, and sometimes it hides problems, just like this one.

Next, if you look at the help for ListPlot, it wants a list of the form {{x1,y1},{x2,y2},...}. Notice that you have {{{x1,y1},{x2,y2},...}}, so we need to get rid of one layer of {}.

So let's fix both those:

First[ToExpression[tumorgrowthdata]]

First[] gets rid of the extra {}, and ToExpression turns your strings into numbers.

ListPlot of the result then shows your data points.
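A minimal sketch of those two fixes on a three-row excerpt copied from Out[2]; the names `raw` and `pts` are only for illustration. Dimensions makes the extra outer level visible, and Head shows the strings turning into numbers.

```mathematica
(* Three rows copied from Out[2]: note the extra outer {} and the quoted strings *)
raw = {{{"11.3", "0.95"}, {"13.0", "0.96"}, {"16.0", "1.25"}}};

Dimensions[raw]          (* {1, 3, 2}: one outer level, 3 rows, 2 columns *)
Head[raw[[1, 1, 1]]]     (* String, not Real *)

(* ToExpression threads over the nested list and parses each string;
   First then drops the outer level *)
pts = First[ToExpression[raw]];

Dimensions[pts]          (* {3, 2}: the {{x1,y1},{x2,y2},...} shape ListPlot wants *)
Head[pts[[1, 1]]]        (* Real *)

ListPlot[pts]
```

Because the outer level has length 1 here, First is enough to remove it.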

Verify that this works, and then we will deal with fitting.

Verdict · May 20, 2012

Amazing, that did it! I feel a bit silly now, though, using strings instead of numbers. I just tried to do it the same way as in their tutorial, but I guess Excel treated my numbers as text, or something of that sort.

If you could assist me with the fitting, that would be amazing.

The formula I will use to fit the data is the one I described above, which I entered into Mathematica as 0.5^{{A/a}*{1 - E^{-a*t}}} when I made a simple Manipulate plot (so W = 0.5 here). That does seem to work, although the curve is very different from the one in the paper, so I might have done something wrong there.
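A side note on that input: in Mathematica, curly braces build lists and only round parentheses group terms, so {A/a}*{1 - ...} is a list rather than a grouped product. Plot tolerates a one-element list, which is presumably why the Manipulate still drew a curve, but list-valued expressions can trip up other functions later. A sketch of the same Manipulate written with parentheses, assuming W = 0.5 as in the post; the initial slider values are the paper's A and a, and the slider ranges are arbitrary choices for illustration.

```mathematica
(* W = 0.5 as in the post; ( ) groups terms, { } would build a list *)
w = 0.5;

Manipulate[
 Plot[w^((A/a)*(1 - E^(-a*t))), {t, 0, 200}, PlotRange -> All],
 {{A, 0.077}, 0.01, 0.2},    (* initial value from the paper; range is arbitrary *)
 {{a, 0.0135}, 0.001, 0.05}  (* initial value from the paper; range is arbitrary *)
]
```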

The correct formula, according to the paper, is

where, again, W0 is a predefined constant, A and a are the parameters I need to fit to the data points, and t is the time in days. The output is a weight in grams.

Bill Simpson · May 20, 2012

Your Import didn't specify the file format, so Mathematica guessed from the .xls extension. Maybe it guessed wrong, maybe there was a blank character somewhere; either way, it ended up with strings. We were actually luckier than we should have been to spot that so quickly. (Hint: if you cannot figure out what is going on, FullForm[expression] tries to show you what the expression really is, and that can often help you spot things.)
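A short sketch of that hint on a single imported value (copied from Out[2]). FullForm and Head both reveal the string that the normal notebook display hides. The helper `allNumeric` is a hypothetical name, just a quick way to check a whole imported table at once.

```mathematica
cell = "11.3";          (* what Import actually returned for the first cell *)
cell                    (* displays as 11.3, with the quotes hidden *)
FullForm[cell]          (* shows "11.3" with quotes: it is a String *)
Head[cell]              (* String *)
Head[ToExpression[cell]] (* Real *)

(* Hypothetical helper: True only if every entry of a (nested) table is numeric *)
allNumeric[data_] := VectorQ[Flatten[data], NumericQ]

allNumeric[{{{"11.3", "0.95"}}}]  (* False: still strings *)
allNumeric[{{11.3, 0.95}}]        (* True *)
```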

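Now for fitting. Fit does linear least squares: it works for models that are linear combinations of known functions, but it cannot handle parameters buried inside a nonlinear expression. FindFit can handle some nonlinear models. I did this: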

z = First[ToExpression[tumorgrowthdata]];
FindFit[z, (1/2)^((A/a)*(1 - E^(-a*t))), {A, a}, t]

and got a list of warnings. Those are usually hints that something is very wrong.

FindFit[z, p*E^(q*t), {p, q}, t]

produces no warnings, but p comes out fabulously small, far too small.
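A likely reason for the tiny p: as far as I know, FindFit starts each parameter at 1 when no starting value is given, and with q = 1 the term E^(q*t) is about E^192 at the last data point, so the search can wander off into a region where p is driven toward 0. A sketch that supplies starting values instead, assuming the 26 points from Out[2] converted to numbers. The starting q = 0.025 is an eyeballed guess: the weight grows roughly 100-fold over about 180 days, and ln(100)/180 is about 0.026.

```mathematica
(* The 26 {t, weight} points from Out[2], as numbers *)
z = {{11.3, 0.95}, {13.0, 0.96}, {16.0, 1.25}, {30.1, 3.88}, {31.3, 3.6},
   {32.0, 3.7}, {34.3, 5.13}, {50.0, 9.1}, {51.0, 8.3}, {51.2, 8.54},
   {51.5, 9.2}, {68.8, 20.4}, {71.1, 16.1}, {71.1, 20.6}, {72.5, 18.4},
   {92.8, 16.8}, {96.0, 22.1}, {97.3, 29.7}, {100.3, 39.6}, {115.3, 49.1},
   {116.5, 43.2}, {119.6, 37.4}, {128.0, 69.8}, {144.0, 73.5},
   {148.7, 62.8}, {192.0, 103.1}};

(* Explicit starting values {p, 1} and {q, 0.025} instead of the defaults *)
expFit = FindFit[z, p*E^(q*t), {{p, 1}, {q, 0.025}}, t]

(* Overlay the data and the fitted exponential to check the result by eye *)
Show[ListPlot[z], Plot[Evaluate[p*E^(q*t) /. expFit], {t, 0, 200}]]
```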

Look at the help for FindFit. Craft some very simple fake data and start with a simpler equation, something like a*x^2. You are trying to understand how FindFit needs the data and equations to be given to it so that it correctly finds your parameters.
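A sketch of that fake-data exercise. The data are generated from known parameters, so you can check whether FindFit recovers them. The parameter is called c rather than a only to avoid clashing with the a in the tumor model; all names here are illustrative.

```mathematica
(* Fake data from y = 3 x^2 exactly, so the right answer is known *)
fake = Table[{x, 3.0*x^2}, {x, 1, 10}];
fakeFit = FindFit[fake, c*x^2, {c}, x]   (* should come back very close to c -> 3. *)

(* Same idea with a nonlinear parameter and explicit starting values *)
fake2 = Table[{x, 2.0*Exp[-0.3*x]}, {x, 0, 10}];
fakeFit2 = FindFit[fake2, b*Exp[-k*x], {{b, 1}, {k, 0.1}}, x]
```

Once this behaves as expected, swapping in the real data and model is a small step.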

I'll play with this for a few minutes and see if I can spot what I'm doing wrong.

Here is Fit, but I don't think it will be powerful enough for the equation you want to fit:

In[18]:= Fit[z,{1,x,x^2},x]

Out[18]= -2.312 + 0.119 x + 0.002 x^2

In[20]:= g1=ListPlot[z];
In[23]:= g2 = Plot[-2.312 + 0.119 x + 0.002 x^2, {x, 0, 200}];
In[24]:= Show[g1,g2]
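A small variation that avoids retyping the rounded coefficients into Plot: keep Fit's result in a variable and plot that directly. This is a sketch assuming the 26 points from Out[2]; `quad` is a name chosen for illustration.

```mathematica
(* The 26 {t, weight} points from Out[2], as numbers *)
z = {{11.3, 0.95}, {13.0, 0.96}, {16.0, 1.25}, {30.1, 3.88}, {31.3, 3.6},
   {32.0, 3.7}, {34.3, 5.13}, {50.0, 9.1}, {51.0, 8.3}, {51.2, 8.54},
   {51.5, 9.2}, {68.8, 20.4}, {71.1, 16.1}, {71.1, 20.6}, {72.5, 18.4},
   {92.8, 16.8}, {96.0, 22.1}, {97.3, 29.7}, {100.3, 39.6}, {115.3, 49.1},
   {116.5, 43.2}, {119.6, 37.4}, {128.0, 69.8}, {144.0, 73.5},
   {148.7, 62.8}, {192.0, 103.1}};

(* Full-precision coefficients, not the rounded values from the display *)
quad = Fit[z, {1, x, x^2}, x];

Show[ListPlot[z], Plot[Evaluate[quad], {x, 0, 200}]]
```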

Verdict · May 20, 2012

Hm, that does sound problematic. If it would help to look at the paper, it is the second result if you Google
"Construction of a Growth Curve for Mammary Tumors of the Rat", Cancer Res 1967;27:1341-1347.

The dataset I used is the one in Chart 3, where they combine all 4 sets from Table 2. Table 3 gives values for A and a; the entry corresponding to this dataset is the one for sets 1, 2, 3 & 4 combined.
The values for A and a were rather small, 0.077 and 0.0135 respectively.

I see what you mean about the very small value for p, though. For me it just shows 0., so I guess it is extremely close to 0, which is... not good.

Maybe http://reference.wolfram.com/legacy/v5_2/Add-onsLinks/StandardPackages/Statistics/NonlinearFit.html
would be useful? It is from version 5.2, though, so it is probably an outdated version of what you have in mind. Does the method you mentioned use least squares?

After making your plots and looking at the actual paper, I get the feeling I am doing something fundamentally wrong somewhere. The ListPlot has an entirely different shape from the graphs in the paper, but I am certain I am using the right dataset. I'll look into that and see what I did wrong.

Bill Simpson · May 20, 2012

Verdict said:

Hm, that does sound problematic. If it would help to look at the paper, it is the second result if you Google
"Construction of a Growth Curve for Mammary Tumors of the Rat", Cancer Res 1967;27:1341-1347.

I can't find the original paper, and the site won't let me in.

I did find
http://hwmaint.cancerres.aacrjournals.org/cgi/content/abstract/27/8_Part_1/1341
which shows
W = W0 e^(A/a (1-epsilon^(a t)))

And they say:
"The initial weight was fixed at Wo = 0.5 gm, and the parameters A and alpha were machine-calculated from the original growth data by a least-squares fitting method. The computed values were A = 0.077 day^-1 and α = 0.0135 day^-1"

Is that "epsilon" in that supposed to be another "e" or is that a different variable?

So "calculated by least squares" suggests we might be able to use Mathematica's Fit, but I do not see a way to force that model into the form Fit demands.

If I assume the epsilon is really another "e" that was eaten by desktop publishing, then I do this:

In[37]:= s=Plus@@Map[((1/2)E^((A/a)*(1-E^(-a*First[#])))-Last[#])^2&,z]

In[38]:= FindMinimum[s,{A,.07},{a,.01}]

Out[38]= {782.496,{A->0.0740722,a->0.0126738}}

That doesn't exactly match their quoted parameters, but it isn't wildly different either, and I cannot see their raw data.

In[39]:= g1=ListPlot[z]
In[42]:= g2=Plot[(1/2)E^((A/a)*(1-E^(-a*t)))/. {A->0.07407221767712636`, a->0.012673801510563793`}, {t,0,200}]
In[43]:= Show[g1,g2]

Verdict · May 20, 2012

That epsilon should certainly be an e, yes. It is strange that the site will not let you in, as I am not on a university network either; I'll send you the link in a private message.

What you are quoting, however, is indeed the summary of the paper. It is interesting, because the formula they quote there is the Gompertz equation I was thinking of, but later in the paper they give the formula that I copied earlier. Maybe they accidentally left out the first e in that formula? That would make 'sense', as simply leaving the exponential out would be a very strange modification of the original formula on the Gompertz Wikipedia page.
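For reference, here is the Gompertz form with the epsilon read as e and the minus sign in the inner exponent written out, matching the E^(-a*t) used in the FindMinimum fit above. The two checks use W0 = 0.5 g and the paper's A = 0.077 and α = 0.0135 day^-1; the limit is plain arithmetic on those values, not a number taken from the paper.

```latex
W(t) = W_0 \exp\!\left[\frac{A}{\alpha}\left(1 - e^{-\alpha t}\right)\right]

W(0) = W_0 \exp\!\left[\frac{A}{\alpha}\,(1 - 1)\right] = W_0 = 0.5\ \mathrm{g}

\lim_{t \to \infty} W(t) = W_0\, e^{A/\alpha}
  = 0.5\ \mathrm{g} \times e^{0.077/0.0135}
  \approx 0.5\ \mathrm{g} \times e^{5.70}
  \approx 150\ \mathrm{g}

\ln W(t) = \ln W_0 + \frac{A}{\alpha}\left(1 - e^{-\alpha t}\right)
  \approx \ln W_0 + A\,t \quad \text{for small } \alpha t
```

The last line shows two things: on a logarithmic weight axis the early growth is roughly a straight line with slope A, and taking logs still leaves α inside an exponential, so the model does not become something Fit's linear least squares can handle directly.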

The raw data is in the private message I sent you. The parameters may be slightly different because I didn't know how to include the uncertainties of the data in the data file (each value had a certain ± value).
You've already been a great help, but if you could also tell me how that is done, I think that might actually be it.

Edit: Using the inputs from your previous post(s), I do indeed get a curve that seems to fit the data points really well!

Bill Simpson · May 20, 2012

On the ±: if they are being very careful about their claims and their use of least squares, then the variance should not change with t and the distributions should be symmetric. If that is true, they should be quoting means for each data point or something to that effect, and fitting to the means should be as good as you are going to get. If any of this is not true, then you are heading toward the deep end of the pool.
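If the reported standard deviations do turn out to differ a lot from point to point, one common option is a weighted least-squares fit. A sketch, assuming `sd` is a list of the per-point standard deviations in the same order as the data (those values are not in this thread); `weightedGompertzFit` is a hypothetical helper name. NonlinearModelFit is used here because it accepts a Weights option and can report parameter uncertainties.

```mathematica
(* Hypothetical helper: data = {{t1, w1}, ...}, sd = {sd1, sd2, ...} in the same order *)
weightedGompertzFit[data_, sd_] := NonlinearModelFit[
   data,
   (1/2)*Exp[(A/a)*(1 - Exp[-a*t])],   (* W0 = 0.5 g, as in the paper *)
   {{A, 0.07}, {a, 0.01}},             (* starting values near the paper's *)
   t,
   Weights -> 1/sd^2,                  (* points with a small SD count more *)
   VarianceEstimatorFunction -> (1 &)  (* treat the SDs as known measurement errors *)
  ];

(* Usage, once sd is available:
   fm = weightedGompertzFit[z, sd];
   fm["BestFitParameters"]
   fm["ParameterTable"]    estimates with standard errors *)
```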

On desktop publishing: it is so easy for little errors to creep in, and people often don't make the effort to ensure that no errors are introduced at any point in the chain.

On the quality of the fit: I routinely overlay those plots just to flush out any obviously silly mistakes.

So I think you have a good start. Use what you can from our exchange and see if you can get where you need to go. If you get stuck, put up another post and I'll see if I can give you the hint you need.

I hope it works.

One other caution: note that I just used FindMinimum on the sum of the squared errors between your prediction and your actual data. I didn't differentiate with respect to each of your parameters, etc., the usual old-school method of least squares. I suspect that might be the correct way of doing this via least squares. What I did was a quick and dirty way to minimize the squared difference between the model and the data, to see whether we were anywhere near correct. I can't see how to drop your model directly into Fit, which is Mathematica's least-squares function. But often it takes me an afternoon of working on something else before I suddenly realize what I should have done. For all this to be carefully correct, you should spend much more time thinking about it to make certain that I have not violated something deep and important.
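On getting a built-in routine to do the least squares: Fit only accepts models that are linear in the parameters, but FindFit by default minimizes the sum of squared residuals for a nonlinear model, which is the same objective as the hand-built s passed to FindMinimum above. A sketch, assuming the 26 points from Out[2], W0 = 0.5, and the same starting values as the FindMinimum call; if both routines reach the same minimum, `gompSSE` should be close to the 782.496 that FindMinimum reported.

```mathematica
(* The 26 {t, weight} points from Out[2], as numbers *)
z = {{11.3, 0.95}, {13.0, 0.96}, {16.0, 1.25}, {30.1, 3.88}, {31.3, 3.6},
   {32.0, 3.7}, {34.3, 5.13}, {50.0, 9.1}, {51.0, 8.3}, {51.2, 8.54},
   {51.5, 9.2}, {68.8, 20.4}, {71.1, 16.1}, {71.1, 20.6}, {72.5, 18.4},
   {92.8, 16.8}, {96.0, 22.1}, {97.3, 29.7}, {100.3, 39.6}, {115.3, 49.1},
   {116.5, 43.2}, {119.6, 37.4}, {128.0, 69.8}, {144.0, 73.5},
   {148.7, 62.8}, {192.0, 103.1}};

model = (1/2)*E^((A/a)*(1 - E^(-a*t)));   (* W0 = 0.5 g *)

(* FindFit's default objective is the sum of squared residuals *)
gompFit = FindFit[z, model, {{A, 0.07}, {a, 0.01}}, t]

(* Recompute that objective at the fitted parameters for comparison *)
fitted = model /. gompFit;   (* now a function of t only *)
gompSSE = Total[(z[[All, 2]] - (fitted /. t -> z[[All, 1]]))^2]
```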

Verdict · May 20, 2012

The plots do overlay rather nicely. The only thing bothering me is that the shape of the graphs is completely different from the shapes in the paper, even though the data points align. I don't know what they did there; maybe it's because it is an old paper (1960s) and they used different methods. Maybe not. I'll try importing the dataset again and look for any changes.

They are indeed quoting means for each data point, with the standard deviations alongside. Taking what you say into account, it does seem unnecessary to include the standard deviations in the model itself. Maybe I'll add some extra plots with the standard deviations added and subtracted, but that should be no problem.

All in all, you have given me more than a good start; thank you once again!

I just have one very small question: in FindMinimum you use starting values for A and a. Did you 'guess' those by looking at the parameters given in the paper? And if so, how would you make an educated guess when the parameters are still unknown? Just take the smallest values that don't give an overflow error?

Bill Simpson · May 20, 2012

I have not studied the paper. That would take time, and I was trying to get you up to speed as quickly as possible.

On the starting points for A and a, you are correct: I picked values not far from their quoted values, because I didn't spend the time studying your equation to tell whether there would be only one minimum or several local minima that might confuse FindMinimum. For problems where you don't know what the answer is likely to be, I would do as I recommended earlier: carefully check that all the required assumptions of least-squares regression are satisfied, do all the work to develop those calculations, and verify that for your problem there is exactly one global minimum. (And then, because I don't trust anything, let alone me or my code, I might start from a few hundred or even a few tens of thousands of "plausible" starting points and see whether they all converge on approximately the same solution.) What counts as "plausible" I would eyeball from a plot of the data and from a few experimental calculations.
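A sketch of that multistart check, assuming the 26 points from Out[2] and W0 = 0.5. The 50 starts and the ranges A in [0.02, 0.2] and a in [0.002, 0.05] are arbitrary "plausible" choices for illustration, not values from the thread. Runs that fail are dropped, the survivors are ranked by their sum of squared residuals, and the last line counts how many starts ended up at the best solution.

```mathematica
(* The 26 {t, weight} points from Out[2], as numbers *)
z = {{11.3, 0.95}, {13.0, 0.96}, {16.0, 1.25}, {30.1, 3.88}, {31.3, 3.6},
   {32.0, 3.7}, {34.3, 5.13}, {50.0, 9.1}, {51.0, 8.3}, {51.2, 8.54},
   {51.5, 9.2}, {68.8, 20.4}, {71.1, 16.1}, {71.1, 20.6}, {72.5, 18.4},
   {92.8, 16.8}, {96.0, 22.1}, {97.3, 29.7}, {100.3, 39.6}, {115.3, 49.1},
   {116.5, 43.2}, {119.6, 37.4}, {128.0, 69.8}, {144.0, 73.5},
   {148.7, 62.8}, {192.0, 103.1}};
model = (1/2)*E^((A/a)*(1 - E^(-a*t)));   (* W0 = 0.5 g *)

(* Sum of squared residuals for one candidate pair {A, a} *)
sse[{A0_, a0_}] :=
  Total[(z[[All, 2]] - (model /. {A -> A0, a -> a0, t -> z[[All, 1]]}))^2]

SeedRandom[2012];   (* reproducible random starting points *)
fits = Table[
   Quiet[FindFit[z, model,
     {{A, RandomReal[{0.02, 0.2}]}, {a, RandomReal[{0.002, 0.05}]}}, t]],
   {50}];

(* Keep runs that returned rules with numeric values and a finite error *)
candidates = Select[
   Cases[fits, r : {__Rule} :> ({A, a} /. r)],
   VectorQ[#, NumericQ] && NumericQ[sse[#]] &];

best = First[SortBy[candidates, sse]]

(* How many starts converged (to within 10^-4) on that same solution? *)
Count[candidates, _?(Norm[# - best] < 10^-4 &)]
```

If most starts land on the same pair, that is some reassurance the minimum is not an accident of one starting point; it is not a proof that it is the global minimum.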

Verdict · May 20, 2012

Thank you, that answers my question well. I didn't mean to ask you to read the paper; you have been of great value, and I am sure I can continue working with this for quite a while now.

Edit: I have also 'solved' the different-shape issue; I hadn't noticed that they used a logarithmic scale. My bad.
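For comparing against the paper's charts, a sketch of the data-plus-fit overlay on a logarithmic weight axis, assuming the 26 points from Out[2] and the parameters FindMinimum returned earlier in the thread (A -> 0.0740722, a -> 0.0126738). ListLogPlot and LogPlot share the same log scaling, so Show can combine them.

```mathematica
(* The 26 {t, weight} points from Out[2], as numbers *)
z = {{11.3, 0.95}, {13.0, 0.96}, {16.0, 1.25}, {30.1, 3.88}, {31.3, 3.6},
   {32.0, 3.7}, {34.3, 5.13}, {50.0, 9.1}, {51.0, 8.3}, {51.2, 8.54},
   {51.5, 9.2}, {68.8, 20.4}, {71.1, 16.1}, {71.1, 20.6}, {72.5, 18.4},
   {92.8, 16.8}, {96.0, 22.1}, {97.3, 29.7}, {100.3, 39.6}, {115.3, 49.1},
   {116.5, 43.2}, {119.6, 37.4}, {128.0, 69.8}, {144.0, 73.5},
   {148.7, 62.8}, {192.0, 103.1}};

(* Fitted Gompertz curve with the FindMinimum parameters, W0 = 0.5 g *)
fitted = (1/2)*E^((A/a)*(1 - E^(-a*t))) /. {A -> 0.0740722, a -> 0.0126738};

Show[
 ListLogPlot[z],                 (* data on a logarithmic weight axis *)
 LogPlot[fitted, {t, 0, 200}],   (* fitted curve on the same axis *)
 PlotRange -> All
]
```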
