Mathematica: Listplot & Non linear fitting

  1. May 20, 2012 #1
    Good day,
    I am relatively new to Mathematica, and I am using it to fit a formula to a dataset that I have. To begin with, I am having trouble with a very simple part, namely ListPlot.

    I have an xls file with 2 columns and 26 rows, corresponding to an x and a y value. Using Import, I neatly get these into Mathematica. However, when I use ListPlot, all I get is an empty plot, ranging between -1 and 1 for both the x and the y scale. I must be doing something simple wrong, but I honestly can't figure out what it is.
    If needed, the xls can be uploaded somewhere, but I am not allowed to post links yet.

    Now, the next part I haven't really tried much with yet, but it makes me cringe even thinking about it. I have a formula in the form of
    W^((A/a)*(1-e^(-a*t))), where A and a are parameters that have to be fitted, t is the x from the data list and the output is the y from the data list. W is a constant that is determined outside of Mathematica. Could anyone maybe point me to a tutorial or guide on how to get started with this?

    Thanks in advance
  3. May 20, 2012 #2
    Since you only have a small number of data points and I assume your notebook is small and simple, just evaluate the notebook so that it shows your Import, the resulting data points and your ListPlot command (but highlight and delete the empty graph).

    Then select everything in the notebook, copy it to the clipboard and paste it into a reply to this. That should show me exactly what you have and in a few seconds you should have an answer showing you where to add one command or where to get rid of a comma.

    Then we can look at the Fit problem
  4. May 20, 2012 #3
    I hope this is what you mean!

    In[2]:= tumorgrowthdata=Import["C:\\Users\\Arno\\Documents\\Tumor Growth Data.xls"]
    Out[2]= {{{"11.3", "0.95"}, {"13.0", "0.96"}, {"16.0", "1.25"}, {"30.1",
    "3.88"}, {"31.3", "3.6"}, {"32.0", "3.7"}, {"34.3",
    "5.13"}, {"50.0", "9.1"}, {"51.0", "8.3"}, {"51.2",
    "8.54"}, {"51.5", "9.2"}, {"68.8", "20.4"}, {"71.1",
    "16.1"}, {"71.1", "20.6"}, {"72.5", "18.4"}, {"92.8",
    "16.8"}, {"96.0", "22.1"}, {"97.3", "29.7"}, {"100.3",
    "39.6"}, {"115.3", "49.1"}, {"116.5", "43.2"}, {"119.6",
    "37.4"}, {"128.0", "69.8"}, {"144.0", "73.5"}, {"148.7",
    "62.8"}, {"192.0", "103.1"}}}

    In[4]:= ListPlot[tumorgrowthdata]

    Edit: I don't know where all the " are coming from, they are not in the actual output from [2]
  5. May 20, 2012 #4
    Perfect! That provides exactly what is needed.

    OK, first, Import thinks you are importing strings and not numbers. Mathematica normally hides those quotes in its output. Sometimes that is a good thing, and sometimes it hides problems, just like this one.

    Next, if you look at the help for ListPlot, it wants a list of the form {{x1,y1},{x2,y2},...}. Notice you have {{{x1,y1},{x2,y2},...}}, so we need to get rid of one layer of {}.

    So let's fix both of those:

    z = First[ToExpression[tumorgrowthdata]]

    The First[] gets rid of the extra {} and the ToExpression turns your strings into numbers.

    Then ListPlot of that shows your data points.

    Verify this works and then we will deal with fitting
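The two fixes described in this post can be tried without the spreadsheet. The following is a minimal sketch: `raw` is a hypothetical three-row stand-in for what Import returned (one extra list level, values as strings), and the names `raw` and `data` are assumptions made for the illustration.

```mathematica
(* Hypothetical stand-in for the Import result: one extra list level, values as strings *)
raw = {{{"11.3", "0.95"}, {"13.0", "0.96"}, {"16.0", "1.25"}}};

Dimensions[raw]        (* {1, 3, 2}: the leading 1 is the extra level *)
Head[raw[[1, 1, 1]]]   (* String *)

(* First drops the outer level; ToExpression turns each string into a number *)
data = ToExpression[First[raw]];

Dimensions[data]       (* {3, 2} *)
Head[data[[1, 1]]]     (* Real *)

(* ListPlot now receives a plain list of numeric {x, y} pairs *)
ListPlot[data]
```

Checking Dimensions and Head before plotting is a cheap way to see both problems at once: a leading 1 in the dimensions points at the extra list level, and a head of String points at unconverted text.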
  6. May 20, 2012 #5
    Amazing, that did it! I feel a bit silly now though, using strings instead of numbers. I just tried to do it the same way as they do in their tutorial, but I guess Excel treated my numbers as text, or something of that sort.

    If you could assist me with the fitting, that would be amazing.

    The formula I will use to fit the data is the one I described above, which I put into Mathematica as 0.5^{{A/a}*{1 - E^{-a*t}}} when I made a simple Manipulate Plot (so W = 0.5 here). That does seem to work, although it is a very different curve than the one in the paper, so I might have done something wrong there.

    The correct formula, according to the paper, is shown in the attached image (Naamloos.jpg), where again Wo is a predefined constant, A and a are parameters that I need to fit to the data points, and t is the time (in days). The output is a weight, in grams.
  7. May 20, 2012 #6
    Your Import didn't say anything about the file format, so Mathematica guessed based on the .xls extension. Maybe it made a mistake, maybe there was a blank character somewhere, and it just ended up with strings. We were actually luckier than we should have been to spot that so quickly. (Hint: if you cannot figure out what is going on, FullForm[expression] tries to show you what the expression really is, and that can often help spot things.)
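As a small illustration of the FullForm hint, the sketch below uses a hypothetical two-element list named `sample` in place of one imported row. The comments describe what a notebook would be expected to show.

```mathematica
(* Hypothetical sample of one imported row: strings that merely look like numbers *)
sample = {"11.3", "0.95"};

sample              (* a notebook displays {11.3, 0.95}; the quotes are hidden *)
FullForm[sample]    (* List["11.3", "0.95"]: the quotes reveal the strings *)
InputForm[sample]   (* {"11.3", "0.95"} *)

(* A direct check that does not depend on how the output is displayed *)
AllTrue[sample, StringQ]                   (* True *)
AllTrue[ToExpression[sample], NumericQ]    (* True after conversion *)
```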

    Now fitting. Fit is good for linear regression. Not so good for nonlinear. But FindFit can do some nonlinear. I did this

    z = First[ToExpression[tumorgrowthdata]];
    FindFit[z, (1/2)^((A/a)*(1 - E^(-a*t))), {A, a}, t]

    and I get a list of warnings. Those are usually hints that something is very wrong.

    FindFit[z, p*E^(q*t), {p, q}, t]

    produces no warnings, but p is fabulously small, far too small.

    Look at the help for FindFit. Craft up some very simple fake data. Start with a simpler equation, something like a*x^2. You are trying to understand the way that FindFit needs to have data and equations given to it so it correctly finds your parameters.
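A minimal sketch of that exercise, assuming fake data generated from y = 3 x^2 so that the right answer is known in advance. The names `fake` and `fit` are made up for the illustration, and the symbols a and x are assumed to have no values assigned.

```mathematica
(* Fake data generated from y = 3 x^2, so the fitted a should come out close to 3 *)
fake = Table[{x, 3. x^2}, {x, 1, 10}];

(* FindFit[data, model, parameters, variable] returns a list of replacement rules *)
fit = FindFit[fake, a x^2, {a}, x]

(* Substitute the fitted parameter back into the model and overlay it on the data *)
Show[ListPlot[fake], Plot[a x^2 /. fit, {x, 0, 10}]]
```

Once this pattern works on data where the answer is known, the same call shape can be reused with the real data and the real model.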

    I'll fool with this for a couple of minutes and see if I can see what I'm doing wrong.

    Here is Fit, but I don't think that will be powerful enough for the equation you want to fit

    In[18]:= Fit[z,{1,x,x^2},x]

    Out[18]= -2.312 + 0.119 x + 0.002 x^2

    In[20]:= g1=ListPlot[z];
    In[23]:= g2 = Plot[-2.312 + 0.119 x + 0.002 x^2, {x, 0, 200}];
    In[24]:= Show[g1,g2]
    Last edited: May 20, 2012
  8. May 20, 2012 #7
    Hm, that sounds a bit problematic indeed. If it would help to take a look at the paper, it is the 2nd link you find if you google for
    Construction of a Growth Curve for Mammary Tumors of the Rat, from Cancer Res 1967;27:1341-1347.

    The dataset I used is the one used in Chart 3, where they combine all 4 sets from table 2. If you take a look at Table 3, they give values for A and a, the corresponding one for this would be 1, 2, 3 & 4 combined.
    The values for A and a were rather small, 0.077 and 0.0135 respectively.

    I see what you mean by a very small value for p though. For me it just lists 0., so I guess that means it is extremely close to 0, which is.. not good.

    Maybe http://reference.wolfram.com/legacy/v5_2/Add-onsLinks/StandardPackages/Statistics/NonlinearFit.html
    would be useful? It is from 5.2 though, so it is probably an outdated version of what you are thinking of. Does the method you mentioned use a least squares method?

    After doing your plots, and looking at the actual paper, I get the feeling I am doing something inherently wrong somewhere. The list plot has an entirely different form than the graphs in the paper, but I am certain I am using the right dataset. I'll look into that, see what I did wrong there.
    Last edited: May 20, 2012
  9. May 20, 2012 #8
    I'm not finding the original paper and it won't let me in.

    I did find
    which shows
    W = W0 e^(A/a (1-epsilon^(a t)))

    And they say
    "The initial weight was fixed at Wo = 0.5 gm, and the parameters A and alpha were machine-calculated from the original growth data by a least-squares fitting method. The computed values were A = 0.077 day^-1 and alpha = 0.0135 day^-1"

    Is that "epsilon" in that supposed to be another "e" or is that a different variable?

    So "calculated by least squares" says we might be able to use Mathematica's Fit, but I do not see a way to force that into the format demanded by Fit.

    If I assume the epsilon is really another "e" that was eaten by desktop publishing then I do this

    In[37]:= s=Plus@@Map[((1/2)E^((A/a)*(1-E^(-a*First[#])))-Last[#])^2&,z]

    In[38]:= FindMinimum[s,{A,.07},{a,.01}]

    Out[38]= {782.496,{A->0.0740722,a->0.0126738}}

    That doesn't exactly match their quoted parameters, but I cannot see their raw data, however it isn't wildly different.

    In[39]:= g1=ListPlot[z]
    In[42]:= g2=Plot[(1/2)E^((A/a)*(1-E^(-a*t)))/. {A->0.07407221767712636`, a->0.012673801510563793`}, {t,0,200}]
    In[43]:= Show[g1,g2]
    Last edited: May 20, 2012
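For comparison with the FindMinimum approach in this post, here is a hedged sketch that hands the same model to FindFit with explicit starting values. FindFit minimizes the sum of squared residuals by default, so it would be expected to land near the parameters in Out[38], though that has not been verified here. The data are the 26 points posted earlier in the thread; the name `model` is an assumption, and A, a and t are assumed to have no values assigned.

```mathematica
(* The 26 data points from the thread, already converted to numbers *)
z = {{11.3, 0.95}, {13.0, 0.96}, {16.0, 1.25}, {30.1, 3.88}, {31.3, 3.6},
   {32.0, 3.7}, {34.3, 5.13}, {50.0, 9.1}, {51.0, 8.3}, {51.2, 8.54},
   {51.5, 9.2}, {68.8, 20.4}, {71.1, 16.1}, {71.1, 20.6}, {72.5, 18.4},
   {92.8, 16.8}, {96.0, 22.1}, {97.3, 29.7}, {100.3, 39.6}, {115.3, 49.1},
   {116.5, 43.2}, {119.6, 37.4}, {128.0, 69.8}, {144.0, 73.5},
   {148.7, 62.8}, {192.0, 103.1}};

(* Gompertz form with W0 = 1/2, as used in In[37] *)
model = (1/2) E^((A/a) (1 - E^(-a t)));

(* Starting values are given as {parameter, start}; these match the FindMinimum call *)
fit = FindFit[z, model, {{A, 0.07}, {a, 0.01}}, t]

(* Overlay the fitted curve on the data, as in In[43] *)
Show[ListPlot[z], Plot[model /. fit, {t, 0, 200}]]
```

Supplying starting values matters here: without them FindFit starts every parameter at 1, which is far from the region where this model is well behaved.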
  10. May 20, 2012 #9
    That epsilon should certainly be an e, yes. It is strange that it will not let you in, as I am not on a university network, I'll send you the link in a private message.

    What you are quoting, however, is indeed the summary of the paper. It is interesting to see, because the formula they quote is the Gompertz equation that I was thinking of, but later they mention the formula that I copied earlier. Maybe they accidentally left out the first e in that formula? That would make 'sense', as it seems like a very strange modification to just leave the exponential out, looking at the original formula found on the Gompertz Wikipedia page.

    The raw data is in the private message I sent you; the reason the parameters are slightly different might be/is because I didn't know how to add the uncertainties of the data into the datafile (they all had a certain +- value).
    You've already been a great help, but if you could also tell me how that is done, I think that that might actually be it.

    Edit: Using the inputs from your previous post(s), I do indeed get a formula for a line that seems to fit the datapoints really well!
    Last edited: May 20, 2012
  11. May 20, 2012 #10
    On +/-, if they are being very careful about their claims and the use of least squares, then the variance should not change with t and the distributions should be symmetric. If that is true, then they should be quoting means for each data point or something to that effect. If all that is true, then fitting to the means should be as good as you are going to get. If any of this is not true, then you are going towards the deep end of the pool.
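If the spread does differ from point to point, one possible route is a weighted fit with NonlinearModelFit. The sketch below is only an illustration under stated assumptions: `means` and `sds` are entirely made-up values, not the paper's data, and weighting by 1/sd^2 with a fixed variance estimator is the usual treatment for known measurement errors rather than anything the paper is known to have done.

```mathematica
(* Entirely made-up illustration values: {t, mean weight} and one standard deviation per point *)
means = {{10, 1.0}, {30, 3.5}, {50, 9.0}, {70, 18.0}, {100, 33.0},
   {150, 70.0}, {190, 100.0}};
sds = {0.2, 0.5, 1.0, 2.5, 4.0, 8.0, 10.0};

(* Gompertz form with W0 = 1/2, as used elsewhere in the thread *)
model = (1/2) E^((A/a) (1 - E^(-a t)));

(* Weight each point by 1/sd^2; the variance estimator is fixed at 1 because
   the weights are taken to come from known measurement errors *)
nlm = NonlinearModelFit[means, model, {{A, 0.07}, {a, 0.01}}, t,
   Weights -> 1/sds^2, VarianceEstimatorFunction -> (1 &)];

nlm["BestFitParameters"]
nlm["ParameterTable"]
```

With equal standard deviations for every point, the weights cancel out of the minimization and the fitted parameters match the unweighted fit.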

    On desktop publishing, it is so easy for little errors to be included and often people don't spend the effort to make certain that there are no errors introduced at any point in the chain.

    On the quality of the fit, I routinely overlay those plots just to try and flush out any obviously silly mistakes.

    So I think you have a good start. Use what you can from our exchange and see if you can get where you need to go. If you get stuck then put up another post and I'll see if I can give you the hint you need.

    I hope it works

    One other caution: note that I just used FindMinimum on the square of the error between your prediction and your actual. I didn't differentiate with respect to each of your parameters, etc, etc, etc, the usual old school method of least squares. I suspect that might be the correct way of doing this via least squares. What I did was a quick and dirty method of trying to minimize the squared difference between the model and the data to see if we were anywhere near correct. I can't see how to directly drop your model into Fit, which is Mathematica's least squares function. But often it takes me an afternoon of working on something else before I suddenly realize what I should have done. For all this to be carefully correct you should spend much more time thinking about this to make certain that I have not violated something deep and important.
    Last edited: May 20, 2012
  12. May 20, 2012 #11
    The plots do overlay rather nicely. The only thing that is bothering me is that the shape of the graphs is completely different than the shapes in the paper, but the datapoints do align. I don't know what they did there, maybe it's because it is an old paper (1960's) and they used different methods. Maybe not. I'll try importing the dataset again and look for any changes.

    They are indeed quoting means for each data point, providing the standard deviations along with them. But taking what you say into account, it does indeed seem unnecessary to include the standard deviations in the model itself. Maybe I'll add in some extra plots with the standard deviations added and subtracted, but that should be no problem.

    All in all you have given me more than a good start, and thank you once again!

    I just had one very small question, in the FindMinimum you use minimum values for A and a, did you 'guess' those by looking at what the parameters were according to the paper? And if so, could you recommend an educated guess when the parameters are still unknown? Just take the smallest values that don't give an overflow error?
  13. May 20, 2012 #12
    I have not studied the paper. That would take me time and I was trying to get you up to speed as quickly as possible.

    On the starting points for A and a you are correct. I picked values not far from their quoted values because I didn't spend the time studying your equation to tell whether there would only be one minimum or whether there might be several local minima which might confuse FindMinimum. For problems where you don't know what the answer might likely be, I would do as I previously recommended: carefully check that all the required assumptions of least squares regression are satisfied, do all the work to develop those calculations and verify that for your problem there will be exactly one global minimum. (And then, because I don't trust anything, let alone me or my code, I might start from a few hundred or even a few tens of thousands of "plausible" starting points and see if they all end up converging on approximately the same solution. What is "plausible" I would eyeball from a plot of the data and from a few experimental calculations.)
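The multi-start idea from this post can be sketched as follows. The grid of starting points in `starts` is a hypothetical, eyeballed choice and much smaller than the hundreds of points suggested above; `s` is the same sum of squared errors as in In[37], built from the data posted earlier in the thread.

```mathematica
(* The 26 data points from the thread, already converted to numbers *)
z = {{11.3, 0.95}, {13.0, 0.96}, {16.0, 1.25}, {30.1, 3.88}, {31.3, 3.6},
   {32.0, 3.7}, {34.3, 5.13}, {50.0, 9.1}, {51.0, 8.3}, {51.2, 8.54},
   {51.5, 9.2}, {68.8, 20.4}, {71.1, 16.1}, {71.1, 20.6}, {72.5, 18.4},
   {92.8, 16.8}, {96.0, 22.1}, {97.3, 29.7}, {100.3, 39.6}, {115.3, 49.1},
   {116.5, 43.2}, {119.6, 37.4}, {128.0, 69.8}, {144.0, 73.5},
   {148.7, 62.8}, {192.0, 103.1}};

(* Sum of squared errors between the model and the data, as in In[37] *)
s = Total[((1/2) E^((A/a) (1 - E^(-a First[#]))) - Last[#])^2 & /@ z];

(* Hypothetical grid of starting points {A0, a0}; the ranges are a guess *)
starts = Tuples[{{0.03, 0.06, 0.1, 0.15}, {0.005, 0.01, 0.02, 0.04}}];

(* Run FindMinimum from every start; Quiet hides convergence warnings and
   Cases keeps only the runs that returned a numeric minimum *)
results = Cases[
   Quiet[FindMinimum[s, {{A, #[[1]]}, {a, #[[2]]}}]] & /@ starts,
   {_?NumericQ, _List}];

(* Sort by residual so the best runs come first, then compare the parameters *)
TableForm[SortBy[results, First][[All, 2]]]
```

If most rows agree on roughly the same A and a, that is some evidence for a single dominant minimum. Rows that disagree are worth plotting against the data before trusting any of them.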
  14. May 20, 2012 #13
    Thank you, that answers my question well. I didn't mean to ask of you to read the paper, you have been of great value and I am sure that I can continue to work with this for quite a while now.

    Edit: I have also 'solved' the different shape issue; I didn't see that they used a logarithmic scale. My bad.
    Last edited: May 20, 2012
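To compare shapes with a chart drawn on a logarithmic weight axis, the data and the fitted curve can be plotted with ListLogPlot and LogPlot. This is a minimal sketch using the data posted earlier in the thread and the parameters from Out[38]; the name `fitted` is an assumption.

```mathematica
(* The 26 data points from the thread, already converted to numbers *)
z = {{11.3, 0.95}, {13.0, 0.96}, {16.0, 1.25}, {30.1, 3.88}, {31.3, 3.6},
   {32.0, 3.7}, {34.3, 5.13}, {50.0, 9.1}, {51.0, 8.3}, {51.2, 8.54},
   {51.5, 9.2}, {68.8, 20.4}, {71.1, 16.1}, {71.1, 20.6}, {72.5, 18.4},
   {92.8, 16.8}, {96.0, 22.1}, {97.3, 29.7}, {100.3, 39.6}, {115.3, 49.1},
   {116.5, 43.2}, {119.6, 37.4}, {128.0, 69.8}, {144.0, 73.5},
   {148.7, 62.8}, {192.0, 103.1}};

(* Fitted curve using the parameters reported in Out[38] *)
fitted = (1/2) E^((A/a) (1 - E^(-a t))) /. {A -> 0.0740722, a -> 0.0126738};

(* Both plots use a logarithmic vertical axis, so they can be combined with Show *)
Show[ListLogPlot[z], LogPlot[fitted, {t, 0, 200}]]
```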