# Data analysis

1. Jan 16, 2009

Maybe this counts as computational physics or statistics; I don't know where this topic belongs.

OK, so I have this nice, high-quality thermodynamic measurement data, and I have to start squeezing results out of it. Let's call the series A(t) and B(t); I have a few more.
So for different times I have different measurement points, and I am very much interested in $$\frac{\mathrm{d}A}{\mathrm{d}B}$$. A(B) undergoes a phase transition and will show kinks or jumps, and it is measured with high accuracy. B(t) is very smooth, but the measuring equipment introduces an error that is too large for my taste.

What are good or standard methods for calculating derivatives? If you go from one point to the next and just divide differences, the results are dominated by the noise. There may also be some oversampling going on, where the value has hardly changed from one point to the next.
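A rough way to see why point-to-point differences are so noisy is first-order error propagation. Assume, purely for illustration, that each sample of A and B carries an independent error with standard deviation $$\sigma_A$$ and $$\sigma_B$$ (symbols introduced here, not taken from the measurement). For the quotient $$q = \Delta A / \Delta B$$ of two neighboring points this gives approximately

$$\sigma_q \approx \frac{\sqrt{2}}{|\Delta B|}\sqrt{\sigma_A^2 + q^2\,\sigma_B^2}$$

The error scales like $$1/|\Delta B|$$, so under these assumptions the oversampled stretches, where B hardly changes between points, give the worst estimates.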

My ideas are so far:
(By the way, I am using R for the evaluation, so any cool statistics trick you can recommend can probably be found somewhere as a routine.)

1) Take the points in [t-delta, t+delta] of (B(t), A(t)), do a linear fit, take the slope, and do that for each point. (This introduces an artificial correlation between neighboring points, because their linear fits go through all the same points except for two.) Is there a name for this? How do you choose a good value for delta?
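Idea 1) can be sketched as follows. This is only a minimal illustration in Python rather than R. The function name `local_slopes` and the parameter `half_width` (delta counted in samples instead of time) are made up for this sketch. It uses an ordinary least-squares line, which treats B as exact; since B is the noisier series here, that is an assumption worth keeping in mind.

```python
def local_slopes(b, a, half_width):
    """Estimate dA/dB at each point from a least-squares line through its neighbours.

    b, a       : equal-length sequences of measured B and A values
    half_width : neighbours taken on each side (hypothetical stand-in for delta)
    """
    n = len(b)
    slopes = []
    for i in range(n):
        # The window is truncated at the ends of the series.
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        bs, as_ = b[lo:hi], a[lo:hi]
        m = len(bs)
        b_mean = sum(bs) / m
        a_mean = sum(as_) / m
        # Slope of the least-squares line: S_xy / S_xx
        sxx = sum((x - b_mean) ** 2 for x in bs)
        sxy = sum((x - b_mean) * (y - a_mean) for x, y in zip(bs, as_))
        # If all B values in the window coincide, the slope is undefined.
        slopes.append(sxy / sxx if sxx > 0 else float("nan"))
    return slopes
```

A larger `half_width` averages over more noise but also smears out kinks and jumps, which is the trade-off behind the question of how to choose delta.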

2) Somehow smooth the data of B(t) to get rid of the noise. There is no exact theory that could be fitted. One idea was averaging neighbors, maybe using the Simpson 3/8 rule. When does the use of polynomials make sense? I know there is something called smoothing with cubic splines, for example in gnuplot. Does this really smooth the data, or does it simply make cubic splines that run through all the data points (instead of averaging around them)? What is the best way to include a lot of points, like 10? Still integrating over a polynomial?

3) This is madness/Sparta/Fourier transforms: somehow do a Fourier transform of A(B), for example with an FFT over interpolated points, then multiply by $$i\omega$$ and transform back. Maybe I could also do something to the high-frequency coefficients at the same time to reduce noise. Any tips? How do you make derivatives?

2. Jan 18, 2009

### mercurial

Hi. These are good questions. I encountered a similar problem once of having to estimate derivatives from noisy data (for a commercial venture). One issue there was that we did not really know how noisy the data was. Based on the things you've suggested, I can share some of my own experience.

First off: are the points $$t_i$$ at which you've taken the measurements uniformly spaced? That is, is talking about an interval $$[t-\delta, t+\delta]$$ the same as talking about some finite set $$t_{i_1},\ldots,t_{i_n}$$, where n is always the same?

You mentioned approximating A(B) at each point by fitting a line through all points in a $$\delta$$-neighborhood. Why not try higher-order polynomials (e.g., quadratic, cubic) too? These might be better for fitting "kinks," as you are trying to locate phase transitions. To fit data with a polynomial of degree d, you need at least d+1 sample points. This is why I asked about the $$\delta$$-neighborhood: you need $$n\geq d+1$$ for each point.

You asked about cubic splines to "smooth" data. This is similar in spirit to the linear fit, but with piecewise polynomials of degree 3.

I've never used R, so I won't go there; I used Matlab instead. Basically, I wrote a script to assemble the normal equations for the least-squares fitting problem at each point. See the following link for terminology:

http://en.wikipedia.org/wiki/Normal_equations

The solution to the normal equations is the vector of coefficients of your polynomial, letting you compute its derivatives using the formula from calculus.
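The recipe above can be sketched like this. It is a plain-Python illustration, not the original Matlab script, and the names `solve_linear`, `polyfit_normal` and `poly_derivative_at` are invented for the example. It builds the normal equations $$(V^T V)c = V^T y$$ for one window of points and differentiates the fitted polynomial term by term.

```python
def solve_linear(m, rhs):
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def polyfit_normal(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1] x + ... via the normal equations."""
    if len(xs) < degree + 1:
        raise ValueError("need at least degree + 1 points")
    # Design matrix V has rows [1, x, x^2, ...].
    v = [[x ** k for k in range(degree + 1)] for x in xs]
    size = degree + 1
    vtv = [[sum(row[i] * row[j] for row in v) for j in range(size)]
           for i in range(size)]
    vty = [sum(row[i] * y for row, y in zip(v, ys)) for i in range(size)]
    return solve_linear(vtv, vty)


def poly_derivative_at(coeffs, x):
    """Evaluate the derivative of sum_k c[k] x^k at x (power rule)."""
    return sum(k * c * x ** (k - 1) for k, c in enumerate(coeffs) if k > 0)
```

In practice it probably helps to shift each window so the point of interest sits at x = 0; the derivative there is then just `coeffs[1]`, and the normal equations tend to be better conditioned. For high degrees the normal equations can become ill-conditioned, so this sketch is best suited to small d.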

You asked about choosing a good value for $$\delta$$, and if you follow my prescription, you can also worry about choosing a degree d. This is where the art enters. In the problem that I was working on, I noticed that, taking better and better -- or "finer" -- interpolations, there is a certain threshold where the computed results don't change much. For instance, there is essentially no difference between using polynomials of degree 5, 6, 10, etc. I then chose the degree and $$\delta$$ that were minimal within this threshold.
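One possible reading of "minimal within this threshold" is sketched below: given derivative estimates computed for increasing degree (or increasing $$\delta$$), pick the first one after which nothing changes by more than a tolerance. The function name `minimal_stable_index` and the tolerance `tol` are assumptions made for this illustration, not part of the procedure described above.

```python
def minimal_stable_index(estimates, tol):
    """Return the smallest index i such that every later estimate
    stays within tol of estimates[i]."""
    if not estimates:
        raise ValueError("need at least one estimate")
    for i, ref in enumerate(estimates):
        if all(abs(e - ref) <= tol for e in estimates[i + 1:]):
            return i
    # Not reached: the last index always satisfies the condition above.
    return len(estimates) - 1
```

For example, if `estimates[k]` holds the derivative obtained with degree `k + 1`, the returned index identifies the lowest degree beyond which the result no longer moves appreciably. Choosing `tol` is still a judgment call.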

3. Jan 18, 2009