jaderberg
I am trying to work out the most efficient way of updating the mean and standard deviation of a one-dimensional data set. The data points change frequently, each by a small amount, but I do not want to recalculate the mean and sd from scratch after every change, as this is computationally expensive on a big data set.
Instead I am trying to update the mean and sd incrementally rather than fully recalculate them. I can do that for one change, but I need to be able to batch changes together and update the mean and sd approximately.
E.g.
data: 2,3,3,3,5,6,1,7 with mean1 and sd1
changes to: 2,4,3,3,4,6,1,7 (two changes of 3->4 and 5->4)
How would I use the existing values mean1 and sd1, together with the old and new values, to update the mean and sd of the set?
I can do this for one change (i.e. mean2 = mean1 + (new_val - old_val)/N, and similarly for sd2), but how would I do it for multiple changes?
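To make the question concrete, here is a minimal sketch of one possible batched update. It assumes sd means the population standard deviation (dividing by N) and that N stays fixed. The function name update_mean_sd and the (old_val, new_val) pair format are made up for illustration. The idea is to recover the sum and the sum of squares from mean1 and sd1, adjust both for every change, and convert back.

```python
import math
import statistics


def update_mean_sd(mean1, sd1, n, changes):
    """Update a population mean/sd after replacing some values.

    changes is an iterable of (old_val, new_val) pairs; n is the
    (unchanged) number of data points. Hypothetical helper, not a
    library function.
    """
    # Recover the running totals from the summary statistics.
    total = n * mean1
    total_sq = n * (sd1 ** 2 + mean1 ** 2)

    # Each replacement shifts the totals by a simple difference.
    for old_val, new_val in changes:
        total += new_val - old_val
        total_sq += new_val ** 2 - old_val ** 2

    mean2 = total / n
    # Clamp at zero in case rounding pushes the variance slightly negative.
    var2 = max(total_sq / n - mean2 ** 2, 0.0)
    return mean2, math.sqrt(var2)


# The example from the post: two changes, 3->4 and 5->4.
old = [2, 3, 3, 3, 5, 6, 1, 7]
new = [2, 4, 3, 3, 4, 6, 1, 7]
mean1 = statistics.fmean(old)
sd1 = statistics.pstdev(old)
mean2, sd2 = update_mean_sd(mean1, sd1, len(old), [(3, 4), (5, 4)])
```

Under these assumptions the result is exact up to floating-point rounding rather than approximate, and the cost is proportional to the number of changes, not to N. The sum-of-squares form can lose precision when the mean is large compared with the sd, so an occasional full recalculation may still be worthwhile.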