I Filling in Missing Values in a string of data

Brian82784
Messages
19
Reaction score
0
TL;DR Summary
Finding a missing value for trending and analysis
hello,

I’m trying to figure out if I’m doing this correctly or if there’s a different way that I should be finding a missing value. I’m trending data for an automatic transformer. Every month I collect the operations counter value and at the end of the year sum the number of tap changes. Sometimes I can’t get to the transformer so there are months where I don’t Collect the count.

For example:

Operations count
Jan = 654104
Feb = Missing
Mar = 659176

To find approximate missing value take 654104 + 659176 = 1313280 and dividing 1313280/2 = 656640

Now it would look like
Jan = 654104
Feb = 656640
Mar = 659176

I will get into a bit more down below.

Operations Counter

Jan = 654104
Feb = missing (populate with calculated 656640)
Mar = 659176
April= 661476
May= 664376
June= missing
July = missing
August = 671976
September= 675076

I then find the total changes month to month

Total tap Changes
Jan - Feb = 2536
Feb- March = 2536
March-April= 2300
April-May= 2900

After I have a years worth of data I sum the total changes. To see how much it operated in a year. So above from Jan - May would be a total of 10,272 operations (I know it’s not. Full year just showing an example).
 
Mathematics news on Phys.org
It's not clear what your question is. Are you asking if there is a better way to estimate the missing values versus simple interpolation?
 
berkeman said:
It's not clear what your question is. Are you asking if there is a better way to estimate the missing values versus simple interpolation?
Just wondering if this is the best/right way or if I’m completely wrong.
 
What you did to estimate the missing Feb data is called linear interpolation, which amounts to getting the average of the two surrounding months. I'm not sure what you did to find the missing data for June and July.

Given that the counts are increasing month by month, another option would be to fit the data points to a curve rather than a straight line. In this case a second-degree polynomial (a parabola) might be a good choice. That is, the curve whose equation is ##y = ax^2 + bx + c## that minimizes the vertical distance between points on the curve and your data.
 
  • Like
Likes Vanadium 50 and FactChecker
I would be inclined to keep it simple (linear interpolation) unless there was a significant reason to get more complicated (higher-degree polynomial). The linear interpolation would also be safer if there is missing data at either end. That would require extrapolation of the known data, which can give more extreme estimates from a higher-degree approximation.
 
Mark44 said:
What you did to estimate the missing Feb data is called linear interpolation, which amounts to getting the average of the two surrounding months. I'm not sure what you did to find the missing data for June and July.

Given that the counts are increasing month by month, another option would be to fit the data points to a curve rather than a straight line. In this case a second-degree polynomial (a parabola) might be a good choice. That is, the curve whose equation is ##y = ax^2 + bx + c## that minimizes the vertical distance between points on the curve and your data.

I’m not sure how to calculate that could you show me an example?
 
What I described is called quadratic regression. Here's an article that shows how to do this in Excel - https://www.statology.org/quadratic-regression-excel/. You need to have Analysis Toolpack installed to do this (it's a free download). If you have some other spreadsheet, there likely is an equivalent set of tools available.
 
You need to decide how to attribute the total change over February and March to those two months. The simplest way is to apportion it equally, which is what you did. Alternatively, you can apportion them according to the number of (work)days in each month.

A more accurate approach is to apportion operations to months according to the actual dates of the meter readings, if you have that information.

Regardless of the method, you need to ensure that people who use this data can find out (1) which figures were exact and which had to be estimated, and (2) how the estimation was done.
 
Back
Top