Filling in Missing Values in a string of data

Brian82784 · Aug 25, 2022

hello,

I’m trying to figure out if I’m doing this correctly or if there’s a different way that I should be finding a missing value. I’m trending data for an automatic transformer. Every month I collect the operations counter value and at the end of the year sum the number of tap changes. Sometimes I can’t get to the transformer so there are months where I don’t Collect the count.

For example:

Operations count
Jan = 654104
Feb = Missing
Mar = 659176

To find approximate missing value take 654104 + 659176 = 1313280 and dividing 1313280/2 = 656640

Now it would look like
Jan = 654104
Feb = 656640
Mar = 659176

I will get into a bit more down below.

Operations Counter

Jan = 654104
Feb = missing (populate with calculated 656640)
Mar = 659176
April= 661476
May= 664376
June= missing
July = missing
August = 671976
September= 675076

I then find the total changes month to month

Total tap Changes
Jan - Feb = 2536
Feb- March = 2536
March-April= 2300
April-May= 2900

After I have a years worth of data I sum the total changes. To see how much it operated in a year. So above from Jan - May would be a total of 10,272 operations (I know it’s not. Full year just showing an example).

berkeman · Aug 25, 2022

It's not clear what your question is. Are you asking if there is a better way to estimate the missing values versus simple interpolation?

Brian82784 · Aug 25, 2022

berkeman said:

It's not clear what your question is. Are you asking if there is a better way to estimate the missing values versus simple interpolation?

Just wondering if this is the best/right way or if I’m completely wrong.

Mark44 · Aug 25, 2022

What you did to estimate the missing Feb data is called linear interpolation, which amounts to getting the average of the two surrounding months. I'm not sure what you did to find the missing data for June and July.

Given that the counts are increasing month by month, another option would be to fit the data points to a curve rather than a straight line. In this case a second-degree polynomial (a parabola) might be a good choice. That is, the curve whose equation is ##y = ax^2 + bx + c## that minimizes the vertical distance between points on the curve and your data.

FactChecker · Aug 25, 2022

I would be inclined to keep it simple (linear interpolation) unless there was a significant reason to get more complicated (higher-degree polynomial). The linear interpolation would also be safer if there is missing data at either end. That would require extrapolation of the known data, which can give more extreme estimates from a higher-degree approximation.

Brian82784 · Aug 25, 2022

Mark44 said:

What you did to estimate the missing Feb data is called linear interpolation, which amounts to getting the average of the two surrounding months. I'm not sure what you did to find the missing data for June and July.

Given that the counts are increasing month by month, another option would be to fit the data points to a curve rather than a straight line. In this case a second-degree polynomial (a parabola) might be a good choice. That is, the curve whose equation is ##y = ax^2 + bx + c## that minimizes the vertical distance between points on the curve and your data.

I’m not sure how to calculate that could you show me an example?

Mark44 · Aug 25, 2022

What I described is called quadratic regression. Here's an article that shows how to do this in Excel - https://www.statology.org/quadratic-regression-excel/. You need to have Analysis Toolpack installed to do this (it's a free download). If you have some other spreadsheet, there likely is an equivalent set of tools available.

pasmith · Aug 25, 2022

You need to decide how to attribute the total change over February and March to those two months. The simplest way is to apportion it equally, which is what you did. Alternatively, you can apportion them according to the number of (work)days in each month.

A more accurate approach is to apportion operations to months according to the actual dates of the meter readings, if you have that information.

Regardless of the method, you need to ensure that people who use this data can find out (1) which figures were exact and which had to be estimated, and (2) how the estimation was done.

Filling in Missing Values in a string of data

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Undergrad Please Explain (actually explain) The Monty Hall Problem

Undergrad A variant of the Monty Hall problem

Graduate Hypothesis testing: Defining H0, HA hypotheses so that ( H_A)_A' makes sense

Undergrad My basic understanding of set theory

Undergrad How do E[X] and E[|X|] relate?

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight