# Weighting calculation to convert weather data from 6 stations into one

1. Jun 24, 2013

### mdhastings

My forecasting model currently has 6 hard-coded weightings (totaling 100%) for 6 weather stations, and I wish to determine a methodology to produce these weightings (conversion factors, in %) so as to form an artificial single weather station. This is part of forecasting the electricity load in my city: the data from the 6 weather stations (observations of temperature (C), dew point (C) and rel. humidity (%)) is weighted by the specific weightings and then used in the load's model equation.

The request is for a conversion factor methodology that must capture the relevance of any of the 6 weather stations to the overall load. It must somehow combine temperature and dew point (since rel. humidity is equivalent here) within the weighting. To this end I have applied various regressions of electricity load against these station datasets (say temperature) without success and believe I need to scale or otherwise change my thinking.

Sample data and current weightings in text file.

2. Jun 24, 2013

### D H

Staff Emeritus
Why are you using temperature, dew point, and relative humidity? You might well get better results if you use but two of them, e.g., temperature and humidity. Temperature, dew point, and relative humidity are related, and for relative humidity > 50% the relationship is close to linear. See http://journals.ametsoc.org/doi/pdf/10.1175/BAMS-86-2-225.

Throwing correlated independent variables at a regression model is not a good idea. Correlations between your independent variables mean those variables aren't independent. It is a downright bad idea if the independent variables are linearly related to one another.
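To make the near-linear relationship concrete, here is a minimal Python sketch (Python is used for illustration; the model discussed in this thread is written in R). It compares a Magnus-type dew point formula with the rule of thumb $T_d \approx T - (100 - RH)/5$ discussed in the linked article. The Magnus coefficients 17.625 and 243.04 are one commonly used set and are an assumption here, as are the function names.

```python
import math

def dew_point_magnus(temp_c, rh_pct, a=17.625, b=243.04):
    """Dew point (deg C) from temperature (deg C) and relative humidity (%)."""
    gamma = math.log(rh_pct / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)

def dew_point_linear(temp_c, rh_pct):
    """Rule of thumb: dew point drops about 1 deg C per 5% RH below 100%."""
    return temp_c - (100.0 - rh_pct) / 5.0

# Compare the two at 20 deg C over the range where the rule is meant to hold.
for rh in (50, 60, 70, 80, 90, 100):
    exact = dew_point_magnus(20.0, rh)
    approx = dew_point_linear(20.0, rh)
    print(f"RH={rh:3d}%  Magnus={exact:6.2f}  linear={approx:6.2f}")
```

At 20 C the two stay within 1 C of each other for relative humidity of 50% and above, which is why feeding all three of temperature, dew point and relative humidity into one regression adds almost no independent information.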

3. Jun 24, 2013

### D H

Staff Emeritus
You might want to add percent cloud cover during daytime hours to your model, at least during summertime. My AC runs a good deal more on sunny days than on cloudy ones.

4. Jun 24, 2013

### mdhastings

Thanks D.H.
Your points on dew point and rel.humidity are well made and in the second part of my question ("It must somehow combine temperature and dew point (since rel. humidity is equivalent here) within the weighting.")

I have not explained about the load modelling because quite frankly our market is not real time and this adds significant restriction. We actually incorporate the artificially weighted weather station with Fourier series in our load model. Anyway the specific need is to determine a methodology of weighting the 6 weather stations to provide impact with the observed electricity load.

5. Jun 24, 2013

### Stephen Tashi

Empirically, is the load an approximately linear function of the weather variables? It would be best if you can answer this "empirically" in the sense of examining actual data, but it would also be of interest to know if the model results are.

As I visualize the situation, you don't have the option to rewrite the current model and the model accepts only 1 set of weather variables. That doesn't imply a restriction that each variable that you input must be a linear function of the measurements from the 6 weather stations, but you may want it to be a "weighted sum" for the sake of simplicity.

6. Jun 24, 2013

### mdhastings

Yes. We model the load as a linear function of the weather variables (amongst others). Can you explain what you mean by "It would be best if you can answer this "empirically" in the sense of examining actual data, but it would also be of interest to know if the model results are."

All else you say is good

7. Jun 25, 2013

### Stephen Tashi

I think the problem amounts to fitting a linear regression model with some constraints on the model's coefficients.

To illustrate this, suppose there are only 2 weather stations A and B and they each measure 2 variables.

Station A measures data $(X_A, Y_A)$
Station B measures data $(X_B, Y_B)$

The load data is $L$.
There is data $Z$ that doesn't come from the stations.

The model has the form $L = K_x X + K_y Y + K_z Z + K$
where the $K$'s are constants.

The variable $X = \lambda_A X_A + \lambda_B X_B$
The variable $Y = \theta_A Y_A + \theta_B Y_B$

With the constraints on the constants:
$\lambda_A + \lambda_B = 1$
$\theta_A + \theta_B = 1$

and possibly the constraints
$\lambda_A = \theta_A, \ \lambda_B = \theta_B$

If we substitute for $X$ and $Y$ in the model, it becomes

$$L = K_x \lambda_A X_A + K_x \lambda_B X_B + K_y \theta_A Y_A + K_y \theta_B Y_B + K_z Z + K$$

This amounts to a linear regression model in 5 variables, $X_A, X_B, Y_A, Y_B, Z$:

$$L = C_A X_A + C_B X_B + D_A Y_A + D_B Y_B + K_z Z + K$$

If we fit such a model to data by least squares, without any constraints on the coefficients, we can find the constants in that model. However, then we need to express them as other constants that satisfy the equations

$C_A = K_x \lambda_A$
$C_B = K_x \lambda_B$
$D_A = K_y \theta_A$
$D_B = K_y \theta_B$
$\lambda_A + \lambda_B = 1$
$\theta_A + \theta_B = 1$

So the question is whether the above equations have solutions for the several unknowns. If I have understood the problem correctly, we can think about that.
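As a worked step (an observation added for illustration, under the assumption that $K_x$ and $K_y$ are themselves free constants to be estimated): adding the first two equations and using $\lambda_A + \lambda_B = 1$ gives $K_x$ directly, and likewise for $K_y$, provided $C_A + C_B \neq 0$ and $D_A + D_B \neq 0$.

$$K_x = C_A + C_B, \qquad \lambda_A = \frac{C_A}{C_A + C_B}, \qquad \lambda_B = \frac{C_B}{C_A + C_B}$$

$$K_y = D_A + D_B, \qquad \theta_A = \frac{D_A}{D_A + D_B}, \qquad \theta_B = \frac{D_B}{D_A + D_B}$$

These weights sum to 1 by construction, but nothing forces them to be non-negative. The additional constraint $\lambda_A = \theta_A$ would require $C_A / (C_A + C_B) = D_A / (D_A + D_B)$, which an unconstrained fit will not satisfy in general.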

It would be easier to solve the equations if you drop the last two constraints. The term "weighted average" has a comforting sound, but I see no reason why those constraints make the model more reliable.

8. Jun 27, 2013

### mdhastings

Thanks Stephen,

You have provided exactly what I wanted. So how do we work out the lambda coefficients? I believe both restrictions are necessary.

9. Jun 27, 2013

### Stephen Tashi

If you want constraints such as $\lambda_A + \lambda_B = 1$ and you know the values of the $K$'s, you need to solve the least squares problem involving the $C$'s with constraints instead of solving it by simple least squares fitting.

For example
$\lambda_A = C_A/K_x$
$\lambda_B = C_B/K_x$
So $\lambda_A + \lambda_B = 1$ is equivalent to $C_A/K_x + C_B/K_x = 1$, which is a linear equality constraint on the coefficients $C_A, C_B$.

Not knowing the details of how to do regression with constraints, I searched the web using the phrase "linear regression constraints coefficients" and found this PDF of slides http://folk.uio.no/inf9540/CLS.pdf that (supposedly) explains how to solve such a problem (see page 19). It uses matrix notation, which I suppose we can interpret eventually.

Apparently there are also computer packages that solve problems of linear regression with constraints. How are you going to do the computer work on it?
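For the two-station illustration, the equality constraint can be handled by substitution rather than by a special package. The sketch below is in Python rather than R and is only an illustration: it assumes $K_x$ is known, treats everything else in the model as an already-computed `other` term, and uses hypothetical names (`fit_lambda_a`, `other`). Substituting $\lambda_B = 1 - \lambda_A$ leaves a one-parameter least squares problem with a closed-form solution.

```python
def fit_lambda_a(load, x_a, x_b, k_x, other=None):
    """Least-squares lambda_A in L = k_x*(lam*x_a + (1-lam)*x_b) + other."""
    if other is None:
        other = [0.0] * len(load)
    # Substituting lambda_B = 1 - lambda_A leaves a single free parameter:
    #   L - k_x*x_b - other = lam * k_x*(x_a - x_b)
    d = [k_x * (a - b) for a, b in zip(x_a, x_b)]
    r = [l - k_x * b - o for l, b, o in zip(load, x_b, other)]
    return sum(di * ri for di, ri in zip(d, r)) / sum(di * di for di in d)

# Synthetic check: data generated with lambda_A = 0.7 and k_x = 2.0 (made-up numbers).
x_a = [18.0, 21.5, 25.0, 30.0, 16.0]
x_b = [17.0, 23.0, 24.0, 33.0, 12.0]
k_x, intercept = 2.0, 100.0
load = [k_x * (0.7 * a + 0.3 * b) + intercept for a, b in zip(x_a, x_b)]
lam_a = fit_lambda_a(load, x_a, x_b, k_x, other=[intercept] * len(load))
print(round(lam_a, 6), round(1 - lam_a, 6))
```

With more stations the same substitution removes one weight and leaves an ordinary regression in the remaining ones. It does not enforce non-negativity, which needs an inequality-constrained solver.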

10. Jun 28, 2013

### mdhastings

Stephen, Again thanks for your advice. I think I have simplified the process. Since the coefficient is the same for both temp and Dew Point of each station, I can combine the data (scaleable??) and then run a regression as per your L=C[XA+XB]+D[YA+YB]+KzZ+K above. The C and D coefficients must add to 1 so if I divide both sides of the equation through by [C+D], I get the weighting out of 1. What do you think?

11. Jun 28, 2013

### Stephen Tashi

Do you mean the coefficient $K$'s in the original model?

My equation gives XA and XB possibly different coefficients.

I thought the idea was to weight the measurements of the same quantity from different weather stations differently. So we want to weight XA and XB differently - if we are using "X" to represent the physical quantity and the "A" and "B" to denote the two different weather stations.

12. Jun 30, 2013

### mdhastings

Sorry, Stephen, I remarked in my opening that "It must somehow combine temperature and dew point (since rel. humidity is equivalent here) within the weighting." I wasn't sure they could be combined within the methodology, so left the thought there. I am still unsure if you can simply add the two numbers (say 21.5C and 10.0C) and use regression as we are doing. I thought there might be some scaling to consider. But clearly that is what the current modelling uses. This, my first go, is really to get the methodology right.

My previous comment meant to combine data of the station (T(C) and DP(C)) as I have said above. I rushed your equation into my comment. My bad.

Hence now correct to L=C[XA+YA]+D[XB+YB]+KzZ+K. [For some reason I keep misinterpreting your notation]

Your first Q. This means it is not the K's. It is simpler. The above obviously refers to your constraints λA=θA, λB=θB.
Now the final constraint is that C and D (and other 4 stations) add to 1.

To find C and D (and other 4 stations) .
I have to use these 6 stations with the parent modelling (i.e. from where I get the Z) to obtain C and D (and other 4 stations) and this has some peculiarity. In my forecasting I need to reference the artificial station above which then produces other terms used in the parent of Z. Would you be able to assure me that if instead of parent of Z, I substitute out of it all these other terms and replace with the 6 weather stations this will give C and D etc.

For example the simplified model Z looks like (in R code)
(wt1+wt2+wd1+wd2)*(sd1+cd1+sd2+cd2+sd3+cd3+sd4+cd4)+
(wt1+wt2+wd1+wd2)*(sy1+cy1)+(wt1+wt2)*(ph1+ph2)+
etc....,data=dataframe, na.action = na.exclude)

where wt1, wt2, wd1 and wd2 is constructed from the artificial weather station
e.g wd1 <- pmax(wd1-17,0)

and replace with L=C[XA+YA]+D[XB+YB]...H[XF+YF]+KzZ+K to look like

([XA+YA]+[XB+YB]+...+[XF+YF])*(sd1+cd1+sd2+cd2+sd3+cd3+sd4+cd4)+
([XA+YA]+[XB+YB]+...+[XF+YF])*(sy1+cy1)+(wt1+wt2)*(ph1+ph2) +
etc....,data=dataframe, na.action = na.exclude)

where the sd1 etc and cd1 etc are Fourier terms creating interactions with the 6 weather stations

Or go one further and remove all interactions and just use the 6 weather stations.
Which then this matches the L=C[XA+YA]+D[XB+YB]...H[XF+YF]+KzZ+K
[XA+YA]+[XB+YB]+...+[XF+YF]+
etc....,data=dataframe, na.action = na.exclude)
For simplicity I like the last one - does it work?

My complete thanks for your support on this Stephen. Hope you can help further.

13. Jul 1, 2013

### Stephen Tashi

I can't understand questions about 6 weather stations unless they are posed precisely. The simplest way to do that will be to use appropriate notation.

Designate the N weather stations whose measurements are to be somehow weighted, by indexes $"1","2","3"..."N"$ instead of $"A","B","C",..$.

Use the notation $X[j][i]$ to be the $j$-th type of measurement taken at the $i$-th weather station. The types of measurements are indexed by $j = 1,2,..M$.

I don't know about the wisdom of combining two types of measurement into a single number. I think that debate is a matter of physics, not pure math. I'm assuming that the measurements "types" are the final set of numbers, after you have done all the combining that's going to be done.

Let the variables representing $S$ other measurements, not in the above list, be $Z[1],Z[2],...,Z[S]$.

Let the unconstrained regression model be

$L_u = \sum_{j=1}^M \sum_{i=1}^N C[j][i] X[j][i] + \sum_{i=1}^S A[i] Z[i] + P$

where the $C[j][i], A[i], P$ are constants.

There are at least two interpretations of what it means to weight the data from weather stations.

One interpretation is that you must assign a set of non-negative weights $w[1],w[2],...,w[N]$ with the constraint $\sum_{i=1}^N w[i] = 1$, i.e. one weight value per weather station.

Another interpretation is that you may assign a set of non-negative weights $w[j][i],\ j = 1,2,..M,\ i = 1,2,..N$, with the constraint that $\sum_{i=1}^N w[j][i] = 1$ for each $j = 1,2,...M$, i.e. that you can have one weight per each type of measurement and each weather station.

It is unclear to me what the situation is with the company's current model. I'll guess it is of the form

$$L_c = \sum_{m=1}^M K[m] Y[m] + \sum_{i=1}^S B[i] Z[i] + Q$$

where $Y[m]$ is a (single) value for a measurement of type $m$ and $K[m], B[i], Q$ are constants.

I don't know if you can look into the code and data for this model and read the specific values of the constants ( for example, determine that $K[3] = 29.85677$) or whether you can't do things like that.

Can you clarify the above ambiguities and pose your questions in the framework of the notation or suggest a different notation? (I don't care if you use the forum's LaTeX. It's interesting to learn, but that can be a big distraction.)

I might be able to read R-code with documentation - a dictionary of the variables.

14. Jul 1, 2013

### mdhastings

Stephen I did some Latex 21 years ago, so latex it is and please ignore the R code.

Yes the idea was to work backwards to find a methodology to find a weighting for each station. It may be that we should not combine the two sets of numbers (Temp and Dew point). Maybe we run each alone and simply average the two sets of coefficients. So (cause I get notation confusion) let me ask explicitly about the Temp set of numbers: I want $Y[m]$ to represent the temp measurement of each station, thus $j = 1$ then this:

$$L_c = \sum_{m=1}^M K[m] Y[m] + \sum_{i=1}^S B[i] Z[i] + Q$$

where $Y[m]$ is a (single) value for the temp measurement of station $m$ (so $M = 6$ here) and $K[m], B[i], Q$ are constants.

When I run this I get $K[m]$ coefficients of 10.20, -10.63, 5.75, 9.53, 4.78, 11.74 for station temperatures 1 to 6. Firstly, these do not align with the current weightings. Secondly, no matter what I do I always get a negative value, and I do not know how to build the weightings so they total 1.

Please let me know how this aligns with your comments. These last two difficulties are one big headache and that is why I tried to combine temp with either dew point or rel. humid.
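One way to read the six coefficients quoted above, following the earlier relation $\lambda = C / K_x$ with $K_x$ taken as the sum of the coefficients, is to divide each by their total. The short Python sketch below does only that arithmetic; treating the sum as $K_x$ is an assumption for illustration, not the method behind the current hard-coded weightings.

```python
coefs = [10.20, -10.63, 5.75, 9.53, 4.78, 11.74]  # K[m] values quoted in the post
k_total = sum(coefs)                              # plays the role of K_x (assumption)
weights = [c / k_total for c in coefs]

for station, w in enumerate(weights, start=1):
    print(f"station {station}: {w:+.3f}")
print("total of coefficients:", round(k_total, 2))
print("sum of weights:", round(sum(weights), 6))
```

The total of the coefficients is 31.37, so the implied weights sum to 1, but station 2 still comes out at roughly -0.34. Rescaling alone therefore fixes the "total 1" difficulty and not the negative one, which is where a non-negativity constraint in the fit itself would be needed.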

Now if I turn my attention to explaining the company's current load modelling (i.e. forecasting the load) :

The regression model would be

$L_u = \sum_{i=1}^S A[i] Z[i] + P$

where the $A[i]$ and $P$ are constants.

In my modelling, the only $Z[1],Z[2],...,Z[S]$ terms that are actual observable data are those from the artificial weather station (which I am seeking to build from 6 stations). All up, I have 700+ interactive terms to help forecast a very true curve. Most if not all are interactions with Fourier terms (sd1, cd1 etc. are sin(daily1), cos(daily1), or yearly terms sy1, cy1 etc.) and these construct the load curve. (The daily shape of the electricity load is like a sine wave.) I did upload a sample of data but with no comments I think it did not do as I expected.

When in my previous answer I showed (wt1+wt2+wd1+wd2)*(sd1+cd1+sd2+cd2+sd3+cd3+sd4+cd4) I implied that this artificial weather station is interacted with the Fourier terms such that
A[1](wt1*sd1) + A[2](wt1*cd1) + A[3](wt1*sd2) + ..... + A[32](wd2*sd4) for 32 of 700+ terms.
Where the $A[i]$ are constants and the wt1*sd1 is a new interactive term made by multiplication.

Hence, unlike your guess at the company's model, there isn't a weather term standing alone - thus there are no specific values of the constants to read off - they are all interactions.

15. Jul 2, 2013

### Stephen Tashi

(I don't mind code if it is documented.)

Perhaps some forum member who is an expert on linear regressions knows how to do two regressions separately and combine the results, but I don't. As far as I know you can't average two least squares regressions that predict the same variable $L$ with different sets of variables and claim the average is a least squares regression.

Ok, I understand that $K[2]$ is a negative value. When you say you "run this", I assume this means you use data you have to do a least squares fit to the measurements. Is that correct? I don't understand what "the current weightings" are.

My thought is that you would have to do a linear regression "with constraints on the coefficients". This is a known method (but not well known to me!). We would have to find software to do this or find a detailed explanation of the technique if we want to implement it ourselves. I think this is possible.

I don't understand what "interactive terms" means. Does it mean "non-linear"?

I'm confused by the mention of a model for a curve that uses a discrete fourier series versus the earlier discussion of doing a linear regression.

I'll make a guess at what the model is.

It predicts a curve of electricity usage as:

$$L(t) = C[0] + \sum_{i=1}^{700+} C[i]\ \cos(\omega[i] t)$$

where the $\omega[i]$ are constants and the $C[i] = C[i](....)$ are functions (possibly non-linear) of the observable data, including the weather data.

From the model for $L(t)$ you can compute the predicted mean daily load for each day, $L[i] = \frac{1}{b-a}\int_{a}^{b} L(t)\ dt$, where the $i$th day begins at time $a$ and ends at time $b$.

I don't know if you also have actual measured mean daily load data for the days.

The input data to this model does not have a variable for a given type of measurement (e.g. mean daily temperature) from N weather stations. It only has 1 variable representing mean daily temperature, say, $Z[1]$. You wish to find non-negative weights $w[1][i]$ that sum to one and you wish to set $Z[1] = \sum_{i=1}^N w[1][i] X[1][i].$

The problem of finding the optimal set of weights $w[1][i]$ to fit the model's predicted mean load to observed daily mean load data is not a problem of linear regression. It is a problem of non-linear regression. Are we assuming the model is adequately approximated by a linear function?

16. Jul 2, 2013

### mdhastings

Stephen,
Perhaps some background... you have been very patient with my explanations. Thanks.
The company's electricity load model is designed to provide a load forecast for just 1 day (made up of 48 half-hour intervals that each need to be forecast), and the shape of this day's load curve is like a sine wave. All data used in this modelling needs to be in the same half-hour intervals. We have years of these types of data. Apart from the load and weather data we make the other terms up. Thus we prepare sin and cos series that range evenly between -1 and 1 for daily and yearly terms. We have 10 sine daily terms and 10 cos daily terms (labelled sd1, ..., sd10 and cd1, ..., cd10) capturing slightly offsetting day-sized waves. The yearly terms (sy/cy) are set the same way but provide 8 sine and 8 cos offset curves over a year ranging between -1 and 1. We also have "day of the week" and public holiday dummy variables.
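As a rough picture of the daily terms described above, the Python sketch below builds ten sine and ten cosine series over the 48 half-hour intervals. It assumes the ten terms are harmonics 1 to 10 of the daily cycle; the post's "slightly offsetting" waves may instead mean phase-shifted copies, so both the construction and the function name are illustrative assumptions, not the company's code.

```python
import math

INTERVALS_PER_DAY = 48  # half-hour intervals, as described in the post

def daily_fourier_terms(interval, n_harmonics=10):
    """Return {'sd1': ..., 'cd1': ..., ...} for one half-hour interval (0..47)."""
    terms = {}
    for k in range(1, n_harmonics + 1):
        # k-th harmonic of the daily cycle (assumed construction)
        angle = 2.0 * math.pi * k * interval / INTERVALS_PER_DAY
        terms[f"sd{k}"] = math.sin(angle)
        terms[f"cd{k}"] = math.cos(angle)
    return terms

day = [daily_fourier_terms(i) for i in range(INTERVALS_PER_DAY)]
print(len(day), "intervals,", len(day[0]), "daily terms per interval")
```

Every value lies between -1 and 1, and multiplying one of these columns by a weather column gives the kind of product term used throughout the model.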

The six weather stations' data is formed into the artificial station by hard-coded current weightings. These weights were provided 7 years ago and my only aim is to work out a methodology to update these. To be honest nothing else matters to me.

We refer to interactions in the regression model as combinations of these terms (e.g. sd1*sy1 combines the two series by multiplication and is now a new term in the model). In all cases the 700+ interactions take various combinations of the above. This complicates the finding of a methodology.

The following you gave should include interactions
$$L(t) = C[0] + \sum_{i=1}^{700+} C[i]\ \cos(\omega[i] t)$$

like
$\cos(\omega t) \cdot \sin(\omega t)$, or
$\cos(\omega t) \cdot dow1$, or even
$wt1 \cdot \sin(\omega t) \cdot dow1 \cdot \sin(\lambda t)$

where wt1 is a temperature, omega is daily, dow1 represents Monday and lambda is year

But these are now very difficult to add in.

17. Jul 2, 2013

### Stephen Tashi

I have a better understanding of the complexity of the company's model now. I think you use the word "interactions" of variables to mean products (in the sense of multiplications) or, more generally "products of functions of the variables".

Since the company's model is not a linear regression, you can't expect to find the best weights to use in the company's model by finding the best weights to use in a linear regression model. I understand that finding the best weights to use in a linear regression may provide some hint about the best weights to use in the company's model. However, the most reliable solution would be to use the non-linear model itself. Another possibility is to approximate the company's model by a non-linear function that is simpler than the company's model.

If it takes a long time to run the company's model, it may not be practical to use the company's model to determine the best weights. If the model runs quickly, I think (in theory) you should approach the problem as the scenario of minimizing a non-linear function (= mean square error of forecast) with respect to a set of variables (the weights) subject to some given constraints on the variables (that the weights are non-negative and sum to 1). There are various numerical methods for doing this. They amount to systematic forms of trial-and-error but they produce practical results.
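The simplest "systematic trial-and-error" of this kind is a grid search over all non-negative weight vectors that sum to 1. The Python sketch below is only an illustration: `toy_error` is a made-up stand-in for "run the model with these weights and return the mean square forecast error", and the function names are hypothetical.

```python
from itertools import product

def simplex_grid(n, steps):
    """Yield all weight vectors of length n with entries k/steps that sum to 1."""
    for combo in product(range(steps + 1), repeat=n - 1):
        if sum(combo) <= steps:
            # The last weight is whatever remains, so the total is always 1.
            yield [c / steps for c in combo] + [(steps - sum(combo)) / steps]

def best_weights(objective, n, steps=10):
    """Return the grid point with the smallest objective value."""
    return min(simplex_grid(n, steps), key=objective)

# Toy stand-in for the real forecast error (made-up target weights).
target = [0.5, 0.3, 0.2]

def toy_error(w):
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))

print(best_weights(toy_error, n=3, steps=10))
```

The number of grid points grows quickly with the number of stations, so if each evaluation of the real model is slow, a coarser grid or a proper constrained optimizer would be the more practical route.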

Have I understood the situation?

18. Jul 2, 2013

### mdhastings

Stephen,

In my econometrics course we talk about linear in respect to the parameters (coefficients). Hence this is a linear model - we use the lm (linear model) function in R to solve and it takes about 4 minutes to take database input and produce a forecast in a csv output. One of the difficulties is understanding the meaning in using sin and cos terms - but clearly they just build the shape with interactions with day of the week (major component) and artificial weather station. The interaction term $wt1 \cdot \sin(\omega t) \cdot dow1 \cdot \sin(\lambda t)$, whilst complicated in meaning, is still collecting a specific variation in the load.
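The two views can be reconciled with one worked term. This is a sketch under assumptions: $w[i]$ denotes the station weights, $D[i]$ the dew point at station $i$, and wd1 is built as in the `pmax(wd1-17,0)` line quoted earlier, applied to the weighted dew point.

$$A \cdot wd1 \cdot sd1 = A \cdot \max\Big(\sum_{i=1}^{6} w[i]\ D[i] - 17,\ 0\Big) \cdot sd1$$

For fixed weights $w[i]$ this term is linear in the coefficient $A$, which is the sense in which lm fits a linear model. Viewed as a function of the weights, however, the $w[i]$ enter multiplied by $A$ and inside the $\max$, so estimating the weights and the coefficients together is not a linear least squares problem.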

The model has its problems but under our market rules we probably wouldn't be able to do better. We must forecast with a weather forecast that must be 24 hours old. That is we run the model daily and it generates the forecast for the same time tomorrow plus 48 intervals.

In a sense that is why the weights hard-coded into the program need to be changed - our city has grown.

19. Jul 2, 2013

### Staff: Mentor

I've been in utilities for a long time - too long probably - I understand our company models and how we forecast consumption.

We use wind, temperature, insolation, humidity and all kinds of consumption and transmission data/history to forecast requirements, which we integrate with nominations (gas) for our transportation customers. We have way more weather station reading sets than you appear to have. All this matters naught.

I have stayed away because your answers are not. They are sort of indirect descriptions of what you think Stephen needs. It would be fun to help if I had a prayer of understanding what you want.

Take one of Stephen's questions. Provide a direct answer. You appear to have done that above: One run generates 48 interval estimates. I'm staying away until I can understand.

Your model output cannot be solely based on weather, you have to have historical consumption data. Unless you are solely employing degree days and using some company factor. But that will not deal with load forecasting. That depends entirely on historical data vs current estimates.

US degree days == A degree day is computed as the integral of a function of time that generally varies from an arbitrary temperature base like 20 degrees C. ...Whatever that mensuration method is called in Great Britain, or wherever you are and using British English. Most of the EU has degree day maps and zones. That I have seen anyway.

Plus, working in this field I've never encountered the constraints you mention.

Pardon this comment if it is out of bounds --- It sounds like your boss is pretending he needs to be sure you do not try to think. Are these completely regulatory constraints? If they are, then your regulators are worse than ours. And two of them went to jail in the past two years. (New Mexico, USA and not proud of our Public Regulatory Commission)

Are you private, IOU, Municipal (Gov't owned), or some kind of consumer owned cooperative?

I am giving you these questions to see what direct answers, if any, I get back.

20. Jul 2, 2013

### mdhastings

Thanks Jim,

The way we forecast is very different to most since we are not real time - we have to forecast a day ahead. We are a Government retailer that needs to buy its energy under unusual market regulations (delay (weeks) in receipt of load data).

This is an in-house program that was produced by a mathematician who is no longer with us. We create an equation using R code which can be written like this

$$L(t) = C[0] + \sum_{i=1}^{700+} C[i]\ Z[i]$$

where the $C[i]$ are coefficients/parameters and the $Z[1],Z[2],...$ terms can be made up as interaction terms, e.g. $\cos(\omega t) \cdot \sin(\omega t)$.
The only data we have is load (dependent variable), and independent variables: weather (T(C), Dew Pt(C) and Rel. humid(%)), weekdays and public holidays. Once we have the load equation we use the weather forecast and run a predict function on the equation. This econometric modeling is different to most but the errors are reasonable for our purposes.

For the linear Q I have referenced Greene's "Econometric Analysis" 4th Ed. He states on p327 [referring to interaction terms (p326)] , "Despite their complex functional forms, these models are intrinsically linear.... a distinguishing feature of the linear model is not the relationship among the variables as such but the way the parameters enter the equation". I have to admit I am yet to understand this.

Unfortunately the difficulties with the methodology I'm seeking are mine alone. This is in the sense that after 6 years running with the weights hard-coded into the program and with the changing demographics of our city I feel an update requires understanding of how they were derived. I thought I could work backwards since the weight for a given station applies to all measures (T(C), Dew Pt(C) and Rel. humid(%)), so the constraints are based upon that.

Most of what Stephen and I have covered has focused on getting the weights through the modelling we have described below. The trouble is how to work a way through the sin and cos interaction terms in the Z's where, for example, I always get negative coefficients on some stations' temp terms where the restriction $\sum_{i=1}^{6} A[i] = 1$ should apply.

$$L(t) = C[0] + \sum_{i=1}^{6} A[i]\ X[i] + \sum_{i=1}^{700+} C[i]\ Z[i]$$
with Station A etc. measuring data $(X_A, Y_A, H_A)$, where X, Y and H are temp, Dew Pt and Rel. humid - though we only use 1 of the last 2.

or if we combine the Temp and Dew Pt (or Rel. H) as discussed previously with same restriction

$$L(t) = C[0] + \sum_{i=1}^{6} A[i]\ (X[i]+Y[i]) + \sum_{i=1}^{700+} C[i]\ Z[i]$$

The question is: are we on the right track - what other ways can we orchestrate the stations' weights relative to load?

Thanks for being involved. Again, hope all this helps.