Linear regression with two data sets?

Josh Terrill · May 9, 2016

I want to try to predict the USA summer highs using a linear regression. I know I can probably take data from the last 10 summers and use that to predict, but I'd like to use two data sources: first, the historical highs from past summers in the USA, and second, the historical highs from past summers in another country such as Australia, which has opposite seasons from the USA. Is it possible to do a linear regression from two data sources, and use both of them to predict a number?

Do you think this would be more accurate, or would it be just as accurate as using one data source such as historical USA highs?

Svein · May 9, 2016

Linear regression will give you a number - but that number will have little or nothing to do with what you are looking for. Natural temperature variations (day/night, summer/winter) are cyclic and extremes are more or less random.

Combining temperatures in Australia with temperatures in the USA is meaningless. Think about it - northern Australia is quite close to the equator, a large part of central Australia is desert, etc...

chiro · May 13, 2016

Hey Josh Terrill.

You can do it - but you have to specify the regression model to combine them.

The simplest way to combine information is a weighted sum w1*x1 + w2*x2 with w1 + w2 = 1, but there are many other ways to combine information (based on the different kinds of functions you can think of).
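As a rough illustration of the weighted sum above, here is a minimal Python sketch that fits w1 by least squares under the constraint w1 + w2 = 1. The function name `fit_weighted_sum` and the synthetic numbers are assumptions made for illustration only; they are not real temperature data and say nothing about whether the two sources are actually related.

```python
import numpy as np

def fit_weighted_sum(x1, x2, y):
    """Least-squares fit of y ~ w1*x1 + w2*x2 subject to w1 + w2 = 1.

    Substituting w2 = 1 - w1 gives y - x2 ~ w1 * (x1 - x2),
    a one-parameter regression through the origin.
    """
    x1, x2, y = (np.asarray(a, dtype=float) for a in (x1, x2, y))
    d = x1 - x2
    denom = np.dot(d, d)
    if denom == 0.0:
        raise ValueError("x1 and x2 are identical; the weights are not identifiable")
    w1 = np.dot(d, y - x2) / denom
    return w1, 1.0 - w1

# Synthetic example (hypothetical values, not measurements).
rng = np.random.default_rng(0)
x1 = 30 + 3 * rng.standard_normal(50)
x2 = 28 + 4 * rng.standard_normal(50)
y = 0.7 * x1 + 0.3 * x2  # built with known weights so the fit can be checked

w1, w2 = fit_weighted_sum(x1, x2, y)
print(w1, w2)
```

The constraint keeps the combined value on the same scale as the inputs, which only makes sense if x1 and x2 have already been standardized against each other.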

You will need to understand the context of each data set, how the two differ, and how to standardize them against each other with respect to the variable you are making inferences about.

This will probably mean adjusting for things like season, and other geographical factors.

Without any context or domain knowledge for your data, an extended response is not possible.

Stephen Tashi · May 14, 2016

Josh Terrill said:

I want to try to predict the USA summer highs using a linear regression.

You should explain clearly what you want to do. You use the plural "highs". This suggests there is some aspect of time involved. For example, perhaps you are trying to predict the maximum temperature on each day of the summer. You say "USA". It isn't clear whether you mean to get a single high temperature for the entire USA or whether you are interested in one particular city - or perhaps you want to predict the daily max temperature for each major city in the US.

When you speak of "using a linear regression" this suggests using a single model that consists of a single linear equation. However, perhaps you'd also consider using different models for different situations. For example, you might use one equation to predict the high temperature in Greensboro NC on June 12 based on historical high temperatures for other cities and use a different equation to predict the high temperature for a different date or for a different US city.

Is it possible to do a linear regression from two data sources, and use both of them to predict a number?

It may be mathematically possible, depending on what the data and the equation actually are.
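One way to read "mathematically possible" is ordinary least squares with two predictor columns and an intercept. The sketch below assumes each data source has been reduced to one number per year and aligned year by year; the variable names (`usa_prev`, `aus_prev`, `usa_next`) and all values are hypothetical and synthetic, constructed so that the fit can be checked.

```python
import numpy as np

def fit_two_predictor_ols(x1, x2, y):
    """Ordinary least squares for y ~ b0 + b1*x1 + b2*x2."""
    x1, x2, y = (np.asarray(a, dtype=float) for a in (x1, x2, y))
    # Design matrix: a column of ones for the intercept, then the two predictors.
    design = np.column_stack([np.ones_like(x1), x1, x2])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Synthetic, hypothetical data: 40 "years" of two predictors and one target.
rng = np.random.default_rng(1)
usa_prev = 30 + 2 * rng.standard_normal(40)
aus_prev = 33 + 2 * rng.standard_normal(40)
usa_next = 5.0 + 0.8 * usa_prev + 0.1 * aus_prev  # known coefficients, no noise

b0, b1, b2 = fit_two_predictor_ols(usa_prev, aus_prev, usa_next)
print(b0, b1, b2)
```

That the fit can be computed does not mean the second predictor helps; with real data, comparing prediction error with and without it on held-out years would be one way to check.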

A more general question is whether we can increase the reliability of predictions by using information that seems, at first sight, to be irrelevant or only indirectly relevant to what is being predicted. A lot has been written about this problem, but I can't summarize it as a simple set of instructions.

Svein · May 14, 2016

I have official temperature data for a long period (6 times a day for one year) and you cannot use linear regression for anything. The best predictor I found was to take the Fourier transform of the data and throw out the higher frequencies. I could then transform back to the time domain.
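A minimal sketch of the smoothing step described above, assuming evenly spaced samples (6 per day for 365 days) and a hypothetical cutoff `keep`. The signal is synthetic, and the sketch shows only the filtering, not how the smoothed series was then used as a predictor, which the post does not specify.

```python
import numpy as np

def lowpass_fft(samples, keep):
    """Zero every Fourier coefficient from index `keep` upward, then invert."""
    samples = np.asarray(samples, dtype=float)
    spectrum = np.fft.rfft(samples)
    spectrum[keep:] = 0.0  # discard the higher frequencies
    return np.fft.irfft(spectrum, n=len(samples))

# Synthetic series: 6 samples per day for one year (hypothetical values).
n = 6 * 365
t = np.arange(n)
seasonal = 10.0 + 8.0 * np.sin(2 * np.pi * t / n)  # one cycle per year
daily = 3.0 * np.sin(2 * np.pi * t / 6)            # one cycle per day

smoothed = lowpass_fft(seasonal + daily, keep=10)
print(np.max(np.abs(smoothed - seasonal)))
```

Here the daily cycle falls at frequency index 365, well above the cutoff, so only the mean and the yearly cycle survive. With real data the choice of cutoff is a judgment call.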

Be aware that meteorologists use large computers, a large network of weather stations and satellite images - and they still have problems predicting the weather one week ahead.
