Linear regression with two data sets?


Discussion Overview

The discussion revolves around the feasibility and methodology of using linear regression to predict summer high temperatures in the USA by incorporating data from two different sources: historical highs from the USA and Australia. Participants explore the implications of using multiple data sets and the potential accuracy of such predictions.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant proposes using linear regression with two data sources to predict USA summer highs, questioning if this method could be more accurate than using a single data source.
  • Another participant argues that combining temperature data from Australia and the USA is meaningless due to significant geographical and climatic differences.
  • A different participant suggests that while it is possible to combine data from two sources, it requires careful specification of the regression model and consideration of contextual factors such as seasonality and geographical differences.
  • One participant emphasizes the need for clarity in the prediction goal, questioning whether the aim is to predict a single high temperature for the entire USA or for specific cities, and whether a single linear model is appropriate.
  • Another participant shares their experience with temperature data, stating that linear regression may not be suitable and suggesting the use of Fourier transforms as a better method for prediction.
  • A participant notes the complexity of weather prediction, highlighting that meteorologists face challenges even with advanced technology and extensive data.

Areas of Agreement / Disagreement

Participants express differing views on the validity and methodology of using multiple data sources for linear regression. There is no consensus on whether combining the data sets would yield more accurate predictions, and the discussion remains unresolved regarding the best approach to take.

Contextual Notes

Participants highlight the importance of understanding the context and differences between data sets, as well as the limitations of linear regression in predicting complex phenomena like weather.

Josh Terrill
I want to try to predict USA summer highs using linear regression. I know I could probably take data from the last 10 summers and use that to predict, but I'd like to use two data sources: first, the historical highs from past summers in the USA, and second, the historical highs from past summers in another country such as Australia, which has opposite seasons from ours in the USA. Is it possible to do a linear regression from two data sources, and use both of them to predict a number?

Do you think this would be more accurate? Or is it just as accurate as using one data source, like historical USA highs?
 
Linear regression will give you a number - but that number will have little or nothing to do with what you are looking for. Natural temperature variations (day/night, summer/winter) are cyclic and extremes are more or less random.

Combining temperatures in Australia with temperatures in the USA is meaningless. Think about it - northern Australia is quite close to the equator, a large part of central Australia is desert, etc...
 
Hey Josh Terrill.

You can do it - but you have to specify the regression model to combine them.

The simplest way to combine information is a weighted sum, w1*x1 + w2*x2 with w1 + w2 = 1, but there are many other ways to combine information (based on the different kinds of functions you can think of).
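As a rough illustration of the two options mentioned here, the sketch below fits (a) an ordinary least-squares model y ~ b0 + b1*x1 + b2*x2 and (b) the constrained weighted sum y ~ w1*x1 + (1 - w1)*x2. The function names and all numbers are made up for illustration, and x1 and x2 are assumed to be two already-aligned predictor series (for example, one value per past summer from each source). It is only a sketch under those assumptions, not a recommendation for real temperature data.

```python
import numpy as np

def fit_two_source_regression(x1, x2, y):
    """Ordinary least squares for y ~ b0 + b1*x1 + b2*x2. Returns [b0, b1, b2]."""
    x1, x2, y = (np.asarray(a, dtype=float) for a in (x1, x2, y))
    # Design matrix: intercept column plus one column per data source.
    X = np.column_stack([np.ones_like(x1), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def fit_convex_weight(x1, x2, y):
    """Least-squares w1 for y ~ w1*x1 + (1 - w1)*x2, i.e. w1 + w2 = 1."""
    x1, x2, y = (np.asarray(a, dtype=float) for a in (x1, x2, y))
    # Rearranged: (y - x2) ~ w1 * (x1 - x2), a one-parameter fit.
    d = x1 - x2
    return float(np.dot(d, y - x2) / np.dot(d, d))

# Synthetic, noise-free numbers purely for illustration (not real temperatures).
rng = np.random.default_rng(0)
x1 = rng.normal(30.0, 2.0, size=50)   # hypothetical source 1
x2 = rng.normal(35.0, 3.0, size=50)   # hypothetical source 2
y = 5.0 + 0.8 * x1 + 0.1 * x2         # target built from known coefficients

b0, b1, b2 = fit_two_source_regression(x1, x2, y)
w1 = fit_convex_weight(x1, x2, 0.7 * x1 + 0.3 * x2)
print(b0, b1, b2, w1)
```

Whether a second source helps depends on whether its coefficient is distinguishable from zero on held-out data; the fit will always return some number for it.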

You will have to have an understanding of the differences and context between the different data sets and how they standardize against each other with respect to the variable you are making inferences on.

This will probably mean adjusting for things like season, and other geographical factors.

Without any context or domain knowledge for your data, an extended response is not possible.
 
Josh Terrill said:
I want to try to predict the USA summer highs using a linear regression.

You should explain clearly what you want to do. You use the plural "highs". This suggests there is some aspect of time involved. For example, perhaps you are trying to predict the maximum temperature on each day of the summer. You say "USA". It isn't clear whether you mean to get a single high temperature for the entire USA or whether you are interested in one particular city - or perhaps you want to predict the daily max temperature for each major city in the US.

When you speak of "using a linear regression" this suggests using a single model that consists of a single linear equation. However, perhaps you'd also consider using different models for different situations. For example, you might use one equation to predict the high temperature in Greensboro NC on June 12 based on historical high temperatures for other cities and use a different equation to predict the high temperature for a different date or for a different US city.

Josh Terrill said:
Is it possible to do a linear regression from two data sources, and use both of them to predict a number?

It may be mathematically possible, depending on what the data and the equation actually are.

A more general question is whether we can increase the reliability of predictions by using information that seems, at first sight, to be irrelevant or only indirectly relevant to what is being predicted. A lot has been written about this problem, but I can't summarize it as a simple set of instructions.
 
I have official temperature data for a long period (6 times a day for one year) and you cannot use linear regression for anything. The best predictor I found was to take the Fourier transform of the data and throw out the higher frequencies. I could then transform back to the time domain.
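A minimal sketch of the approach described above (Fourier transform, discard the higher frequencies, transform back), using NumPy's real FFT. The post does not say how many frequencies were kept, so the `keep` parameter, the function name `lowpass_fft`, and the synthetic signal are all assumptions for illustration.

```python
import numpy as np

def lowpass_fft(samples, keep):
    """Zero every frequency bin at index >= keep, then return to the time domain.

    Bin 0 is the mean; bin k corresponds to k full cycles over the record.
    """
    samples = np.asarray(samples, dtype=float)
    spectrum = np.fft.rfft(samples)
    spectrum[keep:] = 0.0                      # throw out the higher frequencies
    return np.fft.irfft(spectrum, n=samples.size)

# Synthetic example (made-up numbers): 6 samples a day for 365 days.
n = 6 * 365
t = np.arange(n)
seasonal = 15.0 + 10.0 * np.cos(2 * np.pi * t / n)   # one cycle per year
daily = 3.0 * np.sin(2 * np.pi * 365 * t / n)        # one cycle per day
smooth = lowpass_fft(seasonal + daily, keep=4)       # keeps mean + 3 slowest cycles
print(np.max(np.abs(smooth - seasonal)))
```

Here the cutoff removes the daily cycle and leaves the seasonal curve. Note that this smooths data already recorded; using the smoothed curve as a forecast assumes the retained cycles repeat in the same way.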

Be aware that meteorologists use large computers, a large network of weather stations and satellite images - and they still have problems predicting the weather one week ahead.
 
