Linear regression with two data sets?

In summary, the conversation discusses using linear regression to predict summer highs in the USA. The speaker suggests using data from the last 10 summers and possibly data from another country, like Australia, to improve accuracy. However, the expert explains that linear regression may not be the best method for this type of prediction and suggests considering other factors, such as season and geographical differences, when combining data from different sources. They also mention the limitations of relying on a single model for prediction and the challenges of predicting weather accurately.
  • #1
Josh Terrill
I want to try to predict the USA summer highs using a linear regression. I know I can probably take data from the last 10 summers, plug that in, and use it to predict, but I'd like to use two data sources: first, the historical highs from past summers in the USA, and second, the historical highs from past summers in another country such as Australia, which has opposite seasons from us in the USA. Is it possible to do a linear regression from two data sources, and use both of them to predict a number?

Do you think this is a more accurate way of doing this? Or is it just as accurate as using one data source like historical USA highs?
 
  • #2
Linear regression will give you a number - but that number will have little or nothing to do with what you are looking for. Natural temperature variations (day/night, summer/winter) are cyclic and extremes are more or less random.

Combining temperatures in Australia with temperatures in the USA is meaningless. Think about it - northern Australia is quite close to the equator, a large part of central Australia is desert, etc...
 
  • #3
Hey Josh Terrill.

You can do it - but you have to specify the regression model to combine them.

The simplest way to combine information is a weighted sum w1*x1 + w2*x2 with w1 + w2 = 1, but there are many other ways to combine it (depending on the kinds of functions you can think of).
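As a rough sketch of the two options, the snippet below builds a fixed-weight combination and, as an alternative, lets least squares pick the coefficients of a two-predictor linear model. The data are synthetic and the names `x1`, `x2`, `w1`, `w2` are placeholders chosen for illustration; nothing here is real temperature data, and the least-squares fit is one possible regression model, not the only one.

```python
import numpy as np

# Synthetic, made-up data purely for illustration (not real temperatures).
rng = np.random.default_rng(0)
n = 40
x1 = rng.normal(30.0, 2.0, n)   # hypothetical predictor from data source 1
x2 = rng.normal(25.0, 3.0, n)   # hypothetical predictor from data source 2
y = 0.7 * x1 + 0.3 * x2 + rng.normal(0.0, 0.1, n)   # assumed "true" relation

# Option 1: fixed-weight combination, weights chosen by hand with w1 + w2 = 1.
w1, w2 = 0.5, 0.5
combined = w1 * x1 + w2 * x2

# Option 2: let least squares choose the coefficients of
#   y ~ b0 + b1*x1 + b2*x2
X = np.column_stack([np.ones(n), x1, x2])   # intercept column plus two predictors
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef

print("fitted coefficients:", b0, b1, b2)
```

In the fitted version the coefficients are not forced to sum to 1; whether that constraint makes sense depends on how the two sources relate to the quantity being predicted.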

You will have to have an understanding of the differences and context between the different data sets and how they standardize against each other with respect to the variable you are making inferences on.

This will probably mean adjusting for things like season, and other geographical factors.

Without any context or domain knowledge for your data, an extended response is not possible.
 
  • #4
Josh Terrill said:
I want to try to predict the USA summer highs using a linear regression.

You should explain clearly what you want to do. You use the plural "highs". This suggests there is some aspect of time involved. For example, perhaps you are trying to predict the maximum temperature on each day of the summer. You say "USA". It isn't clear whether you mean to get a single high temperature for the entire USA or whether you are interested in one particular city - or perhaps you want to predict the daily max temperature for each major city in the US.

When you speak of "using a linear regression" this suggests using a single model that consists of a single linear equation. However, perhaps you'd also consider using a different model for each situation. For example, you might use one equation to predict the high temperature in Greensboro NC on June 12 based on historical high temperatures for other cities and use a different equation to predict the high temperature for a different date or for a different US city.

Is it possible to do a linear regression from two data sources, and use both of them to predict a number?

It may be mathematically possible, depending on what the data and the equation actually are.

A more general question is whether we can increase the reliability of predictions by using information that seems, at first sight, to be irrelevant or only indirectly relevant to what is being predicted. A lot has been written about this problem, but I can't summarize it as a simple set of instructions.
 
  • #5
I have official temperature data for a long period (6 times a day for one year) and you cannot use linear regression for anything. The best predictor I found was to take the Fourier transform of the data and throw out the higher frequencies. I could then transform back to the time domain.
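A minimal sketch of that kind of Fourier smoothing is shown below. The post does not say which frequencies were kept, so the cutoff `keep=5`, the helper name `lowpass`, and the synthetic series (annual cycle plus day/night cycle plus noise, 6 readings a day for 365 days) are all assumptions made for illustration.

```python
import numpy as np

def lowpass(series, keep):
    """Zero every Fourier coefficient above index `keep`, then transform back."""
    spectrum = np.fft.rfft(series)
    spectrum[keep + 1:] = 0.0                 # throw out the higher frequencies
    return np.fft.irfft(spectrum, n=len(series))

# Synthetic stand-in for the data: 6 readings per day for 365 days.
samples_per_day = 6
days = 365
t = np.arange(samples_per_day * days) / samples_per_day   # time in days
rng = np.random.default_rng(1)
annual = 10.0 * np.sin(2 * np.pi * t / days)   # slow seasonal cycle
daily = 5.0 * np.sin(2 * np.pi * t)            # fast day/night cycle
temps = 15.0 + annual + daily + rng.normal(0.0, 2.0, t.size)

# Keep only the mean and the 5 lowest frequencies (cutoff is an assumption).
smooth = lowpass(temps, keep=5)
```

With this cutoff the day/night cycle and most of the noise are removed and the slow seasonal curve remains. A filter like this describes the data already recorded; extending it into the future amounts to assuming the same cycles repeat.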

Be aware that meteorologists use large computers, a large network of weather stations and satellite images - and they still have problems predicting the weather one week ahead.
 

1. What is linear regression?

Linear regression is a statistical method used to model the relationship between a dependent variable (y) and one or more independent variables (x). It assumes that this relationship is linear and uses it to make predictions about new data points.

2. How is linear regression different from correlation?

While both linear regression and correlation measure the relationship between two variables, they are different in terms of their goals and outputs. Linear regression aims to predict the value of the dependent variable based on the independent variable, while correlation measures the strength and direction of the relationship between the two variables.

3. What are the assumptions of linear regression?

There are several assumptions that must be met in order for linear regression to be a valid method of analysis. These include linearity, independence, normality, and homoscedasticity. Linearity assumes that the relationship between the variables is linear, independence assumes that the errors of different observations do not influence each other, normality assumes that the errors (residuals) follow a normal distribution, and homoscedasticity assumes that the variance of the errors is constant.

4. How do you interpret the results of a linear regression analysis?

The results of a linear regression analysis typically include a coefficient of determination (R-squared), which measures the proportion of the variation in the dependent variable that is explained by the independent variable. Additionally, the slope coefficient (beta) gives the direction and size of the change in the dependent variable per unit change in the independent variable, and the intercept coefficient (alpha) shows the value of the dependent variable when the independent variable is equal to zero.
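As a small illustration of these quantities, the sketch below fits a straight line to five made-up points and computes the slope, intercept, and R-squared by hand. The data values are invented for the example.

```python
import numpy as np

# Made-up data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# np.polyfit returns the highest-degree coefficient first: slope, then intercept.
beta, alpha = np.polyfit(x, y, 1)

pred = alpha + beta * x
ss_res = np.sum((y - pred) ** 2)        # variation left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)    # total variation in y
r_squared = 1.0 - ss_res / ss_tot

print(f"slope={beta:.2f}, intercept={alpha:.2f}, R^2={r_squared:.4f}")
```

For these points the slope comes out near 1.99 and the intercept near 0.05, so y rises by about 2 for every unit increase in x.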

5. How do you know if a linear regression model is a good fit for the data?

There are several measures that can be used to evaluate the fit of a linear regression model, including R-squared, root mean squared error (RMSE), and residual plots. A high R-squared value and a low RMSE indicate a good fit, while a visual examination of the residual plot can help identify any patterns or trends that may suggest the model is not a good fit for the data.
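The sketch below applies these checks to a synthetic seasonal series, the kind of cyclic data discussed earlier in the thread. The series, its monthly spacing, and the use of lag-1 autocorrelation as a numeric stand-in for eyeballing a residual plot are assumptions made for illustration.

```python
import numpy as np

# Synthetic seasonal cycle: 24 hypothetical monthly values over two years.
t = np.arange(24, dtype=float)
y = 20.0 + 10.0 * np.sin(2 * np.pi * t / 12)

# Fit a straight line and look at what is left over.
slope, intercept = np.polyfit(t, y, 1)
residuals = y - (intercept + slope * t)

rmse = np.sqrt(np.mean(residuals ** 2))
r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

# Correlation between each residual and the next one. Values near 0 suggest
# pattern-free residuals; values near 1 mean the residuals still carry structure.
lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

print(f"RMSE={rmse:.2f}, R^2={r_squared:.2f}, lag-1 autocorrelation={lag1:.2f}")
```

Here the line explains little of the variation and the residuals trace out the seasonal wave, which is the kind of pattern a residual plot is meant to reveal.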
