B Linear regression with two data sets?

  1. May 9, 2016 #1
    I want to try to predict USA summer highs using a linear regression. I know I can probably take data from the last 10 summers, plug that in, and use it to predict, but I'd like to use two data sources: the first from the historical highs of past summers in the USA, and the second from the historical highs of past summers in another country, like Australia, which has opposite seasons from us in the USA. Is it possible to do a linear regression from two data sources and use both of them to predict a number?

    Do you think this is a more accurate way of doing it? Or is it just as accurate as using one data source, like historical USA highs?
     
  3. May 9, 2016 #2

    Svein

    Science Advisor

    Linear regression will give you a number - but that number will have little or nothing to do with what you are looking for. Natural temperature variations (day/night, summer/winter) are cyclic, and extremes are more or less random.

    Combining temperatures in Australia with temperatures in the USA is meaningless. Think about it - northern Australia is quite close to the equator, a large part of central Australia is desert, etc...
     
  4. May 13, 2016 #3

    chiro

    Science Advisor

    Hey Josh Terrill.

    You can do it - but you have to specify the regression model to combine them.

    The simplest way to combine information is a weighted sum, w1*x1 + w2*x2 with w1 + w2 = 1, but there are many other ways to combine information (based on the different kinds of functions you can think of).
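
    As a rough sketch of the weighted-sum idea in Python (NumPy assumed): the function names and the synthetic numbers below are invented purely for illustration and are not real temperature data. The first function fits the single weight w1 in y ≈ w1*x1 + (1 - w1)*x2 by least squares; the second drops the w1 + w2 = 1 constraint and fits an ordinary two-predictor regression with an intercept.

```python
import numpy as np

def fit_convex_weight(x1, x2, y):
    """Least-squares w1 for the model y ~ w1*x1 + (1 - w1)*x2."""
    x1, x2, y = (np.asarray(a, dtype=float) for a in (x1, x2, y))
    d = x1 - x2
    # Rearranged model: (y - x2) ~ w1 * (x1 - x2), a one-parameter fit.
    return float(np.dot(d, y - x2) / np.dot(d, d))

def fit_two_predictors(x1, x2, y):
    """Ordinary least squares for y ~ b0 + b1*x1 + b2*x2."""
    x1, x2, y = (np.asarray(a, dtype=float) for a in (x1, x2, y))
    X = np.column_stack([np.ones(len(x1)), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic stand-ins for two already-standardized data sources (assumption).
rng = np.random.default_rng(0)
x1 = rng.normal(30.0, 2.0, size=50)
x2 = rng.normal(28.0, 2.0, size=50)
y = 0.7 * x1 + 0.3 * x2  # noise-free target so the fit can be checked

w1 = fit_convex_weight(x1, x2, y)
b0, b1, b2 = fit_two_predictors(x1, x2, y)
```

    Whether the second source improves real predictions depends entirely on whether it carries information about the target; the code only shows that combining two inputs in one linear model is mechanically straightforward.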

    You will have to have an understanding of the differences and context between the different data sets and how they standardize against each other with respect to the variable you are making inferences on.

    This will probably mean adjusting for things like season, and other geographical factors.

    Without any context or domain knowledge for your data, an extended response is not possible.
     
  5. May 14, 2016 #4

    Stephen Tashi

    Science Advisor

    You should explain clearly what you want to do. You use the plural "highs". This suggests there is some aspect of time involved. For example, perhaps you are trying to predict the maximum temperature on each day of the summer. You say "USA". It isn't clear whether you mean to get a single high temperature for the entire USA or whether you are interested in one particular city - or perhaps you want to predict the daily max temperature for each major city in the US.

    When you speak of "using a linear regression", this suggests using a single model that consists of a single linear equation. However, perhaps you'd also consider using different models for different situations. For example, you might use one equation to predict the high temperature in Greensboro NC on June 12 based on historical high temperatures for other cities, and use a different equation to predict the high temperature for a different date or for a different US city.

    It may be mathematically possible, depending on what the data and the equation actually are.

    A more general question is whether we can increase the reliability of predictions by using information that seems, at first sight, to be irrelevant or only indirectly relevant to what is being predicted. A lot has been written about this problem, but I can't summarize it as a simple set of instructions.
     
  6. May 14, 2016 #5

    Svein

    Science Advisor

    I have official temperature data for a long period (6 times a day for one year) and you cannot use linear regression for anything. The best predictor I found was to take the Fourier transform of the data and throw out the higher frequencies. I could then transform back to the time domain.
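
    A minimal sketch of that kind of Fourier low-pass step in Python (NumPy assumed). The series below is synthetic, the function name and the cutoff of 5 retained coefficients are arbitrary choices for illustration, and this is not the actual procedure or data described in the post.

```python
import numpy as np

def lowpass_fft(samples, keep):
    """Zero all but the `keep` lowest-frequency rfft coefficients, then invert."""
    samples = np.asarray(samples, dtype=float)
    spectrum = np.fft.rfft(samples)
    spectrum[keep:] = 0.0  # throw out the higher frequencies
    return np.fft.irfft(spectrum, n=len(samples))

# Synthetic record: 6 readings per day for 365 days (made-up values).
n = 6 * 365
t_days = np.arange(n) / 6.0
seasonal = 10.0 + 12.0 * np.cos(2 * np.pi * t_days / 365.0)  # annual cycle
daily = 4.0 * np.cos(2 * np.pi * t_days)                     # day/night cycle
rng = np.random.default_rng(1)
temps = seasonal + daily + rng.normal(0.0, 1.5, size=n)

# Keep the mean plus the four slowest cycles; the daily cycle and most
# of the noise sit at higher frequencies and are removed.
smooth = lowpass_fft(temps, keep=5)
```

    The smoothed curve recovers the slow seasonal shape. Using it to predict future values additionally assumes the series repeats periodically, which is a modelling assumption rather than something the transform guarantees.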

    Be aware that meteorologists use large computers, a large network of weather stations, and satellite images - and they still have problems predicting the weather one week ahead.
     