# Time series: why would we remove seasonality and trend?

fog37
TL;DR Summary
Time series and why we remove seasonality and trend
Hello,
I understand a few things about time series but I am unclear on other main concepts. Hope you can help me get on the right track.
• A time series is simply a 1D signal with the variable time ##t## on the horizontal axis and another variable of choice ##X## on the vertical axis. The time implies a precise order of the samples of the variable ##X## (sequence).
• I understand that the time signal ##X(t)## can be viewed as the sum of 3 components: a) seasonality, b) trend, c) a random component. Seasonality means there is a periodic component (no matter its functional shape: sine, etc.). Trend is another functional shape (linear, curvilinear, etc.). The random component is obvious.
• Signals can be stationary or not. Stationarity simply means that if we take one segment of ##X(t)##, say from 5s to 8s, and another segment from 10s to 13s, the two segments are not identical but are statistically similar (mean, correlation, etc.): the statistical properties of ##X(t)## don't change over time.
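The stationarity idea in the last bullet can be sketched numerically. This is a minimal illustration with simulated white noise (a hypothetical signal, not data from the thread): two segments of a stationary signal differ sample-by-sample but share their statistics.

```python
import numpy as np

# Hypothetical stationary signal sampled at 1 kHz: the statistics of
# any segment should be similar, even though the samples differ.
rng = np.random.default_rng(4)
fs = 1000
x = rng.normal(0.0, 1.0, size=20 * fs)  # 20 s of white noise

seg1 = x[5 * fs:8 * fs]    # segment from 5 s to 8 s
seg2 = x[10 * fs:13 * fs]  # segment from 10 s to 13 s

# The segments are not identical sample-by-sample...
assert not np.array_equal(seg1, seg2)
# ...but their means and standard deviations agree closely,
# which is what "statistically similar" means here.
```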
My question:

The goal in time series analysis is generally to come up with a model that predicts future values using past values. Why would we want to remove seasonality and/or trend from ##X(t)##? That would seem to change the identity of the signal... I get that removing them would make the signal stationary if it is not... But I keep thinking that two different signals ##X(t)## are different precisely because they differ holistically in their seasonality, trend, etc.

If a signal is truly ##X(t) = seasonality+trend+random component##, removing the first two leaves us with only the random part...

I see how removing seasonality may make sense sometimes. For example, the earnings of a company may go up and down over the course of a year simply due to what generally happens during a specific month. That is useful to know even if it makes the time series not stationary....

Thank you!

Trend and seasonality are removed by differencing, which does not lose information.

Take trend: once you difference (or take the log of the value changes), it is easy to recreate the original series by reversing the process.
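A minimal sketch of that point, using NumPy on a hypothetical trended series: first differencing removes a linear trend, and a cumulative sum plus the stored first value reverses the operation exactly, so no information is lost.

```python
import numpy as np

# A hypothetical series with a linear trend plus noise.
rng = np.random.default_rng(0)
t = np.arange(100)
x = 0.5 * t + rng.normal(0, 1, size=100)

# First difference: removes the linear trend (dx has a roughly
# constant mean of 0.5 instead of a growing level).
dx = np.diff(x)

# Differencing loses no information: given the first value,
# a cumulative sum reconstructs the original series exactly.
x_rebuilt = np.concatenate(([x[0]], x[0] + np.cumsum(dx)))
assert np.allclose(x, x_rebuilt)
```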

fog37 said:
TL;DR Summary: Time series and why we remove seasonality and trend

Why would we want to remove seasonality and/or trend
Usually because we have some specific application in mind and for that application we are uninterested in the variation due to seasonality or trend.

For example, right now in my location the temperatures are dropping. A time series analysis shows strong seasonal effects and a smaller trend.

My neighbor, a meteorologist, wants to include both the trend and the seasonal variation in his advice whether to wear a jacket tomorrow.

My other neighbor, a climate scientist, wants to remove the seasonal variation to show that the planet is warming despite the fact that it is colder today than yesterday.

My other other neighbor, a store owner, wants to remove the trend to figure out when to place an order for a bunch of swim suits based on the seasonal variation.

The decision about removing one thing or another is based on the application rather than the data. All three had the same data and model.

I see. Thank you, Dale. That makes a lot of sense... I guess what I am reading is all about removing seasonality, and that confused me.

What about striving to make a time series stationary if it is not?

Do you have a simple example like the ones above about when removing non-stationarity or keeping may be application dependent?

Thank you again!

fog37 said:
What about striving to make a time series stationary if it is not?
That I don’t know enough about to give solid advice. Maybe different statistical methods need to be used for non-stationary series?

fog37 said:
What about striving to make a time series stationary if it is not?

Do you have a simple example like the ones above about when removing non-stationarity or keeping may be application dependent?
How would you measure the volatility of the S&P 500 over the past 30 years, when the index value has gone up 10x? You cannot simply take the standard deviation of the price.
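As a sketch of that idea (with simulated prices, not real S&P 500 data), one common approach is to work with log returns, which are roughly stationary even when the price level grows 10x, and then annualize their standard deviation. The growth and volatility numbers below are illustrative assumptions.

```python
import numpy as np

# Hypothetical index prices: ~30 years of trading days with drift,
# so the level grows severalfold (clearly non-stationary).
rng = np.random.default_rng(1)
n = 30 * 252
log_returns_true = rng.normal(0.10 / 252, 0.18 / np.sqrt(252), size=n)
prices = 100 * np.exp(np.cumsum(log_returns_true))

# The std of the raw prices mostly measures the trend, not volatility.
price_std = prices.std()

# Daily log returns are roughly stationary; annualize their std
# with the usual sqrt(252) trading-days convention.
returns = np.diff(np.log(prices))
annual_vol = returns.std() * np.sqrt(252)
```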

fog37 said:
Do you have a simple example like the ones above about when removing non-stationarity or keeping may be application dependent?
1) Suppose you had decades of data of the daily high temperatures at a location. If you were interested in the long-term temperature change (Global warming?), you would not want to have to always consider if you were looking at summer or winter data. So you would want to remove the effects of seasons. On the other hand, if you were interested in seasonal variation, you would want to remove the long-term effects of global warming so that you can compare summers to winters without having to consider the general upward trend over time.

2) Suppose you were looking at the daily reported deaths from COVID-19 over a year. In general, deaths are not well reported over the weekends, and then the counts catch up on Monday and Tuesday. If you are interested in the long-term spread of COVID, you would want to remove the weekend/Monday-Tuesday cycles so that you can see the long-term growth rate. On the other hand, if you are interested in how the reporting is done, you might want to do the opposite: remove the long-term trend to get stationary data and compare the weekend numbers to the Monday-Tuesday numbers.

Reading back over all your comments, I was thinking about the fundamental question of why time series are so special.
For example, given two continuous variables ##X## and ##Y##, we can do a scatter plot ##Y## vs ##X## and determine the Pearson correlation as well as come up with the linear regression model ##Y= a X +b##.

In the case of a time variable ##t## and a variable ##Z(t)##, we can also do a scatter plot and determine the correlation coefficient of the linear predictive model ##Z(t)=a t + b##...

So what makes ##t## so different from the variable ##X##? We plot both ##t## and ##X##, in increasing order, on the horizontal axis and the other variable on the vertical axis. For each ##X## there is an associated ##Y## and for each ##t## there is an associated ##Z## value...

Maybe the difference really shows up when we get to autoregressive-type models, where ##Z(t)## can depend on the current value of ##t##, on previous values of ##t##, and on previous values of ##Z##. None of this happens for the variable pair ##Y## and ##X##. Also, autocorrelation would not make sense on the values of ##X## or ##Y## alone, while we can compute it for ##Z##, i.e. ##corr(lag)=E[Z(t) Z(t+lag)]##.
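The autocorrelation idea in the paragraph above can be sketched on a simulated AR(1) series (everything here is illustrative): the sample autocorrelation is large at small lags and decays as the lag grows, something that has no analogue for an unordered cross-sectional variable.

```python
import numpy as np

def autocorr(z, lag):
    # Sample autocorrelation: correlation of the series with a lagged
    # copy of itself. Only meaningful because the data are ordered.
    z = np.asarray(z, dtype=float) - np.mean(z)
    return np.dot(z[:-lag], z[lag:]) / np.dot(z, z)

# AR(1)-like series: each value depends on the previous one
# plus fresh noise, so nearby values are correlated.
rng = np.random.default_rng(2)
n = 2000
z = np.empty(n)
z[0] = 0.0
eps = rng.normal(size=n)
for i in range(1, n):
    z[i] = 0.8 * z[i - 1] + eps[i]

# For this process the autocorrelation decays roughly like 0.8**lag.
```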

Is my understanding correct?

fog37 said:
Is my understanding correct?
You may be over-thinking it. There are a great many things where the best predictor of a future value is the current value and/or some combination of prior values. It's as simple as that.

FactChecker said:
You may be over-thinking it. There are a great many things where the best predictor of a future value is the current value and/or some combination of prior values. It's as simple as that.
You might be right.

Yes
best predictor of a future value is the current value and/or some combination of prior values. It's as simple as that.
but that happens with time-sequenced data (future, prior, and current apply to time data).

That concept does not apply to cross-sectional data, which is not time data. For example, ##X## = weight and ##Y## = height. Usually we don't see autoregressive models on cross-sectional data. On the other hand, autocorrelation is used in regression analysis with cross-sectional data to determine, for example, whether the residuals are independent.

fog37 said:
but that happens with time-sequenced data (future, prior, and current apply to time data).

That concept does not apply to cross-sectional data that is not time data.
You make an interesting point. IMO, the concept still applies, but the implementation is not common. Often the best predictor of the value at one location is the values around it, and "around it" could mean time, position, or some other dimension. I have personally dealt only with the standard time series, where earlier data is known, a future value is being estimated, and nothing is known for times beyond that. If data afterward (or around) were known and we were just estimating an unknown intermediate value, would that be completely different? It seems very similar to me.

## What is a time series?

A time series is a sequence of data points collected or recorded at specific time intervals. These data points are typically measured over consistent time periods, such as daily, monthly, or yearly, and are used to analyze trends, patterns, and other temporal dynamics in the data.

## Why is seasonality important in time series analysis?

Seasonality refers to regular, predictable changes that recur over a fixed period in a time series, classically every calendar year, though weekly or daily cycles are seasonal in the same sense. It is important because it can significantly affect the analysis and forecasting of the data. By understanding and accounting for seasonality, analysts can make more accurate predictions and better understand the underlying patterns in the data.

## What is a trend in a time series?

A trend is the long-term movement or direction in the data over time. It represents the underlying pattern that emerges when short-term fluctuations and seasonal effects are removed. Identifying the trend helps in understanding the general direction in which the data is moving, which is crucial for long-term forecasting and strategic planning.

## Why would we remove seasonality and trend from a time series?

Removing seasonality and trend from a time series, a process known as detrending and deseasonalizing, is done to isolate the underlying noise or irregular component. This helps in better understanding the inherent variability in the data and improves the accuracy of models used for forecasting and anomaly detection. It also allows for the analysis of the residuals, which can provide insights into the underlying processes driving the time series.

## How can we remove seasonality and trend from a time series?

Seasonality and trend can be removed using various methods such as differencing, decomposition, and filtering. Differencing involves subtracting the previous observation from the current observation to remove the trend. Decomposition separates the time series into trend, seasonal, and residual components. Filtering techniques, like moving averages or exponential smoothing, can also be used to smooth out the trend and seasonal components, leaving the residuals for further analysis.
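The decomposition and moving-average ideas above can be sketched as a classical additive decomposition on a simulated monthly series. This is a rough illustration, not a production implementation; the 2x12 centered moving average used for the trend is the textbook choice for monthly data with an even seasonal period.

```python
import numpy as np

# Hypothetical monthly series: linear trend + 12-month seasonality + noise.
rng = np.random.default_rng(3)
n = 120  # ten years of monthly data
t = np.arange(n)
x = 0.2 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, size=n)

# 1) Trend: centered 2x12 moving average (half weight on the two
#    endpoints so the window stays centered despite the even period).
kernel = np.r_[0.5, np.ones(11), 0.5] / 12
trend = np.convolve(x, kernel, mode="valid")  # aligns with x[6:-6]
detrended = x[6:-6] - trend

# 2) Seasonal component: average the detrended values month by month.
months = t[6:-6] % 12
seasonal = np.array([detrended[months == m].mean() for m in range(12)])

# 3) Residual: what is left after removing both trend and seasonality.
residual = detrended - seasonal[months]
```

Differencing (as in the replies above) reaches a similar goal without estimating the components explicitly: a lag-1 difference removes the trend and a lag-12 difference removes the monthly seasonality.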
