# Linear Models vs Nonlinear Models


#### fog37

Hello,

Models can be linear and nonlinear and I just learned that a "linear model" is not just a model where the dependent variable ##Y## is connected to independent variables ##X## raised to the 1st power... The model is called "linear" because the coefficients/parameters are not raised to a power higher than 1. Is that correct?

So a cubic polynomial is also an example of a linear model even if the fitting curve is not a plane or straight line...

On the other hand, is a nonlinear model a function connecting ##Y## and the ##X## via parameters raised to powers higher than 1? For example, would $$Y= a \log(X)$$ still be a linear model since the unknown coefficient ##a## is raised to the 1st power? How about $$Y= \frac{a X_1^2}{b X_2^3}$$ Is it linear or nonlinear, given that the coefficients ##a## and ##b## are not raised to exponents higher than 1?

Thank you and happy Thanksgiving.

The normal definition of linearity is just superposition: a function f(x) is linear iff f(a+b) = f(a) + f(b).
So, note that the equation for a line, f(x) = mx + b, is not a linear function (unless b=0). That often confuses people.
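
A quick numerical check of the superposition test may help here. This is only a sketch: the helper `is_additive`, the sample points, and the values m = 2 and b = 3 are illustrative assumptions, not anything from the posts. Passing on a handful of points does not prove a function is linear, but a single failure does disprove it.

```python
from itertools import product

def is_additive(f, samples, tol=1e-9):
    """Return True if f(a + b) == f(a) + f(b) for every sampled pair (a, b)."""
    return all(abs(f(a + b) - (f(a) + f(b))) <= tol
               for a, b in product(samples, repeat=2))

samples = [-2.0, -0.5, 0.0, 1.0, 3.5]

def through_origin(x):
    # f(x) = m*x with m = 2 (assumed value)
    return 2.0 * x

def with_intercept(x):
    # f(x) = m*x + b with m = 2, b = 3 (assumed values)
    return 2.0 * x + 3.0

print(is_additive(through_origin, samples))  # True
print(is_additive(with_intercept, samples))  # False: the two sides differ by b
```

For the line with an intercept, f(a+b) carries one copy of b while f(a) + f(b) carries two, which is why the check fails whenever b is nonzero.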

In particular, to be a linear model ##X## needs to have nothing done to it, but you can apply arbitrary functions to parameters.

##Y= a\log(X)## is not linear. ##Y=\log(a)X## is.

DaveE said:
The normal definition of linearity is just superposition: a function f(x) is linear iff f(a+b) = f(a) + f(b).
So, note that the equation for a line, f(x) = mx + b, is not a linear function (unless b=0). That often confuses people.
I understand the linearity requirements:
$$f(ax)=af(x)$$ $$f(x+y)=f(x) +f(y)$$ $$f(0)=0$$

In that case, a quadratic polynomial like ##f(x) = x^2+x+3## would not fall in the "linear models" category, I believe... However, it is described as such...

Generally a model requires one to solve a differential equation in some unknown function of position or time, and the model is linear if that equation is linear in the unknown and its derivatives. The solution doesn't have to be linear in position or time.

We allow models with equations of the form ##L(y(x)) = f(x)## as linear models provided the operator ##L## is linear in ##y## and its derivatives.

fog37 said:
I understand the linearity requirements:
$$f(ax)=af(x)$$ $$f(x+y)=f(x) +f(y)$$ $$f(0)=0$$

In that case, a quadratic polynomial like ##f(x) = x^2+x+3## would not fall in the "linear models" category, I believe... However, it is described as such...
Yes. However, you will find examples of people who are either sloppy or assuming other knowledge/contexts. For example, you might describe ##f(x) = x^2+x+3## as a linear combination of the basis set of functions {##x^n##}, but that's not the same as saying ##f(x)## is linear. You will often see people thinking lines are linear, as in the ##f(x)=mx+b## example, as I said before. So, that raises the question "Who is describing what?"

BTW, I never understood why people list all three of those linearity requirements since two of them are a trivial derivation of superposition.

DaveE said:
BTW, I never understood why people list all three of those linearity requirements since two of them are a trivial derivation of superposition.

If it's so trivial, go ahead and prove it. No, you're not allowed to assume the function is continuous, or even that the field has a topology.

I've seen several setups in which linearity is defined for coefficients, rather than for parameters, so that, e.g., ##ax^2## is considered linear, since ##c(a+b)x^2 = c(ax^2+bx^2) = cax^2+cbx^2##.
Ideally, you'd see from the context which one they're using.

Office_Shredder said:
If it's so trivial, go ahead and prove it. No, you're not allowed to assume the function is continuous, or even that the field has a topology.
Yes. I know you're correct. Mathematicians always are, and have pathological functions available to prove it. That's why I switched from math to engineering when I got to Algebra.

However, I'll bite, since I know I'll learn something. Can you give me example functions that satisfy superposition but not the others? But, one (obvious) stipulation, no fair leaving the domain of the functions, so if f(x) is undefined for x≤0, for example, of course you can't have f(0)=0.

Seriously, I'm not arguing or trolling, I'd like to see this.

In machine learning and statistical analysis, algorithms are often separated as "linear" and "nonlinear". For example, logistic regression is considered a "linear" model...

DaveE said:
However, I'll bite, since I know I'll learn something. Can you give me example functions that satisfy superposition but not the others? But, one (obvious) stipulation, no fair leaving the domain of the functions, so if f(x) is undefined for x≤0, for example, of course you can't have f(0)=0.

Yeah, this is a pretty neat question. Let's focus on real vector spaces.

Since we're just examining scaling, we can assume that our vector space is one dimensional. Let ##x## be an arbitrary element of the space, so every element is ##\alpha x## for some ##\alpha \in \mathbb{R}##. You could drop the x entirely but I find it helps to keep the concepts more clear. Keep in mind ##x## is an arbitrary element of the space with no special property, so if we prove something for ##x## we prove it for the whole space.

We know, for example, that ##f(nx)= nf(x)## for any integer n and ## f(x)=f(mx/m)=mf(x/m)## which gives ##f(x/m)=f(x)/m## for any integer m. Combining these gives ##f(\frac{n}{m} x)=\frac{n}{m} f(x)##.
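
Spelled out, the steps in that paragraph use nothing but additivity. A sketch for positive integers ##n## and ##m## (negative integers then follow from ##f(0)=f(0+0)=2f(0)##, so ##f(0)=0##, and ##f(x)+f(-x)=f(0)=0##):
$$f(nx)=f(\underbrace{x+\dots+x}_{n\text{ terms}})=\underbrace{f(x)+\dots+f(x)}_{n\text{ terms}}=nf(x)$$
$$f(x)=f\left(m\cdot\tfrac{x}{m}\right)=mf\left(\tfrac{x}{m}\right)\quad\Rightarrow\quad f\left(\tfrac{x}{m}\right)=\tfrac{1}{m}f(x)$$
$$f\left(\tfrac{n}{m}x\right)=nf\left(\tfrac{x}{m}\right)=\tfrac{n}{m}f(x)$$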

OK, so we have proven that this function is a linear function over ##\mathbb{Q}##. This is an infinite dimensional vector space over ##\mathbb{Q}## though. If the function is continuous with a reasonable topology then we can use that to show linearity over ##\mathbb{R}##, as for any ##\alpha\in \mathbb{R}##, we can pick a sequence ##q_n\to \alpha##, ##q_n\in \mathbb{Q}##, ##f(\alpha x)=\lim_{n\to \infty} f(q_n x)= \lim_{n\to \infty} q_nf(x) =\alpha f(x)##.

So we need to pick a linear function over ##\mathbb{Q}## which is not continuous. Well the way you define any linear function is by defining it on a basis.

https://mathworld.wolfram.com/HamelBasis.html

Fortunately this exists. There is some index set ##T##, and real numbers ##\alpha_t## for each ##t\in T##, such that any real number is uniquely defined as a finite rational combination of the ##\alpha_t##.

In particular this means that ##\alpha_t x## is a basis for our one dimensional vector space.

We can pick any arbitrary value for each ##f(\alpha_t x)##, and then define ##f(\alpha x)=\sum q_t f(\alpha_t x)## where ##\alpha= \sum q_t \alpha_t##. Remember this is a finite sum so is very well defined.

That's pretty much it. Almost any choice of numbers works. It's unfortunately impossible to actually construct a full Hamel basis. For example, I could pick two of my elements to be ##\sqrt{2}## and ##\sqrt{3}##, then extend this set to a Hamel basis, and define ##f(\sqrt{2} x)=1##, ##f(\sqrt{3} x)=1##, and ##f(\alpha_t x)=0## for the rest of them. But this isn't enough to compute anything: ##f(\pi x)## would be zero if ##\pi +\sqrt{2}-\sqrt{3}## is in my basis, but would be 1 if ##\pi-\sqrt{2}## is in my basis, and I haven't picked that with what I've written so far. But the function is obviously not scaling the way you want in ##\mathbb{R}## at this point.

There is probably a simpler example with finite fields, I'll try to think of one.

Excellent, thank you.

Here's a simple example that doesn't require the axiom of choice, but does require some basic field knowledge. Let ##F=F_2[x]/(x^2+x+1)##. Concretely, if that notation looks weird, ##F## is a field with four elements: 0, 1, ##x##, and ##x+1##, with ##b+b=0## for every ##b##, and ##x## a root of the equation ##y^2+y+1=0##. This uniquely defines a well-defined addition, e.g. ##x+(x+1)=(x+x)+1=1##. Also, ##x^2+x+1=0##, which defines multiplication, e.g. ##x(x+1)= x^2+x=(x^2+x+1)+1=1##.

This turns out to be a field. Here additive functions on a one dimensional vector space only get the property for free that ##f(y+y)=0##. In fact, ##F## is a two dimensional vector space over ##F_2##, which is the field of 2 elements only containing 0 and 1, with basis 1 and ##x##. Define ##f:F\to F## to be the function that is linear over ##F_2## with ##f(1)=1##, ##f(x)=1##. This is well defined and additive by definition, but to give an example, by definition ##f(x+1)=f(x)+f(1)=1+1=0##. Then ##1=f(1)=f(x+x+1)=f(x)+f(x+1)=1+0=1##. You can check all the cases for yourself if you want.

It's not linear over ##F## though, ##f((x+1)x)=f(1)=1##, which is not ##(x+1)f(x)=x+1##.

The basic construction is the same. Pick a field, which has a subfield. The big field has a basis over the small field. Pick any linear function (over the small field) from the field to itself that isn't just multiplication by a fixed element of the big field. This will be additive in the big field, but not linear in the big field.
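
For anyone who wants to check all the cases by machine, here is a small sketch of the four-element field. The encoding of elements as 2-bit integers and the helper names `add`, `mul`, and `f` are assumptions made for this illustration only.

```python
# Elements of F = F_2[x]/(x^2 + x + 1), encoded as 2-bit integers:
# bit 0 is the coefficient of 1, bit 1 is the coefficient of x.
ZERO, ONE, X, X_PLUS_1 = 0b00, 0b01, 0b10, 0b11
ELEMENTS = [ZERO, ONE, X, X_PLUS_1]

def add(a, b):
    # Coefficients live in F_2, so addition is bitwise XOR (b + b = 0).
    return a ^ b

def mul(a, b):
    # Multiply as polynomials over F_2 (carry-less multiplication) ...
    product = 0
    for shift in range(2):
        if (b >> shift) & 1:
            product ^= a << shift
    # ... then reduce with x^2 = x + 1 if an x^2 term appeared.
    if product & 0b100:
        product ^= 0b111
    return product

def f(a):
    # The F_2-linear map with f(1) = 1 and f(x) = 1: add up the two coefficients.
    return (a & 1) ^ (a >> 1)

additive = all(f(add(a, b)) == add(f(a), f(b))
               for a in ELEMENTS for b in ELEMENTS)
homogeneous = all(f(mul(c, a)) == mul(c, f(a))
                  for c in ELEMENTS for a in ELEMENTS)

print(additive)     # True
print(homogeneous)  # False, e.g. c = x + 1 and a = x
```

The failing case is the one from the post: `f(mul(X_PLUS_1, X))` is `ONE`, while `mul(X_PLUS_1, f(X))` is `X_PLUS_1`.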

This is your simpler example? Mostly over my head, but I've only read it 3 times and I did drink a beer or two a while ago. LOL. I'll try again later.

BTW, I'll cite this as proof that Engineering Math is way easier (and more useful) than Math Math.
Which reminds me of a quote I can only paraphrase now from a Stanford Math Prof that was asked what use his research was. He replied 'If it was useful, I wouldn't be doing it. That's the Math they do in the engineering and physics departments.'

OK fine, maybe a simpler example. The set of all numbers of the form ##a+b\sqrt{2}## with ##a,b\in \mathbb{Q}## is a field called ##\mathbb{Q}(\sqrt{2})##, and also a two dimensional vector space over ##\mathbb{Q}##. ##f((a+b\sqrt{2})x)=(a+b)x## is an additive function on the vector space of one dimension over ##\mathbb{Q}(\sqrt{2})##, but it's not linear, as ##f(\sqrt{2}x)=x\neq \sqrt{2}x=\sqrt{2}f(x)##. Note it is linear on the two dimensional vector space over ##\mathbb{Q}## of elements of the form ##ax + b\sqrt{2}x##.
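
A minimal sketch of this example in exact rational arithmetic, with ##x## dropped (i.e. taking ##x=1##). Storing ##a+b\sqrt{2}## as the pair `(a, b)` and the names `add`, `mul`, `f`, and `samples` are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product

# An element a + b*sqrt(2) of Q(sqrt(2)) is stored as the pair (a, b).
def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def mul(u, v):
    # (a + b*sqrt2)(c + d*sqrt2) = (ac + 2bd) + (ad + bc)*sqrt2
    a, b = u
    c, d = v
    return (a * c + 2 * b * d, a * d + b * c)

def f(u):
    # The map from the post with x = 1: a + b*sqrt2 -> a + b.
    a, b = u
    return (a + b, Fraction(0))

ONE = (Fraction(1), Fraction(0))
SQRT2 = (Fraction(0), Fraction(1))
samples = [ONE, SQRT2,
           (Fraction(1, 2), Fraction(-3)),
           (Fraction(5), Fraction(2, 7))]

# Additivity holds on every sampled pair.
additive = all(f(add(u, v)) == add(f(u), f(v))
               for u, v in product(samples, repeat=2))

# Scaling by sqrt(2) fails: f(sqrt2 * 1) = 1 but sqrt2 * f(1) = sqrt2.
scales_by_sqrt2 = f(mul(SQRT2, ONE)) == mul(SQRT2, f(ONE))

print(additive)         # True
print(scales_by_sqrt2)  # False
```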

Much better, less jargon, less familiarity with the subject required for this one. Thanks.

When you say Linear Model, then a Linear Regression approach to Machine Learning comes to mind. This is often described as finding coefficients B0, B1, etc which can satisfy f(X) = B0 + B1×X1 + B2×X2 + ... + Bn×Xn + errors. We want to minimize the errors to get a good fit.

Note that the X1, X2, etc are vectors representing several observations of a variable. For example X1 could represent heights of people, X2 might be waist measurement and maybe X3 is waist^2, etc. So while there is a squared term, the function is a linear combination of the vectors.
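
To make that concrete, here is a small sketch of a least-squares fit where one column is the square of another. Everything specific in it is assumed for illustration: the waist values, the "true" coefficients 1, 2, 3, and the helper names `solve` and `fit_least_squares`. It uses exact fractions and noise-free data so the recovered coefficients can be compared directly; real data would include the error term.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A * beta = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [row[-1] for row in M]

def fit_least_squares(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Design matrix columns: intercept, waist, waist^2 (made-up measurements).
waist = [Fraction(w) for w in (28, 30, 32, 34, 36)]
X = [[Fraction(1), w, w * w] for w in waist]

# Noise-free response generated from assumed coefficients B0=1, B1=2, B2=3.
true_beta = [Fraction(1), Fraction(2), Fraction(3)]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta = fit_least_squares(X, y)
print(beta == true_beta)  # True
```

The fitting routine never needs to know that the third column is a square: it only sees three columns of numbers and solves a linear system for the coefficients.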

Sorry I am on my phone or I would do the Greek beta characters for coefficients.

DaveE said:
The normal definition of linearity is just superposition: a function f(x) is linear iff f(a+b) = f(a) + f(b).
So, note that the equation for a line, f(x) = mx + b, is not a linear function (unless b=0). That often confuses people.
Huh, I have a masters in statistics and didn't know this. I guess it is because it is so easy to convert this function to a truly linear one, do your analysis, then convert back. Or for most of the tools we use it doesn't matter.

Yes, these so-called lines or linear models are, strictly speaking (maybe pedantically), affine models, which are just the composition of a translation with a linear map.

fog37 said:
The model is called "linear" because the coefficients/parameters are not raised to a power higher than 1. Is that correct?

Yes, if you are talking about models in the sense of regression [statistics]. We say a regression model is linear if the coefficients appear only to the first power.

Think about this equation in algebra: 3x + 5y + 9z = 10. The unknowns are x, y, z, all raised to the first power, none in radicals or other functions, and this is referred to as a linear equation.

In a regression model like

y = b0 + b1x1 + b2x2 + b3x3

the y and x values are known: it is b0, b1, b2, and b3 that are the unknowns. That equation above is a linear regression model because the unknowns, the bs, appear only to the first power, not in any fractions or other functions. This next is also a linear model, for the same reason.

y = b0 + b1x1 + b2 sqrt(x2) + b3 (1/x3)

-- again, the x-values are not the unknowns, the b values are the unknowns.

This is an example of a nonlinear regression model

y = exp(b0 + b1x1)/(1+exp(b0 + b1x1))
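
One way to see the difference numerically is to hold the x-values fixed and test superposition in the coefficients. This is a sketch under assumed numbers: the x-values, the two coefficient vectors `b` and `c`, and the helper `superposes` are all made up for the illustration.

```python
import math

def linear_in_params(b, x):
    # y = b0 + b1*x1 + b2*sqrt(x2) + b3*(1/x3)
    x1, x2, x3 = x
    return b[0] + b[1] * x1 + b[2] * math.sqrt(x2) + b[3] / x3

def logistic(b, x):
    # y = exp(b0 + b1*x1) / (1 + exp(b0 + b1*x1)); only b0, b1 and x1 are used
    z = b[0] + b[1] * x[0]
    return math.exp(z) / (1 + math.exp(z))

def superposes(model, b, c, x, tol=1e-9):
    """Check model(b + c, x) == model(b, x) + model(c, x) at fixed x."""
    summed = [bi + ci for bi, ci in zip(b, c)]
    return abs(model(summed, x) - (model(b, x) + model(c, x))) <= tol

x = (2.0, 9.0, 4.0)         # fixed, known data
b = [1.0, 0.5, -2.0, 3.0]   # two arbitrary coefficient vectors
c = [0.25, 1.5, 4.0, -1.0]

print(superposes(linear_in_params, b, c, x))  # True
print(superposes(logistic, b, c, x))          # False
```

The square root and the reciprocal act only on the known data, so they do not disturb linearity in the b's; the exponential wraps the b's themselves, and that is what breaks it.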

Right. And the reason this is done is that the X data is fixed when you are fitting your models, so any transformations of X won't affect model fitting.

Whereas your coefficients are the unknowns and whether the model is linear in them impacts the possible ways to fit the model.

What is also confusing are terms like Generalized Linear Models, such as logistic regression, which strictly speaking aren't linear functions of the coefficients but are in some sense close to linear models.
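
One sense in which logistic regression is "close" to linear: the predicted probability is not linear in the coefficients, but its log-odds are. A minimal sketch, with coefficient values and function names (`predict_probability`, `logit`) assumed for the illustration:

```python
import math

def predict_probability(b0, b1, x):
    # Logistic model: p = 1 / (1 + exp(-(b0 + b1*x)))
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    # Log-odds, the link function used by logistic regression.
    return math.log(p / (1.0 - p))

b0, b1 = -1.0, 0.8  # assumed coefficients
for x in (-2.0, 0.0, 1.5):
    p = predict_probability(b0, b1, x)
    # The last two printed values agree up to rounding: logit(p) = b0 + b1*x.
    print(x, p, logit(p), b0 + b1 * x)
```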

fog37 said:
I understand the linearity requirements:
$$f(ax)=af(x)$$ $$f(x+y)=f(x) +f(y)$$ $$f(0)=0$$

It's not controversial that these are requirements for a "linear transformation" ##f## that operates on a vector space. (The real numbers can be considered to be a vector space.) When it comes to the requirements for a "linear function" or "linear model", I'm not sure whether standard definitions for those phrases are well established.

Maybe not universal, but really, really, really common in the world of dynamic systems of all sorts. This is because it opens up the world of linear analysis tools like Laplace transforms and such. It allows canonical solutions that can be scaled or are not sensitive to initial conditions. In fact it's pretty normal in the engineering world to use them anyway, even for non-linear systems (linearization with parameters and such), then you see how good the approximation is and pray that you aren't really non-linear, LOL.
