# Linear Model vs Nonlinear Models

fog37
Hello,

Models can be linear or nonlinear, and I just learned that a "linear model" is not just a model where the dependent variable ##Y## is connected to independent variables ##X## raised to the 1st power... The model is called "linear" because the coefficients/parameters are not raised to a power higher than 1. Is that correct?

So a cubic polynomial is also an example of a linear model even if the fitting curve is not a plane or straight line...

On the other hand, is a nonlinear model a function connecting ##Y## and the ##X## via parameters raised to powers higher than 1? For example, would $$Y= a \log(X)$$ still be a linear model since the unknown coefficient ##a## is raised to the 1st power? How about $$Y= \frac {a X_1^2} {b X_2^3}$$ Is it linear or nonlinear, given that the coefficients ##a## and ##b## are not raised to exponents higher than 1?

Thank you and happy thanksgiving.

Gold Member
The normal definition of linearity is just superposition: a function f(x) is linear iff f(a+b) = f(a) + f(b).
So, note that the equation for a line, f(x) = mx + b, is not a linear function (unless b=0). That often confuses people.
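A quick numerical spot check of that point, as a minimal sketch: the helper `is_additive` and the sample values are made up for illustration, and passing on a handful of points is of course not a proof of linearity.

```python
def is_additive(f, samples, tol=1e-12):
    """Spot-check f(a + b) == f(a) + f(b) on every pair of sample points."""
    return all(
        abs(f(a + b) - (f(a) + f(b))) <= tol
        for a in samples
        for b in samples
    )

# Illustrative sample points (an assumption, any real values would do).
samples = [-2.0, -0.5, 0.0, 1.0, 3.0]

def through_origin(x):
    return 3.0 * x          # f(x) = m x with b = 0

def with_offset(x):
    return 3.0 * x + 1.0    # f(x) = m x + b with b = 1

print(is_additive(through_origin, samples))  # superposition holds
print(is_additive(with_offset, samples))     # superposition fails: the offset is counted twice
```

For the offset line, ##f(a+b)## contains ##b## once while ##f(a)+f(b)## contains it twice, which is exactly why only ##b=0## survives.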

fog37 and malawi_glenn
Staff Emeritus
Gold Member
2021 Award
In particular, to be a linear model ##X## needs to have nothing done to it, but you can apply arbitrary functions to parameters.

##Y= a\log(X)## is not linear. ##Y=\log(a)X## is.

fog37
The normal definition of linearity is just superposition: a function f(x) is linear iff f(a+b) = f(a) + f(b).
So, note that the equation for a line, f(x) = mx + b, is not a linear function (unless b=0). That often confuses people.
I understand the linearity requirements:
$$f(ax)=af(x)$$ $$f(x+y)=f(x) +f(y)$$ $$f(0)=0$$

In that case, a quadratic polynomial like ##f(x) = x^2+x+3## would not fall in the "linear models" category, I believe... However, it is described as such...

Homework Helper
Generally a model requires one to solve a differential equation in some unknown function of position or time, and the model is linear if that equation is linear in the unknown and its derivatives. The solution doesn't have to be linear in position or time.

We allow models with equations of the form ##L(y(x)) = f(x)## as linear models provided the operator ##L## is linear in ##y## and its derivatives.
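As a sketch of that criterion (the specific operator is an illustrative assumption, not taken from the post), consider ##L(y) = y'' + x^2 y##. The coefficient ##x^2## is nonlinear in ##x##, yet for any twice-differentiable ##y_1, y_2## and constants ##a, b##:

$$L(a y_1 + b y_2) = (a y_1 + b y_2)'' + x^2 (a y_1 + b y_2) = a\,(y_1'' + x^2 y_1) + b\,(y_2'' + x^2 y_2) = a L(y_1) + b L(y_2)$$

So ##L(y) = f(x)## would count as a linear model in this sense. By contrast, ##y'' + y^2 = f(x)## would not, because the ##y^2## term breaks superposition.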

Gold Member
I understand the linearity requirements:
$$f(ax)=af(x)$$ $$f(x+y)=f(x) +f(y)$$ $$f(0)=0$$

In that case, a quadratic polynomial like ##f(x) = x^2+x+3## would not fall in the "linear models" category, I believe... However, it is described as such...
Yes. However, you will find examples of people that are either sloppy, or assuming other knowledge/contexts. For example, you might describe ##f(x) = x^2+x+3## as a linear combination of the basis set of functions {##x^n##}, but that's not the same as saying ##f(x)## is linear. You will often see people thinking lines are linear, as in the ##f(x)=mx+b## example, as I said before. So, that begs the question "Who is describing what?"

BTW, I never understood why people list all three of those linearity requirements since two of them are a trivial derivation of superposition.
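For what it's worth, the ##f(0)=0## requirement does follow from additivity alone. A one-line sketch, using nothing beyond ##f(x+y)=f(x)+f(y)##:

$$f(0) = f(0+0) = f(0) + f(0) \;\Longrightarrow\; f(0) = 0$$

The scaling requirement ##f(ax)=af(x)## is the one that takes more care.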

hutchphd
Staff Emeritus
Gold Member
2021 Award
If it's so trivial, go ahead and prove it. No, you're not allowed to assume the function is continuous, or even that the field has a topology.

Gold Member
I've seen several setups in which linearity is defined for coefficients, rather than for parameters, so that, e.g., ##ax^2## is considered linear, since ##c(a+b)x^2 = c(ax^2+bx^2) = cax^2+cbx^2##.
Ideally, you'd see from the context which one they're using.
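A small sketch of the two readings side by side, assuming a hypothetical helper `poly` and made-up integer coefficients: at a fixed ##x## the polynomial obeys superposition in its coefficients, while the same polynomial fails superposition in ##x##.

```python
def poly(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + coeffs[2]*x**2 + ..."""
    return sum(c * x**k for k, c in enumerate(coeffs))

# Two illustrative cubic coefficient vectors (assumed values).
p = [1, 2, -3, 5]
q = [4, 0, 7, -1]

# Reading 1: superposition in the coefficients, at a fixed x.
x = 3
p_plus_q = [a + b for a, b in zip(p, q)]
linear_in_coeffs = poly(p_plus_q, x) == poly(p, x) + poly(q, x)

# Reading 2: superposition in x, for fixed coefficients.
x1, x2 = 2, 3
linear_in_x = poly(p, x1 + x2) == poly(p, x1) + poly(p, x2)

print(linear_in_coeffs)  # holds for any coefficients
print(linear_in_x)       # fails for this cubic
```

Integers are used so the comparisons are exact rather than subject to floating-point rounding.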

Gold Member
Yes. I know you're correct. Mathematicians always are, and have pathological functions available to prove it. That's why I switched from math to engineering when I got to Algebra.

However, I'll bite, since I know I'll learn something. Can you give me example functions that satisfy superposition but not the others? But, one (obvious) stipulation, no fair leaving the domain of the functions, so if f(x) is undefined for x≤0, for example, of course you can't have f(0)=0.

Seriously, I'm not arguing or trolling, I'd like to see this.

fog37
In machine learning and statistical analysis, algorithms are often separated as "linear" and "nonlinear". For example, logistic regression is considered a "linear" model...
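One common reading (an assumption about what is meant here, since usage varies) is that logistic regression is linear in its parameters on the log-odds scale, even though the predicted probability ##p## is a nonlinear function of ##X##:

$$\log\frac{p}{1-p} = \beta_0 + \beta_1 X \quad\Longleftrightarrow\quad p = \frac{1}{1+e^{-(\beta_0+\beta_1 X)}}$$

Here ##\beta_0## and ##\beta_1## are illustrative symbols for the intercept and slope.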

Staff Emeritus
Gold Member
2021 Award
However, I'll bite, since I know I'll learn something. Can you give me example functions that satisfy superposition but not the others? But, one (obvious) stipulation, no fair leaving the domain of the functions, so if f(x) is undefined for x≤0, for example, of course you can't have f(0)=0.

Yeah, this is a pretty neat question. Let's focus on real vector spaces.

Since we're just examining scaling, we can assume that our vector space is one dimensional. Let ##x## be an arbitrary element of the space, so every element is ##\alpha x## for some ##\alpha \in \mathbb{R}##. You could drop the x entirely but I find it helps to keep the concepts more clear. Keep in mind ##x## is an arbitrary element of the space with no special property, so if we prove something for ##x## we prove it for the whole space.

We know, for example, that ##f(nx)= nf(x)## for any integer ##n##, and ##f(x)=f(mx/m)=mf(x/m)##, which gives ##f(x/m)=f(x)/m## for any nonzero integer ##m##. Combining these gives ##f(\frac{n}{m} x)=\frac{n}{m} f(x)##.

OK, so we have proven that this function is a linear function over ##\mathbb{Q}##. This is an infinite dimensional vector space over ##\mathbb{Q}## though. If the function is continuous with a reasonable topology then we can use that to show linearity over ##\mathbb{R}##, as for any ##\alpha\in \mathbb{R}##, we can pick a sequence ##q_n\to \alpha##, ##q_n\in \mathbb{Q}##, and then ##f(\alpha x)=\lim_{n\to \infty} f(q_n x)= \lim_{n\to \infty} q_nf(x) =\alpha f(x)##.

So we need to pick a linear function over ##\mathbb{Q}## which is not continuous. Well the way you define any linear function is by defining it on a basis.

https://mathworld.wolfram.com/HamelBasis.html

Fortunately this exists. There is some index set ##T##, and real numbers ##\alpha_t## for each ##t\in T##, such that any real number is uniquely defined as a finite rational combination of the ##\alpha_t##.

In particular this means that ##\alpha_t x## is a basis for our one dimensional vector space.

We can pick any arbitrary value for each ##f(\alpha_t x)##, and then define ##f(\alpha x)=\sum q_t f(\alpha_t x)## where ##\alpha= \sum q_t \alpha_t##. Remember this is a finite sum so is very well defined.

That's pretty much it. Almost any choice of numbers works. It's unfortunately impossible to actually write down a full Hamel basis explicitly. For example I could pick two of my elements to be ##\sqrt{2}## and ##\sqrt{3}##, and then extend this set to a Hamel basis, and define ##f(\sqrt{2} x)=1##, ##f(\sqrt{3} x)=1##, and ##f(\alpha_t x)=0## for the rest of them. But this isn't enough to compute anything: ##f(\pi x)## would be zero if ##\pi +\sqrt{2}-\sqrt{3}## is in my basis, but would be 1 if ##\pi-\sqrt{2}## is in my basis, and I haven't picked that with what I've written so far. But the function is obviously not scaling the way you want in ##\mathbb{R}## at this point: real homogeneity would force ##f(\sqrt{3} x)=\sqrt{3/2}\,f(\sqrt{2} x)=\sqrt{3/2}##, not 1.

There is probably a simpler example with finite fields, I'll try to think of one.

BvU and DaveE
Gold Member
Excellent, thank you.

Staff Emeritus
Gold Member
2021 Award
Simple example that doesn't require the axiom of choice, but does require some basic field knowledge. Let ##F=F_2[x]/(x^2+x+1)##. Concretely, if that notation looks weird, ##F## is a field with four elements: 0, 1, ##x##, ##x+1##, with ##b+b=0## for every ##b##, and ##x## a root of the equation ##y^2+y+1=0##. The rule ##b+b=0## pins down addition, e.g. ##x+x+1=(x+x)+1=1##. The relation ##x^2+x+1=0## pins down multiplication, e.g. ##x(x+1)= x^2+x=x^2+x+(1+1)=(x^2+x+1)+1=1##.

This turns out to be a field. Here additive functions on a one dimensional vector space only get the property for free that ##f(y+y)=0##. In fact, ##F## is a two dimensional vector space over ##F_2##, the field of 2 elements containing only 0 and 1, with basis 1 and ##x##. Define ##f:F\to F## to be the ##F_2##-linear function with ##f(1)=1##, ##f(x)=1##. This is well defined and additive by construction; for example, ##f(x+1)=f(x)+f(1)=1+1=0##. As a consistency check, ##1=f(1)=f(x+x+1)=f(x)+f(x+1)=1+0=1##. You can check all the cases for yourself if you want.

It's not linear over ##F## though, ##f((x+1)x)=f(1)=1##, which is not ##(x+1)f(x)=x+1##.

The basic construction is the same. Pick a field, which has a subfield. The big field has a basis over the small field. Pick any linear function (over the small field) from the big field to itself that isn't just multiplication by a fixed element of the big field. This will be additive in the big field, but not linear in the big field.
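Since the four-element field is small enough to enumerate, the claims can be brute-force checked. A minimal sketch, assuming an encoding in which the pair `(a, b)` stands for ##a + bx## with ##a, b \in \{0, 1\}##; the function names are illustrative.

```python
from itertools import product

# (a, b) encodes a + b*x in F_2[x]/(x^2 + x + 1); the encoding is an assumption of this sketch.
ELEMENTS = list(product((0, 1), repeat=2))

def add(u, v):
    """Componentwise addition mod 2, so b + b = 0 for every b."""
    return (u[0] ^ v[0], u[1] ^ v[1])

def mul(u, v):
    """(a + b x)(c + d x), reduced with x^2 = x + 1."""
    a, b = u
    c, d = v
    return ((a * c + b * d) % 2, (a * d + b * c + b * d) % 2)

def f(u):
    """The F_2-linear map with f(1) = 1 and f(x) = 1, i.e. f(a + b x) = a + b."""
    a, b = u
    return ((a + b) % 2, 0)

additive = all(
    f(add(u, v)) == add(f(u), f(v)) for u in ELEMENTS for v in ELEMENTS
)
homogeneous = all(
    f(mul(c, u)) == mul(c, f(u)) for c in ELEMENTS for u in ELEMENTS
)

print(additive)     # f respects addition on all 16 pairs
print(homogeneous)  # scaling by elements of F fails, e.g. c = x + 1, u = x
```

The failing case found by the enumeration is the one from the post: ##f((x+1)x)=f(1)=1##, while ##(x+1)f(x)=x+1##.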

Gold Member
This is your simpler example? Mostly over my head, but I've only read it 3 times and I did drink a beer or two a while ago. LOL. I'll try again later.

BTW, I'll cite this as proof that Engineering Math is way easier (and more useful) than Math Math.
Which reminds me of a quote I can only paraphrase now from a Stanford Math Prof that was asked what use his research was. He replied 'If it was useful, I wouldn't be doing it. That's the Math they do in the engineering and physics departments.'

Staff Emeritus
Gold Member
2021 Award
OK fine maybe a simpler example. The set of all numbers of the form ##a+b\sqrt{2}## with ##a,b\in \mathbb{Q}## is a field called ##\mathbb{Q}(\sqrt{2})##, and also a two dimensional vector space over ##\mathbb{Q}##. ##f((a+b\sqrt{2})x)=(a+b)x## is an additive function on the vector space of one dimension over ##\mathbb{Q}(\sqrt{2})##, but it's not linear, as ##f(\sqrt{2}x)=x\neq \sqrt{2}x=\sqrt{2}f(x)##. Note it is linear on the two dimensional vector space over ##\mathbb{Q}## of elements of the form ##ax + b\sqrt{2}x##.
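This one can also be checked with exact arithmetic. A minimal sketch, assuming we drop ##x## (take ##x=1##) and encode ##a+b\sqrt{2}## as a pair `(a, b)` of `Fraction`s; the names and sample values are illustrative.

```python
from fractions import Fraction

# (a, b) encodes a + b*sqrt(2) with rational a, b; the encoding is an assumption of this sketch.
def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def mul(u, v):
    """(a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2."""
    a, b = u
    c, d = v
    return (a * c + 2 * b * d, a * d + b * c)

def f(u):
    """f(a + b√2) = a + b, returned as an element of Q(√2)."""
    a, b = u
    return (a + b, Fraction(0))

ONE = (Fraction(1), Fraction(0))
SQRT2 = (Fraction(0), Fraction(1))

# Illustrative sample elements.
u = (Fraction(1, 2), Fraction(3))
v = (Fraction(-2), Fraction(5, 7))
q = (Fraction(4, 3), Fraction(0))  # a rational scalar

additive_ok = f(add(u, v)) == add(f(u), f(v))
rational_scaling_ok = f(mul(q, u)) == mul(q, f(u))
sqrt2_scaling_ok = f(mul(SQRT2, ONE)) == mul(SQRT2, f(ONE))

print(additive_ok)          # additivity holds
print(rational_scaling_ok)  # scaling by a rational holds
print(sqrt2_scaling_ok)     # scaling by √2 fails: f(√2) = 1 but √2·f(1) = √2
```

Using `Fraction` keeps every comparison exact, so the failure for ##\sqrt{2}## is structural rather than a rounding artifact.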

DaveE