Choosing which variable is the dependent or independent variable

  • Thread starter fog37
  • #1
fog37
Hello,
I am reviewing some basics. In the case of two variables like ##t## and ##C##, which represent the time in years and the cost in $ of a product respectively, we can graphically represent this pair of variables on a Cartesian plane. The horizontal axis is generally assigned to the variable ##t## and the vertical axis to the variable ##C##.

The horizontal axis is commonly reserved for the independent variable and the vertical one for the dependent variable. But this idea of dependent/independent or cause/effect does not always apply. For example, there may not be any cause/effect or "dependence" between the two variables. We are just pairing them and visualizing the pairs of values as points.

So, in general, given two variables, how do we choose which one to assign to the horizontal axis and which to the vertical axis? What is the best practice criterion? We can clearly do things the other way without breaking any rule. Is the vertical axis for the variable that we consider...

For example, if ##n## is the number of clicks and ##p## is the number of purchases associated with those clicks, we can form the ordered pairs ##(n,p)## or the ordered pairs ##(p,n)## and plot these pairs as points either in the graph "p vs n" or "n vs p"...

Thanks
 

Answers and Replies

  • #2
phinds
Science Advisor
Insights Author
Gold Member
2022 Award
For example, if ##n## is the number of clicks and ##p## is the number of purchases associated with those clicks, we can form the ordered pairs ##(n,p)## or the ordered pairs ##(p,n)## and plot these pairs as points either in the graph "p vs n" or "n vs p"...
Exactly. Why would you want one of these to be preferable? Well, it depends on what you are trying to show. THAT is the criterion.
 
  • #3
fog37
Exactly. Why would you want one of these to be preferable? Well, it depends on what you are trying to show. THAT is the criterion.

Thanks phinds. When you say "it depends on what you are trying to show", I interpret that as meaning that we may be interested in showing how one specific variable varies as we vary the other variable. That puts emphasis on the first variable and subordinates the other variable...
For example, we are generally interested in seeing how the cost (variable ##C##) behaves as time goes by, i.e. as we increase the variable ##t##, so we put ##C## on the vertical axis and ##t## on the horizontal axis.

The other way around, seeing how time ##t## increases as we increase the price ##C## does not make much sense even if it is surely allowed.

So, in many cases, I guess it depends on the context and in other cases it does not matter at all which variable is on the vertical or horizontal axis.

For example, in a multi-variable problem where we have, say, 10 variables, we can choose one of them to be the dependent variable and the others become the independent ones...
 
  • #5
phinds
The other way around, seeing how time ##t## increases as we increase the price ##C## does not make much sense even if it is surely allowed.
Right. There are specific cases where it would make little to no sense for one of the variables to be the dependent variable, but IN GENERAL, it depends on what you are trying to show.
 
  • #6
Office_Shredder
Staff Emeritus
Science Advisor
Gold Member
For example, if ##n## is the number of clicks and ##p## is the number of purchases associated with those clicks, we can form the ordered pairs ##(n,p)## or the ordered pairs ##(p,n)## and plot these pairs as points either in the graph "p vs n" or "n vs p"...

Thanks

Which question do you think you are going to ask:

If I get 20 more clicks (independent variable I control through my advertising budget), how many more purchases will I get?

Or

If I get 7 more purchases (by... Increasing the number of clicks? How many more clicks do I need to do that anyway) how many more clicks will I get?

I think it's pretty clear that for most ways that you would want to analyze these variables, the number of clicks goes on the x-axis and the number of purchases on the y-axis.

As long as you are careful about the details, it doesn't matter which way you do it. You can get the same answers either way (e.g. you can plot it as (p,n) and still ask how many more purchases you get if you get 20 more clicks), but it will be less intuitive that way.
 
  • #7
fog37
Yes, I agree. In this case (purchases and clicks) it is naturally more relevant to see how the number of clicks (x-axis) controls the number of purchases (y-axis) and not the other way around.
In the end, I guess it really depends on the context. One of the variables becomes what we are interested in while the other variable, like a knob, is changed. What we really care about may be the y-variable, but more important is the relationship between the two variables considered together.
 
  • #8
fog37
Just to wrap this topic up:

If the two-variable data were random, it could still be represented by a function, correct? For example, let's consider the ordered pairs ##(t, Temp)## where ##t## is time in seconds and ##Temp## is the temperature of a certain environment. We collect temperature data every second and plot it. The graph looks "random", i.e. we cannot visually identify a precise pattern for the dots in the graph.

However, there is a relation between the set of ##t## values and the set of ##Temp## values. Such a relation is a function, since every time instant ##t## has exactly one temperature value ##Temp## associated with it.

However, regardless of being a function, such data does not have a mathematical equation that can represent it for future unknown values. So there is no equation associated with this function, but it is still a function nonetheless.

Of course, if we collected a finite number of values, say 100 values, after the data collection we could still find some equation (a complicated higher-order polynomial), via best-fit, that could describe the ordered pairs ##(t, Temp)##.

I guess we can always find an equation for some two-variable data over a finite domain...

Is that correct? Thanks!
 
  • #9
phinds
If there is no equation then no, it is not a function in the math sense, but in the English-language sense of one thing being a function of another, then sure. In science it's important not to confuse an everyday English term with the same word used as a STEM term, even though the two may be quite similar in meaning. English is much less precise.
 
  • #10
fog37
When the definition of a function is introduced, even in math books, a function is defined solely as a relation between the elements of a set A and the elements of another set B, with the special property that each element of set A is paired with exactly one element of set B.

What this means is that a function can be a graph, a table, a set of instructions that links one element to another, or an equation. But not all functions are equations. Even in the mathematical sense, I guess a function does not always have to be an equation. If we take the set of all functions, the functions that are given by equations are just a subset of it. I learned this from this Khan Academy video: https://www.khanacademy.org/math/cc-eighth-grade-math/cc-8th-linear-equations-functions/cc-8th-function-intro/v/relations-and-functions
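
To make the "a table can be a function" point concrete, here is a small Python sketch. It is only an illustration under my own assumptions: the readings are made up, and the helper name is_function is hypothetical, not something from the video. It checks the defining property directly on a list of ordered pairs, with no equation involved.

```python
# Made-up temperature log: (time in seconds, temperature reading).
# These values are invented purely for illustration.
readings = [(0, 21.3), (1, 20.9), (2, 21.7), (3, 21.1)]


def is_function(pairs):
    """Return True if no input value is paired with two different outputs."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False  # the same input maps to two different outputs
        seen[x] = y
    return True


# The table is a function of time: each instant has one reading.
print(is_function(readings))

# Adding a second, different reading at t = 2 breaks the property.
print(is_function(readings + [(2, 25.0)]))
```

Note that the check never asks whether the pairs follow a formula, only whether each input has a single output.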
 
  • #11
phinds
Even in the mathematical sense, I guess a function does not always have to be an equation.
OK, not the way I learned it, but that doesn't mean the way I learned it is right.
Wikipedia says
In mathematics, a function is a binary relation between two sets that associates every element of the first set to exactly one element of the second set.
so sounds to me like you're right.
 
  • #12
Office_Shredder
I agree the temperature on a thermometer is a function of time.

Of course, if we collected a finite number of values, say 100 values, after the data collection we could still find some equation (a complicated higher-order polynomial), via best-fit, that could describe the ordered pairs ##(t, Temp)##.

I strongly recommend you do not do this.
 
  • #13
fog37
I agree the temperature on a thermometer is a function of time.



I strongly recommend you do not do this.

Hi, Office_Shredder. Why not, exactly?

My impression is that that particular best-fit equation would/might fit the 100 random data points but would be no good for predicting the 101st value or any other future random value. Is that what you mean?
 
  • #14
Office_Shredder
Yes, it would not be good at predicting anything.

Also you describe it as a "best fit" polynomial, but given the conversation I assume you are actually picking an exact fit polynomial of degree equal to the number of data points you have. That is going to be particularly bad at predicting anything.
 
  • #15
fog37
Yes, it would not be good at predicting anything.

Also you describe it as a "best fit" polynomial, but given the conversation I assume you are actually picking an exact fit polynomial of degree equal to the number of data points you have. That is going to be particularly bad at predicting anything.
Thanks Office_Shredder!

So it is (always?) possible to find an overly convoluted polynomial that will pass exactly through N arbitrary points. And that polynomial is exactly Nth degree, correct?
 
  • #16
Office_Shredder
Yeah, actually you can do one better: with N data points (taken at distinct times) you only need a polynomial of degree at most N-1.

For example any two points have a line passing through them.

The polynomial tends to look very bad when you're extrapolating outside of your data.

For example, if you had two temperature data points and drew a line through them, the function you get will give reasonable values for temperature between them (though not necessarily accurate values), but if you let time go to infinity it's going to give values that are obviously dumb (either positive or negative infinity, depending on the sign of the slope).
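
Here is a rough Python sketch of the exact-fit idea, using Lagrange interpolation, which is one standard way to build such a polynomial (my choice for illustration, not necessarily what anyone in the thread had in mind). The six readings are made up and the function name lagrange is hypothetical. It uses exact fractions so that the fit at the data points can be checked without rounding error.

```python
from fractions import Fraction


def lagrange(points):
    """Return a callable for the polynomial of degree <= N-1 that passes
    exactly through N points with distinct x-values."""
    xs = [Fraction(x) for x, _ in points]
    ys = [Fraction(y) for _, y in points]

    def p(t):
        t = Fraction(t)
        total = Fraction(0)
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            # Basis term: equals yi at xi and 0 at every other data x.
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (t - xj) / (xi - xj)
            total += term
        return total

    return p


# Made-up readings: (time in seconds, temperature in degrees C).
data = [(0, 21), (1, 20), (2, 22), (3, 21), (4, 23), (5, 20)]
p = lagrange(data)

# The polynomial reproduces every data point exactly.
print(all(p(x) == y for x, y in data))

# Inside the data range the values stay in a plausible band;
# beyond t = 5 they quickly stop looking like temperatures.
for t in (2.5, 6, 10, 20):
    print(t, float(p(t)))
```

With these six readings, all between 20 and 23, the very next step t = 6 already gives a wildly negative "temperature", which is the extrapolation problem described above.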
 
  • #17
Stephen Tashi
Science Advisor
My impression is that that particular best-fit equation would/might fit the 100 random data points

There are different mathematical definitions of "best-fit" and some of them (like linear least squares regression) give different answers depending on which variables are considered dependent and which are considered independent. Some definitions treat all variables on the same footing - for example: https://en.wikipedia.org/wiki/Total_least_squares
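
As a hedged illustration of that asymmetry, the sketch below fits ordinary least squares both ways on made-up clicks/purchases data (the numbers and the helper name ols are my own assumptions). Regressing purchases on clicks minimizes vertical errors, while regressing clicks on purchases minimizes horizontal errors, so the two fitted lines do not coincide unless the points are perfectly collinear.

```python
def ols(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (b, a)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up data, for illustration only.
clicks = [100, 200, 300, 400, 500]
purchases = [4, 11, 13, 22, 24]

# Clicks as the independent variable: purchases = a + b * clicks.
b_pn, a_pn = ols(clicks, purchases)

# Roles swapped: clicks = a + b * purchases.
b_np, a_np = ols(purchases, clicks)

# Rewrite the second line as purchases-per-click to compare slopes.
implied_slope = 1 / b_np

print("purchases on clicks:", b_pn)
print("clicks on purchases, inverted:", implied_slope)

# The product of the two slopes is the squared correlation, which is
# below 1 whenever the points are not perfectly collinear.
print("r squared:", b_pn * b_np)
```

So for regression, unlike for plotting, the choice of which variable is treated as dependent changes the answer, not just the presentation.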
 
