B Choosing which variable is the dependent or independent variable

AI Thread Summary
Choosing which variable to assign as dependent or independent often depends on the context and the specific relationship being analyzed. Typically, the independent variable is placed on the horizontal axis, while the dependent variable is on the vertical axis, reflecting how one variable influences the other. However, this convention can be flexible; the key is to clarify what you aim to demonstrate with the data. For example, in analyzing clicks and purchases, it is more intuitive to consider clicks as the independent variable affecting purchases. Ultimately, understanding the relationship between the variables is crucial, and while functions can exist without a clear predictive equation, their representation can still provide valuable insights.
fog37
Hello,
I am reviewing some basics. In the case of two variables like ##t## and ##C##, which represent the time in years and the cost in $ of a product respectively, we can graphically represent this pair of variables on a Cartesian plane. The horizontal axis is generally assigned to the variable ##t## and the vertical axis to the variable ##C##.

The horizontal axis is commonly reserved for the independent variable and the vertical one for the dependent variable. But this idea of dependent/independent or cause/effect does not always apply. For example, there may not be any cause/effect or "dependence" between the two variables. We are just pairing them and visualizing the pairs of values as points.

So, in general, given two variables, how do we choose which one to assign to the horizontal axis and which to the vertical axis? What is the best practice criterion? We can clearly do things the other way without breaking any rule. Is the vertical axis for the variable that we consider...

For example, if ##n## is the number of clicks and ##p## is the number of purchases associated to the number of clicks, we can form the ordered pairs ##(n,p)## or also the ordered pairs ##(p,n)## and plot these pairs as points either in the graph "p vs n" or "n vs p"...

Thanks
 
fog37 said:
For example, if ##n## is the number of clicks and ##p## is the number of purchases associated to the number of clicks, we can form the ordered pairs ##(n,p)## or also the ordered pairs ##(p,n)## and plot these pairs as points either in the graph "p vs n" or "n vs p"...
Exactly. Why would you want one of these to be preferable? Well, it depends on what you are trying to show. THAT is the criterion.
 
phinds said:
Exactly. Why would you want one of these to be preferable? Well, it depends on what you are trying to show. THAT is the criterion.

Thanks phinds. When you say "it depends on what you are trying to show", I interpret that as meaning that we may be interested in showing how one specific variable varies as we vary the other variable. That puts emphasis on the first variable and subordinates the other variable...
For example, we are generally interested in seeing how the cost (variable ##C##) behaves as time goes by, i.e. as we increase the variable ##t##, so we put ##C## on the vertical axis and ##t## on the horizontal axis.

The other way around, seeing how time ##t## increases as we increase the price ##C## does not make much sense even if it is surely allowed.

So, in many cases, I guess it depends on the context and in other cases it does not matter at all which variable is on the vertical or horizontal axis.

For example, in a multi-variable problem where we have, say, 10 variables, we can choose one of them to be the dependent variable and the others become the independent ones...
 
fog37 said:
The other way around, seeing how time ##t## increases as we increase the price ##C## does not make much sense even if it is surely allowed.
Right. There are specific cases where it would make little to no sense for one of the variables to be the dependent variable, but IN GENERAL, it depends on what you are trying to show.
 
fog37 said:
For example, if ##n## is the number of clicks and ##p## is the number of purchases associated to the number of clicks, we can form the ordered pairs ##(n,p)## or also the ordered pairs ##(p,n)## and plot these pairs as points either in the graph "p vs n" or "n vs p"...

Thanks

Which question do you think you are going to ask:

If I get 20 more clicks (independent variable I control through my advertising budget), how many more purchases will I get?

Or

If I get 7 more purchases (by... Increasing the number of clicks? How many more clicks do I need to do that anyway) how many more clicks will I get?

I think it's pretty clear that for most ways that you would want to analyze these variables, the number of clicks is the x-axis and the number of purchases is the y axis.

As long as you are careful about the details it doesn't matter which way you do it; you can get the same answers anyway (e.g. you can plot it (p,n) and still ask how many more purchases you get if you get 20 more clicks), but it will be less intuitive that way.
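
To make that last point concrete, here is a minimal Python sketch with made-up, noise-free numbers (the 5% conversion rate, the data, and the helper name `slope` are assumptions for illustration, not figures from this thread). With an exact linear relationship, either orientation of the plot answers the same question; the (p, n) orientation just requires inverting the slope first.

```python
# Hypothetical, noise-free data: every 100 clicks yield 5 purchases.
clicks = [100, 200, 300, 400, 500]
purchases = [5, 10, 15, 20, 25]


def slope(xs, ys):
    """Slope of the straight line through the first and last points."""
    return (ys[-1] - ys[0]) / (xs[-1] - xs[0])


# Orientation 1: clicks on the horizontal axis, purchases on the vertical axis.
purchases_per_click = slope(clicks, purchases)

# Orientation 2: purchases on the horizontal axis, clicks on the vertical axis.
clicks_per_purchase = slope(purchases, clicks)

extra_clicks = 20

# Read the answer directly from the "p vs n" graph.
answer_direct = extra_clicks * purchases_per_click

# Same question answered from the "n vs p" graph: the slope must be inverted.
answer_inverted = extra_clicks / clicks_per_purchase

print(answer_direct, answer_inverted)
```

Both routes give the same number of extra purchases, but the second needs an extra mental step, which is the sense in which it is less intuitive.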
 
Office_Shredder said:
Which question do you think you are going to ask:

If I get 20 more clicks (independent variable I control through my advertising budget), how many more purchases will I get?

Or

If I get 7 more purchases (by... Increasing the number of clicks? How many more clicks do I need to do that anyway) how many more clicks will I get?

I think it's pretty clear that for most ways that you would want to analyze these variables, the number of clicks is the x-axis and the number of purchases is the y axis.

As long as you are careful about the details it doesn't matter which way you do it; you can get the same answers anyway (e.g. you can plot it (p,n) and still ask how many more purchases you get if you get 20 more clicks), but it will be less intuitive that way.

Yes, I agree. In this case (purchases and clicks) it is naturally more relevant to see how the number of clicks (x-axis) controls the number of purchases (y-axis) and not the other way around.
In the end, I guess it really depends on the context. One of the variables becomes what we are interested in while the other variable, like a knob, is changed. What we really care about may be the y-variable, but more important is the relationship between the two variables considered together.
 
Just to wrap this topic up:

If the two-variable data were random, it could still be represented by a function, correct? For example, let's consider the ordered pairs ##(t, Temp)## where ##t## is time in seconds and ##Temp## is the temperature of a certain environment. We collect temperature data every second and plot it. The graph looks "random", i.e. we cannot visually identify a precise pattern for the dots in the graph.

However, there is a relation between the set of times ##t## and the set of temperatures ##Temp##. Such a relation is a function since every time instant ##t## has exactly one temperature value ##Temp## associated with it.

However, regardless of being a function, such data does not have a mathematical equation that can represent it for future unknown values. So there is no equation associated to this function but it is still a function nonetheless.

Of course, if we collected a finite number of values, say 100 values, after the data collection we could still find some equation (a complicated higher-order polynomial), via best-fit, that could describe the ordered pairs ##(t, Temp)##.

I guess we can always find an equation for some two-variable data over a finite domain...

Is that correct? Thanks!
 
If there is no equation then no, it is not a function in the math sense, but in the English-language sense of one thing being a function of another, then sure. In science it's important not to confuse an English-language term with a STEM term that uses the same word, even though they may be quite similar in meaning. English is much less precise.
 
  • #10
When the definition of function is introduced, even in math books, a function is solely defined as a relation between the elements of a set A and the elements of another set B with the special property that each element of set A is paired with exactly one element of set B.

What this means is that a function can be a graph, a table, a set of instructions that link an element to another element, or an equation. But not all functions are equations. Even in the mathematical sense, I guess a function does not always have to be an equation. If we take the set of all functions, the functions that are equations are just a subset of it. I learned this from this Khan Academy video: https://www.khanacademy.org/math/cc-eighth-grade-math/cc-8th-linear-equations-functions/cc-8th-function-intro/v/relations-and-functions
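
As a small illustration of a function given only as a table, here is a Python sketch. The temperature readings and the helper name `is_function` are made up for illustration. It checks the defining property (no input is paired with two different outputs) and then uses the table itself as the function, with no equation involved.

```python
def is_function(pairs):
    """Return True if no input value is paired with two different outputs."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True


# Hypothetical temperature readings, one per second: a lookup table, no formula.
readings = [(0, 21.3), (1, 19.8), (2, 22.1), (3, 20.4)]

# Two different outputs for the same input t = 1: a relation, but not a function.
not_a_function = [(0, 21.3), (1, 19.8), (1, 22.1)]

print(is_function(readings))        # True
print(is_function(not_a_function))  # False

# The table itself serves as the function on its (finite) domain.
temp = dict(readings)
print(temp[2])  # 22.1
```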
 
  • #11
fog37 said:
Even in the mathematical sense, I guess a function does not always have to be an equation.
OK, not the way I learned it, but that doesn't mean the way I learned it is right.
Wikipedia says
In mathematics, a function is a binary relation between two sets that associates every element of the first set to exactly one element of the second set.
so sounds to me like you're right.
 
  • #12
I agree the temperature on a thermometer is a function of time.

fog37 said:
Of course, if we collected a finite number of values, say 100 values, after the data collection we could still find some equation (a complicated higher-order polynomial), via best-fit, that could describe the ordered pairs ##(t, Temp)##.

I strongly recommend you do not do this.
 
  • #13
Office_Shredder said:
I agree the temperature on a thermometer is a function of time.
I strongly recommend you do not do this.

Hi, Office_Shredder. Why not, exactly?

My impression is that that particular best-fit equation would/might fit the 100 random data points but would be no good for predicting the 101st value or any other future random value. Is that what you mean?
 
  • #14
Yes, it would not be good at predicting anything.

Also you describe it as a "best fit" polynomial, but given the conversation I assume you are actually picking an exact fit polynomial of degree equal to the number of data points you have. That is going to be particularly bad at predicting anything.
 
  • #15
Office_Shredder said:
Yes, it would not be good at predicting anything.

Also you describe it as a "best fit" polynomial, but given the conversation I assume you are actually picking an exact fit polynomial of degree equal to the number of data points you have. That is going to be particularly bad at predicting anything.
Thanks Office_Shredder!

So it is (always?) possible to find an overly convoluted polynomial that will pass exactly through N arbitrary points. And that polynomial is exactly Nth degree, correct?
 
  • #16
Yeah, actually you can do one better: with N data points (at distinct times) you only need a polynomial of degree at most N-1.

For example any two points have a line passing through them.

The polynomial tends to look very bad when you're extrapolating outside of your data.

For example if you had two temperature data points, you draw a line in between, the function you have will give reasonable values for temperature (though not necessarily accurate values), but if you let time go to infinity it's going to give values that are obviously dumb (either positive or negative infinity depending on the slope).
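
Here is a minimal Python sketch of that behaviour, using Lagrange interpolation (one standard way to build the exact-fit polynomial; the six temperature readings and the function name `lagrange_eval` are made up for illustration). The polynomial of degree at most N-1 reproduces all N data points exactly, but gives an absurd value just a few seconds past the last measurement.

```python
def lagrange_eval(points, x):
    """Evaluate at x the polynomial of degree <= N-1 through N points.

    Assumes the x-values of the points are all distinct.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                # Basis factor: equals 1 at x = xi and 0 at x = xj.
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Hypothetical temperatures (deg C) sampled once per second, all between 18 and 23.
samples = [(0, 20.0), (1, 23.0), (2, 19.0), (3, 22.0), (4, 18.0), (5, 21.0)]

# The polynomial passes exactly through every data point.
for t, temp in samples:
    print(t, temp, lagrange_eval(samples, t))

# Extrapolating 5 seconds past the last reading gives a nonsensical temperature.
print(lagrange_eval(samples, 10))
```

With these made-up readings the extrapolated value at t = 10 is about 9647, even though every measurement lies between 18 and 23.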
 
  • #17
fog37 said:
My impression is that that particular best-fit equation would/might fit the 100 random data points

There are different mathematical definitions of "best-fit" and some of them (like linear least squares regression) give different answers depending on which variables are considered dependent and which are considered independent. Some definitions treat all variables on the same footing - for example: https://en.wikipedia.org/wiki/Total_least_squares
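
A short Python sketch of the first point, with made-up noisy click/purchase data (the numbers and the helper name `ols_slope` are assumptions for illustration). Ordinary least squares minimizes residuals only in the variable treated as dependent, so regressing purchases on clicks and regressing clicks on purchases produce two different lines.

```python
def ols_slope(xs, ys):
    """Least-squares slope for predicting ys from xs (residuals measured in y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical noisy data: purchases roughly, but not exactly, 5% of clicks.
clicks = [100, 200, 300, 400, 500]
purchases = [4, 12, 13, 22, 24]

# Treat purchases as dependent: slope in purchases per click.
b_p_on_n = ols_slope(clicks, purchases)

# Treat clicks as dependent: slope in clicks per purchase.
b_n_on_p = ols_slope(purchases, clicks)

# If both fits described the same line, these two numbers would be equal.
print(b_p_on_n, 1 / b_n_on_p)

# The product of the two slopes is the squared correlation coefficient,
# which is below 1 whenever the points are not perfectly collinear.
print(b_p_on_n * b_n_on_p)
```

So with noisy data the choice of dependent variable is not just a plotting convention: it changes the fitted line, unless a symmetric method such as total least squares is used.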
 
