Choosing which variable is the dependent or independent variable

fog37 · Feb 12, 2021

Hello,
I am reviewing some basics. In the case of two variables like ##t## and ##C##, which represent the time in years and the cost in $ of a product respectively, we can graphically represent this pair of variables on a Cartesian plane. The horizontal axis is generally assigned the variable ##t## and the vertical axis the variable ##C##.

The horizontal axis is commonly reserved for the independent variable and the vertical one for the dependent variable. But this idea of dependent/independent or cause/effect does not always apply. For example, there may not be any cause/effect or "dependence" between the two variables. We are just pairing them and visualizing the pairs of values as points.

So, in general, given two variables, how do we choose which one to assign to the horizontal axis and which to the vertical axis? What is the best practice criterion? We can clearly do things the other way without breaking any rule. Is the vertical axis for the variable that we consider...

For example, if ##n## is the number of clicks and ##p## is the number of purchases associated to the number of clicks, we can form the ordered pairs ##(n,p)## or also the ordered pairs ##(p,n)## and plot these pairs as points either in the graph "p vs n" or "n vs p"...

Thanks

phinds · Feb 12, 2021

fog37 said:

For example, if ##n## is the number of clicks and ##p## is the number of purchases associated to the number of clicks, we can form the ordered pairs ##(n,p)## or also the ordered pairs ##(p,n)## and plot these pairs as points either in the graph "p vs n" or "n vs p"...

Exactly. Why would you want one of these to be preferable? Well, it depends on what you are trying to show. THAT is the criterion.

fog37 · Feb 12, 2021

phinds said:

Exactly. Why would you want one of these to be preferable? Well, it depends on what you are trying to show. THAT is the criterion.

Thanks phinds. When you say "it depends on what you are trying to show", I interpret that as meaning that we may be interested in showing how one specific variable varies as we vary the other variable. That puts emphasis on the first variable and subordinates the other variable...
For example, we are generally interested in seeing how the cost (variable ##C##) behaves as time goes by, i.e. as we increase the variable ##t##, so we put ##C## on the vertical axis and ##t## on the horizontal axis.

The other way around, seeing how time ##t## increases as we increase the price ##C## does not make much sense even if it is surely allowed.

So, in many cases, I guess it depends on the context and in other cases it does not matter at all which variable is on the vertical or horizontal axis.

For example, in a multi-variable problem where we have, say, 10 variables, we can choose one of them to be the dependent variable and the others become the independent ones...

jedishrfu · Feb 12, 2021

This Wikipedia article explains the two types of variables and as @phinds says it depends on the context and what you are trying to demonstrate.

https://en.wikipedia.org/wiki/Dependent_and_independent_variables

Basically, you’re trying to determine if the tail wags the dog or the dog wags its tail, based on what you’re trying to prove.

phinds · Feb 12, 2021

fog37 said:

The other way around, seeing how time ##t## increases as we increase the price ##C## does not make much sense even if it is surely allowed.

Right. There are specific cases where it would make little to no sense for one of the variables to be the dependent variable, but IN GENERAL, it depends on what you are trying to show.

Office_Shredder · Feb 12, 2021

fog37 said:

For example, if ##n## is the number of clicks and ##p## is the number of purchases associated to the number of clicks, we can form the ordered pairs ##(n,p)## or also the ordered pairs ##(p,n)## and plot these pairs as points either in the graph "p vs n" or "n vs p"...

Thanks

Which question do you think you are going to ask:

If I get 20 more clicks (independent variable I control through my advertising budget), how many more purchases will I get?

Or

If I get 7 more purchases (by... Increasing the number of clicks? How many more clicks do I need to do that anyway) how many more clicks will I get?

I think it's pretty clear that for most ways that you would want to analyze these variables, the number of clicks is the x-axis and the number of purchases is the y axis.

As long as you are careful about the details it doesn't matter which way you do it; you can get the same answers anyway (e.g. you can plot it (p,n) and still ask how many more purchases you get if you get 20 more clicks), but it will be less intuitive that way.

fog37 · Feb 12, 2021

Office_Shredder said:

Which question do you think you are going to ask:

If I get 20 more clicks (independent variable I control through my advertising budget), how many more purchases will I get?

Or

If I get 7 more purchases (by... Increasing the number of clicks? How many more clicks do I need to do that anyway) how many more clicks will I get?

I think it's pretty clear that for most ways that you would want to analyze these variables, the number of clicks is the x-axis and the number of purchases is the y axis.

As long as you are careful about the details it doesn't matter which way you do it; you can get the same answers anyway (e.g. you can plot it (p,n) and still ask how many more purchases you get if you get 20 more clicks), but it will be less intuitive that way.

Yes, I agree. In this case (purchases and clicks) it is naturally more relevant to see how the number of clicks (x-axis) controls the number of purchases (y-axis) and not the other way around.
In the end, I guess it really depends on the context. One of the variables becomes the quantity we are interested in while the other variable, like a knob, is changed. What we really care about may be the y-variable, but more important is the relationship between the two variables considered together.

fog37 · Feb 14, 2021

Just to wrap this topic up:

If the two-variable data were random, it could still be represented by a function, correct? For example, let's consider the ordered pairs ##(t, Temp)## where ##t## is time in seconds and ##Temp## is the temperature of a certain environment. We collect temperature data every second and plot it. The graph looks "random", i.e. we cannot visually identify a precise pattern for the dots in the graph.

However, there is a relation between the set of ##t## values and the set of ##Temp## values. Such a relation is a function since every time instant ##t## has exactly one temperature value ##Temp## associated with it.

However, regardless of being a function, such data does not have a mathematical equation that can represent it for future unknown values. So there is no equation associated to this function but it is still a function nonetheless.

Of course, if we collected a finite number of values, say 100 values, after the data collection we could still find some equation (higher order complicated polynomial), via best-fit, that could describe the ordered pair ##(t, Temp)##.

I guess we can always find an equation for some two-variable data over a finite domain...

Is that correct? Thanks!

phinds · Feb 14, 2021

If there is no equation then no, it is not a function in the math sense, but in the English-language sense of one thing being a function of another, then sure. In science it's important not to confuse an English-language term with a STEM term that uses the same word, even though they may be quite similar in meaning. English is much less precise.

fog37 · Feb 14, 2021

When the definition of function is introduced, even in math books, a function is solely defined as a relation between the elements of a set A and the elements of another set B with the special property that each element of set A is paired with exactly one element of set B.

What this means is that a function can be a graph, a table, a set of instructions that link an element to another element, or an equation. But not all functions are equations. Even in the mathematical sense, I guess a function does not always have to be an equation. If we take the set of all functions, the functions that are equations are just a subset of it. I learned this from this Khan Academy video: https://www.khanacademy.org/math/cc-eighth-grade-math/cc-8th-linear-equations-functions/cc-8th-function-intro/v/relations-and-functions
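To make the "table" case concrete, here is a minimal Python sketch. The temperature readings are made-up illustrative values, and `is_function` is a hypothetical helper written for this example; it simply checks the defining property that no input is paired with two different outputs.

```python
# A function given purely as a table (a dict), with no formula behind it.
# The readings are made-up illustrative values, not real data.
temp_at = {0: 21.3, 1: 20.9, 2: 21.7, 3: 21.1}  # t in seconds -> Temp


def is_function(pairs):
    """Return True if no input value is paired with two different outputs."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True


pairs = list(temp_at.items())

# Swap the roles of the variables and add a second time with the same reading:
# the temperature 21.3 is now paired with both t = 0 and t = 9.
swapped = [(temp, t) for t, temp in pairs] + [(21.3, 9)]

print(is_function(pairs))    # True: each time has exactly one temperature
print(is_function(swapped))  # False: one temperature maps to two times
```

The table is a function of ##t## even though nothing in it is an equation, while the swapped relation fails the test as soon as one temperature occurs at two different times.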

phinds · Feb 14, 2021

fog37 said:

Even in the mathematical sense, I guess a function does not always have to be an equation.

OK, not the way I learned it, but that doesn't mean the way I learned it is right.
Wikipedia says

In mathematics, a function is a binary relation between two sets that associates every element of the first set to exactly one element of the second set.

so sounds to me like you're right.

Office_Shredder · Feb 14, 2021

I agree the temperature on a thermometer is a function of time.

fog37 said:

Of course, if we collected a finite number of values, say 100 values, after the data collection we could still find some equation (higher order complicated polynomial), via best-fit, that could describe the ordered pair ##(t, Temp)##.

I strongly recommend you do not do this.

fog37 · Feb 14, 2021

Office_Shredder said:

I agree the temperature on a thermometer is a function of time.
I strongly recommend you do not do this.

Hi, Office_Shredder. Why not, exactly?

My impression is that that particular best-fit equation would/might fit the 100 random data points but would be no good for predicting the 101st value and any other future random value. Is that what you mean?

Office_Shredder · Feb 14, 2021

Yes, it would not be good at predicting anything.

Also you describe it as a "best fit" polynomial, but given the conversation I assume you are actually picking an exact fit polynomial of degree equal to the number of data points you have. That is going to be particularly bad at predicting anything.

fog37 · Feb 14, 2021

Office_Shredder said:

Yes, it would not be good at predicting anything.

Also you describe it as a "best fit" polynomial, but given the conversation I assume you are actually picking an exact fit polynomial of degree equal to the number of data points you have. That is going to be particularly bad at predicting anything.

Thanks Office_Shredder!

So it is (always?) possible to find an overly convoluted polynomial that will pass exactly through N arbitrary points. And that polynomial is exactly Nth degree, correct?

Office_Shredder · Feb 14, 2021

Yeah, actually you can do one better: with N data points (taken at distinct times) you only need a polynomial of degree at most N-1.

For example any two points have a line passing through them.

The polynomial tends to look very bad when you're extrapolating outside of your data.

For example if you had two temperature data points, you draw a line in between, the function you have will give reasonable values for temperature (though not necessarily accurate values), but if you let time go to infinity it's going to give values that are obviously dumb (either positive or negative infinity depending on the slope).
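A small numerical sketch of this point, assuming NumPy is available. The six "temperature" readings are randomly generated stand-ins rather than real measurements, and the names `t`, `temp`, and `poly` are chosen only for this illustration.

```python
import numpy as np

# Made-up "random-looking" temperature readings, one per second (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(6, dtype=float)                # N = 6 time instants: 0, 1, ..., 5
temp = 20.0 + rng.normal(0.0, 0.5, t.size)   # noise around 20 degrees

# N points with distinct t values are matched exactly by a polynomial
# of degree at most N - 1.
coeffs = np.polyfit(t, temp, deg=t.size - 1)
poly = np.poly1d(coeffs)

# Largest gap between the polynomial and the data at the sampled times.
max_error_on_data = np.max(np.abs(poly(t) - temp))
print(max_error_on_data)  # tiny: the curve passes through every data point

# Extrapolating beyond the data: nothing ties these values to reality.
for t_future in (6.0, 10.0, 20.0):
    print(t_future, poly(t_future))
```

The fit reproduces the six samples to within rounding error, yet the extrapolated values are driven entirely by the highest-degree terms, so they say nothing about the next reading.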

Stephen Tashi · Feb 18, 2021

fog37 said:

My impression is that that particular best-fit equation would/might fit the 100 random data points

There are different mathematical definitions of "best-fit" and some of them (like linear least squares regression) give different answers depending on which variables are considered dependent and which are considered independent. Some definitions treat all variables on the same footing - for example: https://en.wikipedia.org/wiki/Total_least_squares
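As a rough illustration of that dependence, the sketch below fits ordinary least squares both ways on the clicks/purchases example. It assumes NumPy, and the data are synthetic (a hypothetical linear trend plus noise), so the specific numbers mean nothing beyond the demonstration.

```python
import numpy as np

# Synthetic data: hypothetical clicks n and noisy purchases p (not real data).
rng = np.random.default_rng(1)
n = rng.uniform(0, 100, 200)
p = 0.1 * n + rng.normal(0.0, 2.0, n.size)

# Ordinary least squares with p as the dependent variable: p ~ a*n + b
a, b = np.polyfit(n, p, deg=1)

# Ordinary least squares with n as the dependent variable: n ~ c*p + d
c, d = np.polyfit(p, n, deg=1)

# Re-express the second line as "p versus n" so the two slopes are comparable.
slope_p_on_n = a
slope_from_n_on_p = 1.0 / c

# Pearson correlation coefficient between n and p.
r = np.corrcoef(n, p)[0, 1]

print(slope_p_on_n, slope_from_n_on_p)  # two different lines through the same data
print(a * c, r**2)                      # the product of the two slopes equals r squared
```

The two fitted lines coincide only when the points are perfectly collinear (##r^2 = 1##). Otherwise the choice of dependent variable changes the answer, which is why that choice matters for regression even when it is arbitrary for plotting.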
