Choosing which variable is the dependent or independent variable

When graphing a pair of variables such as time and cost on a Cartesian plane, the horizontal axis is usually assigned to the independent variable and the vertical axis to the dependent variable. In some cases, however, there is no cause-and-effect relationship between the two variables, and the choice of axes depends on the context and on what the graph is meant to show. In general, the variable that is controlled or varied goes on the horizontal axis, while the variable whose response is of interest goes on the vertical axis. Ultimately, the choice should follow from the question being asked and the information being presented.
  • #1
fog37
Hello,
I am reviewing some basics. In the case of two variables like ##t## and ##C##, which represent the time in years and the cost in $ of a product respectively, we can graphically represent this pair of variables on a Cartesian plane. The horizontal axis is generally assigned to the variable ##t## and the vertical axis to the variable ##C##.

The horizontal axis is commonly reserved for the independent variable and the vertical one for the dependent variable. But this idea of dependent/independent or cause/effect does not always apply. For example, there may not be any cause/effect or "dependence" between the two variables. We are just pairing them and visualizing the pairs of values as points.

So, in general, given two variables, how do we choose which one to assign to the horizontal axis and which to the vertical axis? What is the best practice criterion? We can clearly do things the other way without breaking any rule. Is the vertical axis for the variable that we consider...

For example, if ##n## is the number of clicks and ##p## is the number of purchases associated to the number of clicks, we can form the ordered pairs ##(n,p)## or also the ordered pairs ##(p,n)## and plot these pairs as points either in the graph "p vs n" or "n vs p"...

Thanks
 
  • #2
fog37 said:
For example, if ##n## is the number of clicks and ##p## is the number of purchases associated to the number of clicks, we can form the ordered pairs ##(n,p)## or also the ordered pairs ##(p,n)## and plot these pairs as points either in the graph "p vs n" or "n vs p"...
Exactly. Why would you want one of these to be preferable? Well, it depends on what you are trying to show. THAT is the criterion.
 
  • #3
phinds said:
Exactly. Why would you want one of these to be preferable? Well, it depends on what you are trying to show. THAT is the criterion.

Thanks phinds. When you say "it depends on what you are trying to show", I interpret that as meaning that we may be interested in showing how one specific variable varies as we vary the other variable. That puts emphasis on the first variable and subordinates the other variable...
For example, we are generally interested in seeing how the cost (variable ##C##) behaves as time goes by, i.e. as we increase the variable ##t##, so we put ##C## on the vertical axis and ##t## on the horizontal axis.

The other way around, seeing how time ##t## increases as we increase the price ##C## does not make much sense even if it is surely allowed.

So, in many cases, I guess it depends on the context and in other cases it does not matter at all which variable is on the vertical or horizontal axis.

For example, in a multi-variable problem where we have, say, 10 variables, we can choose one of them to be the dependent variable and the others become the independent ones...
 
  • #5
fog37 said:
The other way around, seeing how time ##t## increases as we increase the price ##C## does not make much sense even if it is surely allowed.
Right. There are specific cases where it would make little to no sense for one of the variables to be the dependent variable, but IN GENERAL, it depends on what you are trying to show.
 
  • #6
fog37 said:
For example, if ##n## is the number of clicks and ##p## is the number of purchases associated to the number of clicks, we can form the ordered pairs ##(n,p)## or also the ordered pairs ##(p,n)## and plot these pairs as points either in the graph "p vs n" or "n vs p"...

Thanks

Which question do you think you are going to ask:

If I get 20 more clicks (independent variable I control through my advertising budget), how many more purchases will I get?

Or

If I get 7 more purchases (by... Increasing the number of clicks? How many more clicks do I need to do that anyway) how many more clicks will I get?

I think it's pretty clear that for most ways that you would want to analyze these variables, the number of clicks is the x-axis and the number of purchases is the y-axis.

As long as you are careful about the details, it doesn't matter which way you do it; you can get the same answers anyway (e.g. you can plot it (p,n) and still ask how many more purchases you get if you get 20 more clicks), but it will be less intuitive that way.
 
  • #7
Office_Shredder said:
Which question do you think you are going to ask:

If I get 20 more clicks (independent variable I control through my advertising budget), how many more purchases will I get?

Or

If I get 7 more purchases (by... Increasing the number of clicks? How many more clicks do I need to do that anyway) how many more clicks will I get?

I think it's pretty clear that for most ways that you would want to analyze these variables, the number of clicks is the x-axis and the number of purchases is the y-axis.

As long as you are careful about the details, it doesn't matter which way you do it; you can get the same answers anyway (e.g. you can plot it (p,n) and still ask how many more purchases you get if you get 20 more clicks), but it will be less intuitive that way.

Yes, I agree. In this case (purchases and clicks) it is naturally more relevant to see how the number of clicks (x-axis) controls the number of purchases (y-axis) and not the other way around.
In the end, I guess it really depends on the context. One of the variables becomes the one we are interested in, while the other, like a knob, is the one we change. What we really care about may be the y-variable, but more important is the relationship between the two variables considered together.
 
  • #8
Just to wrap this topic up:

If the two-variable data were random, it could still be represented by a function, correct? For example, let's consider the ordered pairs ##(t, Temp)## where ##t## is time in seconds and ##Temp## is the temperature of a certain environment. We collect temperature data every second and plot it. The graph looks "random", i.e. we cannot visually identify a precise pattern for the dots in the graph.

However, there is a relation between the set of ##t## values and the set of ##Temp## values. Such a relation is a function, since every time instant ##t## has exactly one temperature value ##Temp## associated with it.

However, even though it is a function, such data does not have a mathematical equation that can predict it for future unknown values. So there is no equation associated with this function, but it is still a function nonetheless.

Of course, if we collected a finite number of values, say 100 values, after the data collection we could still find some equation (a complicated high-order polynomial), via best-fit, that could describe the ordered pairs ##(t, Temp)##.

I guess we can always find an equation for some two-variable data over a finite domain...

Is that correct? Thanks!
 
  • #9
If there is no equation then no, it is not a function in the math sense but in the English language sense of one thing being a function of another then sure. In science it's important to not confuse English language terms with the same word as a STEM term even though they may be quite similar in meaning. English is much less precise.
 
  • #10
When the definition of function is introduced, even in math books, a function is solely defined as a relation between the elements of a set A and the elements of another set B, with the special property that each element of set A is paired with exactly one element of set B.

What this means is that a function can be a graph, a table, a set of instructions that link an element to another element, or an equation. But not all functions are equations. Even in the mathematical sense, I guess a function does not always have to be an equation. If we take the set of all functions, the functions that are given by equations are just a subset of it. I learned this from this Khan Academy video: https://www.khanacademy.org/math/cc-eighth-grade-math/cc-8th-linear-equations-functions/cc-8th-function-intro/v/relations-and-functions
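A minimal sketch of that definition in Python, treating a relation as a plain list of ordered pairs. The helper name `is_function` and the temperature readings are made up for illustration; nothing here comes from the video.

```python
def is_function(pairs):
    """Return True if no input is paired with two different outputs."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False  # same input, two different outputs
        seen[x] = y
    return True


# Hypothetical readings, one per second, as (t, Temp) pairs.
readings = [(0, 21.3), (1, 19.8), (2, 22.1), (3, 19.8)]

# Swapping each pair gives (Temp, t): 19.8 is now paired with t=1 and t=3.
swapped = [(temp, t) for t, temp in readings]

print(is_function(readings))  # True
print(is_function(swapped))   # False
```

The table of readings is a function of time even though no equation generates it, while the swapped relation fails the test because one temperature occurs at two different times.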
 
  • #11
fog37 said:
Even in the mathematical sense, I guess a function does not always have to be an equation.
OK, not the way I learned it, but that doesn't mean the way I learned it is right.
Wikipedia says
In mathematics, a function is a binary relation between two sets that associates every element of the first set to exactly one element of the second set.
so sounds to me like you're right.
 
  • #12
I agree the temperature on a thermometer is a function of time.

fog37 said:
Of course, if we collected a finite number of values, say 100 values, after the data collection we could still find some equation (higher order complicated polynomial), via best-fit, that could describe the ordered pair ##(t, Temp)##.

I strongly recommend you do not do this.
 
  • #13
Office_Shredder said:
I agree the temperature on a thermometer is a function of time.
I strongly recommend you do not do this.

Hi, Office_Shredder. Why not, exactly?

My impression is that that particular best-fit equation would/might fit the 100 random data points but would be no good for predicting the 101st value or any other future value. Is that what you mean?
 
  • #14
Yes, it would not be good at predicting anything.

Also you describe it as a "best fit" polynomial, but given the conversation I assume you are actually picking an exact fit polynomial of degree equal to the number of data points you have. That is going to be particularly bad at predicting anything.
 
  • #15
Office_Shredder said:
Yes, it would not be good at predicting anything.

Also you describe it as a "best fit" polynomial, but given the conversation I assume you are actually picking an exact fit polynomial of degree equal to the number of data points you have. That is going to be particularly bad at predicting anything.
Thanks Office_Shredder!

So it is (always?) possible to find an overly convoluted polynomial that will pass exactly through N arbitrary points. And that polynomial is exactly Nth degree, correct?
 
  • #16
Yeah, actually you can do one better: with N data points (taken at N distinct values of the horizontal variable) you only need a polynomial of degree at most N-1.

For example any two points have a line passing through them.
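Written out, this is the Lagrange interpolation formula. The notation ##(t_i, y_i)## for the data points is assumed here for illustration, and the ##t_i## must all be distinct:

$$P(t) = \sum_{i=1}^{N} y_i \prod_{\substack{j=1 \\ j \neq i}}^{N} \frac{t - t_j}{t_i - t_j}$$

Each product has ##N-1## linear factors, so ##P## has degree at most ##N-1##, and ##P(t_i) = y_i## for every data point. For ##N = 2## it reduces to the line through the two points:

$$P(t) = y_1 \frac{t - t_2}{t_1 - t_2} + y_2 \frac{t - t_1}{t_2 - t_1}$$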

The polynomial tends to look very bad when you're extrapolating outside of your data.

For example if you had two temperature data points, you draw a line in between, the function you have will give reasonable values for temperature (though not necessarily accurate values), but if you let time go to infinity it's going to give values that are obviously dumb (either positive or negative infinity depending on the slope).
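A rough sketch of the same effect with more than two points, using direct evaluation of the Lagrange interpolating polynomial in plain Python. The five temperature readings and the name `lagrange_eval` are invented for illustration.

```python
def lagrange_eval(points, x):
    """Evaluate at x the polynomial of degree <= N-1 through N points.

    The points must have distinct first coordinates.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Hypothetical readings (t in seconds, temperature in degrees), all between 18 and 23.
points = [(0, 20.0), (1, 23.0), (2, 19.0), (3, 22.0), (4, 18.0)]

# The polynomial reproduces every data point...
for t, temp in points:
    print(t, temp, lagrange_eval(points, t))

# ...but just a few seconds past the data it is nowhere near 18-23.
print(lagrange_eval(points, 10))
```

With these numbers the value at ##t = 10## is in the thousands of degrees below zero, which is the "obviously dumb" behavior described above, only reached much sooner than with a straight line.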
 
  • #17
fog37 said:
My impression is that that particular best-fit equation would/might fit the 100 random data points

There are different mathematical definitions of "best-fit" and some of them (like linear least squares regression) give different answers depending on which variables are considered dependent and which are considered independent. Some definitions treat all variables on the same footing - for example: https://en.wikipedia.org/wiki/Total_least_squares
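A small sketch of that asymmetry for ordinary least squares, in plain Python. The click and purchase counts are invented, and `ols_slope_intercept` is just an illustrative helper name.

```python
def ols_slope_intercept(xs, ys):
    """Least-squares line ys ~ slope * xs + intercept (vertical residuals)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: n = clicks, p = purchases.
clicks = [10, 20, 30, 40, 50]
purchases = [1, 3, 2, 5, 4]

# Purchases regressed on clicks (clicks treated as independent).
slope_p_on_n, _ = ols_slope_intercept(clicks, purchases)

# Clicks regressed on purchases (purchases treated as independent).
slope_n_on_p, _ = ols_slope_intercept(purchases, clicks)

# If both fits described the same line, these two numbers would be equal.
print(slope_p_on_n)      # purchases per click from the first fit
print(1 / slope_n_on_p)  # purchases per click implied by the second fit
```

The two fits agree only when the points lie exactly on a line; in general the product of the two slopes equals the squared correlation coefficient, which is less than 1 for scattered data. So for this kind of fit the choice of which variable is "dependent" changes the answer, not just the picture.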
 

1. What is the difference between a dependent and independent variable?

A dependent variable is the outcome or result of an experiment, while an independent variable is the factor that is being manipulated or changed by the researcher.

2. How do I determine which variable is the dependent and which is the independent?

The dependent variable is usually the one that is measured or observed, while the independent variable is the one that is controlled or changed by the researcher. It is important to identify the cause and effect relationship between the two variables to determine which is dependent and which is independent.

3. Can a variable be both dependent and independent?

In some cases, a variable can act as both dependent and independent. This is known as a mediating variable, where it is affected by the independent variable and also affects the dependent variable.

4. What happens if I incorrectly identify the dependent and independent variables?

If the dependent and independent variables are incorrectly identified, it can lead to inaccurate results and conclusions. It is important to carefully consider the relationship between the variables and accurately label them in order to conduct a valid experiment.

5. Can I change the dependent and independent variables during an experiment?

It is not recommended to change the dependent and independent variables during an experiment, as it can affect the reliability and validity of the results. However, if necessary, changes should be carefully documented and justified in the experimental design.
