Cleaning/Reordering towards regression

AI Thread Summary
To perform a regression analysis of variable A on variable B using Excel, it is essential that the data for both variables is properly paired by a common identifier, such as "Person." If the data is currently ordered by different criteria (e.g., height and weight), it must be rearranged to align the corresponding entries. This can be achieved by using Excel functions like MATCH and INDEX to create a unified dataset that combines height and weight for each individual. Specifically, one can create a new column in the height data that uses the MATCH function to find the corresponding weight based on the person's name, and then use the INDEX function to retrieve the weight. This method allows for a straightforward pairing of the data without the need for advanced tools like Visual Basic. It is also important to ensure that there are no discrepancies in the key identifiers, as typos or missing entries can complicate the matching process.
WWGD
Science Advisor
Homework Helper
Messages
7,678
Reaction score
12,349
TL;DR Summary
Excel: How to regress variable A on variable B when values for A are not matched with those for B? I mean, we have a population P for which we obtain values A=(a_i) and B=(b_i), but order in which A is presented is different than that in which B is presented. Example: Say I have separately the age and weight of 100 people but not given by :( Name, Age, Weight), but rather ordered separately by Age, Weight. How to easily re-order the latter to appear as triplets ( Name, Age, Weight)?
I have quantitative data on all countries on two variables, say A,B in Excel and I am trying to regress A on B. Problem is that data are ordered based on the magnitude of A, B , rather than Alpha by country. Is there a reasonable way of ordering by country for each and then regress A on B? If I rearrange Alpha by country, it will not, Afaik, match for A,B fittingly.
 
Computer science news on Phys.org
I'm no clear on what you mean. Are A and B paired in the data or not? What does "ordered separately by Age, Weight" mean? What do your records look like? What columns of information are in the records?
The regression does not care about the order of the data, but it can not do a regression if the variables are not paired up. You may need to use names to get the data into the same order.
Are you saying that you have two tables (age, name) and (weight, name), each ordered by the first variable? If so, sort both tables by "name" and get the table (name, age, weight). Then do a regression of the variables "age" and "weight".

PS. It is very common that two tables that are supposed to have the same key, like "name", are not really accurate on the key variable. The tables have typos, missing entries, etc. that make problems in matching up the two tables. You may have to deal with that in some way.
 
Last edited:
  • Like
Likes WWGD
Yes, clearly I have to pair it. I'm just wondering if there is a " reasonably nice" way of doing it within Excel and without requiring, e.g., Visual Basic or other tools I'm not conversant with. I have data the likes of some 200 data points of :(Person, Height), ordered by height and ( Person, Weight), ordered by weight and . I want to obtain a triple ( Person, Height, Weight) from the first two, so that the ith person is matched with their height and weight. Just wondering if there is a " nice" way of doing this within Excel.
 
Let's say you have a Heights tab and a Weights tab. Names are in column A and Heights or Weights in column B. Data start on row 5.

In Heights!C5 enter the formula =match($A5,Weights!$A:$A,0).
In Heights!D5 enter the formula =index(Weights!B:B,$C5).

Copy the formulae down.

Edit: example on Google Sheets.
 
Last edited:
  • Like
Likes FactChecker and WWGD
WWGD said:
Yes, clearly
The whole situation wasn't clear to me. But I am not the sharpest knife in the drawer. ;-)
I have data the likes of some 200 data points of :(Person, Height), ordered by height and ( Person, Weight), ordered by weight and . I want to obtain a triple ( Person, Height, Weight) from the first two, so that the ith person is matched with their height and weight. Just wondering if there is a " nice" way of doing this within Excel.​
Now it is clear.
 
FactChecker said:
The whole situation wasn't clear to me. But I am not the sharpest knife in the drawer. ;-)
I have data the likes of some 200 data points of :(Person, Height), ordered by height and ( Person, Weight), ordered by weight and . I want to obtain a triple ( Person, Height, Weight) from the first two, so that the ith person is matched with their height and weight. Just wondering if there is a " nice" way of doing this within Excel.​
Now it is clear.
Sorry if I was abrupt ; it was pre-caffeine ;).
 
  • Haha
Likes FactChecker
WWGD said:
Sorry if I was abrupt ; it was pre-caffeine ;).
Ha! No problem. I completely understand. :cool:
 
  • Like
Likes WWGD
Back
Top