Cleaning/Reordering towards regression

Click For Summary

Discussion Overview

The discussion revolves around the challenge of pairing quantitative data on individuals' heights and weights, which are stored in separate Excel tables ordered by different criteria. Participants explore methods to align these datasets for regression analysis without using advanced tools like Visual Basic.

Discussion Character

  • Technical explanation
  • Homework-related

Main Points Raised

  • One participant seeks a method to reorder and pair data on height and weight for regression analysis, expressing concern about maintaining the correct associations between individuals.
  • Another participant questions whether the data for heights and weights are paired and clarifies that the order of data does not affect regression as long as the variables are correctly matched.
  • A participant confirms the need to pair the data and asks for a straightforward Excel solution, emphasizing their lack of familiarity with more complex tools.
  • A suggestion is made to use Excel formulas to match names between the height and weight tables, allowing for the creation of a combined dataset.
  • Several participants express confusion about the initial problem but later clarify their understanding of the data structure and requirements.

Areas of Agreement / Disagreement

Participants generally agree on the need to pair the data correctly for regression analysis, but there is no consensus on the best method to achieve this within Excel without using advanced tools.

Contextual Notes

Participants note potential issues with data accuracy, such as typos or missing entries, which could complicate the pairing process.

WWGD
Science Advisor
Homework Helper
Messages
7,785
Reaction score
13,076
TL;DR
Excel: How to regress variable A on variable B when values for A are not matched with those for B? I mean, we have a population P for which we obtain values A=(a_i) and B=(b_i), but order in which A is presented is different than that in which B is presented. Example: Say I have separately the age and weight of 100 people but not given by :( Name, Age, Weight), but rather ordered separately by Age, Weight. How to easily re-order the latter to appear as triplets ( Name, Age, Weight)?
I have quantitative data on all countries on two variables, say A,B in Excel and I am trying to regress A on B. Problem is that data are ordered based on the magnitude of A, B , rather than Alpha by country. Is there a reasonable way of ordering by country for each and then regress A on B? If I rearrange Alpha by country, it will not, Afaik, match for A,B fittingly.
 
Computer science news on Phys.org
I'm no clear on what you mean. Are A and B paired in the data or not? What does "ordered separately by Age, Weight" mean? What do your records look like? What columns of information are in the records?
The regression does not care about the order of the data, but it can not do a regression if the variables are not paired up. You may need to use names to get the data into the same order.
Are you saying that you have two tables (age, name) and (weight, name), each ordered by the first variable? If so, sort both tables by "name" and get the table (name, age, weight). Then do a regression of the variables "age" and "weight".

PS. It is very common that two tables that are supposed to have the same key, like "name", are not really accurate on the key variable. The tables have typos, missing entries, etc. that make problems in matching up the two tables. You may have to deal with that in some way.
 
Last edited:
  • Like
Likes   Reactions: WWGD
Yes, clearly I have to pair it. I'm just wondering if there is a " reasonably nice" way of doing it within Excel and without requiring, e.g., Visual Basic or other tools I'm not conversant with. I have data the likes of some 200 data points of :(Person, Height), ordered by height and ( Person, Weight), ordered by weight and . I want to obtain a triple ( Person, Height, Weight) from the first two, so that the ith person is matched with their height and weight. Just wondering if there is a " nice" way of doing this within Excel.
 
Let's say you have a Heights tab and a Weights tab. Names are in column A and Heights or Weights in column B. Data start on row 5.

In Heights!C5 enter the formula =match($A5,Weights!$A:$A,0).
In Heights!D5 enter the formula =index(Weights!B:B,$C5).

Copy the formulae down.

Edit: example on Google Sheets.
 
Last edited:
  • Like
Likes   Reactions: FactChecker and WWGD
WWGD said:
Yes, clearly
The whole situation wasn't clear to me. But I am not the sharpest knife in the drawer. ;-)
I have data the likes of some 200 data points of :(Person, Height), ordered by height and ( Person, Weight), ordered by weight and . I want to obtain a triple ( Person, Height, Weight) from the first two, so that the ith person is matched with their height and weight. Just wondering if there is a " nice" way of doing this within Excel.​
Now it is clear.
 
FactChecker said:
The whole situation wasn't clear to me. But I am not the sharpest knife in the drawer. ;-)
I have data the likes of some 200 data points of :(Person, Height), ordered by height and ( Person, Weight), ordered by weight and . I want to obtain a triple ( Person, Height, Weight) from the first two, so that the ith person is matched with their height and weight. Just wondering if there is a " nice" way of doing this within Excel.​
Now it is clear.
Sorry if I was abrupt ; it was pre-caffeine ;).
 
  • Haha
Likes   Reactions: FactChecker
WWGD said:
Sorry if I was abrupt ; it was pre-caffeine ;).
Ha! No problem. I completely understand. :cool:
 
  • Like
Likes   Reactions: WWGD

Similar threads

  • · Replies 13 ·
Replies
13
Views
4K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 30 ·
2
Replies
30
Views
4K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 8 ·
Replies
8
Views
3K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 9 ·
Replies
9
Views
24K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 5 ·
Replies
5
Views
2K