Cleaning/Reordering towards regression

Click For Summary
SUMMARY

This discussion focuses on performing regression analysis in Excel using two datasets: one containing heights and the other containing weights, both ordered by the individual’s name. The user seeks a method to combine these datasets into a single table containing the person’s name, height, and weight without using advanced tools like Visual Basic. The solution involves using Excel functions MATCH and INDEX to align the data correctly, ensuring that the variables are paired accurately for regression analysis.

PREREQUISITES
  • Basic knowledge of Excel functions, specifically MATCH and INDEX
  • Understanding of regression analysis concepts
  • Familiarity with data organization in spreadsheets
  • Ability to troubleshoot data mismatches and errors
NEXT STEPS
  • Learn how to use Excel's MATCH function for data alignment
  • Explore the INDEX function for retrieving data from different tables
  • Study regression analysis techniques in Excel
  • Investigate data cleaning methods to resolve mismatches in datasets
USEFUL FOR

This discussion is beneficial for data analysts, Excel users, and anyone involved in statistical analysis who needs to combine and analyze paired datasets efficiently.

WWGD
Science Advisor
Homework Helper
Messages
7,779
Reaction score
13,023
TL;DR
Excel: How to regress variable A on variable B when values for A are not matched with those for B? I mean, we have a population P for which we obtain values A=(a_i) and B=(b_i), but order in which A is presented is different than that in which B is presented. Example: Say I have separately the age and weight of 100 people but not given by :( Name, Age, Weight), but rather ordered separately by Age, Weight. How to easily re-order the latter to appear as triplets ( Name, Age, Weight)?
I have quantitative data on all countries on two variables, say A,B in Excel and I am trying to regress A on B. Problem is that data are ordered based on the magnitude of A, B , rather than Alpha by country. Is there a reasonable way of ordering by country for each and then regress A on B? If I rearrange Alpha by country, it will not, Afaik, match for A,B fittingly.
 
Computer science news on Phys.org
I'm no clear on what you mean. Are A and B paired in the data or not? What does "ordered separately by Age, Weight" mean? What do your records look like? What columns of information are in the records?
The regression does not care about the order of the data, but it can not do a regression if the variables are not paired up. You may need to use names to get the data into the same order.
Are you saying that you have two tables (age, name) and (weight, name), each ordered by the first variable? If so, sort both tables by "name" and get the table (name, age, weight). Then do a regression of the variables "age" and "weight".

PS. It is very common that two tables that are supposed to have the same key, like "name", are not really accurate on the key variable. The tables have typos, missing entries, etc. that make problems in matching up the two tables. You may have to deal with that in some way.
 
Last edited:
  • Like
Likes   Reactions: WWGD
Yes, clearly I have to pair it. I'm just wondering if there is a " reasonably nice" way of doing it within Excel and without requiring, e.g., Visual Basic or other tools I'm not conversant with. I have data the likes of some 200 data points of :(Person, Height), ordered by height and ( Person, Weight), ordered by weight and . I want to obtain a triple ( Person, Height, Weight) from the first two, so that the ith person is matched with their height and weight. Just wondering if there is a " nice" way of doing this within Excel.
 
Let's say you have a Heights tab and a Weights tab. Names are in column A and Heights or Weights in column B. Data start on row 5.

In Heights!C5 enter the formula =match($A5,Weights!$A:$A,0).
In Heights!D5 enter the formula =index(Weights!B:B,$C5).

Copy the formulae down.

Edit: example on Google Sheets.
 
Last edited:
  • Like
Likes   Reactions: FactChecker and WWGD
WWGD said:
Yes, clearly
The whole situation wasn't clear to me. But I am not the sharpest knife in the drawer. ;-)
I have data the likes of some 200 data points of :(Person, Height), ordered by height and ( Person, Weight), ordered by weight and . I want to obtain a triple ( Person, Height, Weight) from the first two, so that the ith person is matched with their height and weight. Just wondering if there is a " nice" way of doing this within Excel.​
Now it is clear.
 
FactChecker said:
The whole situation wasn't clear to me. But I am not the sharpest knife in the drawer. ;-)
I have data the likes of some 200 data points of :(Person, Height), ordered by height and ( Person, Weight), ordered by weight and . I want to obtain a triple ( Person, Height, Weight) from the first two, so that the ith person is matched with their height and weight. Just wondering if there is a " nice" way of doing this within Excel.​
Now it is clear.
Sorry if I was abrupt ; it was pre-caffeine ;).
 
  • Haha
Likes   Reactions: FactChecker
WWGD said:
Sorry if I was abrupt ; it was pre-caffeine ;).
Ha! No problem. I completely understand. :cool:
 
  • Like
Likes   Reactions: WWGD

Similar threads

  • · Replies 13 ·
Replies
13
Views
4K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 30 ·
2
Replies
30
Views
4K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 8 ·
Replies
8
Views
3K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 9 ·
Replies
9
Views
24K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 5 ·
Replies
5
Views
2K