Why are all values NaN after mapping 'player_name' column in Pandas Data Frame?


Discussion Overview

The discussion revolves around the issue of obtaining NaN values in a new column 'player_name' in a Pandas DataFrame (df2) after attempting to map it from another DataFrame (df1) using a common key 'player_api_id'. Participants explore the reasons for this outcome and suggest alternative methods for achieving the desired result.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant describes their attempt to map 'player_name' from df1 to df2 using the 'player_api_id' column, resulting in all NaN values.
  • Another participant questions whether the two DataFrames are the same size and if they share a primary key, suggesting a merge as a potential solution.
  • A participant provides a link to a dataset on Kaggle, indicating their interest in specific columns and confirming that the DataFrames are not the same size but may share a primary key.
  • One participant suggests creating a dictionary from df1 that maps 'player_api_id' to 'player_name' and using this dictionary for the mapping in df2, assuming no duplicates exist in the Player table.
  • A later reply confirms that the dictionary approach works for adding player names to df2.

Areas of Agreement / Disagreement

Participants express differing views on the initial mapping approach, with some suggesting alternative methods. There is no consensus on the best solution, but the dictionary method is confirmed to work by one participant.

Contextual Notes

Participants note the importance of ensuring that the key columns are correctly aligned and that duplicates in the Player table could affect the mapping process.
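The duplicate concern can be checked before building any lookup. The following is a minimal sketch with invented toy data (the names and IDs are not from the Kaggle dataset), and the "keep the first row" policy is only an assumption for illustration. Note that `dict(zip(...))` silently keeps the last value for a repeated key, so duplicates would not raise an error on their own.

```python
import pandas as pd

# Toy stand-in for the Player table (df1); values are invented.
df1 = pd.DataFrame({
    "player_api_id": [101, 102, 102],
    "player_name": ["Ann", "Bob", "Bobby"],
})

# True if any ID appears more than once in the lookup table.
has_duplicates = df1["player_api_id"].duplicated().any()

# One possible policy (an assumption, not from the thread):
# keep the first row seen for each ID.
deduped = df1.drop_duplicates(subset="player_api_id", keep="first")

print(has_duplicates)
print(deduped)
```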

Who May Find This Useful

Data analysts and programmers working with Pandas DataFrames, especially those dealing with data merging and mapping issues in Python.

Arman777
I have two data frames, df1 and df2.

df1 has two columns, 'player_name' and 'player_api_id'.

Similarly, df2 has a 'player_api_id' column.

Given this setup, I want to copy the 'player_name' column into df2 by matching on 'player_api_id'. To do that, I tried something like this:

Code:
df2['player_name'] = df2['player_api_id'].map(df1['player_name'])

The code runs without error and I get a 'player_name' column in df2, but all the values are NaN. I do not understand why this happens.
 
Two questions. Are the two dataframes the same size and do they share a column that acts like a primary key? If so, then I would use a merge with just the column that you want to add and its key from the other dataframe.
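The merge suggestion might look roughly like the sketch below. The toy frames stand in for the Player (df1) and Player_Attributes (df2) tables; the values and the `overall_rating` column are used here purely for illustration. A left merge does not require the frames to be the same size, only that they share the key column.

```python
import pandas as pd

# Toy stand-ins for the two tables; all values are invented.
df1 = pd.DataFrame({
    "player_api_id": [101, 102, 103],
    "player_name": ["Ann", "Bob", "Cy"],
})
df2 = pd.DataFrame({
    "player_api_id": [102, 102, 101, 999],
    "overall_rating": [70, 72, 65, 50],
})

# Left merge keeps every row of df2 and pulls in only the name column.
# IDs missing from df1 (999 here) end up with NaN as the name.
merged = df2.merge(
    df1[["player_api_id", "player_name"]],
    on="player_api_id",
    how="left",
)
print(merged)
```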
 
You can look at the data here:

https://www.kaggle.com/hugomathien/soccer

I am only interested in the Player and Player_Attributes tables. As you can see, both tables have a column with the same name: player_api_id.

So, as I said before, I want to copy player_name from the Player table to Player_Attributes by using player_api_id.

Borg said:
Are the two dataframes the same size
Nope

Borg said:
do they share a column that acts like a primary key?
I guess so
 
Sorry, I was responding from my phone and didn't read closely enough.
Arman777 said:
df2['player_name'] = df2['player_api_id'].map(df1['player_name'])
If you want to add player names to df2 from df1, you would need to replace the df1['player_name'] part with a dictionary of IDs and player names from df1. Assuming that the Player table has no duplicates, something like this:
Code:
player_name_dictionary = dict(zip(df1.player_api_id, df1.player_name))
df2['player_name'] = df2['player_api_id'].map(player_name_dictionary)
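For anyone wondering why the original attempt gave all NaN: when `Series.map` is given another Series, it looks each value up in that Series' index. `df1['player_name']` carries the default 0, 1, 2, ... index, so the player IDs are looked up as row labels and generally match nothing. A small sketch with invented toy data shows the failure, the dictionary fix, and an equivalent variant that re-indexes the names by ID:

```python
import pandas as pd

# Toy stand-ins for the two tables; all values are invented.
df1 = pd.DataFrame({
    "player_api_id": [101, 102, 103],
    "player_name": ["Ann", "Bob", "Cy"],
})
df2 = pd.DataFrame({"player_api_id": [102, 102, 101]})

# df1["player_name"] is indexed 0, 1, 2, so looking up 101 or 102
# as an index label finds nothing and yields NaN.
by_position = df2["player_api_id"].map(df1["player_name"])

# Dictionary approach from the reply: keys are IDs, values are names.
name_dict = dict(zip(df1["player_api_id"], df1["player_name"]))
by_dict = df2["player_api_id"].map(name_dict)

# Equivalent variant: index the names by ID, then map against that Series.
names_by_id = df1.set_index("player_api_id")["player_name"]
by_id = df2["player_api_id"].map(names_by_id)

print(by_position)
print(by_dict)
print(by_id)
```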
 
Thanks a lot. It works.
 