merge dataframe in python pandas

How to Merge Dataframes in Pandas Based on Columns

Python Pandas is a powerful library that allows you to work with data organized in tabular manner as rows & columns. Often you may need to merge dataframes in Pandas based on columns. In this article, we will learn how to merge dataframes in Pandas based on columns. You can easily merge pandas dataframe using the merge function.

DataFrame.merge(right, how=’inner’, on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, copy=True, indicator=False, validate=None)

However, the above function is quite versatile and can be used to merge dataframes in multiple ways.


How to Merge Dataframes in Pandas Based on Columns

We will look at several examples to merge dataframes in Python Pandas.

1. Perform Inner Join or Equi-join

Here is the most basic example to merge dataframes in Python pandas. Let us say you have the following 2 data frames.

# importing modules
import pandas as pd

# creating a dataframe
df1 = pd.DataFrame({'Name':['Jay', 'John', 'Jim', 'Tim', 'Sam'],
					'Marks':[80, 90, 75, 88, 59]})

# creating another dataframe with different data
df2 = pd.DataFrame({'Name':['Jay', 'John', 'Jim', 'Tim'],
					'Grade':['A', 'A', 'B', 'A'],
					'Rank':[3, 1, 4, 2 ],
					'Gender':['Male', 'Male', 'Male', 'Female']})
# display df1
display(df1)

# display df2
display(df2)

Here is the command to merge these two dataframes.

# applying merge
df1.merge(df2[['Name', 'Grade', 'Rank']])

In the above command, we will retain all columns of df1 and only some columns (Name, Grade, Rank) of df2. That is, the result will contain Name, Marks, Grade, Rank. Since both dataframes have different number of rows, the result will contain only the common rows.

This is like an inner join or equijoin in databases.

2. Left Join

In this case, we will do a left join using merge function. That is, we will retain all rows of left dataframe df1 and only matching rows of right dataframe df2. You can easily do this using the following command.

# applying merge with more parameters
df1.merge(df2[['Grade', 'Name']], on = 'Name', how = 'left')

In the above command, we are merging df1 with df2 on column ‘Name’ and merge type specified by how=’Left’. So all rows of df1 will be present in result while only matching rows of df2 will be present. The result will contain Name, Marks & Grade columns.

3. Reverse Merge

In this case, we merge df1 with df2 instead of the other way around. Here we merge df2 with df1 with common values based on key columns.

# applying merge with more parameters
df2.merge(df1[['Marks', 'Name']])

The result will contain Name, Grade, Rank, Gender, Marks columns.

In this article, we have learnt how to merge two dataframes in python pandas. The merge function in python pandas is very useful to combine dataframes according to different conditions. If you have worked with SQL databases, then you will realize that it allows you to perform inner joins/equijoins, left as well as right merges.

Also read:

How to Change Default Display Manager in Ubuntu
How to Fix “Please Install All Available Updates”
How to Change Login Screen Background in Ubuntu
Most Common Fdisk Commands in Linux
How to Convert Home Directory into Partition in Linux

Leave a Reply

Your email address will not be published.