
How to Remove Duplicates in Python Pandas

The Python pandas library lets you import data from various sources, analyze and transform it, and export it in different formats. Imported data often contains duplicate rows, and pandas makes it easy to remove them. In this article, we will learn how to remove duplicates in Python Pandas.


Let us say you have the following pandas dataframe.

    A   B   C
0   joe 0   A
1   joe 1   A
2   joe 1   B
3   jim 1   A

Here is the code to create such a Pandas dataframe.

import pandas as pd
df = pd.DataFrame({"A":["joe", "joe", "joe", "jim"], "B":[0,1,1,1], "C":["A","A","B","A"]})

If you want to remove rows that have the same combination of values in columns A and C, you can use the drop_duplicates() function and pass those columns in the subset argument.

df1=df.drop_duplicates(subset=['A', 'C'], keep=False)

Rows 0 and 1 have the same values in columns A and C, so both are dropped. The new dataframe df1 will be as shown below.

    A   B   C
2   joe 1   B
3   jim 1   A

In the above function, we have used the keep=False argument. The keep argument can take 3 values, and defaults to 'first'.

'first' : Drop duplicates except for the first occurrence.

'last' : Drop duplicates except for the last occurrence.

False : Drop all duplicates, including the first and last occurrences.
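As a minimal sketch, the three keep values can be compared side by side on the sample dataframe from this article. The variable names kept_first, kept_last and kept_none are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["joe", "joe", "joe", "jim"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

# Rows 0 and 1 share the same values in columns A and C.
kept_first = df.drop_duplicates(subset=["A", "C"], keep="first")  # keeps row 0, drops row 1
kept_last = df.drop_duplicates(subset=["A", "C"], keep="last")    # keeps row 1, drops row 0
kept_none = df.drop_duplicates(subset=["A", "C"], keep=False)     # drops both rows 0 and 1

print(list(kept_first.index))  # [0, 2, 3]
print(list(kept_last.index))   # [1, 2, 3]
print(list(kept_none.index))   # [2, 3]
```

Note that the original index labels are preserved in each result, which makes it easy to see which rows survived.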

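Please note, the following commands do not modify df. They return a new dataframe with the duplicates removed, so you need to assign the result to a variable to keep it. Since inplace=False is the default, both commands are equivalent.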

df.drop_duplicates(subset=['A', 'C'],keep=False)
or

df.drop_duplicates(subset=['A', 'C'],keep=False, inplace=False)

If you want to modify the original dataframe itself, use the following code instead. In this case, the function returns None.

df.drop_duplicates(subset=['A', 'C'],keep=False, inplace=True)
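The difference between the two modes can be checked with a short sketch on the same sample dataframe. The names result and returned are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["joe", "joe", "joe", "jim"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

# Default (inplace=False): df is left untouched and a new dataframe is returned.
result = df.drop_duplicates(subset=["A", "C"], keep=False)
print(len(df), len(result))  # 4 2

# inplace=True: df itself is modified and the call returns None.
returned = df.drop_duplicates(subset=["A", "C"], keep=False, inplace=True)
print(returned)  # None
print(len(df))   # 2
```

Because the in-place call returns None, writing df = df.drop_duplicates(..., inplace=True) would replace your dataframe with None, so use either assignment or inplace=True, not both.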

If you omit the subset argument from the above function, drop_duplicates() compares entire rows and drops only those rows where the values in all columns are duplicated.
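The sample dataframe in this article has no rows that are identical in every column, so here is a small sketch with a hypothetical dataframe, df_full, in which row 2 is an exact copy of row 1.

```python
import pandas as pd

# Hypothetical data for illustration: row 2 repeats row 1 in every column.
df_full = pd.DataFrame({
    "A": ["joe", "joe", "joe", "jim"],
    "B": [0, 1, 1, 1],
    "C": ["A", "B", "B", "A"],
})

# No subset: rows are compared on all columns; keep defaults to 'first'.
deduped = df_full.drop_duplicates()
print(list(deduped.index))  # [0, 1, 3]
```

Rows 0 and 1 both have "joe" in column A, but they differ in columns B and C, so they are not treated as duplicates here.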

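If you want to refer to columns by position instead of by name, you can use df.columns[…] to select them by index. The first column has index 0, the next one has index 1 and so on. For example, df.columns[0:2] selects the first two columns. Pass the selection directly as a list of labels, without wrapping it in another list.

df.drop_duplicates(subset=list(df.columns[0:2]), keep=False)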
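Here is a minimal sketch of selecting the subset columns by position on the sample dataframe. It converts the sliced labels to a plain list before passing them to subset; the names first_two and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["joe", "joe", "joe", "jim"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

# Labels of the first two columns, selected by position.
first_two = list(df.columns[0:2])
print(first_two)  # ['A', 'B']

# Rows 1 and 2 share the same values in columns A and B, so both are dropped.
result = df.drop_duplicates(subset=first_two, keep=False)
print(list(result.index))  # [0, 3]
```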

In this article, we have learnt how to drop duplicate rows in a Pandas dataframe using drop_duplicates(), and several ways to use it.

