How to Group by Multiple Columns in Python Pandas

The pandas library makes it easy to work with tabular data and files in Python. Often you need to group rows by the values in more than one column. In this article, we will learn how to group by multiple columns in Python pandas.

Let us say you have the following data.

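import pandas as pd

# Each inner list is one row; transposing (.T) turns the two lists into two columns.
df = pd.DataFrame([['A','C','A','B','C','A','B','B','A','A'], [1,2,1,1,1,2,1,2,1,3]]).T
df.columns = ['col1', 'col2']   # flat list of names; a nested list would create a MultiIndex
print(df)   # print the DataFrame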

You will see the following output.

  col1 col2
0    A    1
1    C    2
2    A    1
3    B    1
4    C    1
5    A    2
6    B    1
7    B    2
8    A    1
9    A    3

Here is a simple way to group by both col1 and col2 and count how many times each unique (col1, col2) combination occurs. One approach is to add a helper column, say COUNTER, set to 1 in every row, and then sum it within each group.

df['COUNTER'] = 1       # helper column: every row counts as 1
group_data = df.groupby(['col1','col2'])['COUNTER'].sum()   # sum the 1s within each (col1, col2) group
print(group_data)

Here is the output you will get.

col1  col2
A     1       3
      2       1
      3       1
B     1       2
      2       1
C     1       1
      2       1
Name: COUNTER, dtype: int64
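The grouped result is a Series whose index has two levels, one for col1 and one for col2. As a minimal sketch of reading values back out of it (the variable names count_a1 and counts_for_a are only illustrative, and the data is rebuilt here so the snippet runs on its own):

```python
import pandas as pd

# Same sample data as above, built directly from a dict of columns.
df = pd.DataFrame({
    "col1": ["A", "C", "A", "B", "C", "A", "B", "B", "A", "A"],
    "col2": [1, 2, 1, 1, 1, 2, 1, 2, 1, 3],
})
df["COUNTER"] = 1
group_data = df.groupby(["col1", "col2"])["COUNTER"].sum()

# Look up one (col1, col2) combination with a tuple.
count_a1 = group_data.loc[("A", 1)]

# Look up every col2 value for a single col1 value.
counts_for_a = group_data.loc["A"]

print(count_a1)
print(counts_for_a)
```

Passing a tuple selects a single group, while passing only the first-level value returns a smaller Series indexed by col2.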

Alternatively, you can use the size() function to get the same counts without creating a COUNTER column.

df.groupby(['col1', 'col2']).size() #size function
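If you would rather have the counts as ordinary columns than as a Series with a two-level index, one option is to chain reset_index() onto size(). The sketch below assumes the same sample data; the column name count is an arbitrary choice for illustration.

```python
import pandas as pd

# Same sample data as above, built directly from a dict of columns.
df = pd.DataFrame({
    "col1": ["A", "C", "A", "B", "C", "A", "B", "B", "A", "A"],
    "col2": [1, 2, 1, 1, 1, 2, 1, 2, 1, 3],
})

# size() counts rows per group; reset_index(name=...) turns the
# group keys back into columns and names the count column.
counts = df.groupby(["col1", "col2"]).size().reset_index(name="count")

print(counts)
```

The result is a flat DataFrame with one row per (col1, col2) combination, which is often easier to merge, filter, or export.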

Similarly, you can use the sum() function to total the values of a column within each group, for example:

df.groupby(['col1', 'col2'])['COUNTER'].sum() #sum function
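Summing becomes more useful when the column holds real values rather than a constant 1. The sketch below adds a hypothetical amount column (not part of the original data, with made-up values) and uses agg() with named aggregations to compute a total and a row count per group in one step.

```python
import pandas as pd

# Sample data plus a hypothetical "amount" column, used only for illustration.
df = pd.DataFrame({
    "col1": ["A", "C", "A", "B", "C", "A", "B", "B", "A", "A"],
    "col2": [1, 2, 1, 1, 1, 2, 1, 2, 1, 3],
    "amount": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# Named aggregation: output_column=(input_column, function).
summary = df.groupby(["col1", "col2"]).agg(
    total=("amount", "sum"),   # sum of amount within each group
    n=("amount", "count"),     # number of non-null amounts within each group
)

print(summary)
```

Each keyword passed to agg() becomes a column in the result, so you can add further statistics such as "mean" or "max" in the same call.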

In this short article, we have learnt how to easily group data by multiple columns in Python pandas. You can modify the code as per your requirement.
