The pandas library makes it easy to work with data and files in Python. Often you need to group rows by specific columns in your data. In this article, we will learn how to group by multiple columns in pandas.
How to Group by Multiple Columns in Python Pandas
Let us say you have the following data.
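    import pandas as pd

    # Build two rows of data, then transpose so each row becomes a column
    df = pd.DataFrame([['A','C','A','B','C','A','B','B','A','A'],
                       [1,2,1,1,1,2,1,2,1,3]]).T
    # Use a flat list of names; a nested list would create MultiIndex columns
    df.columns = ['col1','col2']
    print(df)  # print the dataframe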
You will see the following output.
      col1 col2
    0    A    1
    1    C    2
    2    A    1
    3    B    1
    4    C    1
    5    A    2
    6    B    1
    7    B    2
    8    A    1
    9    A    3
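Because the list-of-lists construction above mixes strings and integers before transposing, pandas stores both columns with the generic object dtype. As a minimal sketch of an alternative (not the method used in the rest of this article), the same data can be built from a dict of columns so that col2 stays an integer column:

```python
import pandas as pd

# Same sample data as above, built from a dict of columns so that
# col2 is stored as integers instead of generic Python objects.
df = pd.DataFrame({
    'col1': ['A', 'C', 'A', 'B', 'C', 'A', 'B', 'B', 'A', 'A'],
    'col2': [1, 2, 1, 1, 1, 2, 1, 2, 1, 3],
})

print(df.dtypes)  # shows the dtype of each column
```

The groupby examples in this article work the same way with either construction.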
The following code groups the data by both col1 and col2 and counts how many rows fall into each unique combination of the two. To do this, we first add a helper column, say COUNTER, set to 1 in every row. Summing it within each group gives the count for that group.
    df['COUNTER'] = 1  # initially, set the counter to 1 in every row
    group_data = df.groupby(['col1','col2'])['COUNTER'].sum()  # sum function
    print(group_data)
Here is the output you will get.
    col1  col2
    A     1       3
          2       1
          3       1
    B     1       2
          2       1
    C     1       1
          2       1
Alternatively, you can use the size() function to get the same counts without creating the COUNTER column.
df.groupby(['col1', 'col2']).size() #size function
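The result of size() is a Series indexed by the two grouping columns. If a regular table is easier to work with, one option is to call reset_index() on it. The sketch below rebuilds the sample data so it runs on its own; the column name count is an arbitrary choice made here, not something pandas requires.

```python
import pandas as pd

# Sample data from this article
df = pd.DataFrame({
    'col1': ['A', 'C', 'A', 'B', 'C', 'A', 'B', 'B', 'A', 'A'],
    'col2': [1, 2, 1, 1, 1, 2, 1, 2, 1, 3],
})

# size() returns a Series indexed by (col1, col2); reset_index() turns it
# into a regular DataFrame, and name= labels the column of counts.
counts = df.groupby(['col1', 'col2']).size().reset_index(name='count')
print(counts)
```

Each row of counts then holds one (col1, col2) combination and the number of rows that matched it.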
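Similarly, you can use the sum() function to total a column within each group. Applied to the COUNTER column, it gives the same counts as before.
    group_data = df.groupby(['col1', 'col2'])['COUNTER'].sum()  # sum function
    print(group_data)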
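The sum() function is more useful when the column holds real values rather than a constant 1. As a rough illustration, the sketch below adds a hypothetical amount column (it is not part of the sample data in this article) and totals it for each combination of col1 and col2.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'C', 'A', 'B', 'C', 'A', 'B', 'B', 'A', 'A'],
    'col2': [1, 2, 1, 1, 1, 2, 1, 2, 1, 3],
    # 'amount' is a hypothetical numeric column, added only for illustration
    'amount': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# Total 'amount' for every unique (col1, col2) combination
totals = df.groupby(['col1', 'col2'])['amount'].sum()
print(totals)
```

For example, the three rows where col1 is A and col2 is 1 have amounts 10, 30 and 90, so that group's total is 130.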
In this short article, we have learnt how to easily group data by multiple columns in Python pandas. You can modify the code as per your requirement.
Sreeram has more than 10 years of experience in web development, Python, Linux, SQL and database programming.