Sometimes you may need to filter rows in your Pandas Dataframe using column values, pretty much like running a SELECT SQL query against a database table. There are several ways to do this in Python Pandas. In this article, we will learn how to select rows from dataframe using column values in Python Pandas.
How to Select Rows from Dataframe Using Column Values
Here are the different ways to filter rows from dataframe using column values.
1. Using loc() function
Here is the command to select rows with column value equal to scalar ‘value’, use == operator.
df.loc[df['column_name'] == value]
Here is an example to filter rows where column A=’foo’.
import pandas as pd import numpy as np df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(), 'B': 'one one two three two two one three'.split(), 'C': np.arange(8), 'D': np.arange(8) * 2}) print(df) # A B C D # 0 foo one 0 0 # 1 bar one 1 2 # 2 foo two 2 4 # 3 bar three 3 6 # 4 foo two 4 8 # 5 bar two 5 10 # 6 foo one 6 12 # 7 foo three 7 14 print(df.loc[df['A'] == 'foo'])
Here is the output you will see.
A B C D 0 foo one 0 0 2 foo two 2 4 4 foo two 4 8 6 foo one 6 12 7 foo three 7 14
On the other hand, if you want to select rows where column value does not equal a specific value, use the != operator.
df.loc[df['column_name'] != value]
If you want to select rows where column has one of multiple values, use isin() function.
df.loc[df['column_name'].isin(some_values)]
Here is an example to print rows where column B is either ‘one’ or ‘three’.
print(df.loc[df['B'].isin(['one','three'])])
Here is the output you will see.
A B C D 0 foo one 0 0 1 bar one 1 2 3 bar three 3 6 6 foo one 6 12 7 foo three 7 14
On the other hand, if you want to select rows where column value is not one of the list of values, use ~ operator at the beginning of your command.
df.loc[~df['column_name'].isin(some_values)]
Here is an example.
print(~df.loc[df['B'].isin(['one','three'])])
If you want to combine multiple conditions, you can combine them using & and parentheses. Here is an example.
df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
Please note, you need to add parentheses around each condition separately to avoid conflict and ensure correct parsing.
2. Using query() function
You can also use query() function to easily filter rows based on required condition. Here is an example to select rows based on condition A==’foo’.
df.query('A == "foo"') A B C D 0 foo one 0 0 2 foo two 2 4 4 foo two 4 8 6 foo one 6 12 7 foo three 7 14
In this article, we have learnt several ways to easily select rows from python dataframe. You can modify them as per your requirement.
This is very useful to filter rows from your dataframe using one or more conditions, especially for people who are used to working SQL queries in a database.
Also read:
How to Rename Columns in Pandas
How to Create Pandas Dataframe from Dictionary
How to Create Pandas Dataframe from Lists
How to Access Index of Last Element in Pandas Dataframe
MySQL Insert or Update if Exists