select rows in pandas

How to Select Rows from Dataframe Using Column Values

Sometimes you may need to filter rows in your Pandas Dataframe using column values, pretty much like running a SELECT SQL query against a database table. There are several ways to do this in Python Pandas. In this article, we will learn how to select rows from dataframe using column values in Python Pandas.


How to Select Rows from Dataframe Using Column Values

Here are the different ways to filter rows from dataframe using column values.


1. Using loc() function

Here is the command to select rows with column value equal to scalar ‘value’, use == operator.

df.loc[df['column_name'] == value]

Here is an example to filter rows where column A=’foo’.

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14

print(df.loc[df['A'] == 'foo'])

Here is the output you will see.

     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14

On the other hand, if you want to select rows where column value does not equal a specific value, use the != operator.

df.loc[df['column_name'] != value]

If you want to select rows where column has one of multiple values, use isin() function.

df.loc[df['column_name'].isin(some_values)]

Here is an example to print rows where column B is either ‘one’ or ‘three’.

print(df.loc[df['B'].isin(['one','three'])])

Here is the output you will see.

     A      B  C   D
0  foo    one  0   0
1  bar    one  1   2
3  bar  three  3   6
6  foo    one  6  12
7  foo  three  7  14

On the other hand, if you want to select rows where column value is not one of the list of values, use ~ operator at the beginning of your command.

df.loc[~df['column_name'].isin(some_values)]

Here is an example.

print(~df.loc[df['B'].isin(['one','three'])])

If you want to combine multiple conditions, you can combine them using & and parentheses. Here is an example.

df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]

Please note, you need to add parentheses around each condition separately to avoid conflict and ensure correct parsing.


2. Using query() function

You can also use query() function to easily filter rows based on required condition. Here is an example to select rows based on condition A==’foo’.

df.query('A == "foo"')

     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14

In this article, we have learnt several ways to easily select rows from python dataframe. You can modify them as per your requirement.

This is very useful to filter rows from your dataframe using one or more conditions, especially for people who are used to working SQL queries in a database.

Also read:

How to Rename Columns in Pandas
How to Create Pandas Dataframe from Dictionary
How to Create Pandas Dataframe from Lists
How to Access Index of Last Element in Pandas Dataframe
MySQL Insert or Update if Exists

Leave a Reply

Your email address will not be published.