Pandas is a popular Python library that lets you process and organize data easily. You will often need to delete rows from a dataframe based on conditions. In this article, we will learn how to delete rows from a dataframe based on conditions in Python.
How to Delete Rows from Dataframe Based on Condition
Let us say you have the following dataframe in python, which contains the columns Name, Team, Number, Position, Age, Height, Weight, College, Salary.
    # Import pandas as pd
    import pandas as pd

    # Read the csv file and construct the dataframe
    df = pd.read_csv('data.csv')

    # Visualize the dataframe
    print(df.head(15))

    # Print the shape of the dataframe
    print(df.shape)
Let us say you want to keep only the rows where Age >= 25 and delete the rest. Here is the Python code to do this.
    # Keep all rows for which the player's
    # age is greater than or equal to 25
    df_filtered = df[df['Age'] >= 25]

    # Print the new dataframe
    print(df_filtered.head(15))

    # Print the shape of the new dataframe
    print(df_filtered.shape)
In the above code, we use a comparison operator and boolean indexing to select the required rows. The expression returns a new dataframe that contains only the rows where Age >= 25, and stores it in the df_filtered object. It does not alter the original dataframe df. The basic syntax to achieve this is as follows.
    df = df[condition]

For example,

    df = df[df.Age != 0]

or

    df = df[df['Age'] >= 25]
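As a minimal sketch of this pattern, the example below builds a small in-memory dataframe instead of reading data.csv. The names and ages are made-up illustrative values, not rows from the dataset used in this article.

```python
import pandas as pd

# Hypothetical sample data standing in for data.csv
df = pd.DataFrame({
    "Name": ["Avery", "Blake", "Casey", "Devon"],
    "Age": [22, 25, 31, 19],
})

# Boolean mask: True for the rows we want to keep
mask = df["Age"] >= 25

# Indexing with the mask returns a new dataframe; df itself is unchanged
df_filtered = df[mask]

print(df_filtered)
print(df.shape, df_filtered.shape)
```

Assigning the result back to df (df = df[mask]) rebinds the name to the filtered dataframe, which is how the rows get "deleted" in practice.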
Here is another syntax to delete rows based on a condition, using the loc indexer.
Here is an example of the above command.
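The original loc example is not reproduced here, so the following is only a sketch of how a boolean condition might be used with loc. The sample data and the variable names df_nonzero and names_over_20 are hypothetical.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["Avery", "Blake", "Casey", "Devon"],
    "Age": [22, 0, 31, 19],
})

# Keep the rows whose Age is not 0; ":" keeps every column
df_nonzero = df.loc[df["Age"] != 0, :]

# loc can also pick a column while filtering rows
names_over_20 = df.loc[df["Age"] > 20, "Name"]

print(df_nonzero)
print(names_over_20.tolist())
```

One reason to prefer loc over plain brackets is that it takes the row condition and the column selection in a single call.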
You can also delete rows from a dataframe with the drop() function. When it is called with inplace=True, it alters the original dataframe.
    # Import pandas as pd
    import pandas as pd

    # Read the csv file and construct the dataframe
    df = pd.read_csv('data.csv')

    # First remove the rows which
    # do not contain any data
    df = df.dropna(how='all')

    # Drop all rows for which the player's
    # age is less than 25
    df.drop(df[df['Age'] < 25].index, inplace=True)

    # Print the modified dataframe
    print(df.head(15))

    # Print the shape of the dataframe
    print(df.shape)
In this case, we first drop all rows which do not contain any data, using the dropna() function. Next, we use a comparison operator and boolean indexing to select the rows where Age < 25, and take their index labels. We then pass these labels to the drop() function to delete those rows.
In this case, the original dataframe is modified because drop() is called with inplace=True. If you don't want to modify the original dataframe, either omit inplace=True and assign the result of drop() to a new variable, or copy the dataframe to another object with df.copy() before you run the above code.
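The two ways of leaving the original dataframe untouched can be sketched as follows. The sample data and the names to_drop, df_kept and df_copy are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["Avery", "Blake", "Casey", "Devon"],
    "Age": [22, 25, 31, 19],
})

# Index labels of the rows to delete (Age below 25)
to_drop = df[df["Age"] < 25].index

# Option 1: without inplace=True, drop() returns a new dataframe
df_kept = df.drop(to_drop)

# Option 2: make an explicit copy and modify the copy in place
df_copy = df.copy()
df_copy.drop(to_drop, inplace=True)

print(df.shape)       # the original still has all of its rows
print(df_kept.shape)
print(df_copy.shape)
```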
Delete Rows Based on Multiple Conditions
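The above examples show you how to delete rows based on a single condition. If you want to delete rows based on multiple conditions, you can combine them using the bitwise OR (|) and bitwise AND (&) operators. Here is an example syntax for demonstration. We have used 2 bitwise OR operators and 1 bitwise AND operator. Wrap each condition in parentheses, and note that & is evaluated before |, so the expression is read as condition1 | condition2 | (condition3 & condition4).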
df = df[(condition1) | (condition2) | (condition3) & (condition4)]
Here is an example of the above command. We are keeping the rows where age is at least 25 and less than 50.
df = df[(df.Age >= 25) & (df.Age < 50)]
Here also you can use comparison and logical operators, and combine them with each other.
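A short sketch of combining conditions is shown below. The sample data, the Team values and the result names are hypothetical and chosen only for illustration. It also uses the ~ operator, which this article does not cover, to negate a condition.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["Avery", "Blake", "Casey", "Devon", "Ellis"],
    "Age": [22, 25, 31, 55, 49],
    "Team": ["Red", "Blue", "Red", "Blue", "Red"],
})

# AND: keep rows with 25 <= Age < 50
in_range = df[(df["Age"] >= 25) & (df["Age"] < 50)]

# OR: keep rows that are under 25 or on the Blue team
young_or_blue = df[(df["Age"] < 25) | (df["Team"] == "Blue")]

# NOT (~): delete the rows that match a condition
not_red = df[~(df["Team"] == "Red")]

print(in_range["Name"].tolist())
print(young_or_blue["Name"].tolist())
print(not_red["Name"].tolist())
```

The parentheses around each comparison are required, because & and | bind more tightly than comparison operators such as >= and ==.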
In this short article, we have learnt a couple of simple ways to delete rows from a dataframe using conditions. You can use logical as well as comparison operators to specify your conditions. It is important to note that the first method creates a new dataframe with the filtered rows, while the second one (drop() with inplace=True) modifies the existing dataframe. Another thing to keep in mind is that a comparison against a missing value evaluates to False. As a result, df.drop(df[df['Age'] < 25].index) does not delete rows where Age is empty, whereas df[df['Age'] >= 25] leaves them out. To delete empty rows from a dataframe explicitly, use the dropna() function.
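How missing values interact with each method can be checked with a small sketch. The sample data and variable names below are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the last row is completely empty
df = pd.DataFrame({
    "Name": ["Avery", "Blake", "Casey", None],
    "Age": [22, np.nan, 31, np.nan],
})

# NaN >= 25 is False, so rows with a missing Age are left out
kept_by_mask = df[df["Age"] >= 25]

# NaN < 25 is also False, so drop() keeps rows with a missing Age
kept_by_drop = df.drop(df[df["Age"] < 25].index)

# dropna(how="all") removes only rows where every column is missing
no_empty_rows = df.dropna(how="all")

print(kept_by_mask.index.tolist())
print(kept_by_drop.index.tolist())
print(no_empty_rows.index.tolist())
```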