Python allows you to easily work with files, do data analysis and manipulation. Many times you may need to sort contents of CSV file in Python, since they are commonly used for data transfer in organizations. We can sort CSV file in python using the powerful pandas library, or the usual csv library. In this article, we will look at how to sort CSV file in python.
How to Sort CSV File in Python
Let us say you have the following CSV file /home/ubuntu/data.csv
ID, Name, Age 3, John, 43 1, Jim,56 2, Jane,28
Now we will look at different ways to sort this file.
1. Using sort_values()
The pandas library provides sort_values() function to easily sort CSV files. Here is the code to sort our CSV file
# importing pandas package import pandas as pd # assign dataset data = pd.read_csv("/home/ubuntu/data.csv") # sort data frame data.sort_values(["Age"], axis=0, ascending=[False], inplace=True) # displaying sorted data frame print("\nAfter sorting:") print(data)
In the above code, we import pandas package and use its read_csv function to read the csv file into a dataset. Then we use sort_values function to sort it based on “Age” column, in descending order. Finally, we display the result
After sorting: ID, Name, Age 1, Jim,56 3, John, 43 2, Jane,28
Let us look at sort_values function in more detail. Its 1st argument is the column name. Next argument is the axis field – 0 means row sorting and 1 means column sorting. Next, is whether you want to sort in ascending or descending manner. If it is true, then values will be sorted in ascending order. If it is false, then they will be sorted in descending order.
If you want to sort based on multiple columns, you can mention then as a list in 1st argument. In fact, you can even specify the sort order of each column as a list. Here is an example to sort the data by Name, Age. We will sort Name in Ascending order while Age in descending order.
data.sort_values(["Name","Age"], axis=0, ascending=[True,False], inplace=True)
Here is the full code.
# importing pandas package import pandas as pd # assign dataset data = pd.read_csv("/home/ubuntu/data.csv") # sort data frame data.sort_values(["Name","Age"], axis=0, ascending=[True,False], inplace=True) # displaying sorted data frame print("\nAfter sorting:") print(data)
Here is the output of above code.
ID, Name, Age 2, Jane,28 1, Jim,56 3, John, 43
In the above cases, we have assumed that our data has 1st row as header. If your data does not have headers then you may refer your columns using index – 0 for 1st column, 1 for second column and so on. Here is the command to sort the data according to column 1 and column 3.
data.sort_values([data.columns,data.columns], axis=0, ascending=[True,False], inplace=True)
2. Using sorted()
You may also use sorted() in csv library but it allows you to sort only one column at a time.
# import modules import csv ,operator # load csv file data = csv.reader(open('/home/ubuntu/data.csv'),delimiter=',') # sort data on the basis of age data = sorted(data, key=operator.itemgetter(2)) # displaying sorted data print('After sorting:') display(data)
In the above code, we first import required modules. Then we load CSV file using csv.reader() function. Then we sort the loaded data based on column with index 2, referenced using operator.itemgetter(2). It will sort data by ‘Age’ column in ascending order.
If you want to sort data in descending order, just add reverse=True in sorted function as shown below.
data = sorted(data, key=operator.itemgetter(2), reverse=True)
In this article, we have looked at how to sort CSV file using sort_values in pandas library and sorted() function in csv package. The recommended way to sort CSV data is to use sort_values function in pandas package.