JSON is a popular data format used by many websites & applications. However, sometimes you may need to combine multiple JSON files to CSV in python. This may be required if you wish to offer ‘export as JSON’ feature on your website/application. In this article, we will learn how to combine JSON to CSV in Python.
How to Combine JSON Files to CSV in Python
We will look at different use cases to concatenate JSON Files to CSV in Python. We will be using python pandas library for our purpose, since it makes things easy while working data and files.
1. When All Columns Match
We will use this approach when all JSON files have same keys (columns). In this case, we will load all JSON files with help of Pandas dataframe one by one. Then we will concatenate the dataframes into one dataframe. Finally, we will convert the concatenated dataframe into CSV file.
Let us say you have the following 2 json files which have same columns.
# File1.json { "ID":{ "0":23, "1":43, }, "Name":{ "0":"Ram", "1":"Deep", }, "Marks":{ "0":89, "1":97, }, "Grade":{ "0":"B", "1":"A", } } #File2.json { "ID":{ "0":90, "1":56, }, "Name":{ "0":"Akash", "1":"Chalsea", }, "Marks":{ "0":81, "1":87, }, "Grade":{ "0":"B", "1":"B", } }
Here is the python code to convert JSON to CSV.
# importing packages import pandas as pd # load json file using pandas df1 = pd.read_json('file1.json') # view data print(df1) # load json file using pandas df2 = pd.read_json('file2.json') # view data print(df2) # use pandas.concat method df = pd.concat([df1,df2]) # view the concatenated dataframe print(df) # convert dataframe to csv file df.to_csv("CSV.csv",index=False) # load the resultant csv file result = pd.read_csv("CSV.csv") # and view the data print(result)
If you run the above code, you will get the following output.
ID Name Marks Grade 0 23 Ram 89 B 1 43 Deep 97 A ID Name Marks Grade 0 90 Akash 81 B 1 56 Chalsea 87 B ID Name Marks Grade 0 23 Ram 89 B 1 43 Deep 97 A 0 90 Akash 81 B 1 56 Chalsea 87 B ID Name Marks Grade 0 23 Ram 89 B 1 43 Deep 97 A 2 90 Akash 81 B 3 56 Chalsea 87 B
In the above code, we first print first file, then second file, then concatenated dataframe, and finally the CSV file. Please note, you can use concat() function only if both the JSON files have same keys/columns. Otherwise, you need to use the next approach.
2. When Some Columns Match
If only some of the keys/columns match between the JSON files, we will merge these files to create a single dataframe which contains of a union of keys in both JSON files.
Let us say you have two JSON files file1.json containing keys id, name, marks and file2.json with keys id, name, grade.
# File1.json { "ID":{ "0":23, "1":43, "2":12, "3":13, "4":67, "5":89 }, "Name":{ "0":"Ram", "1":"Deep", "2":"Yash", "3":"Aman", "4":"Arjun", "5":"Aditya" }, "Marks":{ "0":89, "1":97, "2":45, "3":78, "4":56, "5":76 } } #File2.json { "ID":{ "0":23, "1":43, "2":12, "3":67, "4":89 }, "Name":{ "0":"Ram", "1":"Deep", "2":"Yash", "3":"Arjun", "4":"Aditya" }, "Grade":{ "0":"B", "1":"A", "2":"F", "3":"E", "4":"C" } }
Here is the code to combine the two JSON files in python. Python pandas allows you to merge dataframes using inner/outer/left/right joins. Finally, we will write the combine dataframe into CSV file.
# importing packages import pandas as pd # load json file using pandas df1 = pd.read_json('file1.json') # view data print(df1) # load json file using pandas df2 = pd.read_json('file2.json') # view data print(df2) # use pandas.merge method df_inner = pd.merge(df1, df2, how='inner', left_on=[ 'ID', 'Name'], right_on=['ID', 'Name']) df_outer = pd.merge(df1, df2, how='outer', left_on=[ 'ID', 'Name'], right_on=['ID', 'Name']) df_left = pd.merge(df1, df2, how='left', left_on=[ 'ID', 'Name'], right_on=['ID', 'Name']) df_right = pd.merge(df1, df2, how='right', left_on=[ 'ID', 'Name'], right_on=['ID', 'Name']) # convert dataframe to csv file df_inner.to_csv("CSV_inner.csv", index=False) df_outer.to_csv("CSV_outer.csv", index=False) df_left.to_csv("CSV_left.csv", index=False) df_right.to_csv("CSV_right.csv", index=False) # load the resultant csv file result_inner = pd.read_csv("CSV_inner.csv") result_outer = pd.read_csv("CSV_outer.csv") result_left = pd.read_csv("CSV_left.csv") result_right = pd.read_csv("CSV_right.csv") # and view the data print(result_outer) print(result_inner) print(result_left) print(result_right)
Here is the output you will see.
ID Name Marks 0 23 Ram 89 1 43 Deep 97 2 12 Yash 45 3 13 Aman 78 4 67 Arjun 56 5 89 Aditya 76 ID Name Grade 0 23 Ram B 1 43 Deep A 2 12 Yash F 3 67 Arjun E 4 89 Aditya C ID Name Marks Grade 0 23 Ram 89 B 1 43 Deep 97 A 2 12 Yash 45 F 3 13 Aman 78 NaN 4 67 Arjun 56 E 5 89 Aditya 76 C ID Name Marks Grade 0 23 Ram 89 B 1 43 Deep 97 A 2 12 Yash 45 F 3 67 Arjun 56 E 4 89 Aditya 76 C ID Name Marks Grade 0 23 Ram 89 B 1 43 Deep 97 A 2 12 Yash 45 F 3 13 Aman 78 NaN 4 67 Arjun 56 E 5 89 Aditya 76 C ID Name Marks Grade 0 23 Ram 89 B 1 43 Deep 97 A 2 12 Yash 45 F 3 67 Arjun 56 E 4 89 Aditya 76 C
In the above code, we have first displayed contents of file1.json and file2.json when loaded into dataframes. Then we have displayed the outputs of inner, outer, left and right joins while merging the two dataframes.
In this article, we have learnt a couple of simple ways to combine JSON files into CSV using python pandas library. You can also do the same thing by simply looping through each file and writing their content in a comma separated manner to CSV file. It is just that python pandas library contains in-built functions to merge data sets easily and efficiently. The main advantage of using python pandas over loops is that it will easily merge JSON data whether they have common keys or not. It has different kinds of options to combine datasets using merge function.
Also read:
How to Convert JSON to CSV in Python
How to Find All Sudo Users in Linux
How to Check if Program Exists in Shell Script
How to Insert Text At Beginning of File in Linux
How to Get My Public SSH Keys