combine json to csv python

How to Combine JSON Files to CSV in Python

JSON is a popular data format used by many websites & applications. However, sometimes you may need to combine multiple JSON files to CSV in python. This may be required if you wish to offer ‘export as JSON’ feature on your website/application. In this article, we will learn how to combine JSON to CSV in Python.


How to Combine JSON Files to CSV in Python

We will look at different use cases to concatenate JSON Files to CSV in Python. We will be using python pandas library for our purpose, since it makes things easy while working data and files.


1. When All Columns Match

We will use this approach when all JSON files have same keys (columns). In this case, we will load all JSON files with help of Pandas dataframe one by one. Then we will concatenate the dataframes into one dataframe. Finally, we will convert the concatenated dataframe into CSV file.

Let us say you have the following 2 json files which have same columns.

# File1.json
{
    "ID":{
        "0":23,
        "1":43,
        },
    "Name":{
        "0":"Ram",
        "1":"Deep",
        },
    "Marks":{
        "0":89,
        "1":97,
        },
    "Grade":{
        "0":"B",
        "1":"A",
        }
}

#File2.json

{
    "ID":{
        "0":90,
        "1":56,
         },
    "Name":{
        "0":"Akash",
        "1":"Chalsea",
        },
    "Marks":{
        "0":81,
        "1":87,
        },
    "Grade":{
        "0":"B",
        "1":"B",
        }
}

Here is the python code to convert JSON to CSV.

# importing packages
import pandas as pd

# load json file using pandas
df1 = pd.read_json('file1.json')

# view data
print(df1)

# load json file using pandas
df2 = pd.read_json('file2.json')

# view data
print(df2)

# use pandas.concat method
df = pd.concat([df1,df2])

# view the concatenated dataframe
print(df)

# convert dataframe to csv file
df.to_csv("CSV.csv",index=False)

# load the resultant csv file
result = pd.read_csv("CSV.csv")

# and view the data
print(result)

If you run the above code, you will get the following output.

   ID    Name  Marks Grade
0  23     Ram     89     B
1  43    Deep     97     A

   ID     Name  Marks Grade
0  90    Akash     81     B
1  56  Chalsea     87     B

   ID     Name  Marks Grade
0  23      Ram     89     B
1  43     Deep     97     A
0  90    Akash     81     B
1  56  Chalsea     87     B

    ID     Name  Marks Grade
0   23      Ram     89     B
1   43     Deep     97     A
2   90    Akash     81     B
3   56  Chalsea     87     B

In the above code, we first print first file, then second file, then concatenated dataframe, and finally the CSV file. Please note, you can use concat() function only if both the JSON files have same keys/columns. Otherwise, you need to use the next approach.


2. When Some Columns Match

If only some of the keys/columns match between the JSON files, we will merge these files to create a single dataframe which contains of a union of keys in both JSON files.

Let us say you have two JSON files file1.json containing keys id, name, marks and file2.json with keys id, name, grade.

# File1.json
{
    "ID":{
        "0":23,
        "1":43,
        "2":12,
        "3":13,
        "4":67,
        "5":89
        },
    "Name":{
        "0":"Ram",
        "1":"Deep",
        "2":"Yash",
        "3":"Aman",
        "4":"Arjun",
        "5":"Aditya"
        },
    "Marks":{
        "0":89,
        "1":97,
        "2":45,
        "3":78,
        "4":56,
        "5":76
        }
}

#File2.json
{
    "ID":{
        "0":23,
        "1":43,
        "2":12,
        "3":67,
        "4":89
        },
    "Name":{
        "0":"Ram",
        "1":"Deep",
        "2":"Yash",
        "3":"Arjun",
        "4":"Aditya"
        },
    "Grade":{
        "0":"B",
        "1":"A",
        "2":"F",
        "3":"E",
        "4":"C"
        }
}

Here is the code to combine the two JSON files in python. Python pandas allows you to merge dataframes using inner/outer/left/right joins. Finally, we will write the combine dataframe into CSV file.

# importing packages
import pandas as pd

# load json file using pandas
df1 = pd.read_json('file1.json')

# view data
print(df1)

# load json file using pandas
df2 = pd.read_json('file2.json')

# view data
print(df2)

# use pandas.merge method
df_inner = pd.merge(df1, df2, how='inner', left_on=[
					'ID', 'Name'], right_on=['ID', 'Name'])
df_outer = pd.merge(df1, df2, how='outer', left_on=[
					'ID', 'Name'], right_on=['ID', 'Name'])
df_left = pd.merge(df1, df2, how='left', left_on=[
				'ID', 'Name'], right_on=['ID', 'Name'])
df_right = pd.merge(df1, df2, how='right', left_on=[
					'ID', 'Name'], right_on=['ID', 'Name'])

# convert dataframe to csv file
df_inner.to_csv("CSV_inner.csv", index=False)
df_outer.to_csv("CSV_outer.csv", index=False)
df_left.to_csv("CSV_left.csv", index=False)
df_right.to_csv("CSV_right.csv", index=False)

# load the resultant csv file
result_inner = pd.read_csv("CSV_inner.csv")
result_outer = pd.read_csv("CSV_outer.csv")
result_left = pd.read_csv("CSV_left.csv")
result_right = pd.read_csv("CSV_right.csv")

# and view the data
print(result_outer)
print(result_inner)
print(result_left)
print(result_right)

Here is the output you will see.

ID    Name  Marks
0  23     Ram     89
1  43    Deep     97
2  12    Yash     45
3  13    Aman     78
4  67   Arjun     56
5  89  Aditya     76

   ID    Name Grade
0  23     Ram     B
1  43    Deep     A
2  12    Yash     F
3  67   Arjun     E
4  89  Aditya     C

   ID    Name  Marks Grade
0  23     Ram     89     B
1  43    Deep     97     A
2  12    Yash     45     F
3  13    Aman     78   NaN
4  67   Arjun     56     E
5  89  Aditya     76     C

   ID    Name  Marks Grade
0  23     Ram     89     B
1  43    Deep     97     A
2  12    Yash     45     F
3  67   Arjun     56     E
4  89  Aditya     76     C

   ID    Name  Marks Grade
0  23     Ram     89     B
1  43    Deep     97     A
2  12    Yash     45     F
3  13    Aman     78   NaN
4  67   Arjun     56     E
5  89  Aditya     76     C

   ID    Name  Marks Grade
0  23     Ram     89     B
1  43    Deep     97     A
2  12    Yash     45     F
3  67   Arjun     56     E
4  89  Aditya     76     C

In the above code, we have first displayed contents of file1.json and file2.json when loaded into dataframes. Then we have displayed the outputs of inner, outer, left and right joins while merging the two dataframes.

In this article, we have learnt a couple of simple ways to combine JSON files into CSV using python pandas library. You can also do the same thing by simply looping through each file and writing their content in a comma separated manner to CSV file. It is just that python pandas library contains in-built functions to merge data sets easily and efficiently. The main advantage of using python pandas over loops is that it will easily merge JSON data whether they have common keys or not. It has different kinds of options to combine datasets using merge function.

Also read:

How to Convert JSON to CSV in Python
How to Find All Sudo Users in Linux
How to Check if Program Exists in Shell Script
How to Insert Text At Beginning of File in Linux
How to Get My Public SSH Keys

Leave a Reply

Your email address will not be published. Required fields are marked *