How to Convert PDF to CSV in Python

Data often arrives as a PDF file, but software such as Excel imports CSV files far more readily. In such cases, you need to convert the PDF to CSV. Several Python packages can do this. In this article, we will learn how to convert PDF to CSV in Python using the tabula-py module.

Here are the steps to convert PDF to CSV in Python.

1. Install Java

tabula-py is a wrapper around the Java library tabula-java, so Java must be installed on your system. If it is not, download and install Java by following the instructions for your operating system.
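Before moving on, it can help to confirm that Java is actually reachable from Python. The following is a minimal sketch, not part of tabula-py itself; the helper name java_available is hypothetical and it assumes the executable is called java and is on your PATH.

```python
import shutil
import subprocess


def java_available() -> bool:
    """Return True if a `java` executable is on PATH and runs successfully."""
    java = shutil.which("java")
    if java is None:
        return False
    try:
        # `java -version` exits with status 0 on a working installation.
        result = subprocess.run(
            [java, "-version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if java_available():
        print("Java found")
    else:
        print("Java not found; install it before using tabula-py")
```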

2. Install tabula-py

Run the following command to install tabula-py.

$ pip install tabula-py

3. Read PDF File

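Next, read the file using the read_pdf() function. By default it returns a list of pandas DataFrames, one for each table it detects on the requested pages. Replace pdf_file_location with the location of your PDF file and number with the page(s) to read.

tabula.read_pdf("pdf_file_location", pages=number)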
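As an illustration of this step, here is a small sketch that wraps read_pdf() in a helper. The name read_tables is hypothetical, and the helper assumes the PDF is a local file. It checks the path first so that a wrong location fails with a clear error before tabula-py is loaded.

```python
from pathlib import Path


def read_tables(pdf_path, pages="all"):
    """Return the list of tables (DataFrames) tabula-py finds in a local PDF."""
    path = Path(pdf_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such PDF file: {path}")

    # Imported here so that a bad path fails fast, before tabula-py is loaded.
    import tabula

    # pages accepts an int, a list of ints, a string such as "1-3", or "all".
    return tabula.read_pdf(str(path), pages=pages)
```

For example, tables = read_tables("/home/ubuntu/test.pdf", pages="1-2") would give you a list that you can loop over, checking table.shape for each entry to see how many rows and columns were extracted.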

4. Generate CSV File

Once you have a DataFrame, you can export it to a CSV file using the to_csv() function.

df.to_csv('CSV File Path')
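One detail worth knowing: by default, to_csv() also writes the DataFrame's row index as an unnamed first column. The sketch below uses a small made-up DataFrame as a stand-in for an extracted table to show the difference that index=False makes.

```python
import io

import pandas as pd

# Stand-in for a table extracted from a PDF (hypothetical data).
df = pd.DataFrame({"item": ["pen", "book"], "qty": [3, 1]})

with_index = io.StringIO()
df.to_csv(with_index)  # default: row index written as an unnamed first column

without_index = io.StringIO()
df.to_csv(without_index, index=False)  # only the table's own columns

print(with_index.getvalue().splitlines()[0])     # ,item,qty
print(without_index.getvalue().splitlines()[0])  # item,qty
```

If the CSV is meant for Excel or another tool, index=False usually gives the cleaner file.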

Here is a code snippet that puts together the above functions. Replace the file paths to PDF and CSV files as per your requirement.

# Import the required module
import tabula
# Read the PDF file; read_pdf() returns a list of DataFrames,
# so [0] selects the first table found
df = tabula.read_pdf("/home/ubuntu/test.pdf", pages='all')[0]
# Write the table to a CSV file
df.to_csv('/home/ubuntu/test.csv', encoding='utf-8')
print(df)
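The snippet above keeps only the first table. If your PDF contains several tables, one possible approach is to write each one to its own CSV file. The helper below is a sketch under that assumption; the name write_tables and the <stem>_<n>.csv naming scheme are illustrative choices, not part of tabula-py.

```python
from pathlib import Path


def write_tables(tables, out_dir, stem="table"):
    """Write each table to <out_dir>/<stem>_<n>.csv and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    paths = []
    for n, table in enumerate(tables, start=1):
        csv_path = out / f"{stem}_{n}.csv"
        # index=False leaves out the DataFrame's row index column.
        table.to_csv(csv_path, index=False, encoding="utf-8")
        paths.append(csv_path)
    return paths
```

You could call it as write_tables(tabula.read_pdf("/home/ubuntu/test.pdf", pages='all'), "/home/ubuntu/csv", stem="test"), which would produce test_1.csv, test_2.csv and so on, one per detected table.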

In this article, we have learnt how to convert PDF to CSV using Python. You can use this code in your application or script as per your requirement.

The key is to properly import your PDF data into a pandas DataFrame using the tabula-py package. Once you have the DataFrame ready, you can easily export it to CSV using the to_csv() function.

Sreeram Sreenivasan

Sreeram has more than 10 years of experience in web development, Python, Linux, SQL and database programming.