How to Merge PDF Files Using Python

Python provides numerous ways to work with files, including PDF files. Sometimes you may need to combine multiple PDF files into a single file. In this article, we will learn how to merge PDF files using Python.


Here are the different ways to merge PDF files using Python. For this purpose, we will use the PyPDF2 library.


1. Install PyPDF2

Open a terminal and run the following command to install PyPDF2.

$ pip install PyPDF2


2. Merge PDF Files

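PyPDF2 provides several ways to merge PDF files. We will look at them one by one. The examples below use the PdfFileMerger class, which is available in PyPDF2 versions before 3.0. From PyPDF2 3.0 onward, the same class is named PdfMerger.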

File Concatenation

Let us say you have the PDF files file1.pdf, file2.pdf, and file3.pdf. In this case, we import PdfFileMerger from PyPDF2 and use append() to add each file to the end of the output.

from PyPDF2 import PdfFileMerger

pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf']

merger = PdfFileMerger()

for pdf in pdfs:
    merger.append(pdf)

merger.write("result.pdf")
merger.close()

In the above code, we combine file1.pdf, file2.pdf, and file3.pdf into result.pdf. We first create a PdfFileMerger object, then loop through the list of filenames and append each file to it. Next, we call write() to save the merged content to a single file, result.pdf. Finally, we call close() to close the input and output files. Please note that if you give only filenames in the pdfs list, Python looks for them relative to the current working directory, which is the directory you run the script from and not necessarily the one where the script is saved. So you may want to use full paths instead of relative paths.

pdfs = ['/home/ubuntu/file1.pdf', '/home/ubuntu/file2.pdf', '/home/ubuntu/file3.pdf']
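Rather than typing full paths by hand, you could build the list with Python's standard pathlib module. The following is only a minimal sketch: the helper name collect_pdfs is hypothetical and is not part of PyPDF2, and it assumes all the files to be merged sit in one folder and end in a lowercase .pdf extension.

```python
from pathlib import Path


def collect_pdfs(folder):
    """Return absolute paths of the .pdf files in folder, sorted by name."""
    # resolve() turns a relative folder into an absolute path
    folder = Path(folder).resolve()
    # sorted() gives a predictable merge order (file1.pdf, file2.pdf, ...)
    return [str(path) for path in sorted(folder.glob("*.pdf"))]
```

You could then write pdfs = collect_pdfs('/home/ubuntu') in place of the hard-coded list. Keep in mind that the sort is alphabetical, so file10.pdf would come before file2.pdf.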

File Merging

You can also use the merge() function to add a PDF file. Unlike append(), it lets you specify an insertion point in the output file. The first argument is the page number after which the insertion takes place.

from PyPDF2 import PdfFileMerger

pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf']

merger = PdfFileMerger()

for pdf in pdfs:
    merger.merge(2, pdf)

merger.write("result.pdf")
merger.close()

In this case, we use the merge() function to insert each PDF after the 2nd page of the output built so far.
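Because every file is inserted at the same position, the final page order may not be what you expect. The sketch below does not use PyPDF2 at all. It models the output as a plain Python list, assuming the position behaves like a zero-based list index into the pages merged so far. The helper insert_at and the three-page documents are made up for illustration.

```python
def insert_at(output_pages, position, new_pages):
    """Return a new page list with new_pages inserted at index position."""
    result = list(output_pages)
    # Slice assignment inserts without replacing anything; for a Python
    # list, a position past the end simply adds the items at the end.
    result[position:position] = new_pages
    return result


# Hypothetical three-page documents, with labels standing in for pages
documents = {
    "file1.pdf": ["1a", "1b", "1c"],
    "file2.pdf": ["2a", "2b", "2c"],
    "file3.pdf": ["3a", "3b", "3c"],
}

output = []
for name, pages in documents.items():
    output = insert_at(output, 2, pages)

print(output)
# ['1a', '1b', '3a', '3b', '3c', '2a', '2b', '2c', '1c']
```

Under this model, the files inserted later end up ahead of the ones inserted earlier, and the last page of file1.pdf is pushed to the end of the result.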

Using Page Ranges

The above examples add each PDF in full. If you want to append only specific pages and not the entire document, you can use the pages keyword argument and pass a tuple of the format (start, stop[, step]) to specify the page range to be appended. Page numbers in the tuple start at 0, and the page at stop is not included.

from PyPDF2 import PdfFileMerger

pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf']

merger = PdfFileMerger()

for pdf in pdfs:
    merger.append(pdf, pages=(0, 3))

merger.write("result.pdf")
merger.close()

In the above code, we append only the first 3 pages of each document to create a single document. Here is another example, where we append alternate pages 1, 3, and 5.

# another example
merger.append(pdf, pages=(0, 6, 2))  # pages 1, 3, 5
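If you are unsure which pages a tuple selects, you can check it with Python's built-in range(), which follows the same zero-based, stop-exclusive convention as the two examples above. This is only an illustrative sketch; selected_pages is a hypothetical helper and not part of PyPDF2.

```python
def selected_pages(page_range):
    """Convert a (start, stop[, step]) tuple into 1-based page numbers."""
    # range() counts from zero and excludes stop, so add 1 to each index
    return [index + 1 for index in range(*page_range)]


print(selected_pages((0, 3)))     # [1, 2, 3]
print(selected_pages((0, 6, 2)))  # [1, 3, 5]
```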

It is important to remember to call the PdfFileMerger object's close() method when you have finished writing the PDF file. This ensures that both input and output files are closed properly.
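One way to make sure close() runs even when a file is missing or unreadable is to wrap the merge in try/finally. The following is a minimal sketch, not a PyPDF2 feature: merge_and_close is a hypothetical helper that accepts any merger object offering the append(), write(), and close() methods used in this article.

```python
def merge_and_close(merger, pdfs, output_path):
    """Append every file in pdfs, write output_path, and always close merger."""
    try:
        for pdf in pdfs:
            merger.append(pdf)
        merger.write(output_path)
    finally:
        # Runs whether or not append() or write() raised an error
        merger.close()
```

With the earlier examples, you would call it as merge_and_close(PdfFileMerger(), pdfs, "result.pdf").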

In this article, we have seen how to merge PDF files using Python. You can customize these examples as per your requirements.
