How to Read Large Files in Python

Python is a powerful programming language that offers many modules and functions for working with files. Sometimes you may need to read large files in Python. Conventional functions such as readlines() can be slow and use a lot of memory in this case, because readlines() loads every line of the file into a list at once. These functions are fine for small files. For large files, it is better to use an iterator that goes through the file one line at a time and performs the required operations as it goes. Iterators use little memory and are time-efficient, since they do not need to load the entire file into memory. File objects are themselves iterable, so they can be used this way directly. In this article, we will learn how to read large files in Python.
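The memory difference described above can be observed with the standard tracemalloc module. The following is only a rough sketch: the helper names (peak_memory, count_with_readlines, count_with_iteration) and the generated 100,000-line test file are assumptions made for illustration, and the exact byte counts will vary by system and Python version.

```python
import os
import tempfile
import tracemalloc


def peak_memory(func, path):
    """Run func(path) and return (result, peak bytes allocated by Python)."""
    tracemalloc.start()
    try:
        result = func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def count_with_readlines(path):
    # readlines() builds a list that holds every line of the file at once.
    with open(path) as f:
        return len(f.readlines())


def count_with_iteration(path):
    # Iterating over the file object yields one line at a time.
    count = 0
    with open(path) as f:
        for _ in f:
            count += 1
    return count


# Build a throwaway file of 100,000 lines so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for i in range(100_000):
        tmp.write(f"line number {i}\n")
    demo_path = tmp.name

lines_a, peak_readlines = peak_memory(count_with_readlines, demo_path)
lines_b, peak_iteration = peak_memory(count_with_iteration, demo_path)
os.remove(demo_path)

print("readlines():", lines_a, "lines, peak bytes:", peak_readlines)
print("iteration:  ", lines_b, "lines, peak bytes:", peak_iteration)
```

Both functions should report the same number of lines, while the peak memory for the iteration version should be far smaller, since it never holds more than a small buffer and the current line.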

How to Read Large Files in Python

We will learn a couple of ways to iterate over large files.

1. Using fileinput

In this method, we use Python's fileinput module and call its input() function to read files. Unlike readlines(), this function does not load the entire file into memory and is therefore memory efficient. fileinput.input() accepts a list of filenames; if no filename is provided, it reads the files named in sys.argv[1:], falling back to standard input if there are none. It returns an iterator that yields the lines of the files one by one. Here is sample code for this purpose.

# import modules
import fileinput
import time

# note the time at the start of the program
start = time.time()

# keeps track of the number of lines in the file
count = 0
for line in fileinput.input(['sample.txt']):
    count = count + 1

# note the time at the end of program execution
end = time.time()

# total time taken to read the file
print("Execution time in seconds: ", (end - start))
print("No. of lines read: ", count)

In the above code, we use the fileinput.input() function to read the file sample.txt. We also use the time module to time the task of reading the entire file. First, we record the start time using time.time() and then read the file using fileinput.input(). We use a for loop to go through the lines of the file one by one, counting each line as the iterator yields it. Finally, we call time.time() again to record the end time, and print the elapsed time and the number of lines read.
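Since fileinput.input() accepts a list of filenames, the same loop can walk through several files in sequence. Here is a minimal sketch of how that might look. The file names part1.txt and part2.txt are made up for illustration and are created in a temporary directory so that the example runs on its own.

```python
import fileinput
import os
import shutil
import tempfile

# Create two small throwaway files (hypothetical names) for the demo.
tmp_dir = tempfile.mkdtemp()
paths = []
for name, n_lines in (("part1.txt", 3), ("part2.txt", 2)):
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as f:
        for i in range(n_lines):
            f.write(f"{name} line {i}\n")
    paths.append(path)

lines_per_file = {}
# The with block closes the input stream once the loop is done.
with fileinput.input(files=paths) as stream:
    for line in stream:
        # filename() reports which file the current line came from.
        current = os.path.basename(stream.filename())
        lines_per_file[current] = lines_per_file.get(current, 0) + 1
    # lineno() is cumulative across all files read so far.
    total = stream.lineno()

shutil.rmtree(tmp_dir)

print("Lines per file:", lines_per_file)
print("Total lines read:", total)
```

The files are read one after another as a single stream of lines, so the memory used stays small no matter how many files are listed.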

2. Using open() function

In this method, we use the open() function, which returns a file object. The file object is its own iterator, so we can loop over it directly to get one line at a time. We write the code in a ‘with’ block so that the file is automatically closed after it is read. Here is sample code for this purpose.

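import time

# note the time at the start of the program
start = time.time()

# keeps track of the number of lines in the file
count = 0
with open("sample.txt") as file:
    for line in file:
        count = count + 1

# note the time at the end of program execution
end = time.time()
print("Execution time in seconds: ", (end - start))
print("No. of lines read: ", count)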

In this code, open() returns a file object, which we store in the file variable. Then we run a for loop over it and count the lines of the file one by one. As before, we also use time.time() to time the entire operation. Please note that this method is typically faster than the previous one, since fileinput adds some overhead on top of the underlying file object.
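Because the file object yields its lines lazily, it can be passed straight to other lazy constructs such as generator expressions. The sketch below counts the lines that contain a keyword. The function name count_matching_lines, the keyword ERROR, and the small generated log file are all assumptions made for illustration.

```python
import os
import tempfile


def count_matching_lines(path, keyword):
    """Count the lines containing keyword without loading the whole file."""
    with open(path, encoding="utf-8") as file:
        # The generator expression pulls one line at a time from the file
        # object, so only the current line is held in memory.
        return sum(1 for line in file if keyword in line)


# Build a small throwaway log file so the sketch runs on its own.
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")
    demo_path = tmp.name

error_count = count_matching_lines(demo_path, "ERROR")
os.remove(demo_path)

print("Lines containing ERROR:", error_count)
```

The same pattern works for any per-line operation, as long as the code does not collect all the lines into a list along the way.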

In this article, we have learnt a couple of simple ways to read large files in Python using iterators. You can customize them as per your requirements. The key thing to remember is to use iterators to read large files, since they do not load the entire file into memory and are therefore fast and memory efficient.
