
How to Split File in Python

Python is a powerful programming language that makes it easy to work with files and data. Often you may need to split a file in Python by lines, by delimiter (for example, into columns), or by size. In this article, we will learn how to split a file in Python in each of these ways.



Here are the different ways to split a file in Python. Let us say you have a file data.txt that you want to split.

Split File by Lines

In this case, we will split the contents of data.txt by lines. For example, let us say you have the following content in data.txt.

First Line
Second Line

You can easily split a file's contents by lines using the built-in string method splitlines(). Here is the code to do this.

with open("data.txt", "r") as f:
    content = f.read()

# splitlines() returns a list with one item per line, without the line endings
content_list = content.splitlines()
print(content_list)

Here is the output you will see when you run the above code. It is a list in which each element is a line of data.txt.

['First Line', 'Second Line']

Let us look at the above code in detail. First, we open data.txt using the open() function inside a with block and read its entire contents into a string using read(). We then call splitlines() on that string, which returns a list where each line of the file is a list item, without its trailing newline. The with block closes the file automatically when it ends, and finally we print the list using the print() function.
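splitlines() is not the only option, and the alternatives differ in whether they keep line endings and whether they read the whole file at once. Here is a small comparison sketch; it writes its own sample file to a temporary directory (an assumption made only so the example runs anywhere), whereas in practice you would open your own data.txt.

```python
import os
import tempfile

# Create a small sample file so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "data.txt")
with open(path, "w") as f:
    f.write("First Line\nSecond Line\n")

# read() + splitlines(): the line endings are removed.
with open(path) as f:
    without_newlines = f.read().splitlines()

# readlines(): each line keeps its trailing "\n".
with open(path) as f:
    with_newlines = f.readlines()

# Iterating over the file object reads one line at a time. Here the lines
# are collected into a list, but a real program could process each line
# as it is read instead of holding the whole file in memory.
with open(path) as f:
    streamed = [line.rstrip("\n") for line in f]

print(without_newlines)  # ['First Line', 'Second Line']
print(with_newlines)     # ['First Line\n', 'Second Line\n']
print(streamed)          # ['First Line', 'Second Line']
```

For large files, looping over the file object is usually the safer choice, because read() loads everything into memory before splitlines() runs.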


Split File by Delimiter

In this case, we will split a file based on a delimiter, also known as a separator. Typically, we receive text files with tab-delimited data and want to convert them into CSV files or split them into fields. For this purpose, we will use the split() string method, which splits a string on a separator. Let us say you have the following data.txt file with employee information.

Lana Anderson 585-3094-88 Electrician
Elian Johnston 851-5845-87 Interior Designer
Henry Johnston 877-6561-52 Astronomer

Here is simple code to split the above file on whitespace (tabs or spaces).

with open("data.txt",'r') as data_file:
    for line in data_file:
        data = line.split()
        print(data)

Here is the output you will see when you run the above code.

['Lana', 'Anderson', '585-3094-88', 'Electrician']
['Elian', 'Johnston', '851-5845-87', 'Interior', 'Designer']
['Henry', 'Johnston', '877-6561-52', 'Astronomer']

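Let us look at the above code in detail. First, we open the file using the open() function. Then we loop through the lines of the file using a for loop. In each iteration, we call split() with no arguments on the line, which splits the string on any run of whitespace (spaces, tabs, and the trailing newline) and discards empty strings. Note that this also splits multi-word values, so "Interior Designer" becomes two items. Finally, we print the resulting list using the print() function.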
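If a field such as "Interior Designer" contains spaces, you can either split on tabs only or limit the number of splits. The sketch below assumes a hypothetical tab-delimited version of the employee data, then writes the parsed rows out as CSV, which is the conversion mentioned earlier. io.StringIO stands in for real files so the example runs without creating data.csv; with real files you would pass open("data.csv", "w", newline="") to csv.writer instead.

```python
import csv
import io

# Hypothetical tab-delimited version of the employee data above.
tab_data = (
    "Lana\tAnderson\t585-3094-88\tElectrician\n"
    "Elian\tJohnston\t851-5845-87\tInterior Designer\n"
    "Henry\tJohnston\t877-6561-52\tAstronomer\n"
)

# split("\t") splits only on tabs, so "Interior Designer" stays one field.
rows = [line.split("\t") for line in tab_data.splitlines()]

# With space-separated data, maxsplit=3 gives the same result by stopping
# after the third split; the rest of the line becomes the last field.
space_line = "Elian Johnston 851-5845-87 Interior Designer"
print(space_line.split(maxsplit=3))
# ['Elian', 'Johnston', '851-5845-87', 'Interior Designer']

# Write the rows as CSV; io.StringIO stands in for an output file here.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

maxsplit only works when the multi-word field is the last one on the line; if the data really is tab-delimited, splitting on "\t" is the more robust choice.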

Now let us say each line already contains comma-separated values, and you want split() to split each line on commas.

Janet,100,50,69
Thomas,99,76,100
Kate,102,78,65

Here is simple code that uses split() to split such a file.

with open("data.txt",'r') as file:
    for line in file:
        data = line.strip().split(',')
        print(data)

Here is the output you will see.

['Janet', '100', '50', '69']
['Thomas', '99', '76', '100']
['Kate', '102', '78', '65']

In the above code, we open the file using the open() function and loop through its lines. In each iteration, we first call strip() to remove the trailing newline character, and then call split() with a comma (,) as the delimiter, which splits the line into its comma-separated values. Finally, we print the resulting list using the print() function.
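split(",") works for simple data like the example above, but it breaks when a value itself contains a comma inside quotes. Python's standard csv module handles that case. The sketch below reuses the scores data and adds one hypothetical row ("Smith, Jr.") purely to show the difference; io.StringIO stands in for open("data.txt", newline="") so the example runs without a file.

```python
import csv
import io

# The scores data from above, plus one hypothetical row whose first
# field contains a comma inside quotes.
data = (
    "Janet,100,50,69\n"
    "Thomas,99,76,100\n"
    "Kate,102,78,65\n"
    '"Smith, Jr.",90,80,70\n'
)

# A plain split(",") breaks the quoted name into two pieces.
naive = data.splitlines()[3].split(",")
print(naive)  # ['"Smith', ' Jr."', '90', '80', '70']

# csv.reader understands quoting and returns one list per row.
rows = list(csv.reader(io.StringIO(data)))

# Convert the numeric columns to integers, keyed by name.
scores = {row[0]: [int(value) for value in row[1:]] for row in rows}
print(scores)
```

csv.reader also strips the line endings for you, so no separate strip() call is needed.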


Split File by Size

If you want to split a file into chunks of a fixed size, pass a size argument to the read() function so that it reads at most that many characters at a time, and then work with each chunk. Here is an example.

test_file = 'data.txt'


def chunks(file_name, size=10000):
    with open(file_name) as f:
        while content := f.read(size):
            yield content


if __name__ == '__main__':
    split_files = chunks(test_file)
    for chunk in split_files:
        print(len(chunk))

In the above code, we define a chunks() generator function that opens the file and repeatedly reads up to size characters (10,000 by default), yielding each chunk until there is no more data to read. Note that the := (walrus) operator requires Python 3.8 or later. We call this function and store the resulting generator in split_files, then loop through it and print the length of each chunk.
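The chunks() function only reads the data; to actually split data.txt into smaller files, you can write each chunk to its own file. Here is a minimal sketch under a few assumptions: the function name split_file_by_size and the data.txt.part0, data.txt.part1, ... naming scheme are hypothetical, and the file is opened in binary mode so that chunk_size counts bytes rather than characters. The sample file is created in a temporary directory only so the example runs on its own.

```python
import os
import tempfile


def split_file_by_size(file_name, chunk_size=10000):
    """Write consecutive chunk_size-byte pieces of file_name to numbered part files.

    The .partN naming scheme is an illustrative assumption.
    Returns the list of part file paths in order.
    """
    part_paths = []
    # Binary mode makes chunk_size a byte count and avoids decoding issues.
    with open(file_name, "rb") as src:
        index = 0
        # The := operator requires Python 3.8 or later.
        while chunk := src.read(chunk_size):
            part_path = f"{file_name}.part{index}"
            with open(part_path, "wb") as dst:
                dst.write(chunk)
            part_paths.append(part_path)
            index += 1
    return part_paths


# Build a 25,000-byte sample file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "data.txt")
with open(sample_path, "wb") as f:
    f.write(b"x" * 25000)

parts = split_file_by_size(sample_path, chunk_size=10000)
print([os.path.getsize(p) for p in parts])  # [10000, 10000, 5000]
```

Because the pieces are raw bytes, concatenating the part files in order reproduces the original file exactly. A size-based split can cut a line (or a multi-byte character) in half, so use a line-based split if each piece must be readable on its own.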

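To be honest, if you are using Linux, it is often simpler to use the split command. To split a file by size, use the -b option, which takes a size in bytes:

$ split -b 10000 file.txt

To split a file into pieces of 10,000 lines each, use the -l option instead:

$ split -l 10000 file.txt

By default, split writes the pieces to files named xaa, xab, and so on.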
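If you would rather stay in Python, a line-based split similar to split -l can be done with itertools.islice, which takes a fixed number of lines from the file without reading the whole file into memory. This is a minimal sketch, not a replacement for the split command: the function name split_file_by_lines and the .partN output naming are hypothetical, and the sample file is created in a temporary directory only so the example is self-contained.

```python
import itertools
import os
import tempfile


def split_file_by_lines(file_name, lines_per_file=10000):
    """Write every lines_per_file lines of file_name to a numbered part file.

    Roughly mirrors `split -l`; the .partN naming is an illustrative assumption.
    Returns the list of part file paths in order.
    """
    part_paths = []
    with open(file_name) as src:
        for index in itertools.count():
            # islice pulls at most lines_per_file lines from the open file.
            batch = list(itertools.islice(src, lines_per_file))
            if not batch:
                break
            part_path = f"{file_name}.part{index}"
            with open(part_path, "w") as dst:
                dst.writelines(batch)
            part_paths.append(part_path)
    return part_paths


# Build a 25-line sample file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "data.txt")
with open(sample_path, "w") as f:
    f.writelines(f"line {i}\n" for i in range(25))

parts = split_file_by_lines(sample_path, lines_per_file=10)
line_counts = []
for p in parts:
    with open(p) as f:
        line_counts.append(sum(1 for _ in f))
print(line_counts)  # [10, 10, 5]
```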

In this article, we have learned how to split a file in Python in various ways: by lines, by delimiter, and by size. You can use any of the above code as per your requirement.
