How to Iterate Through Files in Directory in Python

Often you may need to iterate through the files in a directory from your Python script, website or application. There are several ways to do this in Python. In this article, we will learn how to iterate through files in a directory in Python. You can use these methods to traverse the files in any directory, or to filter specific types of files such as pdf, txt, csv, etc.


How to Iterate Through Files in Directory in Python

Here are three ways to iterate through files in a directory in Python.


1. Using os.listdir()

The os.listdir() function returns a list of the names of all files and directories in a specific directory. Here is a simple code snippet to iterate through files in a directory in Python. Replace /path/to/dir with the path to your directory.

import os

directory = "/path/to/dir/"

for filename in os.listdir(directory):
    # Keep only the file types we care about
    if filename.endswith(".pdf") or filename.endswith(".txt"):
        print(os.path.join(directory, filename))

In the above code, we use the os.listdir() function to get a list of all files and directories in the input directory. We loop through this list and, in each iteration, call the endswith() method to check whether the name ends with .pdf or .txt. You can add more calls to endswith() to check for more file types, or omit the if condition to view all entries. For each match, we call os.path.join() to build the full path and print it.
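
The filtering described above can also be wrapped in a small reusable function. The following is a minimal sketch, not a definitive implementation: the function name list_files_with_extensions is made up for illustration, and the "." used in the demo simply means the current directory. It relies on two facts about the standard library: endswith() accepts a tuple of suffixes, and os.path.isfile() tells regular files apart from directories.

```python
import os


def list_files_with_extensions(directory, extensions):
    """Return sorted full paths of regular files in `directory`
    whose names end with one of `extensions` (hypothetical helper)."""
    matches = []
    for filename in os.listdir(directory):
        full_path = os.path.join(directory, filename)
        # str.endswith() accepts a tuple, so one call covers every extension
        if filename.endswith(tuple(extensions)) and os.path.isfile(full_path):
            matches.append(full_path)
    return sorted(matches)


if __name__ == "__main__":
    # "." is the current directory; replace it with your own path
    for path in list_files_with_extensions(".", (".pdf", ".txt")):
        print(path)
```

Checking os.path.isfile() skips subdirectories whose names happen to end with a matching extension, which the plain endswith() test would otherwise include.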

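In Python 3, os.listdir() returns file names as strings when you pass it a string path, and as bytes when you pass it a bytes path. If you work with bytes paths, you can use the os.fsencode() function to convert the directory path to bytes and the os.fsdecode() function to convert each returned name back to a string.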

import os

directory_path = "/path/to/dir/"
directory = os.fsencode(directory_path)

for file in os.listdir(directory):
    # Names come back as bytes because directory is bytes
    filename = os.fsdecode(file)
    if filename.endswith(".pdf") or filename.endswith(".txt"):
        print(os.path.join(directory_path, filename))
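
To see how string and bytes paths behave with os.listdir(), here is a small sketch that runs against a temporary directory, so it does not depend on any path on your system. The helper name list_names and the file name report.pdf are made up for illustration.

```python
import os
import tempfile


def list_names(directory):
    """Return entry names as str, whether `directory` is str or bytes
    (hypothetical helper)."""
    # os.listdir() returns bytes names when given a bytes path;
    # os.fsdecode() converts bytes to str and leaves str unchanged
    return sorted(os.fsdecode(name) for name in os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file to list
    open(os.path.join(tmp, "report.pdf"), "w").close()
    print(list_names(tmp))               # called with a str path
    print(list_names(os.fsencode(tmp)))  # called with a bytes path
```

Both calls print the same list of names, because the names are decoded to strings before they are returned.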


2. Using pathlib

You can also use the pathlib module and the glob() method of its Path objects to list files in a directory and, with the ** pattern, in all of its subdirectories.

from pathlib import Path

directory_in_str = '/home/ubuntu/data'

# '**/*.pdf' matches .pdf files in the directory and all its subdirectories
pathlist = Path(directory_in_str).glob('**/*.pdf')
for path in pathlist:
    # path is a Path object, not a string
    path_in_str = str(path)
    print(path_in_str)

In the above code, we store the path to the directory as a string and pass it to Path() to get an object that represents the directory. We then call the glob() method on it with a pattern that matches .pdf files. Because the pattern starts with **, it matches files in the directory and in all of its subdirectories; use '*.pdf' instead to match only the files directly inside the directory. The glob() method returns a generator that yields a Path object for each matching file. We loop through these objects and print the file path of each one.
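
If you need several file types at once, you can compare the suffix attribute of each Path object instead of writing one glob pattern per extension. The following is a minimal sketch under a few assumptions: the function name find_files and its recursive flag are made up for illustration, and suffixes are expected in lowercase with a leading dot.

```python
from pathlib import Path


def find_files(directory, suffixes, recursive=False):
    """Return sorted Path objects for files under `directory`
    whose suffix is in `suffixes` (hypothetical helper)."""
    root = Path(directory)
    # rglob("*") descends into subdirectories; glob("*") stays at the top level
    candidates = root.rglob("*") if recursive else root.glob("*")
    # Lowercasing the suffix makes the match case-insensitive (.PDF, .pdf)
    return sorted(
        p for p in candidates if p.is_file() and p.suffix.lower() in suffixes
    )


if __name__ == "__main__":
    # "." is the current directory; replace it with your own path
    for path in find_files(".", {".pdf", ".txt"}):
        print(path)
```

Pass recursive=True to search subdirectories as well, which is equivalent to using a pattern that starts with **.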


3. Using os.walk()

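The os.listdir() function lists only the immediate children of a directory. If you want to list all descendant files, including those in subdirectories, you can use the os.walk() function. In the code below, rootdir holds the path to the top-level directory.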

import os

rootdir = "/path/to/dir"

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        # Build the full path from the current directory and the file name
        filepath = os.path.join(subdir, file)

        if filepath.endswith(".pdf"):
            print(filepath)

In the above code, we use the os.walk() function to traverse the directory tree starting at the given directory. For each directory it visits, os.walk() yields the path to that directory, a list of its subdirectories and a list of its files. We loop through the files and, in each iteration, construct the full file path. We print the file path if it ends with the .pdf extension. You can customize it as per your requirement.
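
One common customization is to skip certain subdirectories during the walk. In its default top-down mode, os.walk() lets you edit the list of subdirectories in place, and it will not descend into the ones you remove. The following is a minimal sketch: the function name iter_files, the skip_dirs parameter and the .git example are made up for illustration.

```python
import os


def iter_files(rootdir, suffix, skip_dirs=()):
    """Yield paths of files under `rootdir` ending with `suffix`,
    skipping directories named in `skip_dirs` (hypothetical helper)."""
    for subdir, dirs, files in os.walk(rootdir):
        # Editing `dirs` in place stops os.walk() from descending into them
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            if name.endswith(suffix):
                yield os.path.join(subdir, name)


if __name__ == "__main__":
    # "." is the current directory; replace it with your own path
    for filepath in iter_files(".", ".pdf", skip_dirs={".git"}):
        print(filepath)
```

Because the function is a generator, it yields each path as soon as it is found instead of building the whole list in memory first.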

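In this article, we have learnt how to list all files in a directory using Python. Use os.listdir() when you only need the immediate contents of a directory, and os.walk() or a pathlib glob pattern starting with ** when you also need the files in its subdirectories.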
