How to Recursively Download Files & Folders in Wget

Wget is a popular command-line tool for downloading files & folders in Linux. Typically, people use wget to download a single file from a given URL, but sometimes you need to download every file under a folder location. In this article, we will learn how to recursively download files & folders in wget. You can use these commands on almost every Linux distribution.


Wget offers several useful options for downloading files & folders recursively. We will look at different use cases for downloading content recursively with the wget command.


Download Files & Folders Recursively

To download the files in a folder recursively, use the -r or --recursive option. Here is an example that downloads all files & folders recursively from www.example.com/products/:

$ wget -r http://www.example.com/products/
OR
$ wget --recursive http://www.example.com/products/


Exclude Parent Directory

Note that when you use the -r or --recursive option, wget may also download the parent folder and its files if it finds a link to them from your folder, from the files in it, or from the site's directory index.

To avoid this, use the -np or --no-parent option as shown below.

$ wget -np -r http://www.example.com/products/
OR
$ wget --no-parent --recursive http://www.example.com/products/
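
To make the boundary concrete, here is a rough sketch of the idea behind --no-parent: a link is followed only if it sits under the starting folder. This is a simplified illustration using plain prefix matching, not wget's actual implementation. The in_scope helper and the two sample links are hypothetical.

```shell
# Simplified model of the --no-parent boundary (not wget's real logic):
# a link is kept only if it starts with the starting folder URL.
start="http://www.example.com/products/"

# in_scope is a hypothetical helper used only for this illustration.
in_scope() {
    case "$1" in
        "$start"*) echo "keep: $1" ;;   # under the starting folder
        *)         echo "skip: $1" ;;   # parent or sibling location
    esac
}

# Hypothetical links that a crawl might discover.
in_scope "http://www.example.com/products/category/item.html"
in_scope "http://www.example.com/about.html"
```

The first link is reported as kept because it lies below /products/, while the second is skipped because it lives in the parent folder.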


Exclude Specific Files & Folders

With the above commands, wget downloads only the files & folders under www.example.com/products/.

When you use the -r or --recursive option, wget downloads all files & folders recursively, without any filters. If you don't want to download specific files, exclude them using the -R or --reject option, followed by a comma-separated list of file names, suffixes, or patterns. Note that -R matches file names, not folders. To skip entire folders, use the -X or --exclude-directories option instead.

Here is an example that downloads all files & folders in www.example.com/products/ except data.pdf:

$ wget -np -r -R "data.pdf" http://www.example.com/products/
OR
$ wget --no-parent --recursive --reject "data.pdf" http://www.example.com/products/
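
The same idea extends to several file types and to whole folders. The sketch below wraps the command in a small shell function so the filters are easy to review in one place. It assumes GNU wget, where --reject takes a comma-separated list of patterns and --exclude-directories takes a list of folders to skip. The fetch_filtered name, the two patterns, and the /products/archive path are hypothetical examples, so adjust them to your site.

```shell
# Sketch: skip several file types and one folder in a single run.
# fetch_filtered, the patterns, and /products/archive are hypothetical.
fetch_filtered() {
    wget --no-parent --recursive \
         --reject "*.pdf,*.zip" \
         --exclude-directories "/products/archive" \
         "$1"
}

# Usage (uncomment to run against a real site):
# fetch_filtered "http://www.example.com/products/"
```

Quoting the pattern list keeps the shell from expanding *.pdf against files in your current directory before wget sees it.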


Download Specific Subfolder Level

By default, wget downloads all files & subfolders under your folder URL. If you want to download only the contents of the target folder and not its subfolders, limit the recursion depth to one level with the -l1 option (the short form of --level=1).

$ wget -np -r -l1 http://www.example.com/products/
OR
$ wget --no-parent --recursive --level=1 http://www.example.com/products/

If you want to download the folder and its first-level subfolders (e.g. www.example.com/products/category), use the -l2 option.

$ wget -np -r -l2 http://www.example.com/products/
OR
$ wget --no-parent --recursive --level=2 http://www.example.com/products/

By default, wget recurses up to 5 levels deep, if that many levels are available.
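
If you switch between depths often, the depth can be passed in as a parameter. The following is a minimal sketch, assuming GNU wget's --level option. The fetch_depth helper name is hypothetical and not part of wget.

```shell
# Sketch: choose the recursion depth at call time.
# fetch_depth is a hypothetical helper name, not part of wget.
fetch_depth() {
    depth="$1"   # 1 = target folder only, 2 = plus first-level subfolders
    url="$2"
    wget --no-parent --recursive --level="$depth" "$url"
}

# Usage (uncomment to run against a real site):
# fetch_depth 2 "http://www.example.com/products/"
```

Starting with a small depth and raising it only when needed is a simple way to avoid downloading far more than you intended.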


Force Recursive Download

Please note that during a recursive download, wget honors the robots.txt file of the target website and skips any paths it disallows. This is because a recursive download is a form of crawling, which robots.txt is meant to control. If you still want to download recursively, irrespective of what the site's robots.txt file says, use the -e robots=off option.

$ wget -np -r -e robots=off http://www.example.com/products/
OR
$ wget --no-parent --recursive -e robots=off http://www.example.com/products/
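
Since ignoring robots.txt removes a safeguard the site owner put in place, it may be worth throttling the crawl at the same time. The sketch below is one possible combination, assuming GNU wget's --wait and --limit-rate options. The fetch_polite name is hypothetical, and the 2-second pause and 200k rate cap are arbitrary example values.

```shell
# Sketch: ignore robots.txt but throttle the crawl.
# fetch_polite is a hypothetical name; 2 seconds and 200k are example values.
fetch_polite() {
    wget --no-parent --recursive -e robots=off \
         --wait=2 --limit-rate=200k \
         "$1"
}

# Usage (uncomment to run against a real site):
# fetch_polite "http://www.example.com/products/"
```

Here --wait pauses between successive requests and --limit-rate caps the download speed, which reduces the load on the server.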

In this article, we have learnt how to recursively download files & folders using wget. We also learnt how to avoid downloading parent folders, exclude specific files & folders, limit the download to a specific subfolder depth, and force recursive downloads regardless of robots.txt. You can customize these commands as per your requirement.