Wget is a popular command-line tool for downloading files in Linux. It is typically used to download a single file from a given URL, but sometimes you may need to download every file in a folder location. In this article, we will learn how to recursively download files & folders with wget. You can use these commands on almost every Linux distribution.
How to Recursively Download Files & Folders in Wget
Wget offers several options for downloading files & folders recursively. We will look at different use cases for downloading content recursively using the wget command.
Download Files & Folders Recursively
To recursively download the files in a folder, use the -r or --recursive option. Here is an example that downloads all files & folders recursively from www.example.com/products/
$ wget -r http://www.example.com/products/
OR
$ wget --recursive http://www.example.com/products/
Exclude Parent Directory
It is important to note that when you use the -r or --recursive option, wget follows the links it finds. It may therefore also download the parent folder and its files if your folder, its files, or the site's directory index link to them.
To avoid this, use the -np or --no-parent option as shown below.
$ wget -np -r http://www.example.com/products/
OR
$ wget --no-parent --recursive http://www.example.com/products/
Exclude Specific Files & Folders
With the above commands, wget downloads only the files & folders present under www.example.com/products/.
When you use the -r or --recursive option, wget downloads all files & folders recursively, without any filters. If you don't want to download specific files, exclude them using the -R or --reject option, followed by a comma-separated list of file names, suffixes, or wildcard patterns. Note that -R matches file names only. To skip whole folders, use the -X or --exclude-directories option, followed by a comma-separated list of folder paths.
Here is an example that downloads all files & folders in www.example.com/products/ except data.pdf
$ wget -np -r -R "data.pdf" http://www.example.com/products/
OR
$ wget --no-parent --recursive --reject "data.pdf" http://www.example.com/products/
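The -R and -X options can be combined in one command. As a minimal sketch, the helper below only prints the resulting wget command so that you can review it before running it. The function name build_exclude_cmd is our own (it is not part of wget), and the *.zip pattern and the /products/archive folder are placeholder values chosen for illustration.

```shell
# Hypothetical helper (not part of wget): prints a wget command that
# downloads a folder recursively while skipping some files and folders.
build_exclude_cmd() {
    url="$1"      # folder URL to download
    reject="$2"   # comma-separated file names or patterns for -R
    exclude="$3"  # comma-separated folder paths for -X
    printf 'wget -np -r -R "%s" -X "%s" "%s"\n' "$reject" "$exclude" "$url"
}

# Placeholder values: skip data.pdf, every .zip file and the archive folder
build_exclude_cmd "http://www.example.com/products/" "data.pdf,*.zip" "/products/archive"
```

For simple values like these, you can pipe the printed line to sh once you are happy with it.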
Download Specific Subfolder Level
By default, wget downloads the files & subfolders under your folder URL, down to its depth limit. If you want to download only the files in the target folder and not its subfolders, limit the recursion depth to 1 with the -l1 option (long form --level=1).
$ wget -np -r -l1 http://www.example.com/products/
OR
$ wget --no-parent --recursive -l1 http://www.example.com/products/
If you want to download the folder and its first-level subfolders (e.g. www.example.com/products/category), use the -l2 option.
$ wget -np -r -l2 http://www.example.com/products/
OR
$ wget --no-parent --recursive -l2 http://www.example.com/products/
By default, wget will crawl up to 5 levels of subfolders, if they are available.
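If you change the depth often, it may help to pass it as a parameter. The sketch below assumes a hypothetical helper named build_depth_cmd (our own name, not part of wget) that checks the depth and prints the command instead of running it. It accepts a whole number or inf, which wget treats as unlimited depth.

```shell
# Hypothetical helper (not part of wget): prints a recursive wget command
# limited to the given depth.
build_depth_cmd() {
    url="$1"    # folder URL to download
    depth="$2"  # whole number of levels, or "inf" for no limit
    case "$depth" in
        inf) ;;  # unlimited depth is allowed as-is
        ''|*[!0-9]*)
            echo "depth must be a whole number or inf" >&2
            return 1
            ;;
    esac
    printf 'wget -np -r -l %s %s\n' "$depth" "$url"
}

# Target folder plus its first-level subfolders
build_depth_cmd "http://www.example.com/products/" 2
# No depth limit
build_depth_cmd "http://www.example.com/products/" inf
```

Printing the command first makes it easy to confirm the depth before starting a large download.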
Force Recursive Download
Please note that, by default, wget respects the robots.txt file of the target website during a recursive download and skips any paths it disallows. This is because a recursive download is a form of crawling and is therefore governed by robots.txt. Nevertheless, if you want to force the download irrespective of what the site's robots.txt file says, use the -e robots=off option.
$ wget -np -r -e robots=off http://www.example.com/products/
OR
$ wget --no-parent --recursive -e robots=off http://www.example.com/products/
In this article, we have learnt how to recursively download files & folders using wget. We also learnt how to avoid downloading parent folders, exclude specific files & folders, limit the download to a specific subfolder level, and force recursive downloads. You can customize these commands as per your requirement.
Sreeram has more than 10 years of experience in web development, Python, Linux, SQL and database programming.