
How to Convert Files to UTF-8 in Linux

Every text file is stored using a character encoding, which determines how the bytes in the file map to characters. When a program saves text, it writes the data in a specific encoding, and any program that later reads the file must interpret it using the same encoding. Sometimes you may need to convert files to UTF-8, which is supported by most applications. In this article, we will learn how to convert files to UTF-8 in Linux.



There are many tools that allow you to convert files from one character encoding to another. We will use iconv for our purpose.


1. Check the Present Encoding

Open a terminal and run the file command to check the file's present encoding. Let us say you have a file named sample.txt. The -i option makes file print the MIME type and character set.

$ file -i sample.txt
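If you only need the character set, for example to reuse it in a script, the following minimal sketch may help. It assumes the file command is installed, and demo_sample.txt is a hypothetical file created only for illustration.

```shell
# Create a tiny sample: "café" with é stored as the single
# ISO-8859-1 byte 0xE9 (octal 351). The file name is hypothetical.
printf 'caf\351\n' > demo_sample.txt

# -b omits the file name; --mime-encoding prints only the detected charset.
charset=$(file -b --mime-encoding demo_sample.txt)
echo "Detected encoding: $charset"
```

Note that file guesses the encoding from the bytes it sees, so treat the result as a hint rather than a guarantee.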


2. Convert Files to UTF-8

iconv is installed by default on most Linux systems. Here is the general syntax for converting a file's character encoding with iconv.

$ iconv -f from_encoding -t to_encoding sample.txt -o out.txt

In the above command, specify the file's present encoding in place of from_encoding and the new encoding in place of to_encoding. The -o option names the output file.
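Before converting, it can be useful to confirm that your iconv build knows the encoding name you plan to use. The sketch below relies on iconv -l, which lists the supported encodings. The value of the encoding variable is only an example.

```shell
# Example value: replace with the name you intend to pass to -f or -t.
encoding="ISO-8859-1"

# iconv -l prints every encoding name known to this iconv build.
# grep -i does a case-insensitive substring match, so this is a rough check.
if iconv -l | grep -qi -- "$encoding"; then
    supported=yes
    echo "$encoding is supported"
else
    supported=no
    echo "$encoding is not supported"
fi
```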

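Here is the command to convert sample.txt from ISO-8859-1 to UTF-8. The //TRANSLIT suffix asks iconv to substitute a similar-looking character when a character cannot be represented in the target encoding.

$ iconv -f ISO-8859-1 -t UTF-8//TRANSLIT sample.txt -o out.txt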

Next, you can check its new character encoding with the file command.

$ file -i out.txt
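To see the conversion end to end, the following sketch builds a small ISO-8859-1 file, converts it, and then checks the result by asking iconv to decode the output as UTF-8, which fails on invalid byte sequences. The sample text and the demo_ file names are assumptions chosen for illustration.

```shell
# "café" with é stored as the single ISO-8859-1 byte 0xE9 (octal 351).
printf 'caf\351\n' > demo_latin1.txt

# Convert the sample from ISO-8859-1 to UTF-8.
iconv -f ISO-8859-1 -t UTF-8 demo_latin1.txt -o demo_utf8.txt

# Decoding the output as UTF-8 succeeds only if every byte sequence is valid.
if iconv -f UTF-8 -t UTF-8 demo_utf8.txt > /dev/null 2>&1; then
    status=valid
else
    status=invalid
fi
echo "demo_utf8.txt is $status UTF-8"

# Show the converted bytes in hex; é now occupies two bytes (c3 a9).
od -An -tx1 demo_utf8.txt
```

This validity check is stricter than the guess made by file, but it only tells you that the bytes form valid UTF-8, not that the source encoding you named was the right one.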


3. Convert Multiple Files to UTF-8

If you want to convert multiple files in a folder to UTF-8, use a for loop to run iconv on each file individually. We will create a shell script for it.

$ vi encoding.sh

Add the following lines to it.

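#!/bin/bash
# input encoding, passed as the first argument
FROM_ENCODING=$1
# output encoding (UTF-8)
TO_ENCODING="UTF-8"
# folder containing the .txt files, passed as the second argument
DIR=$2
# loop over every .txt file in the folder and convert it
for file in "$DIR"/*.txt; do
     # skip the unexpanded pattern when the folder has no .txt files
     [ -e "$file" ] || continue
     iconv -f "$FROM_ENCODING" -t "$TO_ENCODING" "$file" -o "${file%.txt}.utf8.converted"
done
exit 0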

Save and close the file.

Make the script executable. sudo is not needed when you own the file.

$ chmod +x encoding.sh

Run the above script with the following command. The first argument is the present encoding of files in your folder and the second argument is the folder location containing files.

$ ./encoding.sh ISO-8859-1 /home/data

The above script will convert all .txt files in the specified folder into UTF-8 and create a separate copy of each file with the extension .utf8.converted.
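To try the batch conversion safely before pointing it at real data, you could run the same loop against a scratch folder. The sketch below is only an illustration: convert_dir is a hypothetical helper that mirrors the intended behavior of the script, and the sample files are made up.

```shell
# Hypothetical helper: converts every .txt file in a folder to UTF-8,
# writing each result next to the original as NAME.utf8.converted.
convert_dir() {
    from_encoding=$1
    dir=$2
    for file in "$dir"/*.txt; do
        # Skip the unexpanded pattern when the folder has no .txt files.
        [ -e "$file" ] || continue
        iconv -f "$from_encoding" -t UTF-8 "$file" -o "${file%.txt}.utf8.converted"
    done
}

# Scratch folder with two small ISO-8859-1 samples ("café" and "naïve").
demo_dir=$(mktemp -d)
printf 'caf\351\n' > "$demo_dir/a.txt"
printf 'na\357ve\n' > "$demo_dir/b.txt"

convert_dir ISO-8859-1 "$demo_dir"

# The originals remain untouched alongside the converted copies.
ls "$demo_dir"
```

Because the originals are left in place, you can compare each pair of files before deciding whether to replace anything.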

In this article, we have learnt how to convert files to UTF-8 format. You can use the above steps to convert one or more files.
