How to Convert Files to UTF8 Encoding in Linux

Every text file uses a character encoding, which tells the computer reading it how to interpret its contents. All data is stored and transferred as bits (0s and 1s), and the encoding defines how many bits represent each character and which character each bit pattern stands for. Many encodings exist, such as ASCII, ISO-8859-1 (Latin-1) and the Unicode encodings UTF-8 and UTF-16. UTF-8 is the most widely used because it can represent every Unicode character, and so covers a wide range of scripts, while remaining compatible with ASCII. When you work with files in Linux or send them to other systems, you may need to convert them to UTF-8. In this article, we will learn how to convert files to UTF-8 encoding in Linux.


You can check the current encoding of a file with the file command and its -i (or --mime) option. Here is an example that reports the encoding of data.txt.

$ file -i data.txt
data.txt: text/x-c++; charset=us-ascii
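When the encoding is needed inside a script, it helps to print only the charset. The following is a minimal sketch, assuming a version of file that supports the -b (brief) and --mime-encoding options; the function names detect_charset and needs_conversion are hypothetical and chosen only for illustration. Keep in mind that file guesses the encoding from the file's contents, so the result is a hint and not a guarantee.

```shell
#!/bin/sh
# detect_charset FILE: print only the charset that `file` guesses for FILE.
# -b drops the leading "filename:" and --mime-encoding limits output to the charset.
detect_charset() {
    file -b --mime-encoding "$1"
}

# needs_conversion FILE (hypothetical helper): succeed when FILE is reported
# as something other than ASCII or UTF-8.
needs_conversion() {
    case "$(detect_charset "$1")" in
        us-ascii|utf-8) return 1 ;;  # already readable as UTF-8
        *)              return 0 ;;
    esac
}
```

A pure ASCII file is already valid UTF-8, which is why us-ascii is treated as not needing conversion.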

If a file is not in the character encoding expected by the application that processes it, the application may misread, or fail to read, the information in the file.

You can use the iconv utility to convert a file from one encoding to another. Here is the command to list all the encodings that iconv supports.

$ iconv -l
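The list is long, so it can be convenient to search it for one name. This is a rough sketch: encoding_supported is a hypothetical helper name, and it does a case-insensitive substring match, so a short argument such as "8859" would match many entries.

```shell
# encoding_supported NAME: succeed if NAME appears anywhere in the
# output of `iconv -l` (case-insensitive, fixed-string match).
encoding_supported() {
    iconv -l | grep -qiF -- "$1"
}

if encoding_supported "ISO-8859-1"; then
    echo "ISO-8859-1 is available"
fi
```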

Here is the command to change character encoding of file using iconv.

$ iconv [options] -f from-encoding -t to-encoding inputfile(s) -o outputfile

In the above command, specify the file's current encoding after the -f option and the desired encoding after the -t option. Here is the command to convert input.file from ISO-8859-1 to UTF-8 and write the result to out.file.

$ iconv -f ISO-8859-1 -t UTF-8 input.file -o out.file
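To see what the conversion changes, you can try it on a small test file and look at the bytes. The sketch below creates its own sample file in a temporary directory (the workdir variable and the sample text are assumptions made for the example) and then dumps both files with od.

```shell
workdir=$(mktemp -d)

# Write "café" in ISO-8859-1: é is the single byte 0xE9 (octal 351).
printf 'caf\351\n' > "$workdir/input.file"

iconv -f ISO-8859-1 -t UTF-8 "$workdir/input.file" -o "$workdir/out.file"

# Dump both files as hex bytes to compare them.
od -An -tx1 "$workdir/input.file"
od -An -tx1 "$workdir/out.file"
```

In the input, é is the single byte e9. In the UTF-8 output the same character is stored as the two bytes c3 a9, so the converted file is one byte longer.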

You can also add a suffix to the target encoding. To use transliteration wherever possible, append //TRANSLIT to the target encoding. In this case, if an input character cannot be represented in the target encoding, it is approximated by one or more similar-looking characters.

$ iconv -f ISO-8859-1 -t UTF-8//TRANSLIT input.file -o out.file

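If you want iconv to skip characters that cannot be represented in the target encoding, instead of stopping at the first one, append //IGNORE to the target encoding. The skipped characters are left out of the output.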

$ iconv -f ISO-8859-1 -t UTF-8//IGNORE input.file -o out.file
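Every ISO-8859-1 character exists in Unicode, so the two suffixes make no visible difference when the target is UTF-8. Their effect shows up when the target encoding is smaller than the source. The sketch below goes the other way, from UTF-8 to ASCII, purely to illustrate this; the file names are made up for the example, and the output is written with shell redirection, which serves the same purpose as -o here.

```shell
workdir=$(mktemp -d)
printf 'caf\303\251\n' > "$workdir/utf8.txt"   # "café" encoded as UTF-8

# ASCII has no é. //IGNORE asks iconv to drop it and carry on; iconv may
# still exit with a non-zero status, so that is not treated as fatal here.
iconv -f UTF-8 -t ASCII//IGNORE "$workdir/utf8.txt" > "$workdir/ignore.txt" || true

# //TRANSLIT asks for a look-alike instead; the substitute chosen can
# depend on the iconv implementation and the current locale.
iconv -f UTF-8 -t ASCII//TRANSLIT "$workdir/utf8.txt" > "$workdir/translit.txt" || true

cat "$workdir/ignore.txt" "$workdir/translit.txt"
```

Since both suffixes can silently change or lose characters, it is worth comparing the output with the original before discarding the source file.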

If you want to convert character encoding of multiple files, you can create a shell script for it.

Create an empty shell script using the following command.

$ vi bulk_convert.sh

Add the following lines to it. Replace value_here with the character encoding of the input files. The script converts the .txt files in the current directory; change the *.txt pattern if your files are in another folder or have a different extension.

#!/bin/bash
# encoding of the input files
FROM_ENCODING="value_here"
# output encoding (UTF-8)
TO_ENCODING="UTF-8"
# loop over all .txt files in the current directory
for file in *.txt; do
    # skip the unexpanded pattern when no .txt file exists
    [ -e "$file" ] || continue
    iconv -f "$FROM_ENCODING" -t "$TO_ENCODING" "$file" -o "${file%.txt}.utf8.converted"
done
exit 0

Save and close the file. In the above code, we simply loop through all the .txt files in the present working directory and run the iconv command on each file, one by one.
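If you would rather pass the encoding and the folder as arguments than edit the script each time, the same loop can be wrapped in a function. This is one possible sketch, not the only way to do it: convert_dir is a hypothetical name, the output naming follows the script above, and the function reports files that iconv could not convert instead of leaving partial output behind.

```shell
#!/bin/sh
# convert_dir FROM_ENCODING DIR: convert every .txt file in DIR to UTF-8.
# Each result is written next to its source as NAME.utf8.converted.
convert_dir() {
    from_encoding=$1
    dir=$2
    failed=0
    for file in "$dir"/*.txt; do
        # When nothing matches, the pattern stays literal; skip it.
        [ -e "$file" ] || continue
        out="${file%.txt}.utf8.converted"
        if ! iconv -f "$from_encoding" -t UTF-8 "$file" > "$out"; then
            echo "failed to convert: $file" >&2
            rm -f "$out"
            failed=1
        fi
    done
    # Return non-zero if at least one file failed.
    return "$failed"
}
```

For example, convert_dir ISO-8859-1 /path/to/folder would convert the .txt files in that folder, where the path is a placeholder for your own directory.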

Make it executable with the following command.

$ chmod +x bulk_convert.sh

You can run the script with the following command.

$ ./bulk_convert.sh

In this article, we have learnt how to convert character encoding of files in Linux.
