How to Convert string to UTF-8 in Python

Sometimes you may need to convert a string to UTF-8 in Python, for example so that your web application displays text correctly across browsers. This is most common in Python 2.x, where the str type holds raw bytes and the default encoding is ASCII rather than UTF-8. There are multiple ways to convert between byte strings and Unicode strings, and we will look at each of them one by one. This is much less of a problem in Python 3.x, where every str is Unicode and bytes is a separate type.


Here are the different ways to convert a string to UTF-8 in Python. The interpreter sessions below are from Python 2.

Let us say you have the following string.

>>> test="abc"
>>> type(test)
<type 'str'>

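You can convert this byte string into a Unicode string using the unicode() function. Called with a single argument it assumes ASCII, so for non-ASCII data pass the encoding explicitly, for example unicode(test, 'utf-8').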

>>> unitest = unicode(test)
>>> unitest
u'abc'
>>> type(unitest)
<type 'unicode'>

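You can also decode the byte string into a Unicode string using the decode() function, as shown below. Pass the encoding the bytes are stored in; without an argument, Python 2 assumes ASCII.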

>>> unitest = test.decode('utf-8')
>>> unitest
u'abc'
>>> type(unitest)
<type 'unicode'>
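
For comparison, here is a minimal Python 3 sketch of the same decoding step. In Python 3 the str type is already Unicode, so decoding applies to bytes objects. The sample bytes and variable names are illustrative assumptions, not part of the sessions above.

```python
# Python 3 sketch: turn UTF-8 encoded bytes into a Unicode str.
raw = b"caf\xc3\xa9"  # illustrative sample: the UTF-8 bytes for "cafe" with an accented e

# bytes.decode() with an explicit codec is the usual spelling.
text = raw.decode("utf-8")

# str(bytes, encoding) performs the same decode step.
same_text = str(raw, "utf-8")

print(type(raw).__name__, "->", type(text).__name__)
print(text == same_text)
```

Naming the codec explicitly makes the code behave the same way regardless of platform defaults.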

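If you want to convert the Unicode string back to a UTF-8 byte string, use the encode() function with the target encoding, as shown below. Without an argument, Python 2 encodes to ASCII, which fails for non-ASCII characters.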

>>> string_test = unitest.encode('utf-8')
>>> string_test
'abc'
>>> type(string_test)
<type 'str'>
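
The encode step can be sketched in Python 3 as well, where encode() turns a str into a bytes object. The sample text below is an assumption chosen to include non-ASCII characters, so the difference between character count and byte count is visible.

```python
# Python 3 sketch: encode a Unicode str to UTF-8 bytes and back again.
text = "na\u00efve caf\u00e9"  # illustrative sample containing two non-ASCII characters

utf8_bytes = text.encode("utf-8")        # str -> bytes
round_trip = utf8_bytes.decode("utf-8")  # bytes -> str

# Each non-ASCII character here takes two bytes in UTF-8,
# so the byte length is larger than the character count.
print(len(text), len(utf8_bytes))
print(round_trip == text)
```

Encoding and then decoding with the same codec returns the original text, which is a quick way to check that no data was lost.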

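If you are using Python in a web application or mobile app and your .py file itself contains non-ASCII characters, for example in string literals or comments, add the following line as the first or second line of the file.

# -*- coding: utf-8 -*-

This tells the Python 2 interpreter that the source file is saved as UTF-8. It does not change how data is read or written at runtime, so you still need to call decode() and encode() on incoming and outgoing bytes. If the bytes you decode are not valid UTF-8, you will get an error such as "UnicodeDecodeError: 'utf8' codec can't decode byte".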
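
One possible way to guard against this error is sketched below in Python 3. The helper name decode_utf8 and its fallback parameter are hypothetical, introduced only for illustration. The sketch tries a strict UTF-8 decode first and, if that fails, decodes again with a lenient error handler.

```python
def decode_utf8(raw, fallback="replace"):
    """Hypothetical helper: strict UTF-8 decode, with a lenient retry on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # With errors="replace", undecodable bytes become U+FFFD
        # (the Unicode replacement character) instead of raising.
        return raw.decode("utf-8", errors=fallback)


good = decode_utf8(b"caf\xc3\xa9")  # valid UTF-8
bad = decode_utf8(b"caf\xe9")       # Latin-1 style byte, not valid UTF-8

print(good)
print(repr(bad))
```

Replacing bad bytes hides data loss, so this fits logging or display better than storage. When the real encoding of the input is known, decoding with that codec is the safer choice.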