
Remove Unicode Characters from String

Unicode is a widely used character standard, employed by websites and organizations all over the world. However, you may sometimes receive a string containing Unicode characters while your existing software supports only ASCII characters. In such cases, you will need to remove the non-ASCII characters from the string. In this article, we will learn how to do this using JavaScript.



Let us say you have the following Unicode string.

let s = 'Göödnight';

As you can see, the above string contains non-ASCII characters such as ö.
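To check a string before cleaning it, a small sketch like the following can report whether it is pure ASCII and which characters fall outside that range. The helper names isAscii and listNonAscii are hypothetical, and the example assumes ö is stored as the single precomposed code point U+00F6 (a decomposed ö would instead show up as the combining mark U+0308).

```javascript
// Hypothetical helper: returns true when every character in str falls in
// the ASCII range (code points 0 to 127).
function isAscii(str) {
  return /^[\x00-\x7F]*$/.test(str);
}

// Hypothetical helper: lists each non-ASCII character in str together with
// its code point, formatted as "U+XXXX".
function listNonAscii(str) {
  const found = [];
  for (const ch of str) { // for...of iterates by code point
    const cp = ch.codePointAt(0);
    if (cp > 0x7f) {
      found.push(`${ch} U+${cp.toString(16).toUpperCase().padStart(4, "0")}`);
    }
  }
  return found;
}

console.log(isAscii("Göödnight")); // false
console.log(listNonAscii("Göödnight").join(", ")); // ö U+00F6, ö U+00F6
```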

ASCII characters have code points from 0 to 127, so you can use the replace() method, which every JavaScript string provides, with a regular expression that matches any character outside this range and replaces it with an empty string, as shown below.

s = s.replace(/[^\x00-\x7F]/g, "");
console.log(s); // Gdnight
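The expression above drops ö entirely rather than converting it. If you would rather keep the base letter, one commonly used technique (swapped in here, not part of the method in this article) is to first decompose accented characters with String.prototype.normalize("NFD"), which splits ö into o followed by the combining diaeresis U+0308, and then strip the remaining non-ASCII characters. The helper name toAsciiKeepBase is hypothetical.

```javascript
// Hypothetical helper: decompose accented letters into base letter plus
// combining mark, then drop everything outside the ASCII range (this also
// drops the combining marks, leaving the base letters).
function toAsciiKeepBase(str) {
  return str.normalize("NFD").replace(/[^\x00-\x7F]/g, "");
}

console.log(toAsciiKeepBase("Göödnight")); // Goodnight
console.log(toAsciiKeepBase("naïve café")); // naive cafe
```

Characters with no canonical decomposition, such as ß or ø, are still removed outright, so this is only a partial transliteration.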

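In the above code, \x00 and \x7F are hexadecimal escape sequences for 0 and 127 respectively, so [\x00-\x7F] is a character class covering the ASCII range. The ^ at the start of the character class negates it, so [^\x00-\x7F] matches any character that is NOT between 0 and 127, and the g flag makes replace() act on every match rather than only the first. In other words, we replace every non-ASCII character with an empty string, that is, we remove it. Because JavaScript strings are immutable, replace() does not modify s; it returns a new string, which must be assigned back to s.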
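The same rule can also be written without a regular expression, which makes the 0 to 127 test explicit. This is a minimal sketch with the hypothetical helper name stripNonAscii; it walks the string with for...of, which yields whole code points, and keeps only those at or below 127 (0x7F).

```javascript
// Hypothetical helper: keeps only characters whose code point is 127 or
// lower, mirroring the /[^\x00-\x7F]/g replacement without a regex.
function stripNonAscii(str) {
  let result = "";
  for (const ch of str) { // iterates by code point, so an emoji is one step
    if (ch.codePointAt(0) <= 0x7f) {
      result += ch;
    }
  }
  return result;
}

console.log(stripNonAscii("Göödnight")); // Gdnight
console.log(stripNonAscii("Hi 😀!")); // Hi !
```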

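Alternatively, you can pass the following regular expression to replace(). Here we match the non-ASCII range of Unicode code points directly, instead of negating the ASCII range. The u flag enables the \u{...} code point escapes and makes the expression match whole code points rather than UTF-16 code units.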

s = s.replace(/[\u{0080}-\u{10FFFF}]/gu, "");
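The upper bound of such a range matters. U+FFFF is the last code point of the Basic Multilingual Plane, while Unicode code points go up to U+10FFFF, and characters such as emoji lie above U+FFFF. The following sketch, using an assumed sample string, compares both bounds with the u flag, which makes each emoji match as a single code point.

```javascript
const text = "Göödnight 😀";

// Upper bound U+FFFF: covers only the Basic Multilingual Plane, so the
// emoji (U+1F600, one code point under the u flag) is not matched.
const bmpOnly = text.replace(/[\u{0080}-\u{FFFF}]/gu, "");

// Upper bound U+10FFFF: the highest Unicode code point, so every
// non-ASCII character is removed.
const allNonAscii = text.replace(/[\u{0080}-\u{10FFFF}]/gu, "");

console.log(JSON.stringify(bmpOnly)); // "Gdnight 😀"
console.log(JSON.stringify(allNonAscii)); // "Gdnight "
```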

In this article, we have learnt how to remove non-ASCII Unicode characters from a string in JavaScript. For example, you can use these techniques in websites or web applications to remove such characters dynamically before accepting user input or before rendering text on your web pages.
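For the user input case, a minimal sketch could clean a submitted value and report how much was removed, so the page can warn the user instead of silently changing their text. The helper name sanitizeInput and its return shape are hypothetical. Counting with the spread operator ([...value]) counts code points, so an emoji counts as one removed character rather than two UTF-16 code units.

```javascript
// Hypothetical helper: strips non-ASCII characters from a user-supplied
// value and reports how many code points were removed.
function sanitizeInput(value) {
  const cleaned = value.replace(/[^\x00-\x7F]/g, "");
  // Spreading a string splits it into code points, so an emoji counts once.
  const removed = [...value].length - [...cleaned].length;
  return { cleaned, removed };
}

const result = sanitizeInput("Zoë 😀");
console.log(JSON.stringify(result.cleaned)); // "Zo "
console.log(result.removed); // 2
if (result.removed > 0) {
  console.log(`Removed ${result.removed} unsupported character(s).`);
}
```

In a browser you might call such a helper from an input or submit event handler; that wiring is left out here because it depends on your page.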
