non-ascii characters in mysql

How to Find Non-ASCII Characters in MySQL

MySQL is a popular database that allows you to store different kinds of character sets such as ASCII, UTF-8, etc. Sometimes your database may contain characters such as á, ä, etc. that may not be ASCII characters (A-Z, a-z, 0-9). In such cases, how do you select rows which do not contain ASCII characters? In this article, we will learn how to find Non-ASCII characters in MySQL.


How to Find Non-ASCII Characters in MySQL

There are different ways to find Non-ASCII characters in MySQL.

Let us say you have column called full_name in your table data that contains non-ASCII characters then here is the query to select all rows from data that contain non-ASCII characters. Replace full_name with the column you want to check, and replace data with the table name that you want to check.

mysql> select * 
       from data
       where full_name <> convert(full_name USING ASCII)

In the above query, we basically select rows where the full_name column’s value does not match the ASCII-converted value of full_name column. We use convert() function to convert full_name column into ASCII characters. It converts unconvertable characters into replacement characters. So the unconverted and ASCII-converted values will not match. Please note, convert() function only the output of convert function is converted, and the actual column in database.

You can use any character set in place of ASCII above. For example, if you want to find characters that do not conform to cp1257 character set, then use the following query.

mysql> select * 
       from data
       where full_name <> convert(full_name USING cp1257)

Alternatively, you can also use regular expressions to find non-ASCII characters. ASCII character set is captured using regex [A-Za-z0-9]. You can use this regex in your query as shown below, to find non-ASCII characters.

mysql> SELECT * FROM tableName WHERE columnToCheck NOT REGEXP '[A-Za-z0-9]';

For our example above, the query will be

mysql> SELECT * FROM data WHERE full_name NOT REGEXP '[A-Za-z0-9]';

You can also customize regex to include certain characters. For example, if you don’t want to select rows that contain underscore, just add it to the above regex.

mysql> SELECT * FROM data WHERE full_name NOT REGEXP '[A-Za-z0-9_]';

In this short article, we have learnt how to find non-ASCII characters in MySQL. MySQL supports plenty of different character sets and encodings. You can use the above queries to find rows that do not conform to your required character set.

Also read:

How to Use Reserved Word as Column Name in MySQL
How to Show Banned IP Address in Fail2ban
How to Unban IP Address in Fail2ban
NGINX Allow Local Network
How to Prevent Direct Access to Images in NGINX

Leave a Reply

Your email address will not be published. Required fields are marked *