How to Do Case Insensitive String Comparison in Python

By default, Python performs case sensitive string comparison. In other words, if you compare ‘Hello’ with ‘hello’, the result is False. But sometimes you may want to do case insensitive string comparison in Python. In this article, we will learn how to do this.


Let us say you have two strings as shown below:

str1 = "Hello"
str2 = "hello"

If you compare them using the ‘==’ comparison operator, the result is False.

>>> str1 == str2
False

If your strings contain only ASCII characters, you can use the lower() (or upper()) method to convert both strings to the same case and then do the comparison to determine whether they are equal. Please note that lower() and upper() do not change the original strings; they only return a lowercase or uppercase copy. Here is an example of string comparison done this way.

if str1.lower() == str2.lower():
    print("The strings are the same")
else:
    print("The strings are NOT the same")
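The point that lower() returns a new string and leaves the original alone can be checked with a short sketch. The variable name lowered is introduced here only for illustration.

```python
str1 = "Hello"

# lower() builds and returns a new string; str1 itself is not modified
lowered = str1.lower()

print(lowered)  # hello
print(str1)     # Hello
```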

If you are using Python 3.3 or later, you can also use the casefold() method for the same purpose. Here is an example.

if str1.casefold() == str2.casefold():
    print("The strings are the same")
else:
    print("The strings are NOT the same")

The casefold() method is more aggressive than upper() or lower() when it comes to removing case distinctions. For example, the German lowercase letter ‘ß’ is equivalent to “ss”. Since it is already lowercase, lower() does nothing to ‘ß’, whereas casefold() converts it to “ss”.
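The difference shows up as soon as a string contains ‘ß’. The following is a minimal sketch; the sample words "Straße" and "STRASSE" and the variable names are chosen here for illustration and are not part of the earlier example.

```python
# Sample words chosen for illustration
word1 = "Straße"
word2 = "STRASSE"

# lower() leaves 'ß' untouched, so the lowercased strings differ
lower_equal = word1.lower() == word2.lower()

# casefold() expands 'ß' to "ss", so the folded strings match
casefold_equal = word1.casefold() == word2.casefold()

print(lower_equal)     # False
print(casefold_equal)  # True
```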

If your strings contain non-ASCII characters such as accented letters, you may also need to use the unicodedata module to normalize the strings before comparing them. Let us say you have the following two strings, each containing an accented character.

str1 = "\u00ea"   # ê as a single precomposed character
str2 = "e\u0302"  # e followed by a combining circumflex accent

They may appear to be the same, but if you compare them, the result is False.

>>> str1 == str2
False

They look the same when rendered, but in reality they are made up of different code points. You can check this with the unicodedata module.

>>> import unicodedata

>>> [unicodedata.name(char) for char in str1]
['LATIN SMALL LETTER E WITH CIRCUMFLEX']

>>> [unicodedata.name(char) for char in str2]
['LATIN SMALL LETTER E', 'COMBINING CIRCUMFLEX ACCENT']

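In such cases, you can normalize both strings using the unicodedata.normalize() function, so that equivalent characters share the same representation before they are compared. Normalization alone does not ignore case, so for a case insensitive comparison you should also apply casefold() to the normalized strings.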

>>> unicodedata.normalize("NFKD", str1) == unicodedata.normalize("NFKD", str2)
True

Or, using the string literals directly:

>>> unicodedata.normalize("NFKD", "\u00ea") == unicodedata.normalize("NFKD", "e\u0302")
True
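Normalization and case folding can be combined into one helper, so that the comparison ignores both case and the way accented characters are encoded. The following is only a sketch: the function names normalize_caseless and caseless_equal are hypothetical (they are not part of the standard library), and the NFKD form is used simply to match the examples above. The text is normalized a second time after folding because casefold() can return a string that is no longer in normalized form.

```python
import unicodedata


def normalize_caseless(text):
    """Return a normalized, case-folded version of text (hypothetical helper)."""
    # Normalize first so that accented characters share one representation,
    # fold the case, then normalize again because casefold() may produce
    # text that is no longer normalized.
    folded = unicodedata.normalize("NFKD", text).casefold()
    return unicodedata.normalize("NFKD", folded)


def caseless_equal(left, right):
    """Compare two strings, ignoring case and Unicode representation."""
    return normalize_caseless(left) == normalize_caseless(right)


# Uppercase precomposed "Ê" versus lowercase "e" plus a combining circumflex
print(caseless_equal("\u00ca", "e\u0302"))  # True
print(caseless_equal("Hello", "hello"))     # True
print(caseless_equal("Hello", "Help"))      # False
```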

In this article, we have learnt how to do case insensitive string comparison in Python. If your strings contain only ASCII characters, you can compare them directly using lower() or casefold(). If they contain accented or other non-ASCII characters, you should also normalize them with the unicodedata module before comparing.
