Check if String is Substring of Items in List

How to Check if Substring is in List of Strings

A Python list is a great way to store a number of strings in a compact manner, and Python provides many useful functions for working with string items. Python developers often need to check whether a substring is part of a larger string, which is easy to do with the in operator. But sometimes they need to check whether a substring is contained in any item of a list of strings. In this case, the usual approach no longer works and you need a different one. Luckily, there are several simple ways to do this. In this article, we will learn 5 different ways to check if a substring is in a list of strings in Python.

Check if Substring is in List of Strings

Let us say you have the following list of strings in Python.

test = ['abc-123', 'xyz-456', 'pqr-789', 'abc-456']

Let us say you want to check if the string ‘abc’ is a substring of any item in the list. Typically, developers try the following approach.

if 'abc' in test:
    print('found')  # do something

But the above approach does not work, because Python checks whether the list has an element exactly equal to ‘abc’, not whether ‘abc’ is a substring of one of its elements. So the above if condition evaluates to False.
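A quick way to confirm this behavior is to compare a full-element membership test with a partial one. The snippet below reuses the same list; `exact_match` and `partial_match` are just illustrative variable names.

```python
test = ['abc-123', 'xyz-456', 'pqr-789', 'abc-456']

# 'in' on a list compares each element with == (exact equality)
exact_match = 'abc-123' in test   # True: one element equals 'abc-123'
partial_match = 'abc' in test     # False: no element equals 'abc'

print(exact_match, partial_match)  # True False
```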

So let us look at the different ways to solve this problem.

1. Using join()

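In this method, we use the join() function to concatenate the list items into a single large string and then check if the substring is part of the combined string. If it is present, then it must also be present in at least one of the list items, provided the separator does not occur in the substring (otherwise a match could cross the boundary between two items).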

test = ['abc-123', 'xyz-456', 'pqr-789', 'abc-456']
combined='\t'.join(test)

>>> 'abc' in combined
True
>>> 'der' in combined
False

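In the above code, we have combined the list strings using the tab character \t. You can also use a plain space ‘ ‘ or another separator as per your requirement, but avoid an empty separator ‘’, since the end of one item would then run directly into the start of the next.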
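The separator matters because a match could otherwise span two neighboring items. The sketch below uses an empty separator purely to show the problem: it finds '123xyz' even though no single item contains it. The helper `contains_joined()` is hypothetical (not a standard function) and simply refuses a separator that appears in the substring. Note that an empty separator is always rejected, since '' is contained in every string.

```python
test = ['abc-123', 'xyz-456', 'pqr-789', 'abc-456']

# With no separator, the end of one item runs into the start of the next
print('123xyz' in ''.join(test))    # True, a false positive
print('123xyz' in '\t'.join(test))  # False, the tab keeps items apart

def contains_joined(items, sub, sep='\t'):
    """Hypothetical helper: join-based check with a separator safety check."""
    if sep in sub:
        # a substring containing the separator could match across items
        raise ValueError('separator must not occur in the substring')
    return sub in sep.join(items)

print(contains_joined(test, 'abc'))  # True
print(contains_joined(test, 'der'))  # False
```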

The time complexity of this approach is roughly O(n), where n is the total length of all list strings put together.

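This is one of the fastest ways to check if a substring is present in a list of strings, because in CPython both the join and the search run as built-in C code. It works well with short as well as large lists. However, it requires O(n) temporary space to store the combined string, and it always builds the whole string, even if the very first item contains the substring.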

2. Using any()

In this solution, we loop through the list and check if the substring is present in each item. Each check produces True or False, and we pass these results to the any() function, which returns True if any of them is True.

test = ['abc-123', 'xyz-456', 'pqr-789', 'abc-456']
check = 'abc'
result = any(check in sub for sub in test)  # Output is True

>>> any('abc' in sub for sub in test)
True
>>> any('der' in sub for sub in test)
False

In the above code, the expression inside the any() function is a generator expression that iterates through our list and checks each item for the substring. It does not build a list of boolean values; it produces one value at a time. The any() function consumes these values and returns True as soon as it receives a True, without checking the remaining items. If no value is True, it returns False.

This approach can be slower than using join() when the match is near the end of the list or absent, because the loop and the in check for each item run as separate Python-level steps instead of a single search over one string. On the other hand, it can finish early when a match appears near the start of the list. Its auxiliary space is also much smaller than that of join(), since the generator holds only one boolean value at a time instead of the combined string.
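To see that any() consumes the generator lazily, the sketch below counts how many items are actually inspected. The `checked` list and the `contains()` helper are hypothetical names used only for this demonstration.

```python
test = ['abc-123', 'xyz-456', 'pqr-789', 'abc-456']
checked = []  # log of the items the generator actually inspects

def contains(item, sub):
    checked.append(item)  # record that this item was examined
    return sub in item

# A match on the first item means any() stops after one check
found = any(contains(s, 'abc') for s in test)
print(found, checked)  # True ['abc-123']

# No match at all, so every item has to be checked
checked.clear()
missing = any(contains(s, 'der') for s in test)
print(missing, len(checked))  # False 4
```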

3. Using filter() and lambda

The filter() function takes a function and an iterable and returns only those items for which the function returns a true value. In Python 3, it returns a lazy iterator rather than a list. Instead of defining a separate named function for the condition, we pass a lambda function, which is simply an anonymous function written inline. Here is an example to illustrate this.

test = ['abc-123', 'xyz-456', 'pqr-789', 'abc-456']
check = 'abc'
result = any(filter(lambda x: check in x, test))  # Output is True

>>> any(filter(lambda x: 'abc' in x, test))
True
>>> any(filter(lambda x: 'der' in x, test))
False

In the above code, the lambda function checks if the substring is present in a string. The filter() function applies this function to each item of the original list and yields only the items that pass the condition. Lastly, any() takes items from the filter until it finds a truthy one. Since any string that contains a non-empty substring is itself non-empty, and therefore truthy, any() returns True as soon as at least one item contains the substring. This method also has a time complexity of O(n), and it needs very little auxiliary space because filter() produces items lazily instead of building a new list.
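Because filter() returns an iterator, the same call can also hand back the first matching item instead of just True/False. A minimal sketch using the built-in next() with a default of None; `first` and `none_found` are illustrative names. Comparing the result with None also sidesteps a corner case of any(): with an empty search string, an empty list item matches but is falsy.

```python
test = ['abc-123', 'xyz-456', 'pqr-789', 'abc-456']

# next() pulls from the filter only until the first match; None if nothing matches
first = next(filter(lambda x: 'abc' in x, test), None)
none_found = next(filter(lambda x: 'der' in x, test), None)
print(first, none_found)  # abc-123 None

# Corner case: '' matches the empty item, but any() sees a falsy ''
print(any(filter(lambda x: '' in x, [''])))                     # False
print(next(filter(lambda x: '' in x, ['']), None) is not None)  # True
```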

4. Using find()

In this method, we loop through the list and call the find() method on each item to check if the substring exists in it. find() searches the string and returns the lowest index at which the substring occurs. Otherwise, it returns -1.

test = ['abc-123', 'xyz-456', 'pqr-789', 'abc-456']
check = 'abc'
result = False

for i in test:
    if i.find(check) != -1:
        result = True
        break  # no need to check the remaining items

print(result)  # Output is True
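The explicit comparison with -1 matters: find() returns the index of the match, and a match at the very start of an item returns 0, which is falsy. The sketch below shows what goes wrong if the != -1 is dropped; `buggy` and `correct` are illustrative names.

```python
test = ['abc-123', 'xyz-456', 'pqr-789', 'abc-456']
check = 'abc'

print('abc-123'.find(check))  # 0: found at index 0
print('xyz-456'.find(check))  # -1: not found

# Buggy: 0 is falsy and -1 is truthy, so matches at index 0 are skipped
# while items without the substring slip through
buggy = [s for s in test if s.find(check)]
# Correct: compare with -1 explicitly
correct = [s for s in test if s.find(check) != -1]
print(buggy)    # ['xyz-456', 'pqr-789']
print(correct)  # ['abc-123', 'abc-456']
```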

5. Using List Comprehension

All the above solutions tell us whether a substring is present in a list of strings or not. They do not tell us which items actually contain the substring. To get all items that contain the substring ‘abc’, you can use a list comprehension as follows.

>>> subs = [s for s in test if "abc" in s]
>>> print(subs)
['abc-123', 'abc-456']

In this case, if you want a boolean result, you can simply check whether the length of the list is greater than zero.

>>> subs = [s for s in test if "abc" in s]
>>> print(len(subs) > 0)
True
>>> subs = [s for s in test if "der" in s]
>>> print(len(subs) > 0)
False
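If you need the positions of the matching items rather than the items themselves, the same comprehension can be combined with the built-in enumerate(). A short sketch; `positions` is an illustrative name.

```python
test = ['abc-123', 'xyz-456', 'pqr-789', 'abc-456']

# enumerate() pairs each item with its index in the list
positions = [i for i, s in enumerate(test) if 'abc' in s]
print(positions)  # [0, 3]
```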


Conclusion

In this article, we have learnt several simple ways to check if a substring is present in a list of strings. As mentioned earlier, you cannot directly use the in operator on the list, because it only checks for an element that exactly equals the substring. Instead, you need to either search a combined string or loop through the list and check each item. Therefore, almost every method has a time complexity of O(n). The solution using join() is usually the fastest when the match is late in the list or absent, since it calls the ‘in‘ operator only once on the combined string. In the other methods, we call it on each item individually, and the Python-level loop slows down the operation, especially for large lists. On the flip side, any() and filter() can stop at the first match, while join() requires us to store the combined string, which may take up a lot of space for large lists. So if you do not mind using some memory, use method #1. Otherwise, you can use any of the other methods.
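The relative speed depends on the list size and on where (or whether) a match occurs, so it is worth measuring on your own data. Below is a rough benchmark sketch using the standard timeit module. The list size, the search strings and the function names are arbitrary assumptions, and no particular timings are implied.

```python
import timeit

# Arbitrary test data: 100,000 strings like 'item-0', 'item-1', ...
data = [f'item-{i}' for i in range(100_000)]

def with_join(items, sub):
    return sub in '\t'.join(items)

def with_any(items, sub):
    return any(sub in s for s in items)

def with_filter(items, sub):
    return any(filter(lambda x: sub in x, items))

# 'item-5' matches early in the list; 'missing' never matches
for sub in ('item-5', 'missing'):
    for func in (with_join, with_any, with_filter):
        secs = timeit.timeit(lambda: func(data, sub), number=10)
        print(f'{func.__name__:12} {sub!r:10} {secs:.4f}s')
```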

