
How to Remove Punctuation from String in Python

Sometimes you may need to remove punctuation marks from a string in Python. This is commonly required during text translation, parsing and string manipulation. There are several ways to do this. In this article, we will learn a couple of ways to remove punctuation from a string in Python.


Let us say you have the following Python string.

s = "string. with. punctuation?"

We will look at some of the different ways to remove punctuation characters from our string.

The easiest way to do this is to use a regular expression with the re.sub() function. Here is an example that replaces every punctuation character with an empty string.

import re
# Replace every character that is neither a word character nor whitespace
s = re.sub(r'[^\w\s]', '', s)

In the above code, we import the re module and use its sub() function to replace every part of the string that matches our regular expression. \w matches word characters (letters, digits and the underscore; in Python 3 this includes non-ASCII letters) and \s matches whitespace characters. The pattern [^\w\s] is a negated character class, so re.sub() replaces every character that is neither a word character nor whitespace with an empty string. Note that because the underscore counts as a word character, this pattern keeps underscores. This approach is widely used and flexible, since you can decide whether to remove all punctuation marks or only specific ones just by changing the regular expression. For example, you can remove dots but keep commas by using a pattern that matches only dots, such as [.].
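Here is a self-contained sketch of the regex approach, with a precompiled pattern for reuse and a variant that removes only dots and question marks. The second and third sample strings are made up for illustration; the third shows that underscores survive the [^\w\s] pattern.

```python
import re

s = "string. with. punctuation?"

# Compile once if you will clean many strings with the same pattern.
PUNCT_RE = re.compile(r"[^\w\s]")

# Remove every character that is neither a word character nor whitespace.
all_removed = PUNCT_RE.sub("", s)

# Remove only dots and question marks; commas and everything else are kept.
some_removed = re.sub(r"[.?]", "", "Hello, world. How are you?")

# The underscore is a word character, so [^\w\s] does not remove it.
kept_underscore = PUNCT_RE.sub("", "snake_case!")

print(all_removed)      # string with punctuation
print(some_removed)     # Hello, world How are you
print(kept_underscore)  # snake_case
```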

Alternatively, you can use the str.translate() method together with the string.punctuation constant from the string module. In Python 2, the syntax is as follows.

import string
# Python 2 only: the second argument lists the characters to delete
s = s.translate(None, string.punctuation)

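In Python 3, str.translate() accepts a single translation table, so the two-argument form above raises a TypeError. Build the table with str.maketrans() instead; its third argument lists the characters to delete.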

import string
# Python 3: map every punctuation character to None (delete it)
s = s.translate(str.maketrans('', '', string.punctuation))

This approach is very fast because str.translate() is implemented in C and works from a lookup table. Keep in mind that string.punctuation is a fixed set: it contains only ASCII punctuation characters, including the underscore, so it removes underscores and leaves non-ASCII punctuation such as curly quotes untouched. If you want to remove only some characters, for example dots but not commas, pass your own string as the third argument instead, as in str.maketrans('', '', '.'). If you want to remove all ASCII punctuation, this is usually one of the fastest methods available.
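To make the lookup table concrete, here is a minimal Python 3 sketch that builds it once with str.maketrans() and inspects it. The table is a plain dict mapping each character's code point to None, which tells str.translate() to delete that character. The "snake_case!" sample is made up to show that, unlike the [^\w\s] regex, this table also removes underscores.

```python
import string

s = "string. with. punctuation?"

# Build the deletion table once and reuse it for every string you clean.
TABLE = str.maketrans("", "", string.punctuation)

# Each punctuation character's code point maps to None, meaning "delete".
print(TABLE[ord(".")])  # None

all_removed = s.translate(TABLE)

# string.punctuation contains "_", so underscores are removed too.
no_underscore = "snake_case!".translate(TABLE)

print(all_removed)    # string with punctuation
print(no_underscore)  # snakecase
```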

Of course, you can also use the str.replace() method or a list comprehension for this purpose, but they are usually slower than str.translate(), especially a list comprehension that processes the string one character at a time in Python.
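If you want to check the speed difference on your own data, here is a rough benchmark sketch. The helper names (with_regex, with_translate, with_replace, with_comprehension) are hypothetical, and timings depend on your machine, Python version and input, so no numbers are claimed here. The sample text is ASCII with no underscores, so all four functions return the same result.

```python
import re
import string
import timeit

PUNCT_RE = re.compile(r"[^\w\s]")
TABLE = str.maketrans("", "", string.punctuation)
PUNCT_SET = set(string.punctuation)


def with_regex(text: str) -> str:
    # Drop anything that is not a word character or whitespace.
    return PUNCT_RE.sub("", text)


def with_translate(text: str) -> str:
    # Delete characters using the precomputed lookup table.
    return text.translate(TABLE)


def with_replace(text: str) -> str:
    # One str.replace() pass per punctuation character.
    for ch in string.punctuation:
        text = text.replace(ch, "")
    return text


def with_comprehension(text: str) -> str:
    # Keep characters one at a time in a Python-level loop.
    return "".join([ch for ch in text if ch not in PUNCT_SET])


if __name__ == "__main__":
    sample = "string. with. punctuation? " * 1000
    for fn in (with_regex, with_translate, with_replace, with_comprehension):
        seconds = timeit.timeit(lambda: fn(sample), number=50)
        print(f"{fn.__name__:20s} {seconds:.4f} s")
```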

In this article, we have learnt a couple of simple yet powerful ways to remove punctuation from a string in Python. There are many other ways to do this, but the ones above are easy to use and fast in most cases.
