How to Strip HTML from Text in JavaScript

JavaScript lets you validate and clean up user input on your website in many ways. One common task is stripping HTML from text. This is especially important if your web pages accept form submissions, because malicious users and bots can use them to inject HTML code into your database, which may get executed when it is rendered back on your website. In this article, we will learn how to strip HTML from text in JavaScript.


There are several ways to remove HTML from text in JavaScript. We will look at a couple of simple ways to do this.

Here is a function that removes HTML from strings.

function stripHtml(html)
{
   let tmp = document.createElement("DIV");
   tmp.innerHTML = html;
   return tmp.textContent || tmp.innerText || "";
}

The above function accepts a string and assigns it as the inner HTML of a temporary div element, which makes the browser parse it. It then returns the element's textContent (falling back to innerText, or to an empty string), which contains only the text with all the HTML tags removed.

If you happen to use jQuery on your web page, then here is a simple function call that does the same thing as the above function.

jQuery(html).text();

Please note that neither of the above approaches should be used on strings obtained from user input. Assigning untrusted markup to innerHTML, or passing it to jQuery(), builds live elements in the current document, so embedded event handlers (for example, an onerror attribute on an img tag) can run even though the element is never added to the page. In such cases, you can use DOMParser instead. It is a web API that parses XML and HTML strings into a separate DOM document, and it is supported by all major modern browsers.

Here is a simple function to use DOMParser to strip HTML from string.

function strip(html){
   let doc = new DOMParser().parseFromString(html, 'text/html');
   return doc.body.textContent || "";
}

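The above function uses the parseFromString() method to parse the string into a separate HTML document, and then returns only the textContent of its body. Nothing is attached to the current page, and scripts in the parsed document are not executed.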
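As a rough sketch, the DOMParser approach above could be wrapped with a couple of input checks. The function name stripHtmlSafely and the error message are hypothetical and only for illustration; the sketch assumes a browser environment where DOMParser is defined.

```javascript
// Hypothetical wrapper around the DOMParser approach (the name is illustrative).
function stripHtmlSafely(html) {
  // Without this check, null would be coerced and parsed as the literal text "null".
  if (html === null || html === undefined) {
    return '';
  }
  // DOMParser is a browser API; fail with a clear message where it is missing.
  if (typeof DOMParser === 'undefined') {
    throw new Error('DOMParser is not available in this environment');
  }
  // Coerce numbers and other values to a string before parsing.
  const doc = new DOMParser().parseFromString(String(html), 'text/html');
  return doc.body.textContent || '';
}
```

In a browser, calling stripHtmlSafely('<img src=x onerror=alert(1)>Hi <b>there</b>') should return "Hi there" without running the onerror handler, because the markup is parsed into a separate document rather than into the current page.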

These solutions work only in web browsers.

If you are using a non-browser platform like NodeJS, where these DOM APIs are not available out of the box, one simple option is to use a regular expression to identify the HTML tags in your string, and then use the replace() function to remove them.

Here is an example function for this purpose.

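function removeTags(str) {
    // Return an empty string (not false) so the result is always a string.
    if (str === null || str === undefined || str === '')
        return '';

    // Coerce non-string input, such as numbers, to a string.
    str = str.toString();

    // Remove everything from a '<' up to the next '>'.
    return str.replace( /(<([^>]+)>)/ig, '');
}
// console.log is used here because document.write does not exist in NodeJS.
console.log(removeTags(
    '<html>Welcome to Hello World.</html>'));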

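The above function first converts the input to a string, and then uses the regular expression /(<([^>]+)>)/ig to identify HTML tags in it. It calls the replace() function to replace every match with an empty string, effectively removing the tags. Note that this is pattern matching rather than real HTML parsing, so it can give unexpected results on malformed markup or on attribute values that contain a > character, and it leaves HTML entities such as &amp; unchanged.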
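To make the behavior of the regular expression more concrete, here is a small sketch that runs in NodeJS. The function stripTagsWithRegex is a compact stand-in for removeTags using the same pattern, and decodeBasicEntities is a hypothetical helper with a deliberately tiny entity table; neither is meant as a complete or secure sanitizer.

```javascript
// Compact stand-in for the article's removeTags(), using the same pattern.
function stripTagsWithRegex(str) {
  return String(str).replace(/(<([^>]+)>)/ig, '');
}

// Hypothetical helper: decodes only a handful of common entities.
// This table is an assumption for illustration, not a full entity list.
const ENTITIES = {
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
  '&nbsp;': ' ',
  '&amp;': '&',
};

function decodeBasicEntities(str) {
  // A single pass, so '&amp;lt;' becomes '&lt;' and is not decoded twice.
  return str.replace(/&(?:lt|gt|quot|#39|nbsp|amp);/g, (match) => ENTITIES[match]);
}

// Well-formed markup is handled as expected.
console.log(stripTagsWithRegex('<p>Hello <b>World</b></p>')); // Hello World

// Entities are left untouched unless decoded separately.
console.log(stripTagsWithRegex('<p>Fish &amp; Chips</p>')); // Fish &amp; Chips
console.log(decodeBasicEntities(stripTagsWithRegex('<p>Fish &amp; Chips</p>'))); // Fish & Chips

// A '>' inside an attribute value ends the match too early.
console.log(stripTagsWithRegex('<a title="a>b">link</a>')); // b">link

// Only the tags are removed; the text between them is kept.
console.log(stripTagsWithRegex('<script>alert(1)</script>')); // alert(1)
```

The last two calls show why this approach is best kept for simple, trusted markup: it removes anything that looks like a tag, but it does not understand HTML structure.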

In this article, we have learnt several ways to remove HTML tags from a string.
