
How to remove empty tags using BeautifulSoup in Python?

Last Updated : 26 Nov, 2020

Prerequisites: Requests, BeautifulSoup, strip

The task is to write a program that removes empty tags (tags whose text is empty or only whitespace) from HTML code. Beautiful Soup has no built-in method for removing tags that have no content.

Modules Needed:

  • bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It does not come built-in with Python. To install it, type the following command in the terminal.
pip install beautifulsoup4
  • requests: Requests allows you to send HTTP/1.1 requests extremely easily. It also does not come built-in with Python. To install it, type the following command in the terminal.
pip install requests
  • lxml: the examples below pass "lxml" to BeautifulSoup as the parser, and this parser is a separate package. To install it, type the following command in the terminal.
pip install lxml
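
Both examples below pass "lxml" as the parser name. If lxml is not installed, BeautifulSoup raises bs4.FeatureNotFound instead of parsing. The sketch below assumes you would rather fall back to Python's built-in "html.parser" than fail. The helper name make_soup is hypothetical. The two parsers can build slightly different trees from the same markup; lxml, for example, wraps fragments in <html><body>. Output may therefore differ from what the examples show.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse markup with lxml if it is installed, otherwise with html.parser.

    make_soup is a hypothetical helper, not part of Beautiful Soup.
    """
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # lxml is missing: use the parser that ships with Python
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<p>hello</p>")
print(soup.p.get_text())
```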

Approach:

  • Get the HTML code.
  • Iterate through each tag:
    • Fetch the tag's text and remove surrounding whitespace with get_text(strip=True).
    • If the stripped text has length zero, remove the tag from the HTML code.
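The emptiness check from the approach can be tried on its own. The sketch below prints what get_text(strip=True) returns for each tag. The function name emptiness_report and the sample markup are made up for illustration, and "html.parser" is used so lxml is not required. Whitespace-only tags come out as empty strings. So does a parent whose only children are empty, which is why a single pass over find_all() also removes such parents.

```python
from bs4 import BeautifulSoup


def emptiness_report(html):
    """Return (tag name, stripped text) for every tag, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return [(tag.name, tag.get_text(strip=True)) for tag in soup.find_all()]


report = emptiness_report("<p>  \n </p><p></p><p> hi </p><b><i></i></b>")
for name, text in report:
    # An empty string after stripping means the tag would be removed
    print(name, repr(text), "empty" if not text else "kept")
```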

Example 1: Remove empty tags from an HTML string.

Python3
# Import Module
from bs4 import BeautifulSoup

# HTML Object
html_object = """

<p>
<p></p>
<strong>some<br>text<br>here</strong></p>

"""

# Parse the HTML code with the lxml parser
soup = BeautifulSoup(html_object, "lxml")

# Iterate over every tag in the document
for x in soup.find_all():

    # Get the tag's text with surrounding whitespace stripped
    if len(x.get_text(strip=True)) == 0:

        # Remove the empty tag
        x.extract()

# Print the HTML code with the empty tags removed
print(soup)

Output:

<html><body><strong>sometexthere</strong>
</body></html>
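In the output above the <br> tags are gone too ("sometexthere"). A void element such as <br> or <img> never contains text, so it looks empty to this check. If such tags should survive, one option is to skip them and also keep any tag that wraps one, for example <a><img></a>. The sketch below works under these assumptions. The KEEP_TAGS list is an illustrative choice, not a complete list of HTML void elements. The function names are hypothetical, and "html.parser" is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Illustrative (not exhaustive) list of tags to keep even though they hold no text
KEEP_TAGS = ["br", "hr", "img", "input", "meta", "link"]


def is_empty(tag):
    """True if the tag has no text and contains no tag from KEEP_TAGS."""
    if tag.name in KEEP_TAGS:
        return False
    if tag.get_text(strip=True):
        return False
    # A wrapper such as <a><img></a> has no text but is not really empty
    return tag.find(KEEP_TAGS) is None


def remove_empty_tags(html):
    soup = BeautifulSoup(html, "html.parser")
    # find_all() builds the whole list first, so extracting while looping is safe
    for tag in soup.find_all():
        if is_empty(tag):
            tag.extract()
    return soup


cleaned = remove_empty_tags(
    '<p></p><p>some<br>text</p><a href="#"><img src="x.png"></a><div> </div>'
)
print(cleaned)
```

Here the empty <p> and the whitespace-only <div> are removed. The <br>, the <img>, and the <a> that wraps it are kept.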

Example 2: Remove empty tags from a web page at a given URL.

Python3
# Import Modules
from bs4 import BeautifulSoup
import requests

# Page URL
URL = "https://www.geeksforgeeks.org/"

# Fetch the page content (give up after 10 seconds, raise on HTTP errors)
page = requests.get(URL, timeout=10)
page.raise_for_status()

# Parse the HTML code with the lxml parser
soup = BeautifulSoup(page.content, "lxml")

# Iterate over every tag in the document
for x in soup.find_all():

    # Get the tag's text with surrounding whitespace stripped
    if len(x.get_text(strip=True)) == 0:

        # Remove the empty tag
        x.extract()

# Print the HTML code with the empty tags removed
print(soup)
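
Printing a whole web page makes it hard to see what was actually removed. The sketch below also tallies the removed tag names with collections.Counter. The function name strip_empty_tags is hypothetical, and "html.parser" is used so lxml is optional. It takes the HTML as an argument, so you would pass it page.content from the example above; the fetch itself is left out here. On a real page, tags such as <meta>, <link> and <script src="..."></script> typically show up in the tally, since they carry no text.

```python
from collections import Counter

from bs4 import BeautifulSoup


def strip_empty_tags(html):
    """Remove empty tags; return the cleaned soup and a Counter of removed tag names.

    Tags nested inside an already-removed tag are still visited, so they are counted too.
    """
    soup = BeautifulSoup(html, "html.parser")
    removed = Counter()
    for tag in soup.find_all():
        if not tag.get_text(strip=True):
            removed[tag.name] += 1
            tag.extract()
    return soup, removed


soup, removed = strip_empty_tags("<div><span></span></div><p>ok</p><br>")
print(soup)
print(removed.most_common())
```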


