How to remove empty tags using BeautifulSoup in Python?
Last Updated: 26 Nov, 2020
Prerequisite: Requests, BeautifulSoup, strip
The task is to write a program that removes empty tags from HTML code. Beautiful Soup has no built-in method for removing tags that have no content.
Modules Needed:
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library. To install it, run the command below in the terminal. The examples in this article also use the lxml parser, which is a separate package (pip install lxml).
pip install bs4
- requests: Requests lets you send HTTP/1.1 requests with very little code. It is also not part of the Python standard library. To install it, run the command below in the terminal.
pip install requests
Approach:
- Get the HTML code.
- Iterate through each tag.
- Fetch the text from the tag and strip the surrounding whitespace.
- If the length of the stripped text is zero, remove the tag from the HTML code.
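The emptiness check in the last two steps can be tried on its own before removing anything. The snippet below is a minimal sketch: the helper name is_empty and the sample markup are made up for illustration, and it uses the built-in "html.parser" so that it runs without lxml.

```python
from bs4 import BeautifulSoup

# Sample markup (illustrative): one whitespace-only tag, one tag with text
soup = BeautifulSoup("<div><p>   </p><span> hi </span></div>", "html.parser")


def is_empty(tag):
    """Return True if the tag has no text once whitespace is stripped.

    is_empty is a hypothetical helper name, not part of Beautiful Soup.
    """
    return len(tag.get_text(strip=True)) == 0


print(is_empty(soup.p))     # the <p> holds only spaces
print(is_empty(soup.span))  # the <span> holds the text "hi"
print(is_empty(soup.div))   # the <div> inherits text from its children
```

Because get_text() collects the text of all descendants, a parent tag counts as empty only when every tag inside it is empty too.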
Example 1: Remove empty tags.
# Import Module
from bs4 import BeautifulSoup
# HTML Object
html_object = """
<p>
<p></p>
<strong>some<br>text<br>here</strong></p>
"""
# Parse the HTML code
soup = BeautifulSoup(html_object, "lxml")
# Iterate over every tag
for x in soup.find_all():
    # Fetch the text from the tag and strip whitespace
    if len(x.get_text(strip=True)) == 0:
        # Remove the empty tag
        x.extract()
# Print HTML Code with removed empty tags
print(soup)
Output:
<html><body><strong>sometexthere</strong> </body></html>
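Note that the <br> tags inside <strong> are also missing from the output above: a tag such as <br> or <img> never contains text, so the length check treats it as empty. If such tags should survive, one possible variation is to skip them by name. The sketch below is an assumption-laden illustration, not part of the original approach: the function name remove_empty_tags, the VOID_TAGS set, and the sample markup are all hypothetical, and it uses the built-in "html.parser" instead of lxml.

```python
from bs4 import BeautifulSoup

# Assumption: tags that are empty by design and should be kept.
# Adjust this set to suit your own documents.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


def remove_empty_tags(html, keep=VOID_TAGS):
    """Remove tags without text, except tags named in keep
    and tags that contain one of them."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all():
        # Never remove a tag that is empty by design
        if tag.name in keep:
            continue
        has_text = len(tag.get_text(strip=True)) > 0
        # A wrapper such as <p><img></p> has no text but is not really empty
        has_kept_child = tag.find(list(keep)) is not None
        if not has_text and not has_kept_child:
            tag.extract()
    return str(soup)


sample = '<div><p></p><strong>some<br>text</strong><p><img src="a.png"></p></div>'
result = remove_empty_tags(sample)
print(result)
```

Here the empty <p></p> is removed, while the <br> and the paragraph that wraps the image are left in place.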
Example 2: Remove empty tags from a given URL.
# Import Module
from bs4 import BeautifulSoup
import requests
# Page URL
URL = "https://www.geeksforgeeks.org/"
# Page content from the website URL
page = requests.get(URL)
# Parse the HTML code
soup = BeautifulSoup(page.content, "lxml")
# Iterate over every tag
for x in soup.find_all():
    # Fetch the text from the tag and strip whitespace
    if len(x.get_text(strip=True)) == 0:
        # Remove the empty tag
        x.extract()
# Print HTML Code with removed empty tags
print(soup)
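Output: the HTML of the requested page, printed without its empty tags. The exact markup depends on the page content at the time of the request.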
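When the HTML comes from the network, it can help to separate the download from the clean-up, so that the clean-up can be checked without an internet connection. The sketch below is one way to arrange this under stated assumptions: the function names strip_empty_tags and fetch_and_strip are hypothetical, the 10-second timeout is an arbitrary choice, and the built-in "html.parser" is used instead of lxml.

```python
import requests
from bs4 import BeautifulSoup


def strip_empty_tags(markup):
    """Parse markup (str or bytes) and remove every tag without text."""
    soup = BeautifulSoup(markup, "html.parser")
    for tag in soup.find_all():
        if len(tag.get_text(strip=True)) == 0:
            tag.extract()
    return soup


def fetch_and_strip(url, timeout=10):
    """Download a page and return its soup with empty tags removed.

    The timeout value is an assumption; pick one that suits your network.
    """
    page = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page
    page.raise_for_status()
    return strip_empty_tags(page.content)


# Offline check of the clean-up step on a small literal document
cleaned = strip_empty_tags(b"<div><p></p><b>x</b></div>")
print(cleaned)
```

Calling fetch_and_strip("https://www.geeksforgeeks.org/") would then repeat Example 2 with a timeout and an HTTP status check added.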