How to Remove Tags Using BeautifulSoup in Python?
Prerequisite: BeautifulSoup module
In this article, we will write a Python script that removes a tag from the parse tree and then completely destroys it along with its contents. This is done with the decompose() method, which is built into the module.
Syntax:
Tag.decompose()
Tag.decompose() removes a tag from the tree of a given HTML document, then completely destroys it and its contents. The method modifies the tree in place and returns None.
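The behaviour described above can be seen in a minimal sketch. The markup here is hypothetical and chosen only for illustration; the point is that the tree itself changes, while the call returns nothing useful.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration
markup = '<div><p>Keep me</p><span>Remove me</span></div>'
soup = BeautifulSoup(markup, 'html.parser')

# decompose() modifies the tree in place and returns None
return_value = soup.span.decompose()

# The <span> and its contents are gone from the tree
remaining = str(soup)
print(return_value)
print(remaining)
```

Because the return value is always None, the result of the removal has to be inspected on the soup object (or on the parent tag), not on the value returned by decompose().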
Implementation:
Example 1:
# import module
from bs4 import BeautifulSoup
# HTML markup to parse
markup = '<a href="https://www.geeksforgeeks.org/">Welcome to <i>geeksforgeeks.com</i></a>'
# parse the markup
soup = BeautifulSoup(markup, 'html.parser')
# display before decompose
print("Before Decompose")
print(soup.a)
# decompose() removes the <a> tag
# in place and returns None
result = soup.a.decompose()
print("After decomposing:")
print(result)
Output:
Before Decompose
<a href="https://www.geeksforgeeks.org/">Welcome to <i>geeksforgeeks.com</i></a>
After decomposing:
None
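The None printed above is only the return value of decompose(); it does not show what happened to the tree. As a hedged sketch, the same markup can be used to remove just the nested <i> tag and then inspect the parent. The check on the decomposed property assumes Beautiful Soup 4.9.0 or later, where that property is available.

```python
from bs4 import BeautifulSoup

markup = '<a href="https://www.geeksforgeeks.org/">Welcome to <i>geeksforgeeks.com</i></a>'
soup = BeautifulSoup(markup, 'html.parser')

# Keep references to both tags before changing the tree
a_tag = soup.a
i_tag = soup.i

# Destroy only the nested <i> tag
i_tag.decompose()

# .decomposed (assumes bs4 >= 4.9.0) reports whether a tag was destroyed
i_destroyed = i_tag.decomposed
a_destroyed = a_tag.decomposed
print(i_destroyed)
print(a_destroyed)

# The parent <a> tag survives without the <i> element
print(a_tag)
```

A decomposed tag should not be used for anything afterwards, so checking the property is mainly useful when a reference to the tag is still held elsewhere in the script.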
Example 2: Fetching an HTML document from a given URL and decomposing it.
# import module
from bs4 import BeautifulSoup
import requests
# fetch the page at the given URL
# and parse its HTML
url = 'https://www.geeksforgeeks.org/'
reqs = requests.get(url)
soup = BeautifulSoup(reqs.text, 'html.parser')
# Before decomposing
print("Before Decomposing")
print(soup)
# decompose the entire document;
# decompose() returns None
result = soup.decompose()
print("After decomposing:")
print(result)
Output:
Before Decomposing
<!DOCTYPE html>
<!--[if IE 7]>
<html class="ie ie7" lang="en-US" prefix="og: http://ogp.me/ns#">
<![endif]-->
<!--[if IE 8]>
<html class="ie ie8" lang="en-US" prefix="og: http://ogp.me/ns#">
<![endif]-->
<!--[if !(IE 7) | !(IE 8) ]><!-->
<html lang="en-US" prefix="og: http://ogp.me/ns#">
<!--<![endif]-->
<head>
<meta charset="utf-8"/>..
......
After decomposing:
None
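Decomposing the whole soup, as above, discards the entire document. In practice decompose() is more often applied to selected tags. The following is a sketch under stated assumptions: the inline HTML string is a hypothetical stand-in for a downloaded page (so no network access is needed), and removing script and style tags is just one possible use.

```python
from bs4 import BeautifulSoup

# Hypothetical inline document standing in for a downloaded page
html = """<html><head><title>Demo</title><style>p {color: red;}</style></head>
<body><p>Visible text</p><script>console.log("hidden");</script></body></html>"""

soup = BeautifulSoup(html, 'html.parser')

# find_all() accepts a list of tag names; destroy each match in place
for tag in soup.find_all(['script', 'style']):
    tag.decompose()

# Only the human-readable text is left in the tree
text = soup.get_text(separator=' ', strip=True)
print(text)
```

The same loop should work on a soup built from reqs.text in Example 2, since decompose() behaves the same way regardless of where the markup came from.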