How to Import BeautifulSoup in Python
Beautiful Soup is a Python library used for parsing HTML and XML documents. It provides a simple way to navigate, search, and modify the parse tree, making it valuable for web scraping tasks. In this article, we will explore how to import BeautifulSoup in Python.
What is BeautifulSoup?
BeautifulSoup builds a parse tree from an HTML or XML document, which lets you navigate and manipulate the elements within it. The library is commonly used for web scraping, where data needs to be extracted from web pages. It handles malformed HTML well and provides a convenient interface for extracting data by searching for tags, attributes, and more complex patterns within the document structure.
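As a rough sketch of what searching by tag and by attribute looks like, the snippet below parses a small inline HTML string. The markup, the class name intro, and the example.com URLs are invented for illustration, and the snippet assumes beautifulsoup4 is already installed (installation is covered in the steps that follow).

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this illustration only.
html = """
<div>
  <p class="intro">Hello</p>
  <p>World</p>
  <a href="https://example.com/a">A</a>
  <a href="https://example.com/b">B</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Search by tag name: every <p> element, in document order.
paragraphs = [p.get_text() for p in soup.find_all("p")]

# Search by attribute: the first <p> whose class is "intro".
intro = soup.find("p", class_="intro").get_text()

# Read an attribute value from each matching tag.
links = [a["href"] for a in soup.find_all("a")]

print(paragraphs)
print(intro)
print(links)
```

Here find returns only the first match (or None), while find_all returns every match as a list.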
Import BeautifulSoup in Python
Below, we will explain step-by-step how to import BeautifulSoup in Python.
Create a Virtual Environment
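Open VSCode and navigate to the directory where you want to work. In the VSCode terminal, create a virtual environment with python -m venv venv (the folder name venv is only a convention), then activate it: run venv\Scripts\activate on Windows or source venv/bin/activate on macOS and Linux.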
Install BeautifulSoup Library
With the virtual environment activated, install BeautifulSoup using pip:
pip install beautifulsoup4

Import BeautifulSoup in Python Script
Once installed, you can import BeautifulSoup into your Python script or interactive session.
from bs4 import BeautifulSoup
Verify BeautifulSoup Import
Now we will write a simple Python script to verify that the BeautifulSoup module imports successfully. Let us see a few examples of using the module.
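Before running the longer examples, a quick check along the following lines can confirm that the package is visible to the current interpreter. This is only a minimal sketch: the helper name is_installed is hypothetical, and it relies on the standard library's importlib.util.find_spec to look the module up.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("bs4"):
    import bs4

    print("BeautifulSoup is available, version", bs4.__version__)
else:
    print("bs4 not found; run 'pip install beautifulsoup4' first")
```

If the second message appears inside VSCode, the likely cause is that the terminal or the selected interpreter is not the virtual environment where the package was installed.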
Extracting Text from HTML
In this example, we use requests to fetch the HTML content of a webpage about Python from GeeksforGeeks. Using BeautifulSoup, we parse this HTML and demonstrate accessing and printing the title tag of the webpage, its tag name (title), and the name of its parent tag (head).
# import modules
import requests
from bs4 import BeautifulSoup
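# request the URL (the timeout avoids hanging on a slow server)
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/', timeout=10)
# raise an exception if the server returned an HTTP error status
r.raise_for_status()
# parse the HTML content
soup = BeautifulSoup(r.content, 'html.parser')
# print the title tag, its tag name, and its parent's tag name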
print(soup.title)
print(soup.title.name)
print(soup.title.parent.name)
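Output:
The first line is the page's complete <title> tag, so its text depends on the live page. The second and third lines are title and head.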

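The attribute chain in the script above assumes the page actually has a <title> tag. soup.title is None when no such tag exists, so soup.title.name would then raise an AttributeError. A minimal defensive sketch, using a hypothetical helper named page_title and inline HTML strings instead of a network request:

```python
from typing import Optional

from bs4 import BeautifulSoup


def page_title(html: str) -> Optional[str]:
    """Return the stripped <title> text, or None if the document has no <title>."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return None
    return soup.title.get_text(strip=True)


print(page_title("<html><head><title> Demo </title></head></html>"))
print(page_title("<html><body><p>No title here</p></body></html>"))
```

The first call prints Demo and the second prints None, so calling code can branch on the result instead of crashing.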
Parsing HTML String
In this example, we are parsing a simple HTML string (html_doc) using BeautifulSoup. It extracts the title, the text of the first paragraph (<p>), and all items in a list (<ul>).
# import module
from bs4 import BeautifulSoup
# HTML document
html_doc = """
<html>
<head>
<title>GeeksforGeeks</title>
</head>
<body>
<h1>Welcome to GFG!</h1>
<p>This is BeautifulSoup Example</p>
<ul>
<li>Python</li>
<li>Java</li>
<li>C++</li>
</ul>
</body>
</html>
"""
# parsing HTML document
soup = BeautifulSoup(html_doc, 'html.parser')
# extracting details
title = soup.title
print(f"Title: {title.text}")
paragraph = soup.p
print(f"First Paragraph: {paragraph.text}")
list_items = soup.find_all('li')
print("List Items:")
for item in list_items:
    print(item.text)
Output:
Title: GeeksforGeeks
First Paragraph: This is BeautifulSoup Example
List Items:
Python
Java
C++

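find_all is not the only way to collect the list items. Beautiful Soup also accepts CSS selectors through select and select_one, which can be convenient when the target depends on nesting. The sketch below reuses a shortened copy of html_doc; the selector ul > li is just one possible choice.

```python
from bs4 import BeautifulSoup

# Shortened copy of the html_doc string from the example above.
html_doc = """
<html>
<head><title>GeeksforGeeks</title></head>
<body>
<h1>Welcome to GFG!</h1>
<ul>
<li>Python</li>
<li>Java</li>
<li>C++</li>
</ul>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# CSS selector: every <li> that is a direct child of a <ul>.
languages = [li.get_text() for li in soup.select("ul > li")]

# select_one returns the first match, or None when nothing matches.
heading = soup.select_one("h1").get_text()

print(languages)
print(heading)
```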
Conclusion
In conclusion, importing BeautifulSoup in Python is a simple process that begins with creating a virtual environment and installing the library using pip. Once imported, BeautifulSoup allows efficient parsing, navigation, and extraction of data from HTML and XML documents, making it a useful tool for web scraping tasks and beyond.