Controlling the Web Browser with Python

Last Updated : 23 Sep, 2021

In this article, we will see how to control a web browser from Python using Selenium. Selenium is an open-source tool that automates web browsers. It provides a single interface for writing test scripts in languages such as Ruby, Java, NodeJS, PHP, Perl, Python, and C#.

To install the module, run this command in your terminal:

pip install selenium
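
Selenium's Python API differs between releases, so it can help to confirm which version pip actually installed. The following is a minimal sketch using only the standard library (Python 3.8+); the helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if Selenium is installed, otherwise None
print(installed_version("selenium"))
```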

For automation, download the latest Google Chrome along with the chromedriver build that matches its version.

Here we will automate the login at "https://auth.geeksforgeeks.org" and extract the Name, Email, and Institute name from the logged-in profile.

Initialization and Authorization

First, initiate the web driver using Selenium and send a GET request to the URL. Then inspect the HTML document to find the input tags that accept the username/email and password, and the button tag used to sign in.

To send the user-supplied email and password to their respective input tags:

from selenium.webdriver.common.by import By

driver.find_element(By.NAME, 'user').send_keys(email)
driver.find_element(By.NAME, 'pass').send_keys(password)
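
The email and password have to come from somewhere, and keeping them out of the script avoids committing them by accident. One possible approach, sketched here, reads them from environment variables. The variable names `GFG_EMAIL` and `GFG_PASSWORD` and the helper `load_credentials` are assumptions made for this example, not something the site or Selenium defines.

```python
import os


def load_credentials(env=os.environ):
    """Return (email, password) from the given mapping, or raise if either is missing."""
    try:
        # GFG_EMAIL and GFG_PASSWORD are hypothetical names chosen for this sketch
        return env["GFG_EMAIL"], env["GFG_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None


# Example with an explicit mapping instead of the real environment
email, password = load_credentials(
    {"GFG_EMAIL": "example@example.com", "GFG_PASSWORD": "password"}
)
print(email)
```

Calling `load_credentials()` with no argument reads the real environment, and the resulting `email` and `password` can then be passed to `send_keys`.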

Locate the sign-in button with a CSS selector and click it:

driver.find_element(By.CSS_SELECTOR, 'button.btn.btn-green.signin-button').click()

Scraping Data

Scraping Basic Information from GFG Profile

After clicking on Sign in, a new page should be loaded containing the Name, Institute Name, and Email id. Identify the tags containing the above data and select them.

container = driver.find_elements(By.CSS_SELECTOR, 'div.mdl-cell.mdl-cell--9-col.mdl-cell--12-col-phone.textBold')

Get the text from each element in the returned list:

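from selenium.common.exceptions import NoSuchElementException

name = container[0].text
try:
    # The institution may be wrapped in a link
    institution = container[1].find_element(By.CSS_SELECTOR, 'a').text
except NoSuchElementException:
    # No link present, so read the cell text directly
    institution = container[1].text
email_id = container[2].text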

Finally, print the output:

print({"Name": name, "Institution": institution, "Email ID": email_id})
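
Indexing `container[0]` to `container[2]` assumes the profile page loaded and that at least three matching elements were found. If the login failed, the list may be shorter and the script stops with an `IndexError`. A minimal sketch of a more defensive version follows. It works on strings that have already been extracted, and the helper name `build_profile` and the sample values are made up for illustration.

```python
def build_profile(texts):
    """Map the first three scraped strings to named profile fields."""
    if len(texts) < 3:
        raise ValueError(f"expected at least 3 profile fields, found {len(texts)}")
    name, institution, email_id = (t.strip() for t in texts[:3])
    return {"Name": name, "Institution": institution, "Email ID": email_id}


# Stand-in values; with Selenium these could be [el.text for el in container]
sample = ["Jane Doe", " Example University ", "jane@example.com"]
print(build_profile(sample))
```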

Scraping Information from Practice tab

Click on the Practice tab and wait a few seconds for the page to load.

driver.find_elements(By.CSS_SELECTOR, 'a.mdl-navigation__link')[1].click()
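
A fixed pause of a few seconds is either longer than needed or too short on a slow connection. An alternative is to poll until the content appears. The sketch below shows the idea in plain Python; `wait_until` is a hypothetical helper written for this example. Selenium ships its own explicit-wait helper, `WebDriverWait`, which works on the same principle.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Stand-in condition that succeeds on the third call. With Selenium this could be
# lambda: driver.find_elements(By.CSS_SELECTOR, 'div.userMainDiv')
# because find_elements returns an empty (falsy) list while nothing matches.
calls = []


def ready():
    calls.append(1)
    return len(calls) >= 3


wait_until(ready, timeout=2.0, poll=0.01)
print(len(calls))  # 3
```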

Find the container that holds all the information, then select the grids inside it using a CSS selector.

container = driver.find_element(By.CSS_SELECTOR, 'div.mdl-cell.mdl-cell--7-col.mdl-cell--12-col-phone.whiteBgColor.mdl-shadow--2dp.userMainDiv')

grids = container.find_elements(By.CSS_SELECTOR, 'div.mdl-grid')

Iterate over the selected grids, extract the text from each, and add it to a set/list for output.

res = set()
for grid in grids:
    res.add(grid.text.replace('\n',':'))
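
To see what the `replace` call does and how the choice between a set and a list affects the result, here is a small sketch on stand-in strings. The labels and numbers are invented for illustration and are not real profile data; `flatten_grid_text` is a hypothetical helper name.

```python
def flatten_grid_text(text):
    """Join the lines of one grid's text with ':' so 'label\\nvalue' becomes 'label:value'."""
    return text.replace('\n', ':')


# Stand-in strings playing the role of grid.text
sample_texts = ["Problems Solved\n42", "Coding Score\n120", "Problems Solved\n42"]

# A set collapses duplicates and does not guarantee any order
as_set = {flatten_grid_text(t) for t in sample_texts}
# A list keeps page order and keeps duplicates
as_list = [flatten_grid_text(t) for t in sample_texts]

print(sorted(as_set))  # ['Coding Score:120', 'Problems Solved:42']
print(as_list)         # ['Problems Solved:42', 'Coding Score:120', 'Problems Solved:42']
```

A list is the better fit when the order of the entries on the page matters; a set is enough when only the distinct values are needed.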

Below is the full implementation:

Python3

# Import the required modules
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Main Function
if __name__ == '__main__':

    # Provide the email and password
    email = 'example@example.com'
    password = 'password'

    options = webdriver.ChromeOptions()
    options.add_argument("--start-maximized")
    options.add_argument('--log-level=3')

    # Provide the path of chromedriver present on your system.
    service = Service(executable_path="C:/chromedriver/chromedriver.exe")
    driver = webdriver.Chrome(service=service, options=options)
    driver.set_window_size(1920, 1080)

    # Send a get request to the url
    driver.get('https://auth.geeksforgeeks.org/')
    time.sleep(5)

    # Find the input boxes by name in the DOM tree and
    # send the provided email and password to them.
    driver.find_element(By.NAME, 'user').send_keys(email)
    driver.find_element(By.NAME, 'pass').send_keys(password)

    # Find the signin button and click on it.
    driver.find_element(
        By.CSS_SELECTOR, 'button.btn.btn-green.signin-button').click()
    time.sleep(5)

    # Returns the list of elements
    # matching the following css selector.
    container = driver.find_elements(
        By.CSS_SELECTOR,
        'div.mdl-cell.mdl-cell--9-col.mdl-cell--12-col-phone.textBold')

    # Extract the text of the name,
    # institution and email_id elements.
    name = container[0].text
    try:
        # The institution may be wrapped in a link
        institution = container[1].find_element(By.CSS_SELECTOR, 'a').text
    except NoSuchElementException:
        # No link present, so read the cell text directly
        institution = container[1].text
    email_id = container[2].text

    # Output Example 1
    print("Basic Info")
    print({"Name": name,
           "Institution": institution,
           "Email ID": email_id})

    # Click on the Practice tab
    driver.find_elements(
        By.CSS_SELECTOR, 'a.mdl-navigation__link')[1].click()
    time.sleep(5)

    # Select the container holding the information.
    # Adjacent string literals are joined without adding whitespace.
    container = driver.find_element(
        By.CSS_SELECTOR,
        'div.mdl-cell.mdl-cell--7-col.mdl-cell--12-col-phone.'
        'whiteBgColor.mdl-shadow--2dp.userMainDiv')

    # Select the grids inside the container
    grids = container.find_elements(By.CSS_SELECTOR, 'div.mdl-grid')

    # Iterate over each grid and add the text extracted from it.
    res = set()
    for grid in grids:
        res.add(grid.text.replace('\n', ':'))

    # Output Example 2
    print("Practice Info")
    print(res)

    # Quit the driver; this closes every window and ends the session
    driver.quit()
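
If any step in the script raises an exception, the final call that quits the driver never runs and the browser window stays open. One way to guard against that, sketched below, is a small context manager that always quits the driver. The names `managed_driver` and `FakeDriver` are made up for this example; `FakeDriver` only stands in for a real WebDriver so the sketch runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and always quit it afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()


class FakeDriver:
    """Stand-in for a real WebDriver; records whether quit() was called."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


fake = FakeDriver()
try:
    with managed_driver(lambda: fake) as driver:
        raise RuntimeError("simulated scraping failure")
except RuntimeError:
    pass

print(fake.closed)  # True
```

With Selenium, the factory could be something like `lambda: webdriver.Chrome(options=options)`, and the login and scraping steps would go inside the `with` block.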