Scrape HDFC Netbanking to get balance details

Web scraping financial websites can be a tricky and possibly legally questionable activity. So, this post is strictly hypothetical and for educational purposes only.

Ever wanted to keep tabs on your account balance without the hassle of logging into the netbanking site every single time? Here’s a way to scrape your HDFC netbanking balance and feed it directly into a homescreen widget using KWGT.

Because sharing your bank login is the hottest new trend—totally safe, right? Just assume your bank has an invisible force field protecting your info. What could possibly go wrong? YOU HAVE BEEN WARNED!

Prerequisites

Install the required dependencies:

  • selenium
pip install selenium
  • geckodriver

Download Mozilla’s geckodriver and place the binary in a folder, in this case /home/nevin/hdfc/geckodriver. ChromeDriver will also work, but the script then needs the corresponding Chrome imports and options.

  • Environment Variables:

Set the relevant environment variables (HDFC_USER and HDFC_PASS) on the OS so that the credentials are not hardcoded in the script.
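
A minimal sketch of reading those credentials, assuming they live in the HDFC_USER and HDFC_PASS variables that the script reads. The require_env helper is a hypothetical name and not part of the original script; it fails early with a clear message instead of handing an empty value to the browser.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        # Covers both an unset variable and one set to an empty string
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Inside main(), the lookups would then read require_env("HDFC_USER") and require_env("HDFC_PASS").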

Process in Detail

This Python script automates logging into HDFC Bank’s NetBanking portal, retrieving the user’s account balance, and sending it to a webhook endpoint. It uses Selenium for web automation and requests for making HTTP POST requests.

  1. Imports:
    • selenium.webdriver: Automates the browser. Specifically, it uses Firefox in this case.
    • By: The set of locator strategies used to find elements on the webpage (e.g., by name, ID, XPath).
    • Options: Used to configure the browser (e.g., headless mode, which runs the browser without a UI).
    • Keys: Sends keyboard inputs (like pressing the Enter key).
    • WebDriverWait and EC: Handle waiting for elements to load (such as the login form or balance).
    • requests: Sends HTTP requests to the webhook.
    • json: Converts data to JSON format for sending to the webhook.
    • os: Accesses environment variables for sensitive information (username, password).
    • sleep: Adds pauses between actions.
  2. The send_to_web function:
    • Creates a POST request to the specified url.
    • Converts the balance into JSON format.
    • Sends the data with a timeout of 5 seconds.
    • Prints the server’s response and handles exceptions (e.g., connection issues).
  3. The main function:
    Browser Setup:
    • Configures Firefox to run in headless mode.
    • Defines the location of the GeckoDriver (required to control Firefox).
    Login Process:
    • The browser is opened, and the HDFC NetBanking URL is loaded.
    • Switches to the login frame (switch_to.frame), then locates the username field and enters the login ID (retrieved from environment variables).
    • After submitting the username, the script waits for the password field to appear, then enters the password.
    Fetching Account Balance:
    • After logging in, it waits for the balance fields to become visible. The balance is split into two parts: integer and fraction (rendered as separate elements as of September 2024).
    • The balance parts are extracted as text, combined, and cleaned (removing commas).
    Send the Balance:
    • The balance is sent to the webhook via the send_to_web function.
    Logout Process:
    • After a 15-second delay, the script clicks the logout button and confirms the action.
  4. Error Handling:
    try-except blocks are used to catch any errors during browser interaction, network requests, or element locators, providing details about the error if one occurs.

    Catching specific Selenium exceptions (TimeoutException or ElementNotVisibleException, for example) would make the error handling more precise, but that is something for later.

  5. Process Cleanup:
    Whether an error occurs or not, the browser is properly closed with driver.quit() to prevent memory leaks.
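
The balance clean-up and JSON conversion described in steps 2 and 3 can be sketched in isolation. This is an illustration under stated assumptions, not the script's exact code: combine_balance is a hypothetical helper, the Decimal check is an added safeguard, and the sample strings are made up, since the real text depends on how the portal renders the two spans.

```python
import json
from decimal import Decimal, InvalidOperation


def combine_balance(integer_text: str, fraction_text: str) -> str:
    """Join the two scraped parts and strip the thousands separators."""
    raw = f"{integer_text.strip()}{fraction_text.strip()}".replace(",", "")
    try:
        # Validate only; the string form is what gets sent to the webhook
        Decimal(raw)
    except InvalidOperation as exc:
        raise ValueError(f"Unexpected balance text: {raw!r}") from exc
    return raw


# Made-up sample values standing in for the scraped span text
balance = combine_balance("1,23,456", ".78")
payload = json.dumps({"balance": balance})
print(payload)  # {"balance": "123456.78"}
```

Validating before sending means a layout change on the portal surfaces as an error in the script rather than as a garbled number on the widget.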

Code

from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

import requests
import json
from time import sleep
import os

def send_to_web(balance):
    url = 'https://your-webserver-endpoint/hdfc-balance'
    headers = {'Content-Type': 'application/json'}
    try:
        payload = {'balance': balance}
        print(payload)
        response = requests.post(url, headers=headers, data=json.dumps(payload), timeout=5)
        print(response.headers)
        print(response.status_code)
        print(response.json())            

    except Exception as e:
        print(f"Error sending to {url} : {e}")

def main():
    firefox_options = Options()
    firefox_options.add_argument("--headless")
    gecko_driver_path = "/home/nevin/hdfc/geckodriver"
    service = Service(gecko_driver_path)

    driver = None
    try:
        driver = webdriver.Firefox(service=service, options=firefox_options)
        driver.get("https://netbanking.hdfcbank.com/netbanking/")
        driver.implicitly_wait(2)

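        # Switch to the login frame and enter the username.
        # os.environ[...] raises KeyError if the variable is missing,
        # instead of passing None to send_keys.
        driver.switch_to.frame("login_page")
        driver.find_element(By.NAME, "fldLoginUserId").send_keys(
            os.environ["HDFC_USER"], Keys.ENTER
        )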

        driver.switch_to.default_content()

        # Wait for password field to show up and send the passphrase
        pass_element = WebDriverWait(driver, 30).until(
            EC.element_to_be_clickable((By.XPATH, "//input[@id='keyboard']"))
        )
        pass_element.send_keys(os.environ["HDFC_PASS"], Keys.ENTER)

        # Wait for the balance fields; the integer and fraction parts are separate elements as of September 2024
        integer_part = WebDriverWait(driver, 30).until(
            EC.visibility_of_element_located(
                (By.XPATH, "//span[@ng-style=\"{'font-size': '22px'}\"]")
            )
        )
        fraction_part = WebDriverWait(driver, 30).until(
            EC.visibility_of_element_located(
                (By.XPATH, "//span[@ng-style=\"{'font-size': '16px'}\"]")
            )
        )

        # Extract balance
        integer_value = integer_part.text
        fraction_value = fraction_part.text

        print(f"Integer part: {integer_value}")
        print(f"Fraction part: {fraction_value}")

        # Prepare balance for webhook by replacing unnecessary chars so it can be manipulated as number on kwgt
        balance_to_webhook = f"{integer_value}{fraction_value}".replace(",", "")
        send_to_web(balance_to_webhook)

        #logout
        sleep(15)
        logout_button = WebDriverWait(driver, 30).until(
            EC.element_to_be_clickable(
                (By.XPATH, "//button[@class='btn btn-primary login-btn']")
            )
        )
        logout_button.click()

        #logout yes confirmation
        yes_button = WebDriverWait(driver, 30).until(
            EC.element_to_be_clickable((By.XPATH, "//a[contains(@class, 'yes-btn')]"))
        )
        yes_button.click()

        sleep(5)

    except Exception as e:
        print(f"An error occurred: {e}")

    finally:
        # Ensure the driver quits to prevent memory leaks
        if driver:
            driver.quit()

if __name__ == "__main__":
    main()

KWGT

[Image: HDFC bank balance displayed on a KWGT widget]

This post is licensed under CC BY 4.0 by the author.