Web scraping financial websites can be a tricky and possibly legally questionable activity. So, this post is strictly hypothetical and for educational purposes only.
Ever wanted to keep tabs on your account balance without the hassle of logging into the netbanking site every single time? Here’s something tech-savvy to scrape your HDFC netbanking balance and feed it directly into a homescreen widget using KWGT.
Because sharing your bank login is the hottest new trend—totally safe, right? Just assume your bank has an invisible force field protecting your info. What could possibly go wrong? YOU HAVE BEEN WARNED!
Prerequisites
Install the required dependencies:
- selenium and requests
```bash
pip install selenium requests
```
- geckodriver
Download Mozilla’s geckodriver (ChromeDriver also works if you switch the script to the Chrome driver classes) and place the binary somewhere the script can find it, in this case /home/nevin/hdfc/geckodriver.
- Environment Variables:
Set the HDFC_USER and HDFC_PASS environment variables on the OS so the credentials are not hardcoded in the script.
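If either variable is missing, `os.environ.get()` returns `None` and the script only fails later, deep inside Selenium. Below is a minimal sketch of checking the credentials up front. It assumes the same `HDFC_USER` and `HDFC_PASS` names the script reads; `require_env` is a hypothetical helper, not part of the original code.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, failing loudly if it is unset or empty.

    Hypothetical helper: the original script calls os.environ.get() inline instead.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Inside `main()`, `os.environ.get("HDFC_USER")` would become `require_env("HDFC_USER")`, and likewise for `HDFC_PASS`. The variables must be set in the shell that launches the script, for example with `export HDFC_USER=...` in a POSIX shell.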
Process in Detail
This Python script automates logging into HDFC Bank’s NetBanking portal, retrieving the user’s account balance, and sending it to a webhook endpoint. It uses Selenium for web automation and requests for making HTTP POST requests.
- Imports:
  - `selenium.webdriver`: Automates the browser; this script drives Firefox.
  - `Service`: Points Selenium at the geckodriver binary.
  - `By`: Locator strategies for finding elements on the page (e.g., by name, ID, XPath).
  - `Options`: Configures the browser (e.g., headless mode, which runs the browser without a UI).
  - `Keys`: Sends keyboard input (like pressing the Enter key).
  - `WebDriverWait` and `EC`: Wait for elements (such as the login form or the balance) to load.
  - `requests`: Sends HTTP requests to the webhook.
  - `json`: Converts data to JSON for sending to the webhook.
  - `os`: Reads environment variables holding sensitive information (username, password).
  - `sleep`: Adds pauses between actions.
- The `send_to_web` function:
  - Creates a POST request to the specified `url`.
  - Converts the balance into JSON format.
  - Sends the data with a timeout of 5 seconds.
  - Prints the server’s response and handles exceptions (e.g., connection issues).
- The `main` function:
  - Browser Setup:
    - Configures Firefox to run in headless mode.
    - Defines the location of the geckodriver binary (required to control Firefox).
  - Login Process:
    - The browser is opened, and the HDFC NetBanking URL is loaded.
    - Switches to the login frame (`switch_to.frame`), then locates the username field and enters the login ID (read from an environment variable).
    - After submitting the username, the script waits for the password field to become clickable, then enters the password.
  - Fetching Account Balance:
    - After logging in, it waits for the balance fields to become visible. As of September 2024, the balance is split into two elements, the integer part and the fraction part, so each is located separately.
    - The two parts are extracted as text, concatenated, and cleaned (commas removed).
  - Send the Balance:
    - The balance is sent to the webhook via the `send_to_web` function.
  - Logout Process:
    - After a 15-second delay, the script clicks the logout button and confirms the action.
- Error Handling:
  - `try`/`except` blocks catch errors during browser interaction, network requests, or element lookups, and print details about the error if one occurs.
  - Selenium also raises more specific exceptions (`TimeoutException` or `ElementNotVisibleException`, for example) that you can learn about and handle like a pro, but that is something for later.
- Process Cleanup:
  - Whether or not an error occurs, the `finally` block closes the browser with `driver.quit()`, so Firefox and geckodriver processes are not left running and holding memory.
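The cleaning step from "Fetching Account Balance" (joining the two spans and removing commas) can be pulled out into a small function. This is a sketch under assumptions: the integer span is taken to look like `1,23,456` and the fraction span like `.78` or `78`, since the page markup may change. `normalize_balance` is a hypothetical name, and parsing into `Decimal` (rather than `float`) is an addition that keeps paise exact and rejects unexpected text early.

```python
from decimal import Decimal, InvalidOperation


def normalize_balance(integer_text: str, fraction_text: str) -> Decimal:
    """Combine the integer and fraction balance spans into an exact Decimal.

    Assumes commas in the integer span are only digit-group separators and
    that the fraction span is either ".78" or "78" (hypothetical formats).
    """
    integer_digits = integer_text.strip().replace(",", "")
    # Tolerate the dot being rendered with either span
    integer_digits = integer_digits.rstrip(".")
    fraction_digits = fraction_text.strip().lstrip(".")
    combined = f"{integer_digits}.{fraction_digits}" if fraction_digits else integer_digits
    try:
        return Decimal(combined)
    except InvalidOperation as exc:
        raise ValueError(
            f"Unrecognised balance text: {integer_text!r} {fraction_text!r}"
        ) from exc


if __name__ == "__main__":
    print(normalize_balance("1,23,456", ".78"))  # 123456.78
```

For well-formed input, passing `str(normalize_balance(integer_value, fraction_value))` to `send_to_web` yields the same string the script sends today, while malformed text raises a `ValueError` instead of being posted to the webhook.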
Code
```python
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import requests
import json
from time import sleep
import os


def send_to_web(balance):
    url = 'https://your-webserver-endpoint/hdfc-balance'
    headers = {'Content-Type': 'application/json'}
    try:
        payload = {'balance': balance}
        print(payload)
        response = requests.post(url, headers=headers, data=json.dumps(payload), timeout=5)
        print(response.headers)
        print(response.status_code)
        # Print the raw body; response.json() would raise if the endpoint replies with non-JSON
        print(response.text)
    except Exception as e:
        print(f"Error sending to {url} : {e}")


def main():
    # Read credentials first: a missing variable raises KeyError here
    # instead of passing None to send_keys() later
    hdfc_user = os.environ["HDFC_USER"]
    hdfc_pass = os.environ["HDFC_PASS"]

    firefox_options = Options()
    firefox_options.add_argument("--headless")
    gecko_driver_path = "/home/nevin/hdfc/geckodriver"
    service = Service(gecko_driver_path)
    driver = None
    try:
        driver = webdriver.Firefox(service=service, options=firefox_options)
        driver.get("https://netbanking.hdfcbank.com/netbanking/")
        driver.implicitly_wait(2)

        # Switch to the login frame and enter the username
        driver.switch_to.frame("login_page")
        driver.find_element(By.NAME, "fldLoginUserId").send_keys(hdfc_user, Keys.ENTER)
        driver.switch_to.default_content()

        # Wait for the password field to show up and send the passphrase
        pass_element = WebDriverWait(driver, 30).until(
            EC.element_to_be_clickable((By.XPATH, "//input[@id='keyboard']"))
        )
        pass_element.send_keys(hdfc_pass, Keys.ENTER)

        # Wait for the balance fields; the integer and fraction parts
        # are separate elements as of September 2024
        integer_part = WebDriverWait(driver, 30).until(
            EC.visibility_of_element_located(
                (By.XPATH, "//span[@ng-style=\"{'font-size': '22px'}\"]")
            )
        )
        fraction_part = WebDriverWait(driver, 30).until(
            EC.visibility_of_element_located(
                (By.XPATH, "//span[@ng-style=\"{'font-size': '16px'}\"]")
            )
        )

        # Extract the balance
        integer_value = integer_part.text
        fraction_value = fraction_part.text
        print(f"Integer part: {integer_value}")
        print(f"Fraction part: {fraction_value}")

        # Strip the comma separators so KWGT can treat the value as a number
        balance_to_webhook = f"{integer_value}{fraction_value}".replace(",", "")
        send_to_web(balance_to_webhook)

        # Log out
        sleep(15)
        logout_button = WebDriverWait(driver, 30).until(
            EC.element_to_be_clickable(
                (By.XPATH, "//button[@class='btn btn-primary login-btn']")
            )
        )
        logout_button.click()

        # Confirm the "yes, log out" prompt
        yes_button = WebDriverWait(driver, 30).until(
            EC.element_to_be_clickable((By.XPATH, "//a[contains(@class, 'yes-btn')]"))
        )
        yes_button.click()
        sleep(5)
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Always quit the driver so Firefox and geckodriver do not keep running
        if driver:
            driver.quit()


if __name__ == "__main__":
    main()
```
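Handling Selenium’s specific exceptions was left for later above; one common use for them is retrying only a flaky step, such as a slow page load, a few times before giving up. Below is a minimal, library-agnostic sketch. `retry` is a hypothetical helper, and with Selenium you would pass `exceptions=(TimeoutException,)`, imported from `selenium.common.exceptions`, instead of the generic default.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    attempts: int = 3,
    delay: float = 2.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call func() up to `attempts` times, sleeping `delay` seconds between tries.

    Only the exception types in `exceptions` trigger a retry; anything else
    propagates immediately. The last error is re-raised once attempts run out.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as exc:
            if attempt == attempts:
                raise
            print(f"Attempt {attempt}/{attempts} failed: {exc}; retrying in {delay}s")
            time.sleep(delay)
    raise AssertionError("unreachable")
```

For example, `retry(lambda: WebDriverWait(driver, 30).until(EC.visibility_of_element_located(locator)), exceptions=(TimeoutException,))` retries the balance lookup, where `locator` stands for one of the XPath tuples in the script. Wrapping `send_to_web` would have no effect, because it already catches its own exceptions.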