Featured image of post Monitor Websites for Changes with Python and Playwright

Monitor Websites for Changes with Python and Playwright

Webscraping with intelligent change dectection

How to Monitor Websites for Changes Like a Cyber Detective

So, youโ€™ve got a bunch of websites you want to keep an eye on.

Maybe itโ€™s for price tracking, competitor analysis, or youโ€™re just super nosy (hey, no judgment).

The problem? Websites changeโ€”sometimes silently, without you even knowing.

And how to sense WHAT changed? You wonder.. well so do I.. so keep reading…

SIDENOTE: This technique is also great for simple automated smoke testing of website projects.. :)


๐Ÿš€ The Game Plan

To monitor websites, we need to:

  1. Fetch the webpage
  2. Detect changes
  3. Notify a human (because machines still need usโ€ฆ for now).

There are a bunch of ways to do this. Letโ€™s break them down, from the quick and dirty to the fully automated magic.


๐Ÿ•ต๏ธโ€โ™‚๏ธ 1. Web Scraping + Change Detection

If the website doesnโ€™t require a login, this is the easiest way.

๐Ÿ”ง How It Works

  1. Grab the webpageโ€™s HTML using Pythonโ€™s requests library.
  2. Parse it with BeautifulSoup to extract the juicy parts.
  3. Compare old vs. new results to detect changes.

๐Ÿ“ Code Example

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
import requests
from bs4 import BeautifulSoup

# The URL you want to monitor
url = "https://example.com/search-results"

# Fetch the page
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# Extract the search results (adjust selector based on the site)
results = soup.find_all("div", class_="result")

# Save or compare with previous results...
print(results)

๐Ÿ’ก Pros:
โœ… Fast
โœ… Simple
โœ… No browser needed

โš ๏ธ Cons:
โŒ Wonโ€™t work for JavaScript-heavy sites
โŒ Wonโ€™t work for sites that need a login


๐Ÿ” 2. What If the Site Requires Login?

If the website needs a login, we have to be sneaky. Here are your options:

๐ŸŽ๏ธ Option 1: Use requests.Session for Simple Logins

Some sites let you log in with plain old form data. If so, we can log in programmatically and scrape pages like a ninja.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
import requests

session = requests.Session()

# Login URL
login_url = "https://example.com/login"

# Your login data
payload = {
    "username": "your_username",
    "password": "your_password"
}

# Log in
session.post(login_url, data=payload)

# Now scrape the search results page
response = session.get("https://example.com/search-results")
print(response.text)

โœ… Works for simple login forms
โŒ Fails if the site uses JavaScript for authentication


๐Ÿ–ฅ๏ธ 3. Use Your Existing Chrome Login (The Easiest Hack!)

Letโ€™s say youโ€™re already logged into the site in Chrome.

Can we justโ€ฆ use that session?

Yes, we can. ๐ŸŽฉ

Option 1: Selenium with Your Chrome Profile

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_profile_path = "C:/Users/YourUser/AppData/Local/Google/Chrome/User Data"

chrome_options = Options()
chrome_options.add_argument(f"user-data-dir={chrome_profile_path}")  
chrome_options.add_argument("profile-directory=Default")  

driver = webdriver.Chrome(options=chrome_options)
driver.get("https://example.com/search-results")

# Scrape the page...
print(driver.page_source)

driver.quit()

๐Ÿ’ก Why is this cool?

  • โœ… Uses your already logged-in session.
  • โœ… Works with JavaScript-heavy sites.
  • โœ… No need to log in every time.

โš ๏ธ Downsides?

  • โŒ Requires ChromeDriver installed.
  • โŒ Slower than requests.

โšก 4. The Playwright Alternative (Faster Than Selenium)

Selenium is great, but Playwright is faster and more stable.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

chrome_profile_path = "C:/Users/YourUser/AppData/Local/Google/Chrome/User Data"

with sync_playwright() as p:
    browser = p.chromium.launch_persistent_context(
        user_data_dir=chrome_profile_path,
        headless=False
    )
    page = browser.new_page()
    
    page.goto("https://example.com/search-results")

    soup = BeautifulSoup(page.content(), "html.parser")
    print(soup.prettify())

    browser.close()

โœ… Much faster than Selenium
โœ… Uses your existing Chrome session
โœ… Handles JavaScript


๐Ÿ”” How to Detect Changes & Alert a Human

Once you scrape the site, how do you detect changes?

Option 1: Compare Old vs. New Results

1
2
3
4
5
6
old_results = open("previous_results.txt").read()
new_results = str(soup)

if old_results != new_results:
    print("โš ๏ธ ALERT! Something changed!")
    open("previous_results.txt", "w").write(new_results)

Option 2: Git-Based Tracking

  • Store search results in text files.
  • Use Git to track changes.
  • When Git detects a change, it triggers an email or Slack alert.

How Does BeautifulSoup work?

1
soup = BeautifulSoup(page.content(), "html.parser")

Breaking It Down:

  1. page.content()

    • Retrieves the HTML content of the webpage from Playwright.
    • If using requests, you’d use response.text instead.
  2. BeautifulSoup(..., "html.parser")

    • BeautifulSoup is a library for parsing HTML and XML.
    • "html.parser" tells BeautifulSoup to use Pythonโ€™s built-in HTML parser, which is fast and doesnโ€™t require extra dependencies.

What It Does

  • It converts the raw HTML from the webpage into a structured, searchable format.
  • Now, you can easily extract data from the page using soup.find(), soup.select(), etc.

Example Usage

Before Parsing (Raw HTML)

1
2
3
4
5
6
<html>
  <body>
    <h1>Hello, World!</h1>
    <p class="message">This is a test.</p>
  </body>
</html>

After Parsing (Using BeautifulSoup)

1
2
3
4
5
6
7
8
from bs4 import BeautifulSoup

html = "<html><body><h1>Hello, World!</h1><p class='message'>This is a test.</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Extract text
print(soup.h1.text)  # Output: Hello, World!
print(soup.p.text)   # Output: This is a test.

๐Ÿš€ Key Ideas

๐Ÿ† If the site is simple โ†’ requests + BeautifulSoup

๐ŸŽ๏ธ If the site has a login โ†’ requests.Session or use Chrome cookies

๐ŸŽญ If the site is JavaScript-heavy โ†’ Selenium or Playwright with Chrome Profile

๐Ÿ“ฃ If you need alerts โ†’ Store results & compare old vs. new