So, you’ve got a website.
It’s not pretty, or perhaps not complex, but it does the job.
I use Hugo to build this blog, and it has a weird behavior: if the YAML in one of my Markdown files has something it doesn’t like, it prints a warning and keeps on going.
Many times I have pushed the site to the server, only to find pages missing. Maybe I edited the YAML for an article and missed a quote or something, and now that article is not on the site.
Maybe you have a blog, an admin panel, a dashboard, or just a place where your backend magic happens.
And maybe you, like me, just want to quickly check that the basic text on each page is still there, like it was last time.
In this situation, we don’t need a full-blown test suite with Selenium, headless browsers, and a million dependencies…
We just want to know:
- Do the pages load?
- Is the content still what you expect?
- Can I test this quickly without spending three days setting up a test framework?
Yes. Yes, you can.
Enter Playwright, a slick automation tool that lets you scrape, test, and interact with web pages with minimal fuss. We’ll write a dead-simple Python script to:
- Visit a webpage.
- Save its text content.
- Store the content in a directory.
- Compare it with a baseline using a simple diff command.
If something changes, you’ll know immediately. If everything’s the same, you can go back to sipping your coffee.
The Magic Script: SavePageText.py
Here’s the whole deal:
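Here is a minimal sketch of such a script. It assumes Playwright's synchronous Python API with headless Chromium (`pip install playwright`, then `playwright install chromium`). It follows the behavior described under How It Works. The URL-to-filename rule and the timestamp format are assumptions, chosen so that https://example.com becomes example_com.txt. They are not necessarily what the original script does.

```python
"""SavePageText.py: a minimal sketch (assumes `pip install playwright`
followed by `playwright install chromium`)."""
import re
import sys
from datetime import datetime
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def url_to_filename(url: str) -> str:
    """Assumed naming rule: host + path, runs of non-alphanumerics become "_".

    https://example.com -> example_com.txt
    """
    parsed = urlparse(url)
    slug = re.sub(r"[^A-Za-z0-9]+", "_", parsed.netloc + parsed.path).strip("_")
    return f"{slug or 'page'}.txt"


def resolve_output_dir(subdir: Optional[str] = None, root: str = "testing") -> Path:
    """Return ./testing/{subdir}/, or a date-and-time folder if no subdir is given."""
    name = subdir or datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    return Path(root) / name


def save_page_text(url: str, subdir: Optional[str] = None) -> Path:
    """Open the page in headless Chromium and save its visible text."""
    # Imported here so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    out_dir = resolve_output_dir(subdir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / url_to_filename(url)

    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless by default
        page = browser.new_page()
        page.goto(url)  # waits for the "load" event by default
        text = page.inner_text("body")  # rendered text, no HTML markup
        browser.close()

    out_file.write_text(text, encoding="utf-8")
    return out_file


if __name__ == "__main__":
    if len(sys.argv) in (2, 3):
        subdir_arg = sys.argv[2] if len(sys.argv) == 3 else None
        print(f"Saved {save_page_text(sys.argv[1], subdir_arg)}")
    else:
        print("usage: python SavePageText.py URL [SUBDIR]")
```

Saving `inner_text("body")` rather than the raw HTML keeps the snapshot readable in a diff. Markup or script changes that don't alter the visible text won't show up as noise.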
How It Works
- Run it with a URL and (optionally) a subdirectory name.
- It grabs the text from the page and saves it in ./testing/{subdir}/.
- If you don’t specify a subdirectory, it makes one using the current date and time.
Running the Script
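Run it from a terminal (Command Prompt or PowerShell on Windows, your shell of choice on Linux). Pass it the page’s URL, and use base as the subdirectory name for your first snapshot.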
It saves the page’s text in ./testing/base/example_com.txt. Run it for all your important pages, and you now have a snapshot of what they should look like.
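To snapshot a whole list of pages in one go, a small driver can call the script once per URL. This is a hypothetical helper, snapshot_pages.py, with a made-up PAGES list. It assumes SavePageText.py sits in the current directory and takes the URL and subdirectory as positional arguments.

```python
"""snapshot_pages.py: hypothetical driver that runs SavePageText.py per URL."""
import subprocess
import sys

# Hypothetical list: replace with the pages you care about.
PAGES = [
    "https://example.com",
    "https://example.com/about/",
]


def snapshot_commands(urls, subdir, script="SavePageText.py"):
    """Build one command per URL, all writing into the same subdirectory."""
    return [[sys.executable, script, url, subdir] for url in urls]


def snapshot_all(urls, subdir):
    """Run the script for every URL; check=True stops at the first failure."""
    for cmd in snapshot_commands(urls, subdir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        snapshot_all(PAGES, sys.argv[1])
    else:
        print("usage: python snapshot_pages.py SUBDIR")
```

Passing one explicit subdirectory name matters here. If each call fell back to its own timestamp, pages from a single run could land in different folders. Run `python snapshot_pages.py base` once for the baseline, then give it a new name each time you want to check.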
Checking for Changes
Later, run the script again with a different subdirectory (or let it default to a timestamped one). Then, compare the results:
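On Windows, use fc to compare each file in ./testing/base/ with its counterpart in the new subdirectory. On Linux, diff does the same job, and with -r it can compare the two subdirectories in one go.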
If there’s a difference, you’ll see it. If everything matches, diff prints nothing and fc reports "FC: no differences encountered", so all is well.
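If you’d rather use one command on both operating systems, Python’s standard difflib module can do the comparison. This sketch uses a hypothetical compare_snapshots helper and assumes the .txt files sit directly inside each subdirectory. It prints a unified diff for every changed page. It also flags pages that exist in only one snapshot, which is exactly the missing-article case.

```python
"""compare_snapshots.py: hypothetical cross-platform alternative to fc/diff."""
import difflib
import sys
from pathlib import Path


def compare_snapshots(base_dir, new_dir):
    """Return diff lines for every .txt file that changed or is missing."""
    base_dir, new_dir = Path(base_dir), Path(new_dir)
    names = sorted({p.name for p in base_dir.glob("*.txt")}
                   | {p.name for p in new_dir.glob("*.txt")})
    output = []
    for name in names:
        old_file, new_file = base_dir / name, new_dir / name
        if not old_file.exists() or not new_file.exists():
            # A page present in only one snapshot is reported as a change.
            missing = new_file if old_file.exists() else old_file
            output.append(f"Missing: {missing}")
            continue
        old = old_file.read_text(encoding="utf-8").splitlines()
        new = new_file.read_text(encoding="utf-8").splitlines()
        # unified_diff yields nothing when the two files are identical.
        output.extend(difflib.unified_diff(
            old, new, fromfile=str(old_file), tofile=str(new_file), lineterm=""))
    return output


if __name__ == "__main__":
    if len(sys.argv) == 3:
        changes = compare_snapshots(sys.argv[1], sys.argv[2])
        print("\n".join(changes) if changes else "No changes.")
        sys.exit(1 if changes else 0)
    print("usage: python compare_snapshots.py BASE_DIR NEW_DIR")
```

Because it exits with status 1 when anything changed, it could also sit in a deploy script and stop a push that would drop pages.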
Why This is Awesome
- No need for complex test frameworks.
- No waiting for UI elements to load.
- No need for an entire Selenium setup.
- Just a quick way to make sure your site isn’t broken.
It’s not fancy, but it gets the job done.
Happy hacking! 🚀
Key Ideas
Concept | Explanation |
---|---|
Playwright for Testing | Using Playwright to grab webpage text |
Simple Web Scraping | Extracting page content without a full test framework |
Quick Snapshot | Saving web page content for later comparison |
Command Line Diffing | Comparing files using fc (Windows) or diff (Linux) |
Low Ceremony Testing | Testing without setting up a huge test framework |