I recently set up urlwatch to alert me when some web pages I'm interested in change. It has a nice Pushbullet integration and is pretty easy to set up. Too easy, in fact. Pro tip: after configuring your preferred notification service and setting enabled: true, you're done. I spent a while faffing about thinking there had to be more to it. There isn't.
What I found, however, is that one of the pages I was monitoring had a dynamically generated <script> tag in it, which was triggering spurious notifications I wanted to suppress. There didn't seem to be an obvious way to ignore particular tags, so I created a simple hook to do this.
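from urlwatch import filters
from urlwatch import jobs
from urlwatch import reporters
from bs4 import BeautifulSoup

# Class name is arbitrary; urlwatch looks the filter up by __kind__.
class IgnoreFilter(filters.FilterBase):
    """Ignore elements matching a CSS selector."""
    __kind__ = 'ignore'

    def filter(self, data, subfilter=None):
        if subfilter is None:
            raise ValueError('ignore filter needs a CSS selector')
        soup = BeautifulSoup(data, 'html.parser')
        for element in soup.select(subfilter):
            element.decompose()  # remove the matched element from the tree
        return str(soup)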
This adds a new filter type called ignore, which accepts a CSS selector as its parameter. It then uses the magical BeautifulSoup HTML parser to find all the elements which match the selector and removes them before returning the remaining HTML.
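If you want to try a selector before wiring it into urlwatch, here's a minimal standalone sketch. The strip_elements helper and the two sample pages are made up for illustration; it uses the same html.parser backend and select() call as the hook, and assumes decompose() as the way to drop each match.

```python
from bs4 import BeautifulSoup


def strip_elements(html, selector):
    """Return html with every element matching the CSS selector removed."""
    soup = BeautifulSoup(html, 'html.parser')
    for element in soup.select(selector):
        element.decompose()  # drop the element and everything inside it
    return str(soup)


# Two hypothetical snapshots of the same page. Only the second <script>,
# which stands in for the dynamically generated one, differs between them.
run_1 = ('<html><body><p>Hello</p><script>var a = 1;</script>'
         '<script>var token = "abc123";</script></body></html>')
run_2 = ('<html><body><p>Hello</p><script>var a = 1;</script>'
         '<script>var token = "xyz789";</script></body></html>')

selector = 'body > script:nth-of-type(2)'
cleaned_1 = strip_elements(run_1, selector)
cleaned_2 = strip_elements(run_2, selector)

# The raw pages differ, but the filtered ones should not.
print(run_1 == run_2)
print(cleaned_1 == cleaned_2)
```

If the second print says False, the selector isn't catching the part of the page that keeps changing.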
Urlwatch then does its normal comparison against the previous run to see if anything has changed and carries on as usual.
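To make the comparison step concrete, here's a rough sketch using difflib from the standard library as a stand-in. I'm not reproducing urlwatch's own comparison code here; the changed_lines helper and the sample snapshots are invented for the example.

```python
import difflib


def changed_lines(old, new):
    """Return unified-diff lines between two snapshots, or [] if identical."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile='previous', tofile='current', lineterm=''))


# Hypothetical filtered output from three runs.
previous = '<body>\n<p>Price: 10</p>\n</body>'
unchanged = '<body>\n<p>Price: 10</p>\n</body>'
updated = '<body>\n<p>Price: 12</p>\n</body>'

# Identical snapshots give an empty diff, so there is nothing to report.
print(changed_lines(previous, unchanged))

# A real change shows up as removed and added lines.
print('\n'.join(changed_lines(previous, updated)))
```

The point is that once the noisy element is gone, two runs of an otherwise unchanged page produce identical text, so there is nothing left to notify about.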
To use the filter, update your config like so, altering the CSS selector to suit your needs.
$ urlwatch --edit
name: "some site"
filter: "ignore:body > script:nth-of-type(2)"
This ignores the second
<script> tag beneath the