I have been revisiting how I secure my website. Over the years, I’ve dabbled with geo-blocking but it’s a lost cause. Geo-blocks are easy to bypass. I have done enough research on Russian websites that are geo-blocked to know that you can’t keep determined people out. The ActivityPub connectivity I set up has highlighted that joining the Fediverse is to join the wider world. One of the things I stumbled on in my own setup was that I was firewalling out the federated servers that would retrieve my content. It’s meant I’ve been reworking my firewall rules on Cloudflare.
Allow me a moment to give a bit of context. As I’ve discussed in other posts, you can stop bad actors at a bunch of places. My site uses WordPress. So I can stop them when they try to access WordPress itself, which is a lot of the activity. Or I can stop them when they hit the server on which WordPress is running, by using .htaccess files or other filtering tools in my website host’s cPanel: hotlink blocks, IP blocks, and so on.

My preference though is to stop the undesirable activity before it even gets to my website server. Why tax my server any more than I need to? Also, a lot of these blocking tools can have unintended consequences. I want people to access my images so a hotlink block might inhibit a subset of that audience but would do it at the expense of the entire audience since it can stop Google Images and Bing Images from indexing your photos.
This means leaning on Cloudflare’s firewall. I use their free plan which comes with a whole variety of useful tools. The one I’ve been tweaking has been the security area or web application firewall (WAF). It has rules that can stop bad behavior early.
The reason that I explain the context, though, is that if you are not stopping something at the Cloudflare end, it’s still getting to your server. It may be getting all the way to the server app—WordPress or your RSS feed reader server or your photo albums—too. So although you can see what is getting stopped at Cloudflare within its dashboard, you will need to look at your web server logs to see what is not being stopped.
Interrogating the Web Server
The biggest issue seems to be sites prodding WordPress to try to log in or access resources that they can exploit. It is one reason I started to geo-block, since these inquisitive requests were coming from a regular subset of nations. But, as I found with more direct geo-blocking, it’s easy enough to bypass that sort of restriction. A country block also inhibits everyone, good or bad, from accessing your site. I used the Managed Challenge (think captcha) so that only the automated bots were halted entirely. I don’t like the need for a managed challenge, though.
There is one caveat. I use Cloudflare to redirect visitors from Russia to a particular URL. I think geo-location is still useful for content redirection (making it more relevant) but not for blocking access.
I downloaded a recent website activity log so that I could poke around in it to see what additional changes I could make. My goal was to shift from blocking IP addresses and countries back to blocking URLs or user agents that might be more shape shifting.
If you haven’t looked at a website log, it’s filled with all of the requests that have been made to your website. It will have IP addresses (that you can convert to greater detail using a WHOIS tool like Arin.net), dates and times, what was requested, whether it was found, that sort of thing.
I used to use more dedicated web log readers but Excel is actually not bad at working with a standard log file. You may need to open it manually (its extension may not be one that Excel will know it can open) but it’s just a text file. If you open it as a delimited file, with the space as the delimiter, and multiple delimiters treated as one, you’ll get a workable worksheet.
Once I’d opened it in Excel, I could set a filter on the response column, like you would do with any Excel worksheet. The response column tells me which files were successfully retrieved (200s and 300s) and which weren’t (400s and 500s). I could filter down just to the 404s, for example. When you see a 404 error in your web browser, this is just a representation of this response from the server. And I found a single domain that was querying a bunch of resources that aren’t on my website or server.
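If Excel isn’t handy, the same filter can be sketched in a few lines of Python. This is only a rough sketch: it assumes the log uses the common Apache-style "combined" layout, which your host may not use, and the sample lines, IP addresses, and function names are made up for illustration rather than taken from my log.

```python
import re

# Assumed layout: Apache-style "combined" log format. Adjust the pattern if
# your web host writes its logs differently.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def filter_by_status(lines, status):
    """Keep only the parsed requests whose response code equals `status`."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["status"] == status]


# Made-up sample lines using documentation IP ranges, not real visitors.
sample = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /static/app.js HTTP/1.1" '
    '404 512 "-" "python-requests/2.31"',
    '198.51.100.4 - - [10/Oct/2023:13:56:01 +0000] "GET /feed/ HTTP/1.1" '
    '200 10432 "-" "Mastodon/4.2.0"',
]

# Same idea as filtering the response column down to the 404s.
for entry in filter_by_status(sample, "404"):
    print(entry["ip"], entry["path"], entry["agent"])
```

To run it against a real file, you would pass the open file object in place of `sample`, since the function only needs something it can loop over line by line.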

In the past, I might have tried to isolate the IP address but it wouldn’t have made much difference. It is much more effective to use a Cloudflare rule to block access to these non-existent folders—/static/, /content/, /images/, /dsniii/, and so on—so that the requests are shut down immediately. That way, I don’t need to know about the other IP addresses and try to block them as well.
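To decide which folders deserve a rule, it can help to count the 404s by their first folder and then draft the rule text from the winners. The sketch below assumes you already have a list of requested paths that returned a 404; the helper names and sample paths are hypothetical. The only Cloudflare pieces it uses are the `http.request.uri.path` field and the `contains` operator that appear in my own rule.

```python
from collections import Counter


def top_level_folder(path):
    """Return the first folder of a URL path, e.g. '/static/app.js' -> '/static/'."""
    path = path.split("?", 1)[0]  # ignore any query string
    segments = path.split("/")    # '/static/app.js' -> ['', 'static', 'app.js']
    # A file at the site root (or the root itself) has no folder to report.
    if len(segments) < 3 or not segments[1]:
        return None
    return f"/{segments[1]}/"


def count_missing_folders(paths_404):
    """Count how often each top-level folder shows up among the 404 paths."""
    folders = (top_level_folder(path) for path in paths_404)
    return Counter(folder for folder in folders if folder)


def build_path_expression(folders):
    """Draft a rule expression that matches any of the given folders."""
    clauses = [f'(http.request.uri.path contains "{folder}")' for folder in folders]
    return " or ".join(clauses)


# Hypothetical 404 paths, loosely modelled on the folders mentioned above.
sample_paths = [
    "/static/app.js",
    "/static/css/site.css",
    "/content/index.php",
    "/dsniii/",
    "/robots.txt",
]

counts = count_missing_folders(sample_paths)
print(counts.most_common())
print(build_path_expression(counts))
```

One caution: `contains` matches the text anywhere in the path, so only list folders that genuinely don’t exist anywhere on your site.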
The other thing I noticed is that, with the geo-blocks and IP blocks removed, I was getting a lot of Fediverse traffic that I hadn’t expected. There were the Friendica servers ….

and all of the Mastodon servers …

One thing that I noticed with the Fediverse activity was the wide range of user agents. Most of them were really good about displaying who they were (Mastodon, Friendica, etc.). Some were not. And unfortunately, some of those user-agents appeared both as friendly requests (asking for normal resources) and unfriendly (scraping or probing).
The important thing is to know about user-agents. Your browser reports a user-agent of Mozilla or Chrome or Safari when you visit a website. User-agent is how website analytics like Google Analytics or Matomo can tell you the type of device or web browser your visitors use. And user-agent is a filter you can use in a Cloudflare firewall.
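A quick tally makes the mixed behavior easier to spot. This is a rough sketch that assumes you have already pulled the user-agent and response code out of each log line; the user-agent strings are simplified, made-up examples and the function names are hypothetical.

```python
from collections import defaultdict


def summarize_agents(requests):
    """Map each user-agent to counts of 'ok' (2xx/3xx) and 'failed' (anything else)."""
    summary = defaultdict(lambda: {"ok": 0, "failed": 0})
    for agent, status in requests:
        bucket = "ok" if 200 <= status < 400 else "failed"
        summary[agent][bucket] += 1
    return dict(summary)


def mixed_agents(summary):
    """User-agents that appear with both successful and failed requests."""
    return sorted(
        agent for agent, counts in summary.items()
        if counts["ok"] and counts["failed"]
    )


# Made-up (user-agent, status) pairs for illustration only.
sample_requests = [
    ("Mastodon/4.2.0", 200),
    ("Friendica", 200),
    ("python-requests/2.31", 404),
    ("python-requests/2.31", 200),
    ("curl/8.4.0", 404),
]

summary = summarize_agents(sample_requests)
for agent in mixed_agents(summary):
    print(agent, summary[agent])
```

An agent in the mixed list is not automatically a bad actor. It is just a prompt to look at which URLs it asked for before deciding whether a user-agent filter is the right tool.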
I also noticed that a bunch of legal information harvesters—LexBlog and the LexBlog-powered Open Legal Blog Archive, Vable, etc.—were successfully accessing content. LexBlog had not been for about 6 weeks so I’m wondering if I’d inadvertently been blocking them.

All in all, I didn’t find anything I was particularly worried about. There were a few people slipping through who I didn’t want but they had left enough information for me to focus on blocking them in the future.
Reworking the Cloudflare Firewall Rules
I’ll repeat here that I turned off the IP- and country-based Cloudflare firewall rules a couple of days ago. I left the ones that were focused on behavior and they seemed to continue to trap bad requests. Now that I had reviewed the log files for the subsequent days, I was confident that what had gotten through was mostly acceptable. In other words, turning off the two types of rules hadn’t had a negative impact.

I did a lot of small tweaks but I’ll mention the one that is probably most useful to others. The first rule, as you can see above, focuses on unwanted WordPress login requests. I have been learning more about how I can use Cloudflare’s rules. It made me realize that I was splitting the work across Cloudflare and my website server in ways that were inefficient.
A lot of WordPress sites will have a login limiting plugin (I use Limit Login Attempts Reloaded). But increasingly I see the plugin as suspenders (as in belt-and-suspenders). I also use my .htaccess to restrict logins to specific IP addresses. So there really shouldn’t be a way for someone to accidentally reach the login page.
I mentioned earlier it is important to know about user-agents. I use user-agent filters extensively on my other Cloudflare rules. It’s an easy way to exclude bots or automated tools like curl or python which are not people. User-agent rules can also exclude OpenAI and ChatGPT harvesters.
In fact, if I look at my Limit Login Attempts plugin log (available inside the WordPress dashboard), it shows that the only time the plugin has been tripped was when I tripped it myself or lowered the security further out from the WordPress site.
The belt was the .htaccess file that allowed me to limit access to my home IP address, which is where I blog from and work on the site. This approach means that people can attempt to access the login page, though, and the server will have to deal with the request. I could sometimes tell that someone had made an attempt because the web server would cache the errored-out login page. So I’d have to pull a new version of the page to replace the cached one so that I could log in.
I know now that I can have Cloudflare do the IP address controls as well as the other protection. This is what my rule looks like (you’ll need to put your own IP address or range in there). The rule is set to block anyone who is NOT in the IP range specified who ALSO asks for one of those URL paths (I’ve put them in bold to make them easier to read):
(not ip.src in {your_ip_address_or_range}) and ((http.request.uri.path contains "wp-login.php") or (http.request.uri.path contains "/xmlrpc.php") or (http.request.uri.path contains "/wp-admin/") or (http.request.uri.path contains "webmail") or (http.request.uri.path contains "/login") or (http.request.uri.path contains ".env") or (http.request.uri.path contains "/admin") or (http.request.uri.path contains "/install.php") or (http.request.uri.path contains "wlwmanifest.xml"))
Most of these are WordPress related but I’ve thrown in a couple that just seem to be regular requests (which is why xmlrpc and wp-admin are included). The ones for webmail or webdisk or .env are just annoyances; they are services I’ve disabled on the web server anyway.
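Before turning a rule like this on, it can be useful to mimic its logic locally and throw a few paths at it. The sketch below is not how Cloudflare evaluates anything; it is a stand-in that assumes `contains` behaves like a plain substring test. The function name is hypothetical and the IP addresses are documentation-range placeholders for your own address or range.

```python
import ipaddress

# The path fragments from the rule above.
BLOCKED_FRAGMENTS = [
    "wp-login.php", "/xmlrpc.php", "/wp-admin/", "webmail", "/login",
    ".env", "/admin", "/install.php", "wlwmanifest.xml",
]


def would_block(source_ip, path, allowed_network):
    """Block when the IP is outside the allowed range AND the path matches a fragment."""
    address = ipaddress.ip_address(source_ip)
    outside = address not in ipaddress.ip_network(allowed_network)
    return outside and any(fragment in path for fragment in BLOCKED_FRAGMENTS)


ALLOWED = "192.0.2.0/24"  # placeholder for your own IP address or range

checks = [
    ("203.0.113.9", "/wp-login.php"),             # stranger asking for the login page
    ("192.0.2.10", "/wp-login.php"),              # you, from the allowed range
    ("203.0.113.9", "/2023/05/a-post/"),          # stranger reading a normal post
    ("203.0.113.9", "/administrative-law-post/"), # a normal post that contains "/admin"
]

for ip, path in checks:
    print(ip, path, would_block(ip, path, ALLOWED))
```

The last check is the one to watch. A broad fragment like "/admin" also matches any legitimate URL that happens to start with those characters, which is exactly the kind of collision worth looking for after a rule goes live.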
The benefit (and curse) of the Cloudflare rules is that they are near instantaneous. So do not set up a new rule and turn it on and then leave for the weekend. #NoChangeFriday. Once you turn it on, take a look at it and see if something is immediately colliding with it that you didn’t intend. And go back over the next few days to double check. Also, look again at your web logs and see if you’re actually seeing a difference.
For example, one of my rules has caught requests from a LexisNexis news harvesting service (Moreover):

I may or may not change that access. It’s sort of like Facebook …

which uses an external hit bot to poke at certain URLs or the Xitterbot that is trying to identify content for a site that I no longer want to support …

The primary goal for me is to secure my site against bad actors. The Cloudflare rules do a better job by focusing on the content than me trying to play whac-a-mole with countries. Here’s an example where automated attempts to use the WordPress login to register a new user occurred over a period of about an hour:

You should disable that function in your WordPress site anyway. Our law library had dozens of spam-registered accounts that we cleared out when we were tightening up our own website security. But in addition to turning off the function on the WordPress dashboard, using a Cloudflare rule to enforce a limit on who can access the /wp-login page can stop those requests before they get to your website.
The additional benefit to fine-tuning these rules is that I freed up two of my 5 free rules on Cloudflare. They were not doing what I needed done or were doing it inefficiently. It was really useful to learn that as it means my current rules are working better than they had been before. And it means that I am also that much more aware of where a problem might arise if the connectivity I’m hoping to enable between my WordPress site and the Fediverse stops working.