I may have been too clever by half when I finally felt I had my ActivityPub connection within WordPress sorted out. The day of my first new post rolled around (publishing a new post is what triggers the ActivityPub connector) and I saw the post appear as a Mastodon post. Then one of the folks who follows this blog let me know that something wasn’t quite right. The WordPress post that I had seen as a web page was now showing up as a JSON file. This is how I tried to figure out why.

I think probably everyone knows about caching by now. When we visit a website and our web browser requests a view of the website’s content, the files required to present that view are downloaded to our computer. The next time you click on a page from that website, the browser only downloads files that are new or have changed. The other ones remain on your computer. In a cache.

As soon as the person let me know about the unexpected, unwanted behavior of the web page, I reproduced it. I immediately thought about caching, because the URL for the web page and the URL for the JSON file were identical. That meant that a request for that one URL could return two different pieces of information. It suggested to me that the JSON file, which was generated because of ActivityPub, had somehow overwritten a cached copy of the actual WordPress post (HTML and not JSON).
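To make that suspicion concrete, here is a minimal Python sketch of how a page cache that keys only on the URL could hand a browser the JSON copy. It is a toy model under stated assumptions, not the code of any real WordPress plugin: the class names, the `origin` function, and the `blog.example` address are all invented for illustration. The only real-world detail is that ActivityPub clients ask for `application/activity+json` in their `Accept` header while browsers ask for `text/html`.

```python
# Illustrative only: a toy page cache, not any real plugin's implementation.

HTML = "text/html"
ACTIVITY_JSON = "application/activity+json"


def origin(url, accept):
    """Pretend web server: one URL, two representations chosen by Accept."""
    if ACTIVITY_JSON in accept:
        return ACTIVITY_JSON, '{"type": "Note"}'
    return HTML, "<html>the blog post</html>"


class UrlOnlyCache:
    """Stores one response per URL and ignores the Accept header."""

    def __init__(self):
        self.store = {}

    def key(self, url, accept):
        return url

    def get(self, url, accept):
        k = self.key(url, accept)
        if k not in self.store:
            # First request for this key decides what everyone else gets.
            self.store[k] = origin(url, accept)
        return self.store[k]


class AcceptAwareCache(UrlOnlyCache):
    """Same cache, but JSON and HTML requests get separate entries."""

    def key(self, url, accept):
        return (url, ACTIVITY_JSON in accept)


POST_URL = "https://blog.example/a-new-post/"  # hypothetical address

naive = UrlOnlyCache()
naive.get(POST_URL, ACTIVITY_JSON)             # a Mastodon server asks first
browser_type, _ = naive.get(POST_URL, HTML)    # then a reader's browser asks

careful = AcceptAwareCache()
careful.get(POST_URL, ACTIVITY_JSON)
careful_type, _ = careful.get(POST_URL, HTML)

print(browser_type)   # application/activity+json
print(careful_type)   # text/html
```

In the first cache, whoever asks first decides what every later visitor sees, which matches the symptom of a browser receiving JSON. The second cache avoids the collision by treating the requested format as part of the key.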

But where? It wasn’t on the local computer cache. If you have ever been unsure if you have the latest copy of a file, you can usually force an entirely new copy by using CTRL + F5 or SHIFT + F5 (both Chrome and Edge suggest SHIFT but I have seen CTRL F5 work on Chrome). Doing that just retrieved another copy of the JSON file. NOT the web page.

Now that I’ve thought about this, I wonder if SHIFT F5 is preferable because people were CTRL F4’ing and closing their tab, rather than CTRL F5’ing to refresh it. F4 and F5 can be adjacent and poorly marked. On laptops, the buttons may be inverted so that you have to hold an FN key to use a function key as a function key, rather than brightening the screen or muting the sound.

Layers of Cache

Most websites use their own caching systems. They may also use content delivery networks that create additional copies and can place them on faster servers and closer (fewer hops) to the people requesting them. In bigger organizations, like a university, they may use edge caching for resources frequently requested by students or faculty. This can complicate finding out what system is caching what content when you have something you don’t want cached.

A simple chart showing how a visitor's request from a web browser might traverse cache locations.  The visitor's web browser is on the left.  The request then passes through an edge cache location, Cloudflare's cache, with a potential detour to the Internet Archive, and then through another edge or web server cache.  Finally, at the right, it might hit a WordPress cache.

One thing you can do is purge your cache. On your local web browser, you can go into the settings and clear the cache. But if a SHIFT or CTRL + F5 didn’t make a difference, that’s probably not the solution. In that case, you need to find the server cache that is holding the object and purge that cache. Cloudflare has a purge cache button and other caches have similar purge features.
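Response headers can offer clues about which layer answered a request. The Python sketch below is one possible way to read them, not a definitive diagnostic. It relies on the `CF-Cache-Status` header that Cloudflare adds to responses and on the standard `Age` and `Content-Type` headers. The function name `cache_hints` and the sample headers are made up for illustration.

```python
def cache_hints(headers):
    """Turn a dict of HTTP response headers into plain-language hints."""
    # Header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    hints = []

    status = h.get("cf-cache-status", "").upper()
    if status == "HIT":
        hints.append("Cloudflare answered from its own cache")
    elif status == "DYNAMIC":
        hints.append("Cloudflare did not consider this cacheable and asked the origin")
    elif status:
        hints.append("Cloudflare cache status: " + status)

    if "age" in h:
        hints.append("a cache has held this copy for " + h["age"] + " seconds")

    hints.append("content type: " + h.get("content-type", "not stated"))
    return hints


# Made-up headers for illustration, not captured from a real site.
sample = {
    "CF-Cache-Status": "DYNAMIC",
    "Content-Type": "application/activity+json",
}
for hint in cache_hints(sample):
    print(hint)
```

To try it against a live page, one option is to open the URL with `urllib.request.urlopen` and pass `dict(response.headers)` to the function. A JSON content type paired with a `DYNAMIC` status would point past Cloudflare toward the web server.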

In my case, I have a website cache and I also use Cloudflare to keep my content available if my website server goes offline. This isn’t quite a content delivery network (Cloudflare’s CDN costs money and I use their free plan) but it is a backup. Cloudflare will cache a copy on their servers as requests are made from my website. This means that many requests for my website are not actually answered by my web server; Cloudflare is providing the copy.

A chart showing a stacked graph over 30 days.  The darker area at the bottom shows files delivered from Cloudflare's cache.  The lighter blue area above it shows files delivered from my webserver.
A chart showing Cloudflare traffic. There have been nearly 12 gigabytes of traffic and Cloudflare has served over 3.5 gigabytes from its cache.

Cloudflare also provides a more solid backup. It is coordinated with the Internet Archive so that, in theory, if Cloudflare senses my website server is not reachable over the internet, it will pull the most recent copy of the equivalent file from the Internet Archive.

When you enable Always Online with Internet Archive integration, Cloudflare shares your hostname and popular URL paths with the archive so that the Internet Archive’s crawler stores the pages you want archived. When submitting targets to the crawler, Cloudflare identifies the most popular URLs found among GET requests that returned a 200 HTTP status code in the previous five hours.

Note that Cloudflare does not save a copy of every page of your website, and it cannot serve dynamic content while your origin is offline. If the requested page is not in the Internet Archive’s Wayback Machine, the visitor sees the actual error page caused by the offline origin web server.

Cloudflare Developer Docs, “Always Online”
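The selection rule in that quote (popular URLs among GET requests that returned a 200 in the previous five hours) can be sketched in a few lines of Python. This is only an illustration of the rule as described, not Cloudflare’s implementation. The log format, the function name `popular_paths`, and the `top_n` cutoff are assumptions, since the quote does not say how many URLs are submitted.

```python
from collections import Counter
from datetime import datetime, timedelta


def popular_paths(log, now, top_n=3, window=timedelta(hours=5)):
    """Return the most requested paths among recent, successful GET requests.

    `log` is a list of (timestamp, method, path, status) tuples.
    `top_n` is a hypothetical cutoff chosen for this sketch.
    """
    cutoff = now - window
    counts = Counter(
        path
        for when, method, path, status in log
        if when >= cutoff and method == "GET" and status == 200
    )
    return [path for path, _ in counts.most_common(top_n)]


# Invented sample traffic, not real log data.
now = datetime(2024, 1, 1, 12, 0)
log = [
    (now - timedelta(hours=1), "GET", "/", 200),
    (now - timedelta(hours=2), "GET", "/", 200),
    (now - timedelta(hours=2), "GET", "/about/", 200),
    (now - timedelta(hours=6), "GET", "/old-post/", 200),       # too old
    (now - timedelta(hours=1), "POST", "/wp-login.php", 200),   # not a GET
    (now - timedelta(hours=1), "GET", "/missing/", 404),        # not a 200
]
print(popular_paths(log, now))
```

Pages that are rarely visited never make it into the list, which is consistent with the warning that not every page will have a copy in the Wayback Machine.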

You can see the impact. My website has been at this URL for over 25 years (since 1997) and the Internet Archive has been archiving it since 1998. But I turned on the integration between Cloudflare and the Internet Archive in the last few years and the archiving is denser at that end of the chart.

A screenshot of the Internet Archive Wayback Machine dashboard showing a chronology.  The denser black lines at the far right show more intense archiving than the more sporadic archiving at the left.

Here’s one thing I learned, though. Cloudflare does not normally cache HTML unless you explicitly tell it to. It mostly focuses on images, media, and other static files. That means the static files, including cascading style sheets and JavaScript, will be cached but the actual blog post words won’t.
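As I understand it, Cloudflare’s default behavior decides what to cache by file extension. The Python sketch below shows why a blog post address falls outside that default. The extension set is a deliberately partial sample, not Cloudflare’s full list, and the function name and `blog.example` URLs are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# A partial sample of extensions cached by default; the real list is longer.
SOME_DEFAULT_CACHED_EXTENSIONS = {
    "css", "js", "jpg", "jpeg", "png", "gif", "ico", "svg",
    "woff", "woff2", "pdf", "mp4", "webp",
}


def likely_cached_by_default(url):
    """Guess from the file extension whether a URL is cached by default."""
    path = urlparse(url).path
    extension = PurePosixPath(path).suffix.lower().lstrip(".")
    return extension in SOME_DEFAULT_CACHED_EXTENSIONS


# Hypothetical addresses for illustration.
print(likely_cached_by_default("https://blog.example/wp-content/themes/x/style.css"))
print(likely_cached_by_default("https://blog.example/a-new-post/"))
```

The stylesheet qualifies because of its `.css` extension. The post address has no extension at all, so under this rule neither its HTML nor its JSON representation would be cached.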

This meant that, wherever the cache issue was occurring with this JSON file, it wasn’t on Cloudflare. It must be on my web server somewhere.

WordPress Cache Plugins

I enjoy technology and like to experiment. But I also know my limitations. So I opened a ticket on the WordPress site with the ActivityPub plugin developer. I’ve noticed at least one other ticket since mine that had the identical problem. I wasn’t too worried about being an outlier. It’s not really likely that I’m the only one to experience an issue, even though there are folks who feel like they always find the bugs. But it helps to know that additional tickets might bring greater clarity or improvements to the plugin.

One of the first things I did when the JSON issue arose was to open my WordPress dashboard and empty the cache files collected by WordPress. If you are keeping count, this is the third or fourth (local browser, Cloudflare, Internet Archive, WordPress) cache location. But I knew I’d need to empty it if the JSON was in cache somewhere and it wasn’t in the other three locations.

I use WP-Optimize to provide some speed improvements for my website. I use it to clean up and compress the backend database, to post-load JavaScript, and to preload website pages. Since this was not one of the caching plugins the ActivityPub developer identified, I decided to keep the WP-Optimize plugin for other purposes but turn off its caching.

Instead, I added WP Super Cache, which, like Jetpack, is a plugin from Automattic, the company behind WordPress.com. It recommends against some of the caching steps I was taking (like using minify) and creates full, compressed pages of every blog post. So far as I understand it, it does not cache any JSON. It means that, going forward, I should not have a JSON file sharing the same URL as a blog post and overlaying it in the cache.

I have posted two more blog posts since that first issue and have not experienced the same problem again. They appear properly in Mastodon and no JSON gets in the way of visitors using a web browser. I cannot say I am thrilled with WP Super Cache though. I use PageSpeed to test my site’s web speed and responsiveness and I have seen a drop in the metrics. The performance number has dropped from 97 to the 70s. So I think I have more work to do on the whole caching issue. As another poster to the ActivityPub site noted, turning the plugin off and on seemed to fix the JSON problem. I am considering whether to return to using WP-Optimize to cache my content and then just toggle the ActivityPub plugin to get around any future JSON caching.