🕵️ Investigating Suspicious Bot Traffic
Adam C.

Over the past few days, we noticed a strange spike in real-time traffic on Swim Standards. Most of it was coming from unusual locations — like Bayingolin Mongol Autonomous Prefecture and Jiaxing, China — places that aren’t exactly our core audience for U.S. swimming.


At first, it looked like standard Google Analytics spam, but it turned out to be much more real — and much more annoying.

🔍 Tracing the Source

To investigate, I started by checking our Apache access logs directly on the server. I ran the logs through iplocate to get basic geoIP info, and then grepped for "China" to see just how much of our traffic was coming from there:

grep -i "China" /var/log/apache2/access.log
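The same per-country tally can be sketched in Python. This is a minimal sketch under stated assumptions, not the exact process used here: `SAMPLE_RANGES`, `lookup_country`, and `requests_by_country` are hypothetical names, the CIDR ranges are reserved documentation addresses standing in for a real geoIP database or service, and the log lines are made-up samples.

```python
import ipaddress
import re
from collections import Counter

# Hypothetical CIDR-to-country table built from reserved documentation
# ranges. A real check would query a geoIP database or service instead.
SAMPLE_RANGES = {
    "203.0.113.0/24": "CN",
    "198.51.100.0/24": "US",
}


def lookup_country(ip, ranges=SAMPLE_RANGES):
    """Return the country code for ip, or "??" if no range matches."""
    addr = ipaddress.ip_address(ip)
    for cidr, country in ranges.items():
        if addr in ipaddress.ip_network(cidr):
            return country
    return "??"


# The client address is the first whitespace-delimited field in Apache's
# common and combined log formats.
FIRST_FIELD = re.compile(r"^(\S+)")


def requests_by_country(log_lines):
    """Count requests per country code across the given log lines."""
    counts = Counter()
    for line in log_lines:
        match = FIRST_FIELD.match(line)
        if not match:
            continue  # blank line
        try:
            counts[lookup_country(match.group(1))] += 1
        except ValueError:
            # The first field was a hostname or junk, not an IP address.
            counts["??"] += 1
    return counts


# Made-up sample lines, for illustration only.
sample_log = [
    '203.0.113.7 - - [10/Jun/2025:13:55:36 +0000] "GET /rankings HTTP/1.1" 200 5120',
    '203.0.113.9 - - [10/Jun/2025:13:55:37 +0000] "GET /about HTTP/1.1" 200 2048',
    '198.51.100.4 - - [10/Jun/2025:13:55:40 +0000] "GET / HTTP/1.1" 200 1024',
]
country_counts = requests_by_country(sample_log)
print(country_counts.most_common())
```

Counting by country code rather than grepping for a country name avoids false matches on URLs or user agents that happen to contain the word.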

It quickly became clear: a huge chunk of traffic was coming from Chinese IPs, and not from real users. They were hitting all types of pages — not just rankings — and many of the requests were being made by Sogou's bot, among others.

🌐 Blocking at the Edge

Once I confirmed it wasn’t ghost spam (it was hitting our actual server), I turned to Cloudflare. Since we’re using the free plan, I set up a basic firewall rule that challenges all traffic from China, which in practice blocks bots that can’t pass the challenge:

Field: ip.geoip.country
Operator: equals
Value: CN
Action: Challenge

After activating the rule, the impact was immediate:

Real-time GA4 traffic dropped

Cities like Jiaxing and Bayingolin disappeared from the feed

Suspicious pageviews and scroll events stopped firing

This confirmed what I suspected: it was bot traffic, hammering the site and triggering engagement events in GA4.
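The GA4 drop can also be cross-checked against the origin server's own logs, since requests stopped by a challenge at Cloudflare's edge should never reach Apache. The following is a minimal sketch, assuming Apache's default timestamp format; `requests_per_hour` is a hypothetical helper and the log lines are made-up samples, not data from this incident.

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp as written by Apache's default log formats,
# e.g. [10/Jun/2025:13:55:36 +0000]
TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def requests_per_hour(log_lines):
    """Count requests per clock hour, keyed by 'YYYY-MM-DD HH:00'."""
    counts = Counter()
    for line in log_lines:
        match = TIMESTAMP.search(line)
        if not match:
            continue
        try:
            # %b expects English month abbreviations (the default C locale).
            stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # bracketed text that is not a timestamp
        counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts


# Made-up sample lines: three requests in one hour, one in the next.
sample_log = [
    '203.0.113.7 - - [10/Jun/2025:13:05:00 +0000] "GET /rankings HTTP/1.1" 200 5120',
    '203.0.113.7 - - [10/Jun/2025:13:20:00 +0000] "GET /times HTTP/1.1" 200 4096',
    '203.0.113.9 - - [10/Jun/2025:13:45:00 +0000] "GET /about HTTP/1.1" 200 2048',
    '198.51.100.4 - - [10/Jun/2025:14:10:00 +0000] "GET / HTTP/1.1" 200 1024',
]
hourly = requests_per_hour(sample_log)
for hour, count in sorted(hourly.items()):
    print(hour, count)
```

Comparing the hourly counts on either side of the moment the rule went live gives a second signal that does not depend on GA4's own event processing.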

🤖 What Were These Bots Doing?

From what I can tell, it wasn’t ad fraud or brute-force crawling of specific endpoints. It looked more like generic web scraping:

Crawlers hitting public pages to index swimmer data, times, and results

Possibly collecting full-page HTML to republish or analyze elsewhere

Ignoring robots.txt, or at least not following all of its rules

Sogou’s crawler in particular is known for being overly aggressive, and it was responsible for a noticeable percentage of these requests.
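Whether a crawler is actually ignoring robots.txt can be tested by replaying logged requests against the file's rules. Below is a minimal sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` contents, the `disallowed_hits` helper, and the sample hits are all assumptions for illustration; the real file for the site may say something quite different.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; substitute the site's real file.
ROBOTS_TXT = """\
User-agent: Sogou web spider
Disallow: /

User-agent: *
Disallow: /admin/
"""


def disallowed_hits(robots_txt, hits):
    """Return the (user_agent, path) pairs that robots.txt forbids."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(agent, path) for agent, path in hits
            if not parser.can_fetch(agent, path)]


# Made-up (user agent, requested path) pairs pulled from an access log.
sample_hits = [
    ("Sogou web spider/4.0", "/rankings"),
    ("Mozilla/5.0", "/rankings"),
]
violations = disallowed_hits(ROBOTS_TXT, sample_hits)
print(violations)
```

The standard library matches user agents loosely (a case-insensitive substring test on the part before the first slash), so treat the result as an indication rather than proof of how a given crawler interprets the file.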

📉 Why It Matters

Even though this traffic wasn’t actively attacking anything, it was:

Skewing our analytics (engagement time, scroll depth, session count)

Possibly inflating ad impressions (which can trigger invalid traffic flags)

Wasting server resources on non-human visits

In other words, it wasn’t malicious — but it wasn’t helpful either.
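To put a rough number on the wasted resources, the share of requests and response bytes going to self-identified crawlers can be estimated from the access log. This is a minimal sketch, assuming Apache's "combined" log format; `BOT_MARKERS` is an assumed list of substrings, `bot_share` is a hypothetical helper, and the log lines are made-up samples.

```python
import re

# Apache "combined" format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?P<size>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Assumed substrings that mark a user agent as a crawler.
BOT_MARKERS = ("sogou", "spider", "bot", "crawl")


def bot_share(log_lines, markers=BOT_MARKERS):
    """Return (fraction of requests, fraction of bytes) served to bots."""
    total_hits = bot_hits = total_bytes = bot_bytes = 0
    for line in log_lines:
        match = COMBINED.match(line)
        if not match:
            continue  # other formats, or fields containing escaped quotes
        size = 0 if match.group("size") == "-" else int(match.group("size"))
        agent = match.group("agent").lower()
        total_hits += 1
        total_bytes += size
        if any(marker in agent for marker in markers):
            bot_hits += 1
            bot_bytes += size
    hit_share = bot_hits / total_hits if total_hits else 0.0
    byte_share = bot_bytes / total_bytes if total_bytes else 0.0
    return hit_share, byte_share


# Made-up sample lines, for illustration only.
sample_log = [
    '203.0.113.7 - - [10/Jun/2025:13:05:00 +0000] "GET /rankings HTTP/1.1" 200 3000 "-" "Sogou web spider/4.0"',
    '203.0.113.9 - - [10/Jun/2025:13:06:00 +0000] "GET /times HTTP/1.1" 200 3000 "-" "Sogou web spider/4.0"',
    '198.51.100.4 - - [10/Jun/2025:13:07:00 +0000] "GET / HTTP/1.1" 200 1000 "-" "Mozilla/5.0"',
    '198.51.100.5 - - [10/Jun/2025:13:08:00 +0000] "GET /about HTTP/1.1" 200 1000 "-" "Mozilla/5.0"',
]
hit_share, byte_share = bot_share(sample_log)
print(f"bot requests: {hit_share:.0%}, bot bytes: {byte_share:.0%}")
```

User agents are self-reported, so bots that pose as browsers are missed and the figures are best read as a lower bound.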

✅ The Fix (So Far)

Here’s what I’ve done:

Blocked bot traffic from China with a Cloudflare firewall rule (using the Challenge action)

Identified Sogou bot as a major culprit

Verified the drop in GA4 traffic after blocking

There’s no need to overcomplicate this with bot score thresholds or headless browser detection. In this case, a simple country block solved the problem.

🧠 Takeaways

If you’re seeing weird cities in GA4 Realtime (like Bayingolin or Jiaxing), it’s probably real bot traffic, not GA spoofing.

Use your Apache/Nginx logs + geoIP lookup to confirm.

Cloudflare’s free firewall rules are good enough for most bot filtering use cases.

Don’t assume “page_view” and “scroll” in GA4 mean human; bots can trigger them too.

I’ll keep monitoring for new patterns, but for now, blocking China has dramatically cleaned up our metrics and reduced unnecessary load on the server.

If you're running a site and seeing similar anomalies, don’t ignore them — even if it’s just bots, they can mess with your data and slow down your site.