Over the past few days, we noticed a strange spike in real-time traffic on Swim Standards. Most of it was coming from unusual locations, like Bayingolin Mongol Autonomous Prefecture and Jiaxing, China, places that aren't exactly our core audience for U.S. swimming.
At first, it looked like standard Google Analytics spam, but it turned out to be much more real, and much more annoying.
To investigate, I started by checking our Apache access logs directly on the server. I ran the logs through iplocate to get basic geoIP info, and then grepped for "China" to see just how much of our traffic was coming from there:
grep -i "China" /var/log/apache2/access.log
It quickly became clear: a huge chunk of traffic was coming from Chinese IPs, and not from real users. They were hitting all types of pages, not just rankings, and many of the requests were being made by Sogou's bot, among others.
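To put a number on "many of the requests", the log can be tallied by client IP and user agent. The following is a minimal sketch, not the exact process used here: it assumes the log is in Apache's standard "combined" format, and the function names (`tally`, `share_matching`) and the idea of matching on the substring "sogou" are illustrative assumptions.

```python
import re
from collections import Counter

# Apache "combined" format:
# IP identd user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def tally(lines):
    """Count requests per client IP and per user agent, skipping unparseable lines."""
    ips, agents = Counter(), Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            ips[m["ip"]] += 1
            agents[m["agent"]] += 1
    return ips, agents


def share_matching(agents, needle):
    """Fraction of counted requests whose user agent contains `needle` (case-insensitive)."""
    total = sum(agents.values())
    hits = sum(n for ua, n in agents.items() if needle.lower() in ua.lower())
    return hits / total if total else 0.0
```

Calling `tally(open("/var/log/apache2/access.log", errors="replace"))` and then `ips.most_common(10)` gives the busiest addresses to feed into a geoIP lookup, and `share_matching(agents, "sogou")` gives a rough share for one crawler. User agents are self-reported, so treat the result as a lower bound rather than proof of identity.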
Once I confirmed it wasn't ghost spam (it was hitting our actual server), I turned to Cloudflare. Since we're using the free plan, I set up a basic firewall rule that challenges all traffic from China, which in practice blocks bots that can't solve the challenge:
Field: ip.geoip.country
Operator: equals
Value: CN
Action: Challenge
After activating the rule, the impact was immediate:
Real-time GA4 traffic dropped
Cities like Jiaxing and Bayingolin disappeared from the feed
Suspicious pageviews and scroll events stopped firing
This confirmed what I suspected: it was non-human bot traffic, hammering the site and triggering engagement events in GA4.
From what I can tell, it wasn't ad fraud or brute-force crawling of specific endpoints. It looked more like generic web scraping:
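The same drop can be cross-checked on the server side. Challenged requests should be stopped at Cloudflare's edge unless the challenge is solved, so the origin's access log ought to show fewer requests per hour after the rule goes live. Here is a rough sketch of that check, assuming standard Apache timestamps such as `[10/Oct/2024:13:55:36 +0000]`; `requests_per_hour` is a hypothetical helper name.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the bracketed Apache timestamp, e.g. [10/Oct/2024:13:55:36 +0000]
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def requests_per_hour(lines):
    """Bucket log lines by hour (in the log's own UTC offset) and count them."""
    counts = Counter()
    for line in lines:
        m = TIMESTAMP.search(line)
        if not m:
            continue  # skip lines without a recognizable timestamp
        # %b expects English month abbreviations (the default C locale)
        ts = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
        counts[ts.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

Printing `sorted(requests_per_hour(open(path)).items())` for the hours around the rule change should make the before/after difference visible without relying on GA4.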
Crawlers hitting public pages to index swimmer data, times, and results
Possibly collecting full-page HTML to republish or analyze elsewhere
Ignoring robots.txt, or at least not respecting it
Sogou's crawler in particular is known for being overly aggressive, and it was responsible for a noticeable percentage of these requests.
Even though this traffic wasn't actively attacking anything, it was:
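One way to test the robots.txt point rather than guess is to replay logged requests against the site's rules with Python's standard `urllib.robotparser`. This is only a sketch: `disallowed_hits` is a hypothetical helper, and the rules in the usage note are made up for illustration, not Swim Standards' actual robots.txt.

```python
from urllib.robotparser import RobotFileParser


def disallowed_hits(robots_txt, requests):
    """Return the (user_agent, path) pairs that the given robots.txt does not permit."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(ua, path) for ua, path in requests if not parser.can_fetch(ua, path)]
```

For example, with the hypothetical rules `"User-agent: *\nDisallow: /rankings/\n"`, a logged request for `/rankings/2024` is reported while one for `/about` is not. Any crawler that shows up in the result requested a path the rules asked it to skip.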
Skewing our analytics (engagement time, scroll depth, session count)
Possibly inflating ad impressions (which can trigger invalid traffic flags)
Wasting server resources on non-human visits
In other words, it wasn't malicious, but it wasn't helpful either.
Here's what I've done:
Blocked all traffic from China using Cloudflare firewall rules
Identified Sogou bot as a major culprit
Verified the drop in GA4 traffic after blocking
There's no need to overcomplicate this with bot score thresholds or headless browser detection. In this case, a simple country block solved the problem.
If you're seeing weird cities in GA4 Realtime (like Bayingolin or Jiaxing), it's probably real bot traffic, not GA spoofing.
Use your Apache/Nginx logs + geoIP lookup to confirm.
Cloudflare's free firewall rules are good enough for most bot filtering use cases.
Don't assume "page_view" and "scroll" in GA4 mean human; bots can trigger them too.
I'll keep monitoring for new patterns, but for now, blocking China has dramatically cleaned up our metrics and reduced unnecessary load on the server.
If you're running a site and seeing similar anomalies, don't ignore them. Even if it's just bots, they can mess with your data and slow down your site.