Messing with Scraper Bots

HermanMartinus | 35 points

If you control your own Apache server and just want to shortcut to "go away" instead of feeding scrapers, the RewriteEngine is your friend, for example:

      RewriteEngine On

      # Block requests that reference .php anywhere (path, query, or encoded)
      RewriteCond %{REQUEST_URI} (\.php|%2ephp|%2e%70%68%70) [NC,OR]
      RewriteCond %{QUERY_STRING} \.php [NC,OR]
      RewriteCond %{THE_REQUEST} \.php [NC]
      RewriteRule .* - [F,L]
Notes: there's no PHP on my servers, so if someone asks for it, they are one of the "bad boys" IMHO. Your mileage may differ.
jcynix | an hour ago
s0meON3 | 44 minutes ago

Neat! Most of the offensive scrapers I met try and exploit WordPress sites (hence the focus on PHP). They don't want to see php files, but their outputs.

What you have here is quite close to a honeypot, sadly I don't see an easy way to counter-abuse such bots. If the attack is not following their script, they move on.

ArcHound | an hour ago

Interesting! It's nice to see people are experimenting with these, and I wonder if this kind of junk data generators will become its own product. Or maybe at least a feature/integration in existing software. I could see it going there.

localhostinger | an hour ago

Hm.. why not using dumbed down small, self-hosted LLM networks to feet the big scrapers with bullshit?

I'd sacrifice two CPU cores for this just to make their life awful.

NoiseBert69 | an hour ago

[dead]

leadgrids | 40 minutes ago