Ask HN: How to stop an AWS bot sending 2B requests/month?

lgats | 282 points

> I've tried 30X redirects (which it follows)

301 response to a selection of very large files hosted by companies you don't like.

When their AWS instances start downloading 70000 windows ISOs in parallel, they might notice.

Hard to do with cloudflare but you can also tar pit them. Accept the request and send a response, one character at a time (make sure you uncork and flush buffers/etc), with a 30 second delay between characters.

700 requests/second with say 10Kb headers/response. Sure is a shame your server is so slow.
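A minimal sketch of that tar pit in Python's asyncio, in case it helps. The names `tarpit`, `handle`, and `main`, the port, and the body text are all made up for illustration. It assumes you terminate the connection yourself rather than behind a CDN.

```python
import asyncio

async def tarpit(writer, body=b"please stop", delay=30.0):
    """Drip `body` to the client one byte at a time, pausing `delay` seconds between bytes."""
    # Headers go out first so the client believes a normal response has started.
    writer.write(b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n")
    await writer.drain()
    for i in range(len(body)):
        writer.write(body[i:i + 1])
        await writer.drain()  # flush so the byte actually leaves the buffer
        await asyncio.sleep(delay)
    writer.close()
    await writer.wait_closed()

async def handle(reader, writer):
    await reader.readline()  # read the request line; the rest of the request is ignored
    try:
        await tarpit(writer)
    except ConnectionError:
        pass  # the client gave up first

async def main(host="127.0.0.1", port=8080):
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()

# To run the server: asyncio.run(main())
```

Each stalled connection costs you only a sleeping coroutine, while the client has to keep a socket and a worker tied up for as long as it is willing to wait.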

AdamJacobMuller | 4 days ago

Making the obviously-abusive bot prohibitively expensive is one way to go, if you control the terminating server.

gzip bomb is good if the bot happens to be vulnerable, but even just slowing down their connection rate is often sufficient - waiting just 10 seconds before responding with your 404 is going to consume ~7,000 ports on their box, which should be enough to crash most linux processes (nginx + mod-http-echo is a really easy way to set this up)
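For anyone wondering where the ~7,000 comes from, it is just arrival rate times hold time (Little's law). This uses the roughly 700 requests/second quoted in this thread, and assumes the bot keeps sending at that rate no matter how many responses are still outstanding. The function name is made up for illustration.

```python
def concurrent_connections(requests_per_second, hold_seconds):
    """Little's law: average connections in flight = arrival rate x time each is held open."""
    return requests_per_second * hold_seconds

print(concurrent_connections(700, 10))  # 7000
```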

swiftcoder | 4 days ago

Main author of Anubis here. Have CloudFlare return an HTTP 200 response instead of rejecting with a non-200 status. These bots tend to keep hammering until they get a 200 response, so giving them one makes them stop.

xena | 4 days ago

I had this issue on one of my personal sites. It was a blog I used to write maybe 7-8 years ago. All of a sudden, I see insane traffic spikes in analytics. I thought some article went viral, but realized it was too robotic to be true. And so I narrowed it down to some developer trying to test their bot/crawler on my site. I tried asking nicely, several times, over several months.

I was so pissed off that I set up a redirect rule for it to send them over to random porn sites. That actually stopped it.

neya | 4 days ago

Return a 200 with the EICAR test string in the body. Nothing like some data poisoning for some vindictive fun

https://en.wikipedia.org/wiki/EICAR_test_file

yabones | 4 days ago

Do you receive, or expect to receive any legitimate traffic from AWS Singapore? If not, why not blackhole the whole thing?

bigfatkitten | 4 days ago

Singapore's comms regulator bans porn (even possessing it), serve up some softcore to the bot, e-mail the regulator and AWS.

scrps | 4 days ago

If it follows redirects, have you tried redirecting it to its own domain?

MrThoughtful | 4 days ago

Tell cloudflare it's abusive, and they will block it outside your account so it doesn't count against you.

jedberg | 4 days ago

I had a similar problem back in 2018, though at a smaller scale.

I wrote a quick-and-dirty program that reads the authoritative list of all AWS IP ranges from https://ip-ranges.amazonaws.com/ip-ranges.json (more about that URL at the blog post https://aws.amazon.com/blogs/aws/aws-ip-ranges-json/), and creates rules in Windows Firewall to simply block all of them. Granted, it was a sledgehammer, but it worked well enough.

Here's the README.md I wrote for the program, though I never got around to releasing the code: https://markdownpastebin.com/?id=22eadf6c608448a98b6643606d1...

It ran for some years as a scheduled task on a small handful of servers, but I'm not sure if it's still in use today or even works anymore. If there's enough interest I might consider publishing the code (or sharing it with someone who wants to pick up the mantle). Alternatively it wouldn't be hard for someone to recreate that effort.
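A rough sketch of the parsing half in Python, for anyone recreating it. This is not the original program. `SAMPLE` mimics the shape of ip-ranges.json but uses made-up documentation addresses, and `networks_for_region` and `is_blocked` are names invented for the example. Turning the resulting networks into actual firewall rules is left out.

```python
import ipaddress
import json

# A tiny stand-in with the same shape as ip-ranges.json; the addresses are not real AWS ranges.
SAMPLE = json.loads("""
{
  "prefixes": [
    {"ip_prefix": "203.0.113.0/24", "region": "ap-southeast-1", "service": "EC2"},
    {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "EC2"}
  ],
  "ipv6_prefixes": [
    {"ipv6_prefix": "2001:db8::/32", "region": "ap-southeast-1", "service": "EC2"}
  ]
}
""")

def networks_for_region(data, region=None):
    """Return ip_network objects for all prefixes, optionally limited to one region."""
    nets = []
    for entry in data.get("prefixes", []):
        if region is None or entry["region"] == region:
            nets.append(ipaddress.ip_network(entry["ip_prefix"]))
    for entry in data.get("ipv6_prefixes", []):
        if region is None or entry["region"] == region:
            nets.append(ipaddress.ip_network(entry["ipv6_prefix"]))
    return nets

def is_blocked(addr, nets):
    """True if addr falls inside any of the given networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in nets if net.version == ip.version)

singapore = networks_for_region(SAMPLE, "ap-southeast-1")
print(is_blocked("203.0.113.7", singapore))   # True
print(is_blocked("198.51.100.7", singapore))  # False
```

Filtering by region keeps the sledgehammer a little smaller. Passing no region returns every published range.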

G'luck!

rkagerer | 3 days ago

I redirect such traffic to a subdomain with an IP address that isn't assigned (or legally assignable). The bots just wait for a response to their connection requests but never get one. This seems to typically cost them 10s of waiting. The traffic doesn't come to my servers and it doesn't risk legitimate users who might hit it by mistake.

bcwhite | 4 days ago

I ran into a similar situation a couple of years ago. It wasn't at the scale you describe, but it was an absurd number of requests for a ~80 MB software installer. I ended up redirecting the offending requests to a file named "please-stop.txt" that contained a short note explaining what was happening and asking them to stop. A short time later they did.

geraldcombs | 3 days ago

> Thankfully, CloudFlare is able to handle the traffic with a simple WAF rule and 444 response to reduce the outbound traffic.

This is from your own post, and is almost the best answer I know of.

I recommend you configure a Cloudflare WAF rule to block the bot - and then move on with your life.

Simply block the bot and move on with your life.

stevoski | 4 days ago

I am dealing with a similar situation and kinda screwed up, as I managed to get Google Ads suspended due to blocking Singapore. I see a mix of traffic from AWS, Tencent and Huawei cloud at the moment. Currently I'm just scanning server logs and blocking IP ranges.
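The log scanning can be roughly sketched in Python like this. It assumes the client IP is the first whitespace-separated field of each log line, as in the common access log format, and it only groups IPv4 addresses into /24s. `noisy_ranges` and its parameters are names invented for the example.

```python
import ipaddress
from collections import Counter

def noisy_ranges(log_lines, threshold, prefix_len=24):
    """Count requests per IPv4 network and return those at or above threshold."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            # Assumes the client IP is the first field on the line.
            ip = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # skip lines that do not start with an IP address
        if ip.version != 4:
            continue  # IPv6 is left out to keep the sketch short
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[net] += 1
    return [(str(net), n) for net, n in counts.most_common() if n >= threshold]
```

The output is a list of (range, request count) pairs, busiest first, which still needs a human look before anything gets blocked.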

locusm | 4 days ago

A 100% legal solution is to sue them and name Amazon as a party in the lawsuit.

Through discovery you can get the name of the parties involved from Amazon, but Amazon is very likely to drop them as a client solving the issue.

Retric | 3 days ago

Do you have any legitimate traffic coming from AWS? My thought is to just drop all traffic from their ASN. Once they can't contact you for a while they'll move along and you could unblock.

pickle-wizard | 3 days ago

As others have suggested, you can try to fight back, depending on the capabilities of your infrastructure. All crawlers will have some kind of queuing system. If you manage to fill up the queues, then the crawler won't be able to send as many requests. For example, you can allow the crawler to open the socket but send the data only very slowly, so the queues quickly fill up with busy workers.

Depending on how the crawler is designed this may or may not work. If they are using SQS with Lambda then it will obviously not work, but it will backfire on them nevertheless because the serverless functions will be running for longer (5 - 15 minutes).

Another technique that comes to mind is to try to force the client to upgrade the connection (i.e. websocket). See what will happen. Mostly it will fail but even if it gets stalled for 30 seconds that is a win.

_pdp_ | 4 days ago

Maybe add this IP to a blacklist? https://iplists.firehol.org/ It would be easier to pressure AWS when it is there

molszanski | 4 days ago

Dumb question but just cuz I didn’t see it mentioned have you tried using a Disallow: / in your robots.txt? Or Crawl-delay: 10? That would be the first thing I would try.
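For what it's worth, you can check how a well-behaved crawler would read those rules with Python's standard `urllib.robotparser`. This is only a sketch: the wildcard user-agent and the example.com URL are placeholders, and Crawl-delay is a non-standard directive that only some crawlers honor.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler that honors robots.txt would make these checks before fetching anything.
print(parser.can_fetch("crawler", "https://example.com/any/page"))  # False
print(parser.crawl_delay("crawler"))                                # 10
```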

Sometimes these crawlers are just poorly written not malicious. Sometimes it’s both.

I would try a zip bomb next. I know there’s one that is 10 MB over the network and unzips to ~200TB.

n_u | 4 days ago

Just find a Hoster with low traffic egress cost, reverse proxy normal traffic to Cloudflare and reply with 2GB files for the bot, they annoy you/cost you money, make them pay.

Scotrix | 4 days ago

In addition to whatever other mitigations you do, you should put a deny rule for the bot's user-agent in robots.txt, and use a status code of 429 (Too Many Requests), even if the bot doesn't respect these. This will strengthen your case if you need to convince a third party (AWS, or a court, or a different part of the company that's operating the bot) that it's abusive.

jimrandomh | 3 days ago

Blocking before the traffic reaches the application servers (what you're doing) is the most effective and cost/time efficient.

It sounds like the bot operator is spending enough on AWS to withstand the current level of abuse reports.

If you really wanted to retaliate, you could try getting a warrant to force AWS to disclose the owners of that AWS instance.

Rothnargoth | 4 days ago

You don't even need to send a response. Just block the traffic and move on

Jean-Papoulos | 4 days ago

An idea I had was a custom kernel that replied ACK (or SYN+ACK) to every TCP packet. All connections would appear to stay open forever, eating all incoming traffic, and never replying, all while using zero resources of the device. Bots might wait minutes (or even forever) per connection.

bcwhite | 4 days ago

if it follows redirects, redirect it to a 10gb gzip bomb

shishcat | 4 days ago

If it follows the redirect I would redirect it to random binary files hosted by Amazon, then see if it continues to not require any further action

theginger | 4 days ago

Set a CloudFlare page rule or similar pointing to a custom internal URL with the max request timeout jacked up as high as possible, then stick a little async web server behind it that hangs every request after the first byte for, say, 1 hour. Give the async web server a good chunk of RAM to waste. Most providers don't bill for time, only bytes, and most bots have some timeout tolerance, especially when the status headers and body are already being sent.

Similarly, you can also try delivering one byte every 10 seconds or 30 seconds or whatever keeps the client on the other end hanging around without hitting an internal timeout.

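    import asyncio, itertools

    async def tarpit(resp):  # resp: hypothetical async response with send() and flush()
        # cycle() yields one byte value at a time (repeat() would resend the whole string)
        for byte in itertools.cycle(b"FUCKOFF"):
            await resp.send(bytes([byte]))
            await resp.flush()
            await asyncio.sleep(10)
            # etc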

In the SMTP years we called this tarpitting IIRC

g-mork | 4 days ago

Block the AWS IP ranges. You will have reasonably good results blocking all datacenter ranges - cloud providers, VPSs etc., if you don't expect traffic from them. You can get the ranges from Udger (paid) and it isn't very bad w.r.to false positives. Alternatively just whitelist expected regions and block everything else. More false positives prone, but easier.

tushar-r | 3 days ago

I wrote about this a few weeks ago, because it really is quite insane.

I wish AWS would curtail abuse from their networks. My hope is to build some tools to automate detection and reporting of this sort of abuse, so we can force it into AWS's court.

https://wxp.io/blog/abuse-from-amazon-ip-networks-never-end

lucastech | 3 days ago

We've seen tons of illegitimate traffic emanating from SG. So much so, that it is a part of the standard WAF country block (along with CN).

1a527dd5 | 3 days ago

Redirect it to the client IP. It's not abuse, since you're just an innocent redirect-to-client-IP service. Most probably the requests time out and the bot considers the service dead after a couple of days. Even better, if there is a page on the client IP, they just overload their own servers. Better still, it causes an automatic abuse trigger to kick in and shut down the service.

kachapopopow | 4 days ago

Hire a lawyer and have him send the bill for his services to them immediately with a note on the consequences of ignoring his notices. Bill them aggressively.

giardini | 4 days ago

'Mozilla/5.0 (compatible; crawler)'

Assuming one trusts the user-agent in this case one could reduce the traffic reply to them and avoid touching the disk or any applications in Nginx with something like:

    if ($http_user_agent ~ (crawler|some-other-bot) ) { return 200 '\n\n\n\nBot quota exceeded, check back in 2150 years.\n\n\n\n'; }

There are other variables to look for to see if something is a bot but such things should be very well tested. $http_accept_language, $http_sec_fetch_mode, etc...

I don't use CF but maybe they have a way to block the entire ASN for AWS on your account assuming one does not need inbound connections from them. I just blackhole their CIDR blocks [1] but that won't help someone using a CDN.

[1] - https://ip-ranges.amazonaws.com/ip-ranges.json

Bender | 3 days ago

update: thanks for all the suggestions

I decided to do some testing with redirecting to a small vps that just keeps the connections open and sends a byte every 10-30 seconds. This worked and the traffic substantially dropped off. After doing some more digging though, I got concerned this may be in itself an abuse of my VPS providers ToS. The risk did not outweigh the benefit. Gzip bombs fell under a similar category of concern.

lgats | a day ago

IANAL- sue them for DDoSing and disrupting your service.

> The traffic is hitting numbers that require me to re-negotiate my contract with CloudFlare and is otherwise a nuisance when reviewing analytics/logs.

So you're able to show financial hardship

2OEH8eoCRo0 | 3 days ago

I blocked the entirety of Singapore via Cloudflare for my personal site. I was seeing persistent weird traffic patterns and sometimes very odd if a little creepy. Not anymore though, the whole country is blocked.

lloydatkinson | 3 days ago

> I've tried 30X redirects (which it follows) to no avail

Make it follow redirects to some kind of illegal website. Be creative, I guess.

The reasoning being that if you can get AWS to trigger security measures on their side, maybe AWS will shut down their whole account.

znpy | 4 days ago

Have you tried redirecting the bot in a loop? That should allow it to keep making a ton of requests and hopefully generate traffic they'll have to pay for.

Another idea is replying with large cookies and seeing if the bot saves them and replies with them (once again, to eat traffic)

The idea is to increase their egress to the point someone notices (the bill)

nijave | 3 days ago

Have you considered an eBPF filter that looks for 'Mozilla/5.0 (compatible; crawler)' and then just straight drops all packets from that IP for 1 hr? This is probably the best way to handle bots: don't even reply, so they have to time out, which usually takes a few seconds.

janis1234 | 3 days ago

  iptables -A INPUT -s $bot_ip -j DROP

sph | 21 hours ago

What kind of website is this that makes it so lucrative to run so many requests?

nurettin | 4 days ago

Block the traffic from those IP addresses. You may use fail2ban to automate that if it becomes common.

TZubiri | 3 days ago

There might be some ideas to dig here: https://news.ycombinator.com/item?id=41923635

hamburgererror | 4 days ago

If you are using cloudflare, add a rule to do managed JS challenge. Your backend shouldn’t see the requests unless they pass challenge.

sp1982 | 3 days ago

Use a simple block rule, not a WAF rule, those are free.

hyperknot | 4 days ago

So far I've been able to get away with just blocking the data centers/countries that cause problems for my servers. Singapore and China are common causes for trouble.

As for trying to get them to stop, maybe redirect the bot to random IP:port combinations in a network that's less friendly to being scanned? I believe certain parts of DoD IP space tends to not look kindly upon attempts to scan them.

Depending on your setup, you could try to poison the bot's DNS for your domain. Send them the IP address of their local police force maybe.

My guess is that this is yet another AI scraper. There are others complaining about this bot online but all they seem to come up with is blocking the ASN in Cloudflare.

If there's no technical solution, I'd consider consulting with a legal professional to see if you can get Amazon to take action. Lawyers are expensive, but so is a Cloudflare bill when they decide you need to be on the "enterprise" tier.

jeroenhd | 3 days ago

tirreno(1) guy here.

I'd suggest taking a look into patterns and IP rotation (if any) and perhaps blocking IP CIDR at the web server level, if the range is short.

Why isn't a simple deny from 12.123.0.0/16 (Apache) working for you?

1. https://github.com/tirrenotechnologies/tirreno

reconnecting | 4 days ago

Silly suggestion: feed them bogus DNS info. See if you can figure out where their DNS requests are coming from.

ahazred8ta | 4 days ago

I started forwarding to Amazon; that worked.

ipaddr | 2 days ago

Ask a lawyer to send a hand-delivered letter to the AWS legal department demanding compensation or face court for damages. Mention of potential criminal proceedings for actively supporting ongoing cyber attacks might not hurt.

Instant results, I guarantee it.

Look up key AWS staff names in Singapore (blogs, talks, etc…) and mention them as defendants.

Nobody cares about these things until they are directly impacted themselves.

Nothing has to actually happen! A letter is cheap.

But it’s the implication that matters. Just discovery can cost them more than the profit from some scummy web scraper.

jiggawatts | 2 days ago

If they have some service up on the machines the bot connects from, then you can redirect them to themselves.

Otherwise, maybe redirect to the AWS customer portal or something -_- maybe they will stop it if it hits themselves...

sim7c00 | 4 days ago

Redirect it to Trump's website. He will take care of it

pknerd | 4 days ago

Write to aws abuse team

brunkerhart | 4 days ago

This sounds like a fun project.

cactusplant7374 | 3 days ago

Null-route the entirety of AWS ip space.

snvzz | 4 days ago

Take a look at https://github.com/pingooio/pingoo

It's a reverse-proxy / load balancer with built-in firewall and automatic HTTPS. You will be able to easily block the annoying bots with rules (https://pingoo.io/docs/rules)

pingoo101010 | 4 days ago

Block the IPs or set up a WAF on AWS if you cannot be on Cloudflare.

2000swebgeek | 4 days ago

Completely and utterly off topic: why on earth does HN use a dim gray font for the post description? It's so hard to read. I understand why downvoted comments are grayed out but why the post description???

throwaway127482 | 3 days ago

zip bomb it yeah !

realaaa | 3 days ago

Have ChatGPT write you a sternly worded cease and desist letter and send it to Amazon legal via registered mail.

AWS has become rather large and bloated and does stupid things sometimes, but they do still respond when you get their lawyers involved.

JCM9 | 4 days ago

What kind of content do you serve? 700 RPS is not a big number at all, for sure not enough to qualify as a DoS. I'm not surprised AWS did not take any action.

reisse | 4 days ago