Thanks to my sponsors: Gran PC, Justin Ossevoort, Lyssieth, joseph, Michał Bartoszkiewicz, SeniorMars, Guillaume E, Olivia Crain, Ivo Murrell, Yann Schwartz, Cole Kurkowski, Kamran Khan, Raphaël Thériault, Jörn Huxhorn, Peter Shih, Ronen Ulanovsky, Alex Krantz, Dimitri Merejkowsky, AdrianEddy, Daniel Strittmatter and 230 more
I won free load testing
👋 This page was last updated ~3 years ago. Just so you know.
Long story short: a couple of my articles got really popular on a bunch of sites, and someone, somewhere, went "well, let's see how much traffic that smart-ass can handle", and suddenly I was on the receiving end of a couple DDoS attacks.
It really doesn't matter what the articles were about — the attack is certainly not representative of how folks on either side of any number of debates generally behave.
My assumption is that it's a small group (maybe a Discord?) with a botnet, who wanted to have fun.
And, friends: fun was had.
The main attack
My main site ("the blog") received about 34M requests over 72h - in three spikes. It's behind Cloudflare at the time of this writing, so, here's a pretty graph (the granularity / bucket size is 1 hour):
To give you an idea of scale, the article that "blew up" on multiple news aggregators only got around 130K hits. This is what organic traffic looks like (with a long tail and everything):
In some ways, that attack was unsophisticated: it hit a single route, /
(the
front page), and two thirds of the requests used a single user agent:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36
Which is just the user-agent of Chrome's current version on Linux 64-bit. The traffic doesn't look like something like headless Chrome was used, (otherwise there'd be a lot more requests for assets like fonts, stylesheets, images etc.), but it could've been.
On the other hand, the attack was pretty well distributed:
Here are the top 15 AS the traffic came from:
- AS4134: China Telecom (China, backbone)
- AS61317: Heficed (cloud provider)
- AS398465: Rackdog (cloud provider)
- AS18209: Actcorp (India, telecom)
- AS14061: DigitalOcean (cloud provider)
- AS17639: Converge ICT (Philippines, telecom)
- AS9829: Bharat Sanchar Nigam Limited (India, backbone)
- AS7713: Telkom Indonesia (Indonesia, telecom)
- AS208294: CIA Triad LLC
- IP prefix is in the US
- Phone number has Australian prefix (+61)?
- Associated with Zwiebelfreunde who provide Tor exit nodes
- AS8452: TE-AS (Egypt, telecom)
- AS205100: F3 Netze, (Germany, associated with Freifunk Franken)
- AS9299: PLDT (Philippines, telecom)
- AS36947: Algérie Telecom (Algeria, telecom)
- AS141995: Contabo Asia (Singapore ops of a German cloud provider)
- AS27699: Telefonica (Brazil, telecom)
So, you know. The flip side of compute and bandwidth becoming a very affordable commodity worldwide is that... it's also affordable for bad actors.
Also, I'm sure Heficed, Rackdog, DigitalOcean and Contabo all say in their Terms of Service "don't use us to DDoS someone else", but it appears you can definitely do it for a few hours without them noticing (whether it was through compromised instances or not).
The secondary attack
The other target was my recently-launched video platform, which is a separate service, and runs on fly.io.
Here's that attack as seen from fly.io metrics (that I see as a user):
Here is the same attack as seen from Honeycomb, where data is self-reported (from multiple instances of my app) and sampled (not all traces are sent to Honeycomb):
Why yes, I do see something that stands out.
Let's use BubbleUp on that whole area there:
Single biggest asset (4K@60 video asset for a 50 minute video), makes sense.
They even left me a note! How thoughtful.
(That's a reference to one of the articles that got popular this week.)
I don't track which ranges were requested, but they did do some range requests, and some full requests. Not sure if that was random or just two individuals going at it:
That attack was a lot less distributed: it came from a bunch of Vultr IPs, and it stopped after banning the whole AS (just for my app).
How effective were the attacks?
To evaluate whether or not the attacks were successful, we need to discuss why people on the internet perform them.
Arguably, the only reason is "for the lulz" (for laughs), but the lulz has multiple tiers: the primary goal is to "deny service" - to overload the server(s) so that some content is not accessible anymore.
Then there's secondary goals: because providers typically bill for bandwidth, if it costs the target some money, that's even more fun. And if said providers decide that actually, they don't want a customer who's getting targeted like that, and boot them off their platform, then that's maximum fun.
And then of course there's the power trip, the idea that you have control over someone else, the ability to "punish" them for something they did. Just like any other form of harassment really.
So, were those goals met?
The primary goal definitely was: the site was inaccessible for a few hours over the course of Saturday, April 30th. As far as HTTP status codes go, I'm counting:
- 11M "499 Client Closed Request"
- 6.5M "503 Service Unavailable"
- 6M "403 Forbidden"
- 4M "200 OK"
- 3M "524 Origin Timeout"
- 2M "522 Origin Connection time-out"
- 1.5M "429 Too Many Requests"
- 500K "520 Origin Error"
- 100K "521 Origin Down"
A few people reached out to let me know about it, bummed they couldn't read the article for the time being. But I mean, as long as The Internet Archive is up, is anything really down? News aggregator users reflexively save articles that become somewhat popular, expecting them to go down, or in case they need receipts.
As for the secondary objectives: none of them were met.
It didn't cost me a single dollar:
- Hetzner doesn't bill for bandwidth, it's a fixed monthly price kind of deal
- Cloudflare eats bigger attacks for breakfast, and they don't charge for it either.
As for fly.io, well, I work there, so, they pay me.
It didn't get me in trouble either. On the contrary, every time something like that happens, I make new friends and hear back from old friends.
And honestly? Everyone brought popcorn and took notes. An attack like that is both entertaining and informative. It kinda gave me a kick in the butt to address some issues with my website. And it gave everyone ideas on how to better protect against this kind of thing.
Just like this article right here is good intel for anyone who is looking to DDoS me. But for me, sharing that info is part of the fun, and I don't really believe you can really achieve resiliency through obscurity.
Why the attack worked
A service "going down" can mean many things: being temporarily suspended by a cloud provider is one failure mode, for example. Having resource utilization increase so much that a machine is constantly swapping and everything is 1000x slower is another.
Here, there was a single point of failure: my origin, a single Hetzner dedicated server, running (some guessed it) some Rust code.
It's not exactly a secret: I outlined how my website is built back in 2020, and although I've made multiple incremental improvements since, it still works essentially the same way.
Before I wrote my own server, I was using static site generators (nanoc, Hugo, etc.), and deploying the result directly to an S3 bucket, configured for static site serving, behind Cloudflare. That was an easy, reliable setup.
My site wouldn't have gone down if I was still using that setup. However, I would also be looking at a rather large AWS bill right about now. (And I'd rather talk to AWS friends about anything other than billing).
Wait, a large AWS bill?
I thought you just said it was behind Cloudflare?
I know, I was surprised too. I'm fairly sure it wasn't always that way, but if you consult the Cloudflare docs, it clearly says:
Cloudflare only caches based on file extension and not by MIME type. The Cloudflare CDN does not cache HTML by default. Additionally, Cloudflare caches a website’s robots.txt.
And then, further down:
To cache additional content, see Page Rules to create a rule to cache everything.
Over the years, Cloudflare has saved me a lot of bandwidth costs (well, it would've if I was paying for it), but "only" for images, CSS, JS, etc.
Not video, see Section 2.8 of their Self-Serve Subscription Agreement:
2.8 Limitation on Serving Non-HTML Content
The Services are offered primarily as a platform to cache and serve web pages and websites. Unless explicitly included as part of a Paid Service purchased by you, you agree to use the Services solely for the purpose of (i) serving web pages as viewed through a web browser or other functionally equivalent applications, including rendering Hypertext Markup Language (HTML) or other functional equivalents, and (ii) serving web APIs subject to the restrictions set forth in this Section 2.8.
Use of the Services for serving video or a disproportionate percentage of pictures, audio files, or other non-HTML content is prohibited, unless purchased separately as part of a Paid Service or expressly allowed under our Supplemental Terms for a specific Service. If we determine you have breached this Section 2.8, we may immediately suspend or restrict your use of the Services, or limit End User access to certain of your resources through the Services.
(So, pissing off a kid with a botnet will not get you booted off of Cloudflare, but building a video platform on top of it will. They want you to use their product for that)
And, as it turns out, not for HTML.
Both top colors there are uncached requests: the orange spikes are the requests my origin did manage to serve, and the grey ones are the ones Cloudflare generated for me when the origin struggled too much (or when I stopped it altogether).
The first thing I tried to do was add a cache-control: public, max-age=120
response header. Still, the cache status was "dynamic", so every request went
to the origin.
I already had the cache-control
header set for static assets (images,
stylesheets, scripts), just not for HTML, because in two years of repeatedly
hitting the front page of various sites, it was never a performance concern.
Creating a page rule didn't immediately fix everything (but I'm convinced I was holding it wrong, since that's the thing they want you to do), but even if it had, at least on the Free and Pro plans, you can only match by URL: so, this would break the website for any logged-in users (who support me financially, and have access to some articles in advance).
Cool bear's hot tip
On the Business and Pro plans, there's a "Bypass Cache on Cookie" feature, which
even lets you specify regex patterns, like PHPSESSID=.*
.
However, it wouldn't take a very motivated attacker to figure out that they can hammer your origin with a well-formed cookie: there's no way for Cloudflare's edge to know whether the session cookie is actually valid or not, so it only helps for legitimate requests, not for attacks.
Long story short: without upgrading to the next plan over ($200/month), I can't use Cloudflare to cache HTML in a way that doesn't break my site. And that wouldn't help protect against attacks.
Meanwhile, back in my origin server...
$ sudo ss | wc -l
352950
Oh, 350K connections.
That's a lot of connections
That's certainly more connections than there should EVER be between Cloudflare and my server. But Cloudflare expects origin servers to signal when they're struggling.
Origins should return 429, or 503, or start refusing connections; they should do something, anything other than just accept a ridiculous number of concurrent connections and let them just sit there — that's just a bad deal for everybody.
And my site didn't do that! Because I half-assed that bit, and it worked for two years straight anyway. Almost as if... it was possible to move fast, delivering value, leaving some problems for later, even with Rust...
Amos...
Right, right, sorry.
So: the lack of caching wasn't actually that surprising to me. I do consider my HTML content dynamic, just like Cloudflare does: it is different for logged-in users, there's random bits ("what to read/watch next"), there's a full-text search engine (really just SQLite's, nothing fancy there yet).
However, I did expect Cloudflare's DDoS protection, their #1 selling point, to kick in.
And it mostly didn't.
Although Cloudflare did block/present a challenge for some fraction of the incoming traffic, it mostly did so when I turned on the I'm Under Attack mode, and even then, definitely not enough to let my origin serve legitimate requests again.
By the time the second wave happened, I was already in touch with an engineer at Cloudflare, who helped some, and let me know something interesting: the reason protection wasn't kicking in was that the attacker was staying just below their detection threshold.
Which seems to indicate the attacker knew what they were doing. But then why use a fixed user-agent? And hit just the one endpoint?
Because that's all it took?
Eh, fair enough.
Which brings us to another interesting point: the amount of requests is impressive in total, and very spikey, but it was still under Cloudflare's detection threshold most of the time, which means... my server probably should've handled it like a champ!
And just to be clear: it didn't, right?
Oh no, not at all. Memory usage was quite alright (hardly went past 5%), but all eight CPU cores were nearly maxed out, and, well, I couldn't get a single 200 OK out of it during the attack, even locally.
But then again I didn't sit there patiently for minutes waiting for my connection to be accepted.
See, at that time, my server was quite busy, doing a lot of stuff. Here's a
representative of what was going on, obtained with perf
(the exact command is just sudo perf top
):
As I mentioned, my website is mostly static content, but not just. And as I described in 2020, I'm not optimizing for "maximum server performance", I'm optimizing for "maximum content authoring convenience" (and also nice features for readers).
And so, Rust maximalism be damned, most of my website is actually powered by liquid templates and SQL queries.
Only the very base: file watching, hashing, DB management, the HTTP stack, and some very useful custom liquid filters (to render markdown, including my custom extensions, etc.), is actually written in Rust.
The front page, for example, runs several SQL queries to retrieve the latest videos, articles, and series. It then runs several liquid filters, not to process markdown (I cache that), but to truncate the HTML body so I can show just an excerpt with a "Read more" button.
For the curious, it looks something like this:
{% include "html/prologue.html" %}
{% capture sql_start %}
SELECT pages.*
FROM pages
JOIN revision_routes ON revision_routes.hapa = pages.hapa
WHERE revision_routes.revision = ?1
{% endcapture %}
{% capture sql_end %}
{% unless config.drafts %}
AND NOT pages.draft
{% endunless %}
{% unless config.future %}
AND datetime(pages.date) < datetime('now')
{% endunless %}
ORDER BY date DESC
{% endcapture %}
{% capture articles_sql %}
{{sql_start}}
AND revision_routes.parent_route_path = 'articles'
{{sql_end}}
LIMIT 3
{% endcapture %}
{% assign articles = articles_sql | query: revision %}
Later on, the page_listing
partial is invoked, which does stuff like:
<div class="page-section post-list">
{% for page in pages %}
{% assign page_html = page | page_markup %}
<div class="post-summary markup-container">
{{ page_html | truncate_html: max: 120 }}
</div>
{% endfor %}
(A lot of code was removed - this is a very noisy template).
Doing it that way is awesome. Liquid/Rust is my Python/C. It lets me prototype a ton of stuff really quickly, without recompiling anything, and it's absolutely been fast enough for two years of going viral.
The Rust codebase has no idea what the structure of content should be, nor does
it care: all it knows is that there's .md
assets, and those can be rendered
with pulldown-cmark, and some custom
template markup processor, and KaTeX, and, and -
those parts are built-in.
But series live entirely in the templates / SQL queries themselves: it's all path matching (with regexps, which I add to sqlite as a Rust function).
So, at the time of the attack, the P95 (95th percentile latency) for my index page was 168ms.
That's not fast: it's the ping between France and Brazil.
But it's also not slow: PageSpeed Insights seems very happy about a First Contentful Paint (FCP) of 1 second. There's more to the performance of a website than how long it takes to generate HTML, and I've spent a fair bit of effort taking care of everything else.
For example, I've recently created font subsets for Iosevka, which cut down several megabytes per visit, even though the font assets were always WOFF2.
Less recently, I've added support for avif and webp for images, instead of always serving png or jpg (depending on the kind of images).
And it was never a problem whenever one of my articles got popular, since the P95 for any single article was more like 30ms (and only involved a couple of very fast SQL queries, and no HTML rewriting/truncation whatsoever).
Fixing my origin
Adding cache-control
headers was a bust...
audible groan
...but I did make several other changes that should help next time, and I thought it'd be neat to show what they look like in code.
There's a whole bunch of Rust code there, if you want to skip it, search for "After the storm".
Yes, yes, I know, I should add anchor links for headers.
First, I added a limit to the maximum number of in-flight requests. My website uses warp as an HTTP framework (whereas my video platform uses axum), but they're both based on hyper, which means I can use standard tower layers with it (think "middleware").
Before in-flight requests limit:
async fn serve() {
// (omitted: everything else)
let addr: SocketAddr = config.address.parse()?;
// this turns a warp `Filter` into a tower `Service`
let svc = warp::service(all_routes.with(access_log));
let make_svc = hyper::service::make_service_fn(|_: _| {
// this gets called whenever a new connection is accepted: it gets its
// own clone of our HTTP service (this mostly increases reference counts)
async move { Ok::<_, Infallible>(svc.clone()) }
});
let server = hyper::Server::bind(&addr).serve(make_svc);
server.await?;
}
After in-flight requests limit:
use tower::{limit::GlobalConcurrencyLimitLayer, ServiceBuilder}
async fn serve() {
// (omitted: everything else)
let addr: SocketAddr = config.address.parse()?;
let svc = warp::service(all_routes.with(access_log));
let limit = GlobalConcurrencyLimitLayer::new(512);
let make_svc = hyper::service::make_service_fn(|_: _| {
// This increases reference counts for `GlobalConcurrencyLimitLayer`'s
// internal semaphore: the limit is
let svc = ServiceBuilder::new().layer(limit.clone()).service(svc.clone());
async move { Ok::<_, Infallible>(svc) }
});
}
I deployed this between two attack waves, and it did not help at all.
It does exactly what it says on the tin, but during an attack, the abnormally high volume means requests just keep piling up, taking up resources, and utilization is still as high as... well, as it can go.
Adding a load shed layer would've been smarter, I just didn't think of it at the time: it's one more line of code, and now, if there's more than 512 requests in-flight, the others just immediately error out.
Which feels like a bad thing at first, but once you've determined the maximum volume you can gracefully serve, that's what you want. Especially when there's an edge between you and the user-agent (in this case, Cloudflare) that's capable of backing off, retrying, etc.
The next thing I did was limit concurrent connections: I know Cloudflare has a lot of PoPs, but 350K connections still feels excessive.
I've covered this in Request coalescing in async Rust before, but who's got time for that, here's some code:
use std::convert::Infallible;
use futures::future::{ready, Ready};
use hyper::server::conn::AddrStream;
use tokio::sync::OwnedSemaphorePermit;
use tokio_util::sync::PollSemaphore;
use tower::Service;
pub struct ServiceFactory<S> {
pub inner: S,
pub semaphore: PollSemaphore,
pub permit: Option<OwnedSemaphorePermit>,
}
impl<S> Service<&AddrStream> for ServiceFactory<S>
where
S: Clone,
{
type Response = PermitService<S>;
type Error = Infallible;
type Future = Ready<Result<Self::Response, Self::Error>>;
fn poll_ready(
&mut self,
cx: &mut std::task::Context<'_>,
) -> std::task::Poll<Result<(), Self::Error>> {
if self.permit.is_none() {
self.permit = Some(futures::ready!(self.semaphore.poll_acquire(cx)).unwrap());
}
Ok(()).into()
}
fn call(&mut self, _req: &AddrStream) -> Self::Future {
let permit = self.permit.take().expect(
"you didn't drive me to readiness did you? you know that's a tower crime right?",
);
ready(Ok(PermitService {
inner: self.inner.clone(),
_permit: permit,
}))
}
}
pub struct PermitService<S> {
pub inner: S,
pub _permit: OwnedSemaphorePermit,
}
impl<S, R> Service<R> for PermitService<S>
where
S: Service<R>,
{
type Response = S::Response;
type Error = S::Error;
type Future = S::Future;
fn poll_ready(
&mut self,
cx: &mut std::task::Context<'_>,
) -> std::task::Poll<Result<(), Self::Error>> {
self.inner.poll_ready(cx)
}
fn call(&mut self, req: R) -> Self::Future {
self.inner.call(req)
}
}
And it's used like this:
let conns_limit = Semaphore::new(128);
let svc = ServiceBuilder::new()
.layer(GlobalConcurrencyLimitLayer::new(512))
.service(warp::service(all_routes.with(access_log)));
let factory = ServiceFactory {
inner: svc,
semaphore: PollSemaphore::new(Arc::new(conns_limit)),
permit: None,
};
let server = hyper::Server::bind(&addr).serve(factory);
info!("listening on {}", addr);
server.await?;
That one's significantly more verbose, but it's also very re-usable, that's kind of tower's deal. Which means it probably exists in a crate somewhere and I'm an idiot for rewriting it in every project. Or I should just publish my own crate.
But the elegance here is that... it's a Service
that takes an &AddrStream
and returns a Service
. That delicious meta bit can get confusing at times,
but it turns out to be super convenient.
Shortly after I deployed that change, things broke: the site appeared unavailable again. The attack hadn't resumed yet though — the available 512 connection slots had simply filled up, were idle, and Cloudflare edge nodes weren't able to establish any more connections.
My solution was to simply enforce idle read and write timeouts: any connection that hasn't been read from or written to in a few seconds gets reset.
Because I brought in a custom "acceptor" for that, it was also a good occasion
to close another hole: my server was listening on 0.0.0.0:80
. If you could
guess the IP (which, there's entire sites dedicated to that), you could hammer
it directly, without proxying through Cloudflare.
That didn't happen, but it sure could have.
Instead, we want the origin to only accept connections from Cloudflare IP ranges.
So, here's an acceptor that does both, and updates the IP ranges every now and then:
use std::{
collections::HashSet,
io,
net::{IpAddr, SocketAddr},
pin::Pin,
str::FromStr,
sync::Arc,
time::Duration,
};
use arc_swap::ArcSwap;
use futures::TryFutureExt;
use hyper::server::accept::{from_stream, Accept};
use ipnet::IpNet;
use tokio::net::TcpStream;
use tokio_io_timeout::{TimeoutReader, TimeoutWriter};
use tracing::{debug, info, warn};
pub const IPS_V4: &str = include_str!("ips-v4.txt");
pub const IPS_V6: &str = include_str!("ips-v6.txt");
pub const IPS_LOCAL: &str = "127.0.0.1/8";
pub fn timeout_acceptor(
addr: SocketAddr,
) -> impl Accept<Conn = Pin<Box<TimeoutWriter<TimeoutReader<TcpStream>>>>, Error = io::Error> {
let ip_nets = parse_ip_nets(&[IPS_V4, IPS_V6, IPS_LOCAL]).unwrap();
info!("Loaded {} ip nets", ip_nets.len());
let ip_nets = Arc::new(ArcSwap::from_pointee(ip_nets));
tokio::spawn({
let ip_nets = ip_nets.clone();
async move {
loop {
if let Err(e) = update_ip_nets(&ip_nets).await {
warn!("Could not update IP nets: {e}");
};
}
}
});
let localhost = IpAddr::from([127, 0, 0, 1]);
from_stream(
async move {
let ln = tokio::net::TcpListener::bind(addr).await?;
let stream = async_stream::stream! {
loop {
let (stream, addr) = ln.accept().await?;
if let Some(net) = ip_nets.load()
.iter()
.find(|net| net.contains(&addr.ip()))
{
debug!("Allowing {} through net {net}", addr.ip());
} else {
debug!("Disallowing {}", addr.ip());
continue;
}
let should_timeout = addr.ip() != localhost;
let mut stream = TimeoutReader::new(stream);
if should_timeout {
stream.set_timeout(Some(Duration::from_secs(5)));
}
let mut stream = TimeoutWriter::new(stream);
if should_timeout {
stream.set_timeout(Some(Duration::from_secs(5)));
}
yield Ok(Box::pin(stream))
}
};
Ok(stream)
}
.try_flatten_stream(),
)
}
fn parse_ip_nets(sources: &[&str]) -> color_eyre::Result<HashSet<IpNet>> {
let mut set: HashSet<IpNet> = Default::default();
for input in sources {
for line in input.lines() {
let line = line.trim();
if line.is_empty() || line.starts_with('#') {
continue;
}
let ip_net = IpNet::from_str(line).unwrap();
set.insert(ip_net);
}
}
Ok(set)
}
async fn update_ip_nets(ip_nets_handle: &ArcSwap<HashSet<IpNet>>) -> color_eyre::Result<()> {
let client = reqwest::Client::new();
tokio::time::sleep(Duration::from_secs(15)).await;
let mut sources = vec![IPS_LOCAL.to_string()];
for url in [
"https://www.cloudflare.com/ips-v4",
"https://www.cloudflare.com/ips-v6",
] {
sources.push(client.get(url).send().await?.text().await?);
}
let sources: Vec<_> = sources.iter().map(|x| x.as_str()).collect();
let ip_nets = parse_ip_nets(&sources[..])?;
info!(
"Loaded {} ip nets (had {})",
ip_nets.len(),
ip_nets_handle.load().len()
);
ip_nets_handle.store(Arc::new(ip_nets));
Ok(())
}
Featured here: reqwest for easy async HTTP
requests, arc-swap to avoid using an
Arc<Mutex<T>>
or Arc<RwLock<T>>
,
async-stream to be able to create
an async Stream
using generator syntax (yield
etc.).
It's pretty naive code, but I think it looks pretty neat.
A better way to only accept connection from Cloudflare IPs would be to set up firewall rules (and update them periodically). Then any disallowed connection could be simply refused (thus not taking space in the accept queue), or dropped (thus wasting the attacker's time).
But hey, gotta leave stuff to do for later!
Also, using it is a breeze:
let acceptor = timeout_acceptor(addr);
// instead of `Server::bind`:
let server = hyper::Server::builder(acceptor).serve(factory);
info!("listening on {}", addr);
server.await?;
Because it doesn't return an AddrStream
but instead a
Pin<Box<TimeoutWriter<TimeoutReader<TcpStream>>>>
...
Gesundheit.
...it broke ServiceFactory
with a gnarly error
message, but
I've been doing this long enough that I knew exactly to turn:
impl<S> Service<&AddrStream> for ServiceFactory<S> {
// etc.
}
Into:
impl<S, A> Service<&A> for ServiceFactory<S> {
// etc.
}
So that ServiceFactory
works for any acceptor (it doesn't care what the
actual IO type is).
I added some user-agent banning on my side too (in case I ever move away from Cloudflare), same tower boilerplate here:
use std::{collections::HashSet, sync::Arc};
use futures::future::{ready, Either, Ready};
use http::{Request, Response};
use tower::{Layer, Service};
// Layer
pub struct BanUserAgents {
banned: Arc<HashSet<String>>,
}
impl BanUserAgents {
pub fn new() -> Self {
let mut banned = HashSet::new();
banned.insert("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36".to_string());
Self {
banned: Arc::new(banned),
}
}
}
impl Default for BanUserAgents {
fn default() -> Self {
Self::new()
}
}
impl<S> Layer<S> for BanUserAgents {
type Service = BanUserAgentsService<S>;
fn layer(&self, inner: S) -> Self::Service {
BanUserAgentsService {
inner,
banned: self.banned.clone(),
}
}
}
#[derive(Clone)]
pub struct BanUserAgentsService<S> {
inner: S,
banned: Arc<HashSet<String>>,
}
impl<S, B> Service<Request<B>> for BanUserAgentsService<S>
where
S: Service<Request<B>, Response = Response<hyper::Body>>,
{
type Response = S::Response;
type Error = S::Error;
type Future = Either<S::Future, Ready<Result<S::Response, S::Error>>>;
fn poll_ready(
&mut self,
cx: &mut std::task::Context<'_>,
) -> std::task::Poll<Result<(), Self::Error>> {
self.inner.poll_ready(cx)
}
fn call(&mut self, req: Request<B>) -> Self::Future {
let user_agent = req
.headers()
.get("user-agent")
.and_then(|h| h.to_str().ok())
.unwrap_or_default();
if self.banned.contains(user_agent) {
let res = Response::builder()
.status(403)
.body(hyper::Body::empty())
.unwrap();
Either::Right(ready(Ok(res)))
} else {
Either::Left(self.inner.call(req))
}
}
}
There's no good reason to code up an entire tower layer by hand to do this, I should've used something like ServiceExt::filter and it would've been only a couple lines.
Next up, I configured Sentry and (after checking with them that I'd be okay if the attack resumed) Honeycomb for the main site. I had both of these set up for the video platform, but never needed to look too closely to the main site up until now.
let config = Config::read(&args.path)?;
// this reports panics
let _guard = sentry::init((
SENTRY_DSN,
sentry::ClientOptions {
release: sentry::release_name!(),
environment: Some(config.env.to_string().into()),
..Default::default()
},
));
let service_name = "futile";
// this allows tracking spans across services through known HTTP headers
opentelemetry::global::set_text_map_propagator(TraceContextPropagator::new());
// this'll error out locally unless I spin up an otlp-exporter. in
// production, this points to an exporter I run myself, off-box.
let otlp_endpoint = config.otlp_endpoint.as_deref().unwrap_or("localhost:4317");
let tracer = opentelemetry_otlp::new_pipeline()
.tracing()
.with_trace_config(
opentelemetry::sdk::trace::config()
// all traces will have the same service name
.with_resource(Resource::new(vec![KeyValue::new(
opentelemetry_semantic_conventions::resource::SERVICE_NAME,
service_name,
)]))
// send 10% of traces
.with_sampler(Sampler::TraceIdRatioBased(0.1)),
)
.with_exporter(
// use GRPC to talk to the exporter
opentelemetry_otlp::new_exporter()
.tonic()
.with_endpoint(otlp_endpoint),
)
// send all of that asynchronously, using tokio tasks
.install_batch(opentelemetry::runtime::Tokio)?;
info!(%otlp_endpoint, "Setting up telemetry");
let telemetry = tracing_opentelemetry::layer().with_tracer(tracer);
// filter printed-out log statements according to the RUST_LOG environment
// variable, default to info
let rust_log_var = std::env::var("RUST_LOG").unwrap_or_else(|_| "info".to_string());
let log_filter = Targets::from_str(&rust_log_var)?;
// different filter for traces sent to honeycomb
let trace_filter = Targets::from_str("futile=info")?;
Registry::default()
.with(
tracing_subscriber::fmt::layer()
.with_ansi(true)
.with_filter(log_filter),
)
.with(
ErrorLayer::default()
.and_then(telemetry)
.with_filter(trace_filter),
)
.init();
And started instrumenting some functions with tracing, one of my favorite things, since it's so easy.
I re-used an IncomingHttpSpanLayer
I had made for the video platform:
//! Make an http span
// TODO: deduplicate with tube
use std::{
future::Future,
pin::Pin,
task::{Context, Poll},
};
use http::{Request, Response};
use hyper::Body;
use tower::{Layer, Service};
use tower_request_id::RequestId;
use tracing::{field, info_span, instrument::Instrumented, Instrument, Span};
/// Layer for [IncomingHttpSpanService]
#[derive(Default)]
pub struct IncomingHttpSpanLayer {}
impl<S> Layer<S> for IncomingHttpSpanLayer
where
S: Service<Request<Body>> + Clone + Send + 'static,
{
type Service = IncomingHttpSpanService<S>;
fn layer(&self, inner: S) -> Self::Service {
IncomingHttpSpanService { inner }
}
}
/// Extracts opentelemetry context from HTTP headers
#[derive(Clone)]
pub struct IncomingHttpSpanService<S>
where
S: Service<Request<Body>> + Clone + Send + 'static,
{
inner: S,
}
impl<S, B> Service<Request<Body>> for IncomingHttpSpanService<S>
where
S: Service<Request<Body>, Response = Response<B>> + Clone + Send + 'static,
{
type Response = S::Response;
type Error = S::Error;
type Future = PostFuture<Instrumented<S::Future>, B, S::Error>;
fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
self.inner.poll_ready(cx)
}
fn call(&mut self, req: Request<Body>) -> Self::Future {
let user_agent = req
.headers()
.get("user-agent")
.and_then(|s| s.to_str().ok())
.unwrap_or("");
let host = req
.headers()
.get("host")
.and_then(|s| s.to_str().ok())
.unwrap_or("");
let sec_ch_ua_mobile = req
.headers()
.get("sec-ch-ua-mobile")
.and_then(|s| s.to_str().ok())
.unwrap_or("");
let sec_ch_ua_platform = req
.headers()
.get("sec-ch-ua-platform")
.and_then(|s| s.to_str().ok())
.unwrap_or("");
let span = info_span!(
"http request",
otel.name = %req.uri().path(),
otel.kind = "server",
http.method = %req.method(),
http.url = %req.uri(),
http.status_code = field::Empty,
http.user_agent = &user_agent,
http.host = &host,
http.sec_ch_ua_mobile = &sec_ch_ua_mobile,
http.sec_ch_ua_platform = &sec_ch_ua_platform,
request_id = field::Empty,
user_id = field::Empty,
);
if let Some(id) = req.extensions().get::<RequestId>() {
span.record("request_id", &id.to_string().as_str());
}
let fut = {
let _guard = span.enter();
self.inner.call(req)
};
PostFuture {
inner: fut.instrument(span.clone()),
span,
}
}
}
pin_project_lite::pin_project! {
/// Future that records http status code
pub struct PostFuture<F, B, E>
where
F: Future<Output = Result<Response<B>, E>>,
{
#[pin]
inner: F,
span: Span,
}
}
impl<F, B, E> Future for PostFuture<F, B, E>
where
F: Future<Output = Result<Response<B>, E>>,
{
type Output = F::Output;
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
let this = self.project();
let res = futures::ready!(this.inner.poll(cx));
if let Ok(res) = &res {
this.span.record("http.status_code", &res.status().as_u16());
}
res.into()
}
}
...so that all HTTP requests have their own info-level span, and show up in Honeycomb like so:
Hey, those latencies don't line up with the figures you mentioned earlier.
Haha, one thing at a time.
That lets me answer questions like "what RSS readers (that aren't browsers) is my audience using?"
I'm not sure what to do with that particular answer, but hey, now we know.
Edit: by popular demand, here's the list including user-agents that contain "Mozilla/5.0":
Also incredibly valuable, because spans are being sent instead of simple log messages, we get these nice visualizations of where we're actually spending time.
This is for a request to the index:
As I explained earlier, there's a few DB queries involved, it's fetching page markup (which is pre-rendered, but also needs a DB query), and then it truncates, which takes... forever.
So, immediately, I jump back to the code and find a ton of actionable information there: for example, the size of my "connection pool" is only 10 r2d2-sqlite's default: under load, that's not enough.
How do I know? I can see it waiting for a checkout:
And this is all it took:
let conn = info_span!("sql.checkout").in_scope(|| self.content_pool.get())?;
I also noticed a couple other things, like:
- I'm running blocking code (SQLite queries) in an async context. I should be using spawn_blocking for that. (There's no good lint for that)
- I'm not caching prepared statements. This doesn't show up as a hotspot, but
switching from
prepare
to prepare_cached is trivial, so why not? - I don't have any indexes in my SQLite database! What a great freebie I kept for myself.
And finally, I'm spending way more time on truncate_html
than I thought.
It is non-trivial:
impl Filter for TruncateHtmlFilter {
#[instrument(name = "truncate_html", skip(self, input, runtime), fields(html_len))]
fn evaluate(
&self,
input: &dyn ValueView,
runtime: &dyn liquid_core::Runtime,
) -> liquid_core::Result<liquid_core::Value> {
let span = Span::current();
convert_errors(|| {
let kinput = input.to_kstr();
let input = kinput.as_str();
span.record("html_len", &input.len());
let mut output: Vec<u8> = Vec::new();
let output_sink = |c: &[u8]| {
output.write_all(c).unwrap();
};
let max: u64 = self
.args
.max
.as_ref()
.and_then(|p| p.try_evaluate(runtime))
.and_then(|l| l.as_scalar().and_then(|s| s.to_integer().map(|x| x as u64)))
.unwrap_or(180);
let char_count = AtomicU64::new(0);
let mut skip: bool = false;
let mut rewriter = HtmlRewriter::new(
Settings {
element_content_handlers: vec![
element!("*", |el| {
if char_count.load(Ordering::SeqCst) > max && el.tag_name() == "p" {
skip = true;
}
if skip {
el.remove();
}
Ok(())
}),
text!("*", |txt| {
if matches!(txt.text_type(), TextType::Data) {
char_count.fetch_add(txt.as_str().len() as u64, Ordering::SeqCst);
}
Ok(())
}),
],
..Settings::default()
},
output_sink,
);
rewriter
.write(input.as_bytes())
.map_err(|e| liquid_core::Error::with_msg(format!("rewriting error: {:?}", e)))?;
drop(rewriter);
Ok(to_value(&std::str::from_utf8(&output[..])?)?)
})
}
}
...but also, it feeds the whole article through, well past the initial 120 characters of text it's trying to keep. That's wasteful, and it's a very predictable transform that could be cached, just like I cache rendered markdown.
So what I did next...
You fixed all of these, right?
Absolutely not! I ignored all of these, and jumped straight to implementing caching.
Latency was always "fine" when the site wasn't under attack. Knowing about these bothers me, but they can wait. What I really needed was some form of whole-page caching.
I've put that off for so long, and it was so easy.
I simply plugged in moka, a "fast, concurrent cache library for Rust", that supports async, and time-based expiry.
So, boom, build a cache:
let server_state = ServerState {
config: Arc::clone(&config),
revholder,
content_pool,
users_pool,
broadcast_rev,
// this is the only new field
rendered_templates_cache: Cache::builder()
// TTL: 5 minutes
.time_to_live(Duration::from_secs(5 * 60))
// TTI: 1 minute
.time_to_idle(Duration::from_secs(60))
.build(),
};
And the serve_template
function becomes:
#[instrument(skip(self, globals), fields(cache.status))]
async fn serve_template(
self,
template_name: &str,
mut globals: Object,
content_type: &'static str,
) -> Result<Box<dyn Reply + 'static>, Report> {
let span = Span::current();
let cache_key = format!("{}?{}", self.path, self.raw_query.as_str());
let (creds, session_cookies) =
FutileCredentials::load_from_cookies(self.config(), &self.cookies).await;
let has_creds = creds.is_some();
let cache_control = if has_creds {
"no-cache"
} else {
"private, max-age=120"
};
let mut body: Option<Bytes> = None;
let should_cache = self.state.config.env.is_prod() && !has_creds;
if should_cache {
// try to hit the cache first
if let Some(cached_body) = self.state.rendered_templates_cache.get(&cache_key) {
body = Some(cached_body);
span.record("source", &"hit");
} else {
span.record("source", &"miss");
}
} else {
span.record("cache.status", &"dynamic");
}
let body = if let Some(body) = body {
body
} else {
let rev = self.state.revholder.get();
let template = rev
.template_repo
.templates
.get(template_name)
.ok_or_else(|| Error::TemplateNotFound(template_name.into()))?;
// omitted: insert a ton of "globals" my templates expect, stuff
// like url params, the config (minus secrets), a friend URL, user
// info (if logged in), user settings (ligatures enabled, theme),
// etc.
let rendered = {
let render_span = info_span!("liquid.render");
let _guard = render_span.enter();
template
.render(&globals)
.map_err(|e| e.context("template", template_name.to_string()))?
};
// oh yeah, I forgot: I have live-reload on my website. Adding an
// idle timeout on all connections broke it, so I just disabled it
// for the development environment.
let rendered = inject_livereload(&self.state.config, &rendered);
let body = Bytes::copy_from_slice(rendered.as_ref().as_bytes());
// an earlier version of this article (and my site) inserted pages
// rendered _with credentials_ into the cache. woops!
if should_cache {
self.state
.rendered_templates_cache
.insert(cache_key, body.clone())
.await;
}
body
};
let res = Response::builder()
.status(StatusCode::OK)
.header("cache-control", cache_control)
.header("content-type", content_type)
.apply_session_cookies(session_cookies)
.body(body);
let res: Box<dyn Reply + 'static> = Box::new(res);
Ok(res)
}
Cloning a Bytes, which happens on cache hit, simply increments a reference count, so everything is nice and cheap.
The cache-control
header here is technically wrong, since I don't actually
ever want Cloudflare to cache hit, but since it already doesn't... I'll fix it
later.
With that little change (it rendered unconditionally beforehand), all pages for logged-out users are cached for 1m-5m, depending on whether they're being requested a lot.
And that makes the difference between this:
$ oha -z 5s http://localhost -H '(valid cookies omitted)'
Summary:
Success rate: 1.0000
Total: 5.0011 secs
Slowest: 1.0898 secs
Fastest: 0.0791 secs
Average: 0.5314 secs
Requests/sec: 87.9810
Total data: 14.61 MiB
Size/request: 34.00 KiB
Size/sec: 2.92 MiB
Response time histogram:
0.171 [20] |■■■■■■■
0.263 [18] |■■■■■■
0.355 [48] |■■■■■■■■■■■■■■■■■
0.447 [50] |■■■■■■■■■■■■■■■■■
0.539 [89] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.630 [82] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.722 [63] |■■■■■■■■■■■■■■■■■■■■■■
0.814 [37] |■■■■■■■■■■■■■
0.906 [21] |■■■■■■■
0.998 [7] |■■
1.090 [5] |■
Latency distribution:
10% in 0.2837 secs
25% in 0.3926 secs
50% in 0.5350 secs
75% in 0.6570 secs
90% in 0.7618 secs
95% in 0.8596 secs
99% in 0.9995 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0006 secs, 0.0001 secs, 0.0011 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
Status code distribution:
[200] 440 responses
(87 requests per second, painfully — P95 of 860ms)
And this:
$ oha -z 5s http://localhost
Summary:
Success rate: 1.0000
Total: 5.0017 secs
Slowest: 0.0077 secs
Fastest: 0.0002 secs
Average: 0.0015 secs
Requests/sec: 34073.2242
Total data: 5.53 GiB
Size/request: 34.00 KiB
Size/sec: 1.10 GiB
Response time histogram:
0.001 [7190] |■■■■
0.001 [22000] |■■■■■■■■■■■■■■■
0.001 [40329] |■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.002 [46719] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.002 [32402] |■■■■■■■■■■■■■■■■■■■■■■
0.002 [14861] |■■■■■■■■■■
0.003 [4874] |■■■
0.003 [1364] |
0.004 [450] |
0.004 [130] |
0.004 [106] |
Latency distribution:
10% in 0.0008 secs
25% in 0.0011 secs
50% in 0.0014 secs
75% in 0.0018 secs
90% in 0.0022 secs
95% in 0.0024 secs
99% in 0.0029 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0008 secs, 0.0001 secs, 0.0025 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0003 secs
Status code distribution:
[200] 170425 responses
(34K requests per second, easily — P95 of 2.4ms)
It's important to note that this code correctly identifies valid session cookies. These are signed, so if you want to hammer an uncached endpoint, you now need to either:
- Somehow reverse how cookies are signed + find the secret key (which is super easy to swap, it'll just log out everyone once. No biggie)
- Become a subscriber so you get a legitimate cookie (which will show up in traces, easy to block)
Is this pay to play?
Which leaves cached endpoints as durable attack targets: but because the maximum RPS (for the costliest page on the site) went from ~90 to 34K (a 37677% increase, for those keeping track), an attack would need to be above Cloudflare's threshold to even make a dent, at which point their DDoS protection would kick in.
As for the video platform: I made no changes. None. It already had good observability, and I had planned for this eventuality from the start: the only thing I was trying to prevent wasn't downtime, it was a huge AWS bill. And I'm happy to report that 100% of the requests that were served, were served from the SSD cache.
Also, most requests were served, since there's eight separate instances of the video app in eight different regions (Paris, Tokyo, Washington, São Paulo, etc.)
After the storm: collateral damage
Another secondary objective of a DDoS is to generate collateral damage: to force the target to block legitimate traffic in response to the attack, losing out on potential business and/or generally annoying people.
Blocking an entire AS is often excessive (but oh-so-convenient), blocking Tor exit nodes is very tempting, and sometimes you just don't want to hear from a certain country in a while. But these are all overshoots.
Ever since the attack, I've gotten messages from regular readers who could't access my website, because they happened to be running the latest Google Chrome on Linux.
I tried everything: attack mode has been disabled for a while, I've started allowlisting the IPs of my readers, forcing a "managed challenge" for that user-agent (hoping it'll override the automated block), but nothing seemed to be doing the trick, except for switching to another browser.
Eventually, I realized Cloudflare had nothing to do with it. During the attack, I had started banning that same user-agent (returning a 403) from my origin directly, and I just... forgot to disable that after the attack was over.
The confusing bit was that readers were seeing a Cloudflare error page saying "Sorry, you have been blocked".
And even after I removed my counter-measure, they're still seeing it. Living behind someone else's edge is a mixed blessing.
This is also partly the point: you're running left and right trying to make things better, you get sloppy. That's how it's supposed to work.
Anyway.
Performing a DDoS is technically an "illegal cybercrime", but realistically, receiving one is just another Saturday.
Until next time, take excellent care of yourselves!
Update: hi again!
Minutes after I posted this article, the attack resumed. Same shit, different AS. Here are the newcomers:
- AS45102: Alibaba US (Global, cloud provider)
- AS16276: OVH (France, cloud provider)
- AS141677: Nathosts Limited (Hong Kong)
- AS8075: Microsoft Azure (US/HK/Brazil, cloud provider)
- AS328386: Adnexus (South Africa)
- AS7303: Telecom Argentina
- AS24940: Hetzner (Germany, cloud provider)
- AS45758: Triple T Broadband (Thailand, ISP)
- AS37963: Alibaba Hangzhou (Global, cloud provider)
- AS63949: Linode (US, VPS provider)
At first, the site immediately went down, returning 522 (Origin Connection Time-out).
Turns out a limit of 256 is way too low: Cloudflare has ~275 POPs, and each of them might establish a few connections to my origin. I raised the limit to 2048 (both for connections and in-flight requests), and immediately the 522 line went down, and a 200 line went up!
Shortly after, Cloudflare caught up with them and they saw a 403 spike. Then they went away. Site's back up and fast as ever.
Here's another article just for you:
Rust generics vs Java generics
In my previous article, I said I needed to stop thinking of Rust generics as Java generics, because in Rust, generic types are erased.
Someone gently pointed out that they are also erased in Java, the difference was elsewhere. And so, let's learn the difference together.
Java generics
I learned Java first (a long, long time ago), and their approach to generics made sense to me at the time.