# Rate limiting
## Brutal crawlers
Every so often a "brutal" crawler scrapes the shop. As a result, non-scaling services such as the RDS database may become saturated, degrading overall performance up to a denial of service.
## Detection principle
### Metrics
HAPROXY can record a number of request and response metrics. The most interesting ones for abuse detection are:
- Connection rate: aggregates TCP connections.
- Request rate: aggregates HTTP requests.
Rates are averaged sums over a given period, e.g. "average number of connections over 3 seconds".
### Request vs. connection rate
Clients using HTTP/1.1 keepalive or HTTP/2 (not yet used by us) open a single TCP connection and perform a number of HTTP requests inside it. A malicious bot could easily fire a huge number of requests within one single connection.
In order to protect our non-scaling services, analysing the request rate is the more promising approach.
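A quick worked example of why the two metrics diverge under keepalive (numbers are made up):

```
# Illustrative: a keepalive client sends 100 requests over one TCP
# connection within the measurement window:
#   conn_rate(10s)     = 1    (only one connection was opened)
#   http_req_rate(10s) = 100  (every single request is counted)
# The request rate is the metric that exposes this kind of abuse.
```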
### Client assignment
Once abuse has been detected, only the client causing the requests shall be affected.
HAPROXY keeps the aggregated metrics in a memory table with a single entry per source. The source can be the originating IP address.
Just using the IP address as the client key is too broad when dealing with company networks. Giffits' default outgoing traffic is hidden behind a single IP address. With busy users and PRTG constantly calling shop URLs, the request rate for our external IP would quickly rise above the threshold.
The current approach combines the originating IP address with the requested server name. This results in reasonable request rates from company networks and still allows detection of malicious bots.
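For illustration (addresses and host names are made up), an office behind one NAT IP produces a separate table entry per requested host, while a bot hammering a single host still stands out:

```
# Hypothetical tracking keys (IP|host) and their request rates per 10s:
#   203.0.113.7|www.shop.example    ->  40   (many office users: still fine)
#   203.0.113.7|api.shop.example    ->  25   (PRTG checks: separate entry)
#   198.51.100.99|www.shop.example  -> 900   (single bot: clearly exceeded)
```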
## Limiting principle
HAPROXY can react in several ways to detected abuse:
- Hang up without a response.
  Consumes no resources and keeps naive scripts away.
- Deny and respond with an HTTP error, e.g. "429 Too Many Requests".
  Consumes few resources and tells malicious bots that we know what they are doing.
- Throttle down the originating source.
  Uses more resources: the request is held until the backend allows more outgoing connections.
We can't be 100% safe from false positives in abuse detection. Normal bots, especially Googlebot, and company networks must never receive hard errors (unless there is a real error). In addition, we don't want to maintain whitelists.
The best approach is to throttle down potentially abusive sources.
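For reference, all three reactions can be expressed in HAPROXY; a sketch, where `abuse` stands in for whatever ACL flags the offender (exact keywords depend on the HAPROXY version):

```
# 1. Hang up without a response (closes the connection quietly)
http-request silent-drop if abuse
# 2. Deny with an HTTP error
http-request deny deny_status 429 if abuse
# 3. Throttle: route to a copy of the backend with a very low maxconn
use_backend aws_production_throttled if abuse
```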
### How throttling works
Server definitions in a backend may contain the maxconn directive. If this number of connections to a server is exceeded, all subsequent requests are held within HAPROXY until idle "slots" become available again.
Unfortunately HAPROXY does not allow dynamic (ACL-based) values for maxconn, nor can ACLs enable or disable server entries.
The solution is a 1:1 copy of the backend with a very low maxconn value in the new backend. Abuse-ACL-based switching is then defined in the frontend with use_backend statements.
As a prerequisite, the throttled backend must not use keepalive when talking to its server. The client may use keepalive, but the backend has to make sure each request uses a new TCP connection.
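A minimal sketch of that prerequisite (backend name and address are placeholders): option http-server-close closes the server-side connection after each response, while the client side may keep its connection open:

```
backend throttled_sketch
    mode http
    # Client-side keepalive stays allowed, but every request opens a fresh
    # TCP connection to the server, so maxconn effectively limits requests.
    option http-server-close
    server app 192.0.2.10:80 maxconn 3
```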
## Implementation
### Detection algorithm
- Define `IP + Servername` as `Source`.
- Filter: only track shop6 URLs. Ignore images.
- Per `Source`, aggregate and average the HTTP request rate over 10 seconds.
- Treat the aggregate as `exceeded` with a threshold of 150
  (means: >= 150 requests in 10 seconds).
- If this aggregate is exceeded 5 times in a row, set an "abuse" ACL to true.
- If this aggregate falls below the threshold, set the "abuse" ACL to false.
- Route shop6 requests to the "throttled" backend if the "abuse" ACL is true.
### Sticky table
```
stick-table type string len 100 size 1M expire 3m store gpc0,http_req_rate(10s)
```
Defines a memory table with these properties:
- Contains string entries up to 100 bytes long.
- Can hold up to 1M entries (deletes oldest if exceeded).
- Deletes unused entries after 3 minutes.
- Stores two values:
  - the HTTP request rate per 10 seconds
  - a global integer counter `gpc0` (system defined)
We will use this counter to implement the "5 times in a row" rule.
This results in an additional RAM consumption of up to 100MBytes
(+ internal overhead). That memory is allocated at process start
and maintained throughout the whole runtime.
Reloading HAPROXY resets the table.
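If the table contents need to be inspected at runtime, the stats socket can be used; a sketch, assuming a socket path (the socket is not part of our extract below):

```
# Sketch: enable the runtime API (socket path is an assumption)
global
    stats socket /var/run/haproxy.sock mode 600 level admin

# From a shell, list the table attached to the frontend "www-in":
#   echo "show table www-in" | socat stdio /var/run/haproxy.sock
```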
### Enable tracking into sticky table
```
http-request set-var(req.host) hdr(host)
http-request track-sc0 src,concat(|,req.host) if shop6 !images !to_renderservice
```
Both statements only fire on an incoming HTTP request (as opposed to firing at a lower level such as TCP connect or SSL handshake).
The first line stores the request header "host" in a variable. This is necessary because the next line cannot access the request headers on its own.
The second line enables tracking if the filter rule applies (shop6,
no images).
The key is defined as the concatenation of:
- `src` - automatically set to the client IP address by HAPROXY,
- a pipe symbol for separation,
- the host name previously extracted from the request headers.
### Apply detection rules
These ACLs represent the exceeded rule:
```
acl rate_exceeded sc0_http_req_rate ge 150
acl rate_ok sc0_http_req_rate lt 150
```
HAPROXY features ACL condition functions which are also commands.
In this case it's sc0_inc_gpc0:
- Command: Increment the gpc0 counter.
- Condition function: Return the new counter value.
```
acl rate_exceeded_multiple_times sc0_inc_gpc0 gt 5
acl clear_rate sc0_clr_gpc0 ge 0
```
The ACL rate_exceeded_multiple_times increments the gpc0
counter and resolves to true if it is > 5.
The ACL clear_rate resets the counter.
But: the functions in those ACLs are only executed if the ACLs are actually evaluated somewhere.
```
http-request set-header X-ratecheck ok if !rate_exceeded clear_rate
```
ACL conditions are short-circuit evaluated from left to right.
The ACL clear_rate is only evaluated if rate_exceeded is false.
The counter is reset whenever the request rate has fallen below 150.
The next line evaluates rate_exceeded_multiple_times, which increments the counter whenever the request rate is above 150:
```
http-request set-header X-ratecheck exceeded if rate_exceeded rate_exceeded_multiple_times
```
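Putting both lines together, an illustrative trace for one tracked key:

```
# Hypothetical trace for a key whose 10s request rate sits above 150:
#   request 1..5: rate_exceeded true -> gpc0 = 1..5, "gt 5" still false
#   request 6:    rate_exceeded true -> gpc0 = 6,    "gt 5" true -> throttle
#   rate drops below 150:            -> clear_rate resets gpc0 to 0
```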
### Why "http-request set-header"?
We need to use an http-request action which does not change rule behaviour. For example, "allow" would override later http-request rules with certain actions like "deny" or "use-service".
### Apply throttling
```
use_backend aws_production_throttled if rate_exceeded_multiple_times !to_stat !images !to_renderservice !shop5 shop6
use_backend aws_production if !to_stat !images !to_renderservice !shop5 shop6
```
use_backend statements follow the early-out pattern: the first match applies, with no further checks.
### Implement throttling
This is a shortened copy of the backend definitions (stripped of everything not relevant to rate limiting):
```
backend aws_production
    mode http
    option http-server-close
    option http-pretend-keepalive
    server production 52.59.138.176:80 id 1 check port 80 maxconn 300

backend aws_production_throttled
    mode http
    option http-server-close
    option http-pretend-keepalive
    server production 52.59.138.176:80 id 1 check port 80 maxconn 3
```
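One knob worth noting (not part of our configuration; the value below is an assumption): requests held back by maxconn wait in the backend queue and receive a 503 once timeout queue expires, so the throttled backend may deserve an explicit bound:

```
backend aws_production_throttled
    # Assumed addition: fail queued requests after 10 seconds instead of
    # letting them wait for the full default queue timeout.
    timeout queue 10s
```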
### Complete code extract
```
frontend www-in
    mode http
    # ... lots of options and ACLs
    stick-table type string len 100 size 1M expire 3m store gpc0,http_req_rate(10s)
    http-request set-var(req.host) hdr(host)
    http-request track-sc0 src,concat(|,req.host) if shop6 !images !to_renderservice
    acl rate_exceeded sc0_http_req_rate ge 150
    acl rate_ok sc0_http_req_rate lt 150
    acl rate_exceeded_multiple_times sc0_inc_gpc0 gt 5
    acl clear_rate sc0_clr_gpc0 ge 0
    http-request set-header X-ratecheck ok if !rate_exceeded clear_rate
    http-request set-header X-ratecheck exceeded if rate_exceeded rate_exceeded_multiple_times
    use_backend aws_production_throttled if rate_exceeded_multiple_times !to_stat !images !to_renderservice !shop5 shop6
    use_backend aws_production if !to_stat !images !to_renderservice !shop5 shop6

backend aws_production
    mode http
    option http-server-close
    option http-pretend-keepalive
    server production 52.59.138.176:80 id 1 check port 80 maxconn 300

backend aws_production_throttled
    mode http
    option http-server-close
    option http-pretend-keepalive
    server production 52.59.138.176:80 id 1 check port 80 maxconn 3
```