Rate limiting

A rate limiter is the layer that caps how many requests an identified caller may make per unit time. For an authentication surface, it is one of the most consequential pieces of operational defence in depth: the lockout policy catches the specific case of repeated failed credentials against one identifier, while the rate limiter catches the broader case of brute-force and credential-stuffing attacks distributed across many identifiers. This chapter covers the RateLimitLayer Tower middleware, the key-extraction strategies that determine what is rate-limited, the tuning patterns for different endpoints, and the SLI signal the layer produces.

Why rate limiting matters

The lockout policy in Multi-tenancy catches one specific pattern: many failures against one identifier. A rate limiter catches a wider pattern: a high volume of requests against an endpoint, regardless of identifier, regardless of success.

The shapes of attack the rate limiter catches:

Credential stuffing. An attacker with a list of credentials tries each one against the login endpoint. Each individual attempt fails on its own credentials (no lockout against any single user), but the aggregate rate is far above legitimate traffic. The rate limiter on the login endpoint, keyed by source IP, drops the attack to a trickle.

Account-existence enumeration. An attacker probes the signup endpoint to find which usernames are taken. Each request might succeed (the username is available) or fail (the username is taken), and the response leaks the information. The rate limiter caps the enumeration rate; combined with response-shaping (return the same shape for both cases), the attack becomes impractical.

Token-replay forwarding. An attacker who has captured a valid session cookie forwards it through many connections to evade fingerprint detection. Each request looks legitimate on its own; the aggregate volume is the giveaway. The rate limiter keyed by session id catches the pattern.

Workload misbehaviour. A workload has, for some reason, entered a tight loop calling the application's API. The authentication side validates the workload token on each request; the rate limiter catches the runaway pattern before it overwhelms the service.

The layer

RateLimitLayer is a Tower layer with a small configuration:

use axess::{RateLimitLayer, RateLimitConfig, KeyExtractor};
use std::time::Duration;

let layer = RateLimitLayer::new(
    RateLimitConfig::builder()
        .max_requests(10)
        .window(Duration::from_secs(60))
        .key(KeyExtractor::PeerIp)
        .build(),
);

The configuration says "no more than ten requests per minute, keyed by the peer IP." The layer counts requests against each distinct peer IP; when a key has hit the limit within the window, subsequent requests get a 429 (Too Many Requests) with a Retry-After header.

The window is implemented as a sliding token bucket. The math: each key has a bucket of max_requests tokens; each request consumes one; tokens regenerate at a rate of max_requests per window. A burst of max_requests requests within a short interval consumes all the tokens; subsequent requests are rejected until at least one token has regenerated.
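
The token-bucket arithmetic can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the axess implementation: the TokenBucket type, its fields, and the choice to pass the current time in explicitly (to keep the example deterministic) are all hypothetical.

```rust
use std::time::{Duration, Instant};

/// Illustrative token bucket for one key; not the axess implementation.
pub struct TokenBucket {
    capacity: f64,       // corresponds to max_requests
    tokens: f64,         // tokens currently available
    refill_per_sec: f64, // max_requests divided by the window length
    last_refill: Instant,
}

impl TokenBucket {
    pub fn new(max_requests: u32, window: Duration, now: Instant) -> Self {
        let capacity = f64::from(max_requests);
        Self {
            capacity,
            tokens: capacity, // a new key starts with a full bucket
            refill_per_sec: capacity / window.as_secs_f64(),
            last_refill: now,
        }
    }

    /// Refills for the elapsed time, then tries to take one token.
    /// Returns true if the request is allowed.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now
            .saturating_duration_since(self.last_refill)
            .as_secs_f64();
        // Regenerate tokens, never exceeding the bucket capacity.
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

With max_requests of 10 and a 60-second window, the refill rate is one token every six seconds, so a caller that has drained the bucket is admitted again a few seconds later rather than waiting for a fixed window boundary.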

The state of the buckets lives in memory by default (BucketStore::InMemory). For multi-instance deployments where the same caller can reach any instance, the rate limit needs to be aggregated across instances; BucketStore::Valkey { client } shifts the state to a shared Valkey instance.

Key extraction

The key is what the rate limiter counts against. The KeyExtractor enum carries the choices:

pub enum KeyExtractor {
    PeerIp,                              // request source IP (read through trusted-proxy)
    SessionId,                           // present session id
    UserId,                              // authenticated user
    TenantId,                            // authenticated tenant
    WorkloadId,                          // authenticated workload
    Custom(Arc<dyn KeyExtractorFn>),     // application-supplied
    Composite(Vec<KeyExtractor>),        // multi-key (one bucket per combination)
}

The choice of key determines which attack the limiter catches. PeerIp catches single-source attacks; SessionId catches session-replay attacks; UserId catches per-user runaway loops; TenantId catches per-tenant runaway (which can be a noisy neighbour rather than an attack).

The Composite choice creates one bucket per combination of the named keys. A rate limit keyed by (PeerIp, UserId) lets each legitimate user behind a shared IP do their normal work without drawing on another user's budget. Because the composite key is unique per (ip, user) pair, an attacker IP that rotates through many users starts a fresh bucket for each user: the composite limit bounds the rate per pair, not the IP's total, so pair it with a plain PeerIp limit where the aggregate rate from one source must stay bounded.
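
How one bucket per combination might be derived can be sketched as follows. This is an assumption-laden illustration: RequestFacts, KeyPart, composite_key, and count_by_key are hypothetical names, and the string format of the key is invented for the example rather than taken from axess.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical per-request facts; not an axess type.
pub struct RequestFacts {
    pub peer_ip: IpAddr,
    pub user_id: Option<String>,
}

/// Hypothetical stand-in for the key parts a composite can name.
#[derive(Clone, Copy)]
pub enum KeyPart {
    PeerIp,
    UserId,
}

/// Joins the named parts into one bucket key.
/// Returns None when a part is unavailable (for example, no authenticated user).
pub fn composite_key(parts: &[KeyPart], req: &RequestFacts) -> Option<String> {
    let mut pieces = Vec::with_capacity(parts.len());
    for part in parts {
        let piece = match part {
            KeyPart::PeerIp => format!("ip={}", req.peer_ip),
            KeyPart::UserId => format!("user={}", req.user_id.as_deref()?),
        };
        pieces.push(piece);
    }
    Some(pieces.join("|"))
}

/// Counts requests per composite key, one counter per distinct combination.
pub fn count_by_key(parts: &[KeyPart], reqs: &[RequestFacts]) -> HashMap<String, u32> {
    let mut counts = HashMap::new();
    for req in reqs {
        if let Some(key) = composite_key(parts, req) {
            *counts.entry(key).or_insert(0) += 1;
        }
    }
    counts
}
```

Two users behind the same IP produce two distinct keys and therefore two independent counters. A real implementation would also need to decide what to do when a part is missing; this sketch simply skips such requests.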

The Custom choice is the escape hatch for keys axess does not know about: the OAuth client id, a custom request header, the authenticated session's tenant slug. The application provides the extraction function; the layer uses it to derive the key.

Per-endpoint rate limits

Different endpoints have different sensitivities. A login endpoint needs only a few requests per minute per IP because real users do not log in often; a search endpoint accepts hundreds per second because real users browse. The configuration shape is typically per-endpoint:

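// Assumes axum for routing; the handlers and session_layer are defined elsewhere.
use axum::{routing::{get, post}, Router};

let auth_routes = Router::new()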
    .route("/login", post(login))
    .route("/signup", post(signup))
    .route("/reset-password", post(reset_password))
    .layer(RateLimitLayer::new(
        RateLimitConfig::builder()
            .max_requests(10)
            .window(Duration::from_secs(60))
            .key(KeyExtractor::PeerIp)
            .build(),
    ));

let api_routes = Router::new()
    .route("/data", get(get_data))
    .layer(RateLimitLayer::new(
        RateLimitConfig::builder()
            .max_requests(300)
            .window(Duration::from_secs(60))
            .key(KeyExtractor::SessionId)
            .build(),
    ));

let app = Router::new()
    .merge(auth_routes)
    .merge(api_routes)
    .layer(session_layer);

The pattern is to layer the rate limit on the specific routes it applies to, with the most restrictive limits on the most sensitive endpoints. A login endpoint with a tight per-IP limit is the canonical case; a token-refresh endpoint with a per-session limit is the second canonical case.

The trusted-proxy configuration covered in Cookies, fingerprinting, hijack detection applies to the PeerIp extractor here as well. Read the IP from the forwarded header only when the immediate peer is a trusted proxy; otherwise the rate limiter can be spoofed.
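
The trusted-proxy rule can be sketched as a small standalone function. This is a hedged illustration, not the axess extractor: the client_ip function, the flat list of trusted proxy addresses, and the use of an X-Forwarded-For style comma-separated header value are assumptions made for the example.

```rust
use std::net::IpAddr;

/// Resolves the IP to use as the rate-limit key.
/// `forwarded_for` is the raw value of an X-Forwarded-For style header, if present.
pub fn client_ip(peer: IpAddr, forwarded_for: Option<&str>, trusted_proxies: &[IpAddr]) -> IpAddr {
    // An untrusted peer can put anything in the header, so ignore it entirely.
    if !trusted_proxies.contains(&peer) {
        return peer;
    }
    let header = match forwarded_for {
        Some(value) => value,
        None => return peer,
    };
    // Walk the hops right to left, skipping our own proxies.
    // The first untrusted hop is the closest address we can vouch for.
    for hop in header.rsplit(',') {
        match hop.trim().parse::<IpAddr>() {
            Ok(ip) if trusted_proxies.contains(&ip) => continue,
            Ok(ip) => return ip,
            Err(_) => return peer, // malformed entry: fall back to the peer
        }
    }
    peer
}
```

Reading the header from the right matters: entries to the left of the first untrusted hop were supplied by the client and can be forged, so keying on them would let an attacker pick a fresh bucket for every request.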

Tuning the windows

Tuning the rate limit is more art than science, but a few guidelines hold up.

For login endpoints: 10 requests per minute per IP is the conservative starting point. Real users log in at most a few times a day from any one IP. Credential-stuffing attacks need hundreds per minute to be efficient; 10 is well below that. Tune up only if the reject rate is too high on legitimate traffic (many users behind a corporate NAT, for instance).

For signup endpoints: 5 requests per minute per IP. Signup is even less frequent for legitimate users than login, and a tight limit is the best way to stop account enumeration.

For password reset: 3 requests per hour per IP. A reset is a once-in-a-while operation. Attackers spam the reset endpoint to flood the victim's inbox; the tight limit is the defence.

For token refresh: matched to the session TTL. A session that refreshes every hour should have a rate limit of a few refreshes per hour per session id; an attacker who steals a session cannot extract value through rapid refresh.

For data endpoints: matched to the application's expected use pattern. An API for human-driven dashboards sees a few requests per minute per session; an API for programmatic clients sees hundreds per second per workload. The pattern is deployment-specific.

Whatever the starting values, measure first. The metrics from AuthnMetrics::rate_limit_rejected (covered below) tell you the real reject rate; the calibration is then to set the limit just above the legitimate-traffic envelope.
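
One way to turn measurements into a starting limit is to take a high percentile of the per-key request counts seen in one window of legitimate traffic and add headroom. The sketch below is only one possible calibration under that assumption; suggest_limit, the nearest-rank percentile, and the headroom factor are hypothetical choices, not something axess prescribes.

```rust
/// Suggests a limit from observed per-key request counts in one window.
/// `percentile` is in 0..=100; `headroom` of 0.25 means 25% above the envelope.
/// Returns None when there are no observations.
pub fn suggest_limit(counts: &[u32], percentile: f64, headroom: f64) -> Option<u32> {
    if counts.is_empty() {
        return None;
    }
    let mut sorted = counts.to_vec();
    sorted.sort_unstable();
    // Nearest-rank percentile: the smallest value covering `percentile` of the keys.
    let rank = ((percentile / 100.0) * sorted.len() as f64).ceil() as usize;
    let idx = rank.clamp(1, sorted.len()) - 1;
    let envelope = f64::from(sorted[idx]);
    // Round up so the suggested limit sits at or above the envelope plus headroom.
    Some((envelope * (1.0 + headroom)).ceil() as u32)
}
```

The input should come from traffic believed to be legitimate; counts taken during an attack would inflate the envelope and produce a limit that is too loose.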

What happens at the limit

A request that hits the rate limit gets:

A 429 status code. The standard HTTP response for "Too Many Requests."

A Retry-After header. The value is the number of seconds the client should wait before retrying. The header is read by browsers and well-behaved clients; attackers ignore it.

A short JSON body explaining the limit. The body is generic ("rate limit exceeded") rather than specific (no "you have 0 of 10 requests remaining"); the latter leaks the limit configuration, which lets an attacker calibrate their attack to just under the limit.

A metrics record. The layer calls AuthnMetrics::rate_limit_rejected on each rejection; applications wire the method to their Prometheus or OpenTelemetry counter.
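
The shape of the rejection can be sketched without any HTTP framework. This is an illustration under assumptions: the Rejection struct, the rejection function, and the exact JSON field name are hypothetical, and the real layer may compute the wait differently.

```rust
use std::time::Duration;

/// Hypothetical description of what a rate-limited caller receives.
pub struct Rejection {
    pub status: u16,           // 429 Too Many Requests
    pub retry_after_secs: u64, // value of the Retry-After header
    pub body: &'static str,    // generic, reveals nothing about the configuration
}

/// Builds the rejection from the time until the next token is available.
pub fn rejection(wait: Duration) -> Rejection {
    // Round up to whole seconds so a compliant client never retries too early,
    // and never advertise a zero-second wait.
    let secs = wait.as_secs() + u64::from(wait.subsec_nanos() > 0);
    Rejection {
        status: 429,
        retry_after_secs: secs.max(1),
        body: r#"{"error":"rate limit exceeded"}"#,
    }
}
```

The body deliberately carries no counts or limits, matching the reasoning above: the only number the caller learns is how long to wait.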

Distinguishing attack from misconfiguration

A high rate of 429s is operationally interesting. The cause is either an attack (real attacker getting throttled) or a misconfiguration (legitimate traffic hitting a limit that was set too low).

The signals that distinguish them:

A rate of 429s heavily concentrated on a small set of source IPs that do not match legitimate user patterns (datacenter IPs, VPN exit nodes, residential ASNs from countries the application does not typically serve) suggests an attack.

A rate of 429s spread across many IPs, matching legitimate user patterns (residential ASNs from served countries, mixed mobile and home connections), suggests misconfiguration.

The audit events the rate limiter produces (a RateLimitRejected event per drop) carry the source IP, the endpoint, and the timestamp; SIEM queries against these distinguish the patterns quickly.
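
The concentration signal can be computed directly from the source IPs in the rejection events. The sketch below assumes the events have already been reduced to a list of IPs; top_ip_share is a hypothetical helper, and any threshold applied to its result is a deployment decision.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Fraction of all rejections that came from the `top_n` busiest source IPs.
/// A value near 1.0 means a few sources dominate; a low value means the
/// rejections are spread widely.
pub fn top_ip_share(rejected_ips: &[IpAddr], top_n: usize) -> f64 {
    if rejected_ips.is_empty() {
        return 0.0;
    }
    // Count rejections per source IP.
    let mut counts: HashMap<IpAddr, usize> = HashMap::new();
    for ip in rejected_ips {
        *counts.entry(*ip).or_insert(0) += 1;
    }
    // Sort the per-IP counts in descending order and sum the largest `top_n`.
    let mut per_ip: Vec<usize> = counts.into_values().collect();
    per_ip.sort_unstable_by(|a, b| b.cmp(a));
    let top: usize = per_ip.iter().take(top_n).sum();
    top as f64 / rejected_ips.len() as f64
}
```

A high share points towards the concentrated pattern described above, and a low share towards the spread-out one. The share alone does not say whether the IPs look legitimate; that still needs the ASN and geography checks.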

Per-tenant rate limits

For multi-tenant deployments, the rate limit configuration can be per-tenant. A tenant with a higher SLA gets a higher rate limit; a tenant with a lower SLA gets a tighter one. The mechanism is the same RateLimitLayer, with a Custom key extractor that composes the standard key (typically PeerIp) with the tenant id, and with separate RateLimitConfigs per tenant tier.

The pattern is operationally complex (one configuration per tenant tier), so most deployments use a single shared limit and calibrate to the deployment-wide envelope. The per-tenant shape is for deployments where the SLA differences are explicit and the operational overhead is justified.
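
The per-tier shape can be sketched as a lookup from tier to limit plus a key that combines tenant and IP. Everything here is hypothetical: the Tier and TierLimit types, the tier names, the numbers, and the key format are placeholders for illustration, not axess types or recommended values.

```rust
use std::net::IpAddr;
use std::time::Duration;

/// Hypothetical SLA tiers.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Tier {
    Standard,
    Premium,
}

/// Hypothetical per-tier limit, mirroring max_requests and window.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TierLimit {
    pub max_requests: u32,
    pub window: Duration,
}

/// One limit per tier; the numbers are placeholders, not recommendations.
pub fn limit_for(tier: Tier) -> TierLimit {
    match tier {
        Tier::Standard => TierLimit { max_requests: 300, window: Duration::from_secs(60) },
        Tier::Premium => TierLimit { max_requests: 1200, window: Duration::from_secs(60) },
    }
}

/// Combines the tenant id with the peer IP so each tenant's callers get their own buckets.
pub fn tenant_ip_key(tenant_id: &str, peer_ip: IpAddr) -> String {
    format!("tenant={}|ip={}", tenant_id, peer_ip)
}
```

In a real deployment the tier lookup would come from tenant configuration, and the key function is the piece that would sit behind a Custom extractor.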

Metrics

The layer emits two metrics through the AuthnMetrics trait:

rate_limit_rejected is incremented on each 429. The metric is the primary signal for tuning and for attack detection.

rate_limit_evaluated (optional, off by default) is incremented on every request the layer sees, regardless of outcome. The ratio of rejected to evaluated is the reject rate. Below 0.1% typically means the limit is set well; above 1% suggests either attack or misconfiguration.

The AuthnMetrics implementation is the application's; it typically routes to Prometheus, OpenTelemetry, or whatever metrics system the deployment uses. The examples/sqlite/ reference application shows a simple AtomicU64-based implementation suitable for adapting to a real metrics system.
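
A minimal AtomicU64-based counter pair, in the spirit of the reference application, might look like the sketch below. The AuthnMetrics trait signature is not reproduced in this chapter, so this is a standalone struct with hypothetical method signatures that merely borrow the metric names; adapting it means implementing the real trait and forwarding to these counters.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Standalone counters; a real integration would implement AuthnMetrics instead.
#[derive(Default)]
pub struct RateLimitCounters {
    rejected: AtomicU64,
    evaluated: AtomicU64,
}

impl RateLimitCounters {
    /// Called for every request the layer sees.
    pub fn rate_limit_evaluated(&self) {
        self.evaluated.fetch_add(1, Ordering::Relaxed);
    }

    /// Called for every request answered with a 429.
    pub fn rate_limit_rejected(&self) {
        self.rejected.fetch_add(1, Ordering::Relaxed);
    }

    /// Rejected divided by evaluated; None until at least one request is evaluated.
    pub fn reject_rate(&self) -> Option<f64> {
        let evaluated = self.evaluated.load(Ordering::Relaxed);
        if evaluated == 0 {
            return None;
        }
        let rejected = self.rejected.load(Ordering::Relaxed);
        Some(rejected as f64 / evaluated as f64)
    }
}
```

Relaxed ordering is sufficient here because the counters are independent statistics and nothing synchronises on them. The computed rate can be compared against the 0.1% and 1% guideposts above.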

Composing with the lockout policy

The rate limiter and the lockout policy are different defences that compose. The rate limiter catches volume; the lockout policy catches credential pattern. Both fire on attacks, in different shapes.

The pattern that emerges: the rate limiter is the first line of defence against credential stuffing. It drops the attack to a trickle before any individual user's lockout policy can fire. The lockout policy then catches the few attempts that get through, marking the targeted user accounts as locked.

A deployment that has rate limiting but no lockout policy is vulnerable to slow attacks that stay below the rate limit. A deployment that has lockout but no rate limiting is vulnerable to high-volume attacks that distribute across many users. Both together cover both attack shapes.

What this enables

The rate limiter is the operational layer that sits between "the request was sent" and "the authentication logic runs." A deployment without it is vulnerable to a class of attack that the authentication logic alone cannot prevent; a deployment with it has the broader defence against volume-based attacks that complements the credential-pattern defence of lockout.
