Rate Limiting in ASP.NET Core

Introduction

Rate limiting is sometimes known as throttling, although the two aren't quite the same: rate limiting is more about fairness, making sure everyone gets a fair share (number of requests), whereas throttling is about cutting down the number of requests. Still, the two terms are commonly used interchangeably.

Why do we need this? There are many reasons:

  • To prevent attacks (DoS, DDoS)
  • To save resource usage
  • To limit accesses using some strategy

Rate limiting consists of two major components:

  • A limiting strategy, which specifies how requests are going to be limited
  • A partitioning strategy, which specifies the request groups to which we will be applying the limiting strategy

Like API versioning, caching, and API routing, this is something that is usually implemented at the cloud level, at the API gateway: Azure, AWS, and Google all supply this service. But there may be good reasons to implement it ourselves, in our app or in a reverse proxy. Fortunately, ASP.NET Core includes a good library for this! It applies to all incoming requests, whether MVC, Razor Pages, or APIs.

Let's start with the limiting strategies available.

Limiting Strategies

The limiting strategy/algorithm of choice determines how the limiting will be applied. Generally, it consists of a number of requests per time slot, both of which can be configured. It's worth noting that the count is stored in memory, but there are plans to support distributed storage using Redis, which will be nice for clusters. The ASP.NET Core rate limiting middleware includes four strategies:

Fixed Window

The fixed window limiter uses a fixed time window to limit requests. When the time window expires, a new time window starts and the request limit is reset.

Read more here: https://learn.microsoft.com/en-us/aspnet/core/performance/rate-limit?#fixed

Sliding Window

The sliding window algorithm is similar to the fixed window limiter but adds segments per window. The window slides one segment each segment interval, where the segment interval is (window time)/(segments per window). Requests for a window are limited to X requests, and each window is divided into N segments. When a segment expires (the segment N segments prior to the current one, i.e., one window back), the requests it had taken are added back to the current segment's allowance.

Official documentation: https://learn.microsoft.com/en-us/aspnet/core/performance/rate-limit?#slide
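As a rough sketch (the policy name and numbers here are made up for illustration), registering a sliding window policy might look like this:

```csharp
builder.Services.AddRateLimiter(_ => _.AddSlidingWindowLimiter(
    policyName: "Sliding10PerMinute", //hypothetical policy name
    options =>
    {
        options.PermitLimit = 10;                 //at most 10 requests per window
        options.Window = TimeSpan.FromMinutes(1); //the window size
        options.SegmentsPerWindow = 6;            //so the window slides every 10 seconds
    }));
```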

Token Bucket

This is similar to the sliding window limiter, but rather than adding back the requests taken from the expired segment, a fixed number of tokens are added each replenishment period. The tokens added each segment can't increase the available tokens to a number higher than the token bucket limit.

Info: https://learn.microsoft.com/en-us/aspnet/core/performance/rate-limit?#token
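A token bucket policy registration might look like this sketch (again, the policy name and numbers are illustrative):

```csharp
builder.Services.AddRateLimiter(_ => _.AddTokenBucketLimiter(
    policyName: "Bucket100", //hypothetical policy name
    options =>
    {
        options.TokenLimit = 100;     //the bucket never holds more than this many tokens
        options.TokensPerPeriod = 10; //tokens added back each replenishment period
        options.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
    }));
```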

Concurrency

The concurrency limiter limits the number of concurrent requests, which is a bit different than the previous ones. Each request reduces the concurrency limit by one, when a request completes, the limit is instead increased by one. Unlike the other requests limiters that limit the total number of requests for a specified period, the concurrency limiter limits only the number of concurrent requests and doesn't care for a time period.

Here is the doc: https://learn.microsoft.com/en-us/aspnet/core/performance/rate-limit?#concur
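And a concurrency policy could be sketched like this (illustrative name and limits):

```csharp
builder.Services.AddRateLimiter(_ => _.AddConcurrencyLimiter(
    policyName: "Max5Concurrent", //hypothetical policy name
    options =>
    {
        options.PermitLimit = 5; //at most 5 requests in flight at once
        options.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
        options.QueueLimit = 10; //further requests wait here; beyond that, they are rejected
    }));
```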

Chained Limiters

You can combine multiple algorithms; the framework will consider them as one and apply all the contained algorithms in sequence. More on this later on.

Rate Partitioner

A rate limiter, such as ConcurrencyLimiter, FixedWindowRateLimiter, SlidingWindowRateLimiter, or TokenBucketRateLimiter, is just an instance of a class that derives from RateLimiter. All of the built-in rate limiters have helper methods to ease their registration.

Applying Rate Limitations

It all starts with the AddRateLimiter() method, to register the services in the Dependency Injection (DI) framework:

builder.Services.AddRateLimiter(options =>
{
    //options go here
});

It must be followed by UseRateLimiter(), to actually add the middleware to the pipeline; this must go before other middleware, such as controllers:

app.UseRateLimiter();

Let's now see how we can configure the limitations.

Policy-Based

For the fixed window, as an example, it should be something like this:

builder.Services.AddRateLimiter(_ => _.AddFixedWindowLimiter(
    policyName: "Limit10PerMinute",
    options =>
    {
        options.PermitLimit = 10;
        options.Window = TimeSpan.FromMinutes(1);
        options.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
        options.QueueLimit = 2;
    }));

I won't go into all the details of each algorithm (I ask you to have a look at the documentation), but this example uses the fixed window algorithm (method AddFixedWindowLimiter; you can pick any other algorithm using AddSlidingWindowLimiter, AddTokenBucketLimiter, or AddConcurrencyLimiter), limits requests to 10 (PermitLimit) per minute (Window), lets the oldest request in the queue through first (QueueProcessingOrder), and sets a queue limit (QueueLimit) of just 2; the rest are discarded. The policy we are creating is called "Limit10PerMinute" (policyName); the name should be something meaningful, and we'll see how to use it in a moment. We can, of course, have multiple policies:

builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("API", options =>
    {
        options.PermitLimit = 20;
        options.Window = TimeSpan.FromMinutes(1);
    });

    options.AddFixedWindowLimiter("Web", options =>
    {
        options.PermitLimit = 10;
        options.Window = TimeSpan.FromMinutes(1);
    });

    options.AddPolicy("Limit3PerIP", ctx =>
    {
        var clientIpAddress = ctx.GetRemoteIpAddress()!;

        return RateLimitPartition.GetConcurrencyLimiter(clientIpAddress, _ =>
            new ConcurrencyLimiterOptions
            {
                PermitLimit = 3
            });
    });

    options.AddPolicy("NoLimit", ctx =>
    {
        return RateLimitPartition.GetNoLimiter("");
    });
});

If we want to configure how the request is rejected we can use the OnRejected parameter, like in this example:

builder.Services.AddRateLimiter(_ =>
{
    _.AddFixedWindowLimiter(
        policyName: "Limit10PerMinute",
        options =>
        {
            //...
        });

    _.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    _.OnRejected = async (ctx, cancellationToken) =>
    {
        await ctx.HttpContext.Response.WriteAsync("Request slots exceeded, try again later", cancellationToken);
    };
});

We are both setting the status code of the response (RejectionStatusCode) to HTTP 429 Too Many Requests (the default is actually HTTP 503 Service Unavailable) and supplying a custom response message.

It could also be a redirect to some page:

ctx.HttpContext.Response.Redirect("https://some.site/unavailable.html", permanent: false);

Global

Having a named policy means that the rate limiting may or may not be used, depending on whether or not we actually use the policy; the alternative is to have a global rate limiter, where we don't specify a policy name, and it will apply everywhere, including MVC and minimal APIs:

builder.Services.AddRateLimiter(options =>
{
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
    {
        var partitionKey = ""; //<--- more on this in a moment

        return RateLimitPartition.GetFixedWindowLimiter(partitionKey, _ => new FixedWindowRateLimiterOptions
        {
            PermitLimit = 10,
            Window = TimeSpan.FromMinutes(1)
        });
    });
});

Any of the options, like setting the status code or the response message/redirect, can be supplied as well for global limiters.

Chained

A chained partition limiter acts as one, and each individual limiter will be applied sequentially, in registration order. A chained limiter is created using PartitionedRateLimiter.CreateChained; here, for a global limiter only:

builder.Services.AddRateLimiter(options =>
{
    options.GlobalLimiter = PartitionedRateLimiter.CreateChained(
        PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
            RateLimitPartition.GetConcurrencyLimiter(ctx.GetRemoteIpAddress()!, partition =>
                new ConcurrencyLimiterOptions
                {
                    PermitLimit = 3
                })
        ),
        PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
            RateLimitPartition.GetFixedWindowLimiter(ctx.GetRemoteIpAddress()!, partition =>
                new FixedWindowRateLimiterOptions
                {
                    PermitLimit = 6000,
                    Window = TimeSpan.FromHours(1)
                })
        ));
});

Let's now see how we can partition the requests.

Partitioning Strategies

The idea behind partitioning is: the limitation will be applied, and individual requests counted, for each of the partitions ("buckets"). For example, the fixed window count is kept per partition; one partition may have reached the limit while others have not. The partition key is the partitionKey parameter, and it is up to us to provide it somehow.

Global

If we wish to apply the same limit to all, we just need to set the partitionKey parameter to the same value, which can even be an empty string. For example:

options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
{
    var partitionKey = ""; //global partition, applies to all requests

    return RateLimitPartition.GetFixedWindowLimiter(partitionKey, _ => new FixedWindowRateLimiterOptions
    {
        PermitLimit = 10,
        Window = TimeSpan.FromMinutes(1)
    });
});

Health Status

One possible option is to limit the number of accesses (concurrent or per time period) if the health status of the system is degraded or unhealthy. The idea is: if your system is misbehaving, you might want to throttle requests to it. If you've seen my recent post, you know how to check for this. Here's one way to implement this limitation: if the health is degraded or unhealthy (HealthStatus), we apply a certain limiter by setting a common partition key; otherwise, we just return a GUID. As it is, returning a GUID essentially means disabling the rate limiter:

var healthChecker = ctx.RequestServices.GetRequiredService<HealthCheckService>();
var healthStatus = healthChecker.CheckHealthAsync().ConfigureAwait(false).GetAwaiter().GetResult();
var partitionKey = (healthStatus.Status == HealthStatus.Healthy) ? Guid.NewGuid().ToString() : healthStatus.Status.ToString();

The HealthCheckService.CheckHealthAsync() call is asynchronous, so we must make it synchronous, because we are in a synchronous context.

Authenticated vs Anonymous Users

Another option would be: we want to offer the same (high) quality of service to authenticated users, while limiting anonymous ones. We could do it as this:

var partitionKey = (ctx.User.Identity?.IsAuthenticated == true) ? Guid.NewGuid().ToString() : "Anonymous";

So, if the user is authenticated, we just return a GUID, which essentially means that a different limits "bucket" will be incremented each time, which is the same as saying that the limits will never be reached. For anonymous users, the same "bucket" is always incremented, which makes it reach the limit faster.
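The partition-key fragment above would plug into a policy like this sketch (the policy name and limits are illustrative):

```csharp
options.AddPolicy("LimitAnonymous", ctx => //hypothetical policy name
{
    //authenticated users get a fresh key (no effective limit); anonymous ones share a bucket
    var partitionKey = (ctx.User.Identity?.IsAuthenticated == true) ? Guid.NewGuid().ToString() : "Anonymous";

    return RateLimitPartition.GetFixedWindowLimiter(partitionKey, _ => new FixedWindowRateLimiterOptions
    {
        PermitLimit = 10,
        Window = TimeSpan.FromMinutes(1)
    });
});
```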

Per Authenticated User

A related option is to use a limit "bucket" for each authenticated user, and the same for anonymous ones:

var partitionKey = (ctx.User.Identity?.IsAuthenticated == true) ? ctx.User.Identity.Name! : "Anonymous";

Per Header

And if we want to use some custom header, we can as well:

var partitionKey = ctx.Request.Headers["X-ClientId"].ToString();

Per Tenant

If we are running a multi-tenant app, we can use the tenant name as the limits "bucket". Let's suppose we have an ITenantIdentificationService, such as the one I presented in my SharedFlat project, which looks like this:

public interface ITenantIdentificationService
{
    string GetCurrentTenant(HttpContext context);
}

If this service is registered in DI, we can get the tenant name like this:

var partitionKey = ctx.RequestServices.GetRequiredService<ITenantIdentificationService>().GetCurrentTenant(ctx);

In any case, you just need to have a way to get the current tenant from the HttpContext, and use it as the partition key.

Per Country

Another option, which I introduce here more for fun: we apply limits per requesting country! We can use the service introduced in my previous post, IGeoIpService. We get the geo location from the requesting client's IP address, and then use the country code as the partition key:

var partitionKey = ctx.RequestServices.GetRequiredService<IGeoIpService>().GetInfo(ctx.GetRemoteIpAddress()).ConfigureAwait(false).GetAwaiter().GetResult().CountryCode;

Of course, this call is asynchronous, so we must make it synchronous first. Not ideal, but nothing we can do about it.

Per Source IP

Yet another option is to use the remote client's IP address:

var partitionKey = ctx.GetRemoteIpAddress();

Mind you, the GetRemoteIpAddress() extension method is the same that was introduced here.

Per IP Range

And what if we want to limit just some IP range? We can do it as well:

var ipAddress = ctx.GetRemoteIpAddress();
var startIpAddress = ...; //get a start and end IP addresses somehow
var endIpAddress = ...;
var partitionKey = "";

if (IPAddress.Parse(ipAddress!).IsInRange(startIpAddress, endIpAddress))
{
    partitionKey = $"{startIpAddress}-{endIpAddress}";
}
else
{
    partitionKey = Guid.NewGuid().ToString(); //the client IP is out of the limited range, which means that we don't want it to be limited
}

The IsInRange extension method is:

public static class IPAddressExtensions
{
    public static bool IsInRange(this IPAddress ipAddress, IPAddress startIpAddress, IPAddress endIpAddress)
    {
        ArgumentNullException.ThrowIfNull(ipAddress, nameof(ipAddress));
        ArgumentNullException.ThrowIfNull(startIpAddress, nameof(startIpAddress));
        ArgumentNullException.ThrowIfNull(endIpAddress, nameof(endIpAddress));
        ArgumentOutOfRangeException.ThrowIfNotEqual((int)ipAddress.AddressFamily, (int)startIpAddress.AddressFamily, nameof(startIpAddress));
        ArgumentOutOfRangeException.ThrowIfNotEqual((int)ipAddress.AddressFamily, (int)endIpAddress.AddressFamily, nameof(endIpAddress));

        if ((ipAddress.AddressFamily != AddressFamily.InterNetwork) && (ipAddress.AddressFamily != AddressFamily.InterNetworkV6))
        {
            throw new ArgumentException($"AddressFamily {ipAddress.AddressFamily} not supported.", nameof(ipAddress));
        }

        //IP address bytes come in network (big-endian) order, so an unsigned byte-by-byte
        //comparison gives the numeric order; this works for both IPv4 and IPv6
        var ip = ipAddress.GetAddressBytes();
        var start = startIpAddress.GetAddressBytes();
        var end = endIpAddress.GetAddressBytes();

        return (ip.AsSpan().SequenceCompareTo(start) >= 0) && (ip.AsSpan().SequenceCompareTo(end) <= 0);
    }
}

Essentially, it checks whether the remote client's IP address falls numerically between the start and end addresses.

Per Domain

Similar to the previous one, if we want to limit per DNS domain name:

var ipAddress = ctx.GetRemoteIpAddress();
var address = IPAddress.Parse(ipAddress!);
var partitionKey = ipAddress!; //no DNS domain name registered, so we fall back to the source IP

try
{
    var entry = Dns.GetHostEntry(address);
    partitionKey = string.Join(".", entry.HostName.Split('.').Skip(1)); //just the domain name, no host
}
catch (SocketException)
{
    //reverse lookup failed, so we keep the source IP as the partition key
}

Applicability

Now, let's see how to actually apply the limitations.

Global

We just saw how to configure a rate limit globally. If nothing else is said, global limiters apply to all endpoints, MVC controllers, Razor Pages, and minimal APIs.

Policy-Based

In order to apply a specific policy to an endpoint, we apply the [EnableRateLimiting] attribute. For MVC controllers, here's how we do it:

[EnableRateLimiting("Limit10PerMinute")]
public IActionResult Get()
{
    //
}

It can also be applied to a whole controller, and overridden on individual action methods:

[EnableRateLimiting("Limit10PerMinute")]
public class HomeController : Controller
{
    [EnableRateLimiting("Web")]
    public IActionResult Get()
    {
        //
    }
}

Or globally:

app.MapDefaultControllerRoute().RequireRateLimiting("Limit10PerMinute");

For Razor Pages, we need to apply the attribute to the page model class:

[EnableRateLimiting("Limit10PerMinute")]
public class Index2Model : PageModel
{
}

Mind you, we cannot restrict a single handler method, like OnGet or OnPost, but we can apply it globally:

app.MapRazorPages().RequireRateLimiting("Limit10PerMinute");

For minimal APIs, we add the restriction next to the route declaration:


app.MapGet("/Home", () =>
{
    //
}).RequireRateLimiting("Limit10PerMinute");

And, if we want to exclude some particular action method, we apply a [DisableRateLimiting] attribute:

[DisableRateLimiting]
public IActionResult Get()
{
    //
}

And for minimal API endpoints:

app.MapGet("/Home", () =>
{
    //
}).DisableRateLimiting();

Conclusion

Rate limiting is a powerful tool for you to use on your services, usually when the service supplied by your cloud provider is not enough, or when you are hosting on-premises.

Hope you enjoyed this, stay tuned for more!

                             
