Rate Limiting Middleware in ASP.NET Core – Built-In, Practical, and Production-Ready

Every API you build will eventually face one uncomfortable reality: not all traffic is legitimate, and even legitimate traffic can become a problem at scale.

A single client hammering your endpoint thousands of times per minute. An automated script scraping your data. A burst of traffic after a product launch that your server was never sized to handle. These are not edge cases — they are certainties in production.

Rate limiting is the mechanism that puts you back in control.

ASP.NET Core ships with a built-in rate limiting middleware that requires zero third-party dependencies and integrates cleanly into your existing pipeline. This article walks you through everything — from the fundamentals to advanced partitioned limiters — so you can make the right decisions at every level of your stack.

Whether you are building your first API or hardening a production system, this guide has something for you.

What Is Rate Limiting and Why It Matters

Rate limiting is a technique that controls how many requests a client can make to your API within a defined period of time. When a client exceeds that limit, the server rejects or queues the excess requests rather than processing them.

It sounds simple. The implications, however, are significant.

Preventing Abuse — Public APIs are a target. Without rate limiting, a single bad actor can exhaust your server’s resources and degrade the experience for every legitimate user. A per-client request limit stops that before it becomes a crisis.

Ensuring Fair Usage — In a multi-tenant system, one high-volume tenant should not be able to monopolize shared infrastructure. Rate limiting enforces fairness across all consumers of your API.

Protecting Backend Resources — Database connections, compute cycles, memory — these are finite. Rate limiting gives you a hard ceiling on how much work your backend is asked to do in a given window.

Mitigating DoS Attacks — Denial of Service attacks work by flooding your server with requests until it stops responding. Rate limiting raises the cost of that attack significantly by cutting off the flood at the middleware layer.

Cost Management — If you run on cloud infrastructure where you pay per request or per compute unit, uncontrolled traffic translates directly into an uncontrolled bill. Rate limiting gives you predictable costs.

Improving Overall Performance — Counterintuitively, limiting requests improves responsiveness for legitimate users. When your server is not fighting to handle every possible request, the ones that do get through are processed faster.

A note on DDoS — Rate limiting mitigates Denial of Service attacks, but it is not a replacement for a dedicated DDoS protection solution. Distributed attacks originate from thousands of IP addresses simultaneously, which a single middleware layer cannot absorb alone. For production systems, pair rate limiting with a WAF or CDN-level protection such as Azure Web Application Firewall, AWS Shield, or Cloudflare.

How Rate Limiting Worked Before ASP.NET Core 7

Before the built-in middleware arrived, teams had three main ways to handle rate limiting, each with real drawbacks.

Custom Middleware

The most common approach was writing your own middleware from scratch. You would track request counts in memory or in a distributed cache like Redis, compare against a threshold, and return a 429 response when exceeded.

C#
public sealed class CustomRateLimitingMiddleware
{
    private readonly RequestDelegate _next;
    // Not thread-safe: a plain Dictionary illustrates the naive approach
    private static readonly Dictionary<string, (int Count, DateTime WindowStart)> _requestCounts = new();
    private const int MaxRequests = 10;
    private static readonly TimeSpan Window = TimeSpan.FromMinutes(1);

    public CustomRateLimitingMiddleware(RequestDelegate next)
    {
        _next = next;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        var clientIp = context.Connection.RemoteIpAddress?.ToString() ?? "unknown";
        var now = DateTime.UtcNow;

        if (_requestCounts.TryGetValue(clientIp, out var entry))
        {
            if (now - entry.WindowStart > Window)
            {
                _requestCounts[clientIp] = (1, now);
            }
            else if (entry.Count >= MaxRequests)
            {
                context.Response.StatusCode = StatusCodes.Status429TooManyRequests;
                await context.Response.WriteAsync("Too many requests.");
                return;
            }
            else
            {
                _requestCounts[clientIp] = (entry.Count + 1, entry.WindowStart);
            }
        }
        else
        {
            _requestCounts[clientIp] = (1, now);
        }

        await _next(context);
    }
}

This works in a demo. In production, it has race conditions, no distributed state support, no queue management, and no standardized error responses. It also has to be maintained.

Third-Party Libraries (AspNetCoreRateLimit)

The AspNetCoreRateLimit NuGet package became the standard answer for years. It offered IP-based and client-based limiting with configuration stored in appsettings.json.

But it came with real costs. It added an external dependency that had to be versioned, updated, and maintained alongside your app. The configuration was verbose and spread across multiple service registrations. And it was not always aligned with the ASP.NET Core release cycle, sometimes leaving teams stuck on older versions.

For teams that needed more than basic in-memory tracking, a distributed cache like Redis was required on top — adding even more infrastructure overhead.

It solved the problem. It just solved it in a way that required more effort than the problem deserved.

Action Filters

Some teams implemented rate limiting as an IActionFilter or IAsyncActionFilter in the MVC pipeline. This gave controller-level control but did not work cleanly for Minimal APIs and required wiring up custom attributes.

All three approaches had the same fundamental problem: they were workarounds for something the framework should have provided natively.

ASP.NET Core 7 finally provided it.

What Changed in ASP.NET Core 7

Starting with ASP.NET Core 7, the Microsoft.AspNetCore.RateLimiting namespace ships as part of the framework itself. No additional NuGet packages are required.

The middleware provides four built-in algorithms, supports both global and named policies, integrates directly with the endpoint routing system, and handles rejection responses with structured 429 status codes.

Two calls in Program.cs enable everything:

C#
// 1. Register rate limiting services
builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
    // policies configured here
});

// 2. Enable the middleware
app.UseRateLimiter();

Middleware ordering matters. When using endpoint-specific rate limiting, call UseRateLimiter() after UseRouting(). When using only global limiters, the order is flexible.
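As a sketch, a minimal Program.cs that respects this ordering might look like the following (the "/ping" endpoint is illustrative):

```csharp
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
    options.AddFixedWindowLimiter("fixed", limiterOptions =>
    {
        limiterOptions.PermitLimit = 10;
        limiterOptions.Window = TimeSpan.FromSeconds(10);
    });
});

var app = builder.Build();

app.UseRouting();      // routing resolves the endpoint first...
app.UseRateLimiter();  // ...so the limiter can see endpoint-specific policies

app.MapGet("/ping", () => Results.Ok("pong"))
   .RequireRateLimiting("fixed");

app.Run();
```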

Rate Limiter Algorithms Explained

The framework ships four distinct algorithms. Understanding the difference between them is what allows you to pick the right tool for each use case.

Fixed Window Limiter

The fixed window algorithm divides time into fixed-size windows. Each window has a request limit. When the window expires, the counter resets and a new window begins.

When to use it: Simple public APIs, login endpoints, or any scenario where you want a clean, predictable reset cycle.

The trade-off: A burst of requests at the end of one window and the start of the next can exceed your intended limit by up to 2x, since both windows reset independently.

C#
builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter(policyName: "fixed", limiterOptions =>
    {
        limiterOptions.PermitLimit = 10;
        limiterOptions.Window = TimeSpan.FromSeconds(10);
        limiterOptions.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
        limiterOptions.QueueLimit = 2;
    });
});

In this configuration, a maximum of 10 requests are allowed per 10-second window. Up to 2 additional requests can be queued when the limit is hit — they will be processed as permits become available in the next window.

Sliding Window Limiter

The sliding window algorithm addresses the burst problem of the fixed window by dividing each window into segments. As the window slides forward one segment at a time, requests from the expired segment are recycled back into the available count.

When to use it: APIs where you want smoother enforcement over time with no burst vulnerability at window boundaries.

The trade-off: Slightly more complex than fixed window but significantly more accurate in preventing traffic spikes.

C#
builder.Services.AddRateLimiter(options =>
{
    options.AddSlidingWindowLimiter(policyName: "sliding", limiterOptions =>
    {
        limiterOptions.PermitLimit = 10;
        limiterOptions.Window = TimeSpan.FromSeconds(30);
        limiterOptions.SegmentsPerWindow = 3;
        limiterOptions.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
        limiterOptions.QueueLimit = 2;
    });
});

With SegmentsPerWindow = 3 and a 30-second window, each segment is 10 seconds. Requests from the segment that is 30 seconds old are added back to the current segment’s available capacity.

Token Bucket Limiter

The token bucket algorithm maintains a bucket of tokens. Each request consumes one token. Tokens are replenished at a fixed rate every replenishment period, up to a maximum limit.

When to use it: APIs that need to allow short bursts of traffic while still enforcing a long-term average rate. Think file uploads, report generation, or any endpoint with variable but expensive processing.

The trade-off: The burst capacity makes it feel more permissive than fixed window, which is often exactly the right user experience.

C#
builder.Services.AddRateLimiter(options =>
{
    options.AddTokenBucketLimiter(policyName: "token", limiterOptions =>
    {
        limiterOptions.TokenLimit = 100;
        limiterOptions.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
        limiterOptions.TokensPerPeriod = 20;
        limiterOptions.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
        limiterOptions.QueueLimit = 5;
        limiterOptions.AutoReplenishment = true;
    });
});

With AutoReplenishment = true, an internal timer adds 20 tokens every 10 seconds automatically. The bucket never holds more than 100 tokens. If you need manual control over replenishment — for example, in tests or custom scheduling — set AutoReplenishment = false and call TryReplenish() yourself.
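To see what manual replenishment looks like, here is a small standalone sketch using the underlying System.Threading.RateLimiting types directly; the tiny replenishment period and the sleep are chosen purely so the demo runs instantly, not as production values:

```csharp
using System.Threading.RateLimiting;

// Token bucket with manual replenishment; no background timer runs.
var limiter = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
{
    TokenLimit = 2,
    TokensPerPeriod = 2,
    ReplenishmentPeriod = TimeSpan.FromMilliseconds(1), // tiny period for the demo
    QueueLimit = 0,
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
    AutoReplenishment = false
});

using (var a = limiter.AttemptAcquire())
using (var b = limiter.AttemptAcquire())
using (var c = limiter.AttemptAcquire())
{
    // Two tokens exist, so the third attempt fails.
    Console.WriteLine($"{a.IsAcquired} {b.IsAcquired} {c.IsAcquired}"); // True True False
}

Thread.Sleep(10);          // let at least one replenishment period elapse
limiter.TryReplenish();    // tokens come back only when you ask
using var d = limiter.AttemptAcquire();
Console.WriteLine(d.IsAcquired); // True
```

Note that disposing a token bucket lease does not return the token; only replenishment does.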

Concurrency Limiter

The concurrency limiter is the simplest algorithm. It limits the number of requests being processed simultaneously, with no concept of time windows or token replenishment.

When to use it: Endpoints that are expensive to run concurrently — database-heavy queries, file processing, third-party API calls — where you want to cap the number of simultaneous executions rather than the request rate.

The trade-off: It places no cap on request rate at all. A fast endpoint can serve far more requests per second under a concurrency limiter than a fixed window limiter would allow, so it protects resources rather than enforcing a rate.

C#
builder.Services.AddRateLimiter(options =>
{
    options.AddConcurrencyLimiter(policyName: "concurrency", limiterOptions =>
    {
        limiterOptions.PermitLimit = 10;
        limiterOptions.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
        limiterOptions.QueueLimit = 5;
    });
});

No more than 10 requests will execute simultaneously. The 11th request either queues (up to the queue limit) or receives a 429 response.
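A standalone sketch with the underlying System.Threading.RateLimiting type shows the defining behavior: concurrency permits are returned when a lease is disposed, unlike window and token limiters, where capacity comes back only with time:

```csharp
using System.Threading.RateLimiting;

// Two permits, no queue: a third simultaneous acquisition must fail.
var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 2,
    QueueLimit = 0,
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst
});

var first = limiter.AttemptAcquire();
var second = limiter.AttemptAcquire();
var third = limiter.AttemptAcquire();
Console.WriteLine(third.IsAcquired);  // False: both permits are in use

first.Dispose();                      // releasing a lease frees its permit
using var fourth = limiter.AttemptAcquire();
Console.WriteLine(fourth.IsAcquired); // True: the permit was returned
```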

Applying Rate Limiting to Endpoints

Once you have defined your policies, you need to attach them to your endpoints. The approach differs slightly between Minimal APIs and MVC controllers.

Minimal APIs

Use .RequireRateLimiting() directly on the endpoint definition:

C#
app.MapGet("/products", async (IProductRepository repo) =>
{
    var products = await repo.GetAllAsync();
    return Results.Ok(products);
})
.RequireRateLimiting("fixed");

app.MapPost("/orders", async (CreateOrderRequest request, IOrderService orderService) =>
{
    var result = await orderService.PlaceOrderAsync(request);
    return Results.Created($"/orders/{result.Id}", result);
})
.RequireRateLimiting("token");

You can also apply a policy to a group of endpoints:

C#
var apiGroup = app.MapGroup("/api")
    .RequireRateLimiting("fixed");

apiGroup.MapGet("/users", GetUsers);
apiGroup.MapGet("/products", GetProducts);
apiGroup.MapPost("/orders", CreateOrder);

MVC Controllers

Use the [EnableRateLimiting] and [DisableRateLimiting] attributes:

C#
[EnableRateLimiting("fixed")]
public sealed class OrdersController(IOrderService orderService) : ControllerBase
{
    // Inherits the "fixed" policy from the controller
    [HttpGet]
    public async Task<IActionResult> GetOrders()
    {
        var orders = await orderService.GetAllAsync();
        return Ok(orders);
    }

    // Overrides with a more permissive "sliding" policy
    [HttpGet("{id}")]
    [EnableRateLimiting("sliding")]
    public async Task<IActionResult> GetOrder(int id)
    {
        var order = await orderService.GetByIdAsync(id);
        return order is null ? NotFound() : Ok(order);
    }

    // Exempt from rate limiting entirely
    [HttpGet("health")]
    [DisableRateLimiting]
    public IActionResult Health() => Ok(new { Status = "Healthy" });
}

The rule: [EnableRateLimiting] on an action method overrides the controller-level policy. [DisableRateLimiting] removes all rate limiting from that action, regardless of what is applied at the controller level or globally.

Advanced: Rate Limiting with Partitions

Named policies and global limiters are a great starting point. But real-world applications rarely treat all clients the same — and your rate limiting strategy should not either. Partitioned rate limiting lets you move from a single shared counter to independent counters per client, giving you precise control over how traffic is managed across your entire user base.

Why Partitioning Matters

A single global counter is a blunt instrument. It treats a legitimate power user the same as a bot hammering your API. The moment one client hits the limit, every other client pays the price too.

Partitioning solves this by giving each client its own independent counter. A burst from one user has zero impact on another. A misbehaving client gets throttled without affecting the rest of your traffic.

This also opens the door to tiered service models — authenticated users can have higher limits than anonymous ones, premium API key holders can get more headroom than free tier users, and internal services can be excluded entirely. All of this is expressed in a single middleware configuration, with no external dependencies and no custom tracking logic to maintain.

By IP Address

The simplest partition strategy. Each unique IP address gets its own counter:

C#
builder.Services.AddRateLimiter(options =>
{
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(httpContext =>
        RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            factory: _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 50,
                Window = TimeSpan.FromMinutes(1),
                AutoReplenishment = true
            }));
});

Production warning: IP-based partitioning is vulnerable to IP spoofing attacks. If your app sits behind a reverse proxy or load balancer, make sure you are reading the correct client IP from X-Forwarded-For headers rather than RemoteIpAddress, which will be the proxy’s address.
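One way to handle this is ASP.NET Core's forwarded headers middleware, sketched below. The proxy address is a hypothetical placeholder; substitute your own infrastructure's IP:

```csharp
using System.Net;
using Microsoft.AspNetCore.HttpOverrides;

builder.Services.Configure<ForwardedHeadersOptions>(options =>
{
    options.ForwardedHeaders = ForwardedHeaders.XForwardedFor;

    // Only trust X-Forwarded-For from a known proxy; otherwise clients can spoof it.
    // 10.0.0.1 is a hypothetical address; use your proxy's real IP.
    options.KnownProxies.Add(IPAddress.Parse("10.0.0.1"));
});

var app = builder.Build();

app.UseForwardedHeaders(); // rewrites RemoteIpAddress before the limiter reads it
app.UseRateLimiter();
```

With this in place, the partition key built from RemoteIpAddress reflects the real client rather than the proxy.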

By Authenticated User

For authenticated APIs, partitioning by user identity is far more meaningful than IP address. One user’s burst does not affect another’s:

C#
builder.Services.AddRateLimiter(options =>
{
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(httpContext =>
    {
        var userId = httpContext.User.Identity?.Name ?? "anonymous";

        return RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: userId,
            factory: _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = httpContext.User.IsInRole("Premium") ? 1000 : 100,
                Window = TimeSpan.FromMinutes(1),
                AutoReplenishment = true
            });
    });
});

This example also demonstrates tiered rate limiting — premium users get a higher limit than standard users, all from the same middleware configuration.

By API Key

For API-key-authenticated services, partition on the key itself:

C#
builder.Services.AddRateLimiter(options =>
{
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(httpContext =>
    {
        var apiKey = httpContext.Request.Headers["X-API-Key"].FirstOrDefault() ?? "anonymous";

        return apiKey switch
        {
            var key when key.StartsWith("premium-") => RateLimitPartition.GetFixedWindowLimiter(
                partitionKey: key,
                factory: _ => new FixedWindowRateLimiterOptions
                {
                    PermitLimit = 5000,
                    Window = TimeSpan.FromMinutes(1),
                    AutoReplenishment = true
                }),

            _ => RateLimitPartition.GetFixedWindowLimiter(
                partitionKey: apiKey,
                factory: _ => new FixedWindowRateLimiterOptions
                {
                    PermitLimit = 100,
                    Window = TimeSpan.FromMinutes(1),
                    AutoReplenishment = true
                })
        };
    });
});

Chained Limiters

Sometimes you need multiple rate limits on the same traffic — for example, a per-second burst limit combined with a per-minute sustained limit. PartitionedRateLimiter.CreateChained() combines multiple limiters into one. A request must pass all limiters in the chain to proceed:

C#
builder.Services.AddRateLimiter(options =>
{
    options.OnRejected = async (context, cancellationToken) =>
    {
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString();
        }

        await context.HttpContext.Response.WriteAsync(
            "Too many requests. Please try again later.", cancellationToken);
    };

    options.GlobalLimiter = PartitionedRateLimiter.CreateChained(
        // Burst limit: max 5 requests per 2 seconds
        PartitionedRateLimiter.Create<HttpContext, string>(httpContext =>
            RateLimitPartition.GetFixedWindowLimiter(
                partitionKey: httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown",
                factory: _ => new FixedWindowRateLimiterOptions
                {
                    AutoReplenishment = true,
                    PermitLimit = 5,
                    Window = TimeSpan.FromSeconds(2)
                })),

        // Sustained limit: max 100 requests per minute
        PartitionedRateLimiter.Create<HttpContext, string>(httpContext =>
            RateLimitPartition.GetFixedWindowLimiter(
                partitionKey: httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown",
                factory: _ => new FixedWindowRateLimiterOptions
                {
                    AutoReplenishment = true,
                    PermitLimit = 100,
                    Window = TimeSpan.FromMinutes(1)
                }))
    );
});

Handling Rate Limit Rejections

When a request is rejected, the framework needs to communicate that to the client clearly. There are three mechanisms for this.

Setting the Rejection Status Code

The default rejection status code is 503 Service Unavailable. That is incorrect — 429 Too Many Requests is the semantically accurate code and clients know how to handle it:

C#
builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
});

Custom OnRejected Callback

For production APIs, you want to return a structured response and optionally log the event:

C#
builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.OnRejected = async (context, cancellationToken) =>
    {
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

        // Add Retry-After header if metadata is available
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString();
        }

        // Return a structured JSON response
        context.HttpContext.Response.ContentType = "application/json";
        await context.HttpContext.Response.WriteAsJsonAsync(new
        {
            title = "Too Many Requests",
            status = 429,
            detail = "You have exceeded the allowed request limit. Please try again later."
        }, cancellationToken);

        // Log the rejection
        var logger = context.HttpContext.RequestServices
            .GetRequiredService<ILogger<Program>>();

        logger.LogWarning("Rate limit exceeded for IP: {IpAddress}, Path: {Path}",
            context.HttpContext.Connection.RemoteIpAddress,
            context.HttpContext.Request.Path);
    };
});

Request Queuing

Rather than immediately rejecting excess requests, you can queue them to be processed when capacity becomes available:

C#
builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("queued", limiterOptions =>
    {
        limiterOptions.PermitLimit = 10;
        limiterOptions.Window = TimeSpan.FromSeconds(10);
        limiterOptions.QueueLimit = 5;                                    // Queue up to 5 requests
        limiterOptions.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    });
});

Queuing is appropriate for background processing or batch endpoints. For interactive user-facing APIs, immediate rejection with a clear error message is usually a better experience than making users wait in an invisible queue.

Disabling Rate Limiting for Specific Endpoints

There are legitimate cases where you want a specific endpoint to be exempt from all rate limiting — even when a global limiter is configured.

C#
// Webhook from an external service — payload verified by signature, not schema
app.MapPost("/webhooks/payment", async (HttpContext context) =>
{
    var payload = await new StreamReader(context.Request.Body).ReadToEndAsync();
    var signature = context.Request.Headers["X-Signature-256"];

    // Signature verification and processing
    return Results.Ok();
})
.DisableRateLimiting();

C#
// Health check endpoint — must always respond, even under load
app.MapGet("/health", () => Results.Ok(new { Status = "Healthy", Timestamp = DateTime.UtcNow }))
   .DisableRateLimiting();

Good candidates for .DisableRateLimiting() include health check and readiness endpoints, internal diagnostic endpoints, webhook receivers where the external system controls the payload format, and any endpoint where rate limiting would break a dependent integration.

Choosing the Right Algorithm

Selecting the wrong algorithm for your use case leads to either under-protection or a frustrating user experience. Here is how to think about it:

Scenario                                                       | Recommended Algorithm       | Reason
Public API with simple per-minute limits                       | Fixed Window                | Simple, predictable, easy to reason about
API where burst traffic at window boundaries is a concern      | Sliding Window              | Smooths out boundary bursts
API that should allow short bursts but enforce an average rate | Token Bucket                | Burst-friendly with long-term enforcement
Expensive endpoint — database-heavy or CPU-intensive           | Concurrency                 | Caps simultaneous executions, not request rate
Login / authentication endpoints                               | Fixed Window                | Prevents brute force with hard limits
File upload or export generation                               | Token Bucket                | Allows burst while protecting resources
Multi-tenant API with per-user fairness                        | Partitioned (any algorithm) | Each user gets their own independent counter

A good rule of thumb: start with Fixed Window for simplicity, move to Sliding Window if you observe burst abuse at window boundaries, and use Token Bucket when your clients are building integrations that need predictable burst headroom.

Testing Rate-Limited Endpoints

Untested rate limiting is dangerous in both directions — too permissive and you leave yourself exposed, too aggressive and you start rejecting legitimate traffic.

Before deploying to production, always load test your rate limiting configuration.

Tools worth knowing:

  • Apache JMeter — Script-based load testing that can simulate concurrent users at a controlled request rate
  • Azure Load Testing — Managed load testing with Azure Monitor integration
  • BlazeMeter — Cloud-based JMeter execution with reporting

A minimal test strategy should cover: verifying that the correct status code (429) is returned when limits are exceeded, verifying that the Retry-After header is present and accurate, confirming that legitimate traffic within limits always succeeds, and testing partition boundaries (each user/IP gets their own counter, not a shared one).
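As a quick manual check before reaching for a full load testing tool, a small client loop can confirm the basics. This sketch assumes the API from this article is running locally at https://localhost:5001 (adjust to your setup) with the "fixed" policy of 10 requests per window applied to /products:

```csharp
using System.Net;

using var client = new HttpClient { BaseAddress = new Uri("https://localhost:5001") };

var statuses = new List<HttpStatusCode>();
for (var i = 0; i < 12; i++)
{
    var response = await client.GetAsync("/products");
    statuses.Add(response.StatusCode);
}

// With a 10-request limit and no queue, expect roughly ten 200s and two 429s.
Console.WriteLine($"200s: {statuses.Count(s => s == HttpStatusCode.OK)}");
Console.WriteLine($"429s: {statuses.Count(s => s == HttpStatusCode.TooManyRequests)}");
```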

Security consideration: When creating partitions based on client-supplied data — like IP addresses or headers — you introduce a potential attack surface. An attacker who can enumerate partition keys can craft requests to exhaust resources. For IP-based partitioning specifically, be aware of IP spoofing as described in RFC 2827.

Summary

Rate limiting was one of the most requested features in the ASP.NET Core ecosystem, and the built-in middleware delivers exactly what production applications need.

The Microsoft.AspNetCore.RateLimiting middleware ships as part of the framework with no external dependencies. It provides four well-designed algorithms — Fixed Window, Sliding Window, Token Bucket, and Concurrency — each suited to different scenarios. It supports both global and named policies, integrates cleanly with Minimal APIs and MVC controllers, and handles rejection responses with proper 429 status codes and structured payloads.

Partitioned limiters take it further, enabling per-user, per-IP, and per-API-key limits with different thresholds for different client tiers. Chained limiters allow you to compose multiple policies into a single middleware pass.

For teams already running ASP.NET Core 7 or later, there is no longer a reason to reach for a third-party library for standard rate limiting scenarios.

Takeaways

  • Call builder.Services.AddRateLimiter() to register rate limiting services and app.UseRateLimiter() to activate the middleware
  • Always set RejectionStatusCode = StatusCodes.Status429TooManyRequests — the default 503 is semantically incorrect
  • Call UseRateLimiter() after UseRouting() when applying endpoint-specific policies
  • Use Fixed Window for simple, predictable limits; Sliding Window to prevent boundary bursts; Token Bucket for burst-tolerant enforcement; Concurrency to cap simultaneous executions
  • Apply named policies with .RequireRateLimiting("policyName") on Minimal API endpoints and [EnableRateLimiting("policyName")] on MVC controllers
  • Use PartitionedRateLimiter.Create<>() to assign independent counters per user, IP, or API key
  • Use PartitionedRateLimiter.CreateChained() to enforce multiple limits simultaneously (e.g., per-second burst + per-minute sustained)
  • Implement OnRejected to return structured ProblemDetails-style responses and log rejection events
  • Use QueueLimit for batch or background endpoints; prefer immediate rejection for interactive APIs
  • Use .DisableRateLimiting() for health checks, webhook receivers, and internal diagnostic endpoints
  • Load test your rate limiting configuration before deploying — untested limits fail in both directions
  • Pair rate limiting with a dedicated DDoS protection layer for production systems handling public traffic

Found this article useful? Share it with your network and spark a conversation.