
Stop Using IP Rate Limits: The Smarter Way to Protect Your APIs

Pavan Kumar

Every backend developer eventually reaches that point where the app is growing, traffic is rising, and suddenly your logs look like they are being attacked by a small army of overly enthusiastic bots. This is usually the moment you whisper to yourself, “I should probably add rate limiting.”

Rate limiting feels simple on the surface. Set a number, enforce it, and keep your server from melting. Then real users show up, real traffic patterns emerge, and you start seeing innocent people being blocked while malicious actors stroll right past your defenses.

That brings us to today's topic: why IP-based rate limiting is outdated and how user-based rate limiting saves the day.

Why IP-Based Rate Limiting Fails in the Real World

Most teams begin with rate limiting by IP address. It feels intuitive at first: one IP should represent one person. Then you ship your app and discover this is rarely true.

Consider these common situations:

One corporate office might have two hundred employees sharing one public IP.

A library or café might have fifty people connected through the same network.

Meanwhile, a single attacker can rotate through hundreds of IPs in minutes.

In short:

You end up blocking real users while attackers slip through with ease.

Your rate limiter becomes unfair, unreliable, and unpredictable. Users complain, analytics become messy, and your system spends more time punishing legitimate traffic.

So the idea is to move away from network identity and toward application identity. Instead of rate limiting by IP, limit based on the actual user making the request.

The Smarter Approach: Rate Limit by User Identity

Shifting from IP identity to user identity immediately makes your system more fair and significantly harder to exploit.

Here is what this approach looks like in practice:

Each user gets their own bucket of tokens.

Tokens refill over time according to the plan or tier.

Every request consumes a token.

When the bucket is empty, the user must wait.

This avoids the classic shared Wi-Fi problem where an entire group is punished for one active person. It also stops malicious users from bypassing restrictions simply by switching IPs.

If a user is logged in, your rate limiter knows exactly who they are and can treat them fairly. If your route is public, you can approximate identity by combining the IP with session information and device fingerprinting. Either way, it is far more stable than the traditional IP-only method.
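For illustration, here is a minimal sketch of how such a composite key might be derived for a public route. The header names (x-session-id, x-device-fingerprint) are assumptions for this example, not part of the demo project:

import crypto from 'crypto';
import { Request } from 'express';

// Derive a stable identity key for unauthenticated requests by hashing
// the IP together with session and fingerprint hints, if the client sends them
function identityKey(req: Request): string {
  const parts = [
    req.ip ?? '',
    String(req.headers['x-session-id'] ?? ''),        // assumption: session token header
    String(req.headers['x-device-fingerprint'] ?? '') // assumption: client-computed fingerprint
  ];
  return crypto.createHash('sha256').update(parts.join('|')).digest('hex');
}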

Understanding the Project Structure

We’ll be building a simple LLM chat interface similar to ChatGPT, but without real AI APIs; it just simulates AI responses.

We’ll be using TypeScript, Express, and Redis to build this, and we’ll implement the Token Bucket strategy to rate-limit requests.

A typical structure for this project looks like the following:

tiers.ts
redisClient.ts
server.ts

Here is what each part contributes.

1. tiers.ts defines how many tokens each plan receives.

2. redisClient.ts creates and manages the Redis connection.

3. server.ts handles requests and applies the Token Bucket rules.

This setup is lightweight, simple to understand, and realistic enough to scale into production systems.

Code Breakdown

Let us walk through the essential parts of the implementation by focusing only on the core logic.

1. Defining Tiers

Each user tier receives a certain number of tokens. Higher tiers refill faster and have larger overall capacity. For example, a Free user needs 30 seconds to refill one token, while a Gold user needs just 6 seconds.

export const TIERS = {
  Free: 2,
  Silver: 5,
  Gold: 10,
  Platinum: 15
};

This gives your system a clear, plan-based structure that can be applied across all routes.
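Since the server refills the full bucket over a 60-second window (as we will see in server.ts), you can derive the per-token refill time directly from this map. A small illustrative helper, not part of the repo:

// Seconds needed to refill one token, given a 60-second full-bucket window
const secondsPerToken = (tier: keyof typeof TIERS): number => 60 / TIERS[tier];

console.log(secondsPerToken('Free')); // 30 seconds per token
console.log(secondsPerToken('Gold')); // 6 seconds per token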

2. Connecting to Redis

Redis is a great fit for rate limiting because it is fast, supports atomic operations, and can persist data across restarts.

import dotenv from 'dotenv';
import { createClient } from 'redis';

dotenv.config();

const client = createClient({
  url: process.env.REDIS_URL
});

client.on('connect', () => {
  console.log('Connected to Redis');
});

client.on('error', (err) => {
  console.error('Redis error:', err);
});

client.connect();

export default client;

With this in place, the server can track token counts and refill times for each user.
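For local development, the REDIS_URL value typically lives in a .env file next to the project. A minimal example, assuming a default local Redis instance:

REDIS_URL=redis://localhost:6379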

3. Mock Users for the Demo

To keep things simple, the demo uses a small list of predefined users.

const USERS = [
  { id: 1, name: 'Alice', tier: 'Free' },
  { id: 2, name: 'Bob', tier: 'Silver' },
  { id: 3, name: 'Charlie', tier: 'Gold' },
  { id: 4, name: 'David', tier: 'Platinum' }
];

In production, you would fetch this information from your database.
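As a rough sketch of what that swap might look like, here is a hypothetical getUserById helper; the demo falls back to the mock list, but the function body is where your database query would go:

// Hypothetical helper: in production, replace the array lookup with a
// real query, e.g. SELECT id, name, tier FROM users WHERE id = $1
async function getUserById(id: number) {
  return USERS.find((u) => u.id === id);
}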

4. Handling the Chat Endpoint

The rate limiting logic sits inside an Express route.

app.post('/chat', async (req, res) => {
  const { userId, message } = req.body;
  // Look up the user from the mock list (in production, fetch from your database)
  const user = USERS.find((u) => u.id === userId);
});

Once the request arrives and the user is looked up, the server needs to determine that user’s limit and their current token status.

5. Calculating Token Refill

We first fetch the limit assigned to the user, using the user record and the TIERS map we defined above.

const limit = TIERS[user.tier as keyof typeof TIERS];

Before we hit the business logic, we need to check how many tokens the user has, apply the latest refill, and only then let them proceed.

The first thing we do is define the refill window: the full bucket refills over 60 seconds, so each second restores limit / 60 tokens.

const key = `rate:${userId}`;
const now = Date.now();
const refillRate = limit / 60;

Then we get the usage data from Redis:

let data = await redisClient.get(key);
let tokens: number;
let lastRefill: number;

If the data doesn’t exist in Redis, we initialize the bucket with the full limit (2 for Free users) and record the current time as the last refill.

But if the data exists, we read the last refill time, calculate how much time has elapsed since then, and work out how many tokens to refill. The Math.min(limit, tokens + refillAmount) call makes sure we never exceed the limit defined for the tier.

if (!data) {
  tokens = limit;
  lastRefill = now;
} else {
  const parsed = JSON.parse(data);
  tokens = parsed.tokens;
  lastRefill = parsed.lastRefill;

  const elapsed = (now - lastRefill) / 1000;
  const refillAmount = elapsed * refillRate;
  tokens = Math.min(limit, tokens + refillAmount);
  lastRefill = now;
}

This ensures smooth refill behavior without sudden resets or unpredictable jumps.

Math.min ensures that the bucket never exceeds the maximum capacity.
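To make the refill math concrete, here is a worked example for the Free tier (limit = 2, so refillRate = 2 / 60 ≈ 0.033 tokens per second):

// Suppose a Free user has 0.5 tokens left and 45 seconds have elapsed:
// elapsed      = 45
// refillAmount = 45 * (2 / 60) = 1.5
// tokens       = Math.min(2, 0.5 + 1.5) = 2   // capped at bucket capacity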

6. Allowing or Rejecting the Request

If the user has at least one token available, the request proceeds and one token is deducted.

if (tokens >= 1) {
  tokens -= 1;
  await redisClient.set(key, JSON.stringify({ tokens, lastRefill }));

  res.setHeader("X-RateLimit-Limit", limit.toString());
  res.setHeader("X-RateLimit-Remaining", Math.floor(tokens).toString());

  setTimeout(() => {
    res.json({
      user: user.name,
      message,
      response: `Simulated AI reply: ${message}`,
    });
  }, 1000);
}

If not, the system blocks the request and informs the user how long to wait.

else {
  await redisClient.set(key, JSON.stringify({ tokens, lastRefill }));

  const retryAfter = Math.ceil((1 - tokens) / refillRate);

  res.setHeader("X-RateLimit-Limit", limit.toString());
  res.setHeader("X-RateLimit-Remaining", "0");
  res.setHeader("Retry-After", retryAfter.toString());

  res.status(429).json({
    error: "Rate limit exceeded",
    limit,
    remaining: 0,
    retryAfter,
  });
}

Including retry headers is important for client-side behavior because it lets the caller respect your limits gracefully.
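For example, a client could read these headers and back off automatically. A minimal sketch, assuming the demo server runs locally on port 3000 and a Node 18+ runtime with a global fetch:

// Hypothetical client helper that respects the Retry-After header
async function chatWithRetry(userId: number, message: string): Promise<unknown> {
  const res = await fetch('http://localhost:3000/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ userId, message })
  });

  if (res.status === 429) {
    const wait = Number(res.headers.get('Retry-After') ?? '1');
    await new Promise((resolve) => setTimeout(resolve, wait * 1000));
    return chatWithRetry(userId, message); // retry after the bucket refills
  }

  return res.json();
}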

Why User Based Rate Limiting Matters

This method is incredibly helpful across many types of applications.

1. Login and Authentication

You can prevent brute force attacks without blocking entire offices or schools.

2. Payment or Order APIs

You avoid accidental duplicate orders while still allowing natural user activity.

3. Developer Platforms

Each API key receives a fair and predictable quota.

4. Chat Systems

One noisy user can never overload the server or affect others.

User-based rate limiting creates a safer and more predictable environment for both you and your users.

Conclusion

Rate limiting might look like a small backend detail, but it plays a major role in how stable and secure your application feels. IP-based rate limits are easy to implement but come with serious flaws. By shifting your focus to user identity, you make your system far more accurate, fair, and resistant to abuse.

Want to see this entire system in action? Watch this YouTube video that explains it in depth.

💻 Explore the Full Source Code

The complete project is available here:

https://github.com/pavankpdev/rate-limiting-implementation

Clone it, explore the implementation, and try extending the logic into reusable middleware for your entire backend.
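As a starting point, here is a rough sketch of what that middleware could look like. It reuses the same token-bucket logic from server.ts, and assumes an upstream auth layer has already attached the user to the request; that attachment is not part of the demo repo:

import { Request, Response, NextFunction } from 'express';
import redisClient from './redisClient';
import { TIERS } from './tiers';

// Hypothetical reusable middleware; req.user is assumed to be set by auth
export function rateLimitByUser() {
  return async (req: Request, res: Response, next: NextFunction) => {
    const user = (req as any).user;
    const limit = TIERS[user.tier as keyof typeof TIERS];
    const key = `rate:${user.id}`;
    const now = Date.now();
    const refillRate = limit / 60;

    // Refill the bucket based on elapsed time, capped at the tier limit
    const data = await redisClient.get(key);
    let tokens = limit;
    if (data) {
      const parsed = JSON.parse(data);
      const elapsed = (now - parsed.lastRefill) / 1000;
      tokens = Math.min(limit, parsed.tokens + elapsed * refillRate);
    }

    if (tokens >= 1) {
      await redisClient.set(key, JSON.stringify({ tokens: tokens - 1, lastRefill: now }));
      return next();
    }

    const retryAfter = Math.ceil((1 - tokens) / refillRate);
    res.setHeader('Retry-After', retryAfter.toString());
    res.status(429).json({ error: 'Rate limit exceeded', retryAfter });
  };
}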