Tags: Backend · distributed-systems · redis · typescript · rate-limiting · open-source

Building Chronon: A Distributed Rate Limiter

How I built a production-grade distributed rate limiter from scratch using Redis, consistent hashing, and TypeScript.

Published: December 14, 2025
6 min read

Rate limiting sounds easy until you try to distribute it across multiple servers. Suddenly you're fighting race conditions, coordination bugs, inconsistent clocks, failing nodes, and the dreaded thundering herd problem.

I spent the past few weeks building Chronon, an open-source distributed rate limiter. Here's what I learned about building distributed systems.

The Problem

Imagine you're running an API that handles 10,000 requests per second. You want to limit each user to 100 requests per minute. Simple, right?

// Naive approach: a per-process, in-memory counter
const counts = new Map<string, number>();
 
function checkLimit(userId: string): boolean {
  const count = counts.get(userId) ?? 0;
  if (count >= 100) return false;
  counts.set(userId, count + 1);
  return true;
}

This works on one server. But the moment you scale to two servers, you're in trouble:

Server A: count = 50  →  Allow request
Server B: count = 50  →  Allow request
                         ↓
            User made 100 requests but both servers think 50
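You can reproduce the split brain above in a few lines: each server enforces the limit correctly in isolation, yet the user gets double the quota.

```typescript
// Two servers, each with its own in-memory counter
const serverA = new Map<string, number>();
const serverB = new Map<string, number>();

function checkLimit(counts: Map<string, number>, userId: string): boolean {
  const count = counts.get(userId) ?? 0;
  if (count >= 100) return false;
  counts.set(userId, count + 1);
  return true;
}

// A round-robin load balancer alternates requests between the servers
let allowed = 0;
for (let i = 0; i < 200; i++) {
  const server = i % 2 === 0 ? serverA : serverB;
  if (checkLimit(server, "user-123")) allowed++;
}
console.log(allowed); // 200 — each server saw only 100 requests, so all pass
```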

What Makes Distribution Hard

Problem 1: Consistency

If User-123 hits Server-A, then Server-B, then Server-A again, both servers need to agree on the count across all three requests. Traditional solutions use a central Redis counter:

-- Atomic fixed-window increment in Redis
local count = redis.call('INCR', key)
if count == 1 then
  -- First request starts the window; expire the key so the count resets
  redis.call('PEXPIRE', key, window_ms)
end
if count > limit then
  return 0
end
return 1

But now every request goes to Redis. At 10K requests/second, you're paying 10K Redis round-trips per second.

Problem 2: Ownership

Here's the insight that makes distributed rate limiting work: not everyone needs to talk to everyone.

If we designate one server as the "owner" of each user's counter, requests for that user always go to the same server. The owner handles the counting locally, only syncing to Redis periodically.

But how do we pick the owner?

Consistent Hashing: The Heart of the System

Consistent hashing maps keys (like user IDs) to servers in a way that minimizes disruption when servers join or leave.

        Node A        Node B        Node C
          │             │             │
    ──────┴─────────────┴─────────────┴──────────
          │◄───────────►│◄───────────►│
          Keys A-F      Keys G-M      Keys N-Z

When Node B dies, only keys G-M get redistributed—not the entire keyspace.

I used virtual nodes (128 per physical node) to ensure even distribution. The implementation is surprisingly short:

const VIRTUAL_NODES = 128;

export class ConsistentHash {
  private ring: Map<number, string> = new Map();
  private sortedHashes: number[] = [];
  
  addNode(nodeId: string) {
    for (let i = 0; i < VIRTUAL_NODES; i++) {
      // xxHash: any fast non-cryptographic hash works here
      const hash = xxHash(`${nodeId}:${i}`);
      this.ring.set(hash, nodeId);
    }
    this.sortedHashes = [...this.ring.keys()].sort((a, b) => a - b);
  }
  
  getOwner(key: string): string {
    const hash = xxHash(key);
    // Binary search for the first node clockwise from our hash
    const idx = this.binarySearch(hash);
    return this.ring.get(this.sortedHashes[idx])!;
  }
  
  private binarySearch(hash: number): number {
    // Smallest index whose hash is >= ours; wrap to 0 past the end
    let lo = 0;
    let hi = this.sortedHashes.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.sortedHashes[mid] < hash) lo = mid + 1;
      else hi = mid;
    }
    return lo === this.sortedHashes.length ? 0 : lo;
  }
}
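To see the "minimal disruption" property concretely, here's a self-contained sketch. FNV-1a stands in for xxHash, a linear scan replaces the binary search for brevity, and the names are illustrative, not Chronon's actual API:

```typescript
const VIRTUAL_NODES = 128;

// FNV-1a: a simple stand-in for xxHash, good enough for a demo
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

class Ring {
  private ring = new Map<number, string>();
  private hashes: number[] = [];

  add(node: string) {
    for (let i = 0; i < VIRTUAL_NODES; i++) this.ring.set(fnv1a(`${node}:${i}`), node);
    this.hashes = [...this.ring.keys()].sort((a, b) => a - b);
  }

  remove(node: string) {
    for (let i = 0; i < VIRTUAL_NODES; i++) this.ring.delete(fnv1a(`${node}:${i}`));
    this.hashes = [...this.ring.keys()].sort((a, b) => a - b);
  }

  owner(key: string): string {
    const h = fnv1a(key);
    // First virtual node clockwise from the key's hash (wrap to the start)
    const idx = this.hashes.findIndex((x) => x >= h);
    return this.ring.get(this.hashes[idx === -1 ? 0 : idx])!;
  }
}

const ring = new Ring();
["node-a", "node-b", "node-c"].forEach((n) => ring.add(n));

const keys = Array.from({ length: 1000 }, (_, i) => `user-${i}`);
const before = keys.map((k) => ring.owner(k));

ring.remove("node-b"); // node-b dies

const moved = keys.filter((k, i) => ring.owner(k) !== before[i]).length;
console.log(`keys that changed owner: ${moved} / 1000`);
```

Only the keys that belonged to node-b get new owners; everything mapped to node-a or node-c stays put.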

Discovery: Finding Your Neighbors

Servers need to know about each other. I used Redis as a lightweight service registry:

  1. Heartbeat: Each node writes its metadata to Redis every 2 seconds
  2. TTL: Keys expire after 10 seconds
  3. Polling: Nodes poll Redis every 5 seconds for membership changes

// Registration
await redis.set(
  `rl:nodes:${nodeId}`,
  JSON.stringify({ url, registeredAt }),
  'PX', 10000  // 10 second TTL
);

When a node stops heartbeating, it disappears from Redis automatically. No complex leader election, no Raft consensus—just TTLs.

The tradeoff? Up to 10 seconds of stale data. For rate limiting, that's acceptable. For a database, it wouldn't be.
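The polling loop itself boils down to diffing the set of node IDs between ticks. A minimal sketch of that diff (function names are illustrative, not Chronon's actual API):

```typescript
// Given the node IDs seen on the previous poll and on this poll,
// work out who joined and who left the cluster.
function diffMembership(
  previous: Set<string>,
  current: Set<string>
): { joined: string[]; left: string[] } {
  return {
    joined: [...current].filter((id) => !previous.has(id)),
    left: [...previous].filter((id) => !current.has(id)),
  };
}

// Example: node-b's key expired, node-d registered
const prev = new Set(["node-a", "node-b", "node-c"]);
const curr = new Set(["node-a", "node-c", "node-d"]);
const { joined, left } = diffMembership(prev, curr);
// joined = ["node-d"], left = ["node-b"] — feed these into the
// hash ring's addNode / removeNode
```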

The Circuit Breaker: Preventing Cascade Failures

Here's a scenario that kept me up at night:

  1. Node B goes down
  2. Node A forwards 1000 requests to Node B
  3. All 1000 requests time out (5 seconds each)
  4. Node A is now stuck, can't serve its own traffic
  5. Clients timeout waiting for Node A
  6. Cascade failure

The fix is a circuit breaker—a pattern from electrical engineering:

CLOSED ─────► OPEN ─────► HALF-OPEN ─────► CLOSED
  │             │              │              ▲
  │ 3 failures  │ timeout      │ 1 success    │
  ▼             │              ▼              │
  try request   wait 30s    try 1 request    │
                              │              │
                              └──────────────┘

When a node fails 3 times, we stop trying for 30 seconds. This is called "failing fast"—better to return an error immediately than hang for 5 seconds.

One gotcha: if all nodes recover at exactly 30 seconds, they all flood the recovering node simultaneously. This is the "thundering herd" problem. The fix? Add jitter:

const jitter = Math.random() * 0.2; // 0-20% random variation
const timeout = baseTimeout * (1 + jitter);

Now nodes recover at 30s, 32s, 35s instead of all at 30s.
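Putting the state machine and the jitter together, here's a minimal breaker sketch. The clock is injected so the transitions are testable without real waiting; this is an illustration of the pattern, not Chronon's exact implementation:

```typescript
type State = "CLOSED" | "OPEN" | "HALF_OPEN";

// Minimal circuit breaker: 3 failures open it; after a jittered
// timeout, one trial request is allowed through.
class CircuitBreaker {
  private state: State = "CLOSED";
  private failures = 0;
  private openedAt = 0;
  private readonly timeoutMs: number;

  constructor(baseTimeoutMs = 30_000, private now: () => number = Date.now) {
    const jitter = Math.random() * 0.2; // 0-20% variation against the herd
    this.timeoutMs = baseTimeoutMs * (1 + jitter);
  }

  canRequest(): boolean {
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.timeoutMs) {
      this.state = "HALF_OPEN"; // let one trial request through
    }
    return this.state !== "OPEN";
  }

  onSuccess() {
    this.failures = 0;
    this.state = "CLOSED";
  }

  onFailure() {
    this.failures++;
    if (this.state === "HALF_OPEN" || this.failures >= 3) {
      this.state = "OPEN";
      this.openedAt = this.now();
    }
  }
}

// Walk through the state machine with a simulated clock
let t = 0;
const breaker = new CircuitBreaker(30_000, () => t);
for (let i = 0; i < 3; i++) breaker.onFailure(); // trips to OPEN
const duringOpen = breaker.canRequest(); // false: fail fast
t += 37_000; // past base timeout + max jitter
const trial = breaker.canRequest(); // true: HALF_OPEN trial
breaker.onSuccess(); // back to CLOSED
```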

Observability: Because Production is Dark

Distributed systems fail in subtle ways. Without observability, you're debugging in the dark.

Structured Logging

{"level":"info","time":"2025-01-01T12:00:00Z","nodeId":"node-1","msg":"Node registered","nodeUrl":"http://limiter-1:3000"}

Every log line includes the node ID. When you're grepping through 3 million log lines, this saves hours.

Prometheus Metrics

chronon_requests_total{status="allowed"} 15234
chronon_requests_total{status="denied"} 423
chronon_request_duration_seconds_bucket{le="0.01"} 14892
chronon_cluster_nodes 3

I avoided high-cardinality labels like tenant_id. With 100K tenants, you'd have 100K time series per metric. Prometheus runs out of memory, and you're debugging Prometheus instead of your app.

The Token Bucket Algorithm

For the actual rate limiting, I used the token bucket algorithm:

     ┌───────────────────┐
     │    Token Bucket   │
     │    [● ● ● ○ ○ ○]  │  ← 3 tokens available
     └─────────┬─────────┘
               │
          Refill rate: 10 tokens/second

Each request consumes a token. Tokens refill at a steady rate. If the bucket is empty, the request is denied.

The math is simple:

const elapsedMs = now - lastRefillTime;
const newTokens = elapsedMs * (limit / windowMs);
const currentTokens = Math.min(limit, lastTokens + newTokens);
const allowed = currentTokens >= cost;

I implemented this as an atomic Lua script in Redis, ensuring correctness even under concurrent access.
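The same math as a self-contained local sketch (the production version runs atomically inside Redis, but the refill logic is identical):

```typescript
// Local token bucket using the refill math above
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private limit: number,     // bucket capacity
    private windowMs: number,  // time to refill a full bucket
    private now: () => number = Date.now
  ) {
    this.tokens = limit;
    this.lastRefill = now();
  }

  take(cost = 1): boolean {
    const nowMs = this.now();
    const elapsedMs = nowMs - this.lastRefill;
    const newTokens = elapsedMs * (this.limit / this.windowMs);
    this.tokens = Math.min(this.limit, this.tokens + newTokens);
    this.lastRefill = nowMs;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// 100 requests per minute, with a fake clock
let clock = 0;
const bucket = new TokenBucket(100, 60_000, () => clock);
for (let i = 0; i < 100; i++) bucket.take(); // drain the bucket
const denied = bucket.take();  // false: bucket empty
clock += 6_000;                // 6s refills 10 tokens (100 per 60s)
const allowed = bucket.take(); // true: tokens refilled
```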

Architecture Overview

[Diagram: Chronon high-level architecture]

Using Chronon

If you want to try it:

# Start the cluster
docker run -d --name redis redis:7-alpine
docker run -d -p 3000:3000 -e REDIS_URL=redis://host.docker.internal:6379 harshmange44/chronon

// In your app
import { ChrononClient } from '@chronon/client';
 
const limiter = new ChrononClient('http://localhost:3000');
 
const result = await limiter.check({
  tenantId: 'my-app',
  keyType: 'user',
  keyId: 'user-123',
});
 
if (!result.allowed) {
  throw new Error('Rate limited');
}

Wrapping Up

Building distributed systems is humbling. Every "simple" problem—counting requests, finding nodes, handling failures—has edge cases that bite you in production.

But it's also incredibly satisfying. There's something magical about watching three servers coordinate without a central coordinator, handling failures gracefully, and serving thousands of requests per second.

The code is open source: github.com/harshmange44/chronon

If you build something with it, I'd love to hear about it.


Chronon is available on Docker Hub and npm.


Written by Harsh Mange

Software Engineer passionate about building scalable backend systems and sharing knowledge through writing.
