1. Rate Limiting Strategies for Websites vs APIs
| Factor | Websites | APIs |
|---|---|---|
| Users | Human users (browsers) | Machines, mobile apps, scripts |
| Requests per User | 10-100 per session | 1,000+ per session |
| State | Sessions, cookies, authentication | Stateless, token-based |
| Concurrency | Multiple users per IP (firewalls, NAT) | Often 1:1 client-server |
| Caching | Heavy caching via CDN | Caching only applies to GET requests |
| Impact of Rate Limiting | Users might get frustrated | API clients may retry aggressively |
| Rate Limit Strategy | Per session, per user token | Per API key, client IP, or device fingerprint |
🔹 Key takeaway: API rate limits usually need to be stricter, because automated clients (bots, scrapers, and malicious scripts) can send far more requests than a person using a browser. Websites can offload much of their traffic to caches and CDNs.
2. Handling Rate Limiting for Websites
2.1 Web Client-Side Strategies
To prevent users from hitting rate limits, we can use:
🔹 JavaScript Throttling/Debouncing – Prevents excessive requests from UI interactions.
🔹 Session-Based Limits – Limits requests per logged-in user rather than by IP.
🔹 CDN Caching – Offloads traffic by serving cached responses.
Example: JavaScript Throttling with Lodash
🔹 Why this works? Throttling caps how often the click handler can fire, so rapid repeated clicks cannot flood the API with requests.
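The throttling idea can also be sketched outside the browser. The block below is a minimal Python illustration, not the Lodash implementation: the `throttle` decorator, the `send_request` function, and the fake clock are all hypothetical names introduced here. It fires on the leading edge only, whereas Lodash's `_.throttle` can also schedule a trailing call.

```python
import time
from functools import wraps
from typing import Any, Callable, Optional


def throttle(wait_seconds: float, clock: Callable[[], float] = time.monotonic):
    """Run the decorated function at most once per `wait_seconds`.

    Calls arriving inside the wait window are dropped and return None.
    """
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        last_run: Optional[float] = None  # time of the last accepted call

        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            nonlocal last_run
            now = clock()
            if last_run is not None and now - last_run < wait_seconds:
                return None  # too soon: drop this call
            last_run = now
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Usage with a fake clock (an assumption for the demo) so the result is deterministic.
fake_now = [0.0]
sent = []


@throttle(1.0, clock=lambda: fake_now[0])
def send_request(label: str) -> str:
    sent.append(label)  # stand-in for a real network call
    return label


send_request("click 1")  # accepted
send_request("click 2")  # dropped: arrives in the same instant
fake_now[0] = 1.5
send_request("click 3")  # accepted: the 1-second window has passed
print(sent)  # ['click 1', 'click 3']
```

Dropping calls, rather than queueing them, is the simplest policy; a UI that must not lose the final click would need the trailing-call behaviour instead.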
2.2 Web Server-Side Strategies
For web servers, we use:
- Session-based limits – Track request counts per user session.
- Dynamic rate limits – Adjust limits based on user behavior.
- CDN caching – Serve static data to reduce backend load.
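As a rough illustration of the "dynamic rate limits" idea in the list above, the sketch below derives a per-user limit from two behaviour signals. The signals (`account_age_days`, `recent_violations`), the thresholds, and the default numbers are all assumptions chosen for the example; a real system would pick its own.

```python
from dataclasses import dataclass


@dataclass
class UserBehavior:
    account_age_days: int   # hypothetical trust signal
    recent_violations: int  # how often the user recently hit the limit


def dynamic_limit(
    behavior: UserBehavior,
    base_limit: int = 60,
    min_limit: int = 10,
    max_limit: int = 240,
) -> int:
    """Return the requests-per-minute limit for one user."""
    limit = base_limit
    # Established accounts get double the base allowance.
    if behavior.account_age_days >= 30:
        limit *= 2
    # Each recent violation halves the allowance (exponent capped at 10).
    limit //= 2 ** min(behavior.recent_violations, 10)
    # Never go below the floor or above the ceiling.
    return max(min_limit, min(max_limit, limit))


print(dynamic_limit(UserBehavior(account_age_days=45, recent_violations=0)))  # 120
print(dynamic_limit(UserBehavior(account_age_days=0, recent_violations=3)))   # 10
```

The floor keeps a misbehaving user from being locked out entirely, which matters on a website where the "client" is a person rather than a script.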
Example: Python Flask with Session-Based Rate Limiting
🔹 Why this works? Limits each logged-in user separately, even if they share an IP.
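The core of session-based limiting can be sketched without any web framework. The block below is a minimal fixed-window counter keyed by session ID; the class name `SessionRateLimiter`, the in-memory dictionary, and the example limits are assumptions, and in a Flask app the `allow` check would typically run before each request using the ID stored in the user's session.

```python
import time
from typing import Callable, Dict, Tuple


class SessionRateLimiter:
    """Fixed-window request counter, one window per session ID."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        # session_id -> (window_start_time, requests_seen_in_window)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def allow(self, session_id: str) -> bool:
        now = self._clock()
        start, count = self._windows.get(session_id, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # window expired: start a new one
        if count >= self.max_requests:
            self._windows[session_id] = (start, count)
            return False  # caller would respond with HTTP 429
        self._windows[session_id] = (start, count + 1)
        return True


limiter = SessionRateLimiter(max_requests=2, window_seconds=60.0)
print([limiter.allow("session-alice") for _ in range(3)])  # [True, True, False]
print(limiter.allow("session-bob"))  # True: a separate session has its own counter
```

An in-process dictionary only works for a single server process; with several workers the counters would need shared storage.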
3. Handling Rate Limiting for APIs
3.1 API Server-Side Strategies
For APIs, we use:
- Token-based rate limiting – Assign limits per API key or session token.
- Geo-based exceptions – Allow higher limits for regions where many users share a few public IPs (for example, behind national firewalls or large NATs).
- Device fingerprinting – Track unique devices instead of IPs.
Example: Python Flask API with Token-Based Rate Limiting
🔹 Why this works? Limits API usage per token rather than per IP.
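One way to limit per API key is a token bucket per key, sketched below with the standard library only. Note that "token" is used in two senses here: the article's token is the client's API key, while the token bucket is an algorithm chosen for this illustration, not something the article prescribes. The class name, capacity, and refill rate are assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """One token bucket per API key; each request spends one token."""

    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        # api_key -> (tokens_available, time_of_last_update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, api_key: str) -> bool:
        now = self._clock()
        tokens, last = self._buckets.get(api_key, (float(self.capacity), now))
        # Add the tokens earned since the last update, up to the capacity.
        tokens = min(float(self.capacity), tokens + (now - last) * self.refill_per_second)
        if tokens >= 1.0:
            self._buckets[api_key] = (tokens - 1.0, now)
            return True
        self._buckets[api_key] = (tokens, now)
        return False


limiter = TokenBucketLimiter(capacity=2, refill_per_second=1.0)
print([limiter.allow("key-a") for _ in range(3)])  # [True, True, False]
print(limiter.allow("key-b"))  # True: each key has its own bucket
```

Compared with a fixed window, a bucket tolerates short bursts up to `capacity` while still holding the long-run rate to `refill_per_second`.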
3.2 API Gateway Rate Limiting
Use NGINX, Traefik, or Istio for API gateway-based rate limiting.
Example: NGINX Rate Limiting by API Key
🔹 Why this works? Ensures rate limits apply per API key, not per IP.
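To show the gateway decision in isolation, the sketch below picks the rate-limit key from a request and applies a sliding-window check. It is a Python model of the idea, not NGINX configuration and not NGINX's algorithm: the `X-API-Key` header name, the fallback to client IP, and the class and function names are assumptions.

```python
import math
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Mapping, Tuple


def rate_limit_key(headers: Mapping[str, str], remote_addr: str) -> str:
    """Prefer the API key; fall back to the client IP when none is sent."""
    normalized = {name.lower(): value for name, value in headers.items()}
    api_key = normalized.get("x-api-key", "").strip()
    return f"key:{api_key}" if api_key else f"ip:{remote_addr}"


class SlidingWindowGateway:
    """Allow at most `max_requests` per key in any `window_seconds` span."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def handle(self, headers: Mapping[str, str], remote_addr: str) -> Tuple[int, Dict[str, str]]:
        key = rate_limit_key(headers, remote_addr)
        now = self._clock()
        hits = self._hits[key]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            wait = math.ceil(self.window_seconds - (now - hits[0]))
            return 429, {"Retry-After": str(max(1, wait))}
        hits.append(now)
        return 200, {}


gateway = SlidingWindowGateway(max_requests=1, window_seconds=10.0)
print(gateway.handle({"X-API-Key": "abc"}, "10.0.0.1")[0])  # 200
print(gateway.handle({"X-API-Key": "abc"}, "10.0.0.2")[0])  # 429: same key, different IP
print(gateway.handle({"X-API-Key": "xyz"}, "10.0.0.1")[0])  # 200: different key, same IP
```

Returning `Retry-After` with the 429 gives well-behaved clients a concrete wait time instead of leaving them to retry blindly.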
4. Architecture Patterns for Scaling Web & API Rate Limiting
4.1 Using CDNs for Websites
For static assets and cached API responses, use Cloudflare, AWS CloudFront, or Fastly.
Example: AWS CloudFront Caching API Responses
🔹 Why this works? Reduces API calls by serving cached responses from edge locations.
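The effect of edge caching on backend load can be modelled in a few lines. The sketch below is a toy stand-in for a CDN edge, not CloudFront's behaviour or configuration: the `EdgeCache` class, the fixed TTL, and the `origin` function are assumptions. It caches only GET responses, matching the table in section 1.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Serve GET responses from a TTL cache; pass everything else through."""

    def __init__(
        self,
        ttl_seconds: float,
        fetch_origin: Callable[[str, str], str],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl_seconds = ttl_seconds
        self._fetch_origin = fetch_origin
        self._clock = clock
        # path -> (expiry_time, cached_body)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def request(self, method: str, path: str) -> str:
        if method.upper() != "GET":
            return self._fetch_origin(method, path)  # never cache writes
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the origin is not contacted
        body = self._fetch_origin(method, path)
        self._entries[path] = (now + self.ttl_seconds, body)
        return body


origin_calls = []


def origin(method: str, path: str) -> str:
    origin_calls.append((method, path))  # count how often the backend is hit
    return f"{method} {path} response"


edge = EdgeCache(ttl_seconds=30.0, fetch_origin=origin)
for _ in range(5):
    edge.request("GET", "/products")
print(len(origin_calls))  # 1: four of the five requests were served from cache
```

Every cache hit is a request the origin's rate limiter never sees, which is why cacheable endpoints can tolerate looser limits.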
4.2 Microservices for Scalable Rate Limiting
Break monolithic applications into microservices and limit rates per microservice.
Example: Kubernetes Rate Limiting with Istio
🔹 Why this works? Each microservice gets its own limit, so a traffic spike against one service cannot use up the request budget of the others.
5. Final Thoughts: Key Takeaways
| Strategy | Websites | APIs |
|---|---|---|
| Rate limit by IP | ❌ Bad for shared IPs | ✅ Good for bot defense |
| Rate limit by user session | ✅ Best practice | ❌ Hard to track clients |
| Rate limit by API key | ❌ Not needed | ✅ Best practice |
| Use CDNs | ✅ Yes, for caching | ✅ Only for GET requests |
| Use dynamic limits | ✅ Improves UX | ✅ Prevents abuse |
| Use microservices | ✅ Reduces bottlenecks | ✅ Enables distributed limits |
Best Practices for Modern Cloud Applications
✅ Websites should cache, throttle UI actions, and use CDNs
✅ APIs should rate limit by API key, use gateways, and track device fingerprints
✅ Kubernetes microservices help distribute load and prevent bottlenecks