Rate limiting and retries
X-RateLimit headers, 429 handling, and exponential back-off patterns you can copy.
Rate limiting is what stops a single misbehaving client from drowning a public API. It is also what protects your own quota when you are the misbehaving client (a runaway script, a typo in a loop). Go REST applies a per-token budget that is generous enough not to bother developers but tight enough to stop abuse. This guide explains how the budget works, how to read the headers the server sends back, and how to write a client that stays inside the limit.
The budget
Every access token has a default budget of 90 requests per minute. The minute is a sliding window starting from your first request; it is not a fixed clock minute. So if you make 90 calls at 12:00:00, you cannot make 90 more at 12:00:59: each call frees its slot only once 60 seconds have passed since it was made, so the next 90 become available at 12:01:00.
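The sliding-window rule can be mirrored on the client so a script knows whether it still has budget before it sends anything. The sketch below is a minimal illustration, assuming the server counts each request for exactly 60 seconds after it was made; the class name `SlidingWindowBudget` and its methods are hypothetical and not part of any Go REST client.

```python
from collections import deque


class SlidingWindowBudget:
    """Client-side mirror of a sliding-window budget (illustrative only)."""

    def __init__(self, limit=90, window=60.0):
        self.limit = limit
        self.window = window
        self._sent = deque()  # timestamps of requests still inside the window

    def _expire(self, now):
        # Drop requests that were made a full window (or more) ago.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()

    def try_acquire(self, now):
        """Record a request at time `now` if budget allows; return True on success."""
        self._expire(now)
        if len(self._sent) < self.limit:
            self._sent.append(now)
            return True
        return False

    def wait_time(self, now):
        """Seconds until at least one slot frees up (0.0 if one is free now)."""
        self._expire(now)
        if len(self._sent) < self.limit:
            return 0.0
        return self._sent[0] + self.window - now
```

In real use you would pass `time.monotonic()` as `now`; taking the time as an argument keeps the class easy to test. The server's headers remain the source of truth, so treat this as a local estimate only.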
You can raise or lower the limit per token in /my-account/access-tokens (allowed range: 1 to 300 requests per minute). A higher limit is useful for batch jobs that legitimately need to read many resources; a lower limit is useful for test tokens, so that a buggy script cannot burn through much budget.
Headers on every response
Every API response includes three rate-limit headers so your client can stay informed without trial and error:
X-RateLimit-Limit: 90
X-RateLimit-Remaining: 88
X-RateLimit-Reset: 47
X-RateLimit-Limit is the budget for this token, per minute. X-RateLimit-Remaining is how many calls you can still make in the current window. X-RateLimit-Reset is the number of seconds until the window resets and you get a fresh allowance.
A polite client checks X-RateLimit-Remaining on each response and slows itself down before hitting zero. A naive client just hammers the API and deals with the 429s as they arrive. Both work; the first is friendlier and survives transient bursts.
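One way a polite client might turn those headers into a pause is sketched below. It is an illustration under stated assumptions, not a prescribed policy: the function name `pacing_delay` and the `low_water` threshold are hypothetical, and `headers` is assumed to be a plain dict keyed by the exact header names shown above (many HTTP libraries offer case-insensitive lookups instead).

```python
def pacing_delay(headers, low_water=10):
    """Seconds to sleep before the next call, based on the rate-limit headers.

    While more than `low_water` calls remain, do not slow down at all.
    Below that, spread the remaining calls evenly over the time left.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", low_water + 1))
    reset = float(headers.get("X-RateLimit-Reset", 0))
    if remaining > low_water:
        return 0.0
    if remaining <= 0:
        # Budget exhausted: wait out the rest of the window.
        return reset
    # e.g. 5 calls left and 20 s to go -> one call every 4 s
    return reset / remaining
```

Sleeping for the returned value after each response keeps the client just under the limit near the end of a window, without adding any delay while plenty of budget is left.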
What 429 looks like
When you exceed the budget, the API returns 429 with the same headers and a tiny body:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
X-RateLimit-Limit: 90
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 41
{ "message": "Too many requests" }
The 429 itself counts against your budget too, so back off rather than retrying immediately; retrying inside the window just delays you further.
Simple back-off
The straightforward retry: read X-RateLimit-Reset, sleep that long, retry. This works because the server is telling you exactly when it will let you back in.
async function fetchWithRetry(url, opts = {}, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    const res = await fetch(url, opts);
    if (res.status !== 429) return res;
    if (i === attempts - 1) break; // last attempt: do not sleep before giving up
    // X-RateLimit-Reset is in seconds; fall back to 1 s if it is missing or 0.
    const wait = (parseInt(res.headers.get("x-ratelimit-reset"), 10) || 1) * 1000;
    await new Promise(r => setTimeout(r, wait));
  }
  throw new Error("Rate limited; gave up");
}
Three attempts are usually enough. If you need more, the upstream is genuinely overloaded, and you are pushing into a different problem.
Exponential back-off (when you do not have a hint)
If you are calling an API that does not return a reset header, or you receive a 5xx response that carries none, use exponential back-off with jitter. The pattern: each retry waits twice as long as the previous one, up to a cap, plus a small random nudge so multiple clients do not all retry in lock-step.
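import random
import time

class RetryableError(Exception):
    """Placeholder: raise this from fn() for failures worth retrying (429, 5xx)."""

def with_backoff(fn, attempts=4, base=0.5, cap=10):
    for i in range(attempts):
        try:
            return fn()
        except RetryableError:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            # Double the delay each time (capped), then add up to 0.25 s of jitter.
            sleep = min(cap, base * (2 ** i)) + random.random() * 0.25
            time.sleep(sleep)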
The "+ random.random() * 0.25" is the jitter. Without it, a thousand clients that all started a job at the same moment would all retry at the same moment, hitting the server in a tight wave. Jitter spreads them out.
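To see what the defaults of `with_backoff` imply, it can help to print the delay schedule before jitter is added. The sketch below does that, and also shows one common alternative known as "full jitter", where the whole delay is drawn at random between zero and the capped value. That variant is a different technique from the fixed 0.25-second nudge above and is included only for comparison; both function names are hypothetical.

```python
import random


def backoff_delays(attempts=4, base=0.5, cap=10):
    """Base delay (before jitter) slept after each failed attempt except the last."""
    return [min(cap, base * (2 ** i)) for i in range(attempts - 1)]


def full_jitter_delay(i, base=0.5, cap=10, rng=random):
    """'Full jitter' variant: a uniform random delay between 0 and the capped value."""
    return rng.uniform(0, min(cap, base * (2 ** i)))


print(backoff_delays())            # [0.5, 1.0, 2.0]
print(backoff_delays(attempts=8))  # [0.5, 1.0, 2.0, 4.0, 8.0, 10, 10]
```

With the default four attempts there are only three sleeps, because the final failure is raised rather than retried. Once delays reach several seconds, a fixed 0.25-second nudge spreads clients out very little, which is the usual argument for scaling the jitter with the delay.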
Concurrency vs throughput
Two clients each doing 90 requests per minute is 180 requests per minute total. That is fine if they have different tokens; the budget is per token, not per IP. If you need more throughput from one logical job, the right move is to mint a second token rather than open concurrent connections on one token.
Inside one token, raising concurrency does not raise throughput; the server gates each request, not each connection. Two parallel calls on the same token still count as two against the budget.
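The arithmetic behind this is short enough to write down. The helper below is a rough estimate only, assuming requests are sent at the sustained per-minute rate and ignoring the initial burst a fresh window allows; the function name is hypothetical.

```python
def minutes_at_sustained_rate(total_requests, per_minute=90, tokens=1):
    """Approximate minutes needed to push `total_requests` through.

    Concurrency is deliberately not a parameter: under a per-token budget,
    parallel connections on one token share the same allowance.
    """
    if per_minute < 1 or tokens < 1:
        raise ValueError("per_minute and tokens must be at least 1")
    return total_requests / (per_minute * tokens)


print(minutes_at_sustained_rate(900))                  # 10.0
print(minutes_at_sustained_rate(900, tokens=2))        # 5.0
print(minutes_at_sustained_rate(900, per_minute=300))  # 3.0
```

Only the per-token limit and the number of tokens change the result, which matches the point above: opening more connections on one token does not appear anywhere in the calculation.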
What to do when you keep hitting the limit
If your application reliably trips the limit, the question is not "how do I retry harder" but "do I need every one of these calls?" Common fixes that are cheaper than more budget:
- Cache by id. If the same id is read 100 times, fetch it once and remember it for a few seconds. The shape of the data does not change inside a window.
- Batch with filters. Use ?email= / ?status= to pull a filtered list rather than fetching ids one at a time.
- Page in larger chunks. The default page size is 10. Larger pages return more data per request, so pass ?per_page=N (up to 100) on any list endpoint to fetch more rows in one call.
- Push, do not poll. If you find yourself polling GET /users every 10 seconds to look for changes, the API is the wrong tool. Polling that frequently will burn the budget no matter how big it is.
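The "cache by id" fix is the easiest of these to sketch. The class below is a minimal illustration, not a production cache: `TTLCache`, its parameters, and the five-second default are all hypothetical, and `fetch` stands for whatever function in your code makes the real API call.

```python
import time


class TTLCache:
    """Remember fetched resources by id for a few seconds (illustrative sketch)."""

    def __init__(self, fetch, ttl=5.0, clock=time.monotonic):
        self._fetch = fetch  # callable: id -> resource (makes the real API call)
        self._ttl = ttl
        self._clock = clock
        self._store = {}     # id -> (expires_at, value)

    def get(self, resource_id):
        now = self._clock()
        hit = self._store.get(resource_id)
        if hit is not None and hit[0] > now:
            return hit[1]    # still fresh: no API call, no budget spent
        value = self._fetch(resource_id)
        self._store[resource_id] = (now + self._ttl, value)
        return value
```

Reading the same id 100 times within the TTL then costs one request instead of 100. The sketch never evicts old entries, so a long-running process that touches many ids would need a size bound as well.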
Test your retry path without waiting
The ?force_status=429 query parameter on every endpoint returns a 429 (with the headers your retry code reads) without using up your real budget. Drop it into your tests so you can validate the back-off behaviour quickly:
curl -sSi -H "Authorization: Bearer $TOKEN" \
"$API/users?force_status=429"
The simulated response sets X-Simulated-Status: 429 so you can assert that the simulation fired. Pair it with the ?delay=N parameter to test responses that are both slow and rate-limited.
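In a test suite it can be convenient to add these parameters programmatically rather than by string concatenation, so that existing query strings are preserved. The helper below is a small sketch using only the standard library; the function name `simulate` and the example host are placeholders, and the unit of `delay` is whatever the API defines for ?delay=N.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def simulate(url, status=429, delay=None):
    """Return `url` with force_status (and optionally delay) added to its query."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("force_status", str(status)))
    if delay is not None:
        query.append(("delay", str(delay)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(simulate("https://api.example.test/users?per_page=100", delay=2))
# https://api.example.test/users?per_page=100&force_status=429&delay=2
```

A test can then pass the resulting URL to its retry wrapper and assert both that the wrapper backed off and that X-Simulated-Status was present on the response.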