How to Build Idempotent Webhook Event Processors
Webhook delivery is at-least-once. If your endpoint returns a non-2xx response, the sender will retry. If a network timeout interrupts the exchange, the sender may retry even though your endpoint processed the request successfully, because the acknowledgment never reached the sender. This means your event processor must be idempotent: processing the same event twice must produce the same result as processing it once.
Idempotency is not a nice-to-have. It's the fundamental contract that makes webhook integrations reliable. Without it, retried deliveries create duplicate records, double-charged accounts, and duplicate notifications.
The Core Idempotency Pattern
The standard pattern is event ID tracking. Every webhook sender includes a unique identifier in the payload. Before doing any processing work, check whether you've already processed this ID. If yes, return success and stop. If no, process the event and record the ID as handled.
async def process_webhook_event(event_id: str, payload: dict) -> None:
    async with db.transaction():
        already_processed = await db.fetchval(
            "SELECT 1 FROM processed_events WHERE event_id = $1",
            event_id,
        )
        if already_processed:
            return  # idempotent - already done
        # do the actual work
        await handle_event(payload)
        # record as processed within the same transaction
        await db.execute(
            "INSERT INTO processed_events (event_id, processed_at) VALUES ($1, NOW())",
            event_id,
        )
The check and the insert must happen within the same database transaction. If processing succeeds but the insert fails, the event gets processed again on the next retry. If the insert succeeds but processing fails, the event is marked done without actually completing. Both are failure modes that the transaction boundary prevents.
Using Unique Constraints Instead of Application-Level Checks
An alternative to the application-level check is using a unique database constraint. Attempt to insert the event ID, catch the conflict, and use the conflict to determine whether to skip processing.
CREATE TABLE processed_events (
    event_id     TEXT PRIMARY KEY,
    processed_at TIMESTAMPTZ DEFAULT NOW()
);
async def process_webhook_event(event_id: str, payload: dict) -> None:
    try:
        await db.execute(
            "INSERT INTO processed_events (event_id) VALUES ($1)",
            event_id,
        )
    except UniqueViolationError:
        return  # already processed
    await handle_event(payload)
This approach is simpler but has a gap: if handle_event fails after the insert succeeds, the event is marked as processed without actually being processed. The transaction-based approach prevents this by rolling back the insert on processing failure.
Choose the constraint-based approach when your processing is idempotent by design (running it twice produces the same state), or when its simplicity is worth more to you than the transactional approach's stronger guarantee.
Making Processing Operations Idempotent by Design
The cleanest idempotency comes from designing the processing operations themselves to be idempotent, rather than tracking event IDs. An operation is idempotent by design when running it multiple times produces the same result as running it once.
Account balance updates:
Non-idempotent: UPDATE accounts SET balance = balance + 100 WHERE id = $1
Idempotent: UPDATE accounts SET balance = $2 WHERE id = $1 AND balance != $2
Record creation:
Non-idempotent: INSERT INTO orders (id, status) VALUES ($1, 'created')
Idempotent: INSERT INTO orders (id, status) VALUES ($1, 'created') ON CONFLICT (id) DO NOTHING
Status transitions:
Non-idempotent: update status regardless of current state
Idempotent: UPDATE orders SET status = 'completed' WHERE id = $1 AND status != 'completed'
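A minimal sketch combining these building blocks, assuming the same db handle as the earlier examples and a hypothetical orders table keyed by an order_id field in the payload (both are illustrative, not part of any sender's actual schema):
async def handle_order_event(payload: dict) -> None:
    # Idempotent creation: a repeat run hits the conflict and does nothing.
    await db.execute(
        "INSERT INTO orders (id, status) VALUES ($1, 'created') "
        "ON CONFLICT (id) DO NOTHING",
        payload["order_id"],  # hypothetical payload field
    )
    # Idempotent transition: the status guard makes the update safe to repeat.
    if payload.get("type") == "order.completed":  # hypothetical event type
        await db.execute(
            "UPDATE orders SET status = 'completed' "
            "WHERE id = $1 AND status != 'completed'",
            payload["order_id"],
        )
A handler built entirely from statements like these can run on every delivery attempt without any event ID bookkeeping at all.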
For operations that involve external side effects (sending emails, charging cards, calling third-party APIs), idempotent design is harder. You can't easily make a "send email" operation idempotent at the operation level. In these cases, event ID tracking is the correct approach, with the external call inside the transaction scope.
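A minimal sketch of that arrangement, assuming the constraint-based processed_events table from above and a hypothetical send_receipt_email helper for the external side effect:
async def process_payment_event(event_id: str, payload: dict) -> None:
    async with db.transaction():
        try:
            await db.execute(
                "INSERT INTO processed_events (event_id) VALUES ($1)",
                event_id,
            )
        except UniqueViolationError:
            return  # duplicate delivery: the email was already handled
        # External call inside the transaction scope: if it raises, the
        # transaction rolls back, the event ID is not recorded, and the
        # sender's retry gets another attempt at the send.
        await send_receipt_email(payload)  # hypothetical helper
Note the remaining window: if the email is sent but the commit then fails, the retry will send it again. The transaction scope narrows the at-least-once window for the side effect; it cannot eliminate it.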
Handling Concurrent Duplicate Delivery
The idempotency check above works correctly for sequential duplicates. It has a race condition for concurrent ones: two delivery attempts arrive within milliseconds of each other, both check the processed_events table, both find no record, and both start processing simultaneously.
The fix is a database-level constraint. Both inserts race to commit. One succeeds. The other hits the unique constraint violation and skips processing. The constraint is the race condition guard; the application-level check is just an optimization to avoid the constraint error in the common case.
async def process_webhook_event(event_id: str, payload: dict) -> None:
    # Optimistic check: fast path for already-processed events
    if await db.fetchval("SELECT 1 FROM processed_events WHERE event_id = $1", event_id):
        return
    async with db.transaction():
        try:
            await db.execute(
                "INSERT INTO processed_events (event_id) VALUES ($1)",
                event_id,
            )
        except UniqueViolationError:
            return  # lost the race, already processing
        await handle_event(payload)
The optimistic check at the top avoids the transaction overhead for the common case (most events arrive once). The transactional insert handles the race condition for concurrent duplicates.
Storing the Event Payload for Debugging
In addition to tracking the event ID, storing the raw payload in the processed_events table provides a debugging record. When a processing bug is discovered after the fact, you can replay events from the stored payloads without needing the sender to resend historical data.
CREATE TABLE processed_events (
    event_id     TEXT PRIMARY KEY,
    source       TEXT NOT NULL,
    payload      JSONB NOT NULL,
    processed_at TIMESTAMPTZ DEFAULT NOW(),
    status       TEXT DEFAULT 'pending'
);
Mark the status as 'pending' on insert, then update it to 'completed' after successful processing. Failed events get status 'failed', are kept for investigation, and can be re-queued manually.
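A minimal sketch of that lifecycle, assuming the same db handle, UniqueViolationError, and handle_event as the earlier examples, and a driver that accepts a JSON string for the JSONB column:
import json

async def process_webhook_event(event_id: str, source: str, payload: dict) -> None:
    try:
        await db.execute(
            "INSERT INTO processed_events (event_id, source, payload) "
            "VALUES ($1, $2, $3)",
            event_id, source, json.dumps(payload),
        )
    except UniqueViolationError:
        return  # already recorded by an earlier delivery
    try:
        await handle_event(payload)
    except Exception:
        # Leave a 'failed' marker so the event can be found and replayed.
        await db.execute(
            "UPDATE processed_events SET status = 'failed' WHERE event_id = $1",
            event_id,
        )
        raise
    await db.execute(
        "UPDATE processed_events SET status = 'completed' WHERE event_id = $1",
        event_id,
    )
Replay is then a query for status = 'failed' rows followed by a re-run of handle_event over each stored payload.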
Choosing Between Database and Redis for Event Tracking
The processed_events table approach works well up to moderate webhook volumes. For high-volume integrations receiving thousands of events per hour, the database table becomes a write bottleneck.
Redis offers an alternative for the event ID store that scales better under write pressure. A single Redis SET with the NX flag records the event ID and reports, in one atomic command, whether it was already present, performing the same function as the database insert with lower latency and higher throughput. Redis's TTL support means you can automatically expire event IDs after a window that matches your sender's retry policy, keeping memory usage bounded without a cleanup job.
The tradeoff: Redis does not give you the payload storage and audit trail that a database table provides. For integrations where payload debugging and replay are important, the database table is the right choice. For pure idempotency tracking at high volume, Redis's performance makes it practical.
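A minimal sketch using the redis-py asyncio client, assuming a 24-hour retry window and the same handle_event as before:
import redis.asyncio as redis

r = redis.Redis()  # assumes a local Redis instance

# Size this to comfortably cover the sender's retry window.
RETRY_WINDOW_SECONDS = 24 * 60 * 60

async def process_webhook_event(event_id: str, payload: dict) -> None:
    # SET with NX (only set if absent) and EX (TTL) is a single atomic
    # command: it records the event ID and reports whether the key was
    # new, so there is no separate existence check to race against.
    is_new = await r.set(f"webhook:{event_id}", "1", nx=True, ex=RETRY_WINDOW_SECONDS)
    if not is_new:
        return  # duplicate delivery within the retry window
    await handle_event(payload)
This carries the same gap as the constraint-based database approach: if handle_event fails after the SET, the ID is already recorded and the retry will be skipped. Deleting the key on failure is one mitigation.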
How Stripe Documents Idempotency for Reference
Stripe publishes detailed documentation on its webhook idempotency model, and it is worth reading as a reference. Each Stripe event has a unique ID that is stable across all delivery attempts for that event. Stripe explicitly notes that it may deliver the same event multiple times and recommends checking the event ID before processing.
Their documentation also covers what to do when your endpoint returns success but your processing code threw an exception after the response was sent. This is the "acknowledged but not processed" failure mode. Stripe's recommended pattern matches what is described in this article: use the event ID as a database key, and wrap the processing and the ID insert in the same transaction so partial failures do not create orphaned IDs.
For a reference on the HTTP layer behavior that webhook delivery depends on, MDN Web Docs covers the semantics of HTTP response codes, including the difference between 200 and 202 responses and when each is appropriate for a webhook receiver.
For the complete webhook receiver architecture that this idempotency pattern fits into, including signature verification and async processing, How to Build a Webhook Receiver That Handles Real-World Traffic covers the full implementation.
137Foundry designs and builds data integration systems, including webhook infrastructure with idempotent event processing for production workloads. Its data integration service handles end-to-end integration architecture for teams that need reliable event processing at scale.