Stateful Polling — Gap Detection and Lock TTL Design

A stateless cron that computes its lookback window relative to “now” has a blind spot: any downtime longer than the window permanently drops events that occurred during the gap. The fix is to persist the last successful poll timestamp and always query from that point forward.
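The blind spot can be made concrete with a small sketch. The 15-minute lookback, the one-hour outage, and the helper names below are all hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(minutes=15)  # hypothetical fixed lookback window


def stateless_window(now):
    """Window computed relative to "now" only."""
    return now - LOOKBACK, now


def stateful_window(last_success, now):
    """Window that starts at the persisted last successful poll."""
    return last_success, now


def covers(window, moment):
    start, end = window
    return start <= moment <= end


last_success = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
now = last_success + timedelta(hours=1)              # cron was down for an hour
event_time = last_success + timedelta(minutes=20)    # event during the outage

missed_by_stateless = not covers(stateless_window(now), event_time)
caught_by_stateful = covers(stateful_window(last_success, now), event_time)
```

The stateless window only reaches back to 12:45, so the 12:20 event falls outside it; the stateful window starts at 12:00 and includes it.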

Store poll start time, not end time

Capture pollStartTime before the API call, not after.

Storing the end time sets the next poll’s date_from to a point after the API call began. Sessions updated during the call may not have appeared in the response (e.g. because of upstream replication lag), so storing the end time skips them permanently.

Storing the start time means the next poll re-queries from before the API call began, catching anything the previous response may have missed. This creates slight overlap with the previous window, which is harmless if HandleSessionUpdate (or equivalent) is idempotent.
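A minimal sketch of the ordering, assuming a dict as the persisted state and a fake clock so the effect of a slow API call is visible. `poll_once`, `FakeClock`, `slow_fetch`, and the `state` layout are hypothetical names, not part of any particular library.

```python
from datetime import datetime, timedelta, timezone


class FakeClock:
    """Hypothetical clock that only advances when told to."""

    def __init__(self, start):
        self.now = start

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += timedelta(seconds=seconds)


def poll_once(state, fetch_sessions, handle_session_update, clock):
    date_from = state["last_poll"]
    poll_start_time = clock()            # captured BEFORE the API call
    sessions = fetch_sessions(date_from)
    for session in sessions:
        handle_session_update(session)   # assumed to be idempotent
    state["last_poll"] = poll_start_time  # next poll re-queries from here
    return poll_start_time


clock = FakeClock(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
state = {"last_poll": clock() - timedelta(minutes=5)}
seen = []


def slow_fetch(date_from):
    clock.advance(30)  # simulate an API call that takes 30 seconds
    return [{"id": "s1"}]


before_call = clock()
poll_once(state, slow_fetch, seen.append, clock)
overlap = clock() - state["last_poll"]  # the 30 seconds the call was in flight
```

The next poll's window starts 30 seconds before the current time, so anything that changed while the call was in flight is queried again.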

Tip

Store poll start time, not end time — overlap is safe if your handler is idempotent; gaps are not recoverable.

Only advance the timestamp on full success

If any session fails processing mid-poll, leave the stored timestamp unchanged. The next cycle re-fetches the same window and retries every session in it. No session is permanently skipped due to a transient error.
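One way this rule could look in code is sketched below. Timestamps are plain integers for brevity, and `run_poll`, `flaky_handler`, and the session names are hypothetical. Continuing past a failed session, rather than aborting the cycle, is an assumption of this sketch; either choice works as long as the timestamp stays put.

```python
def run_poll(state, fetch_sessions, handle_session_update, poll_start_time):
    """Process one window; advance the stored timestamp only if nothing failed."""
    sessions = fetch_sessions(state["last_poll"])
    failures = 0
    for session in sessions:
        try:
            handle_session_update(session)
        except Exception:
            failures += 1  # keep going so the other sessions are still handled
    if failures == 0:
        state["last_poll"] = poll_start_time
    return failures


state = {"last_poll": 100}
processed = []
attempts = {"b": 0}


def flaky_handler(session):
    # Session "b" fails on its first attempt only (a transient error).
    if session == "b":
        attempts["b"] += 1
        if attempts["b"] == 1:
            raise RuntimeError("transient failure")
    processed.append(session)


def fetch(date_from):
    return ["a", "b", "c"]


first_failures = run_poll(state, fetch, flaky_handler, poll_start_time=200)
after_first = state["last_poll"]   # unchanged, so the same window is retried
second_failures = run_poll(state, fetch, flaky_handler, poll_start_time=300)
```

Sessions "a" and "c" are processed twice across the two cycles, which is why the handler needs to be idempotent.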

Distributed lock TTL as crash recovery ceiling

When multiple pods run the same cron, use a distributed lock to ensure only one polls at a time. The lock TTL is not a per-window guard — it’s a crash recovery mechanism.

Set the TTL to exceed the worst-case poll duration, not the cron interval. The TTL only comes into play if the pod crashes mid-poll and the cleanup code that releases the lock (for example, a finally block) never runs. If the TTL matched the cron interval, a legitimately slow poll would outlive its lock, and another pod could acquire it and start a concurrent poll.
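The effect of the two TTL choices can be simulated with an in-memory stand-in for the lock. `InMemoryLock` is a hypothetical toy, not a real distributed lock client, and the 60-second interval, 150-second worst-case poll, and 30-second margin are assumed numbers.

```python
class InMemoryLock:
    """Hypothetical stand-in for a distributed lock with a TTL."""

    def __init__(self):
        self.owner = None
        self.expires_at = 0

    def acquire(self, owner, now, ttl):
        # Held and not yet expired: refuse.
        if self.owner is not None and now < self.expires_at:
            return False
        self.owner, self.expires_at = owner, now + ttl
        return True

    def release(self, owner):
        if self.owner == owner:
            self.owner = None


CRON_INTERVAL = 60                 # seconds between cron ticks (assumed)
WORST_CASE_POLL = 150              # slowest legitimate poll (assumed)
SAFE_TTL = WORST_CASE_POLL + 30    # worst case plus a margin


def overlaps(ttl, poll_duration):
    """True if pod-b can take the lock while pod-a is still polling."""
    lock = InMemoryLock()
    lock.acquire("pod-a", now=0, ttl=ttl)
    # pod-b's cron fires at every interval while pod-a is still running.
    for tick in range(CRON_INTERVAL, poll_duration, CRON_INTERVAL):
        if lock.acquire("pod-b", now=tick, ttl=ttl):
            return True
    return False


ttl_equals_interval = overlaps(CRON_INTERVAL, WORST_CASE_POLL)
ttl_exceeds_worst_case = overlaps(SAFE_TTL, WORST_CASE_POLL)

# Crash recovery: pod-a dies without releasing, and the TTL frees the lock.
crashed = InMemoryLock()
crashed.acquire("pod-a", now=0, ttl=SAFE_TTL)
blocked = crashed.acquire("pod-b", now=120, ttl=SAFE_TTL)
recovered = crashed.acquire("pod-b", now=SAFE_TTL, ttl=SAFE_TTL)
```

With the TTL equal to the interval, pod-b gets the lock at the first tick while pod-a is still working. With the longer TTL, pod-b is refused during the poll and can only take over after a crash, once the TTL has elapsed.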

Warning

Don’t set lock TTL to match your cron interval — set it to exceed your worst-case operation duration. The TTL is a crash safety net, not a window enforcer.

Zhu Yuechen's Tech Notes
