Monitoring and health

PgQue exposes its health through a small set of read-only functions. This page explains the columns that matter operationally, the one failure mode you must catch early — a stuck consumer that blocks table rotation — and the queries to wire into your monitoring.

All of the get_*_info functions and pgque.version() are granted to pgque_reader, so a read-only monitoring role can run everything here. pgque.status() is admin-only. For role setup see Installation and operations; for vocabulary see Concepts.

The examples assume:

PAGER=cat psql --no-psqlrc -d yourdb

The observability surface

pgque.status() — is the engine wired up

pgque.status() returns (component, status, detail) rows. It is the one-stop check that the ticker and maintenance jobs are scheduled. If pg_cron is installed and pgque.start() has run, you will see ticker and maintenance rows with a scheduled status and their cron job ids. This function is admin-only.

select * from pgque.status();

If status() shows nothing scheduled, no ticks are being created and every pgque.receive() call keeps returning zero rows. Rule that out first.
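A monitoring job can turn that check into a yes/no answer. The following is a minimal sketch, assuming the rows from pgque.status() have already been fetched as (component, status, detail) tuples by whatever client you use. The helper name, the component names "ticker" and "maintenance", and the literal status string "scheduled" are assumptions taken from the description above, not confirmed output values.

```python
# Hypothetical helper, not part of PgQue. Rows are (component, status, detail)
# tuples as returned by `select * from pgque.status()`.
REQUIRED = ("ticker", "maintenance")  # assumed component names


def unscheduled_components(rows, required=REQUIRED, ok_status="scheduled"):
    """Return the required components that are missing or not scheduled."""
    status_by_component = {component: status for component, status, _detail in rows}
    return [c for c in required if status_by_component.get(c) != ok_status]


# Example values for illustration only, not real pgque.status() output.
rows = [
    ("ticker", "scheduled", "jobid=1"),
    ("maintenance", "missing", None),
]
print(unscheduled_components(rows))  # ['maintenance']
```

An empty result means both jobs are present and scheduled; anything else is worth paging on before looking at queue-level signals.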

pgque.get_queue_info([queue]) — is the queue flowing

Call with no argument for all queues, or pass a queue name for one. The operationally important output columns:

column	meaning	watch for
`ticker_lag`	wall time since this queue’s last tick	grows without bound when the ticker is not running
`ev_per_sec`	recent event throughput (float8, from the last ~20 ticks)	sudden drop to zero, or unexpected spikes
`ev_new`	events sent but not yet covered by a tick	climbs and stays high if ticking stalls
`last_tick_id`	id of the most recent tick	should keep advancing
`queue_ticker_paused`	whether ticking is paused on this queue	`true` means no delivery by design
`queue_ticker_max_count` / `queue_ticker_max_lag` / `queue_ticker_idle_period`	the tick-trigger thresholds	context for interpreting `ticker_lag`
`queue_rotation_period` / `queue_switch_time`	rotation period and last rotation time	stale `queue_switch_time` hints rotation is stuck

select queue_name, ticker_lag, ev_per_sec, ev_new, last_tick_id
from pgque.get_queue_info('orders');

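ticker_lag is the single most useful queue signal. With the default settings, the queue ticks at least once per ticker_idle_period (1 minute, reported as queue_ticker_idle_period) even when idle, so a ticker_lag that keeps climbing past that value means the ticker has stopped, unless queue_ticker_paused is true and ticking is paused on purpose.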
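One way to express "keeps climbing past the idle period" is to look at several consecutive readings rather than one. The sketch below assumes you sample ticker_lag periodically and keep the readings as Python timedelta values; the function name and the three-sample window are illustrative choices, not PgQue settings.

```python
from datetime import timedelta


def ticker_stalled(lag_samples, idle_period=timedelta(minutes=1), min_samples=3):
    """True when the last `min_samples` ticker_lag readings all exceed
    `idle_period` and each reading is larger than the one before it."""
    recent = lag_samples[-min_samples:]
    if len(recent) < min_samples:
        return False  # not enough history to call it a trend
    above = all(lag > idle_period for lag in recent)
    climbing = all(a < b for a, b in zip(recent, recent[1:]))
    return above and climbing


healthy = [timedelta(seconds=s) for s in (12, 40, 5)]
stalled = [timedelta(seconds=s) for s in (90, 150, 210)]
print(ticker_stalled(healthy), ticker_stalled(stalled))  # False True
```

Requiring both conditions avoids alerting on a single slow tick while still catching a ticker that has stopped entirely.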

pgque.get_consumer_info([queue[, consumer]]) — is the consumer keeping up

Call with no arguments for every consumer on every queue, with a queue name to scope to one queue, or with both to inspect a single consumer. Output columns:

column	meaning	watch for
`lag`	age of the events the consumer is currently positioned at	grows when the consumer falls behind
`last_seen`	elapsed time since the consumer last processed a batch	grows when the consumer has stopped calling `receive`
`pending_events`	events waiting past the consumer’s position, not yet consumed	a growing backlog
`last_tick`	tick id of the consumer’s last processed tick	should advance; a frozen value is the stuck-consumer signal
`current_batch`	active batch id, or NULL if none open	a long-lived non-NULL value means a batch is never being acked
`next_tick`	final tick of the active batch, if one is open	—

select queue_name, consumer_name, lag, last_seen, pending_events, last_tick
from pgque.get_consumer_info('orders', 'processor');

In a healthy system, lag and last_seen both stay low and pending_events stays near zero. A consumer whose last_tick stops advancing while last_seen keeps climbing is stuck; see "The critical one: a stuck consumer blocks rotation" under "What to alert on".

pgque.get_batch_info(batch_id) — inspect one in-flight batch

Given a batch id (the batch_id on a pgque.message, or current_batch from get_consumer_info), this returns one row describing the batch: queue_name, consumer_name, batch_start, batch_end, prev_tick_id, tick_id, lag, seq_start, seq_end. Use it to debug a specific batch that seems stalled — lag is now() minus the batch’s end-tick time, and seq_end - seq_start approximates the batch’s event span.

select queue_name, consumer_name, lag, seq_start, seq_end
from pgque.get_batch_info(12345);
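The two derived figures can be reproduced client-side from the raw columns, which helps when you log batch rows and inspect them later. This is a small sketch under the definitions given above; the function name is hypothetical, and batch_end is assumed to arrive as a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def batch_summary(batch_end, seq_start, seq_end, now=None):
    """Recompute lag (now minus the batch's end-tick time) and the
    approximate event span (seq_end - seq_start) for one batch row."""
    now = now or datetime.now(timezone.utc)
    return {"lag": now - batch_end, "approx_events": seq_end - seq_start}


# Illustrative values only.
end = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
checked_at = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)
print(batch_summary(end, 1000, 1500, now=checked_at))
```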

What to alert on

The critical one: a stuck consumer blocks rotation

This is the headline operational risk in PgQue, and it is worth understanding before any other alert.

PgQue stores events in a set of inherited tables and reclaims space by rotating them: periodically it advances to the next table in the set and TRUNCATEs the one it is reusing. Rotation is the only thing that frees disk — there are no per-row deletes.

Rotation is gated on the slowest consumer. Step one of rotation finds the lowest sub_last_tick across all subscriptions on the queue; if the slowest consumer still needs the table about to be truncated, rotation returns zero and skips. A consumer that has stopped — crashed, deadlocked, a deploy gone wrong, or simply far too slow — pins that lowest tick and blocks the TRUNCATE indefinitely. The event tables then grow without bound until the consumer recovers or is unsubscribed.
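The gating rule can be illustrated with a toy model. This is not PgQue's implementation: the function, its arguments, and the exact comparison are assumptions made to show how one lagging subscription pins the table. It only mirrors the behaviour described above, returning zero when rotation is skipped.

```python
def try_rotate(sub_last_ticks, newest_tick_in_reused_table):
    """Toy model of the rotation gate.

    sub_last_ticks: {consumer_name: sub_last_tick} for one queue.
    newest_tick_in_reused_table: highest tick whose events live in the
        table that rotation wants to truncate (hypothetical input).
    Returns 1 if rotation may proceed, 0 if it is skipped.
    """
    if not sub_last_ticks:
        return 1  # assumption: with no subscriptions, nothing pins the table
    lowest = min(sub_last_ticks.values())
    if lowest < newest_tick_in_reused_table:
        return 0  # the slowest consumer still needs events in that table
    return 1


print(try_rotate({"processor": 120, "dead_consumer": 40}, 100))  # 0
print(try_rotate({"processor": 120}, 100))                       # 1
```

Removing the dead_consumer entry is the model's equivalent of unsubscribing: the lowest tick jumps forward and the gate opens.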

So the alert that protects your disk is not a disk alert — it is a stuck-consumer alert. Catch it by watching get_consumer_info:

last_seen keeps growing for a consumer that should be active, and
its last_tick is not advancing while last_tick_id on the queue is,
typically with pending_events climbing alongside.
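Those conditions compare two readings taken some interval apart, so they are easy to evaluate in a monitoring script. The sketch below assumes each sample combines last_seen and last_tick from get_consumer_info with last_tick_id from get_queue_info for the same queue; the Sample type and the function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Sample:
    last_seen: timedelta      # get_consumer_info.last_seen
    last_tick: int            # get_consumer_info.last_tick
    queue_last_tick_id: int   # get_queue_info.last_tick_id for the same queue


def looks_stuck(prev: Sample, curr: Sample) -> bool:
    """True when the consumer's position is frozen, the queue keeps
    ticking, and the consumer has gone quiet between the two samples."""
    consumer_frozen = curr.last_tick == prev.last_tick
    queue_advancing = curr.queue_last_tick_id > prev.queue_last_tick_id
    going_quiet = curr.last_seen > prev.last_seen
    return consumer_frozen and queue_advancing and going_quiet


before = Sample(timedelta(minutes=2), last_tick=500, queue_last_tick_id=560)
after = Sample(timedelta(minutes=7), last_tick=500, queue_last_tick_id=590)
print(looks_stuck(before, after))  # True
```

pending_events is left out of the condition because it usually climbs alongside but is not required for the diagnosis. In practice, require the result to hold across several consecutive sample pairs before alerting.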

When you confirm a consumer is wedged and will not come back, unsubscribe it so rotation can proceed:

select pgque.unsubscribe('orders', 'dead_consumer');

(Or pgque.drop_queue('orders', true) to unregister all consumers, if you are tearing the queue down.) A dead consumer that you do not intend to restart must be unsubscribed, or it will hold the queue’s storage forever.

Threshold table

Treat these thresholds as relative, because PgQue ships no SLA. Alert on trends across several sampling intervals, not on a single reading, and tune absolute thresholds to your own tick rate and traffic.

signal	source	alert when	why it matters
ticker lag	`get_queue_info.ticker_lag`	climbs and stays above `ticker_idle_period` (default 1 minute) across intervals	ticker not running → no batches → no delivery
consumer lag	`get_consumer_info.lag` / `pending_events`	`lag` and `pending_events` keep growing across intervals	a consumer is falling behind real-time
stuck consumer	`get_consumer_info.last_seen` + frozen `last_tick`	`last_seen` grows while `last_tick` stays put and the queue’s `last_tick_id` advances	pins the lowest tick → blocks `TRUNCATE` rotation → event tables grow unbounded (the critical one)
DLQ depth	`dlq_inspect` row count / `pgque.dead_letter`	the dead-letter backlog grows or is non-empty when you expect zero	events are exhausting retries; a downstream is failing
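The "keeps growing across intervals" condition in the consumer lag row can be sketched as a check over a short history of samples. The function names and the default of three intervals are illustrative assumptions; lag is assumed to have been converted to seconds before being stored.

```python
def keeps_growing(values, intervals=3):
    """True when each of the last `intervals` steps was an increase."""
    recent = values[-(intervals + 1):]
    if len(recent) < intervals + 1:
        return False  # not enough samples to judge a trend
    return all(a < b for a, b in zip(recent, recent[1:]))


def consumer_falling_behind(lag_seconds, pending_events, intervals=3):
    """Alert only when lag and pending_events both grow across every interval."""
    return keeps_growing(lag_seconds, intervals) and keeps_growing(pending_events, intervals)


print(consumer_falling_behind([10, 20, 35, 50], [100, 180, 260, 400]))  # True
print(consumer_falling_behind([10, 12, 90, 11], [100, 180, 260, 400]))  # False
```

The second call shows why a trend is preferred over a single reading: one spike in lag that recovers by the next sample does not trigger the alert.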

Dead-letter depth

Events that exhaust their retries (5 by default) land in pgque.dead_letter. A growing dead-letter backlog means a downstream is failing repeatedly. Check it two ways: count rows directly on the table, or inspect the most recent entries via dlq_inspect (both are granted to pgque_reader):

-- depth per queue, straight from the table
select dl_queue_id, count(*) as dlq_depth
from pgque.dead_letter
group by dl_queue_id
order by dlq_depth desc;

-- inspect the most recent dead-lettered events for one queue
select dl_id, ev_id, dl_time, dl_reason, ev_type
from pgque.dlq_inspect('orders', 20);
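To alert on growth rather than on a fixed number, compare the per-queue counts from two consecutive runs of the depth query. The following sketch assumes each run has been loaded into a {dl_queue_id: depth} dictionary; the function name and the expect_empty parameter are hypothetical.

```python
def dlq_alerts(previous, current, expect_empty=frozenset()):
    """Compare two {dl_queue_id: depth} samples and return alert messages.

    A queue is reported when its depth grew since the previous sample, or
    when it is non-empty although it is listed in `expect_empty`.
    """
    alerts = []
    for queue_id, depth in sorted(current.items()):
        before = previous.get(queue_id, 0)
        if depth > before:
            alerts.append(f"queue {queue_id}: dead-letter depth grew {before} -> {depth}")
        elif depth > 0 and queue_id in expect_empty:
            alerts.append(f"queue {queue_id}: {depth} dead-lettered events, expected 0")
    return alerts


for message in dlq_alerts({1: 4, 2: 0}, {1: 9, 2: 0, 3: 2}):
    print(message)
```

Queues that disappear from the current sample, for example after a purge, are ignored here; extend the loop if you also want to log recoveries.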

To replay or purge dead-letter entries, see the DLQ functions in the Reference and the patterns in Examples.

Read-only monitoring queries

Everything below runs as pgque_reader.

Confirm the installed version:

select pgque.version();

Queue health across all queues at a glance:

select queue_name, ticker_lag, ev_per_sec, ev_new, last_tick_id
from pgque.get_queue_info()
order by ticker_lag desc;

Every consumer’s lag and liveness, worst first:

select queue_name, consumer_name, lag, last_seen, pending_events, last_tick
from pgque.get_consumer_info()
order by last_seen desc nulls last;

Stuck-consumer hunt — join consumer position against the queue’s latest tick so a frozen last_tick stands out against an advancing last_tick_id:

select c.queue_name, c.consumer_name, c.last_seen, c.last_tick,
       q.last_tick_id, q.last_tick_id - c.last_tick as ticks_behind,
       c.pending_events
from pgque.get_consumer_info() c
join pgque.get_queue_info() q using (queue_name)
order by ticks_behind desc nulls last;

Dead-letter depth per queue:

select dl_queue_id, count(*) as dlq_depth, max(dl_time) as latest
from pgque.dead_letter
group by dl_queue_id
order by dlq_depth desc;

See also

Concepts: tick, batch, rotation, and the snapshot rule.
Installation and operations: pg_cron setup, the ticker cadence, and roles.
Latency and tuning: how tick_period_ms and the ticker thresholds trade latency against overhead.
Reference: full signatures, return columns, and role grants.
Examples: DLQ replay, fan-out, and exactly-once patterns.