Redis Stream Part II: Production-Ready Message Queues with Consumer Groups

1.Consumer Groups
2.Commands for Consumer Group
- 2.1.XGROUP
- 2.2.XREADGROUP
  - 2.2.1.Syntax
  - 2.2.2.Understand PEL (Pending Entry List) Behavior
  - 2.2.3.Examples
    - 2.2.3.1.Basic Consumer Group Read
    - 2.2.3.2.Blocking Read with Consumer Group
    - 2.2.3.3.With Specific ID Value
- 2.3.XACK
  - 2.3.1.Syntax
  - 2.3.2.What is it?
  - 2.3.3.Examples
    - 2.3.3.1.Complete Workflow (From Read to ACK)
    - 2.3.3.2.Python Script Example
- 2.4.XINFO
  - 2.4.1.XINFO STREAM - Stream Details
  - 2.4.2.XINFO GROUPS - List Consumer Groups
  - 2.4.3.XINFO CONSUMERS - List Consumers in Group
  - 2.4.4.Python Monitoring Script
- 2.5.Error Recovery: Ensuring At-Least-Once Delivery
  - 2.5.1.Pattern 1: Consumer Checks Its Own PEL
  - 2.5.2.Pattern 2: Dedicated Recovery Worker
  - 2.5.3.Pattern 3: Combined Approach
  - 2.5.4.Summary: At-Least-Once Delivery Guarantees
- 2.6.XPENDING
  - 2.6.1.Syntax
  - 2.6.2.Examples
  - 2.6.3.Major Use Case: Finding Stuck Messages
- 2.7.XCLAIM
  - 2.7.1.Syntax
  - 2.7.2.Examples
    - 2.7.2.1.Basic Claim Flow
    - 2.7.2.2.Automated Recovery Worker
    - 2.7.2.3.Claim Multiple Messages
3.Concurrent Message Processing
- 3.1.Asyncio
  - 3.1.1.Why Asyncio for Redis Streams?
  - 3.1.2.Basic Asyncio Consumer
  - 3.1.3.Running Multiple Concurrent Consumers
  - 3.1.4.Asyncio with Error Recovery
  - 3.1.5.Asyncio vs Threading Comparison
- 3.2.Semaphore
4.References

March 2, 2026

Redis

1. Consumer Groups

What is it? Consumer groups in Redis Streams are analogous to Kafka consumer groups. They solve the coordination problems we had with manual XREAD.

Key Features.

Independent consumer groups. Multiple groups can process the same stream independently
Automatic message distribution. Messages automatically distributed among consumers in a group
Pending Entry List (PEL). Tracks which consumer has which unacknowledged messages. Messages are added to PEL immediately when consumed via XREADGROUP (not after processing). Only messages consumed without NOACK flag enter PEL.
At-least-once delivery. Messages remain pending (in PEL) until explicitly acknowledged with XACK
Consumer failure handling. Can claim messages from dead consumers
Last delivered ID tracking. Group tracks progress automatically

2. Commands for Consumer Group

2.1.
`XGROUP`

XGROUP manages consumer groups: creation, deletion, and configuration.

2.1.1.
`XGROUP CREATE` - Create Consumer Group

Syntax.

XGROUP CREATE stream group_name starting_id [MKSTREAM]

Parameters:

stream - Stream name
group_name - Name for the consumer group
starting_id - Where to start reading (0 = beginning, $ = only new messages)
MKSTREAM - Create stream if it doesn't exist (optional)

Examples.

Create consumer group starting from beginning:

XGROUP CREATE orders:payments payment-processors 0

Create group starting from current position (only new messages):

XGROUP CREATE orders:payments analytics-team $

Create group and stream if stream doesn't exist:

XGROUP CREATE orders:refunds refund-processors 0 MKSTREAM

Creating a group that already exists fails with a BUSYGROUP error:

XGROUP CREATE orders:refunds refund-processors 0

Returns: (error) BUSYGROUP Consumer Group name already exists
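
Because creating an existing group is an error, worker start-up code usually needs a "create if missing" step. The sketch below is one way to do that with a redis-py style client. The helper name `ensure_group` is hypothetical, and it catches a generic `Exception` and inspects the message only to stay import-free; with redis-py the error would normally arrive as `redis.exceptions.ResponseError`.

```python
def ensure_group(client, stream, group, start_id="0"):
    """Create the consumer group if it is missing.

    Returns True if the group was created, False if it already existed.
    `client` is assumed to expose redis-py's xgroup_create().
    """
    try:
        # mkstream=True also creates the stream when it does not exist yet
        client.xgroup_create(stream, group, id=start_id, mkstream=True)
        return True
    except Exception as exc:
        # The server reports an existing group with a BUSYGROUP error
        if "BUSYGROUP" in str(exc):
            return False
        raise
```

Usage would look like `ensure_group(redis.Redis(decode_responses=True), 'orders:refunds', 'refund-processors')`, called once before the consumer loop starts.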

When to use 0 vs $.

Using 0 - Process all existing + new messages:

XGROUP CREATE backfill-orders backfill-processors 0

Using $ - Only process new messages (ignore existing):

XGROUP CREATE orders:payments real-time-processors $

2.1.2.
`XGROUP SETID` - Reset Group Position

Syntax.

XGROUP SETID stream group_name new_id

Use cases.

Skip to only new messages:

XGROUP SETID orders:payments payment-processors $

Reset to beginning (reprocess all messages):

XGROUP SETID orders:payments payment-processors 0

Reset to specific message ID:

XGROUP SETID orders:payments payment-processors 1709251200500-0

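Recovery scenario. Processing got stuck at a bad message. Set the group's position to the ID of the problematic message so that later reads with > start after it:

XGROUP SETID orders:payments payment-processors 1709251200123-0

SETID only moves the group's last-delivered ID. If the bad message was already delivered, it is still in some consumer's PEL and has to be acknowledged (or moved to a dead letter queue) separately.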

2.1.3.
`XGROUP DESTROY` - Delete Consumer Group

Delete consumer group (cannot be undone!):

XGROUP DESTROY orders:payments analytics-team

Returns: 1 (success)

Deleting non-existent group:

XGROUP DESTROY orders:payments fake-group

Returns: 0 (group didn't exist)

2.1.4.
`XGROUP DELCONSUMER` - Remove Consumer

Remove specific consumer from group:

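XGROUP DELCONSUMER orders:payments payment-processors worker-1

Returns: 2 (number of pending messages that worker-1 still had when it was deleted)

These pending entries are discarded together with the consumer and can no longer be claimed, so claim or acknowledge worker-1's pending messages before deleting it.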
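
Deleting a consumer drops its pending entries from the group, so a safer decommissioning routine hands them to another consumer first. The following is a minimal sketch under that assumption; `retire_consumer` and the `heir` parameter are hypothetical names, and `client` is assumed to be a redis-py style client.

```python
def retire_consumer(client, stream, group, consumer, heir, batch=100):
    """Move every pending entry from `consumer` to `heir`, then delete `consumer`.

    Returns the number of entries that were handed over.
    """
    moved = 0
    while True:
        pending = client.xpending_range(
            stream, group, min="-", max="+", count=batch, consumername=consumer
        )
        if not pending:
            break
        ids = [row["message_id"] for row in pending]
        # min_idle_time=0 claims regardless of idle time;
        # justid=True returns only IDs instead of full messages
        claimed = client.xclaim(
            stream, group, heir, min_idle_time=0, message_ids=ids, justid=True
        )
        if not claimed:
            break  # nothing could be claimed; stop instead of looping forever
        moved += len(claimed)
    client.xgroup_delconsumer(stream, group, consumer)
    return moved
```

The heir still has to process and XACK the entries it inherits; the helper only changes ownership.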

2.2.
`XREADGROUP`

XREADGROUP reads messages with automatic distribution and tracking.

2.2.1.
Syntax

XREADGROUP GROUP group_name consumer_name \
    [COUNT count] [BLOCK ms] [NOACK] STREAMS stream_name ID

Parameters:

GROUP group_name consumer_name - Group and consumer identity
COUNT - Max messages to read
BLOCK - Wait for messages (milliseconds)
- 0 = wait indefinitely (most common for consumer workers)
- > 0 = timeout in milliseconds
- Omit BLOCK = non-blocking, return immediately
NOACK - Don't add to PEL (fire-and-forget, rarely used). By default, messages ARE added to PEL immediately when consumed
ID - Starting position
- > = only undelivered messages (most common for new work)
- 0 = check PEL first (returns this consumer's pending messages)
- Valid message ID (e.g., 1709251200000-0) = returns this consumer's pending messages with ID greater than the one specified (never new messages)

Side Effect (When ID=>). Redis tracks a last-delivered ID per consumer group (not per consumer). Reading with ID=> returns entries after that ID and advances it, so the next XREADGROUP with ID=> from any consumer in the group will not return those entries again.

Side Effect (When ID=>). The returned messages are also added to the calling consumer's PEL immediately.

No New Delivery (When ID=0). In this case XREADGROUP reads from the consumer's PEL, not from the undelivered part of the stream. The returned messages were already delivered but not ACKed; we also say that we read the pending messages of the consumer. The group's last-delivered ID does not move, although each such read increments the delivery counter of the returned messages.

Remark. The PEL is a separate radix tree data structure tracking unacknowledged messages per consumer.

2.2.2.
Understand `PEL` (Pending Entry List) Behavior

When XREADGROUP returns messages, they are immediately added to the consumer's PEL before any processing begins. This happens at the moment of consumption, not after processing. The PEL tracks unacknowledged messages and enables reliable message delivery:

Messages stay in PEL until explicitly acknowledged with XACK
If a consumer crashes, messages remain in its PEL for recovery
Use NOACK flag only when you don't need reliability (fire-and-forget scenarios)
Only messages that need acknowledgment enter the PEL
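
To make the bookkeeping concrete, here is a toy in-memory model of a single consumer group. It is not how Redis implements anything; the class `ToyGroup` and the short IDs are invented for illustration, and the model only mirrors the rules described above: reading with > hands out undelivered entries and records them in the PEL, reading with 0 returns a consumer's own pending entries, and acknowledging removes an entry from the PEL but not from the stream.

```python
class ToyGroup:
    """In-memory stand-in for one consumer group on one stream (illustration only)."""

    def __init__(self, entries):
        self.entries = list(entries)   # [(id, fields), ...] in stream order
        self.last_delivered = -1       # index of the last entry handed out via ">"
        self.pel = {}                  # id -> name of the consumer that owns it

    def read_new(self, consumer, count):
        """Like XREADGROUP ... >: deliver undelivered entries and add them to the PEL."""
        start = self.last_delivered + 1
        batch = self.entries[start:start + count]
        for entry_id, _ in batch:
            self.pel[entry_id] = consumer
        self.last_delivered += len(batch)
        return batch

    def read_pending(self, consumer):
        """Like XREADGROUP ... 0: return only this consumer's unacknowledged entries."""
        return [(i, f) for i, f in self.entries if self.pel.get(i) == consumer]

    def ack(self, *ids):
        """Like XACK: remove ids from the PEL; the entries stay in the stream."""
        return sum(1 for i in ids if self.pel.pop(i, None) is not None)


group = ToyGroup([("1-0", {"orderID": "1001"}),
                  ("2-0", {"orderID": "1002"}),
                  ("3-0", {"orderID": "1003"})])
first = group.read_new("worker-1", 2)    # worker-1 receives 1-0 and 2-0
second = group.read_new("worker-2", 2)   # worker-2 receives only 3-0
group.ack("1-0")                         # worker-1 finishes its first entry
```

After these calls, `group.read_pending("worker-1")` contains only `2-0`, which is the entry a crashed worker-1 would leave behind for recovery.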

2.2.3.
Examples

2.2.3.1.
Basic Consumer Group Read

Setup.

# Create stream with messages
XADD orders:payments * orderID 1001 amount 100 userID 123
XADD orders:payments * orderID 1002 amount 200 userID 456
XADD orders:payments * orderID 1003 amount 300 userID 789

# Create consumer group
XGROUP CREATE orders:payments payment-processors 0

> means "give me new messages not yet delivered to this group":

XREADGROUP GROUP payment-processors worker-1 COUNT 2 \
    STREAMS orders:payments >

PEL Tracking. These 2 messages are immediately added to worker-1's Pending Entry List (PEL) the moment XREADGROUP returns them, before any processing happens.

They will remain in the PEL until either:

Successfully acknowledged with XACK after processing completes
Reclaimed by another consumer via XCLAIM (if worker-1 crashes or takes too long)

This immediate PEL tracking is what enables at-least-once delivery semantics: if the consumer crashes before ACKing, the messages remain in the PEL and can be recovered.

Consumer 2 reads (simultaneously):

XREADGROUP GROUP payment-processors worker-2 COUNT 2 \
    STREAMS orders:payments >

Here

worker-2 gets different messages (automatic distribution!)
Messages 1001, 1002 already assigned to worker-1

2.2.3.2.
Blocking Read with Consumer Group

A consumer does not need to be created explicitly: the first XREADGROUP call that names worker-1 creates it. The call below therefore (i) creates worker-1 if needed and (ii) blocks for up to 30 seconds waiting for new messages on the stream:

XREADGROUP GROUP payment-processors worker-1 BLOCK 30000 \
    STREAMS orders:payments >

We can test by adding a new message in another terminal:

XADD orders:payments * orderID 1004 amount 400 userID 111

2.2.3.3.
With Specific ID Value

Resume from Specific Point. With any ID other than >, XREADGROUP reads only from this consumer's PEL. It returns the pending messages whose ID is greater than the one specified and never delivers new messages from the stream:

XREADGROUP GROUP mygroup consumer1 STREAMS \
    mystream 1709251200000-0

Example. If the consumer has pending

[1709251200005-0, 1709251200010-0]

and we query with ID 1709251200007-0, it returns only 1709251200010-0 (from the PEL). Undelivered messages are returned only when reading with ID=>.
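
Since a concrete ID returns this consumer's pending entries after that ID, it can serve as a cursor for paging through a large PEL in batches. The generator below is a sketch of that idea; the name `iter_own_pending` is hypothetical, and `client` is assumed to be a redis-py style client created with `decode_responses=True` so that IDs are strings.

```python
def iter_own_pending(client, stream, group, consumer, batch=10):
    """Yield (message_id, fields) for every entry pending for `consumer`.

    Reads the PEL in batches, using the last ID seen as the next start position.
    """
    cursor = "0"  # "0" means: start at the beginning of this consumer's PEL
    while True:
        reply = client.xreadgroup(group, consumer, {stream: cursor}, count=batch)
        messages = reply[0][1] if reply else []
        if not messages:
            return
        for message_id, fields in messages:
            yield message_id, fields
        # The next call returns pending entries with an ID greater than this one
        cursor = messages[-1][0]
```

A consumer could run `for message_id, fields in iter_own_pending(r, 'orders:payments', 'payment-processors', 'worker-1'): ...` at start-up to retry everything it left unacknowledged before switching to ID=>.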

2.3.
`XACK`

2.3.1.
Syntax

XACK stream group_name message_id [message_id ...]

Note that group_name is required: this command only operates within the context of a consumer group.

2.3.2.
What is it?

XACK removes messages from the Pending Entry List (PEL), signaling successful processing.
XACK only works with consumer groups. When using XREAD without consumer groups, there is no PEL and nothing to acknowledge, so we must manually track which messages we have processed.

XACK does not delete messages from the stream. They stay available for reprocessing, auditing, or consumption by other consumer groups.

More specifically, after we have ACKed a message via

r.xack('orders:payments',    # the stream
       'payment-processors', # the consumer group
       message_id)

the following holds:

Stream Storage: ACKed messages remain in the stream until they are explicitly removed (for example, trimmed with XTRIM)

PEL: XACK only removes messages from the consumer group's PEL. We can verify the message is removed from PEL by checking:

XREADGROUP GROUP payment-processors \
    worker-1 STREAMS orders:payments 0

If the message was ACKed, it will NOT appear in this result (because it's no longer in worker-1's PEL).

Different Read Commands See Different Views:

XREAD STREAMS orders:payments 0 - Reads ALL messages from stream (includes ACKed messages)
XREADGROUP ... STREAMS orders:payments 0 - Reads only unACKed messages from THIS consumer's PEL
XREADGROUP ... STREAMS orders:payments > - Reads new messages not yet delivered to consumer group

To actually remove messages from the stream, use XTRIM:

XTRIM orders:payments MINID 1709251200100-0  # Remove older messages
XTRIM orders:payments MAXLEN 1000             # Keep only last 1000
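
When trimming by MINID it helps to know the oldest entry that some consumer group may still need. The sketch below computes such a threshold from plain values; it is an illustration under simplifying assumptions, not a built-in Redis feature. The function names are hypothetical, and the inputs (each group's last-delivered ID and its oldest pending ID) are assumed to have been collected beforehand, for example from XINFO GROUPS and the summary form of XPENDING. Note that stream IDs must be compared numerically, not as strings.

```python
def parse_id(stream_id):
    """Turn '1709251200100-0' into (1709251200100, 0) so IDs compare numerically."""
    ms, _, seq = stream_id.partition("-")
    return int(ms), int(seq or 0)


def safe_minid(groups):
    """Return the smallest ID that some group may still need, or None.

    groups: iterable of (last_delivered_id, oldest_pending_id or None),
    one tuple per consumer group.
    """
    needed = [parse_id(oldest_pending or last_delivered)
              for last_delivered, oldest_pending in groups]
    if not needed:
        return None
    ms, seq = min(needed)
    return f"{ms}-{seq}"


threshold = safe_minid([
    ("1709251200500-0", "1709251200100-0"),  # group with a pending entry
    ("1709251200300-0", None),               # group with an empty PEL
])
```

Here `threshold` is the first group's oldest pending ID, so `XTRIM orders:payments MINID <threshold>` would keep that entry and everything after it.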

2.3.3.
Examples

2.3.3.1.
Complete Workflow (From Read to `ACK`)

Worker reads messages:

XREADGROUP GROUP payment-processors worker-1 STREAMS orders:payments >

It returns message: 1709251200000-0

Process the payment (... payment successful ...)

Acknowledge message (remove from PEL):

XACK orders:payments payment-processors 1709251200000-0

Returns: 1 (number of messages acknowledged)

Can ACK multiple messages at once:

XACK orders:payments payment-processors 1709251200000-0 1709251200001-0 1709251200002-0

Returns: 3

2.3.3.2.
Python Script Example

import redis

r = redis.Redis(decode_responses=True)

def process_payment(order_id, amount):
    """Placeholder for the real payment logic (application-specific)."""
    print(f'Charging order {order_id}: {amount}')

# Consumer loop
while True:
    # Read at most one new message, waiting up to 5 seconds
    result = r.xreadgroup(
        groupname='payment-processors',
        consumername='worker-1',
        streams={'orders:payments': '>'},
        count=1,
        block=5000
    )

    if result:
        stream, messages = result[0]

        # Check if there are messages (list could be empty)
        if not messages:
            continue

        message_id, data = messages[0]

        try:
            # Process payment
            process_payment(data['orderID'], data['amount'])

            # Success - ACK
            r.xack('orders:payments', 'payment-processors', message_id)
            print(f'Acknowledged: {message_id}')

        except Exception as e:
            # Error: Message stays in PEL for retry
            print(f'Failed: {message_id}, Error: {e}')
            # Will be re-processed when we check PEL

2.4.
`XINFO`

XINFO provides detailed information about streams, groups, and consumers.

Three subcommands:

2.4.1.
`XINFO STREAM` - Stream Details

XINFO STREAM orders:payments

2.4.2.
`XINFO GROUPS` - List Consumer Groups

XINFO GROUPS orders:payments

Example output (Listpack).

1) 1) "name"
   2) "payment-processors"
   3) "consumers"
   4) (integer) 1
   5) "pending"
   6) (integer) 0
   7) "last-delivered-id"
   8) "1772438927833-0"
   9) "entries-read"
  10) (integer) 16
  11) "lag"
  12) (integer) 0

Key fields.

name - Consumer group name
consumers - Number of active consumers in this group
pending - Total unacknowledged messages across all consumers
last-delivered-id - Last message ID delivered to any consumer (determines what ID=> returns)
entries-read - Total messages read by this group since creation
lag - Number of messages in the stream not yet delivered to this group (roughly stream length - entries-read; Redis may report nil when it cannot determine the value)
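
These fields are enough for a simple health check. The sketch below takes a list of dictionaries shaped like the XINFO GROUPS reply (as redis-py's `xinfo_groups` returns it) and flags groups whose lag or pending count exceeds a threshold. The function name and the threshold values are arbitrary assumptions for illustration.

```python
def group_health(groups, max_lag=1000, max_pending=100):
    """Return {group_name: [problems]} for groups exceeding the thresholds."""
    report = {}
    for g in groups:
        problems = []
        lag = g.get("lag")  # can be None when Redis cannot compute it
        if lag is not None and lag > max_lag:
            problems.append(f"lag={lag}")
        if g.get("pending", 0) > max_pending:
            problems.append(f"pending={g['pending']}")
        if problems:
            report[g["name"]] = problems
    return report


sample = [
    {"name": "payment-processors", "pending": 0, "lag": 0},
    {"name": "analytics-team", "pending": 500, "lag": 5000},
]
unhealthy = group_health(sample)
```

With a live connection the input would be `r.xinfo_groups('orders:payments')` instead of the hand-written `sample`.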

Important. If last-delivered-id is at or ahead of the newest message ID in the stream, XREADGROUP with ID=> will return nothing.

To re-deliver all messages, run

XGROUP SETID <stream> <group> 0

After the reset, the next XREADGROUP call with ID=> (for example, the next iteration of a blocking consumer loop) will receive those messages again.

2.4.3.
`XINFO CONSUMERS` - List Consumers in Group

XINFO CONSUMERS orders:payments payment-processors

2.4.4.
Python Monitoring Script

import redis

r = redis.Redis(decode_responses=True)

def monitor_consumer_groups(stream_name):
    """Monitor health of consumer groups"""
    print(f'\n=== Stream: {stream_name} ===')

    # Stream stats
    stream_info = r.xinfo_stream(stream_name)
    print(f'Total messages: {stream_info["length"]}')
    print(f'Consumer groups: {stream_info["groups"]}')
    print(f'Last message ID: {stream_info["last-generated-id"]}\n')

    # Each consumer group
    groups = r.xinfo_groups(stream_name)
    for group in groups:
        group_name = group['name']
        print(f'Group: {group_name}')
        print(f'  Consumers: {group["consumers"]}')
        print(f'  Pending: {group["pending"]}')
        print(f'  Last delivered: {group["last-delivered-id"]}')

        # Each consumer in group
        consumers = r.xinfo_consumers(stream_name, group_name)
        for consumer in consumers:
            print(f'    Consumer: {consumer["name"]}')
            print(f'      Pending: {consumer["pending"]}')
            print(f'      Idle: {consumer["idle"]}ms')

            # Alert if consumer has been idle > 5 minutes with pending messages
            if consumer['idle'] > 300000 and consumer['pending'] > 0:
                print(f'      ALERT: Consumer may be dead!')
        print()

2.5.
Error Recovery: Ensuring At-Least-Once Delivery

When exceptions occur during processing, messages remain in the PEL without being ACKed. Redis Streams provides mechanisms to ensure at-least-once delivery by retrying these pending messages.

How It Works:

Message consumed → Added to consumer's PEL immediately
Exception thrown → Message NOT ACKed, stays in PEL
Worker crashes → Message still in PEL (persistent)
Recovery → Read pending messages and retry

2.5.1.
Pattern 1: Consumer Checks Its Own PEL

Each consumer periodically checks its own pending messages:

import redis

r = redis.Redis(decode_responses=True)

# process_payment() is the application's own handler, assumed to be defined elsewhere

def consumer_with_retry():
    """Consumer that retries its own pending messages"""
    while True:
        # Step 1: Check for pending messages first (ID=0)
        result = r.xreadgroup(
            groupname='payment-processors',
            consumername='worker-1',
            streams={'orders:payments': '0'},  # 0 = check MY PEL
            count=10
        )

        if result and result[0][1]:
            # Found pending messages - retry them
            stream, messages = result[0]
            print(f'Found {len(messages)} pending messages, retrying...')

            for message_id, data in messages:
                try:
                    process_payment(data)
                    r.xack('orders:payments', 'payment-processors', message_id)
                    print(f'Retry successful: {message_id}')
                except Exception as e:
                    print(f'Retry failed: {message_id}, Error: {e}')
                    # Still in PEL, will retry next iteration

        # Step 2: Process new messages (ID=>)
        # block=5000 waits up to 5 seconds when no messages are available instead
        # of returning immediately, which avoids polling in a tight loop.
        result = r.xreadgroup(
            groupname='payment-processors',
            consumername='worker-1',
            streams={'orders:payments': '>'},  # > = new messages
            count=10,
            block=5000
        )

        if result:
            stream, messages = result[0]
            for message_id, data in messages:
                try:
                    process_payment(data)
                    r.xack('orders:payments', 'payment-processors', message_id)
                    print(f'Processed: {message_id}')
                except Exception as e:
                    print(f'Failed: {message_id}, Error: {e}')
                    # Stays in PEL for next retry cycle

# consumer_with_retry()

Key Points:

Use ID=0 to read pending messages
Check PEL periodically (e.g., every loop iteration or every N seconds)
Failed messages remain in PEL for next retry
Simple pattern for single consumer recovery

2.5.2.
Pattern 2: Dedicated Recovery Worker

A separate worker monitors and claims stuck messages from ALL consumers:

import redis
import time

r = redis.Redis(decode_responses=True)

# process_payment() is the application's own handler, assumed to be defined elsewhere

def recovery_worker(max_idle_time=60000, max_retries=3):
    """
    Dedicated worker that claims stuck messages from any consumer

    Args:
        max_idle_time: Claim messages idle for > this time (ms)
        max_retries: Move to DLQ after this many attempts
    """
    while True:
        # Find ALL stuck messages across all consumers
        pending = r.xpending_range(
            name='orders:payments',
            groupname='payment-processors',
            min='-',
            max='+',
            count=100,
            idle=max_idle_time  # Only messages idle > max_idle_time
        )

        if not pending:
            print('No stuck messages')
            time.sleep(30)
            continue

        print(f'Found {len(pending)} stuck messages')

        for msg in pending:
            message_id = msg['message_id']
            consumer = msg['consumer']
            idle_ms = msg['time_since_delivered']
            delivery_count = msg['times_delivered']

            print(f'Stuck message: {message_id}')
            print(f'  Consumer: {consumer}, Idle: {idle_ms}ms, Attempts: {delivery_count}')

            # Check if exceeded max retries
            if delivery_count >= max_retries:
                # Move to Dead Letter Queue (the entry may already have been
                # trimmed from the stream, in which case there is no payload)
                entries = r.xrange('orders:payments', message_id, message_id)
                payload = entries[0][1] if entries else {}
                r.xadd('orders:dlq', {
                    'original_id': message_id,
                    'original_data': str(payload),
                    'attempts': delivery_count,
                    'last_consumer': consumer,
                    'reason': 'max_retries_exceeded'
                })

                # ACK to remove from PEL (only after the DLQ write succeeded)
                r.xack('orders:payments', 'payment-processors', message_id)
                print(f'  → Moved to DLQ (exceeded {max_retries} retries)')
                continue

            # Claim and retry
            try:
                claimed = r.xclaim(
                    name='orders:payments',
                    groupname='payment-processors',
                    consumername='recovery-worker',
                    min_idle_time=max_idle_time,
                    message_ids=[message_id]
                )

                if claimed:
                    _, data = claimed[0]

                    try:
                        # Attempt to process
                        process_payment(data)

                        # Success - ACK
                        r.xack('orders:payments', 'payment-processors', message_id)
                        print(f'  → Recovered successfully')

                    except Exception as e:
                        print(f'  → Recovery failed: {e}')
                        # Stays in PEL, delivery_count incremented
                        # Will retry later if idle time threshold reached

            except Exception as e:
                print(f'  → Claim failed: {e}')

        time.sleep(30)  # Check every 30 seconds

# recovery_worker(max_idle_time=60000, max_retries=3)

Advantages of Recovery Worker:

Monitors ALL consumers (finds stuck messages from crashed workers)
Automatic cleanup of dead consumer's pending messages
Centralized retry logic and DLQ management
Prevents message loss from permanent consumer failures

2.5.3.
Pattern 3: Combined Approach

Best practice: Regular consumers retry their own pending + dedicated recovery worker:

import time
import redis

r = redis.Redis(decode_responses=True)

def smart_consumer(consumer_name='worker-1'):
    """Consumer with built-in retry + recovery worker backup"""
    retry_interval = 60  # Check own PEL every 60 seconds
    last_pel_check = time.time()

    while True:
        # Periodically check own PEL
        if time.time() - last_pel_check > retry_interval:
            # Retry my pending messages
            result = r.xreadgroup(
                groupname='payment-processors',
                consumername=consumer_name,
                streams={'orders:payments': '0'},
                count=10
            )

            if result and result[0][1]:
                for message_id, data in result[0][1]:
                    try:
                        process_payment(data)
                        r.xack('orders:payments', 'payment-processors', message_id)
                    except Exception as e:
                        print(f'Retry failed: {e}')

            last_pel_check = time.time()

        # Process new messages
        result = r.xreadgroup(
            groupname='payment-processors',
            consumername=consumer_name,
            streams={'orders:payments': '>'},
            count=10,
            block=5000
        )

        if result:
            for message_id, data in result[0][1]:
                try:
                    process_payment(data)
                    r.xack('orders:payments', 'payment-processors', message_id)
                except Exception as e:
                    print(f'Processing failed: {e}')
                    # Will retry in next PEL check

# Run multiple consumers + 1 recovery worker
# Terminal 1: smart_consumer('worker-1')
# Terminal 2: smart_consumer('worker-2')
# Terminal 3: recovery_worker()

2.5.4.
Summary: At-Least-Once Delivery Guarantees

Redis Streams ensures at-least-once delivery through:

PEL Persistence - Messages added to PEL immediately when consumed
Survives Crashes - PEL stored in Redis, not consumer memory
Self Retry - Consumers check own PEL with ID=0
Cross-Consumer Recovery - XCLAIM allows other consumers to take over
Idle Detection - XPENDING finds stuck messages
DLQ Pattern - Failed messages after max retries moved to dead letter queue

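No message is lost as long as:

Redis keeps its data (the stream and its PELs live in Redis, so they survive consumer crashes; surviving a Redis restart or failover depends on the persistence and replication setup)
Recovery worker or consumers check PEL periodically
Messages are ACKed only after successful processing
Pending messages are not trimmed from the stream (e.g., by XTRIM) before they are processed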

2.6.
`XPENDING`

XPENDING shows unacknowledged messages in the Pending Entry List.

Two forms:

Summary form - Overview of pending messages
Detailed form - Individual message details

2.6.1.
Syntax

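XPENDING stream group_name [[IDLE min_idle_time] start_id end_id \
    count [consumer_name]]

With only stream and group_name, XPENDING returns the summary form. Adding start_id, end_id, and count selects the detailed form.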

2.6.2.
Examples

2.6.2.1.
Get pending messages from a consumer group

Get detailed info for first 10 pending messages:

XPENDING orders:payments payment-processors - + 10

Returns for each message:

Message ID
Consumer name
Milliseconds since delivered
Delivery count (how many times read)

To produce the sample below, we used XGROUP SETID to re-deliver the stream to a consumer running a blocking XREADGROUP loop, and deliberately threw exceptions for a few messages so that they were consumed but not ACKed.

1) 1) "1772438038075-0"
   2) "worker-1"
   3) (integer) 109119
   4) (integer) 1
2) 1) "1772438927833-0"
   2) "worker-1"
   3) (integer) 109118
   4) (integer) 1
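
The idle time and delivery count in each row are what a recovery routine typically decides on. As a minimal sketch, the function below sorts rows into "retry" and "dead letter" buckets. It assumes the rows are dictionaries with the keys redis-py's `xpending_range` uses (`message_id`, `time_since_delivered`, `times_delivered`); the function name and thresholds are illustrative choices, not part of Redis.

```python
def triage(pending, max_idle_ms=60_000, max_deliveries=3):
    """Split XPENDING rows into (ids_to_retry, ids_for_dead_letter)."""
    retry, dead = [], []
    for row in pending:
        if row["time_since_delivered"] < max_idle_ms:
            continue  # recently delivered, probably still being processed
        if row["times_delivered"] >= max_deliveries:
            dead.append(row["message_id"])
        else:
            retry.append(row["message_id"])
    return retry, dead


rows = [
    {"message_id": "1772438038075-0", "consumer": "worker-1",
     "time_since_delivered": 109119, "times_delivered": 1},
    {"message_id": "1772438927833-0", "consumer": "worker-1",
     "time_since_delivered": 109118, "times_delivered": 1},
]
to_retry, to_dlq = triage(rows)
```

With the two sample rows above, both IDs land in `to_retry` because they have been idle for more than 60 seconds and were delivered only once.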

2.6.2.2.
Get pending messages from a consumer group idle for more than 60 seconds (60000 ms)

Returns only messages not processed for > 60s:

XPENDING orders:payments payment-processors IDLE 60000 - + 10

2.6.2.3.
Get pending messages from specific consumer

XPENDING orders:payments payment-processors - + 10 worker-1

Returns only worker-1's pending messages.

2.6.3.
Major Use Case: Finding Stuck Messages

Find messages stuck for > 5 minutes (300000 ms):

XPENDING orders:payments payment-processors IDLE 300000 - + 100

These are candidates for claiming (reassigning to another consumer).

2.7.
`XCLAIM`

XCLAIM transfers pending messages from one consumer to another, useful for recovering from consumer failures.

2.7.1.
Syntax

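XCLAIM stream group_name consumer_name min_idle_time message_id [message_id ...] [IDLE ms] [TIME unix_time_ms] [RETRYCOUNT count] [FORCE] [JUSTID]

Parameters:

min_idle_time - Only claim if message idle for at least this long (milliseconds)
IDLE ms - Set the idle time of claimed message
TIME unix_time_ms - Same as IDLE, but sets the last delivery time to an absolute Unix time in milliseconds
RETRYCOUNT count - Set delivery count
FORCE - Create the pending entry even if the ID is not in any consumer's PEL (the message must still exist in the stream)
JUSTID - Return only IDs (not full messages); does not increment the delivery count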

2.7.2.
Examples

2.7.2.1.
Basic Claim Flow

worker-1 crashed after reading message:

XREADGROUP GROUP payment-processors worker-1 STREAMS orders:payments >

Returns: 1709251200000-0 (now in worker-1's PEL)
worker-1 crashes and doesn't recover.

Check pending (5 minutes later = 300000 ms):

XPENDING orders:payments payment-processors IDLE 300000 - + 10

Shows: 1709251200000-0 owned by worker-1, idle for at least 300000 ms

worker-2 claims the stuck message (60000 is the min-idle time in ms):

XCLAIM orders:payments payment-processors \
    worker-2 \
    60000 \
    1709251200000-0

XCLAIM replies with the claimed messages in the same format as XRANGE: each ID followed by its field-value pairs. Assuming the stuck message is order 1001 from the earlier setup, the reply would look like:

1) 1) "1709251200000-0"
   2) 1) "orderID"
      2) "1001"
      3) "amount"
      4) "100"
      5) "userID"
      6) "123"

Message now in worker-2's PEL.

2.7.2.2.
Automated Recovery Worker

import redis
import time

r = redis.Redis(decode_responses=True)

# process_payment() is the application's own handler, assumed to be defined elsewhere

def recovery_worker():
    """Claim and process stuck messages"""
    while True:
        # Find messages stuck for > 2 minutes
        pending = r.xpending_range(
            name='orders:payments',
            groupname='payment-processors',
            min='-',
            max='+',
            count=10,
            idle=120000  # 2 minutes
        )

        if not pending:
            print('No stuck messages')
            time.sleep(30)
            continue

        for msg in pending:
            message_id = msg['message_id']
            original_consumer = msg['consumer']
            idle_time = msg['time_since_delivered']
            delivery_count = msg['times_delivered']

            print(f'Found stuck message: {message_id}')
            print(f'  Original consumer: {original_consumer}')
            print(f'  Idle time: {idle_time}ms')
            print(f'  Delivery count: {delivery_count}')

            if delivery_count >= 3:
                # Too many retries - move to DLQ
                # Write to the DLQ first so the message is not lost if the write fails
                r.xadd('orders:dlq', {'original_id': message_id, 'reason': 'max_retries'})
                r.xack('orders:payments', 'payment-processors', message_id)
                print(f'  → Moved to DLQ')
            else:
                # Claim and retry
                claimed = r.xclaim(
                    name='orders:payments',
                    groupname='payment-processors',
                    consumername='recovery-worker',
                    min_idle_time=60000,
                    message_ids=[message_id]
                )

                if claimed:
                    try:
                        # Process message
                        _, data = claimed[0]
                        process_payment(data)

                        # Success - ACK
                        r.xack('orders:payments', 'payment-processors', message_id)
                        print(f'  → Recovered successfully')
                    except Exception as e:
                        print(f'  → Recovery failed: {e}')
                        # Stays in PEL for next retry

        time.sleep(30)  # Check every 30 seconds

# recovery_worker()

Here delivery_count is recorded in the PEL entry of each message, which makes it easy to implement a maximum-retry threshold for the DLQ.

2.7.2.3.
Claim Multiple Messages

Claim multiple stuck messages at once:

XCLAIM orders:payments payment-processors worker-3 60000 \
  1709251200000-0 1709251200001-0 1709251200002-0

Returns the claimed messages: all 3, provided each is still pending and has been idle for at least 60000 ms. They are moved from their original consumers to worker-3's PEL.

3. Concurrent Message Processing

3.1.
Asyncio

For I/O-bound workloads (API calls, database queries, external services), asyncio provides efficient concurrent processing with minimal overhead compared to threads.

3.1.1.
Why Asyncio for Redis Streams?

Problem: Single-threaded consumers process messages sequentially:

# Single-threaded - processes ONE message at a time
while True:
    result = r.xreadgroup(...)
    if not result:
        continue
    _, messages = result[0]
    for message_id, data in messages:
        process_payment(data)  # Takes 2 seconds (network call to payment API)
        r.xack(...)
# Throughput: ~0.5 messages/second

Solution: Asyncio consumers process multiple messages concurrently:

# Asyncio - 10 coroutines process messages concurrently
# While one waits for I/O, others continue working
# Throughput: ~5 messages/second (10x improvement)
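
The arithmetic behind those figures: with a 2-second I/O wait per message, one sequential worker handles 1/2 = 0.5 messages/second, while 10 overlapping waits handle 10/2 = 5 messages/second. The sketch below reproduces the effect with a scaled-down 0.05-second sleep standing in for the payment API. `fake_payment_call` and the timings are illustrative assumptions, not measurements from a real system.

```python
import asyncio
import time


async def fake_payment_call(delay: float = 0.05) -> None:
    """Stand-in for a network call; it only waits."""
    await asyncio.sleep(delay)


async def run_sequential(n: int) -> float:
    """Await the calls one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        await fake_payment_call()
    return time.perf_counter() - start


async def run_concurrent(n: int) -> float:
    """Start all calls together so their waits overlap."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_payment_call() for _ in range(n)))
    return time.perf_counter() - start


async def compare(n: int = 10):
    return await run_sequential(n), await run_concurrent(n)


sequential_s, concurrent_s = asyncio.run(compare())
print(f'sequential: {sequential_s:.2f}s, concurrent: {concurrent_s:.2f}s')
```

The gain only holds while the work is dominated by waiting; CPU-heavy processing inside a coroutine would block the event loop and erase it.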

3.1.2.
Basic Asyncio Consumer

import asyncio
import redis.asyncio as redis
from typing import Dict, Any

async def process_payment_async(data: Dict[str, Any]):
    """Async payment processing (simulates API call)"""
    order_id = data.get('orderID')
    amount = data.get('amount')

    print(f'Processing payment {order_id}...')
    await asyncio.sleep(2)  # Simulates async I/O (network call)
    print(f'Payment {order_id} completed: ${amount}')
    return True

async def consumer_coroutine(consumer_name: str):
    """Single async consumer coroutine"""
    r = await redis.Redis(decode_responses=True)

    print(f'{consumer_name} started')

    try:
        while True:
            # XREADGROUP is async
            result = await r.xreadgroup(
                groupname='payment-processors',
                consumername=consumer_name,
                streams={'orders:payments': '>'},
                count=10,
                block=5000
            )

            if result:
                stream, messages = result[0]
                for message_id, data in messages:
                    try:
                        await process_payment_async(data)
                        await r.xack('orders:payments', 'payment-processors', message_id)
                        print(f'{consumer_name}: ACKed {message_id}')
                    except Exception as e:
                        print(f'{consumer_name}: Failed {message_id}: {e}')
    finally:
        await r.close()

# Run single consumer
# asyncio.run(consumer_coroutine('worker-1'))

3.1.3.
Running Multiple Concurrent Consumers

Pattern 1: Multiple Coroutines in One Process

Perfect for maximizing single-machine utilization:

import asyncio
import redis.asyncio as redis
import os

# consumer_coroutine is defined in the Basic Asyncio Consumer example (3.1.2)

async def main():
    """Run multiple concurrent consumers on this machine"""
    hostname = os.getenv('HOSTNAME', 'server1')
    num_consumers = 10  # 10 concurrent coroutines

    # Create consumer group (only needs to happen once)
    r = await redis.Redis(decode_responses=True)
    try:
        await r.xgroup_create('orders:payments', 'payment-processors', id='0', mkstream=True)
        print('Consumer group created')
    except redis.ResponseError as e:
        # BUSYGROUP means the group already exists, which is fine
        if 'BUSYGROUP' not in str(e):
            raise
    await r.close()

    # Launch all consumers concurrently
    tasks = [
        consumer_coroutine(f'{hostname}-consumer-{i}')
        for i in range(num_consumers)
    ]

    print(f'Starting {num_consumers} concurrent consumers...')
    await asyncio.gather(*tasks)

if __name__ == '__main__':
    asyncio.run(main())

Output:

Starting 10 concurrent consumers...
server1-consumer-0 started
server1-consumer-1 started
...
Processing payment 1001...
Processing payment 1002...
Processing payment 1003...
# All 10 consumers work concurrently!
Payment 1001 completed: $99.99
server1-consumer-0: ACKed 1709251200000-0

3.1.4.
Asyncio with Error Recovery

Combine async processing with PEL-based retry:

import asyncio
import redis.asyncio as redis

# process_payment_async is defined in the Basic Asyncio Consumer example (3.1.2)

async def consumer_with_retry(consumer_name: str):
    """Async consumer with periodic PEL checking"""
    r = await redis.Redis(decode_responses=True)
    loop = asyncio.get_running_loop()
    retry_interval = 60  # Check PEL every 60 seconds
    last_pel_check = loop.time()

    while True:
        current_time = loop.time()

        # Periodically check own PEL
        if current_time - last_pel_check > retry_interval:
            print(f'{consumer_name}: Checking own PEL for retries...')

            result = await r.xreadgroup(
                groupname='payment-processors',
                consumername=consumer_name,
                streams={'orders:payments': '0'},  # 0 = my pending messages
                count=10
            )

            if result and result[0][1]:
                print(f'{consumer_name}: Found {len(result[0][1])} pending messages, retrying...')
                for message_id, data in result[0][1]:
                    try:
                        await process_payment_async(data)
                        await r.xack('orders:payments', 'payment-processors', message_id)
                        print(f'{consumer_name}: Retry successful for {message_id}')
                    except Exception as e:
                        print(f'{consumer_name}: Retry failed for {message_id}: {e}')

            last_pel_check = current_time

        # Process new messages
        result = await r.xreadgroup(
            groupname='payment-processors',
            consumername=consumer_name,
            streams={'orders:payments': '>'},
            count=10,
            block=5000
        )

        if result:
            for message_id, data in result[0][1]:
                try:
                    await process_payment_async(data)
                    await r.xack('orders:payments', 'payment-processors', message_id)
                except Exception as e:
                    print(f'{consumer_name}: Processing failed: {e}')
                    # Stays in PEL for next retry cycle

async def main():
    """Run 20 async consumers with retry logic"""
    tasks = [
        consumer_with_retry(f'async-worker-{i}')
        for i in range(20)
    ]
    await asyncio.gather(*tasks)

# asyncio.run(main())

3.1.5.
Asyncio vs Threading Comparison

Aspect	Asyncio	Threading
Concurrency Model	Cooperative multitasking	Preemptive multitasking
Context Switching	Very lightweight (user space)	Heavier (OS scheduler)
Memory per Unit	A few KB per coroutine	Up to ~8 MB of reserved stack per thread (Linux default)
Max Concurrent	1000s of coroutines	Typically 10s to 100s of threads
Best For	I/O-bound (network, DB)	I/O-bound work that relies on blocking libraries
Python GIL	Runs on one thread, so no GIL contention	GIL lets only one thread run Python bytecode at a time
Error Isolation	One blocking call can stall every coroutine	A blocked thread does not stop the others
Debugging	Easier (single thread)	Harder (race conditions)

When to use Asyncio:

- High concurrency (100+ consumers on one machine)
- I/O-bound workloads (API calls, database queries)
- Lower memory footprint
- Simpler debugging (no thread synchronization)

When to use Threading:

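- Using blocking libraries (no async support)
- Work dominated by C extensions that release the GIL (pure-Python CPU-bound work needs multiprocessing for true parallelism)
- Better fault isolation (a blocked or crashed thread does not stall the others)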
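
When only part of the pipeline is blocking, a middle ground is to keep the asyncio consumer and push the blocking call onto a worker thread with `asyncio.to_thread` (Python 3.9+). The sketch below illustrates that idea; `blocking_charge` is a hypothetical stand-in for a synchronous SDK call and is not part of the examples above.

```python
import asyncio
import time


def blocking_charge(order_id: str) -> str:
    """Hypothetical synchronous client call (no async support)."""
    time.sleep(0.05)  # Blocks its own thread, not the event loop
    return f'charged {order_id}'


async def charge_all(order_ids):
    # Each call runs in the default thread pool; the event loop stays free
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_charge, order_id) for order_id in order_ids)
    )


results = asyncio.run(charge_all(['1001', '1002', '1003']))
print(results)  # ['charged 1001', 'charged 1002', 'charged 1003']
```

`asyncio.gather` returns results in the order the awaitables were passed, so the output lines up with the input IDs even though the threads may finish in a different order.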

3.2.
Semaphore

As with Multithreading with Semaphore in Kotlin, we don't want an unbounded number of in-flight messages to exhaust a machine's resources.

We use a Semaphore to:

- Limit concurrent downstream requests
- Prevent overwhelming external APIs or databases
- Balance throughput with resource constraints

# Async consumer with concurrency control
import asyncio
from asyncio import Semaphore

import redis.asyncio as redis

# process_payment_async is defined in the Basic Asyncio Consumer example (3.1.2)

async def consumer_with_concurrency_limit(consumer_name: str, max_concurrent: int = 5):
    """Limit concurrent message processing to avoid overwhelming downstream services"""
    r = await redis.Redis(decode_responses=True)
    semaphore = Semaphore(max_concurrent)  # Max 5 concurrent processing tasks

    async def process_with_limit(message_id, data):
        async with semaphore:  # Acquire semaphore slot
            await process_payment_async(data)
            await r.xack('orders:payments', 'payment-processors', message_id)

    while True:
        result = await r.xreadgroup(
            groupname='payment-processors',
            consumername=consumer_name,
            streams={'orders:payments': '>'},
            count=10,
            block=5000
        )

        if result:
            messages = result[0][1]
            tasks = [
                asyncio.create_task(process_with_limit(message_id, data))
                for message_id, data in messages
            ]

            # Wait for all messages in batch to complete (with concurrency limit)
            outcomes = await asyncio.gather(*tasks, return_exceptions=True)
            for (message_id, _), outcome in zip(messages, outcomes):
                if isinstance(outcome, Exception):
                    # Not ACKed, so the message stays in the PEL for a later retry
                    print(f'{consumer_name}: Failed {message_id}: {outcome}')

# asyncio.run(consumer_with_concurrency_limit('worker-1', max_concurrent=5))
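
To see the cap in action without Redis, the following sketch processes a batch of 10 simulated messages under a semaphore of 5 and records the highest number in flight at once. `run_batch` and its counters are hypothetical helpers for illustration; the sleep stands in for the downstream call.

```python
import asyncio


async def run_batch(num_messages: int = 10, max_concurrent: int = 5) -> int:
    """Process a batch under a semaphore and return the peak number in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def handle(_message_number: int) -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # Stand-in for the downstream call
            in_flight -= 1

    await asyncio.gather(*(handle(i) for i in range(num_messages)))
    return peak


peak = asyncio.run(run_batch(num_messages=10, max_concurrent=5))
print(peak)  # 5
```

All 10 tasks are created at once, but only 5 ever hold the semaphore, which is the same behavior the consumer above relies on when it reads batches of 10 with `max_concurrent=5`.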

4. References

李健青, Redis 高手心法, Broadview
Claude Sonnet 4.5
