# Retry Configuration
Prodigy provides sophisticated retry mechanisms with multiple backoff strategies to handle transient failures gracefully. The retry system supports both command-level and workflow-level configurations with fine-grained control over retry behavior.
## Overview
Prodigy has two retry systems that work together:
- **Enhanced Retry System** - Rich, configurable retry with multiple backoff strategies, jitter, circuit breakers, and conditional retry (from `src/cook/retry_v2.rs`)
- **Workflow-Level Retry** - Simpler retry configuration for workflow-level error policies (from `src/cook/workflow/error_policy.rs`)
This chapter focuses on the enhanced retry system, which provides comprehensive retry capabilities. Circuit breakers prevent cascading failures by temporarily halting retries once a threshold of consecutive failures is reached.
## When to Use Each Retry System
Use Enhanced Retry (`retry_v2`) for:

- Individual command execution failures (API calls, shell commands, file operations)
- Operations needing fine-grained control over backoff strategies
- Situations requiring conditional retry based on error types
- Commands where jitter is needed to prevent thundering herd
- External API calls with rate limiting
- Operations benefiting from circuit breakers
Use Workflow-Level Retry (`error_policy`) for:

- MapReduce work item failures
- Workflow-wide error handling policies
- Bulk operations requiring Dead Letter Queue (DLQ) integration
- Scenarios needing failure thresholds and batch error collection
- When you want to retry entire work items rather than individual commands
For a detailed comparison with examples, see Workflow-Level vs Command-Level Retry.
## RetryConfig Structure
This table documents the enhanced retry system (retry_v2::RetryConfig). For workflow-level retry configuration, see the Workflow-Level vs Command-Level Retry subsection.
The RetryConfig struct controls retry behavior with the following fields:
| Field | Type | Default | Description |
|---|---|---|---|
| `attempts` | `u32` | `3` | Maximum number of retry attempts |
| `backoff` | `BackoffStrategy` | `Exponential` (base: 2.0) | Strategy for calculating delays between retries |
| `initial_delay` | `Duration` | `1s` | Initial delay before the first retry |
| `max_delay` | `Duration` | `30s` | Maximum delay between any two retries |
| `jitter` | `bool` | `false` | Whether to add randomness to delays |
| `jitter_factor` | `f64` | `0.3` | Amount of jitter (0.0 to 1.0) |
| `retry_on` | `Vec<ErrorMatcher>` | `[]` | Retry only on specific error types (empty = retry all) |
| `retry_budget` | `Option<Duration>` | `None` | Maximum total time for all retry attempts |
| `on_failure` | `FailureAction` | `Stop` | Action to take after all retries are exhausted |
Source: RetryConfig struct defined in src/cook/retry_v2.rs:14-52
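To see how these fields interact, the following is a minimal sketch of a delay planner that honors `attempts`, `initial_delay`, `max_delay`, and `retry_budget` under the default exponential (base 2) strategy. This is an illustration only, not Prodigy's implementation; the function name `plan_delays` is hypothetical.

```rust
use std::time::Duration;

/// Hypothetical sketch: compute the sequence of waits between retry
/// attempts. `attempts` bounds the loop, `max_delay` caps each wait,
/// and `retry_budget` caps the total time spent waiting.
fn plan_delays(
    attempts: u32,
    initial_delay: Duration,
    max_delay: Duration,
    retry_budget: Option<Duration>,
) -> Vec<Duration> {
    let mut delays = Vec::new();
    let mut total = Duration::ZERO;
    let mut delay = initial_delay;
    for _ in 1..attempts {              // no wait after the final attempt
        let d = delay.min(max_delay);   // max_delay caps every wait
        if let Some(budget) = retry_budget {
            if total + d > budget {     // budget exhausted: stop retrying early
                break;
            }
        }
        total += d;
        delays.push(d);
        delay = delay.checked_mul(2).unwrap_or(max_delay); // exponential, base 2
    }
    delays
}

fn main() {
    // Defaults from the table: 3 attempts, 1s initial, 30s max, no budget.
    let d = plan_delays(3, Duration::from_secs(1), Duration::from_secs(30), None);
    println!("{:?}", d); // two waits between three attempts
}
```

Note that with the default `attempts: 3` there are at most two waits, since no delay follows the last attempt.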
## YAML Configuration Syntax
The RetryConfig fields map to YAML workflow syntax as follows:
```yaml
commands:
  - shell: "your-command-here"
    retry_config:
      attempts: 5              # RetryConfig.attempts (u32)
      backoff:
        type: exponential      # BackoffStrategy::Exponential
        base: 2.0              # exponential base multiplier
      initial_delay: "1s"      # RetryConfig.initial_delay (humantime format)
      max_delay: "30s"         # RetryConfig.max_delay (humantime format)
      jitter: true             # RetryConfig.jitter (bool)
      jitter_factor: 0.3       # RetryConfig.jitter_factor (0.0-1.0)
      retry_on:                # RetryConfig.retry_on (Vec<ErrorMatcher>)
        - network
        - timeout
        - server_error
      retry_budget: "5m"       # RetryConfig.retry_budget (Option<Duration>)
      on_failure: stop         # RetryConfig.on_failure (FailureAction)
```
Alternative Backoff Strategies:
```yaml
# Fixed delay
backoff: fixed

# Linear backoff
backoff:
  type: linear
  increment: "2s"

# Fibonacci backoff
backoff: fibonacci

# Custom delay sequence
backoff:
  type: custom
  delays: ["1s", "2s", "5s", "10s"]
```
Note: Field names use snake_case in YAML but map to the exact struct fields in src/cook/retry_v2.rs:14-52. Duration values use humantime format (e.g., "1s", "30s", "5m").
For complete working examples, see Complete Examples.
## Circuit Breakers
Circuit breakers are configured separately via RetryExecutor, not as part of RetryConfig. Circuit breakers provide fail-fast behavior when downstream systems are consistently failing, preventing resource exhaustion from repeated failed retries.
Configuration (programmatic):
```rust
let executor = RetryExecutor::new(retry_config)
    .with_circuit_breaker(
        5,                       // failure_threshold: open after 5 consecutive failures
        Duration::from_secs(30), // recovery_timeout: attempt recovery after 30 seconds
    );
```
Source: src/cook/retry_v2.rs:184-188 (with_circuit_breaker method), src/cook/retry_v2.rs:325-397 (CircuitBreaker implementation)
Circuit States:

- **Closed**: Normal operation; retries are attempted
- **Open**: Circuit tripped; requests fail immediately without retry
- **HalfOpen**: Testing recovery; a limited number of requests is allowed
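The state machine behind these three states can be sketched as follows. This is a minimal illustration of the pattern, not the implementation in `src/cook/retry_v2.rs`; all names here are illustrative.

```rust
use std::time::{Duration, Instant};

/// Minimal circuit-breaker sketch illustrating the three states above.
#[derive(Debug, PartialEq)]
enum State {
    Closed,
    Open,
    HalfOpen,
}

struct CircuitBreaker {
    state: State,
    failures: u32,
    failure_threshold: u32,
    recovery_timeout: Duration,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, recovery_timeout: Duration) -> Self {
        Self {
            state: State::Closed,
            failures: 0,
            failure_threshold,
            recovery_timeout,
            opened_at: None,
        }
    }

    /// Returns true if a request may proceed. Transitions Open -> HalfOpen
    /// once the recovery timeout has elapsed.
    fn allow_request(&mut self) -> bool {
        if self.state == State::Open {
            let recovered = self
                .opened_at
                .map_or(false, |t| t.elapsed() >= self.recovery_timeout);
            if recovered {
                self.state = State::HalfOpen; // probe the downstream system
            } else {
                return false; // fail fast while the circuit is open
            }
        }
        true
    }

    fn record_success(&mut self) {
        self.failures = 0;
        self.state = State::Closed; // recovery confirmed
    }

    fn record_failure(&mut self) {
        self.failures += 1;
        if self.failures >= self.failure_threshold || self.state == State::HalfOpen {
            self.state = State::Open; // trip the breaker
            self.opened_at = Some(Instant::now());
        }
    }
}

fn main() {
    let mut cb = CircuitBreaker::new(5, Duration::from_secs(30));
    for _ in 0..5 {
        cb.record_failure();
    }
    // After 5 consecutive failures the circuit is open, so requests fail fast.
    println!("allowed after 5 failures: {}", cb.allow_request()); // prints: false
}
```

A failure in the HalfOpen state immediately reopens the circuit, so a single bad probe does not flood a still-unhealthy downstream system.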
## Additional Topics
See also:

- Basic Retry Configuration
- Backoff Strategies
- Backoff Strategy Comparison
- Jitter for Distributed Systems
- Conditional Retry with Error Matchers
- Retry Budget
- Failure Actions
- Complete Examples
- Workflow-Level vs Command-Level Retry
- Retry Metrics and Observability
- Troubleshooting
- Implementation References
## Related Topics
This section provides links to related documentation that complements retry configuration. Understanding these topics will help you build more resilient workflows.
### Within This Chapter
The following subsections provide detailed information about specific aspects of retry configuration:
- Basic Retry Configuration - Start here to understand fundamental retry configuration syntax and options
- Backoff Strategies - Control the timing between retry attempts using exponential, linear, Fibonacci, or constant delays
- Backoff Strategy Comparison - Compare different backoff strategies with examples and use cases
- Failure Actions - Define custom actions to execute when commands fail or exhaust retries
- Conditional Retry with Error Matchers - Use error patterns to selectively retry only specific failures
- Jitter for Distributed Systems - Add randomization to retry delays to prevent thundering herd problems
- Retry Budget - Limit total retry attempts across your workflow to prevent infinite retry loops
- Retry Metrics and Observability - Monitor retry behavior through events and logging
- Workflow-Level vs Command-Level Retry - Understand the differences between retry scopes and when to use each
- Complete Examples - Real-world retry configuration examples demonstrating various strategies
- Troubleshooting - Debug common retry configuration issues
- Implementation References - Links to source code implementing retry logic
### Related Chapters
These chapters cover topics that interact with or complement retry configuration:
- Error Handling - Overall error handling strategy and how Prodigy propagates errors through workflows. Retry configuration is one component of a comprehensive error handling approach.
- Workflow Configuration - Workflow-level settings including global retry defaults that apply to all commands unless overridden at the command level.
- MapReduce - Retry behavior in MapReduce workflows, where individual map agents can retry independently. MapReduce adds complexity to retry semantics due to parallel execution.
- Dead Letter Queue (DLQ) - Handling failed work items in MapReduce workflows. When map agents exhaust all retries, items move to the DLQ for manual inspection and retry.
- Environment Variables - Use environment variables in retry configuration to parameterize retry behavior across different deployment environments (dev, staging, production).