Complete Examples

This section provides complete, runnable YAML workflow examples demonstrating various retry configurations.

Source: Examples based on test patterns from src/cook/retry_v2.rs:463-748

Example 1: Basic Retry with Exponential Backoff

Simple API call with standard exponential backoff:

name: fetch-api-data
mode: standard

commands:
  - shell: "curl -f https://api.example.com/data"
    retry_config:
      attempts: 5
      backoff: exponential
      initial_delay: "1s"
      max_delay: "30s"

When it's useful:
- External API calls
- Network-dependent operations
- Transient failure recovery

Retry sequence:
- Attempt 1: Immediate
- Attempt 2: ~2s delay
- Attempt 3: ~4s delay
- Attempt 4: ~8s delay
- Attempt 5: ~16s delay
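The retry sequence above can be reproduced with a small calculation. This is only a sketch: the base of 2 and the formula `initial_delay * base ** (attempt - 1)` are assumptions chosen to match the listed delays, not a copy of the code in retry_v2.rs, and `exponential_delays` is a hypothetical helper name.

```python
def exponential_delays(attempts, initial_delay=1.0, max_delay=30.0, base=2.0):
    """Return the wait in seconds before each attempt; attempt 1 runs immediately."""
    delays = [0.0]
    for attempt in range(2, attempts + 1):
        # Assumed formula: grow by `base` per attempt, never exceeding max_delay.
        delays.append(min(initial_delay * base ** (attempt - 1), max_delay))
    return delays


print(exponential_delays(5))  # [0.0, 2.0, 4.0, 8.0, 16.0]
```

With more attempts, the `max_delay` cap takes over: under these assumptions a seventh attempt would wait 30s rather than 64s.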

Example 2: Exponential Backoff with Jitter (Distributed Systems)

Multiple parallel agents, using jitter to prevent a thundering herd:

name: parallel-processing
mode: mapreduce

map:
  input: "items.json"
  json_path: "$.items[*]"
  max_parallel: 10

  agent_template:
    - shell: "process-item ${item.id}"
      retry_config:
        attempts: 5
        backoff: exponential
        initial_delay: "1s"
        max_delay: "30s"
        jitter: true          # Critical for parallel agents
        jitter_factor: 0.3    # 30% randomization

Why jitter matters: Without jitter, agents that fail at the same moment follow an identical backoff schedule, so all 10 parallel agents could retry at exactly the same time and overwhelm the recovering service.

Source: Jitter implementation in src/cook/retry_v2.rs:308-317
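To illustrate the effect, here is a minimal sketch of jitter applied to one nominal delay. It assumes `jitter_factor: 0.3` means a uniform spread of plus or minus 30% around the delay; the actual formula in retry_v2.rs may differ, and `jittered` is a hypothetical helper.

```python
import random


def jittered(delay, jitter_factor=0.3, rng=random):
    """Spread a delay uniformly within +/- jitter_factor of its nominal value."""
    spread = delay * jitter_factor
    return max(0.0, delay + rng.uniform(-spread, spread))


# Ten agents that all hit the same nominal 4s delay now retry at different times.
rng = random.Random(42)  # seeded only so the sketch is repeatable
agents = [jittered(4.0, 0.3, rng) for _ in range(10)]
print([round(d, 2) for d in agents])
```

Each agent's delay lands somewhere between roughly 2.8s and 5.2s, which spreads the retries out instead of stacking them on one instant.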

Example 3: Conditional Retry with Error Matchers

Retry only transient errors and fail fast on permanent ones:

name: selective-retry
mode: standard

commands:
  - shell: "curl -f https://api.example.com/resource"
    retry_config:
      attempts: 5
      backoff: exponential
      initial_delay: "1s"
      max_delay: "30s"
      retry_on:
        - network        # Connection issues
        - timeout        # Slow responses
        - server_error   # 5xx errors

Behavior:
- Retries: Network errors, timeouts, 500/502/503/504
- Fails immediately: 404, 401, 400 (permanent errors)

Source: ErrorMatcher enum in src/cook/retry_v2.rs:100-151
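The decision described above can be sketched as a small classifier. This is an illustration only: how the real ErrorMatcher inspects command output is not shown in this section, so the inputs here (`status`, `timed_out`, `connection_failed`) and both function names are hypothetical.

```python
def classify(status=None, timed_out=False, connection_failed=False):
    """Map an outcome to a matcher name, or None for a non-retryable error."""
    if connection_failed:
        return "network"
    if timed_out:
        return "timeout"
    if status is not None and 500 <= status <= 599:
        return "server_error"
    return None  # e.g. 400, 401, 404: treated as permanent


def should_retry(retry_on, **outcome):
    """Retry only when the classified error is listed in retry_on."""
    return classify(**outcome) in retry_on


RETRY_ON = ["network", "timeout", "server_error"]
print(should_retry(RETRY_ON, status=503))  # True
print(should_retry(RETRY_ON, status=404))  # False
```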

Example 4: Retry Budget to Prevent Infinite Loops

A high attempt count with a time-based cap:

name: budget-limited-retry
mode: standard

commands:
  - shell: "long-running-operation"
    retry_config:
      attempts: 100          # High attempt count
      backoff: fibonacci
      initial_delay: "1s"
      max_delay: "60s"
      retry_budget: "10m"    # But never exceed 10 minutes total

Why: Prevents endless retries while allowing many attempts for operations that typically succeed eventually.

Source: retry_budget field in src/cook/retry_v2.rs:46-47, tests at lines 675-708
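A rough calculation shows how the budget, rather than the attempt count, becomes the limiting factor in this configuration. The sketch assumes the budget is charged only for backoff delays (command run time is ignored) and that Fibonacci delays follow 1, 2, 3, 5, 8, ... times `initial_delay`; both are assumptions, and the function names are hypothetical.

```python
def fibonacci_delays(initial_delay=1.0, max_delay=60.0):
    """Yield Fibonacci-scaled delays (1, 2, 3, 5, ...) capped at max_delay."""
    a, b = 1, 2
    while True:
        yield min(a * initial_delay, max_delay)
        a, b = b, a + b


def retries_within_budget(max_attempts=100, budget=600.0):
    """Count how many retry delays fit before the total wait exceeds the budget."""
    waited, retries = 0.0, 0
    for delay in fibonacci_delays():
        if retries >= max_attempts - 1 or waited + delay > budget:
            break
        waited += delay
        retries += 1
    return retries, waited


print(retries_within_budget())  # (16, 562.0)
```

Under these assumptions, the 10-minute budget stops the command after 16 retries, long before the configured 100 attempts are used up.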

Example 5: Fallback on Failure

Use cached data when the API fails:

name: fallback-example
mode: standard

commands:
  - shell: "curl -f -o data.json https://api.example.com/live-data"
    retry_config:
      attempts: 3
      backoff: exponential
      initial_delay: "2s"
      max_delay: "10s"
      retry_on:
        - network
        - timeout
      on_failure:
        fallback:
          command: "cp /cache/data.json data.json"

  # Continue processing with either live or cached data
  - shell: "process-data data.json"

Execution flow:
1. Try to fetch live data (3 attempts with exponential backoff)
2. If all attempts fail → Use cached data
3. Continue with processing

Source: FailureAction::Fallback in src/cook/retry_v2.rs:164
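The execution flow above can be sketched in a few lines. This is a simplified model, not Prodigy's implementation: backoff delays are omitted, failures are modeled as `OSError` exceptions, and `run_with_fallback` and `flaky_fetch` are hypothetical names.

```python
def run_with_fallback(primary, fallback, attempts=3):
    """Try primary up to `attempts` times, then fall back once."""
    for _ in range(attempts):
        try:
            return primary()
        except OSError:
            continue  # transient failure: try again (backoff delay omitted here)
    return fallback()


calls = []


def flaky_fetch():
    """Stand-in for the live API call; always fails in this demo."""
    calls.append(1)
    raise ConnectionError("api unreachable")


result = run_with_fallback(flaky_fetch, lambda: "cached data")
print(result, len(calls))  # cached data 3
```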

Example 6: Continue on Failure (Non-Critical Operations)

Allow the workflow to continue even if optional operations fail:

name: mixed-criticality
mode: standard

commands:
  # Critical: must succeed
  - shell: "cargo build"
    retry_config:
      attempts: 3
      on_failure: stop

  # Optional: nice to have but not critical
  - shell: "notify-slack 'Build started'"
    retry_config:
      attempts: 2
      initial_delay: "5s"
      on_failure: continue    # Don't fail workflow if notification fails

  # Critical: must succeed
  - shell: "cargo test"
    retry_config:
      attempts: 3
      on_failure: stop

Use case: Separating critical operations from best-effort operations.

Source: FailureAction::Continue in src/cook/retry_v2.rs:162
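The difference between `stop` and `continue` can be modeled with a short loop over steps. This is a sketch under simplifying assumptions: each step is a callable returning True on success, delays are omitted, and `run_workflow` is a hypothetical function, not part of Prodigy.

```python
def run_workflow(steps):
    """Run (name, run, attempts, on_failure) steps in order."""
    completed = []
    for name, run, attempts, on_failure in steps:
        # any() stops at the first successful attempt, so later attempts are skipped.
        ok = any(run() for _ in range(attempts))
        if ok:
            completed.append(name)
        elif on_failure == "stop":
            return completed, f"stopped at {name}"
        # on_failure == "continue": skip the failed step and move on
    return completed, "finished"


steps = [
    ("cargo build", lambda: True, 3, "stop"),
    ("notify-slack", lambda: False, 2, "continue"),
    ("cargo test", lambda: True, 3, "stop"),
]
print(run_workflow(steps))  # (['cargo build', 'cargo test'], 'finished')
```

The failed notification is dropped from the completed list, but the workflow still reaches the test step.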

Example 7: Rate Limit Handling

Handle API rate limits with long delays:

name: rate-limit-aware
mode: standard

commands:
  - shell: "api-call.sh"
    retry_config:
      attempts: 10
      backoff: exponential
      initial_delay: "60s"    # Start with 1-minute delay
      max_delay: "10m"        # Cap at 10 minutes
      retry_on:
        - rate_limit          # Only retry on 429 errors

Why: Rate limits often require longer delays than network errors.

Source: ErrorMatcher::RateLimit in src/cook/retry_v2.rs:143-147

Example 8: Custom Pattern Matching

Retry database-specific errors:

name: database-retry
mode: standard

commands:
  - shell: "sqlite3 db.sqlite 'INSERT INTO ...'"
    retry_config:
      attempts: 5
      backoff: linear
      initial_delay: "100ms"
      retry_on:
        - pattern: "database.*locked"
        - pattern: "SQLITE_BUSY"
        - pattern: "cannot commit.*in progress"

Source: ErrorMatcher::Pattern in src/cook/retry_v2.rs:113
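To check which error messages these patterns would catch, they can be tried against sample output. The sketch assumes the patterns are regular expressions matched anywhere in the error text and case-sensitively; the exact matching rules of ErrorMatcher::Pattern are not shown in this section, and the sample messages are illustrative.

```python
import re

# The three patterns from the retry_on list above.
RETRY_PATTERNS = [
    re.compile(p)
    for p in ("database.*locked", "SQLITE_BUSY", "cannot commit.*in progress")
]


def is_retryable(error_output):
    """Return True if any pattern matches somewhere in the error text."""
    return any(p.search(error_output) for p in RETRY_PATTERNS)


print(is_retryable("Error: database is locked"))         # True
print(is_retryable('Error: near "INSRT": syntax error'))  # False
```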

Example 9: Fibonacci Backoff for Gradual Recovery

Gentler backoff curve for services needing recovery time:

name: fibonacci-backoff-example
mode: standard

commands:
  - shell: "connect-to-recovering-service.sh"
    retry_config:
      attempts: 8
      backoff: fibonacci
      initial_delay: "1s"
      max_delay: "60s"

Delay sequence: 1s, 2s, 3s, 5s, 8s, 13s, 21s, 34s (Fibonacci multiples of initial_delay, all below the 60s max_delay cap)

Why Fibonacci: Delays grow more slowly than with exponential backoff, so a recovering service gets steadily longer pauses without the retry interval ballooning after a few failures.

Source: Fibonacci calculation in src/cook/retry_v2.rs:424-440
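The delay sequence above can be generated as follows. The sketch assumes each delay is `initial_delay` multiplied by the Fibonacci series starting 1, 2, 3, 5, which matches the listed values; it is not taken from retry_v2.rs, and `fibonacci_delays` is a hypothetical helper.

```python
def fibonacci_delays(count, initial_delay=1.0, max_delay=60.0):
    """Return the first `count` Fibonacci-scaled delays, capped at max_delay."""
    delays, a, b = [], 1, 2
    for _ in range(count):
        delays.append(min(a * initial_delay, max_delay))
        a, b = b, a + b
    return delays


print(fibonacci_delays(8))  # [1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0]
```

For comparison, doubling from 1s would reach 128s by the eighth step, so the 60s cap would already apply from the seventh step onward.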

Example 10: Linear Backoff for Predictable Delays

Testing or debugging with predictable, steadily increasing delays:

name: linear-backoff-example
mode: standard

commands:
  - shell: "test-operation.sh"
    retry_config:
      attempts: 5
      backoff:
        linear:
          increment: "3s"
      initial_delay: "1s"

Delay sequence: 1s, 4s, 7s, 10s, 13s (initial_delay + n * increment, with n starting at 0)

Source: BackoffStrategy::Linear in src/cook/retry_v2.rs:77-80
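The formula in the delay sequence above is simple enough to check directly. This sketch assumes `n` starts at 0 for the first delay, which is what the listed values imply; `linear_delays` is a hypothetical helper.

```python
def linear_delays(count, initial_delay=1.0, increment=3.0):
    """Return `count` delays that grow by a fixed increment each time."""
    return [initial_delay + n * increment for n in range(count)]


print(linear_delays(5))  # [1.0, 4.0, 7.0, 10.0, 13.0]
```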

Example 11: Fixed Delay for Polling

Consistent polling interval:

name: polling-example
mode: standard

commands:
  - shell: "check-job-status.sh"
    retry_config:
      attempts: 20
      backoff: fixed
      initial_delay: "5s"

Delay sequence: 5s between every attempt

Use case: Status polling, health checks

Source: BackoffStrategy::Fixed in src/cook/retry_v2.rs:75
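A fixed-delay polling loop behaves roughly like the sketch below. It is an illustration only: `poll` and its `check` callable are hypothetical, and `sleep` is injectable so the demo can record waits instead of actually pausing.

```python
import time


def poll(check, attempts=20, delay=5.0, sleep=time.sleep):
    """Call check() until it succeeds; return the attempt number or None."""
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)  # same wait between every pair of attempts
    return None


# Demo: the job reports "done" on the fourth check.
waits = []
results = iter([False, False, False, True])
print(poll(lambda: next(results), sleep=waits.append), waits)  # 4 [5.0, 5.0, 5.0]
```

With 20 attempts and a 5s delay, a job that never completes is abandoned after at most 19 waits, or 95 seconds of polling delay.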

Example 12: Complex Multi-Command Workflow

Real-world example combining multiple retry strategies:

name: deployment-workflow
mode: standard

commands:
  # Step 1: Build (critical, retry network issues)
  - shell: "cargo build --release"
    retry_config:
      attempts: 3
      backoff: exponential
      retry_on:
        - network
      on_failure: stop

  # Step 2: Run tests (critical, no retry on real failures)
  - shell: "cargo test"
    retry_config:
      attempts: 2
      initial_delay: "5s"
      retry_on:
        - pattern: "temporary.*failure"
      on_failure: stop

  # Step 3: Upload artifacts (retry with backoff)
  - shell: "upload-to-s3.sh artifacts/"
    retry_config:
      attempts: 5
      backoff: exponential
      initial_delay: "2s"
      max_delay: "60s"
      jitter: true
      retry_on:
        - network
        - timeout
        - server_error
      on_failure:
        fallback:
          command: "save-to-local-backup.sh artifacts/"

  # Step 4: Notify (optional, don't block on failure)
  - shell: "notify-deployment.sh"
    retry_config:
      attempts: 2
      initial_delay: "5s"
      on_failure: continue

  # Step 5: Health check (retry with fixed delay)
  - shell: "health-check.sh"
    retry_config:
      attempts: 10
      backoff: fixed
      initial_delay: "10s"
      on_failure: stop

Example 13: MapReduce with DLQ and Retry

MapReduce workflow with error handling:

name: mapreduce-with-retry
mode: mapreduce

error_policy:
  on_item_failure: dlq        # Send failures to Dead Letter Queue
  continue_on_failure: true   # Keep processing other items
  max_failures: 5             # Stop if more than 5 items fail

map:
  input: "work-items.json"
  json_path: "$.items[*]"
  max_parallel: 10

  agent_template:
    - shell: "process-item ${item.id}"
      retry_config:
        attempts: 3
        backoff: exponential
        initial_delay: "1s"
        max_delay: "30s"
        jitter: true          # Important for parallel agents
        retry_on:
          - network
          - timeout
        on_failure: stop      # Let DLQ handle final failures

reduce:
  - shell: "aggregate-results ${map.results}"

Error handling flow:
1. Each work item is attempted up to 3 times per agent
2. If all attempts fail → Item goes to DLQ
3. Processing continues for other items
4. After map phase, retry DLQ items with: prodigy dlq retry <job_id>

Source: Workflow-level retry in src/cook/workflow/error_policy.rs:90-129
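The error handling flow above can be modeled with a short sequential sketch. It leaves out parallelism and backoff delays, treats any exception as a failed attempt, and uses hypothetical names (`run_map_phase`, `process`); it is not how Prodigy schedules agents.

```python
def run_map_phase(items, process, attempts=3, max_failures=5):
    """Process items with per-item retries; send exhausted items to a DLQ list."""
    results, dlq = [], []
    for item in items:
        last_error = None
        for _ in range(attempts):
            try:
                results.append(process(item))
                break  # success: move on to the next item
            except Exception as err:
                last_error = err
        else:
            # Every attempt failed: record the item instead of aborting the job.
            dlq.append((item, str(last_error)))
            if len(dlq) > max_failures:
                break  # too many failed items: stop the map phase
    return results, dlq


def process(item):
    """Demo worker that fails permanently for items 2 and 4."""
    if item in (2, 4):
        raise RuntimeError(f"item {item} failed")
    return item * 10


print(run_map_phase(range(1, 7), process))
# ([10, 30, 50, 60], [(2, 'item 2 failed'), (4, 'item 4 failed')])
```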

Testing Your Retry Configuration

Validate retry behavior with controlled failures:

name: test-retry-behavior
mode: standard

commands:
  # Use a script that fails N times then succeeds
  - shell: "./fail-then-succeed.sh 2"  # Fails 2 times, succeeds on 3rd
    retry_config:
      attempts: 5
      backoff: exponential
      initial_delay: "1s"

fail-then-succeed.sh example:

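#!/bin/bash
# Fails the first FAIL_COUNT invocations, then succeeds.
FAIL_COUNT=${1:-2}
# Use a fixed path so the counter survives across retries. A PID-based name
# ($$) would change on every invocation and reset the count each time.
STATE_FILE="${TMPDIR:-/tmp}/retry-test-state"

# Initialize the counter on the first invocation
if [ ! -f "$STATE_FILE" ]; then
  echo "0" > "$STATE_FILE"
fi

# Record this invocation
CURRENT=$(cat "$STATE_FILE")
NEXT=$((CURRENT + 1))
echo "$NEXT" > "$STATE_FILE"

if [ "$NEXT" -le "$FAIL_COUNT" ]; then
  echo "Attempt $NEXT: Simulated failure"
  exit 1
else
  echo "Attempt $NEXT: Success!"
  rm "$STATE_FILE"   # Reset state for the next test run
  exit 0
fi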