Docker-Compose: Auto-Restart Containers on Failure — When to Use always vs on-failure

When running multi-container setups, unexpected crashes happen. Docker Compose's restart policies reduce downtime by restarting containers automatically when they exit. Choosing the right policy keeps your services resilient without masking genuine issues.

Restart Policies at a Glance

Docker supports four primary restart policies:

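  • no (default): Don’t restart containers automatically. In a Compose file, quote it as "no" so YAML does not parse it as a boolean.
  • always: Restart the container whenever it exits, regardless of exit code. A manually stopped container stays down until the Docker daemon restarts or you start it again.
  • on-failure[:max-retries]: Restart only if the container exits with a non-zero status, optionally limiting the number of retries.
  • unless-stopped: Like always, but a container that was stopped is not restarted, even after the Docker daemon restarts.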
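
As a minimal sketch, the four policies could sit side by side in one Compose file like this. The service names, images, and commands are placeholders chosen for illustration, not part of any real project; only the restart values come from the list above.

```yaml
# Hypothetical services; swap in your own images and commands.
services:
  one-shot:
    image: alpine:3.20
    command: ["true"]                 # exits 0 immediately
    restart: "no"                     # quoted so YAML does not read it as boolean false
  web:
    image: nginx:1.27
    restart: always                   # restarted on any exit
  worker:
    image: alpine:3.20
    command: ["sh", "-c", "exit 1"]   # always fails, to exercise the retry limit
    restart: on-failure:3             # gives up after three restart attempts
  cache:
    image: redis:7
    restart: unless-stopped           # stays down once it has been stopped
```

Running this and watching `docker compose ps -a` should show the worker service settle into an exited state after its retries are used up, while web and cache keep running.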

Practical Guidance: When to Use always vs on-failure

Use Case | Recommended Policy | Why?
Critical services needing uptime | always | Restarts the container whatever its exit code.
Services where failure means something needs fixing | on-failure (with max retries) | Avoids infinite restart loops on config bugs.

Use always for Core Infrastructure Components

Example: Databases, API gateways, or workers that must run continuously regardless of exit codes.

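services:
  db:
    image: postgres:latest
    restart: always
    environment:
      # The official postgres image exits at startup without a password,
      # which would turn restart: always into a crash loop.
      POSTGRES_PASSWORD: example  # placeholder value; use a secret in practice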
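  • Restarts whenever the container exits, including a clean exit 0.
  • Keeps core services highly available.
  • Stays down only if explicitly stopped (for example with docker stop), and even then it starts again when the Docker daemon restarts.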

Use on-failure to Catch Genuine Failures Without Masking Issues

Example: A batch job container that should restart on crashes but fail loudly and stop after repeated errors.

services:
  batch-job:
    image: my-batch-job:latest
    restart: on-failure:5
  • Retries up to 5 times on non-zero exit codes.
  • Does not restart on clean shutdown (exit 0).
  • Prevents endless restart loops if there's a persistent bug.

Pro Tips and Edge Cases

  • Combine on-failure with proper logging and alerting to surface recurring failures.
  • Use unless-stopped when a manual stop should stick: the container stays down even after the Docker daemon or host restarts, which suits maintenance windows.
  • Be cautious with always in development; it can obscure problems by hiding container crashes in an endless restart loop.
  • on-failure does not restart containers that exit with status 0, so use it only when non-zero exit means “needs restart.”
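
To make the first tip concrete, here is a hedged sketch that pairs on-failure with log rotation, so that output from repeated crashes is kept but bounded. It reuses the batch-job service from the earlier example; the size and file-count values are illustrative assumptions, not recommendations.

```yaml
services:
  batch-job:
    image: my-batch-job:latest
    restart: on-failure:5
    logging:
      driver: json-file      # Docker's default logging driver
      options:
        max-size: "10m"      # rotate each log file at roughly 10 MB (assumed value)
        max-file: "3"        # keep at most three log files (assumed value)
# To see how often Docker has restarted a container:
#   docker inspect --format '{{.RestartCount}}' <container>
```

Alerting is not shown here; a restart count that keeps climbing is one signal an external monitor could watch.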

Summary

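Policy | Restarts on exit 0? | Restarts on non-zero exit? | After a manual stop | Best for
no | No | No | Not restarted | Temporary/test containers
always | Yes | Yes | Stays stopped until the Docker daemon restarts or you start it again | Critical always-on services
on-failure | No | Yes (with optional max retries) | Not restarted | Fault detection with controlled retries
unless-stopped | Yes | Yes | Stays stopped, even after the Docker daemon restarts | Persistent services with manual stop control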

For production, prefer always for crucial containers, and on-failure with retry limits to safely restart transient jobs without masking bugs.
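
One possible way to keep production policies out of local development is a Compose override file. The sketch below assumes the base docker-compose.yml defines a db service with restart: always; Compose merges docker-compose.override.yml on top of it by default, so crashes stay visible while you debug.

```yaml
# docker-compose.override.yml (assumed to live next to docker-compose.yml)
services:
  db:
    restart: "no"   # in development, let a crashed container stay down
```

In production, deploy with only the base file (for example `docker compose -f docker-compose.yml up -d`) so the override is not applied.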


Implement restart policies thoughtfully to improve availability and observability—your infrastructure will thank you.


Keep the containers running. Keep the downtime minimal.