Docker-Compose: Auto-Restart Containers on Failure — When to Use always vs on-failure
When running multi-container setups, unexpected crashes happen. Docker-Compose’s restart policies help avoid downtime by automatically restarting containers based on failure conditions. Choosing the right restart policy ensures your services stay resilient without masking genuine issues.
Restart Policies at a Glance
Docker supports four primary restart policies:
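- no (default): Don't restart the container automatically.
- always: Always restart the container, regardless of exit code. If you stop it manually, it stays stopped until the Docker daemon restarts or you start it again.
- on-failure[:max-retries]: Restart only if the container exits with a non-zero status, optionally limiting the number of retries.
- unless-stopped: Like always, but a manually stopped container is not restarted, even when the Docker daemon restarts.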
Practical Guidance: When to Use always vs on-failure
Use Case | Recommended Policy | Why? |
---|---|---|
Critical services needing uptime | always | Restarts the container after every exit, whatever the exit code. |
Services where failure means something needs fixing | on-failure (with max retries) | Caps retries so a config bug can't cause an infinite restart loop. |
Use always for Core Infrastructure Components
Example: Databases, API gateways, or workers that must run continuously regardless of exit codes.
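```yaml
services:
  db:
    image: postgres:latest
    restart: always
    environment:
      # The postgres image exits at startup if no superuser password is set; placeholder value
      POSTGRES_PASSWORD: example
```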
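- Restarts after every exit, including a clean exit 0.
- Keeps core services highly available.
- Stays down only when explicitly stopped (for example with docker compose stop), and even then comes back when the Docker daemon restarts.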
Use on-failure to Catch Genuine Failures Without Masking Issues
Example: A batch job container that should restart on crashes but fail loudly and stop after repeated errors.
```yaml
services:
  batch-job:
    image: my-batch-job:latest
    restart: on-failure:5
```
- Retries up to 5 times on non-zero exit codes.
- Does not restart on a clean shutdown (exit 0).
- Prevents endless restart loops if there's a persistent bug.
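To watch on-failure at work, the sketch below uses two throwaway services that differ only in exit code. It assumes the public busybox image. The service names flaky and clean-exit, and the retry limit of 3, are arbitrary choices for the demo.

```yaml
services:
  flaky:
    image: busybox:latest
    # Simulates a crash: the shell exits with status 1 on every run
    command: ["sh", "-c", "echo 'simulated failure'; exit 1"]
    restart: on-failure:3
  clean-exit:
    image: busybox:latest
    # Simulates a successful run: exit status 0, so on-failure leaves it stopped
    command: ["sh", "-c", "echo 'job done'; exit 0"]
    restart: on-failure:3
```

Start it with docker compose up -d. Docker adds a growing delay before each restart, so give it a few seconds. After that, docker compose logs flaky should show the failure message repeated across attempts before the container stays exited. clean-exit should print once and not come back. docker inspect --format '{{.RestartCount}}' followed by a container name reports how many restarts Docker performed.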
Pro Tips and Edge Cases
- Combine on-failure with proper logging and alerting so recurring failures get noticed instead of being silently retried.
- Use unless-stopped when a container you stop by hand should stay stopped, even across Docker daemon restarts. This is ideal during maintenance windows.
- Be cautious with always in development: it can obscure problems by hiding container crashes in an endless restart loop.
- on-failure does not restart containers that exit with status 0, so use it only when your process signals "needs restart" with a non-zero exit code.
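The first two tips can be sketched in one Compose file. This is a sketch under assumptions. It uses the json-file logging driver with its max-size and max-file options so crash output from failed attempts is kept but bounded; the specific sizes are arbitrary. my-batch-job:latest is a placeholder image. Alerting itself is outside Compose and would come from whatever monitoring you already run.

```yaml
services:
  batch-job:
    image: my-batch-job:latest    # placeholder image, assumed to exist
    restart: on-failure:5
    logging:
      driver: json-file
      options:
        max-size: "10m"           # rotate each log file at about 10 MB (arbitrary value)
        max-file: "3"             # keep at most 3 log files (arbitrary value)
  web:
    image: nginx:latest
    restart: unless-stopped       # stays down after a manual stop, even across daemon restarts
```

During a maintenance window, docker compose stop web keeps the web service down until docker compose start web. docker compose logs batch-job shows output from the job's failed attempts.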
Summary
Policy | Restarts on exit 0? | Restarts on non-zero exit? | Stays stopped after a manual stop, even across daemon restarts? | Best For |
---|---|---|---|---|
no | No | No | N/A | Temporary/test containers |
always | Yes | Yes | No | Critical always-on services |
on-failure | No | Yes (with optional max retries) | No | Fault detection with controlled retries |
unless-stopped | Yes | Yes | Yes | Persistent services with manual stop control |
For production, prefer always for crucial containers, and on-failure with retry limits to restart transient jobs safely without masking bugs.
Implement restart policies thoughtfully to improve availability and observability—your infrastructure will thank you.
Keep the containers running. Keep the downtime minimal.