Error handling is a critical aspect of robust automation. Ansible provides several mechanisms to control how your playbooks respond to failures, define what constitutes a failure or change, and recover gracefully from unexpected conditions. Understanding these tools allows you to create resilient automation that can handle real-world scenarios effectively.
failed_when to define custom failure conditions
changed_when
Ensure you are in the ciq-basics directory
Create a folder for this lab, let’s call it lab04
You are now ready to start the lab.
The failed_when directive allows you to define custom conditions that determine when a task should be considered failed, regardless of the command’s exit code.
- hosts: all
  gather_facts: false
  tasks:
    - name: Check disk space with custom failure condition
      shell: df -h / | tail -n 1 | awk '{print $5}' | sed 's/%//'
      register: disk_usage
      # Fail when the root filesystem is more than 80% full
      failed_when: disk_usage.stdout | int > 80

    - name: Command that always succeeds but we define failure
      command: echo "Operation completed"
      register: result
      failed_when: "'completed' not in result.stdout"

    - name: Check service status with custom failure logic
      shell: systemctl is-active NetworkManager || echo "inactive"
      register: service_status
      # A list of conditions is joined with AND, and "|| echo" keeps rc at 0,
      # so a two-item list could never fail here; "or" is used instead
      failed_when: service_status.rc != 0 or 'inactive' in service_status.stdout

    - name: Validate configuration file exists
      stat:
        path: /etc/hosts
      register: config_file
      failed_when: not config_file.stat.exists

    - name: Multiple failure conditions
      shell: uptime
      register: uptime_result
      # Joined with "or" so that any single condition fails the task;
      # a YAML list would require all of them to be true at once
      failed_when: >-
        uptime_result.rc != 0 or
        'load average' not in uptime_result.stdout or
        uptime_result.stdout == ""

Sample Output
TASK [Check disk space with custom failure condition] **********************
ok: [localhost]
TASK [Command that always succeeds but we define failure] ******************
ok: [localhost]
TASK [Check service status with custom failure logic] **********************
ok: [localhost]
TASK [Validate configuration file exists] **********************************
ok: [localhost]
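When failed_when is given a YAML list, Ansible joins the entries with an implicit and, so the task fails only if every entry is true. The following is a minimal sketch contrasting the list form with an explicit or. The shell command and its output are hypothetical stand-ins for a real health check, not part of the lab files.

```yaml
- hosts: localhost
  gather_facts: false
  tasks:
    # Hypothetical check: prints a status word and exits with code 3.
    - name: List form, fails only when BOTH conditions are true
      shell: echo "degraded"; exit 3
      register: check_and
      failed_when:
        - check_and.rc != 0
        - "'fatal' in check_and.stdout"
      # rc is 3 but stdout lacks "fatal", so this task is reported ok

    - name: Explicit or, fails when EITHER condition is true
      shell: echo "degraded"; exit 3
      register: check_or
      failed_when: check_or.rc != 0 or 'fatal' in check_or.stdout
      # rc is 3, so this task is reported failed and the play stops here
```

Choosing between the two forms is a question of intent: the list form suits "fail only if all of these hold", while a single expression with or suits "fail if any of these hold".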
failed_when with register variables from multiple tasks
The changed_when directive allows you to define when a task should be marked as “changed”, giving you precise control over change detection for idempotent playbooks.
# Application Configuration
app_name={{ app_name | default('MyApp') }}
mode={{ app_mode | default('development') }}
debug={{ debug_enabled | default(true) }}
port={{ app_port | default(8080) }}
- hosts: all
  gather_facts: false
  tasks:
    - name: Command that never reports changed
      command: date
      changed_when: false

    - name: Command that always reports changed
      command: echo "Configuration updated"
      changed_when: true

    - name: Update hosts file entry
      lineinfile:
        path: /etc/hosts
        line: "127.0.0.1 myapp.local"
        regexp: '^127\.0\.0\.1.*myapp\.local'
        state: present
      register: hosts_update
      changed_when: hosts_update.changed
      become: yes

    - name: Install package with conditional change detection
      package:
        name: curl
        state: present
      register: package_install
      changed_when: package_install.changed and 'Nothing to do' not in package_install.msg|default('')
      become: yes

    - name: Set file permissions with custom change detection
      file:
        path: /tmp/test_permissions.txt
        state: touch
        mode: '0644'
        owner: root
        group: root
      register: file_perms
      changed_when:
        - file_perms.mode != '0644' or
          file_perms.owner != 'root' or
          file_perms.group != 'root'
      become: yes

    - name: Create user with conditional change detection
      user:
        name: testuser
        state: present
        shell: /bin/bash
        home: /home/testuser
      register: user_creation
      changed_when: user_creation.changed and user_creation.state == 'present'
      become: yes

    - name: Template deployment with checksum-based change detection
      template:
        src: app.conf.j2
        dest: /tmp/app.conf
        backup: yes
      vars:
        app_mode: production
        debug_enabled: false
      register: template_result
      changed_when:
        - template_result.changed
        - template_result.checksum != template_result.dest_checksum|default('')

    - name: Service management with state-based change detection
      service:
        name: NetworkManager
        state: started
        enabled: yes
      register: service_result
      changed_when:
        - service_result.changed
        - service_result.state == 'started' or service_result.enabled == true
      become: yes

Sample Output
TASK [Command that never reports changed] **********************************
ok: [localhost]
TASK [Command that always reports changed] *********************************
changed: [localhost]
TASK [Update hosts file entry] *********************************************
changed: [localhost]
TASK [Install package with conditional change detection] *******************
ok: [localhost]
TASK [Template deployment with checksum-based change detection] ************
changed: [localhost]
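Notice how the date task is reported as ok because changed_when is false, while the echo task is reported as changed because changed_when is true.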
The ignore_errors directive allows tasks to fail without stopping playbook execution, useful for optional operations or when you want to handle failures manually.
- hosts: all
  gather_facts: false
  become: true
  tasks:
    - name: Attempt to stop a service that might not exist
      service:
        name: nonexistent-service
        state: stopped
      ignore_errors: yes

    - name: Try to remove optional packages
      package:
        name:
          - some-optional-package
          - another-optional-package
        state: absent
      ignore_errors: yes
      become: yes

    # The "|| echo" fallback makes this command exit 0 even when the file is missing
    - name: Command that might fail but we continue anyway
      shell: cat /path/to/optional/file.txt || echo "File not found, using defaults"
      register: file_content
      ignore_errors: yes

    - name: Show what we got from the previous task
      debug:
        msg: "File content result: {{ file_content.stdout }}"

    - name: Cleanup task that shouldn't stop execution
      file:
        path: /tmp/temporary-file-that-might-not-exist
        state: absent
      ignore_errors: yes

    - name: This task will always run
      debug:
        msg: "Playbook execution continued despite previous failures"

    # Runs only if file_content failed; with the "|| echo" fallback above
    # the command always exits 0, so this task is skipped in this playbook
    - name: Conditional logic based on previous failed tasks
      debug:
        msg: "Previous file operation failed, using alternative approach"
      when: file_content is failed

Sample Output
TASK [Attempt to stop a service that might not exist] **********************
fatal: [localhost]: FAILED! => {"msg": "Could not find the requested service nonexistent-service"}
...ignoring
TASK [Show what we got from the previous task] *****************************
ok: [localhost] => {
"msg": "File content result: File not found, using defaults"
}
TASK [This task will always run] *******************************************
ok: [localhost] => {
"msg": "Playbook execution continued despite previous failures"
}
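The ignore_errors pattern shown above can be paired with a registered result and the failed and succeeded tests to build explicit fallback logic. This is a minimal sketch under stated assumptions: the file path, the variable names settings_read and app_settings, and the default value are hypothetical and chosen only for illustration.

```yaml
- hosts: localhost
  gather_facts: false
  tasks:
    # Hypothetical path; any file that may be absent works here.
    - name: Read an optional settings file
      command: cat /tmp/optional-settings.conf
      register: settings_read
      ignore_errors: true
      changed_when: false

    - name: Fall back to built-in defaults when the read failed
      set_fact:
        app_settings: "mode=default"
      when: settings_read is failed

    - name: Use the file content when the read succeeded
      set_fact:
        app_settings: "{{ settings_read.stdout }}"
      when: settings_read is succeeded

    - name: Show which settings are in effect
      debug:
        msg: "Settings: {{ app_settings }}"
```

Note that ignore_errors only applies to tasks that actually run and return a failed result. It does not suppress unreachable hosts, undefined-variable errors, or syntax errors.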
ignore_errors with conditionals to create fallback logic
ignore_errors for optional cleanup operations
Rescue blocks provide structured error handling, allowing you to define recovery actions when tasks fail, similar to try-catch blocks in programming languages.
- hosts: all
  # Facts are gathered because the rescue tasks use ansible_date_time
  gather_facts: true
  become: true
  tasks:
    - name: Primary configuration with fallback
      block:
        - name: Try to copy primary configuration
          copy:
            src: /path/to/primary/config.yml
            dest: /tmp/app-config.yml

        - name: Validate configuration
          shell: python -c "import yaml; yaml.safe_load(open('/tmp/app-config.yml'))"
      rescue:
        - name: Primary config failed, using backup
          debug:
            msg: "Primary configuration failed, falling back to default config"

        - name: Create default configuration
          copy:
            content: |
              app:
                name: "Default App"
                port: 8080
                debug: false
            dest: /tmp/app-config.yml

        - name: Log the fallback action
          lineinfile:
            path: /tmp/deployment.log
            line: "{{ ansible_date_time.iso8601 }}: Used default configuration due to primary config failure"
            create: yes
      always:
        - name: Ensure configuration exists
          stat:
            path: /tmp/app-config.yml
          register: final_config

        - name: Report final configuration status
          debug:
            msg: "Configuration file exists: {{ final_config.stat.exists }}"

    - name: Package installation with fallback
      block:
        - name: Install package from primary repository
          package:
            name: httpd
            state: present
          become: yes

        - name: Start and enable web service
          service:
            name: httpd
            state: started
            enabled: yes
          become: yes
      rescue:
        - name: Package installation failed, trying alternative
          debug:
            msg: "Primary package installation failed, installing alternative web server"

        - name: Install alternative web server
          package:
            name: nginx
            state: present
          become: yes

        - name: Start and enable nginx service
          service:
            name: nginx
            state: started
            enabled: yes
          become: yes

        - name: Log fallback action
          lineinfile:
            path: /tmp/deployment.log
            line: "{{ ansible_date_time.iso8601 }}: Used nginx instead of httpd due to installation failure"
            create: yes
      always:
        - name: Check if a web server is running
          shell: ss -tuln | grep ':80 '
          register: webserver_check
          ignore_errors: yes

        - name: Report web server status
          debug:
            msg: "Web server is {{ 'running' if webserver_check.rc == 0 else 'not running' }} on port 80"

Sample Output
TASK [Try to copy primary configuration] ***********************************
fatal: [localhost]: FAILED! => {"msg": "Could not find or access '/path/to/primary/config.yml'"}
TASK [Primary config failed, using backup] *********************************
ok: [localhost] => {
"msg": "Primary configuration failed, falling back to default config"
}
TASK [Create default configuration] ****************************************
changed: [localhost]
TASK [Report final configuration status] ***********************************
ok: [localhost] => {
"msg": "Configuration file exists: true"
}
TASK [Install package from primary repository] ****************************
ok: [localhost]
TASK [Start and enable web service] ****************************************
changed: [localhost]
TASK [Check if a web server is running] ************************************
ok: [localhost]
TASK [Report web server status] ********************************************
ok: [localhost] => {
"msg": "Web server is running on port 80"
}
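Inside a rescue section, Ansible exposes ansible_failed_task and ansible_failed_result, which describe the task that triggered the rescue. The sketch below shows one way to use them with when conditions so that recovery runs only for a particular kind of failure. The failing step is a hypothetical stand-in for a real deployment command, and treating "rc is defined" as the recoverable case is an assumption made purely for illustration.

```yaml
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Deploy with conditional recovery
      block:
        # Hypothetical failing step, standing in for a real deployment command.
        - name: Run deployment step
          shell: exit 1
      rescue:
        - name: Report which task failed
          debug:
            msg: "Task '{{ ansible_failed_task.name }}' failed with rc={{ ansible_failed_result.rc | default('n/a') }}"

        - name: Recover only from a plain non-zero exit code
          debug:
            msg: "Non-zero exit code, running recovery steps"
          when: ansible_failed_result.rc is defined

        - name: Re-raise anything else
          fail:
            msg: "Unrecoverable failure in '{{ ansible_failed_task.name }}'"
          when: ansible_failed_result.rc is not defined
```

A fail task inside rescue marks the host as failed again, which is a simple way to avoid silently swallowing errors the rescue section was not designed to handle.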
when conditions for conditional recovery
Error strategies determine how Ansible behaves when tasks fail across multiple hosts, allowing you to control whether execution continues on other hosts when failures occur.
- hosts: localhost
  strategy: linear
  gather_facts: false
  vars:
    error_strategy_test: "{{ strategy_type | default('fail_fast') }}"
  tasks:
    # This only stores a variable; the play-level strategy keyword above
    # is what actually selects the execution strategy
    - name: Set error strategy dynamically
      set_fact:
        ansible_strategy: "{{ error_strategy_test }}"

    - name: Task that might fail on some hosts
      shell: |
        # Simulate random failure for demonstration
        if [ $(( RANDOM % 3 )) -eq 0 ]; then
          echo "Simulated failure"
          exit 1
        else
          echo "Task succeeded"
        fi
      register: random_task

    - name: This task only runs if previous succeeded
      debug:
        msg: "Previous task output: {{ random_task.stdout }}"

- hosts: localhost
  strategy: free
  gather_facts: false
  serial: 1
  max_fail_percentage: 30
  tasks:
    - name: Show current strategy
      debug:
        msg: "Running with strategy: {{ ansible_strategy | default('linear') }}"

    - name: Demonstrate free strategy behavior
      shell: sleep {{ ansible_play_hosts.index(inventory_hostname) + 1 }}; echo "Host {{ inventory_hostname }} completed"

    - name: Task with failure tolerance
      shell: |
        # Different behavior based on host
        case "{{ inventory_hostname }}" in
          *1) exit 0 ;; # Success
          *2) exit 1 ;; # Failure
          *) echo "Processing..." ;;
        esac
      # default() avoids an undefined-variable error when ansible_strategy was never set
      ignore_errors: "{{ (ansible_strategy | default('linear')) == 'free' }}"

    - name: Cleanup task that always runs
      debug:
        msg: "Cleanup completed on {{ inventory_hostname }}"

- hosts: localhost
  gather_facts: false
  strategy: linear
  any_errors_fatal: false
  max_fail_percentage: 50
  tasks:
    - name: Critical task that must succeed
      shell: echo "Critical operation"
      any_errors_fatal: true

    - name: Optional task with custom error handling
      block:
        - name: Risky operation
          shell: |
            if [ $(( RANDOM % 2 )) -eq 0 ]; then
              echo "Operation succeeded"
            else
              echo "Operation failed"
              exit 1
            fi
          register: risky_op
      rescue:
        - name: Handle the failure
          set_fact:
            operation_status: "failed"
            fallback_used: true

        - name: Implement fallback
          shell: echo "Using fallback procedure"
          register: fallback_result
      always:
        - name: Log operation result
          debug:
            msg: |
              Operation status: {{ operation_status | default('success') }}
              Fallback used: {{ fallback_used | default(false) }}

    - name: Final validation
      assert:
        that:
          - risky_op is succeeded or fallback_result is succeeded
        fail_msg: "Neither primary operation nor fallback succeeded"
        success_msg: "Operation completed successfully (primary or fallback)"

Sample Output
TASK [Show current strategy] ********************************************
ok: [localhost] => {
"msg": "Running with strategy: linear"
}
TASK [Critical task that must succeed] **********************************
ok: [localhost]
TASK [Handle the failure] ************************************************
ok: [localhost]
TASK [Final validation] *************************************************
ok: [localhost] => {
"msg": "Operation completed successfully (primary or fallback)"
}
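The play-level keywords used in the playbooks above become more meaningful with several hosts. The following is a minimal sketch of a rolling update, assuming a hypothetical inventory group named webservers; the batch size, threshold, and commands are illustrative values rather than recommendations.

```yaml
# Sketch only: "webservers" is a hypothetical inventory group.
- hosts: webservers
  gather_facts: false
  serial: 2                  # take two hosts through the play per batch
  max_fail_percentage: 25    # abort when more than 25% of a batch fails
  tasks:
    - name: Pre-flight checks that must pass everywhere
      any_errors_fatal: true
      block:
        - name: Verify the root filesystem is writable
          command: test -w /
          changed_when: false

    - name: Simulated per-host update step
      command: echo "updating {{ inventory_hostname }}"
      changed_when: true
```

With a batch of two hosts, a single failure is 50% of the batch, which exceeds the 25% threshold and aborts the play before the next batch starts. Scoping any_errors_fatal to the pre-flight block keeps the strict behavior limited to the checks that genuinely must pass on every host.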
Available Error Strategies:
linear (default): Each task runs on all hosts before Ansible moves on to the next task
free: Each host runs through its tasks as fast as it can, without waiting for other hosts
host_pinned: Like free, but Ansible only starts a host when it can run the play to completion without interruption by tasks for other hosts
debug: Interactive debugging mode for troubleshooting
Strategy Options:
serial: Set how many hosts are taken through the play in each batch
max_fail_percentage: Set the failure threshold at which the play is aborted
any_errors_fatal: Stop the play on all hosts if any host fails
max_fail_percentage in multi-host scenarios
any_errors_fatal for critical sections of deployment playbooks

- hosts: localhost
  # Facts are gathered because the rescue section uses ansible_date_time
  gather_facts: true
  vars:
    max_retries: 3
    retry_delay: 2
  tasks:
    - name: Retry logic with custom failure handling
      include_tasks: retry_task.yml
      vars:
        task_name: "Connect to external service"
        command_to_run: "curl -f http://httpbin.org/status/{{ item }}"
        # Note: curl -f exits with code 22 on HTTP errors of 400 and above, so
        # task_result.rc never equals the HTTP status codes listed here
        expected_failures: [404, 500]
      loop: [200, 404, 200]
      register: retry_results

    - name: Conditional error handling based on error type
      shell: |
        case $RANDOM in
          *1) exit 1 ;; # Retriable error
          *2) exit 2 ;; # Fatal error
          *) echo "Success" ;;
        esac
      register: operation_result
      # Never fail here; the return code is inspected by the block below
      failed_when: false

    - name: Handle different error types
      block:
        - name: Check for fatal errors
          fail:
            msg: "Fatal error occurred, cannot continue"
          when: operation_result.rc == 2

        - name: Handle retriable errors
          debug:
            msg: "Retriable error detected, implement retry logic"
          when: operation_result.rc == 1
      rescue:
        - name: Log error details
          copy:
            content: |
              Error occurred at: {{ ansible_date_time.iso8601 }}
              Task: {{ ansible_failed_task.name }}
              Error: {{ ansible_failed_result.msg }}
            dest: /tmp/error.log

        - name: Send notification (placeholder)
          debug:
            msg: "Would send alert about critical failure"

# retry_task.yml
- name: "{{ task_name }} - Attempt {{ item }}"
  shell: "{{ command_to_run }}"
  register: task_result
  # Both conditions must hold: a non-zero rc that is not an expected failure
  failed_when:
    - task_result.rc != 0
    - task_result.rc not in (expected_failures | default([]))
  retries: "{{ max_retries }}"
  delay: "{{ retry_delay }}"
  until: task_result.rc == 0

Sample Output
TASK [Retry logic with custom failure handling] ************************
ok: [localhost] => (item=200)
failed: [localhost] (item=404) => {"msg": "Expected failure code encountered"}
ok: [localhost] => (item=200)
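The retry pattern above relies on three task keywords working together: until defines the success condition, retries caps the number of retries, and delay sets the pause between attempts. A minimal self-contained sketch follows; the marker file path and the variable name ready_check are hypothetical, and some other process is assumed to create the file.

```yaml
- hosts: localhost
  gather_facts: false
  tasks:
    # Hypothetical marker file; something else is assumed to create it.
    - name: Wait until the marker file appears
      command: test -f /tmp/service-ready
      register: ready_check
      until: ready_check.rc == 0   # stop retrying once the file exists
      retries: 5                   # retry up to 5 times
      delay: 2                     # wait 2 seconds between attempts
      changed_when: false
```

If the condition is still false after the last retry, the task is reported as failed, so this pattern combines naturally with a surrounding block and rescue section when a fallback is needed.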