Ansible Parallelism and Asynchronous Execution Lab

One of Ansible’s key strengths is its ability to execute tasks across multiple hosts simultaneously and handle long-running operations efficiently. Understanding forks, asynchronous execution, and creative host management allows you to build highly efficient automation that can scale to handle large workloads and time-consuming operations.

Goals

Lab Setup

  1. Ensure you are in the ciq-basics directory

  2. Create a folder for this lab, let’s call it lab05

You are now ready to start the lab.

Understanding Forks

Forks control how many hosts Ansible processes simultaneously. By default, Ansible uses 5 forks, meaning it can execute tasks on up to 5 hosts at the same time. Since we’re running in Ascender with only one physical host, we’ll simulate multiple hosts using the add_host technique.

  1. Create a playbook called forks_demo.yml to demonstrate fork behavior:
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Create virtual hosts to demonstrate fork behavior
      add_host:
        name: "host{{ item }}"
        groups: test_servers
        ansible_connection: local
      loop: "{{ range(1, 9) | list }}"
      changed_when: false

- hosts: test_servers
  gather_facts: true
  tasks:
    # now() is evaluated when each task runs; ansible_date_time is frozen at fact-gathering time.
    - name: Show execution timing with limited forks
      debug:
        msg: "Host {{ inventory_hostname }} starting at {{ now(utc=true, fmt='%Y-%m-%dT%H:%M:%SZ') }}"

    - name: Simulate work with sleep
      command: sleep 3

    - name: Show completion timing
      debug:
        msg: "Host {{ inventory_hostname }} completed at {{ now(utc=true, fmt='%Y-%m-%dT%H:%M:%SZ') }}"
  2. Create the Job Template in Ascender and set its forks value to 2. Execute the playbook and observe the timing in the job output.

Sample Output

    TASK [Show execution timing with limited forks] ****************************
    ok: [host1] => {
        "msg": "Host host1 starting at 1703520123"
    }
    ok: [host2] => {
        "msg": "Host host2 starting at 1703520123"
    }
    
    TASK [Simulate work with sleep] *********************************************
    changed: [host1]
    changed: [host2]
    
    TASK [Show completion timing] ***********************************************
    ok: [host1] => {
        "msg": "Host host1 completed at 1703520126"
    }
    ok: [host2] => {
        "msg": "Host host2 completed at 1703520126"
    }
    
    TASK [Show execution timing with limited forks] ****************************
    ok: [host3] => {
        "msg": "Host host3 starting at 1703520126"
    }
    ok: [host4] => {
        "msg": "Host host4 starting at 1703520126"
    }

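Notice how only 2 hosts execute simultaneously because the Job Template sets forks to 2. With the default linear strategy, Ansible runs each task on every host, two at a time, before it moves on to the next task, so your job output may group hosts differently from this excerpt.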
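
If you want each pair of hosts to finish every task before the next pair starts, the play-level serial keyword is one way to get that behavior. The sketch below is a hypothetical variation of the second play in forks_demo.yml, not part of the original lab; it reuses the test_servers group and assumes the add_host play runs earlier in the same playbook.

```yaml
# Hypothetical variation of the second play in forks_demo.yml.
# serial: 2 splits test_servers into batches of two hosts; each batch
# runs the whole task list before the next batch begins.
- hosts: test_servers
  gather_facts: false
  serial: 2
  tasks:
    - name: Simulate work with sleep
      command: sleep 3
      changed_when: false

    - name: Report that this host finished its batch
      debug:
        msg: "Host {{ inventory_hostname }} finished its batch"
```

Unlike forks, which caps how many hosts run a given task at once, serial changes the order in which hosts move through the play.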

Things to try

Asynchronous Task Execution

Asynchronous execution allows Ansible to start long-running tasks and continue with other work, checking back later for completion.
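
Before using the fire-and-forget form, it may help to see the blocking form of async. The following is a minimal sketch, not one of the lab files: with poll set above 0, Ansible waits on the task and checks it at that interval, and async acts as a timeout. The 30 and 5 second values are arbitrary choices for illustration.

```yaml
# Minimal sketch: the task still blocks the play, but Ansible polls it
# instead of holding one long-lived connection open.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Run a long task with a timeout, polling until it finishes
      command: sleep 10
      async: 30   # fail the task if the command runs longer than 30 seconds
      poll: 5     # check status every 5 seconds; 0 means start it and move on
      changed_when: false
```

Setting poll to 0, as the lab playbook does, is what lets later tasks run while the command is still in progress.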

  1. Create a playbook called async_demo.yml:
- hosts: localhost
  gather_facts: true
  tasks:
    - name: Start long-running task asynchronously
      command: sleep 10
      async: 15
      poll: 0
      register: long_task
    
    # now() is evaluated on every iteration; ansible_date_time.time would repeat the fact-gathering time.
    - name: Do other work while task runs
      debug:
        msg: "Doing other work at {{ now(fmt='%H:%M:%S') }}"
      loop: "{{ range(1, 4) | list }}"
    
    - name: Check on async task status
      async_status:
        jid: "{{ long_task.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 20
      delay: 1
    
    - name: Show final result
      debug:
        msg: "Long task completed: {{ job_result.finished }}"
    
    - name: Multiple async tasks example
      command: "sleep {{ item }}"
      async: 30
      poll: 0
      register: multiple_tasks
      loop: [5, 7, 3, 9]
    
    - name: Wait for all async tasks to complete
      async_status:
        jid: "{{ async_result_item.ansible_job_id }}"
      loop: "{{ multiple_tasks.results }}"
      loop_control:
        loop_var: async_result_item
      register: async_jobs
      until: async_jobs.finished
      retries: 35
      delay: 1
  2. Execute the playbook to see async behavior:

Sample Output

    TASK [Start long-running task asynchronously] ******************************
    changed: [localhost]
    
    TASK [Do other work while task runs] ***************************************
    ok: [localhost] => (item=1) => {
        "msg": "Doing other work at 15:45:23"
    }
    ok: [localhost] => (item=2) => {
        "msg": "Doing other work at 15:45:24"
    }
    ok: [localhost] => (item=3) => {
        "msg": "Doing other work at 15:45:25"
    }
    
    TASK [Check on async task status] ******************************************
    FAILED - RETRYING: [localhost]: Check on async task status (20 retries left).
    FAILED - RETRYING: [localhost]: Check on async task status (19 retries left).
    ok: [localhost]

Things to try

Processing Data as Hosts

One of Ansible’s most powerful features is the ability to treat arbitrary data as hosts using add_host. A loop processes its items sequentially, one at a time, which can be very slow for large datasets. If you register each item as a host instead, Ansible applies its built-in parallelism (forks) and processes several items at once, up to the forks limit. Lists, sequences, or any other data structure can be handled this way, with each item acting as a separate host, which removes a common bottleneck in data operations.
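
For comparison, the sequential approach described above might look like the following sketch. It is a hypothetical baseline, not one of the lab files, and the /tmp/loop_item_N.txt paths are illustrative names chosen so they do not collide with the files the lab playbook writes.

```yaml
# Hypothetical baseline: one host, one loop, items handled one after another.
- hosts: localhost
  gather_facts: false
  connection: local
  tasks:
    - name: Write one file per item, sequentially
      copy:
        content: |
          Task ID: {{ item }}
          Data: Processing item {{ item }}
        dest: "/tmp/loop_item_{{ item }}.txt"
      loop: "{{ range(1, 100) | list }}"
```

Running this next to data_as_hosts.yml gives a rough feel for how much time the host-based approach saves on your system.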

  1. Create a playbook called data_as_hosts.yml:
- hosts: localhost
  gather_facts: false
  connection: local
  tasks:
    - name: Create dynamic host inventory from sequence data
      add_host:
        name: "task_{{ item }}"
        groups: mydata
        task_id: "{{ item }}"
        task_data: "Processing item {{ item }}"
      loop: "{{ range(1, 100) | list }}"
      changed_when: false

- hosts: mydata
  gather_facts: false
  connection: local
  tasks:
    - name: Create file for each data item
      file:
        path: "/tmp/{{ inventory_hostname }}.txt"
        state: touch
    
    - name: Write data to file
      copy:
        content: |
          Host: {{ inventory_hostname }}
          Task ID: {{ task_id }}
          Data: {{ task_data }}
        dest: "/tmp/{{ inventory_hostname }}.txt"
  2. In the Job Template, set forks to 100. Run the playbook and observe the parallel processing:

Sample Output (illustrative; the play and task names below come from a different version of this playbook)

    PLAY [localhost] ************************************************************
    
    TASK [Create dynamic host inventory from sequence data] ********************
    ok: [localhost] => (item=1)
    ok: [localhost] => (item=2)
    ...
    ok: [localhost] => (item=20)
    
    PLAY [newhosts] *************************************************************
    
    TASK [Show task start time] ************************************************
    ok: [task_1] => {
        "msg": "Starting task_1 (ID: 1) at 15:47:30"
    }
    ok: [task_2] => {
        "msg": "Starting task_2 (ID: 2) at 15:47:30"
    }
    
    TASK [Simulate processing work] ********************************************
    Pausing for 5 seconds
    (ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
    ok: [task_1]
    ok: [task_2]
    
    TASK [Show task completion] *************************************************
    ok: [task_1] => {
        "msg": "Completed task_1 - Processing item 1 at 15:47:35"
    }
    ok: [task_2] => {
        "msg": "Completed task_2 - Processing item 2 at 15:47:35"
    }
    
    TASK [Show task start time] ************************************************
    ok: [task_3] => {
        "msg": "Starting task_3 (ID: 3) at 15:47:35"
    }
    ok: [task_4] => {
        "msg": "Starting task_4 (ID: 4) at 15:47:35"
    }

Things to try

Advanced Parallelism Patterns

  1. Create a playbook demonstrating complex data processing in complex_processing.yml:
- hosts: localhost
  gather_facts: false
  vars:
    processing_data:
      - { id: web01, port: 80, service: nginx, priority: high }
      - { id: web02, port: 80, service: nginx, priority: medium }
      - { id: db01, port: 3306, service: mysql, priority: high }
      - { id: db02, port: 3306, service: mysql, priority: low }
      - { id: cache01, port: 6379, service: redis, priority: medium }
      - { id: app01, port: 8080, service: tomcat, priority: high }
      - { id: app02, port: 8080, service: tomcat, priority: low }
      - { id: app03, port: 8080, service: tomcat, priority: medium }
  
  tasks:
    - name: Create hosts from complex data structures
      add_host:
        name: "{{ item.id }}"
        groups: 
          - services
          - "{{ item.service }}"
          - "{{ item.priority }}_priority"
        service_type: "{{ item.service }}"
        service_port: "{{ item.port }}"
        priority_level: "{{ item.priority }}"
      loop: "{{ processing_data }}"
      changed_when: false

- hosts: services
  gather_facts: false
  connection: local
  # forks is not a play keyword; throttle caps concurrent workers per task.
  throttle: 3
  strategy: free
  tasks:
    - name: Process high priority services first
      block:
        - name: High priority service processing
          debug:
            msg: "Processing HIGH priority {{ service_type }} service {{ inventory_hostname }} on port {{ service_port }}"

        # command runs once per host; pause bypasses the per-host loop.
        - name: Simulate service validation
          command: sleep 2
          changed_when: false

      when: priority_level == "high"

    - name: Process medium priority services
      block:
        - name: Medium priority service processing
          debug:
            msg: "Processing MEDIUM priority {{ service_type }} service {{ inventory_hostname }} on port {{ service_port }}"

        - name: Simulate service setup
          command: sleep 4
          changed_when: false

      when: priority_level == "medium"

    - name: Process low priority services
      block:
        - name: Low priority service processing
          debug:
            msg: "Processing LOW priority {{ service_type }} service {{ inventory_hostname }} on port {{ service_port }}"

        - name: Simulate background processing
          command: sleep 6
          changed_when: false

      when: priority_level == "low"
  2. Create a playbook combining async and data processing in async_data_processing.yml:
- hosts: localhost
  gather_facts: false
  vars:
    batch_size: 5
    total_items: 15
  
  tasks:
    - name: Create processing batches
      add_host:
        name: "batch_{{ item }}"
        groups: processing_batches
        # Run batch jobs locally so the final localhost play can read their async status.
        ansible_connection: local
        batch_id: "{{ item }}"
        start_item: "{{ (item - 1) * batch_size + 1 }}"
        # Clamp here: total_items is a play var and is not visible in later plays.
        end_item: "{{ [item * batch_size, total_items] | min }}"
      loop: "{{ range(1, (total_items // batch_size) + 2) | list }}"
      when: "(item - 1) * batch_size < total_items"
      changed_when: false

- hosts: processing_batches
  gather_facts: false
  # forks is not a play keyword; throttle caps concurrent workers per task.
  throttle: 3
  tasks:
    - name: Start batch processing asynchronously
      shell: |
        echo "Processing batch {{ batch_id }}: items {{ start_item }} to {{ end_item }}"
        for i in $(seq {{ start_item }} {{ end_item }}); do
          echo "Processing item $i in batch {{ batch_id }}"
          sleep 2
        done
        echo "Batch {{ batch_id }} completed"
      async: 30
      poll: 0
      register: batch_job
    
    - name: Show batch started
      debug:
        msg: "Batch {{ batch_id }} started asynchronously with job ID {{ batch_job.ansible_job_id }}"

- hosts: localhost
  gather_facts: false
  tasks:
    - name: Monitor all batch jobs
      async_status:
        jid: "{{ hostvars[item]['batch_job']['ansible_job_id'] }}"
      loop: "{{ groups['processing_batches'] }}"
      register: all_batches
      until: all_batches.finished
      retries: 20
      delay: 2
      
    - name: Show final results
      debug:
        msg: "All batches completed successfully"
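
Once the Monitor all batch jobs task finishes, the registered all_batches variable holds one async_status result per batch. The following is a minimal sketch of a follow-up task that could be appended to the final localhost play to print what each batch wrote to standard output. The task is not part of the original lab, and it assumes the finished job's shell output is returned under stdout.

```yaml
    # Hypothetical extra task for the final localhost play.
    - name: Show output captured from each batch
      debug:
        msg: "{{ item.stdout | default('no output captured') }}"
      loop: "{{ all_batches.results }}"
      loop_control:
        # Keep the per-item label short instead of echoing the whole result.
        label: "{{ item.ansible_job_id | default('unknown job') }}"
```

The default filters are there so the task still prints something if a result lacks one of those keys.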

Sample Output for Complex Processing

    PLAY [services] *************************************************************
    
    TASK [High priority service processing] ************************************
    ok: [web01] => {
        "msg": "Processing HIGH priority nginx service web01 on port 80"
    }
    ok: [db01] => {
        "msg": "Processing HIGH priority mysql service db01 on port 3306" 
    }
    ok: [app01] => {
        "msg": "Processing HIGH priority tomcat service app01 on port 8080"
    }

Performance Comparison Examples

  1. Create a timing comparison playbook performance_test.yml:
- hosts: localhost
  gather_facts: true
  tasks:
    - name: Record start time
      set_fact:
        test_start: "{{ ansible_date_time.epoch }}"
    
    - name: Create test data (serial processing simulation)
      add_host:
        name: "serial_task_{{ item }}"
        groups: serial_test
        task_num: "{{ item }}"
      loop: "{{ range(1, 11) | list }}"
      changed_when: false

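- name: Serial processing test (throttle: 1)
  hosts: serial_test
  gather_facts: false
  connection: local
  # forks is not a play keyword; throttle: 1 runs each task on one host at a time.
  throttle: 1
  tasks:
    # command runs once per host; pause bypasses the per-host loop.
    - name: Simulate work (serial)
      command: sleep 2
      changed_when: false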
      
- hosts: localhost
  gather_facts: true
  tasks:
    - name: Record serial completion time
      set_fact:
        serial_end: "{{ ansible_date_time.epoch }}"
        serial_duration: "{{ ansible_date_time.epoch | int - test_start | int }}"
    
    - name: Create test data (parallel processing simulation)
      add_host:
        name: "parallel_task_{{ item }}"
        groups: parallel_test  
        task_num: "{{ item }}"
      loop: "{{ range(1, 11) | list }}"
      changed_when: false

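- name: Parallel processing test (throttle: 5)
  hosts: parallel_test
  gather_facts: false
  connection: local
  # forks is not a play keyword; throttle: 5 needs a Job Template forks value of at least 5.
  throttle: 5
  tasks:
    # command runs once per host; pause bypasses the per-host loop.
    - name: Simulate work (parallel)
      command: sleep 2
      changed_when: false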

- hosts: localhost
  gather_facts: true
  tasks:
    - name: Calculate timing results
      set_fact:
        parallel_end: "{{ ansible_date_time.epoch }}"
        parallel_duration: "{{ ansible_date_time.epoch | int - serial_end | int }}"
    
    - name: Show performance comparison
      debug:
        msg: |
          Performance Comparison:
          Serial (1 fork): {{ serial_duration }} seconds
          Parallel (5 forks): {{ parallel_duration }} seconds
          Performance improvement: {{ ((serial_duration | int - parallel_duration | int) / serial_duration | int * 100) | round(1) }}%

Sample Output

    TASK [Show performance comparison] ******************************************
    ok: [localhost] => {
        "msg": "Performance Comparison:\nSerial (1 fork): 20 seconds\nParallel (5 forks): 4 seconds\nPerformance improvement: 80.0%\n"
    }
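
The numbers in this sample follow from simple arithmetic: 10 hosts handled one at a time need 10 rounds of 2 seconds (20 seconds), while 5 at a time need 2 rounds (4 seconds), ignoring task overhead. A minimal sketch of that estimate as a standalone play follows; host_count, work_seconds, and worker_counts are hypothetical variable names chosen for illustration.

```yaml
# Rough estimate only: rounds = ceil(hosts / workers), time = rounds * seconds.
- hosts: localhost
  gather_facts: false
  vars:
    host_count: 10
    work_seconds: 2
    worker_counts: [1, 5]
  tasks:
    - name: Estimate wall-clock time for each worker count
      debug:
        msg: >-
          {{ item }} worker(s): about
          {{ ((host_count / item) | round(0, 'ceil') | int) * work_seconds }}
          seconds
      loop: "{{ worker_counts }}"
```

Real runs will come in somewhat higher because connection setup and module execution add time to every round.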

Things to try

Real-World Applications
