Ubuntu Dynamic Workers are failing to lease

Incident Report for Octopus Deploy

Postmortem

Summary

On 2 Feb 2026, between 20:13:34 and 22:56:04 UTC, Octopus Cloud customers in West US 2 and West Europe may have experienced failed deployments or failed runbook runs because Ubuntu Dynamic Worker steps timed out while leasing a worker.

This disruption was caused by Azure failing to provision Virtual Machines across multiple regions. See Azure Issue FNJ8-VQZ on Azure Status History.

Background

Octopus Cloud Dynamic Workers are isolated virtual machines that we provide as part of our Octopus Cloud Subscription offering as a way to execute deployment and runbook steps and scripts, without needing to run on the Octopus Server or deployment targets themselves. Customers can use both Windows and Ubuntu Dynamic Workers. 

Octopus maintains pools of these virtual machines. When a deployment or runbook step requires one, your Octopus Cloud instance exclusively leases a freshly provisioned Dynamic Worker VM from the appropriate pool for a limited time.

Dynamic Workers Lifecycle

  1. Provisioning - a new Azure Virtual Machine is provisioned, using the requested Dynamic Worker image (Windows/Ubuntu with a set of pre-installed tools).
  2. Pool - the newly created Dynamic Worker (VM) is placed in a worker pool until an instance requests a Dynamic Worker in one of its deployment/runbook steps. Each region (US, Europe and Australia) has separate pools for Windows and Ubuntu workers, with additional standby pools that can be turned on during a temporary outage in an Azure region (see below). Octopus Cloud continuously monitors the pools’ levels and provisions new workers automatically to keep them full.
  3. Leasing - When a deployment/runbook that uses a Dynamic Worker starts, the Octopus Server first checks whether the instance already has a leased worker. If it does, the instance keeps using that worker and the lease is extended for the new run. If it doesn’t, the Octopus Server requests a new worker from the appropriate pool. The worker is exclusively leased to this instance until it is no longer needed (i.e. it hasn’t been used for an hour) or until it reaches its maximum lifespan (3 days by default).
  4. Deletion - After a worker is no longer needed or has reached its maximum lifespan, it is considered expired and is deleted by the system automatically. The next time the same instance requires a worker, it leases a new one from the pool (see above).
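The lease expiry rule in steps 3 and 4 can be sketched in a few lines of Python. This is an illustration only, not the Octopus implementation: the `Lease` class and its field names are hypothetical, and it assumes the maximum lifespan is measured from the start of the lease, which the description above does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Values taken from the lifecycle description; the constant names are hypothetical.
IDLE_TIMEOUT = timedelta(hours=1)  # "hasn't been used for an hour"
MAX_LIFESPAN = timedelta(days=3)   # "3 days by default"


@dataclass
class Lease:
    leased_at: datetime  # when the worker was first leased to the instance
    last_used: datetime  # when a deployment/runbook last used the worker

    def is_expired(self, now: datetime) -> bool:
        """A lease expires once the worker has been idle too long or is too old."""
        idle_too_long = now - self.last_used >= IDLE_TIMEOUT
        # Assumption: lifespan counted from lease start, not from provisioning.
        too_old = now - self.leased_at >= MAX_LIFESPAN
        return idle_too_long or too_old

    def extend(self, now: datetime) -> None:
        """A new run on an already-leased worker resets the idle timer only."""
        self.last_used = now
```

Under this sketch, extending a lease never pushes a worker past its maximum lifespan, which matches the note later in this report that an instance whose worker expired during the outage had to request a new one.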

System Resilience & Safeguards

Octopus Cloud implements multiple layers of protection to ensure Dynamic Workers’ availability and minimize customer impact in cases of service disruptions:

  • Pre-provisioned Worker Pools: We maintain multiple dynamic worker pools (one for each virtual machine operating system and size) with ready-to-use workers, so that a worker is immediately available when a deployment/runbook is triggered rather than being provisioned on demand. This also provides a safety buffer when we can’t provision new Dynamic Workers due to temporary outages.
  • Standby Services in Multiple Regions: We maintain standby Dynamic Worker services in alternate Azure regions that we activate to provide continuity when a primary region experiences issues.

Timeline and Impact

All dates and times below are in UTC

Feb 2 2026:

18:52: 1st failed Dynamic Worker provisioning - At this point, our Dynamic Workers Service continued supplying workers successfully from the pools. However, since we couldn’t provision new workers, the pools started depleting. 

20:13: 1st Ubuntu Dynamic Worker lease failed in the West US 2 Azure region (once the pool was depleted) - start of customer impact

20:47: Octopus on-call was paged after 3 Dynamic Worker lease requests failed in West US 2. The on-call engineer then opened an incident to investigate the issue

20:47-21:20

  • Octopus engineers started setting up a Dynamic Workers Service in a different Azure region in the US to mitigate the issue.
  • During the investigation we saw that Dynamic Workers were also failing to provision in the West Europe and East Australia Azure regions. At this point we realized that this was a multi-region outage in Azure and decided to open a support ticket with Azure.
  • Octopus engineers turned off non-essential services (e.g. Instance Upgrades) to preserve the available Dynamic Workers for customer use.

21:24: An on-call engineer opened a Sev A support ticket with Azure

21:31: We published the initial partial outage alert for Octopus Cloud on https://status.octopus.com/ 

21:33: 1st Dynamic Worker lease failed in West Europe

21:42: Azure acknowledged the multi-region issue and reported that they were investigating it.

22:56: We saw the last Dynamic Worker lease failure. After this time, we were able to provision the required Virtual Machines and successfully fulfil all Dynamic Worker lease requests.

23:46: After verifying that all Dynamic Worker pools had been restored and that there were no additional provisioning failures, we updated the incident status to “Mitigated” and updated the Status page.

Feb 3 2026:

06:05: Azure confirmed that the issue was fully resolved on their side.

21:38: Incident was resolved and Status page updated

Technical Details

Octopus Cloud uses Azure Virtual Machines to supply Dynamic Workers to customers. During this Azure outage, we couldn’t provision new Azure Virtual Machines for our Ubuntu Dynamic Workers. After the first provisioning failure, our pre-provisioned Dynamic Worker pools continued supplying Dynamic Workers for a further

  • 1:21 hours in West US 2 and 
  • 2:41 hours in West Europe 

before they were depleted and customer requests for new Dynamic Workers started failing. It’s worth noting that East Australia customers were not impacted because the pools in this region didn’t deplete during the incident. Additionally, customers that already had a Dynamic Worker leased at the time of the outage were not impacted (unless their Dynamic Worker expired so a new one was requested during the incident).
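The buffer durations quoted above follow from the timeline: each is the gap between the first failed provisioning (18:52) and the first lease failure in that region. A minimal check of that arithmetic, using only timestamps from this report (the variable and function names are illustrative):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Timestamps copied from the timeline in this report (UTC).
first_provisioning_failure = datetime.strptime("2026-02-02 18:52", FMT)
first_lease_failure = {
    "West US 2": datetime.strptime("2026-02-02 20:13", FMT),
    "West Europe": datetime.strptime("2026-02-02 21:33", FMT),
}


def buffer_hhmm(region: str) -> str:
    """How long the pool kept supplying workers, formatted as H:MM."""
    delta = first_lease_failure[region] - first_provisioning_failure
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}:{minutes % 60:02d}"


for region in first_lease_failure:
    print(region, buffer_hhmm(region))
# West US 2 1:21
# West Europe 2:41
```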

We made preparations to switch to our Dynamic Workers Standby Service in different Azure regions. However, once we realized that this was a multi-region outage, we decided to revert the switch since it wouldn’t have resolved the issue.

Once we saw that we couldn’t supply Dynamic Workers from our standby regions, we turned off non-essential services (e.g. Instance Upgrades) to preserve the available Dynamic Workers for customer use. 

Once Azure mitigated the issue on their side, our system recovered automatically and resumed providing new Dynamic Workers successfully. 

Remediation

Octopus takes service availability seriously. Despite the difficulty with upstream cloud provider outages, especially ones that are widespread across multiple regions, we fully review and remediate any outages that occur. We do this so that we’re continuously improving and maintaining the best possible service we can.

Following our postmortem, we identified the following improvements to our system to help identify and mitigate (where possible) similar issues earlier:

  • Page the on-call earlier - add an alert to page the on-call when multiple Dynamic Workers in the same region fail to provision. This will allow us to detect similar incidents more quickly and give us more time to mitigate the issue, before any Dynamic Worker leases fail and customers are impacted.
  • Improve our Dynamic Workers Incident Playbook to identify multi-region incidents more quickly, so that we can engage Azure support earlier to resolve the root cause of similar incidents.
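As a rough illustration of the first improvement, a per-region sliding-window alert on provisioning failures could look like the sketch below. This is not our production alerting: the class name, the threshold of 3 failures and the 15-minute window are all hypothetical (the remediation only says "multiple" failures), and the sketch assumes failures are recorded in chronological order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical thresholds chosen for illustration only.
FAILURE_THRESHOLD = 3
WINDOW = timedelta(minutes=15)


class ProvisioningFailureAlert:
    def __init__(self, threshold: int = FAILURE_THRESHOLD, window: timedelta = WINDOW):
        self.threshold = threshold
        self.window = window
        self._failures = defaultdict(deque)  # region -> recent failure timestamps

    def record_failure(self, region: str, at: datetime) -> bool:
        """Record a failed provisioning; return True if the on-call should be paged."""
        recent = self._failures[region]
        recent.append(at)
        # Drop failures that have fallen out of the sliding window.
        while recent and at - recent[0] > self.window:
            recent.popleft()
        return len(recent) >= self.threshold

    def alerting_regions(self, now: datetime) -> list[str]:
        """Regions over the threshold; more than one hints at a multi-region outage."""
        return sorted(
            region
            for region, recent in self._failures.items()
            if sum(1 for t in recent if now - t <= self.window) >= self.threshold
        )
```

Because the alert fires on provisioning failures rather than lease failures, it would trigger while the pre-provisioned pools are still absorbing demand. Checking `alerting_regions` for more than one entry is one simple way a playbook step could flag a possible multi-region outage.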

Conclusion

We apologize to our customers for any disruption and inconvenience as a result of this incident.

We have started work on the identified remediations to ensure that we can detect similar incidents more quickly and reduce the impact on our customers as much as possible.

Posted Feb 13, 2026 - 05:10 UTC

Resolved

Azure has resolved the issue that was causing our Dynamic Workers lease failures and we haven't seen any additional failures since yesterday.
Posted Feb 03, 2026 - 21:38 UTC

Update

We are continuing to monitor for any further issues.
Posted Feb 02, 2026 - 23:49 UTC

Monitoring

Azure have rolled out a fix for this issue and we are seeing Dynamic Workers return to normal operation across all regions.
Posted Feb 02, 2026 - 23:46 UTC

Identified

Azure have identified the following issue on their side that affects the Dynamic Workers: Virtual Machines and dependent services - Service management issues in multiple regions. They are actively working to mitigate impact and expect it to be resolved by approximately 00:00 UTC. See Azure status page for additional details: https://azure.status.microsoft/en-gb/status
Posted Feb 02, 2026 - 22:09 UTC

Investigating

Octopus Cloud instances are failing to lease Ubuntu Dynamic Workers due to an issue with our upstream provider. We are currently investigating and working on mitigating the issue.
Posted Feb 02, 2026 - 21:31 UTC
This incident affected: Octopus Cloud.