Octopus.com sign-in, docs, blogs and downloads unavailable
Incident Report for Octopus Deploy
Postmortem

Between 8:00am and 10:30am UTC, February 9, 2023, sections of octopus.com intermittently returned 503 responses. The affected routes were /signin, /blogs, and /docs.

Background

Octopus Deploy recently migrated our DNS management to a new provider to centralize our infrastructure.

During the migration, we set the web application firewall (WAF) in front of octopus.com to detection mode. At the same time, we tuned the ruleset to prevent false positives from blocking legitimate customer access to Octopus systems.

Key timings

Timeline

(All dates and times below are shown in UTC.)

Thursday, February 9, 2023

08:05 Our automated systems detected decreased availability in sections of the octopus.com website.

08:35 Engineers on call were notified.

08:56 Status Page updated: An incident was declared.

10:30 We updated the WAF to block malicious traffic.

10:48 Status Page updated: Incident status changed to `Monitoring`.

12:24 Status Page updated: Incident status changed to `Resolved`.

What happened?

An attacker ran a fuzzing application against our public-facing website while the WAF was in “detection” mode. In that mode the WAF did not block the traffic it would normally have stopped, so the resulting load reduced the availability of octopus.com.

Engineers mitigated the outage by applying a cut-down implementation of the WAF that protected the website from single-origin attacks.
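
For readers unfamiliar with this kind of mitigation, the sketch below shows one common way to limit traffic from a single origin: a sliding-window request counter keyed by source address. It is an illustration only and does not describe the rule set we applied. The class name `PerOriginRateLimiter`, its parameters, and the thresholds in the usage example are all hypothetical.

```python
import time
from collections import defaultdict, deque


class PerOriginRateLimiter:
    """Hypothetical sliding-window limiter keyed by client origin (e.g. source IP)."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests      # allowed requests per window (assumed value)
        self.window_seconds = window_seconds  # window length in seconds (assumed value)
        self.clock = clock                    # injectable clock, useful for testing
        self._hits = defaultdict(deque)       # origin -> timestamps of accepted requests

    def allow(self, origin):
        """Return True if a request from `origin` is within its limit."""
        now = self.clock()
        hits = self._hits[origin]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # this origin has used up its budget; reject
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = PerOriginRateLimiter(max_requests=100, window_seconds=60)
    # 203.0.113.7 is a documentation-range address used purely as an example.
    print(limiter.allow("203.0.113.7"))
```

A limiter like this only constrains a single noisy client. It does nothing against traffic spread across many origins, which is one reason a full WAF rule set remains the longer-term control.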

Remediation and next steps

Since this incident, we've completed the migration to our new DNS provider, and the WAF is fully enabled.

During our incident review process, we identified and corrected gaps in our defense to reduce the time from detection to mitigation.

We identified the internal oversight in risk management that led to this situation: by mitigating one risk, we became susceptible to another. We have since updated our project risk assessment process to include more formal internal reviews of our planned changes to core systems.

Conclusion

Octopus Deploy takes service availability seriously. In the past month, we’ve had multiple incidents affecting sign-in infrastructure, which is below our desired standard. We apologize for the disruption to our customers and are working to reduce the likelihood and severity of future disruptions.

Posted Feb 20, 2023 - 04:00 UTC

Resolved
This incident has been resolved.
Posted Feb 09, 2023 - 12:24 UTC
Monitoring
We have applied a mitigation that will improve the availability of the affected URLs and are monitoring its effects.
Posted Feb 09, 2023 - 10:48 UTC
Investigating
We are aware of issues affecting parts of Octopus.com including /signin, /docs, /blog, and /downloads. Engineers are investigating.
Posted Feb 09, 2023 - 08:56 UTC
This incident affected: Octopus.com / OctopusId.