Lorin Hochstein

I have no information about how this incident came to be, but I can confidently predict that people will blame it on greedy execs and sloppy devs, regardless of what the actual details are. And they will therefore learn nothing from the details.

@norootcause spoiler: it will turn out to be a process gap, as almost always, and people will waste a lot of time looking for an individual to blame

@norootcause one of my favorite thought exercises here is "let's say this was entirely due to one person's total failure to follow process, or even acting with malice: it's still a process gap if one single person's actions (or lack thereof) have the potential to cause this kind of outage."

It's always a process gap.

@darkuncle @norootcause
I am guessing that CrowdStrike doesn't do gradual rollout, which would seem to be a well-known best practice. (Of course, it costs extra to develop and use such a system.)

@norootcause It's way too public and people need that simple answer. :(

@dtauvdiodr I wouldn’t be surprised if there was a congressional hearing!

@norootcause honestly, I'm really hoping we eventually learn what mitigations they put in place. It's all well and good for us to say you should have secondary access channels or automated rollbacks or whatever else. But that's a hard enough problem in userspace application code. How do you even do these things in the bootloader?