hachyderm.io is one of the many independent Mastodon servers you can use to participate in the fediverse.
Hachyderm is a safe space, LGBTQIA+ and BLM, primarily comprised of tech industry professionals worldwide. Note that many non-user account types have restrictions - please see our About page.

Ellie Huxtable

What’s your worst ops blunder?

Mine is the time I took down an entire AWS account by misspelling an environment variable 😳

Orrrr the time I accidentally deleted all the code from self hosted GitLab right before end of day

@ellie I have a couple of things that I did/saw/took part in, so much worse than those, that they're actually classified and I can't talk about them 😅

@ellie not mine but a co-worker was going to delete a folder on a traffic node for a customer, and issued the command `rm -rf <foldername>*`

Except his copy/paste had grabbed a space at the end of the folder name.......
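For anyone wondering why one stray space is so destructive: the shell splits the command on whitespace before it expands wildcards, so the `*` becomes an argument of its own. Here is a minimal Python sketch of that splitting step, using the standard `shlex` module to approximate how a POSIX shell tokenizes the line. The folder name `customer_data` and the helper `rm_targets` are made up for illustration, and nothing is actually deleted.

```python
import shlex


def rm_targets(command: str) -> list[str]:
    """Return the path arguments rm would receive, before glob expansion."""
    args = shlex.split(command)
    # Skip the program name and option flags such as -rf.
    return [arg for arg in args[1:] if not arg.startswith("-")]


# Hypothetical folder name, used only for this illustration.
intended = rm_targets("rm -rf customer_data*")
pasted = rm_targets("rm -rf customer_data *")

print(intended)  # ['customer_data*']
print(pasted)    # ['customer_data', '*']
```

In the second case the lone `*` expands to everything in the current directory, which is why previewing the expansion first (for example with `echo` in place of `rm`) is a cheap habit.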

@transitory oh shit 😭 although I think I’ve done something similar 🙈

@ellie That time I almost pushed customer's code to #NPM. #Yarn didn't honor publishConfig setting so didn't push to our own package repo but straight to npmjs, and didn't display which repo it was pushing to... Saved by NPM not publishing packages under an organisation unless the organisation is created first.

@ellie At one point, I was working as one of the sysadmins of a university. We had every device (computer, printer, you name it) on the internal network for Reasons.

I accidentally sent a 300-page print job to all of the printers, instead of just one. That would not have been so bad, because we could have cancelled them, if only sending a gazillion large print jobs all over the place had not killed the network while we were remote, an hour's drive from campus.

@algernon ahahaha oh my god 💀 so much paper! What were you printing?

@ellie I do not remember anymore, unfortunately. It was some university stuff for a prof, I think. Something he definitely did not need hundreds of copies of. =)

@algernon I am fairly sure we call that "redundancy"

@ellie
I'm more dev than ops, but I did manage to freeze our production base for ~15 minutes by turning on the child safety feature of my DB client and not realising it had turned off auto-commit until someone pinged me asking why I had more than a thousand connections to the base.

Does that count?

@ellie once enabled *every* email account managed by Yahoo small business. we all have to learn about the importance of WHERE in your SQL statements at some point. I'm lucky mine was pretty benign :)
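One way to take the edge off a forgotten WHERE is to run the statement inside a transaction and check how many rows it touched before committing. The following is a minimal sketch with Python's built-in `sqlite3` module. The `accounts` table, the `guarded_update` helper, and the `max_rows` limit are all assumptions for illustration, not anything from the actual system described above.

```python
import sqlite3


def guarded_update(conn, sql, params=(), max_rows=1):
    """Run an UPDATE, rolling back if it touches more rows than expected."""
    cur = conn.execute(sql, params)
    if cur.rowcount > max_rows:
        conn.rollback()  # undo the oversized change before anyone sees it
        raise RuntimeError(
            f"refusing to commit: {cur.rowcount} rows affected, "
            f"expected at most {max_rows}"
        )
    conn.commit()
    return cur.rowcount


# Hypothetical table with five disabled accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, enabled INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts (id, enabled) VALUES (?, 0)", [(i,) for i in range(1, 6)])
conn.commit()

try:
    guarded_update(conn, "UPDATE accounts SET enabled = 1")  # WHERE forgotten
except RuntimeError as exc:
    print(exc)

updated = guarded_update(conn, "UPDATE accounts SET enabled = 1 WHERE id = ?", (3,))
enabled = conn.execute("SELECT COUNT(*) FROM accounts WHERE enabled = 1").fetchone()[0]
print(updated, enabled)
```

The first call is rolled back, so only the account with id 3 ends up enabled.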

@reassuringurl I think we've all run a query that went a bit wrong at some point!

@ellie I pasted a fairly high privileged root password on a public logged IRC channel accidentally

@ellie I removed the awsOrganizationAccessRole for one of our accounts by accident, which locked all users out of that account.

@ellie@hachyderm.io not mine but a co-worker was going to delete a folder on a traffic node for a customer, and issued the command `rm -rf <foldername>*`

Except his copy/paste had grabbed a space at the end of the folder name.......

@rachel oh god 😭

I hope everything was backed up ok!

Edit, seeing the dates: …not entirely sure why mastodon just notified me of this

@ellie For some reason, I thought I needed access to prod DB from my dev machine (basically me being lazy and not doing the debugging and reproduction properly)

So the DB_URL env var pointed to the live database with payment details of over a million users in it.

Guess what happened when an hour later, I ran the tests in that terminal? Yeah, those with the before() function that called db_reset().

(We had backups. And failover. We were down for less than a minute. Still...)
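A common guard against this kind of accident is to make the destructive test helper refuse to run unless the database URL points somewhere obviously local. Below is a minimal Python sketch of that idea. Only the names `DB_URL` and `db_reset` come from the story above. The `SAFE_HOSTS` allowlist, the `is_safe_test_db` helper, and the example URLs are assumptions, and the actual reset logic is left as a placeholder.

```python
import os
from urllib.parse import urlparse

# Assumption: tests should only ever reset a database on the local machine.
SAFE_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_safe_test_db(url: str) -> bool:
    """True only when the URL's host is on the local allowlist."""
    return urlparse(url).hostname in SAFE_HOSTS


def db_reset(url=None) -> None:
    """Wipe the test database, but refuse anything that is not local."""
    url = url if url is not None else os.environ.get("DB_URL", "")
    if not is_safe_test_db(url):
        # Report only the host, never the full URL, which may hold credentials.
        host = urlparse(url).hostname
        raise RuntimeError(f"db_reset refused: host {host!r} is not a local test database")
    # Placeholder: the real drop-and-recreate logic would go here.


# Hypothetical URLs for illustration.
print(is_safe_test_db("postgres://localhost/app_test"))
print(is_safe_test_db("postgres://user:pw@db.prod.example.com:5432/app"))
```

An allowlist is used rather than a blocklist of production hosts, so a new or renamed production server is rejected by default.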

@berkes oh ouchhhh I bet that one stung 😭😭😭

@ellie I once ran a data migration on an old codebase, updating the user model, not understanding that this update would trigger real emails to all users. I only realized after notifications to around 10k people had been sent out, notifying them about “upcoming” conferences that had happened years prior. Almost gave me a heart attack, super embarrassing.

@ellie honestly, I never had anything worse than some git conflicts. I’ve had colleagues do drop db on prod in 2 separate companies, and I had to do the saving. Does that count?

@ellie I deleted a full prod database because someone hadn't enabled backups or named it, and I thought it was a leftover database.

That was a nice post mortem.

@ellie

The scene: AWS console, EC2 instances tab.
Me: clicks "select all", and then "terminate instances"
Me, a half second later: "oh SHIT it didn't save my filters, I just terminated every EC2 instance in the account."

Fortunately it was just our dev account and we could recreate everything with terraform but.... Still.

@ellie many many years ago, running a large Solaris prod box for the first time, after Linux. ‘killall’ command behaviour was different.