*pokes the hachyderm*

Just restarted pgbouncer on the @hachyderm database server, hopefully that fixes our 500s. Very strange that we only started seeing this issue yesterday though.

*pokes the hachyderm again*

Sorry if there's been some outage! Had a few issues with the fix, but we should now have it right.

@thisismissem
I just saw this post now. Thinking back, I thought I was having issues with my self-hosted DNS. 🤣

Emelia 👸🏻

@badnetmask nah, we'd accidentally configured pgbouncer for 2000-2500 connections, but the process had a LimitNOFILESoft of 1024, so we were running out of file descriptors for sockets, and accept() calls were failing when Mastodon tried to connect to the database.
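To make the mismatch above concrete, here is a rough Python sketch that compares a pooler's configured connection counts against an open-file limit. The function names, the `reserve` headroom, and the server-side connection count of 100 are all hypothetical and not taken from the thread; the real descriptor usage of pgbouncer depends on its full configuration. Note also that `resource.getrlimit` reports the limit of the process running the script, not of the pgbouncer service, so the limit is best passed in explicitly.

```python
import resource


def descriptors_needed(max_client_conn: int, server_conns: int, reserve: int = 64) -> int:
    """Rough upper bound on descriptors held at once (an assumed model:
    one per client socket, one per server socket, plus some headroom
    for log files, listening sockets, and the like)."""
    return max_client_conn + server_conns + reserve


def check_fd_budget(max_client_conn, server_conns, soft_limit=None, reserve=64):
    """Return (fits, needed). If soft_limit is None, fall back to the
    soft RLIMIT_NOFILE of the *current* process (Unix only)."""
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = descriptors_needed(max_client_conn, server_conns, reserve)
    if soft_limit == resource.RLIM_INFINITY:
        return True, needed
    return needed <= soft_limit, needed


# Numbers from the thread: about 2000 client connections against a soft limit of 1024.
fits, needed = check_fd_budget(2000, 100, soft_limit=1024)
print(f"needs ~{needed} descriptors, fits within limit: {fits}")
```

Running a check like this at deploy time, with the limit read from the service definition rather than from the current shell, would flag the configuration before connections start failing.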

The issue only surfaced 2 days ago, but we hadn't changed the server configuration in 20 days. We'd just upgraded system packages on non-database nodes in our infrastructure, so we thought we'd broken a dynamically linked library used by Ruby, like libvips or libicu.

We thought we'd fixed it yesterday, but had the wrong root cause. The error message was cryptic and the promtail log lines dropped the stack trace lines that followed, so we only had partial information & thought the error must've come from the upgrade, when in reality the timing was just a freak coincidence.
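The dropped stack trace lines are a common failure mode when a log pipeline treats every line as its own entry. As an illustration only (this is not how promtail is implemented, and the timestamp-prefixed log format is an assumption), a minimal Python sketch of grouping continuation lines with the entry they belong to could look like this:

```python
import re
from typing import Iterable, Iterator

# Assumption: every new log entry starts with an ISO-style date such as
# "2024-01-01 12:00:00"; anything else is a continuation (e.g. a stack frame).
FIRST_LINE = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]")


def group_entries(lines: Iterable[str], first_line: re.Pattern = FIRST_LINE) -> Iterator[str]:
    """Yield whole log entries, keeping continuation lines attached
    to the first line that precedes them."""
    current = []
    for line in lines:
        line = line.rstrip("\n")
        if first_line.match(line) and current:
            yield "\n".join(current)
            current = []
        current.append(line)
    if current:
        yield "\n".join(current)


sample = [
    "2024-01-01 12:00:00 ERROR connection failed",
    "  at connect (hypothetical_frame_1)",
    "  at checkout (hypothetical_frame_2)",
    "2024-01-01 12:00:01 INFO retrying",
]
for entry in group_entries(sample):
    print(repr(entry))
```

With grouping like this in place, the error line and its stack trace travel together, so the trace is still there when someone searches for the error.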

Finally figured it out earlier & deployed a fix. But it took a few tries to get it right.

We also now have alerts in place if pgbouncer errors again (we're in the process of rebuilding the primary database server & switching to pgcat, which has much greater observability).

The other option was that it was a networking error, but we could access the database node via the tailnet that connects our infrastructure.

@thisismissem so in the end, the database issue that we thought wasn't a database issue was, in fact, a database issue. fun :blobfoxscience:

@esk I know right?? That error line was so confusing — we probably need to recalculate how many database connections we're configured for, too

@thisismissem @esk
Thanks for the explanation. Those OS-level file limits are always the last thing anyone looks at. Been there, done that (multiple times). 😄