So, running a home #k8s cluster, something about Postgres HA has been bugging me a lot. When there's an electric blackout (this is #Spain after all), all the pods go down. When they come back up, the Postgres replicas go through a process to sync and elect a new master, and this takes time.
Meanwhile, pgpool will hand out successful database connections to apps in pods, but they are read-only.
I think what happens with an app like #Matrix #Synapse is that it opens its pool of database connections at start-up and, since that succeeds, it just carries on. When it later actually tries to run updates and inserts, it gets errors, but by then they aren't fatal; it only logs them. Or it would log them, if logging weren't off by default for privacy and security reasons.
Even once the new master has been chosen, the initial read-only database connections are never upgraded to read-write, because the application doesn't expect this kind of failure.
Meanwhile the Matrix server keeps running in a highly degraded mode, unable to persist the messages that are sent. It can only relay them to clients that are online and connected at that moment. This leads to users getting diverging views of the messages on channels.
I solved this by adding an initContainer that checks for a read-write connection to Postgres before the Synapse pod starts up, but it's a hack.
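For anyone curious what such a check could look like, here's a rough Python sketch. It is not necessarily what my initContainer runs; the function names (`is_writable`, `wait_for_read_write`), the probe table name and the timeouts are made up for illustration. The idea is to try a throwaway write (creating a temporary table), which a read-only standby should reject, and to keep retrying until it works. `connect` is any callable that returns a DB-API connection.

```python
import time


def is_writable(conn):
    """Return True if a throwaway write succeeds on this DB-API connection."""
    try:
        cur = conn.cursor()
        # Assumption: a read-only (standby) backend rejects this statement
        # with a read-only transaction error, which we treat as "not ready".
        cur.execute("CREATE TEMPORARY TABLE rw_probe (id integer)")
        conn.rollback()  # discard the probe table, leave nothing behind
        return True
    except Exception:
        return False


def wait_for_read_write(connect, timeout=300.0, interval=5.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Retry until a fresh connection accepts writes or the timeout expires.

    Returns True once a write succeeds, False if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        try:
            conn = connect()  # always use a new connection for each attempt
        except Exception:
            conn = None  # database or pgpool not reachable yet
        if conn is not None:
            try:
                if is_writable(conn):
                    return True
            finally:
                conn.close()
        if clock() >= deadline:
            return False
        sleep(interval)
```

In an initContainer this could be called with something like `wait_for_read_write(lambda: psycopg2.connect(dsn))` (assuming psycopg2 is the driver and `dsn` points at pgpool), exiting non-zero on `False` so Kubernetes restarts the check and Synapse doesn't start against a read-only database. Opening a fresh connection per attempt matters here, since the whole problem is that old connections stay read-only.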