slamp

I investigated zombie processes on my home server today. Here is a thread.

It started with: "Why is the server slow today?" after I SSHed into it and ran `ls`.

I looked at the last 7 days of the dashboard but found nothing. Well, in fact there is a small memory leak (+1% per day), but I didn't notice it at first sight.

1/n

Then I checked the running processes.
- Why the hell do I have 30K tasks?
- Oops, it's 30997 zombie processes!

2/n

Definition: A zombie process is a process that has completed execution but still has an entry in the process table.

Causes: Zombie processes occur when child processes have completed execution, and their exit status needs to be read by the parent process.

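Effects: A zombie has already released its memory and file descriptors, but it still holds a process table entry and a PID, so a large number of them leaks kernel resources and can eventually exhaust the available PIDs.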

The presence of a few zombie processes is usually harmless, but having too many can indicate a bug in the parent process.
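
To make the definition concrete, here is a minimal sketch that creates one zombie on purpose and then reaps it. It assumes Linux (it reads `/proc`) and Python 3; the function names `proc_state` and `make_and_reap_zombie` are made up for this illustration.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state of `pid` from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so the state is the first field after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def make_and_reap_zombie():
    """Fork a child that exits at once; report its state before and after wait."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately

    # Parent: the child is dead but not yet reaped, so it should show up as 'Z'.
    state = None
    for _ in range(200):
        state = proc_state(pid)
        if state == "Z":
            break
        time.sleep(0.01)

    os.waitpid(pid, 0)  # reading the exit status is what removes the zombie
    still_listed = os.path.exists(f"/proc/{pid}")
    return state, still_listed


if __name__ == "__main__":
    print(make_and_reap_zombie())
```

The child stays in state `Z` for as long as the parent does not call `waitpid`, which is exactly the situation of a buggy parent.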

3/n

Let's kill them!
I can't: zombie processes cannot be killed with signals, not even `SIGKILL`, since they are already dead.

It explains their name: The term 'zombie process' is metaphorical, comparing it to an 'undead' person that has not been 'reaped'.

To remove zombie processes, the parent process should be signaled (e.g., SIGCHLD) to read the child's exit status, or the parent process can be terminated if it is unresponsive.
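
As an illustration of what "reading the child's exit status" looks like in a well-behaved parent, here is a small sketch of a `SIGCHLD` handler. It assumes Python 3 on a POSIX system; `reap_children`, `install_sigchld_reaper` and the `reaped` list are hypothetical names, not something from the tools in this thread.

```python
import os
import signal

reaped = []  # (pid, raw wait status) of every child collected so far


def reap_children(signum=None, frame=None):
    """Collect every child that has already exited, without blocking."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # this process has no children at all
        if pid == 0:
            return  # children exist, but none has exited yet
        reaped.append((pid, status))


def install_sigchld_reaper():
    """Ask the kernel to run reap_children whenever a child exits."""
    signal.signal(signal.SIGCHLD, reap_children)
```

The loop matters: several children can exit before the handler runs once, so a single `waitpid` call per signal could still leave zombies behind.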

4/n

`ps -A -ostat,pid,ppid | grep -e '[zZ]' | tail -10`

I used `tail` to avoid listing the 30k processes, and `ppid` to list the parent process ID.
Then I killed the parents, but kept one for investigation.

`sudo kill -9 240816 236637`
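
With 30k zombies, grouping them by parent can be more useful than reading the last lines of `ps`. Here is a rough sketch of that idea, assuming Linux with `/proc` and Python 3; `parse_stat` and `zombies_by_parent` are names invented for this example.

```python
import os
from collections import Counter


def parse_stat(text):
    """Parse the content of /proc/<pid>/stat into (pid, comm, state, ppid)."""
    pid_part, rest = text.split("(", 1)
    # comm may itself contain spaces or parentheses, so cut at the last ')'.
    comm, fields = rest.rsplit(")", 1)
    fields = fields.split()
    return int(pid_part), comm, fields[0], int(fields[1])


def zombies_by_parent(proc_root="/proc"):
    """Count processes in state 'Z', grouped by parent PID."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                _, _, state, ppid = parse_stat(f.read())
        except OSError:
            continue  # process vanished or is not readable
        if state == "Z":
            counts[ppid] += 1
    return counts


if __name__ == "__main__":
    # Print the ten parents that own the most zombies.
    for ppid, count in zombies_by_parent().most_common(10):
        print(ppid, count)
```

The parents at the top of that list are the ones worth inspecting before killing anything.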

5/n

Time for investigation.

I checked one of the parents and found `[ssl_client] <defunct>`

I checked a second parent and found `[wget] <defunct>`

This reminded me that the last change I made was to enable certificates for most services.

`wget` is used in the healthcheck section inside the Docker Compose file, but that doesn't explain the zombies. Or maybe it does?

6/n

I checked that I had set the parameter to skip certificate verification, as I'm using 127.0.0.1 instead of the FQDN and the certificate isn't issued for IP addresses.

I exec'd into the container to run the `wget --no-check-certificate` manually.

It works correctly.

When I remove the healthcheck section, there are no more zombie processes.

Root cause found: it's the wget used by the healthcheck that creates the zombies!

7/n

To summarize: I have zombie processes created by the `wget` command when it does an HTTPS request in the healthcheck section of the Docker Compose file.

Zombie processes occur when child processes have completed execution, and their exit status needs to be read by the parent process.

A process in a container is still a process on the host, so it takes up a PID on the host. Whatever you run in a container is PID 1 inside that container, which means it has to install a signal handler to receive signals.

8/n

The first thing to understand is that an init process doesn't magically remove zombies. A (normal) init is designed to reap zombies when the parent process that failed to wait on them exits and the zombies hang around. The init process then becomes the zombies' parent and they can be cleaned up.

9/n

Next, a container is a cgroup of processes running in their own PID namespace. This cgroup is cleaned up when the container is stopped. Any zombies that are in a container are removed on stop. They don't reach the host's init.

10/n

Third is the different ways containers are used. Most run one main process and nothing else. If there is another process spawned it is usually a child of that main process. So until the parent exits, the zombie will exist. Then see point 2 (the zombies will be cleared on exit).

11/n

The other role an init process can provide is to install signal handlers so signals sent from the host can be passed on to the container process. PID 1 is a bit special, as it only receives a signal if the process has installed a handler for it.

If you can install a SIGINT and SIGTERM signal handler in your PID 1 process then an init process doesn't add much here.
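
To picture the two jobs described above (forwarding signals and reaping), here is a toy sketch of an init-like wrapper. It is not how tini or Docker's built-in init is implemented, only an illustration assuming Python 3 on Linux; `spawn`, `install_forwarders`, `reap_until` and `main` are hypothetical names.

```python
import os
import signal


def spawn(argv):
    """Fork and exec argv; return the child's PID to the parent."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    return pid


def install_forwarders(main_pid):
    """Pass SIGTERM and SIGINT on to the main child instead of dropping them."""
    def forward(signum, _frame):
        try:
            os.kill(main_pid, signum)
        except ProcessLookupError:
            pass  # main child has already exited

    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, forward)


def reap_until(main_pid):
    """Reap every child that exits (adopted orphans included) until the main one does."""
    while True:
        try:
            pid, status = os.wait()
        except ChildProcessError:
            return 1  # no children left and the main child was never seen
        if pid == main_pid:
            if os.WIFEXITED(status):
                return os.WEXITSTATUS(status)
            return 128 + os.WTERMSIG(status)  # shell-style code for "killed by a signal"


def main(argv):
    """Run argv as the main process and act as a minimal init around it."""
    main_pid = spawn(argv)
    install_forwarders(main_pid)
    return reap_until(main_pid)
```

Used as an entrypoint, this would be called as `sys.exit(main(sys.argv[1:]))`. Every orphan that gets re-parented to this process is collected by the same `os.wait()` loop, which is the reaping part a plain application running as PID 1 usually lacks.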

!!! Those explanations come from this superb answer on Stack Overflow:
stackoverflow.com/questions/49 !!!

12/n

Stack Overflow: "Docker - init, zombies - why does it matter?"

The syntax in the Docker Compose file is:
init: true

What is the advantage of Tini? github.com/krallin/tini/issues

14/n

To finish, I checked the dashboard again and this time I saw the memory leak!

End of thread on zombie processes on Docker.
I hope you enjoyed it!

15/15

@slamp Time to get out the shotguns when there are that many zombies 😉

@slamp hello yes I would like to subscribe to this

@slamp Great deduction, love how you analyze the issue. Subbing!

@lgeurts Thanks a lot! I already knew what zombie processes are and that they are waiting on their parent. It helped me with the investigation.

@lgeurts I was surprised when I googled the problem and found many solutions which are: "remove health check" 😠
This is not a solution, it's a bad workaround.
Additionally, I still need to investigate why the problem only occurs when using HTTPS (I didn't have it before), whether it is only the case for wget or also for other tools, and whether it only happens with a specific Docker base image.

@slamp I'm curious. And you're right, a workaround should be considered a temporary solution, so it's always bad to implement something like that for the long term. About the HTTPS: could you explain what you mean by not having it before? Is this a new problem?

@lgeurts
I didn't have the issue when doing a GET over HTTP.

This configuration didn't create zombies:

healthcheck:
  test: ["CMD-SHELL", "wget -q -O - http://127.0.0.1"]
  interval: 60s
  timeout: 5s