After requesting a backup, Nginx failed with nginx.service: Failed with result 'core-dump' error

Hi everyone,

I’ve just run my first website backup and I see that Nginx failed with the following (taken from syslog):

May 22 07:33:02 hp systemd[1]: Reloaded A high performance web server and a reverse proxy server.
May 22 07:33:02 hp systemd[1]: Reloading Dovecot IMAP/POP3 email server.
May 22 07:33:02 hp systemd[1]: Reloaded Dovecot IMAP/POP3 email server.
May 22 07:33:02 hp systemd[1]: Reloading LSB: exim Mail Transport Agent.
May 22 07:33:02 hp exim4[667270]:  * Reloading exim4 configuration files
May 22 07:33:02 hp systemd[1]: nginx.service: Main process exited, code=dumped, status=11/SEGV
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565812 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565813 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565814 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565815 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565816 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565817 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565819 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565821 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565822 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565812 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565813 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565814 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565815 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565816 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565817 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565819 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565821 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Killing process 565822 (nginx) with signal SIGKILL.
May 22 07:33:02 hp systemd[1]: nginx.service: Failed with result 'core-dump'.
May 22 07:33:02 hp exim4[667270]:    ...done.
May 22 07:33:02 hp systemd[1]: Reloaded LSB: exim Mail Transport Agent.
May 22 07:34:01 hp CRON[667738]: (admin) CMD (sudo /usr/local/hestia/bin/v-update-sys-queue restart)
May 22 07:35:01 hp CRON[667767]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 22 07:35:01 hp CRON[667768]: (admin) CMD (sudo /usr/local/hestia/bin/v-update-sys-queue backup)

A simple systemctl restart nginx fixed the issue, but can any of you think of a reason why this happened in the first place?

It’s hard to know the reason it failed without analyzing the core file. I’ve no idea where the system has left the core file but you could try to find it.

find / -type f -name core

If you find it, you could use gdb to debug the core file:

Note: The path to nginx binary is usually here: /usr/sbin/nginx

gdb /path/to/nginx /path/to/nginx/core
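
If you only need the backtrace, gdb can also run non-interactively. This is a minimal sketch, not a definitive procedure: backtrace_from_core is a hypothetical helper name, and NGINX_BIN assumes the usual /usr/sbin/nginx location mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a full backtrace from a core file without
# entering the interactive gdb prompt.
# Assumption: the nginx binary lives at /usr/sbin/nginx unless NGINX_BIN is set.
NGINX_BIN="${NGINX_BIN:-/usr/sbin/nginx}"

backtrace_from_core() {
    local core_file="$1"
    if [ ! -r "$core_file" ]; then
        echo "core file not readable: $core_file" >&2
        return 1
    fi
    # -batch makes gdb exit after running the commands passed with -ex
    gdb -batch -ex "bt full" "$NGINX_BIN" "$core_file"
}

# Usage:
#   backtrace_from_core /path/to/nginx/core
```

Without debug symbols for nginx the backtrace will mostly show raw addresses, so the output may be of limited use until they are installed.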

Unfortunately, no joy:

root@oo:/# find / -type f -name core
root@oo:/#

Maybe it is disabled:

ulimit -c

If the output is 0, it is disabled.

Yes, the output is indeed 0. What is a reasonable size? I’ll change it and then run a new backup to see if I can replicate the problem.

To be safe it should be unlimited
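
One caveat: ulimit -c in an interactive shell only applies to that shell and the processes it starts. A service started by systemd takes its limit from the unit (the LimitCORE= setting), so it is worth checking what the running nginx process actually has. A rough sketch, assuming Linux and /proc; core_limit_of is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the core file size limit a running process
# actually has, read from /proc/<pid>/limits (Linux only).
core_limit_of() {
    local pid="$1"
    # The "Max core file size" row lists the soft limit, then the hard limit.
    awk '/^Max core file size/ {print "soft=" $5, "hard=" $6}' "/proc/$pid/limits"
}

# Usage:
#   core_limit_of "$$"     # the current shell
#   core_limit_of "$(systemctl show -p MainPID --value nginx)"   # nginx master, assuming systemd manages it
#   cat /proc/sys/kernel/core_pattern   # where the kernel sends core files
```

If the soft limit for nginx is still 0, changing ulimit in your login shell will not make the service produce a core file.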

Changed :+1: I’ll run a backup now. If I use v-backup-user instead of the web interface, will the process send an email notification to the user?

Yes, it will.

I tried again, but so far Nginx is doing fine. I guess I now have core dumps enabled in case it happens again :man_shrugging:

Is there any way to run site backups as admin and avoid a user notification? :thinking:

By default, when executing automatic backups, users shouldn’t be notified (if the backup fails, the user will receive a mail).

Hi!

I’d like to reopen this as the same issue just happened again. This time Hestia failed to install an SSL certificate (it was right), but this caused Nginx to fail.

nginx -t shows that everything is fine, and an Nginx restart fixes it, but I’m trying to understand what is causing this.

ulimit -c shows unlimited but find / -type f -name core gives no results

Any other logs I can check?

How did you try to install the certificate? Did you see the Nginx error? When it happens, check the status first instead of restarting: systemctl status nginx --no-pager -l

You can always check these logs:

/var/log/hestia/error.log
/var/log/hestia/system.log
/var/log/nginx/error.log
/var/log/nginx/domains/YourDomain.error.log
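
To avoid reading every file by hand, something like the following could scan those logs for typical crash markers. It is only a sketch: scan_logs is a hypothetical helper, and the pattern list is my assumption of what a segfault usually leaves behind (nginx normally logs an [alert] line such as "exited on signal 11" when a worker crashes).

```shell
#!/usr/bin/env bash
# Hypothetical helper: search log files for common crash markers.
# The pattern list is an assumption; extend it as needed.
PATTERN='signal 11|SIGSEGV|core dumped|\[alert\]|\[emerg\]'

scan_logs() {
    local f
    for f in "$@"; do
        # Skip files that do not exist or cannot be read
        [ -r "$f" ] || continue
        # -H prints the file name, -n the line number of each match
        grep -EHn "$PATTERN" "$f" || true
    done
    return 0
}

# Usage:
#   scan_logs /var/log/hestia/error.log /var/log/hestia/system.log \
#             /var/log/nginx/error.log /var/log/nginx/domains/YourDomain.error.log
#   journalctl -u nginx --no-pager --since "1 hour ago"
```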

I’ve tried to install the SSL certificate from the site settings

I did check the status before restarting Nginx, but there was no error, just the "Killing process" lines and a "Failed with result 'core-dump'" message.

Without a core dump it’s really hard to analyze the problem.

You could try to install systemd-coredump and it should save core dumps in /var/lib/systemd/coredump

apt install systemd-coredump

And if a core dump is created, you could use coredumpctl to interact with it.
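
For reference, a minimal sketch of how that could look once a dump has been captured. show_nginx_dumps is a hypothetical helper name; matching on nginx assumes the crashed process name is nginx, as in the syslog excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list nginx core dumps captured by systemd-coredump
# and show details of the most recent one.
show_nginx_dumps() {
    if ! command -v coredumpctl >/dev/null 2>&1; then
        echo "coredumpctl not found; is systemd-coredump installed?" >&2
        return 1
    fi
    # All captured dumps whose command name is nginx
    coredumpctl --no-pager list nginx
    # Details (signal, timestamp, stack trace if available) of the newest one
    coredumpctl --no-pager -1 info nginx
}

# Usage:
#   show_nginx_dumps
#   coredumpctl debug nginx   # opens the newest matching dump in gdb
```

On older systemd releases the debug verb may only be available under its earlier name, gdb.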

Thank you @sahsanu, I’ve installed systemd-coredump, I’ll send a message if I’m able to replicate the issue once again

It just happened again while I was trying to generate an SSL certificate (simply enabling the SSL checkbox and the HTTPS redirect checkbox).

Maybe it’s related to the nginx extras I’ve installed some time ago :thinking:
I’m going to try what has been suggested here
