$ sudo tail -2 /var/log/nginx/error.log
2023/02/20 11:56:01 [emerg] 21114#21114: socket() 212.224.115.156:80 failed (24: Too many open files)
2023/02/20 11:56:01 [emerg] 21122#21122: socket() 212.224.115.156:80 failed (24: Too many open files)
The whole issue is odd, because given our typical customers I do not think we get thousands of clients connecting at once. And the crash always happens at exactly HH:MM:01 on every even minute…
I increased the limit while debugging; unfortunately, neither decreasing nor increasing it helps.
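Given that the crashes land at exactly HH:MM:01 on even minutes, it might be worth checking what cron fires on that schedule - not something visible in the logs, just a hunch; the paths below assume a Debian/Ubuntu cron layout:

$ sudo sh -c "grep -v '^#' /etc/crontab /etc/cron.d/* /var/spool/cron/crontabs/* 2>/dev/null"
$ sudo grep CRON /var/log/syslog | tail -n 20    # what cron actually ran around a crash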
janek@main:~ :) $ sudo tail -30 /var/log/nginx/error.log
2023/04/03 13:24:38 [emerg] 5873#5873: epoll_create() failed (24: Too many open files)
2023/04/03 13:24:38 [emerg] 5874#5874: epoll_create() failed (24: Too many open files)
2023/04/03 13:24:38 [emerg] 5875#5875: epoll_create() failed (24: Too many open files)
2023/04/03 13:24:38 [emerg] 5876#5876: epoll_create() failed (24: Too many open files)
2023/04/03 13:24:38 [emerg] 5877#5877: epoll_create() failed (24: Too many open files)
2023/04/03 13:24:38 [emerg] 5879#5879: epoll_create() failed (24: Too many open files)
2023/04/03 13:24:38 [alert] 5871#5871: worker process 5872 exited with fatal code 2 and cannot be respawned
2023/04/03 13:24:38 [alert] 5871#5871: worker process 5873 exited with fatal code 2 and cannot be respawned
2023/04/03 13:24:38 [alert] 5871#5871: worker process 5874 exited with fatal code 2 and cannot be respawned
2023/04/03 13:24:38 [alert] 5871#5871: worker process 5875 exited with fatal code 2 and cannot be respawned
2023/04/03 13:24:38 [alert] 5871#5871: worker process 5876 exited with fatal code 2 and cannot be respawned
2023/04/03 13:24:38 [emerg] 5880#5880: epoll_create() failed (24: Too many open files)
2023/04/03 13:24:38 [alert] 5871#5871: worker process 5877 exited with fatal code 2 and cannot be respawned
2023/04/03 13:24:38 [alert] 5871#5871: cache manager process 5879 exited with fatal code 2 and cannot be respawned
2023/04/03 13:26:01 [warn] 6318#6318: conflicting server name "mail.REDACTEDda.de" on IP_REDACTED:80, ignored
2023/04/03 13:26:01 [emerg] 6318#6318: open() "/var/log/apache2/domains/REDACTEDühn-fachhandel.de.bytes" failed (24: Too many open files)
2023/04/03 13:26:01 [warn] 6326#6326: conflicting server name "mail.REDACTEDda.de" on IP_REDACTED:80, ignored
2023/04/03 13:26:01 [emerg] 6326#6326: open() "/var/log/apache2/domains/REDACTEDühn-fachhandel.de.bytes" failed (24: Too many open files)
2023/04/03 13:26:21 [warn] 6924#6924: conflicting server name "mail.REDACTEDda.de" on IP_REDACTED:80, ignored
2023/04/03 13:26:21 [emerg] 6924#6924: open() "/var/log/apache2/domains/REDACTEDühn-fachhandel.de.bytes" failed (24: Too many open files)
2023/04/03 13:26:22 [warn] 6988#6988: conflicting server name "mail.REDACTEDda.de" on IP_REDACTED:80, ignored
2023/04/03 13:26:22 [emerg] 6988#6988: open() "/var/log/apache2/domains/REDACTEDühn-fachhandel.de.bytes" failed (24: Too many open files)
2023/04/03 13:28:01 [warn] 7263#7263: conflicting server name "mail.REDACTEDda.de" on IP_REDACTED:80, ignored
2023/04/03 13:28:01 [emerg] 7263#7263: open() "/var/log/apache2/domains/REDACTEDühn-fachhandel.de.bytes" failed (24: Too many open files)
2023/04/03 13:28:01 [warn] 7270#7270: conflicting server name "mail.REDACTEDda.de" on IP_REDACTED:80, ignored
2023/04/03 13:28:01 [emerg] 7270#7270: open() "/var/log/apache2/domains/REDACTEDühn-fachhandel.de.bytes" failed (24: Too many open files)
2023/04/03 13:30:01 [warn] 7753#7753: conflicting server name "mail.REDACTEDda.de" on IP_REDACTED:80, ignored
2023/04/03 13:30:01 [emerg] 7753#7753: open() "/var/log/apache2/domains/REDACTEDühn-fachhandel.de.bytes" failed (24: Too many open files)
2023/04/03 13:30:01 [warn] 7880#7880: conflicting server name "mail.REDACTEDda.de" on IP_REDACTED:80, ignored
2023/04/03 13:30:01 [emerg] 7880#7880: open() "/var/log/apache2/domains/REDACTEDühn-fachhandel.de.bytes" failed (24: Too many open files)
janek@main:~ :) $ sudo head -15 /etc/nginx/nginx.conf
# Server globals
user www-data;
worker_processes 6; #auto;
worker_rlimit_nofile 1024; #65535;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;
# Worker config
events {
        worker_connections 124; #1024
        use epoll;
        multi_accept on;
}
I also limited worker processes to 6, as the VM has only 8 cores.
Interestingly, these errors again pop up every two minutes.
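For what it's worth, the numbers have to fit together: each worker needs roughly one descriptor per connection (about two when proxying, since every client connection gets a matching upstream one), plus one for every access_log/error_log path in the config - with many domains that alone can be hundreds of files per worker. So worker_rlimit_nofile should comfortably exceed worker_connections, and total capacity is roughly worker_processes × worker_connections. A consistent set of values might look like this (purely illustrative, not a recommendation):

worker_processes 6;
worker_rlimit_nofile 8192;        # per worker; keep it well above worker_connections

events {
        worker_connections 4096;  # per worker; overall capacity ≈ 6 × 4096 connections
        use epoll;
        multi_accept on;
}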
The number of open files has climbed, and the issue persists:
$ sudo lsof | wc -l
136478
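One caveat with that number: lsof also lists memory-mapped libraries, cwd/txt entries and the like, and repeats shared files for every process (and on some versions every thread), so piping it into wc -l heavily overstates the real descriptor count. The kernel's own counter and a per-process count are closer to the truth:

$ cat /proc/sys/fs/file-nr                                            # allocated handles, free, system-wide max
$ sudo find /proc/[0-9]*/fd -maxdepth 1 -type l 2>/dev/null | wc -l   # actual open FDs across all processes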
$ sudo prlimit -p 1
RESOURCE   DESCRIPTION                               SOFT       HARD UNITS
AS         address space limit                  unlimited  unlimited bytes
CORE       max core file size                           0  unlimited bytes
CPU        CPU time                             unlimited  unlimited seconds
DATA       max data size                        unlimited  unlimited bytes
FSIZE      max file size                        unlimited  unlimited bytes
LOCKS      max number of file locks held        unlimited  unlimited locks
MEMLOCK    max locked-in-memory address space  8430415360 8430415360 bytes
MSGQUEUE   max bytes in POSIX mqueues              819200     819200 bytes
NICE       max nice prio allowed to raise               0          0
NOFILE     max number of open files                1048576    1048576 files
NPROC      max number of processes                  257124     257124 processes
RSS        max resident set size                unlimited  unlimited bytes
RTPRIO     max real-time priority                       0          0
RTTIME     timeout for real-time tasks          unlimited  unlimited microsecs
SIGPENDING max number of pending signals           257124     257124 signals
STACK      max stack size                          8388608  unlimited bytes
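Note that prlimit -p 1 only shows PID 1's limits; what matters is what the running nginx workers actually got. A quick check, assuming the workers carry the usual "nginx: worker process" title:

$ sudo grep 'open files' /proc/$(pgrep -f 'nginx: worker' | head -n1)/limits
$ sudo ls /proc/$(pgrep -f 'nginx: worker' | head -n1)/fd | wc -l     # FDs one worker currently holds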
I was still having this issue on Ubuntu 22.04. I think I've just resolved it.
I had set LimitNOFILE=65535 in the systemd unit (systemctl edit --full nginx.service),
which worked fine for systemctl restart nginx, but the config test nginx -t would still fail, which was stopping Hestia from restarting it.
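In case it helps, a drop-in instead of editing the full unit would look roughly like this (the file name under nginx.service.d is arbitrary; LimitNOFILE goes in the [Service] section). Keep in mind it only applies to the daemon systemd starts - nginx -t run from a shell still uses that shell's ulimit -n, which would explain why the config test kept failing:

# /etc/systemd/system/nginx.service.d/limits.conf
[Service]
LimitNOFILE=65535

$ sudo systemctl daemon-reload && sudo systemctl restart nginx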
…and then I set the limits in /etc/security/limits.d/99-open-files.conf
* soft nofile 65536
* hard nofile 65536
…and rebooted. What I found interesting is that ulimit -n for root was still 1024, while for other users it updated to 65536 as requested.
It turns out that you need to add lines for root specifically:
* soft nofile 65536
* hard nofile 65536
root soft nofile 65536
root hard nofile 65536
A quick logout/login as root, and it now reports the limit correctly, nginx -t works again, and, fingers crossed, Hestia appears to be able to restart the service normally.
Long story short, it needs both the LimitNOFILE in the service unit and root's limit raised in /etc/security/limits.d (I haven't tried removing the * entry yet, so YMMV on that one).
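If anyone wants to double-check both halves, they can be verified independently (any systemd distro):

$ systemctl show nginx -p LimitNOFILE    # limit systemd applies to the service
$ ulimit -n                              # in a fresh root login shell; should now report the limits.d value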
Did you set LimitNOFILE=65535 for the [Service] or the [Unit]?
Either way, the issue on my side seems to be that there really are that many open files, and I am not sure why.
The issue has returned, despite the user staying disabled, and it persists across restarts.
It would be interesting to see sudo lsof | wc -l on other people's systems - do they have higher limits, or is there an issue on my system that causes unnecessarily many open files?
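A quick way to see which processes are actually holding the descriptors is to count the /proc/<pid>/fd entries - run this as root; the 100-FD cutoff is arbitrary:

for d in /proc/[0-9]*; do
    n=$(ls "$d/fd" 2>/dev/null | wc -l)
    [ "$n" -gt 100 ] && echo "$n $(cat "$d/comm")"
done | sort -rn | head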