$ sudo tail -2 /var/log/nginx/error.log
2023/02/20 11:56:01 [emerg] 21114#21114: socket() 212.224.115.156:80 failed (24: Too many open files)
2023/02/20 11:56:01 [emerg] 21122#21122: socket() 212.224.115.156:80 failed (24: Too many open files)
The whole issue is odd, because given our typical customers I do not think we get thousands of clients connecting at once. And the crash always happens at exactly HH:MM:01 on every even minute…
I increased the limit while debugging; unfortunately, neither decreasing nor increasing it helps.
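Given that the crashes land at exactly HH:MM:01 on even minutes, it might be worth checking what cron fires on that schedule - not something visible in the logs, just a hunch; the paths below assume a Debian/Ubuntu cron layout:

$ sudo sh -c "grep -v '^#' /etc/crontab /etc/cron.d/* /var/spool/cron/crontabs/* 2>/dev/null"
$ sudo grep CRON /var/log/syslog | tail -n 20    # what cron actually ran around a crash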
janek@main:~ :) $ sudo tail -30 /var/log/nginx/error.log
2023/04/03 13:24:38 [emerg] 5873#5873: epoll_create() failed (24: Too many open files)
2023/04/03 13:24:38 [emerg] 5874#5874: epoll_create() failed (24: Too many open files)
2023/04/03 13:24:38 [emerg] 5875#5875: epoll_create() failed (24: Too many open files)
2023/04/03 13:24:38 [emerg] 5876#5876: epoll_create() failed (24: Too many open files)
2023/04/03 13:24:38 [emerg] 5877#5877: epoll_create() failed (24: Too many open files)
2023/04/03 13:24:38 [emerg] 5879#5879: epoll_create() failed (24: Too many open files)
2023/04/03 13:24:38 [alert] 5871#5871: worker process 5872 exited with fatal code 2 and cannot be respawned
2023/04/03 13:24:38 [alert] 5871#5871: worker process 5873 exited with fatal code 2 and cannot be respawned
2023/04/03 13:24:38 [alert] 5871#5871: worker process 5874 exited with fatal code 2 and cannot be respawned
2023/04/03 13:24:38 [alert] 5871#5871: worker process 5875 exited with fatal code 2 and cannot be respawned
2023/04/03 13:24:38 [alert] 5871#5871: worker process 5876 exited with fatal code 2 and cannot be respawned
2023/04/03 13:24:38 [emerg] 5880#5880: epoll_create() failed (24: Too many open files)
2023/04/03 13:24:38 [alert] 5871#5871: worker process 5877 exited with fatal code 2 and cannot be respawned
2023/04/03 13:24:38 [alert] 5871#5871: cache manager process 5879 exited with fatal code 2 and cannot be respawned
2023/04/03 13:26:01 [warn] 6318#6318: conflicting server name "mail.REDACTEDda.de" on IP_REDACTED:80, ignored
2023/04/03 13:26:01 [emerg] 6318#6318: open() "/var/log/apache2/domains/REDACTEDühn-fachhandel.de.bytes" failed (24: Too many open files)
2023/04/03 13:26:01 [warn] 6326#6326: conflicting server name "mail.REDACTEDda.de" on IP_REDACTED:80, ignored
2023/04/03 13:26:01 [emerg] 6326#6326: open() "/var/log/apache2/domains/REDACTEDühn-fachhandel.de.bytes" failed (24: Too many open files)
2023/04/03 13:26:21 [warn] 6924#6924: conflicting server name "mail.REDACTEDda.de" on IP_REDACTED:80, ignored
2023/04/03 13:26:21 [emerg] 6924#6924: open() "/var/log/apache2/domains/REDACTEDühn-fachhandel.de.bytes" failed (24: Too many open files)
2023/04/03 13:26:22 [warn] 6988#6988: conflicting server name "mail.REDACTEDda.de" on IP_REDACTED:80, ignored
2023/04/03 13:26:22 [emerg] 6988#6988: open() "/var/log/apache2/domains/REDACTEDühn-fachhandel.de.bytes" failed (24: Too many open files)
2023/04/03 13:28:01 [warn] 7263#7263: conflicting server name "mail.REDACTEDda.de" on IP_REDACTED:80, ignored
2023/04/03 13:28:01 [emerg] 7263#7263: open() "/var/log/apache2/domains/REDACTEDühn-fachhandel.de.bytes" failed (24: Too many open files)
2023/04/03 13:28:01 [warn] 7270#7270: conflicting server name "mail.REDACTEDda.de" on IP_REDACTED:80, ignored
2023/04/03 13:28:01 [emerg] 7270#7270: open() "/var/log/apache2/domains/REDACTEDühn-fachhandel.de.bytes" failed (24: Too many open files)
2023/04/03 13:30:01 [warn] 7753#7753: conflicting server name "mail.REDACTEDda.de" on IP_REDACTED:80, ignored
2023/04/03 13:30:01 [emerg] 7753#7753: open() "/var/log/apache2/domains/REDACTEDühn-fachhandel.de.bytes" failed (24: Too many open files)
2023/04/03 13:30:01 [warn] 7880#7880: conflicting server name "mail.REDACTEDda.de" on IP_REDACTED:80, ignored
2023/04/03 13:30:01 [emerg] 7880#7880: open() "/var/log/apache2/domains/REDACTEDühn-fachhandel.de.bytes" failed (24: Too many open files)
janek@main:~ :) $ sudo head -15 /etc/nginx/nginx.conf
# Server globals
user www-data;
worker_processes 6; #auto;
worker_rlimit_nofile 1024; #65535;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;
# Worker config
events {
        worker_connections 124; #1024
        use epoll;
        multi_accept on;
}
I also limited worker processes to 6, as the VM has only 8 cores.
Interestingly, these errors again pop up every two minutes.
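For what it's worth, the numbers have to fit together: each worker needs roughly one descriptor per connection (about two when proxying, since every client connection gets a matching upstream one), plus one for every access_log/error_log path in the config - with many domains that alone can be hundreds of files per worker. So worker_rlimit_nofile should comfortably exceed worker_connections, and total capacity is roughly worker_processes × worker_connections. A consistent set of values might look like this (purely illustrative, not a recommendation):

worker_processes 6;
worker_rlimit_nofile 8192;        # per worker; keep it well above worker_connections

events {
        worker_connections 4096;  # per worker; overall capacity ≈ 6 × 4096 connections
        use epoll;
        multi_accept on;
}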
The number of open files has climbed, and the issue persists:
$ sudo lsof | wc -l
136478
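One caveat with that number: lsof also lists memory-mapped libraries, cwd/txt entries and the like, and repeats shared files for every process (and on some versions every thread), so piping it into wc -l heavily overstates the real descriptor count. The kernel's own counter and a per-process count are closer to the truth:

$ cat /proc/sys/fs/file-nr                                            # allocated handles, free, system-wide max
$ sudo find /proc/[0-9]*/fd -maxdepth 1 -type l 2>/dev/null | wc -l   # actual open FDs across all processes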
$ sudo prlimit -p 1
RESOURCE   DESCRIPTION                               SOFT       HARD UNITS
AS         address space limit                  unlimited  unlimited bytes
CORE       max core file size                           0  unlimited bytes
CPU        CPU time                             unlimited  unlimited seconds
DATA       max data size                        unlimited  unlimited bytes
FSIZE      max file size                        unlimited  unlimited bytes
LOCKS      max number of file locks held        unlimited  unlimited locks
MEMLOCK    max locked-in-memory address space  8430415360 8430415360 bytes
MSGQUEUE   max bytes in POSIX mqueues              819200     819200 bytes
NICE       max nice prio allowed to raise               0          0
NOFILE     max number of open files                1048576    1048576 files
NPROC      max number of processes                  257124     257124 processes
RSS        max resident set size                unlimited  unlimited bytes
RTPRIO     max real-time priority                       0          0
RTTIME     timeout for real-time tasks          unlimited  unlimited microsecs
SIGPENDING max number of pending signals           257124     257124 signals
STACK      max stack size                          8388608  unlimited bytes
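Note that prlimit -p 1 only shows PID 1's limits; what matters is what the running nginx workers actually got. A quick check, assuming the workers carry the usual "nginx: worker process" title:

$ sudo grep 'open files' /proc/$(pgrep -f 'nginx: worker' | head -n1)/limits
$ sudo ls /proc/$(pgrep -f 'nginx: worker' | head -n1)/fd | wc -l     # FDs one worker currently holds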
I was still having this issue on Ubuntu 22.04. I think I've just resolved it.
I had set LimitNOFILE=65535 in the systemd unit (systemctl edit --full nginx.service),
which worked fine for systemctl restart nginx, but the config test nginx -t would still fail, which was stopping Hestia from restarting it.
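In case it helps, a drop-in instead of editing the full unit would look roughly like this (the file name under nginx.service.d is arbitrary; LimitNOFILE goes in the [Service] section). Keep in mind it only applies to the daemon systemd starts - nginx -t run from a shell still uses that shell's ulimit -n, which would explain why the config test kept failing:

# /etc/systemd/system/nginx.service.d/limits.conf
[Service]
LimitNOFILE=65535

$ sudo systemctl daemon-reload && sudo systemctl restart nginx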
…and then I set the limits in /etc/security/limits.d/99-open-files.conf
* soft nofile 65536
* hard nofile 65536
…and rebooted. What I found interesting is that ulimit -n for root was still 1024, while for other users it updated to 65536 as requested.
It turns out that you need to add lines for root specifically:
* soft nofile 65536
* hard nofile 65536
root soft nofile 65536
root hard nofile 65536
A quick logout/login as root, and it now reports the limit correctly, nginx -t works again, and, fingers crossed, Hestia appears to be able to restart the service normally.
Long story short, it needs both the LimitNOFILE in the service unit and root's limit raised in /etc/security/limits.d (I haven't tried removing the * entry yet, so YMMV on that one).
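If anyone wants to double-check both halves, they can be verified independently (any systemd distro):

$ systemctl show nginx -p LimitNOFILE    # limit systemd applies to the service
$ ulimit -n                              # in a fresh root login shell; should now report the limits.d value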
Did you set LimitNOFILE=65535 for the [Service] or the [Unit]?
Either way, the issue on my side seems to be that there really are that many open files, and I am not sure why.
The issue has returned, despite the user staying disabled, and it persists across restarts.
It would be interesting to see sudo lsof | wc -l on other people's systems - do they have higher limits, or is there an issue on my system that causes unnecessarily many open files?
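A quick way to see which processes are actually holding the descriptors is to count the /proc/<pid>/fd entries - run this as root; the 100-FD cutoff is arbitrary:

for d in /proc/[0-9]*; do
    n=$(ls "$d/fd" 2>/dev/null | wc -l)
    [ "$n" -gt 100 ] && echo "$n $(cat "$d/comm")"
done | sort -rn | head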