MariaDB crashing under a traffic spike/DDoS

So the solution was:

innodb_buffer_pool_size = 64G

and

max_connections=500

Pointed out by @alber.
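A minimal sketch of keeping both settings in their own MariaDB config file, so they are not lost when the main config is replaced. The function name and the drop-in path /etc/mysql/mariadb.conf.d/99-tuning.cnf are assumptions (Debian/Ubuntu layout, which includes *.cnf files from that directory); adjust them for your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a [mysqld] drop-in with the two settings from this thread.
# Override the defaults with: mariadb_tuning_conf <buffer_pool_size> <max_connections>
mariadb_tuning_conf() {
  local pool=${1:-64G}
  local conns=${2:-500}
  cat <<EOF
[mysqld]
innodb_buffer_pool_size = $pool
max_connections = $conns
EOF
}
```

Usage (as root): `mariadb_tuning_conf > /etc/mysql/mariadb.conf.d/99-tuning.cnf && systemctl restart mariadb`, then confirm the running values with `mariadb -e "SHOW VARIABLES LIKE 'max_connections'"`.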

Then @eris spent some time on the server tweaking php-fpm.

Thank you both, many thanks. I hope that ends my problems for now.

Unfortunately I celebrated way too early. The server crashed again under a DDoS. This time both MariaDB and PHP-FPM crashed, and I had to reboot the server.
What’s weird to me is that the server doesn’t come close to max CPU/memory usage. It just gets "stuck" with no obvious reason in the logs.

So far I’ve changed my php-fpm.tpl a little.

@eris set it to:

pm = ondemand
pm.max_children = 32
pm.start_servers = 8
pm.max_requests = 8000

I then changed it to:

pm.max_children = 42
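One common rule of thumb for pm.max_children is the memory you are willing to give PHP-FPM divided by the average resident size of one worker. Below is a rough sketch of that arithmetic; the function names are hypothetical, and the process name php-fpm8.3 is an assumption based on Debian/Ubuntu packages. `ps -C` also counts the master process, so the average is only approximate. Side note from the PHP-FPM docs: pm.start_servers is only used with pm = dynamic, so with pm = ondemand it has no effect.

```shell
#!/usr/bin/env bash
# Rough pm.max_children estimate: MiB reserved for PHP-FPM / MiB per worker.

# avg_worker_mib [process name]: average RSS (MiB) of running processes with
# that name. "php-fpm8.3" is an assumed name; check with: ps -e -o comm= | grep fpm
avg_worker_mib() {
  local name=${1:-php-fpm8.3}
  ps -C "$name" -o rss= |
    awk '{ sum += $1; n++ } END { if (n) printf "%d\n", sum / n / 1024; else print 0 }'
}

# estimate_max_children <MiB reserved for PHP> <MiB per worker>
estimate_max_children() {
  local budget_mib=$1 worker_mib=$2
  if (( worker_mib <= 0 )); then
    echo "worker size must be > 0" >&2
    return 1
  fi
  # Integer division: round down so the pool never exceeds the budget.
  echo $(( budget_mib / worker_mib ))
}
```

For example, `estimate_max_children 4096 100` prints 40: reserving 4 GiB for workers of about 100 MiB each.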

Maybe someone has another idea. I’d be fine with the server crashing at 100% CPU/RAM usage, but it’s strange that it dies at 30% CPU load and 30 GB/90 GB RAM usage.

I also tweaked opcache a little.
I changed:

;opcache.force_restart_timeout=180

to:

opcache.force_restart_timeout=0

as mentioned here: https://www.cogmentis.com/php-fpm-crashing-on-cpanel-server-fixed/

The rest of my opcache settings are unchanged. A couple of weeks ago I tweaked some opcache memory limits, but those look fine.

Off-topic: in the meantime, you can auto-restart failed services with a script run via cron, e.g.:
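To confirm the edit took effect, a small sketch that prints the last uncommented value of a key in an ini file (lines starting with `;` are comments, so `;opcache.force_restart_timeout=180` is skipped). The function name is hypothetical, and the file to check (for example /etc/php/8.3/fpm/php.ini or a conf.d opcache file) is an assumption; the value PHP actually uses can also be overridden in other ini files or pool configs.

```shell
#!/usr/bin/env bash
# ini_value <file> <key>: print the last uncommented value of <key>, if any.
ini_value() {
  local file=$1 key=$2
  awk -v k="$key" '
    /^[ \t]*[;#]/ { next }                 # skip commented-out lines
    {
      eq = index($0, "=")
      if (eq == 0) next                    # not a key=value line
      name = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim surrounding whitespace
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (name == k) found = val           # later lines win
    }
    END { if (found != "") print found }
  ' "$file"
}
```

Usage: `ini_value /etc/php/8.3/fpm/php.ini opcache.force_restart_timeout` should print `0` after the change.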

#!/usr/bin/env bash
# Restart every listed systemd service that is in the "failed" state.
# Meant to be run from cron, e.g. once a minute.

set -e
# set -x

# List your services here ("etc" stands for any others you need).
services="mysql nginx php7.4-fpm php8.1-fpm etc"
log=/var/log/check-restart-services.log

for service in $services; do
  if systemctl is-failed --quiet "$service"; then
    if systemctl restart "$service"; then
      message="$service is down, restarted at $(date +"%Y-%m-%d %T")"
    else
      message="ERROR: $service is down, can't be autorestarted at $(date +"%Y-%m-%d %T")"
    fi
    # echo "$message"
    echo "$message" >> "$log"
  fi
done
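A possible way to schedule the script, sketched under assumptions: it is saved as /usr/local/bin/check-restart-services.sh and cron reads /etc/cron.d (Debian/Ubuntu). The helper name is hypothetical. Keep the cron.d file name free of dots, because Debian's cron ignores files whose names contain them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make the check script executable and run it every minute.
# Default paths are assumptions; run as root when using them.
install_restart_cron() {
  local script=${1:-/usr/local/bin/check-restart-services.sh}
  local cron_file=${2:-/etc/cron.d/check-restart-services}
  chmod +x "$script"
  # /etc/cron.d entries need a user field ("root") between schedule and command.
  printf '%s\n' "* * * * * root $script" > "$cron_file"
}
```

Keep in mind that `systemctl is-failed` only catches services systemd has marked as failed; a process that hangs while still running is not detected this way.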

Maybe it was killed by the OOM killer?
dmesg -T | grep -Ei 'killed process'

Thanks for that. The problem is a bit more complicated than that.

These services just "hang" with no obvious error. For example, when the site is down, running

systemctl status mariadb php8.3-fpm

reports that everything is fine.
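Because `systemctl status` mostly reflects whether the main process is still running, a hung MariaDB or PHP-FPM can still show as active. A minimal sketch of a probe that sends a real request and treats "no answer within N seconds" as a hang; the helper name is hypothetical, and the example probe commands (`mariadb-admin ping`, a `curl` to the site) are assumptions you would adapt to your credentials and URL.

```shell
#!/usr/bin/env bash
# responds_within <seconds> <command...>: succeed only if the command
# finishes successfully before the timeout (coreutils `timeout`).
responds_within() {
  local secs=$1
  shift
  timeout "$secs" "$@" >/dev/null 2>&1
}

# Example probes (assumed commands; adjust for your setup):
#   responds_within 5 mariadb-admin ping || echo "MariaDB not answering"
#   responds_within 10 curl -fsS -o /dev/null https://example.com/ || echo "site not answering"
```

Logging a timestamp whenever a probe fails gives you an exact moment to compare against the New Relic graphs and the MariaDB/PHP-FPM logs.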

That dmesg command returns nothing, so it’s not the OOM killer. This server has 96 GB of memory and only one WordPress site on it (someone decided to throw money at the problem instead of fixing it).

Current RAM usage under normal site load is:

Memory: 8525MiB / 96312MiB

You can also protect the server from direct connections to its IP that bypass Cloudflare by enabling Authenticated Origin Pulls, so that nginx validates Cloudflare’s client certificate.

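        # Cloudflare Authenticated Origin Pulls CA certificate
        ssl_client_certificate /etc/nginx/certs/cloudflare.crt;
        # "optional" still lets clients without a certificate through;
        # "on" rejects any connection that does not present a valid certificate
        ssl_verify_client optional;
        # ssl_verify_client on;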

Enable Authenticated Origin Pulls in Cloudflare under SSL/TLS > Origin Server and download the certificate from there (see "Zone-level authenticated origin pulls" in the Cloudflare docs).

And killing the process doesn’t help? What does lsof show for the hung process?

I’m already using a Cloudflare certificate on this server. Where should I put these settings? In the Nginx template?
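A sketch for checking from another machine that the origin really refuses direct connections. It connects straight to the server IP (bypassing Cloudflare) without a client certificate and prints the HTTP status. The function names are hypothetical and the IP/host are placeholders. With `ssl_verify_client on;` nginx answers 400 when no client certificate is sent; with `optional` the request goes through, so the origin is still reachable directly. 403 is accepted too, assuming you add your own rejection rule.

```shell
#!/usr/bin/env bash
# origin_status <ip> <host>: HTTP status of a direct request to the origin
# ("000" if the connection itself failed).
origin_status() {
  local ip=$1 host=$2
  curl -sk -o /dev/null -w '%{http_code}' --max-time 10 \
    --resolve "$host:443:$ip" "https://$host/"
}

# is_origin_protected <status>: 400 (no client cert, "on") or 403 (custom rule).
is_origin_protected() {
  [ "$1" = "400" ] || [ "$1" = "403" ]
}
```

Usage: `code=$(origin_status 203.0.113.10 example.com); is_origin_protected "$code" && echo protected || echo "reachable directly ($code)"`.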

/usr/local/hestia/data/templates/web/nginx/php-fpm/

Or directly in

/home/user/conf/web/domain/nginx.conf

I’m using this Nginx template:

I rebooted this server. If it hangs again I’ll check with lsof.
Is:

lsof -i | grep mariadb

enough?
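`lsof -i | grep mariadb` only lists network sockets. To see everything the hung server holds (files, sockets, pipes), one option is to look it up by PID. A minimal sketch; the helper name is hypothetical, and the daemon name is typically `mariadbd` on MariaDB 10.5+ and `mysqld` on older versions.

```shell
#!/usr/bin/env bash
# main_pid <name...>: print the oldest PID of the first given process name
# that is running; fail if none of them are.
main_pid() {
  local name pid
  for name in "$@"; do
    pid=$(pgrep -o -x "$name") && { echo "$pid"; return 0; }
  done
  return 1
}

# Usage (as root) while the site is hanging:
#   pid=$(main_pid mariadbd mysqld) && lsof -p "$pid" > /tmp/mariadb-lsof.txt
#   wc -l < /tmp/mariadb-lsof.txt              # number of open files/sockets
#   grep -c CLOSE_WAIT /tmp/mariadb-lsof.txt   # stuck client connections
```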

No, it can’t start this process (MariaDB) again.
Restarting PHP-FPM works.

So rebooting is my only option.

For some reason, query logging is enabled.

So the drives could be a bottleneck.

I enabled that a few days ago, hoping it would help me find the issue.

I don’t think the drives are the bottleneck. It’s 2 SSDs in RAID.

Here’s a screenshot from New Relic, with the moment it crashed highlighted.

Try downgrading PHP back to 7.4 and see whether it works fine.

The issue might be due to memory leaks.

Unfortunately this is not helping.

I’m marking this post as the solution for the crashing database. The problem still persists (the server gets DDoSed and crashes), but now only PHP 8.3-FPM crashes, without any error log. I’ll open another thread with another question. @alber also offered to help me tomorrow, but I hope I’ll resolve this issue sooner.

Thanks again.

Yes, in the nginx templates, in the server section. It’s described in the Cloudflare manual at the link.
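A sketch of adding the directives to a copy of a Hestia nginx template instead of editing the per-domain nginx.conf, which Hestia can regenerate. Assumptions: HTTPS templates in /usr/local/hestia/data/templates/web/nginx/php-fpm/ end in .stpl and contain an `ssl_certificate_key` line inside the server block, and the certificate was saved as /etc/nginx/certs/cloudflare.crt. The helper and the new template name are hypothetical; you will likely also need a matching .tpl copy before the template can be selected for the domain.

```shell
#!/usr/bin/env bash
# add_aop_to_template <src> <dst> [cert]: copy an nginx template and insert the
# Authenticated Origin Pulls directives after its first ssl_certificate_key line.
add_aop_to_template() {
  local src=$1 dst=$2 cert=${3:-/etc/nginx/certs/cloudflare.crt}
  awk -v cert="$cert" '
    { print }
    !done && /ssl_certificate_key/ {
      print "    ssl_client_certificate " cert ";"
      print "    ssl_verify_client on;"
      done = 1
    }
  ' "$src" > "$dst"
  # Fail if no anchor line was found, so an unmodified copy is not used silently.
  grep -q 'ssl_verify_client' "$dst"
}
```

Usage (as root): `cd /usr/local/hestia/data/templates/web/nginx/php-fpm && add_aop_to_template default.stpl cloudflare-aop.stpl`.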