Server went down because of an unknown issue

I was running a lot of Telegram bots on my server simultaneously when I suddenly noticed that I couldn't connect to the domain the bots are hosted on. I thought the server might have gone down, so I connected to it via SSH, but htop showed CPU usage below 1%.
I checked the other domains on the server, which use the default template, and they were working properly! What happened to this domain that it no longer accepts new connections? Is there a limit I should increase?

What does the following information shown at IP/list/server/web mean? Is it related to my problem?

NGINX STATUS

Active connections: 12
server accepts handled requests
3001 3001 3577
Reading: 0 Writing: 3 Waiting: 9
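
The figures above are the output of nginx's stub_status page. Per the nginx documentation, "Active connections" includes the waiting ones, and `handled` normally equals `accepts` unless a resource limit such as `worker_connections` has been reached. A minimal Python sketch for reading them out of the pasted text (the function name `parse_stub_status` is illustrative, not part of nginx or Hestia); on this snapshot accepts and handled are equal, which suggests nginx itself was not refusing connections at the moment it was taken.

```python
import re


def parse_stub_status(text: str) -> dict:
    """Parse the plain-text output of nginx's stub_status module."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    totals = re.search(r"server accepts handled requests\s+(\d+)\s+(\d+)\s+(\d+)", text)
    states = re.search(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text)
    if not (active and totals and states):
        raise ValueError("not a stub_status page")
    accepts, handled, requests = map(int, totals.groups())
    reading, writing, waiting = map(int, states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
        # A non-zero gap typically means nginx hit a resource limit
        # (e.g. worker_connections) and dropped connections.
        "dropped": accepts - handled,
    }


if __name__ == "__main__":
    sample = """Active connections: 12
server accepts handled requests
3001 3001 3577
Reading: 0 Writing: 3 Waiting: 9"""
    status = parse_stub_status(sample)
    print(status["dropped"])  # 0
    print(status["reading"] + status["writing"] + status["waiting"])  # 12
```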

Did you check the logs for the related domain? Do you see anything in the access logs? Could you have a DNS issue with the related domain?

I'm just guessing here; without additional information, it's hard to help you.

Yes, I've checked. There are no errors except the following one:

[Thu Aug 06 19:32:25.520873 2020] [proxy_fcgi:error] [pid 24155] (70007)The timeout specified has expired: [client 91.1.1.3:0] AH01075: Error dispatching request to : (polling)
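
AH01075 is logged by Apache's mod_proxy_fcgi when it fails to dispatch a request to the FastCGI (PHP-FPM) backend; the "(70007) The timeout specified has expired" part is consistent with, though not proof of, the backend not answering in time. To see how often this happens, here is a small Python sketch that counts error-log entries by module and AH code. The regex assumes the Apache 2.4 line format shown above; on Hestia the per-domain logs usually live under /var/log/apache2/domains/, but check the actual path on your system.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Matches Apache 2.4 error-log lines of the form seen above:
# [<timestamp>] [<module>:<level>] ... AH<5 digits>: ...
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<module>[^:\]]+):(?P<level>[^\]]+)\]"
    r".*?\b(?P<code>AH\d{5})\b"
)


def count_error_codes(lines: Iterable[str]) -> Counter:
    """Count (module, AH-code) pairs; lines without an AH code are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match["module"], match["code"])] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is a placeholder): python count_ah.py /path/to/domain.error.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for (module, code), n in count_error_codes(log).most_common():
            print(f"{n:6d}  {module}  {code}")
```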

The strange thing is that my bots kept working properly as background processes, but I couldn't reach my domain over HTTP/HTTPS. As soon as I stopped 4 of the bots, I was able to connect again!


Could you have a DNS issue with the related domain?
No, it's not a DNS-related issue; I'm using Cloudflare, so I don't handle DNS requests myself.

Are these bots using a different IP? Could your provider have blackholed your IP, for instance because they detected a DDoS attack or something similar?

I'm not very familiar with how Telegram bots work. Does that traffic also go through Cloudflare as a proxy/protection?

No, all of them use my server's IP. The provider doesn't handle DDoS attacks, so they don't block me. I think something isn't configured properly in my HestiaCP setup, such as a max_connections-style limit in one of the web servers, that prevents more connections.

No, I just call mydomain.com/bot1-start.php, and the bot starts and keeps running as a background process (it's a MadelineProto bot). The issue is not related to Cloudflare.

Do you use nginx + apache2 or nginx only? I agree that you may be running into connection limits, especially if the bots keep their connections open and occupied.
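
A toy Python model of that connection-limit theory (not Hestia or PHP-FPM code). It assumes, hypothetically, that each bot started through bot1-start.php keeps one PHP-FPM worker busy for as long as it runs, and that the domain's pool is capped at a fixed number of workers (PHP-FPM's `pm.max_children`; the value 8 below is made up). Once every slot is held by a bot, an ordinary page request waits and then times out, which would fit both the AH01075 timeout and the site coming back after some bots were stopped.

```python
import threading


class WorkerPool:
    """Toy model of a PHP-FPM pool capped at `max_children` workers."""

    def __init__(self, max_children: int) -> None:
        self._slots = threading.BoundedSemaphore(max_children)

    def start_bot(self) -> bool:
        # A long-running bot takes a worker and keeps it; False if none is free.
        return self._slots.acquire(blocking=False)

    def stop_bot(self) -> None:
        # Stopping a bot hands its worker back to the pool.
        self._slots.release()

    def handle_request(self, timeout: float) -> bool:
        # A normal page view waits up to `timeout` seconds for a free worker;
        # False models the web server giving up (cf. the AH01075 timeout).
        if not self._slots.acquire(timeout=timeout):
            return False
        self._slots.release()  # short request done, worker returned at once
        return True


if __name__ == "__main__":
    pool = WorkerPool(max_children=8)  # hypothetical pool size
    bots = sum(pool.start_bot() for _ in range(8))
    print(bots, pool.handle_request(timeout=0.1))  # 8 False
    pool.stop_bot()
    print(pool.handle_request(timeout=0.1))  # True
```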

You said you couldn't connect while they were running. Do you recall what error you got in the browser? A timeout, maybe?

Hestia does not offer specific functions to adjust connection limits and the like, so I think you should check what recommendations exist for the bots you use and then adjust nginx and/or apache2 manually.

I was using the default HestiaCP setup, but my code needed access to all directories on the server, so I tried to set “php_admin_value open_basedir” to none.
I followed this tutorial:
https://forum.vestacp.com/viewtopic.php?t=15464

But this solution didn't work in HestiaCP, so I also edited the following files:

/usr/local/hestia/data/templates/web/apache2/default.tpl
/usr/local/hestia/data/templates/web/apache2/default.stpl

/usr/local/vesta/data/templates/web/php-fpm/default.tpl
/usr/local/vesta/data/templates/web/php-fpm/default.stpl

/usr/local/vesta/data/templates/web/php-fpm/PHP-7_3.tpl
/usr/local/vesta/data/templates/web/php-fpm/PHP-7_3.stpl

But “php_admin_value open_basedir” still didn't change. In the end I switched the Backend Template (PHP-FPM) to PHP_7.3 and it worked, because, as you can see above, I had already edited that template. I don't know whether changing the Backend Template was the right decision.

Could you also tell me how to change the “php_admin_value open_basedir” value for the default HestiaCP setup?

I don't remember the exact error code; it was the Cloudflare error page with a 503 error. But it wasn't the timeout error that Cloudflare shows after 60 seconds without a response from the origin server. The whole domain was down, even the phpMyAdmin URL.
Meanwhile, the other domains on the server had no problems, CPU usage was 1%, and Apache was not down either! I don't know what was blocking new requests.

In general, editing these templates is a bad idea, because they get overwritten on Hestia updates.

So check whether your config files were rebuilt during the last update and your edits are therefore already gone. That could be the reason why everything stopped working.

You always want to copy the original template to a new name and edit the clone to suit your needs. You can then select the cloned template for your domain from within the UI.
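
A minimal Python sketch of that clone-then-edit workflow, which also covers the open_basedir question above: it copies a template (e.g. `default.tpl`) to a new name in the same directory and comments out any line mentioning `open_basedir` in the copy only, leaving the original untouched. The helper names, the `my999` name and the sample template text are illustrative. The comment character depends on the template type (`#` for Apache templates, `;` for PHP-FPM pool templates), and SSL web templates may need a matching `.stpl` clone as well, so check the Hestia docs on web templates before using it on a real server.

```python
import shutil
import tempfile
from pathlib import Path


def clone_template(src: Path, new_name: str) -> Path:
    """Copy e.g. default.tpl to <new_name>.tpl next to it; never overwrite."""
    dst = src.with_name(new_name + src.suffix)
    if dst.exists():
        raise FileExistsError(dst)
    shutil.copyfile(src, dst)
    return dst


def comment_out(path: Path, needle: str, comment: str) -> int:
    """Prefix every not-yet-commented line containing `needle` with `comment`."""
    lines = path.read_text().splitlines(keepends=True)
    changed = 0
    for i, line in enumerate(lines):
        if needle in line and not line.lstrip().startswith(comment):
            lines[i] = comment + line
            changed += 1
    path.write_text("".join(lines))
    return changed


def demo() -> tuple:
    """Run the workflow in a throwaway directory instead of the real one."""
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "default.tpl"
        original.write_text("php_admin_value open_basedir %docroot%\nDocumentRoot %docroot%\n")
        clone = clone_template(original, "my999")
        n = comment_out(clone, "open_basedir", "#")  # '#' because Apache-style
        return clone.name, n, original.read_text(), clone.read_text()


if __name__ == "__main__":
    print(demo())
```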

If I add a file named my999.tpl to the following directory, will it automatically appear in my domain's UI under that name? Is there any additional step needed to make the custom template available in the panel?

/usr/local/vesta/data/templates/web/php-fpm/my999.tpl

You may want to read up on it here: https://docs.hestiacp.com/admin_docs/server_management.html#how-do-web-templates-work

From this line, I presume you are using active Cloudflare caching, which means all traffic is being filtered through Cloudflare's proxies. Plus, since you can't trace any blockage/timeout/errors in the web server logs, my best guess is that the Telegram bot traffic could have been “blocked” by Cloudflare itself. If you experience this issue again, quickly disable (grey out) the orange cloud on your domain so that all traffic passes straight through to your server, and see whether the site comes up fine.

Cloudflare can sometimes be really nasty, especially if you are running bots or other services that attract DDoS attacks.

If the services for the domain go down at the server level, you should raise the limits/timeouts in that domain's PHP-FPM config. Once you identify which settings work, it's better to put them in the vhost template and apply it.
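
A minimal Python sketch of setting such a limit in PHP-FPM pool config text, e.g. `pm.max_children` (the worker cap) or `request_terminate_timeout`. The function `set_directive` is hypothetical, the sample pool and the values are made up, and the right numbers depend on how many bots you run and how much RAM you have. Following the advice about templates, the edited text belongs in a cloned template, and PHP-FPM has to be reloaded before changes take effect.

```python
import re


def set_directive(conf: str, key: str, value: str) -> str:
    """Set `key = value` in INI-style PHP-FPM pool text.

    An existing line for `key` (even one commented out with ';') is replaced;
    otherwise the directive is appended at the end.
    """
    pattern = re.compile(
        rf"^[ \t]*;?[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE
    )
    line = f"{key} = {value}"
    if pattern.search(conf):
        # A function replacement avoids backslash escapes in `value`.
        return pattern.sub(lambda _m: line, conf, count=1)
    return conf.rstrip("\n") + "\n" + line + "\n"


if __name__ == "__main__":
    pool = "[example.com]\npm = ondemand\npm.max_children = 8\n;request_terminate_timeout = 30s\n"
    pool = set_directive(pool, "pm.max_children", "20")  # hypothetical value
    pool = set_directive(pool, "request_terminate_timeout", "0")  # 0 = no limit
    print(pool)
```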

Good luck
