Hestia Updates & Failing Safely

bluemarble · October 19, 2021, 9:32pm

One of the updates installed a new /etc/nginx/nginx.conf, which conflicted with my configuration. As a result, NGINX didn’t restart and all my sites were down for several days until I noticed the error.

Is there any way to add a configuration check (e.g. nginx -t) before trying to restart NGINX after an update? That way, if there’s a conflict, the webserver will continue to run with the old configuration while giving me a chance to correct any errors.
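
A minimal sketch of the kind of guard being requested, assuming a systemd host. The function name and the two variables are hypothetical and are not part of Hestia; they only exist so the commands can be swapped out.

```shell
#!/bin/sh
# Hypothetical helper, not Hestia code: restart nginx only if "nginx -t" passes.
# Both variables are illustrative so the commands can be replaced if needed.
NGINX_TEST_CMD="${NGINX_TEST_CMD:-nginx -t}"
NGINX_RESTART_CMD="${NGINX_RESTART_CMD:-systemctl restart nginx}"

safe_restart_nginx() {
    # nginx -t exits non-zero when the configuration is invalid,
    # so the exit status alone decides whether the restart happens.
    if $NGINX_TEST_CMD >/dev/null 2>&1; then
        $NGINX_RESTART_CMD
    else
        echo "nginx configuration test failed; restart skipped" >&2
        return 1
    fi
}
```

When the test fails, the running NGINX process is never touched, so it keeps serving with the configuration it already loaded.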

eris · October 19, 2021, 9:39pm

Nginx updates are not handled by Hestia, and therefore we can’t even detect whether Nginx restarts. Also, during such an update the package manager will ask whether to keep the current version of the config or replace it with the new version… Always choose to keep the current version.
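
For unattended upgrades, the "keep the current version" answer can be given up front through dpkg's --force-confold option. The sketch below is an illustration only; the wrapper name and the APT_GET variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: pass --force-confold to dpkg so that locally modified
# configuration files are kept without prompting during an upgrade.
APT_GET="${APT_GET:-apt-get}"

apt_keep_conf() {
    # Usage: apt_keep_conf upgrade   (or: apt_keep_conf install PACKAGE)
    $APT_GET -o Dpkg::Options::="--force-confold" "$@"
}
```

Example: `apt_keep_conf upgrade` runs the upgrade and leaves a modified nginx.conf in place instead of asking.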

bluemarble · October 19, 2021, 10:06pm

In apt (Debian’s package management), I have NGINX held so it can’t update. I have some custom modules so if I accidentally updated NGINX somehow, I’d know because I’d have to recompile the modules to be compatible with other versions.
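
A hold like this is set with `apt-mark hold nginx`, and it can be confirmed from a script. This is a small sketch; the function name and the APT_MARK variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the given package is on hold.
APT_MARK="${APT_MARK:-apt-mark}"

is_held() {
    # "apt-mark showhold" prints one held package name per line;
    # -x matches the whole line and -F treats the name as a fixed string.
    $APT_MARK showhold | grep -qxF -- "$1"
}
```

Example: `is_held nginx && echo "nginx is held"`.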

At this point, I have no other suspect except Hestia.

eris · October 19, 2021, 10:10pm

Hestia does not install a new nginx.conf, so it can’t be Hestia… It doesn’t make sense, as Hestia uses a custom nginx.conf.

bluemarble · October 19, 2021, 10:42pm

At this point, I don’t know who updated /etc/nginx/nginx.conf. NGINX is held in the package manager and any updates would immediately break because the modules aren’t compiled for the right version.

I spun up a test install of Debian 9 on a VM, and its /etc/nginx/nginx.conf is totally different from the one on the server running Hestia.

Regardless, I still want a configuration check before Hestia attempts to restart the webserver. I want the updates to fail safely and keep my sites running.

eris · October 19, 2021, 10:47pm

Hestia does not restart Nginx during the update of a package that is not controlled by Hestia.

Also, Hestia already has a check in place before it restarts Nginx. But it wasn’t triggered, as Hestia didn’t request the update…

So what do we want to solve?

bluemarble · October 19, 2021, 11:07pm

Sorry for the deletion of my previous post. I realized I left my domain in one of the screenshots.

[Screenshot: hestia_restart_failure]

Both the upgrade and the NGINX restart failure occurred within a minute of each other.

As for the duplicate directive, I originally wrote it in /etc/nginx/conf.d/cache.conf. It was a surprise to me when I found similar directives in /etc/nginx/nginx.conf.
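
A quick way to see every place a directive is defined is a recursive grep over the configuration directory. The helper below is a hypothetical sketch; the default of /etc/nginx is an assumption about the layout.

```shell
#!/bin/sh
# Hypothetical helper: list every non-commented occurrence of a directive.
find_directive() {
    # Usage: find_directive DIRECTIVE [CONFIG_DIR]
    # Prints file:line:text for each match; comment lines do not match
    # because the directive must be the first token on the line.
    directive=$1
    conf_dir=${2:-/etc/nginx}
    grep -rnE "^[[:space:]]*${directive}[[:space:]]" "$conf_dir"
}
```

Example: `find_directive fastcgi_cache_key` lists each file and line that sets the key, which makes a definition in both nginx.conf and conf.d/cache.conf easy to spot.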

I can only speculate that Hestia did something during the upgrade to change /etc/nginx/nginx.conf.

eris · October 19, 2021, 11:13pm

This was added in:

github.com

hestiacp/hestiacp/blob/a2845d4c71d5b5e130b0856a232a15825517900c/install/upgrade/versions/1.4.0.sh#L10-L22

          if [ -e "/etc/nginx/nginx.conf" ]; then
              check=$(cat /etc/nginx/nginx.conf | grep 'fastcgi_cache_path');
              if [ -z "$check" ]; then 
                  echo "[ * ] Enabling nginx FastCGI cache support..."
                  sed  -i 's/# Cache bypass/# FastCGI cache\n    fastcgi_cache_path \/var\/cache\/nginx\/micro levels=1:2 keys_zone=microcache:10m max_size=1024m inactive=30m;\n    fastcgi_cache_key \"$scheme$request_method$host$request_uri\";\n    fastcgi_cache_methods GET HEAD;\n    fastcgi_cache_use_stale updating error timeout invalid_header http_500 http_503;\n    fastcgi_ignore_headers Cache-Control Expires Set-Cookie;\n    add_header X-FastCGI-Cache \$upstream_cache_status;\n\n    # Cache bypass/g' /etc/nginx/nginx.conf
                  sed  -i 's/    fastcgi_cache_lock_timeout      5s;/    fastcgi_cache_lock_timeout      5s;\n    fastcgi_cache_background_update on;\n    fastcgi_cache_revalidate        on;/g' /etc/nginx/nginx.conf
              fi
          fi

          if [ -e "/etc/nginx/nginx.conf" ]; then
              echo "[ * ] Updating nginx configuration with changes to Cloudflare IP addresses"
              sed  -i 's/    set_real_ip_from 104.16.0.0\/12;/    set_real_ip_from 104.16.0.0\/13;\n    set_real_ip_from 104.24.0.0\/14;/g' /etc/nginx/nginx.conf
          fi

That was when we released 1.4.0 in May 2021, which introduced FastCGI cache support for nginx.

So it should have failed at an earlier time…
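
Note that the check in the snippet above only looks for fastcgi_cache_path inside nginx.conf itself, not in included files such as conf.d/*.conf. One defensive pattern for in-place edits like these is to back up the file, apply the change, test, and roll back on failure. The following is a sketch under that assumption, not what the upgrade script does; the function name and NGINX_TEST_CMD are hypothetical.

```shell
#!/bin/sh
# Hypothetical pattern: edit a config file, validate, and undo on failure.
NGINX_TEST_CMD="${NGINX_TEST_CMD:-nginx -t}"

apply_conf_change() {
    # Usage: apply_conf_change FILE SED_EXPRESSION
    conf=$1
    sed_expr=$2
    backup="$conf.bak.$$"
    cp -p "$conf" "$backup" || return 1
    # Write the edited copy back into the original file, then validate it.
    if sed "$sed_expr" "$backup" > "$conf" && $NGINX_TEST_CMD >/dev/null 2>&1; then
        rm -f "$backup"
    else
        # Restore the untouched copy so the next restart still works.
        mv "$backup" "$conf"
        echo "change to $conf rolled back" >&2
        return 1
    fi
}
```

Because the whole configuration is tested after the edit, a clash with a directive defined in an included file is caught before anything is restarted.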

bluemarble · October 19, 2021, 11:23pm

No. You didn’t read the error message closely enough. You cannot have duplicate fastcgi_cache_key directives.

I guess this also means Hestia changed /etc/nginx/nginx.conf. I really wish you’d give people the benefit of the doubt.

Regardless, Hestia attempted to perform a webserver restart. My feature request is to have Hestia perform a configuration check before attempting to restart the webserver.

Raphael · October 20, 2021, 4:54am

You did an upgrade from 1.4.14 to 1.4.16. Also, Hestia doesn’t usually touch nginx.conf, only in rare cases such as the release of new features like we did in 1.4.0, but that doesn’t seem to be related to your problem.

There have already been some improved validation steps for service restart handling, but it’s still a work in progress.

I can’t tell you where your issue came from, but in that update (1.4.14 to 1.4.16) it was clearly not Hestia, except for the nginx restart, which exposed the issue.

About the “…went days offline” part, I would suggest using a monitoring service like statuscake.com or uptimerobot.
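
For those who prefer a self-hosted check over an external service, a very small probe can be run from cron on a different machine. This is a minimal sketch assuming curl is installed; the function name and the CURL variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical uptime probe: report whether a URL answers successfully.
CURL="${CURL:-curl}"

check_site() {
    # Usage: check_site URL
    # -f makes curl exit non-zero on HTTP errors (status 400 and above);
    # --max-time bounds how long a hung server can stall the check.
    if $CURL -fsS --max-time 10 -o /dev/null "$1"; then
        echo "UP $1"
    else
        echo "DOWN $1"
        return 1
    fi
}
```

The non-zero return value on failure lets the caller decide how to alert, for example by sending mail from the cron job.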

eris · October 20, 2021, 6:14am

Unless the server was never restarted since the 1.4.0 release, nginx should already have been offline for a very long time when this update was applied, because the server was already restarted after 1.4.0 was installed.

It is always smart to enable monitoring for your server with Nagios or any other monitoring tool.

There are plans to improve the restart and allow the system to fall back to the last template used (or the default template).

Also, nginx is already restarted regularly for Let’s Encrypt and when creating, deleting, or modifying web settings…

bluemarble · October 20, 2021, 3:22pm

I agree that the changes to nginx.conf may not be directly related to the 1.4.16 update. Hestia may have implemented those changes in previous upgrades.

However, Hestia trying to restart the webserver without checking the configuration is an issue. A simple ‘nginx -t 2>&1 | grep successful’ would’ve detected the issue.

Yes, monitoring sounds like a good idea. Debian being rock solid and my VPS host having great uptime lulled me into a false sense of security. I honestly didn’t think Hestia would be the one to cause a DoS.

bluemarble · October 20, 2021, 3:30pm

No, that’s not true. I originally installed Hestia 1.4.1 or 1.4.2 and I specifically implemented fastcgi caching because it wasn’t present in the current NGINX configuration.

eris · October 20, 2021, 3:36pm

FastCGI caching was introduced in 1.4.0…

[1.4.0] - Major Release (Feature / Quality Update)

Features

Introduced support for NGINX FastCGI cache.

…

github.com

hestiacp/hestiacp/blob/main/CHANGELOG.md#140---major-release-feature--quality-update

# Changelog
All notable changes to this project will be documented in this file.

## [DEVELOPMENT] - Service release 

## Features

### Bugfixes

- Disable /reset/ endpoint when POLICY_SYSTEM_PASSWORD_RESET = no

## [1.4.17] - Service release 

### Bugfixes

- Fix bug with nginx and phmyadmin not loading
- Fix #2166 Search function broken 
- Update Quick installers to the last version

## [1.4.16] - Service release


bluemarble · October 20, 2021, 4:46pm

To be honest, I don’t really know which version of Hestia I installed. According to the notifications, the earliest update that I see is 1.4.3, which is why I assumed I installed 1.4.1 or 1.4.2. But I may have installed an earlier version.

But I do know that at the time of installation, Hestia didn’t have fastcgi cache enabled because I had to manually create the config files.

Regardless, my issue is that Hestia made changes to a core NGINX config file, attempted to restart the webserver, and it failed. It doesn’t matter whether Hestia made those changes several versions prior or just then. It just matters that at the time of the webserver restart, Hestia didn’t make any attempt to see whether the configuration was good.

I’d like to see Hestia do a configuration check before attempting to restart the webserver. If the configuration check fails, abort the webserver restart and send an email warning that something’s wrong with the webserver config. That way, the webserver keeps running, no sites go down, and the administrator has time to take corrective action.
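
The behaviour described here could look roughly like the sketch below. It is not Hestia's implementation; the function and variable names are hypothetical, and the default notifier assumes a working `mail` command and that mail to root is delivered somewhere useful.

```shell
#!/bin/sh
# Hypothetical sketch: test the config, restart on success, notify on failure.
NGINX_TEST_CMD="${NGINX_TEST_CMD:-nginx -t}"
NGINX_RESTART_CMD="${NGINX_RESTART_CMD:-systemctl restart nginx}"
# Assumed notifier: any command that reads the message body on stdin.
NOTIFY_CMD="${NOTIFY_CMD:-mail -s nginx-config-check-failed root}"

restart_or_notify() {
    # Capture stdout and stderr together: nginx -t reports errors on stderr.
    if output=$($NGINX_TEST_CMD 2>&1); then
        $NGINX_RESTART_CMD
    else
        # Forward the error text (for example a duplicate directive message)
        # and leave the running webserver untouched.
        printf '%s\n' "$output" | $NOTIFY_CMD
        return 1
    fi
}
```

Sending the captured output along with the warning means the administrator sees the offending file and line without having to log in first.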