Exim4 outbound email disappeared

Hi all, a few hours ago I updated a test server, “srv02.srv.mydomain.tld”, to HestiaCP 1.6.6, but I noticed that I didn’t receive the usual “Update Installed - v1.x.y” email that earlier updates sent. The mail queue also appears to be empty.

root@srv02:~# tail -100 /var/log/exim4/mainlog.1
[...]
2022-08-05 22:04:03 End queue run: pid=26075
2022-08-05 22:34:03 Start queue run: pid=28186
2022-08-05 22:34:03 End queue run: pid=28186
2022-08-05 23:04:03 Start queue run: pid=30197
2022-08-05 23:04:03 End queue run: pid=30197
2022-08-05 23:24:26 exim 4.92 daemon started: pid=1102, -q30m, listening for SMTP on port 25 (IPv4) port 587 (IPv4) and for SMTPS on port 465 (IPv4)
2022-08-05 23:24:26 Start queue run: pid=1104
2022-08-05 23:24:26 End queue run: pid=1104
2022-08-05 23:24:32 1oK3s0-0000Tu-9X <= [email protected] U=root P=local S=2264 id=69aYk2xFdkYD2eJu1QDrm0u5FSYjkWxVQeancHmqF0@srv02.srv.mydomain.tld
2022-08-05 23:24:35 1oK3s0-0000Tu-9X H=mail.mydomain.tld [mail.mydomain.tld-ipv4]: mail.mydomain.tld [mail.mydomain.tld-ipv4]: No route to host
2022-08-05 23:26:45 1oK3s0-0000Tu-9X H=mx.mydomain.tld [mx.mydomain.tld-ipv4]: SMTP timeout after initial connection: Connection timed out
2022-08-05 23:26:45 1oK3s0-0000Tu-9X == [email protected] R=dnslookup T=remote_smtp defer (110): Connection timed out H=mx.mydomain.tld [mx.mydomain.tld-ipv4]: SMTP timeout after initial connection
2022-08-05 23:26:45 1oK3s0-0000Tu-9X ** [email protected]: retry timeout exceeded
2022-08-05 23:26:45 1oK3u9-0000aM-If <= <> R=1oK3s0-0000Tu-9X U=Debian-exim P=local S=3573
2022-08-05 23:26:45 1oK3s0-0000Tu-9X Completed
2022-08-05 23:26:45 1oK3u9-0000aM-If remote host address is the local host: srv02.srv.mydomain.tld
2022-08-05 23:26:45 1oK3u9-0000aM-If == [email protected] R=dnslookup defer (-1): remote host address is the local host
2022-08-05 23:26:45 1oK3u9-0000aM-If ** [email protected]: retry timeout exceeded
2022-08-05 23:26:45 1oK3u9-0000aM-If [email protected]: error ignored
2022-08-05 23:26:45 1oK3u9-0000aM-If Completed
2022-08-05 23:54:26 Start queue run: pid=4240
2022-08-05 23:54:26 End queue run: pid=4240
root@srv02:~#

root@srv02:~# tail -100 /var/log/exim4/mainlog
2022-08-06 00:24:26 Start queue run: pid=8681
2022-08-06 00:24:26 End queue run: pid=8681
2022-08-06 00:54:26 Start queue run: pid=10854
2022-08-06 00:54:26 End queue run: pid=10854
2022-08-06 01:24:26 Start queue run: pid=12855
2022-08-06 01:24:26 End queue run: pid=12855
2022-08-06 01:54:26 Start queue run: pid=14975
2022-08-06 01:54:26 End queue run: pid=14975
root@srv02:~#

root@srv02:~#
root@srv02:~# mailq
root@srv02:~# 

root@srv02:~# cat /etc/debian_version 
10.12
root@srv02:~# dpkg -l|fgrep hestia
ii  hestia                            1.6.6                                                                      amd64        hestia
ii  hestia-nginx                      1.23.0                                                                     amd64        hestia Nginx
ii  hestia-php                        8.1.8                                                                      amd64        hestia php-fpm
root@srv02:~#

root@srv02:~# host -t mx mydomain.cctld.
mydomain.cctld mail is handled by 10 mail.mydomain.tld.
mydomain.cctld mail is handled by 50 mx.mydomain.tld.
root@srv02:~#

root@srv02:~# host -t mx srv.mydomain.tld
srv.mydomain.tld has no MX record
root@srv02:~# host -t mx srv02.srv.mydomain.tld
srv02.srv.mydomain.tld has no MX record

The 2 mail servers (mail & mx) were temporarily unavailable, but shouldn’t Exim4 queue the email and keep retrying for 3-4 days? DNS resolution is fine for both domains.

Note: I have redacted the actual domain names from the log excerpt above, but there are 2 second-level domains involved here: the 1st is mydomain.tld and the 2nd is mydomain.cctld, where
tld = [ .com, .org, .net etc] and
cctld = [ .de, .fr, .fi etc] (not that any of this should matter …)

Any ideas what might have gone wrong here would be much appreciated!

Thank you, KP.

Perhaps the fact that I’m using a 3rd level subdomain (.srv.) for grouping my servers ( srv02.srv.mydomain.tld ) might be confusing Exim4 in this case?

2022-08-05 23:26:45 1oK3u9-0000aM-If remote host address is the local host: srv02.srv.mydomain.tld
2022-08-05 23:26:45 1oK3u9-0000aM-If == [email protected] R=dnslookup defer (-1): remote host address is the local host
2022-08-05 23:26:45 1oK3u9-0000aM-If ** [email protected]: retry timeout exceeded
2022-08-05 23:26:45 1oK3u9-0000aM-If [email protected]: error ignored
2022-08-05 23:26:45 1oK3u9-0000aM-If Completed

Also an unrelated question: are there any disadvantages or caveats to putting HestiaCP behind NAT (and using a private RFC 1918 address for the eth0 interface)?

It looks like the DNS lookup for srv02.srv.mydomain.tld fails…

As far as I can tell, DNS has been working fine for all systems involved (forward & reverse DNS names match etc):

root@srv02:~# hostname -f
srv02.srv.mydomain.tld
root@srv02:~# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet xx.yy.zz.59  netmask 255.255.255.240  broadcast xx.yy.zz.63
        inet6 fe80::250:56ff:fe00:f299  prefixlen 64  scopeid 0x20<link>
        ether 00:00:99:99:99:99  txqueuelen 1000  (Ethernet)
        RX packets 10873731  bytes 1715543251 (1.5 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5351057  bytes 1810648430 (1.6 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

root@srv02:~# 

root@srv02:~# host srv02.srv.mydomain.tld
srv02.srv.mydomain.tld has address xx.yy.zz.59

root@srv02:~# host mail.mydomain.tld
mail.mydomain.tld has address xx.yy.zz.60

root@srv02:~# host mx.mydomain.tld
mx.mydomain.tld has address aa.bb.cc.131
root@srv02:~# 

root@srv02:~# host -t mx mydomain.tld
mydomain.tld mail is handled by 10 mail.mydomain.tld.
mydomain.tld mail is handled by 50 mx.mydomain.tld.

root@srv02:~# host -t mx mydomain.cctld
mydomain.cctld mail is handled by 50 mx.mydomain.tld.
mydomain.cctld mail is handled by 10 mail.mydomain.tld.

root@srv02:~# host -t mx srv02.srv.mydomain.tld
srv02.srv.mydomain.tld has no MX record

root@srv02:~# host -t mx srv.mydomain.tld
srv.mydomain.tld has no MX record
root@srv02:~#

Please note that I didn’t include an MX record for srv02.srv.mydomain.tld

The upgrade notification sent to [email protected] would have been delivered if both MXes for mydomain.cctld hadn’t happened to be “down” at the same time.

I’m not familiar with Exim4 (HestiaCP is the first place I’m going to use it), but Postfix would never delete an email in this way … I wish we had an Exim4 expert to look into this.

Anyway, the question is: how should DNS MX and Exim4 be configured for each HestiaCP server hostname (best practices)? Should I add a DNS MX record for each HestiaCP server pointing to itself, and also configure its FQDN as an Exim4 virtual domain for local delivery? (I think we discussed this before, several months ago.)

I think this issue might “bite” others too, because it doesn’t manifest itself unless all MXes are momentarily unavailable.
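Since the message is gone from the queue by the time anyone looks, the only trace is in the mainlog. Here is a minimal sketch for spotting such cases, assuming the mainlog layout shown in my excerpts above (date, time, message ID, then the `**` failure flag). The function name `exim_retry_timeouts` is just something I made up for illustration:

```shell
# Print date, time and message ID for every delivery that Exim gave up on
# with "retry timeout exceeded". Assumes the default mainlog field order
# seen in this thread: $1=date, $2=time, $3=message ID, $4=flag.
exim_retry_timeouts() {
    logfile="$1"
    awk '$4 == "**" && /retry timeout exceeded/ { print $1, $2, $3 }' "$logfile"
}

# Example (log path as used in this thread):
#   exim_retry_timeouts /var/log/exim4/mainlog
```

If it prints anything, `exim4 -bt <address>` (address routing test) and `exim4 -brt <domain>` (retry rule test) are probably the next things to try for the affected recipient, though I have not verified how much they would have revealed in this particular case.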

I received every HestiaCP update email to [email protected] before, except the email sent last night regarding update to v1.6.6

The config suggestions I remembered were posted 1.5yr ago here

But I would still be interested in any “best practices” config suggestions for operating e.g. 10 HestiaCP servers ( srv01.srv.mydomain.tld to srv10.srv.mydomain.tld )

My config is:

It seems I have found the culprit: apparently AppArmor was interfering with DNS resolution. (I had disabled AppArmor many months ago, but it was re-enabled at some point.)

I was scratching my head, because every DNS query resolved fine for me as root from shell, using host / nslookup / dig …
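For anyone who wants to rule this out quickly, here is a rough check of whether the AppArmor kernel module is currently enabled. It is a sketch that assumes the usual sysfs location on Debian; the function name `apparmor_enabled` and the optional path argument are my own additions:

```shell
# Succeeds (exit status 0) if the AppArmor kernel module reports "Y".
# The default path is the usual sysfs location on Debian (an assumption);
# a different file can be passed as the first argument.
apparmor_enabled() {
    flag_file="${1:-/sys/module/apparmor/parameters/enabled}"
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "Y" ]
}

# Example:
#   if apparmor_enabled; then echo "AppArmor is enabled"; fi
```

Where the apparmor tools are installed, `aa-status` additionally lists the loaded profiles, which should show whether a given daemon is actually confined.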

I noticed that ClamAV also had problems downloading updates due to DNS failure (clamav-freshclam has saner error messages than Exim4)

root@srv02:~# tail -50 /var/log/exim4/mainlog
2022-08-09 00:04:03 Start queue run: pid=24572
2022-08-09 00:04:03 End queue run: pid=24572
2022-08-09 00:34:03 Start queue run: pid=29007
2022-08-09 00:34:03 End queue run: pid=29007
2022-08-09 01:04:03 Start queue run: pid=31056
2022-08-09 01:04:03 End queue run: pid=31056
2022-08-09 01:34:03 Start queue run: pid=707
2022-08-09 01:34:03 End queue run: pid=707
2022-08-09 02:04:03 Start queue run: pid=2723
2022-08-09 02:04:03 End queue run: pid=2723
2022-08-09 02:34:03 Start queue run: pid=5455
2022-08-09 02:34:03 End queue run: pid=5455
2022-08-09 03:04:03 Start queue run: pid=7930
2022-08-09 03:04:03 End queue run: pid=7930
2022-08-09 03:33:06 1oLDBC-0002gr-Qt <= [email protected] U=root P=local S=773
2022-08-09 03:33:07 1oLDBC-0002gr-Qt remote host address is the local host: srv02.srv.mydomain.tld
2022-08-09 03:33:07 1oLDBC-0002gr-Qt == [email protected] R=dnslookup defer (-1): remote host address is the local host
2022-08-09 03:33:07 1oLDBC-0002gr-Qt ** [email protected]: retry timeout exceeded
2022-08-09 03:33:07 1oLDBD-0002gx-21 <= <> R=1oLDBC-0002gr-Qt U=Debian-exim P=local S=2034
2022-08-09 03:33:07 1oLDBD-0002gx-21 remote host address is the local host: srv02.srv.mydomain.tld
2022-08-09 03:33:07 1oLDBD-0002gx-21 == [email protected] R=dnslookup defer (-1): remote host address is the local host
2022-08-09 03:33:07 1oLDBD-0002gx-21 ** [email protected]: retry timeout exceeded
2022-08-09 03:33:07 1oLDBC-0002gr-Qt Completed
2022-08-09 03:33:07 1oLDBD-0002gx-21 [email protected]: error ignored
2022-08-09 03:33:07 1oLDBD-0002gx-21 Completed
2022-08-09 03:34:03 Start queue run: pid=10592
2022-08-09 03:34:03 End queue run: pid=10592
2022-08-09 03:34:04 exim 4.92 daemon started: pid=10902, -q30m, listening for SMTP on port 25 (IPv4) port 587 (IPv4) and for SMTPS on port 465 (IPv4)
2022-08-09 03:34:04 Start queue run: pid=10904
2022-08-09 03:34:04 End queue run: pid=10904
2022-08-09 04:04:04 Start queue run: pid=13839
2022-08-09 04:04:04 End queue run: pid=13839
root@srv02:~# 

root@srv02:~# zmore /var/log/syslog.7.gz 
Aug  2 00:24:01 srv02 CRON[11545]: (admin) CMD (sudo /usr/local/hestia/bin/v-update-sys-queue restart)
Aug  2 00:25:01 srv02 CRON[11575]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug  2 00:25:01 srv02 CRON[11576]: (admin) CMD (sudo /usr/local/hestia/bin/v-update-sys-queue backup)
Aug  2 00:25:01 srv02 CRON[11577]: (admin) CMD (sudo /usr/local/hestia/bin/v-update-sys-rrd)
Aug  2 00:26:01 srv02 CRON[11929]: (admin) CMD (sudo /usr/local/hestia/bin/v-update-sys-queue restart)
Aug  2 00:28:01 srv02 CRON[11957]: (admin) CMD (sudo /usr/local/hestia/bin/v-update-sys-queue restart)
Aug  2 00:30:01 srv02 CRON[11987]: (admin) CMD (sudo /usr/local/hestia/bin/v-update-sys-rrd)
Aug  2 00:30:01 srv02 CRON[11988]: (admin) CMD (sudo /usr/local/hestia/bin/v-update-sys-queue backup)
Aug  2 00:30:01 srv02 CRON[11989]: (admin) CMD (sudo /usr/local/hestia/bin/v-update-sys-queue restart)
Aug  2 00:32:01 srv02 CRON[12260]: (admin) CMD (sudo /usr/local/hestia/bin/v-update-sys-queue restart)
Aug  2 00:32:03 srv02 freshclam[15250]: Tue Aug  2 00:32:03 2022 -> Received signal: wake up
Aug  2 00:32:03 srv02 freshclam[15250]: Tue Aug  2 00:32:03 2022 -> ClamAV update process started at Tue Aug  2 00:32:03 2022
Aug  2 00:32:03 srv02 freshclam[15250]: Tue Aug  2 00:32:03 2022 -> ^Can't query current.cvd.clamav.net
Aug  2 00:32:03 srv02 freshclam[15250]: Tue Aug  2 00:32:03 2022 -> ^Invalid DNS reply. Falling back to HTTP mode.
Aug  2 00:32:03 srv02 freshclam[15250]: Tue Aug  2 00:32:03 2022 -> Trying to retrieve CVD header from https://database.clamav.net/daily.cvd
Aug  2 00:32:03 srv02 freshclam[15250]: Tue Aug  2 00:32:03 2022 -> ^remote_cvdhead: Download failed (6) Tue Aug  2 00:32:03 2022 -> ^ Message: Couldn't resolve host name
Aug  2 00:32:03 srv02 freshclam[15250]: Tue Aug  2 00:32:03 2022 -> ^Failed to get daily database version information from server: https://database.clamav.net
Aug  2 00:32:03 srv02 freshclam[15250]: Tue Aug  2 00:32:03 2022 -> !check_for_new_database_version: Failed to find daily database using server https://database.clamav.net.
Aug  2 00:32:03 srv02 freshclam[15250]: Tue Aug  2 00:32:03 2022 -> Trying again in 5 secs...
Aug  2 00:32:08 srv02 freshclam[15250]: Tue Aug  2 00:32:08 2022 -> Trying to retrieve CVD header from https://database.clamav.net/daily.cvd
Aug  2 00:32:08 srv02 freshclam[15250]: Tue Aug  2 00:32:08 2022 -> ^remote_cvdhead: Download failed (6) Tue Aug  2 00:32:08 2022 -> ^ Message: Couldn't resolve host name
Aug  2 00:32:08 srv02 freshclam[15250]: Tue Aug  2 00:32:08 2022 -> ^Failed to get daily database version information from server: https://database.clamav.net
Aug  2 00:32:08 srv02 freshclam[15250]: Tue Aug  2 00:32:08 2022 -> !check_for_new_database_version: Failed to find daily database using server https://database.clamav.net.
Aug  2 00:32:08 srv02 freshclam[15250]: Tue Aug  2 00:32:08 2022 -> Trying again in 5 secs...
Aug  2 00:32:13 srv02 freshclam[15250]: Tue Aug  2 00:32:13 2022 -> Trying to retrieve CVD header from https://database.clamav.net/daily.cvd
Aug  2 00:32:13 srv02 freshclam[15250]: Tue Aug  2 00:32:13 2022 -> ^remote_cvdhead: Download failed (6) Tue Aug  2 00:32:13 2022 -> ^ Message: Couldn't resolve host name
Aug  2 00:32:13 srv02 freshclam[15250]: Tue Aug  2 00:32:13 2022 -> ^Failed to get daily database version information from server: https://database.clamav.net
Aug  2 00:32:13 srv02 freshclam[15250]: Tue Aug  2 00:32:13 2022 -> !check_for_new_database_version: Failed to find daily database using server https://database.clamav.net.
Aug  2 00:32:13 srv02 freshclam[15250]: Tue Aug  2 00:32:13 2022 -> Trying again in 5 secs...
Aug  2 00:32:18 srv02 freshclam[15250]: Tue Aug  2 00:32:18 2022 -> Trying to retrieve CVD header from https://database.clamav.net/daily.cvd
Aug  2 00:32:18 srv02 freshclam[15250]: Tue Aug  2 00:32:18 2022 -> ^remote_cvdhead: Download failed (6) Tue Aug  2 00:32:18 2022 -> ^ Message: Couldn't resolve host name
Aug  2 00:32:18 srv02 freshclam[15250]: Tue Aug  2 00:32:18 2022 -> ^Failed to get daily database version information from server: https://database.clamav.net
Aug  2 00:32:18 srv02 freshclam[15250]: Tue Aug  2 00:32:18 2022 -> !check_for_new_database_version: Failed to find daily database using server https://database.clamav.net.
Aug  2 00:32:18 srv02 freshclam[15250]: Tue Aug  2 00:32:18 2022 -> Trying again in 5 secs...
Aug  2 00:32:23 srv02 freshclam[15250]: Tue Aug  2 00:32:23 2022 -> Trying to retrieve CVD header from https://database.clamav.net/daily.cvd
Aug  2 00:32:23 srv02 freshclam[15250]: Tue Aug  2 00:32:23 2022 -> !remote_cvdhead: Download failed (6) Tue Aug  2 00:32:23 2022 -> ! Message: Couldn't resolve host name
Aug  2 00:32:23 srv02 freshclam[15250]: Tue Aug  2 00:32:23 2022 -> ^Failed to get daily database version information from server: https://database.clamav.net
Aug  2 00:32:23 srv02 freshclam[15250]: Tue Aug  2 00:32:23 2022 -> !check_for_new_database_version: Failed to find daily database using server https://database.clamav.net.
Aug  2 00:32:23 srv02 freshclam[15250]: Tue Aug  2 00:32:23 2022 -> Giving up on https://database.clamav.net...
Aug  2 00:32:23 srv02 freshclam[15250]: Tue Aug  2 00:32:23 2022 -> Trying to retrieve CVD header from https://database.clamav.net/daily.cvd
Aug  2 00:32:23 srv02 freshclam[15250]: Tue Aug  2 00:32:23 2022 -> ^remote_cvdhead: Download failed (6) Tue Aug  2 00:32:23 2022 -> ^ Message: Couldn't resolve host name
Aug  2 00:32:23 srv02 freshclam[15250]: Tue Aug  2 00:32:23 2022 -> ^Failed to get daily database version information from server: https://database.clamav.net
Aug  2 00:32:23 srv02 freshclam[15250]: Tue Aug  2 00:32:23 2022 -> !check_for_new_database_version: Failed to find daily database using server https://database.clamav.net.
Aug  2 00:32:23 srv02 freshclam[15250]: Tue Aug  2 00:32:23 2022 -> Trying again in 5 secs...
Aug  2 00:32:28 srv02 freshclam[15250]: Tue Aug  2 00:32:28 2022 -> Trying to retrieve CVD header from https://database.clamav.net/daily.cvd
Aug  2 00:32:28 srv02 freshclam[15250]: Tue Aug  2 00:32:28 2022 -> ^remote_cvdhead: Download failed (6) Tue Aug  2 00:32:28 2022 -> ^ Message: Couldn't resolve host name
Aug  2 00:32:28 srv02 freshclam[15250]: Tue Aug  2 00:32:28 2022 -> ^Failed to get daily database version information from server: https://database.clamav.net
Aug  2 00:32:28 srv02 freshclam[15250]: Tue Aug  2 00:32:28 2022 -> !check_for_new_database_version: Failed to find daily database using server https://database.clamav.net.
Aug  2 00:32:28 srv02 freshclam[15250]: Tue Aug  2 00:32:28 2022 -> Trying again in 5 secs...
Aug  2 00:32:33 srv02 freshclam[15250]: Tue Aug  2 00:32:33 2022 -> Trying to retrieve CVD header from https://database.clamav.net/daily.cvd
Aug  2 00:32:33 srv02 freshclam[15250]: Tue Aug  2 00:32:33 2022 -> ^remote_cvdhead: Download failed (6) Tue Aug  2 00:32:33 2022 -> ^ Message: Couldn't resolve host name
Aug  2 00:32:33 srv02 freshclam[15250]: Tue Aug  2 00:32:33 2022 -> ^Failed to get daily database version information from server: https://database.clamav.net
Aug  2 00:32:33 srv02 freshclam[15250]: Tue Aug  2 00:32:33 2022 -> !check_for_new_database_version: Failed to find daily database using server https://database.clamav.net.
Aug  2 00:32:33 srv02 freshclam[15250]: Tue Aug  2 00:32:33 2022 -> Trying again in 5 secs...
Aug  2 00:32:38 srv02 freshclam[15250]: Tue Aug  2 00:32:38 2022 -> Trying to retrieve CVD header from https://database.clamav.net/daily.cvd
Aug  2 00:32:38 srv02 freshclam[15250]: Tue Aug  2 00:32:38 2022 -> ^remote_cvdhead: Download failed (6) Tue Aug  2 00:32:38 2022 -> ^ Message: Couldn't resolve host name
Aug  2 00:32:38 srv02 freshclam[15250]: Tue Aug  2 00:32:38 2022 -> ^Failed to get daily database version information from server: https://database.clamav.net
Aug  2 00:32:38 srv02 freshclam[15250]: Tue Aug  2 00:32:38 2022 -> !check_for_new_database_version: Failed to find daily database using server https://database.clamav.net.
Aug  2 00:32:38 srv02 freshclam[15250]: Tue Aug  2 00:32:38 2022 -> Trying again in 5 secs...
Aug  2 00:32:43 srv02 freshclam[15250]: Tue Aug  2 00:32:43 2022 -> Trying to retrieve CVD header from https://database.clamav.net/daily.cvd
Aug  2 00:32:43 srv02 freshclam[15250]: Tue Aug  2 00:32:43 2022 -> !remote_cvdhead: Download failed (6) Tue Aug  2 00:32:43 2022 -> ! Message: Couldn't resolve host name
Aug  2 00:32:43 srv02 freshclam[15250]: Tue Aug  2 00:32:43 2022 -> ^Failed to get daily database version information from server: https://database.clamav.net
Aug  2 00:32:43 srv02 freshclam[15250]: Tue Aug  2 00:32:43 2022 -> !check_for_new_database_version: Failed to find daily database using server https://database.clamav.net.
Aug  2 00:32:43 srv02 freshclam[15250]: Tue Aug  2 00:32:43 2022 -> Giving up on https://database.clamav.net...
root@srv02:~#
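
The freshclam entries above are a much clearer hint of a resolver problem than anything Exim4 logged, so they may be worth watching for. A small sketch that counts them, assuming the syslog format shown above; the function name `freshclam_dns_failures` is hypothetical:

```shell
# Count freshclam log lines that report a failed host name lookup.
# Reads the file given as the first argument, or standard input if none
# is given (so compressed logs can be piped in through zcat).
freshclam_dns_failures() {
    grep 'freshclam\[' "${1:--}" | grep -c "Couldn't resolve host name"
}

# Examples (log paths as used in this thread):
#   freshclam_dns_failures /var/log/syslog
#   zcat /var/log/syslog.7.gz | freshclam_dns_failures
```

A non-zero count while `host` and `dig` work fine from a root shell would point at something affecting the daemons specifically, which is what happened here.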