[SOLVED] Upgrade to 1.2.0 borked Exim SMTP to GMail/GMX/Yahoo/others

Hi All

Since 9th July, when my server automatically upgraded to 1.2.0, Google refuses to accept email. Our server connects to their MX server, which then immediately drops the connection at EHLO. Redacted log below.

[email protected]:~# echo "Subject: exim4 reconfiguration test" | sendmail -v [email protected]
LOG: MAIN
<= [email protected] U=root P=local S=334
[email protected]:~# delivering 1juaUq-000870-Kn
Connecting to alt2.gmail-smtp-in.l.google.com [142.250.4.27]:25 from 94.177.xxx.xx … connected
SMTP<< 220 mx.google.com ESMTP m3si263815pll.411 - gsmtp
SMTP>> EHLO ;; connection timed out; no servers could be reache
SMTP<< 501-5.5.4 HELO/EHLO argument “;; connection timed out; no servers could be
501-5.5.4 reache” invalid, closing connection.
501 5.5.4 https://support.google.com/mail/?p=helo m3si263815pll.411 - gsmtp
SMTP>> HELO ;; connection timed out; no servers could be reache
SMTP(close)>>
LOG: MAIN
H=alt2.gmail-smtp-in.l.google.com [142.250.4.27]: Remote host closed connection in response to HELO ;; connection timed out; no servers could be reache (EHLO response was: 501-5.5.4 HELO/EHLO argument “;; connection timed out; no servers could be
501-5.5.4 reache” invalid, closing connection.
501 5.5.4 https://support.google.com/mail/?p=helo m3si263815pll.411 - gsmtp)
LOG: MAIN
== [email protected] R=dnslookup T=remote_smtp defer (-18) H=alt2.gmail-smtp-in.l.google.com [142.250.4.27]: Remote host closed connection in response to HELO ;; connection timed out; no servers could be reache (EHLO response was: 501-5.5.4 HELO/EHLO argument “;; connection timed out; no servers could be\n501-5.5.4 reache” invalid, closing connection.\n501 5.5.4 https://support.google.com/mail/?p=helo m3si263815pll.411 - gsmtp)

dig mx.google.com returns the correct data when run from our server, so I don’t understand their debug message that no servers can be reached. We use Cloudflare to manage the DNS, and we have DKIM and DMARC TXT records which are correct (tested with multiple online toolboxes).

I need help debugging/fixing this ASAP as exim is only used to send cron and error messages to our corporate hostmaster account which is part of our domain hosted under GSuite - we run a number of mission critical cron jobs so we need the error messages.

TIA

Did you make any custom changes to your Exim config before that could have been overwritten by the update?

Not to my knowledge, but I’ve just found a file /etc/exim4/update-exim4.conf.conf (dated 20 May 2020) with an odd domain name in it, so is it worth running it through update-exim4.conf, do you think?

Also if there were some edited files, does the upgrade process not copy them somewhere? If not, might it be an idea if it did going forwards?

The updater should not overwrite the exim conf file, adjustments will be done using sed.

From what I see, it sounds more like a connection issue. What happens if you try to manually communicate with the target mail server using telnet? Did I understand it properly, that your server (source) can’t send emails to Google (destination)?

Telnet/SMTP: https://www.sparkpost.com/blog/how-to-check-an-smtp-connection-with-a-manual-telnet-session/

I’d also recommend testing the connection via telnet, as @Scit wrote.

I think we also need more information about your setup to narrow down the issue. What OS are you using, and which services do you have installed alongside Hestia? Is there a (local) mail account set up in the control panel for your domain?

With plain sendmail you are possibly bypassing all the DKIM and DMARC stuff. However, it seems you are being dropped far earlier than that anyway…

“Did I understand it properly, that your server (source) cant send emails to google (destination)?” - correct.

Telnet session works:

[email protected]:~# telnet smtp.gmail.com 25
Trying 74.125.133.109…
Connected to smtp.gmail.com.
Escape character is ‘^]’.
220 smtp.gmail.com ESMTP r10sm18961260wrm.17 - gsmtp
EHLO osprey.XXX.com
250-smtp.gmail.com at your service, [94.177.XXX.XXX]
250-SIZE 35882577
250-8BITMIME
250-STARTTLS
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-CHUNKING
250 SMTPUTF8

I’m running Ubuntu 18.04 Server on a VPS. No other services apart from those installed by Hestia (i.e. web etc.). We don’t run bind as all our domains use Cloudflare. Yes, there are some local mail accounts on the server but they forward to GSuite accounts. One account (used very infrequently as a mailing list) holds data on the server; the rest don’t.

It’s a really simple set up that has worked perfectly for years with VestaCP before HestiaCP was around so I’m stumped!

On my normal echo test, it barfs before any response from EHLO, which is bizarre…

Besides the echo tests, how are you sending the mails (you mentioned cron)?

afaik gmail rejects empty EHLO commands, and from your first post this:

SMTP>> EHLO ;; connection timed out; no servers could be reache

looks pretty much empty :wink:

Do you have a real-world example of a mail that you are missing, and the corresponding entry in Exim’s log file?

The empty EHLO definitely appears to be the issue. I just set up an account on the server to test with using Roundcube, and the email never reaches any SMTP server, not just Google’s. The log from exim is below for an email sent to an account on GMX.

2020-07-12 14:37:49 1jucAv-0007U9-NT <= [email protected] U=root P=local S=326
2020-07-12 14:37:49 1jucAv-0007U9-NT H=mx01.gmx.net [212.227.17.4]: SMTP error from remote mail server after HELO ;; connection timed out; no servers could be reache: 501 Syntax error in parameters or arguments
2020-07-12 14:37:50 dovecot_login authenticator failed for (User) [46.38.150.153]: 535 Incorrect authentication data ([email protected])
2020-07-12 14:37:50 1jucAv-0007U9-NT H=mx00.gmx.net [212.227.15.10]: SMTP error from remote mail server after HELO ;; connection timed out; no servers could be reache: 501 Syntax error in parameters or arguments
2020-07-12 14:37:50 1jucAv-0007U9-NT ** [email protected] R=dnslookup T=remote_smtp H=mx00.gmx.net [212.227.15.10]: SMTP error from remote mail server after HELO ;; connection timed out; no servers could be reache: 501 Syntax error in parameters or arguments
2020-07-12 14:37:50 1jucAw-0007UI-5G <= <> R=1jucAv-0007U9-NT U=Debian-exim P=local S=1768
2020-07-12 14:37:50 1jucAv-0007U9-NT Completed
2020-07-12 14:37:50 1jucAw-0007UI-5G ** [email protected]: Unrouteable address
2020-07-12 14:37:50 1jucAw-0007UI-5G Frozen (delivery error message)

I then sent an email to mail-tester.com, which did get through, and looked at the source:

Received: by mail-tester.com (Postfix, from userid 500)
id DD2FBA8746; Sun, 12 Jul 2020 16:26:04 +0200 (CEST)
Authentication-Results: mail-tester.com;
dkim=pass (1024-bit key; unprotected) header.d=XXX.com [email protected] header.b=Ws1KEzae;
dkim-atps=neutral
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on mail-tester.com
X-Spam-Level:
X-Spam-Status: No/-0.2/5.0
X-Spam-Test-Scores: DKIM_SIGNED=0.1,DKIM_VALID=-0.1,DKIM_VALID_AU=-0.1,
DKIM_VALID_EF=-0.1,SPF_PASS=-0.001,URIBL_BLOCKED=0.001
X-Spam-Last-External-IP: 94.177.XXX.XXX
X-Spam-Last-External-HELO: ???connection?timed?out??no?servers?could?be?reache
X-Spam-Last-External-rDNS: osprey.XXX.com
X-Spam-Date-of-Scan: Sun, 12 Jul 2020 16:26:04 +0200
X-Spam-Report:
* 0.0 URIBL_BLOCKED ADMINISTRATOR NOTICE: The query to URIBL was
* blocked. See
* http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
* for more information.
* [URIs: XXX.com]
* -0.0 SPF_PASS SPF: sender matches SPF record
* -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from
* envelope-from domain
* -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
* author’s domain
* 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
* valid
* -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=94.177.XXX.XXX; helo=???connection?timed?out??no?servers?could?be?reache; [email protected]; [email protected]
DMARC-Filter: OpenDMARC Filter v1.3.1 mail-tester.com BCD83A0578
Authentication-Results: mail-tester.com; dmarc=pass header.from=XXX.com
Authentication-Results: mail-tester.com;
dkim=pass (1024-bit key; unprotected) header.d=XXX.com [email protected] header.b=Ws1KEzae;
dkim-atps=neutral
Received: from ???connection?timed?out??no?servers?could?be?reache (osprey.XXX.com [94.177.XXX.XXX])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(No client certificate requested)
by mail-tester.com (Postfix) with ESMTPS id BCD83A0578
for [email protected]; Sun, 12 Jul 2020 16:26:02 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=XXX.com;
s=mail; h=Message-ID:Subject:To:From:Date:Content-Transfer-Encoding:
Content-Type:MIME-Version:Sender:Reply-To:Cc:Content-ID:Content-Description:
Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:
In-Reply-To:References:List-Id:List-Help:List-Unsubscribe:List-Subscribe:
List-Post:List-Owner:List-Archive;
bh=r2uoTjB8+0WyaY56vX7vzz3Gp6gjaCFkUrm5xiKb3ZM=; b=Ws1KEzae3IyQuLqpORjVro7hP/
AEJHTpZsT1G2epumRoZezHUAFihQH6YAIeDBNVr9hhecKUyD1ROOOQGEWov1vT4t2wGGNfc0n55Xx
ZXGPCSiOvm6LrJXq5pWfJlB3lRMcS0Ci0iIE9h8lKrVi0dxJP9Qf2WQCsgb0+rlD3HFo=;
Received: from localhost ([127.0.0.1] helo=webmail.XXX.com)
by osprey.XXX.com with esmtp (Exim 4.90_1)
(envelope-from [email protected])
id 1jucvX-0006uD-7l
for [email protected]; Sun, 12 Jul 2020 15:25:59 +0100
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII;
format=flowed
Content-Transfer-Encoding: 7bit
Date: Sun, 12 Jul 2020 15:25:59 +0100
From: [email protected]
To: [email protected]
Subject: Need Headers
Message-ID: [email protected]
X-Sender: [email protected]
User-Agent: Roundcube Webmail

lots of lines containing
“from ???connection?timed?out??no?servers?could?be?reache (osprey.XXX.com [94.177.XXX.XXX])”

which I presume is caused by the lack of a server name sent with EHLO.

So how do I get exim to send the server name with EHLO? Run the update-exim-conf command?

BTW I really appreciate your help on a Sunday. Above and beyond!

afaik there is no need to ‘run’ anything with the update.conf file, usually the files needed are referenced or included in the main config

I just checked, and afaik the handling of HELO data has changed: there should be a mailhelo.conf in /etc/exim4 which holds the names of the different mail domains on that server.
The corresponding line in exim4.conf.template is 327, which now tries to determine the correct domain name to use when sending out an email. Otherwise the primary hostname is used.

I have to look up the commit for that and what the reason was to get more insights. could you meanwhile check if you have set a proper hostname and/or if the domain name you are sending from via commandline/sendmail would be in the mailhelo.conf ?
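
A rough sketch of the lookup described above, written as a shell function rather than as Exim configuration. It assumes mailhelo.conf holds one `domain:helo-name` entry per line and that a missing entry means "use the primary hostname". The function name `helo_for_domain` and its arguments are made up for illustration; this is not the actual line 327 of exim4.conf.template.

```shell
#!/bin/sh
# Hypothetical helper: emulate a "domain:helo-name" lookup with a fallback.
# Usage: helo_for_domain FILE SENDER_DOMAIN FALLBACK_HOSTNAME
helo_for_domain() {
    file=$1
    domain=$2
    fallback=$3
    value=""
    if [ -r "$file" ]; then
        # On the first line whose key (text before the first colon) equals
        # the domain, print everything after that colon and stop.
        value=$(awk -v d="$domain" '
            { i = index($0, ":") }
            i > 0 && substr($0, 1, i - 1) == d { print substr($0, i + 1); exit }
        ' "$file")
    fi
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    else
        printf '%s\n' "$fallback"
    fi
}
```

Under these assumptions, a domain that is missing from the file gets the fallback hostname, while a domain whose entry contains garbage gets that garbage back verbatim, and that string would then be sent as the EHLO argument.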

Found the issue. The upgrade borked /etc/exim4/mailhelo.conf which looks like
osprey.XXX.com:;; connection timed out; no servers could be reache
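
One way to spot an entry like the one above is to check every line of the file against a `domain:hostname` shape. The following is a minimal sketch, assuming that each line should be exactly two dotted names separated by a colon. The function name `check_mailhelo` is hypothetical, and the hostname pattern is a simplification, not a full RFC check.

```shell
#!/bin/sh
# Hypothetical audit helper for a "domain:hostname" file.
# Prints every line (with its line number) that does not have that shape.
# Returns 0 if all lines look fine, 1 if any bad line was found,
# 2 if the file cannot be read.
check_mailhelo() {
    [ -r "$1" ] || return 2
    label='[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?'
    fqdn="${label}(\\.${label})+"
    # grep -v selects the lines that do NOT match; it exits 0 when it
    # printed at least one such line.
    if grep -Evn "^${fqdn}:${fqdn}\$" "$1"; then
        return 1
    fi
    return 0
}
```

Run as `check_mailhelo /etc/exim4/mailhelo.conf`; empty lines are reported as bad too, since they do not match the assumed shape.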

It’s now working for GMX but not for Google which I think might be a timing issue so I’ll check tomorrow when I’m at work and report back here.

Thanks guys, I hope I can mark this as solved tomorrow

Our messages crossed - sorry. At the moment, hold off on doing any more to help me and I’ll report back once I get to the bottom of Google.

  1. Are you using multiple ips on that server?

  2. Can you share privately in a PM your public IP (/home/conf/mail/$domain/ip) so we can check the rDNS validation function?

  3. Is your DNS working?

dig google.com @8.8.8.8
dig google.com @1.1.1.1

Hi all,
I had a similar issue. Here is what solved my issue with mail delivery to gmail.com

Try,

#Delete files:

rm /var/spool/exim4/db/*
rm /var/spool/exim4/input/*
rm /var/spool/exim4/msglog/*
rm /var/log/exim4/*

#Clear file:

> /var/log/dovecot.log

#Then restart Exim4 OR Restart VPS

service exim4 restart

@awrobl - that’s what I tried first and it made no difference. For me it’s a borked /etc/exim4/mailhelo.conf file

@Lupu To answer your questions:

  1. No
  2. Yes, if you still need it. The rDNS appears to be working, judging from the message headers. If you still want to check, PM me and I’ll provide the IP. It won’t be until tomorrow as it’s 20:20 in the UK and my family expects my presence for a movie!
  3. Yes

The acid test will be around 05:00 tomorrow when all our cronjobs start running … I’ll report back when I know if they worked or not.

All cron messages arrived this morning. Problem solved.

Thanks guys, much appreciated

Thanks @brackenhill-mob, we found the problem yesterday and pushed a fix. If you want to apply it manually, it is here: https://github.com/hestiacp/hestiacp/pull/924/files

My take on what happened is that mailhelo.conf file was updated when there was a (network) problem with resolving dns queries.
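
If that hypothesis holds, the resolver's error text was stored because the lookup output was never checked. A hedged sketch of a guard follows, assuming the HELO name comes from a reverse lookup such as `dig +short -x`, which can print its `;; connection timed out` diagnostic on standard output. The function name `first_hostname` is hypothetical and this is not the code from the linked pull request.

```shell
#!/bin/sh
# Hypothetical filter: read raw resolver output on stdin and print the first
# line that looks like a hostname, with any trailing dot removed.
# Returns 1 (and prints nothing) if no line qualifies.
first_hostname() {
    label='[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?'
    name=$(grep -E "^${label}(\\.${label})+\\.?\$" | head -n 1 | sed 's/\.$//')
    [ -n "$name" ] && printf '%s\n' "$name"
}

# Example wiring (needs network access and dig; 203.0.113.10 is a
# documentation-only address, example.com a placeholder domain):
#
#   if ptr=$(dig +short -x 203.0.113.10 | first_hostname); then
#       echo "would write: example.com:$ptr"
#   else
#       echo "lookup failed, leaving mailhelo.conf untouched" >&2
#   fi
```

The design choice is to treat anything that is not a plausible hostname as a failed lookup and keep the previous file contents, rather than writing whatever the command printed.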

Thanks @Lupu. Having looked at the source changes, your hypothesis makes sense to me :wink:

Is there any need for me to manually update with this patch? I’m assuming that this file will be part of the next upgrade so will be installed and run automatically then. If I’m wrong, I’ll apply it :wink:

Nope, don’t think it is needed

how best to apply this patch?

thanks