Gmail SMTP timeout after sending data block, Connection timed out

Deepak · October 20, 2021, 8:11pm

Now that I understand what tcp_window_scaling is, I no longer believe your problem is related to it. The firewall is a good area to look into.

Ask your provider whether the following modules, which are loaded into the kernel, are properly configured:

Main vps serverconfiguration
That page is for CSF, which you do not use. But it will help you check whether these modules are present and, if they are absent, whether that “may be” causing trouble (not all of them matter, though). You could turn off the firewall and install CSF instead to see the difference.
BTW, CSF announced a long time ago that it was giving up support for OVZ. I have not seen OVZ in many years. It has too many disadvantages.

eris · October 20, 2021, 8:13pm

OpenVZ and iptables do not work nicely together …

Deepak · October 20, 2021, 8:16pm

That is precisely why CSF refused to support OVZ: they simply could not work in harmony. So Eris, you are correct. But a bad default firewall setting that hinders exim4 is worse than a CSF firewall that is not in harmony with OVZ. It will work - if and only if - the modules listed on the linked page are properly configured on the host system.

trogvar · October 20, 2021, 8:25pm

I run my own libvirt/KVM host with lots of VM guests. For several years one of them was a mail server with Debian 9 and VestaCP. I’ve migrated this server to a new VM guest running Debian 11 and HestiaCP.
Just a fresh install and a backup restore. Same VM settings.
But it doesn’t work like before.
The only changes are the OS (Debian 9 to Debian 11) and the CP (Vesta to Hestia).

trogvar · October 21, 2021, 10:25pm

I finally solved my problem…
I changed the driver for the bridged network card from virtio to e1000 in the libvirt guest settings.
A whole day with no issues, with the firewall on and tcp_window_scaling = 1.

zoroboak · October 22, 2021, 10:10am

Hello eris, something similar to trogvar’s problem is happening to me, but I do not have access to the host machine of the VPS that my hosting provider rents to me. The solution you offered has fixed one of the issues that mxtoolbox warns me about. [Domain Health Check - Online Domain Tools - Blacklist, Email, Website, DNS - MxToolBox]

But the main problem is that every time I send an email to Microsoft, for example, it never arrives, and when they send one to me, they get an error saying that there is a problem with the SMTP connection:

Error detected:	550 5.0.350 Remote server returned an error -> 550 smtp auth required
DSN generated by:	DB7P191MB0458.EURP191.PROD.OUTLOOK.COM

I haven’t changed anything since Hestia v1.4.15, and it was working fine before. Is it possible that some security policy has changed?

The complete error Outlook gives is:

1	22/10/2021 9:53:36	DB8P191MB0635.EURP191.PROD.OUTLOOK.COM	DB8P191MB0635.EURP191.PROD.OUTLOOK.COM	mapi	*
2	22/10/2021 9:53:36	DB8P191MB0635.EURP191.PROD.OUTLOOK.COM	DB7P191MB0458.EURP191.PROD.OUTLOOK.COM	Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)	*

I look forward to hearing from you, best regards

eris · October 22, 2021, 10:25am

We haven’t changed much regarding email since 1.4.15, only fixed a bug on Debian 11…

SMTP error 550 means:

https://portal.smartertools.com/kb/a2628/550-authentication-is-required-for-relay-when-sending-email.aspx

I don’t have a lot of expertise with email and try to avoid it as much as possible.

trogvar · October 22, 2021, 10:41am

1mdpCe-000Xvo-J1 => [email protected] R=dnslookup T=remote_smtp H=ozon-ru.mail.protection.outlook.com [104.47.12.36] X=TLS1.2:ECDHE_SECP384R1__RSA_SHA256__AES_256_GCM:256 CV=yes K C=“250 2.6.0 [email protected] [InternalId=12312312312312, Hostname=AM8P195MB0882.EURP195.PROD.OUTLOOK.COM] 48011 bytes in 0.076, 612.376 KB/sec Queued mail for delivery”

I don’t get this error with MS servers.

Check that the usernames and email addresses are correct.

dpeca · November 18, 2021, 3:49pm

Debian 11 here, and the same problem with Gmail for emails that have attachments.


2021-11-18 16:26:39 1mnjFV-007OIa-0A H=gmail-smtp-in.l.google.com [172.253.120.26] TLS error on connection (send): The TLS connection was non-properly terminated.
2021-11-18 16:26:39 1mnjFV-007OIa-0A H=gmail-smtp-in.l.google.com [172.253.120.26] TLS error on connection (send): The specified session has been invalidated for some reason.
2021-11-18 16:26:39 1mnjFV-007OIa-0A H=gmail-smtp-in.l.google.com [172.253.120.26]: SMTP error from remote mail server after sending data block: 250 2.0.0 OK m184si76546wme.129 - gsmtp: Connection reset by peer

Emails without attachments go through perfectly, but emails with attachments get stuck as shown in the log above.

The issue disappeared when I ran:

echo "net.ipv4.tcp_window_scaling = 0" >> /etc/sysctl.conf
sysctl -p
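
A side note on the command above: `>>` appends a new line every time it is run, so switching between 0 and 1 while testing leaves several conflicting entries in the file. A minimal sketch of an idempotent alternative follows; the helper name `set_sysctl_key` is made up for illustration and is not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: set "KEY = VALUE" in a sysctl-style config file,
# replacing any existing uncommented line for KEY instead of appending a duplicate.
set_sysctl_key() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Keep every line whose key part (text before "=") differs from KEY.
        awk -v k="$key" '{
            l = $0
            sub(/^[ \t]+/, "", l)
            split(l, a, "[ \t]*=")
            if (a[1] != k) print
        }' "$file" > "$tmp"
    fi
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the original file permissions are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Used as `set_sysctl_key /etc/sysctl.conf net.ipv4.tcp_window_scaling 0` followed by `sysctl -p`, it should have the same effect as the two commands above, but can be repeated safely.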

Can someone explain what the hell is going on?

I’m on baremetal, Hetzner AX41… so there is no virtualization.

dpeca · December 18, 2021, 12:39pm

The same is happening on Hetzner Cloud too, so this is not related to virtualization, definitely.

dpeca · December 25, 2021, 3:38pm

Looks like this solves a problem too.

Without these sysctl.conf fixes, I also noticed this:

An 11 MB attachment goes through flawlessly, but 18.5 MB attachments get stuck
FTP upload/download works without problems (even with a 1 GB file), so this is somehow related only to Exim
Email does not get stuck when I send it to another myVesta/Hestia server, so this is somehow related only to Gmail SMTP

Does anyone have any explanation of what is going on?

EDIT:
Additional note:

The problem does not always occur: I just removed the fixes from sysctl.conf, rebooted the server, sent an 18.5 MB email to Gmail, and it went through flawlessly.

So it’s hard to reproduce the issue when it’s not clear when it will happen.

Now I’m not even sure whether the mentioned fix really fixes the issue.

For now I’m only sure that net.ipv4.tcp_window_scaling = 0 is a confirmed fix (because I tested it on many servers).

Deepak · December 25, 2021, 5:07pm

Hi dpeca,

I respectfully disagree with your confirmation above.

Eris is 1000% correct in saying in the message above: “Be careful with 0”.

Consider this example: “If the firewall is turned off, some connections will work, and therefore the firewall should be turned off”. By that logic, you would be confirming that the firewall should always be turned off. No, that can only be wrong, and Eris can only be correct.

I had my above configuration in CentOS all the time and never had problems. Because of Hestia, I divorced CentOS and jumped to Ubuntu. Therefore, I have drawbacks in many areas. But in the area of TCP connections and networking, there are many similarities.

If you have problems with a non-zero config, I suggest that you investigate further. It will help you and not leave you, as now, exposed to a different set of problems.

The zero config is a disaster.

dpeca · December 25, 2021, 6:13pm

I didn’t test with the firewall turned off, since I have trouble reproducing the issue even with the default setup.

By the way, can someone explain to me what can happen if we use net.ipv4.tcp_window_scaling = 0?

dpeca · December 25, 2021, 6:58pm

I will test the next 10 servers with the sysctl fix that @Deepak suggested, and monitor the exim4 log to see if the timeout shows up.
I have no other ideas for how to be sure whether something fixes this issue.

Deepak · December 25, 2021, 7:05pm

The firewall example in my message above was simply meant to show that one cannot generalize from one problem area to a different one.

Scaling means magnification (увеличавање). Check what happens if you turn scaling off in the following articles:

A good explanation is here:

What is tcp window scaling?

See what happens when it is disabled and enabled here:

TCP window scaling before and after

So I think it is possible that one has to tune something in there with scaling instead of turning it off. But I am really not sure…
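
As for what turning scaling off can cost: the window field in the TCP header is 16 bits wide, so without the window scale option a sender can have at most 65,535 unacknowledged bytes in flight per round trip. A rough sketch of the resulting ceiling for a single connection follows; the function name and the 100 ms round-trip time are illustrative assumptions, not measurements from this thread.

```shell
#!/bin/sh
# Upper bound on single-connection TCP throughput with window scaling off.
# 65535 is the largest value the 16-bit window field can advertise.
# rtt_ms is an assumed example round-trip time in milliseconds.
max_rate_no_scaling() {
    rtt_ms=$1
    echo $(( 65535 * 1000 / rtt_ms ))   # bytes per second (integer division)
}

max_rate_no_scaling 100   # prints 655350
```

At 100 ms that is about 5.2 Mbit/s per connection regardless of link speed, so the setting may hide one problem while slowing down long-distance transfers.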

Deepak · December 25, 2021, 7:55pm

One more suggestion on how to tweak TCP connection and make the kernel more efficient. See here:

Tuning the Linux kernel for more aggressive network throughput

In the CLI, type:

sysctl -a | grep net.core

Then look at the values of rmem/wmem. Change them and find out whether the changed values make things better.

Deepak · December 25, 2021, 8:08pm

The above finding means that there is a problem with the receive window: the sender streamed data packets, but the receiver failed to handle them.

So I suggest that you increase the receive window by scaling.

Read here on how to do this:

Read below the title —> Bandwidth Delay Product

Try increasing the maximum read value above its default, reboot, and see if this helps!

NB: DO NOT USE (AS MENTIONED IN THERE) -> echo '16777216' > /proc/sys/net/core/rmem_max. CHANGE THESE VALUES IN sysctl.conf.
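
The Bandwidth Delay Product mentioned above is simply bandwidth multiplied by round-trip time. It estimates how many bytes must be in flight to keep the path busy, and therefore roughly how large the buffers need to be. A small sketch, with a hypothetical function name and made-up example figures:

```shell
#!/bin/sh
# Bandwidth-delay product in bytes.
# Both arguments are assumed example figures, not measurements from this thread.
bdp_bytes() {
    mbit_per_s=$1 rtt_ms=$2
    # (mbit * 1,000,000 bits/s) * (rtt_ms / 1000 s) / 8 bits per byte
    echo $(( mbit_per_s * rtt_ms * 125 ))
}

bdp_bytes 1000 20   # prints 2500000
```

For comparison, the 16777216 (16 MiB) value quoted above would cover a 1 Gbit/s path up to a round-trip time of roughly 134 ms.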

And you also need to make sure with the following command that the core and the ipv4 are in congruence and there is no bottleneck:

sysctl -a | grep net.core

./.

sysctl -a | grep net.ipv4.tcp_rmem
sysctl -a | grep net.ipv4.tcp_wmem
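
To compare the two sets of values at a glance, the same numbers can be read straight from /proc/sys. The sketch below only prints them side by side; the helper name `triple_max` is made up, and whether a difference between the values actually matters is not settled in this thread.

```shell
#!/bin/sh
# Print the third ("max") field of a tcp_rmem/tcp_wmem style "min default max" triple.
triple_max() {
    # Word splitting on $1 is intentional: the kernel separates the fields with tabs.
    set -- $1
    echo "$3"
}

# Show the core limits next to the TCP maxima, if /proc/sys is readable.
for dir in r w; do
    core=/proc/sys/net/core/${dir}mem_max
    tcp=/proc/sys/net/ipv4/tcp_${dir}mem
    if [ -r "$core" ] && [ -r "$tcp" ]; then
        echo "${dir}mem: core max $(cat "$core"), tcp max $(triple_max "$(cat "$tcp")")"
    fi
done
```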

dpeca · December 25, 2021, 10:41pm

@Deepak
You are probably right, today I saw on one page (related to gmail timeout) that somebody also mentioned rmem_max.
I will compare values from Debian 10, where everything is fine.
Will write here if I conclude something.
Thanks a lot for help.

dpeca · December 25, 2021, 11:03pm

debian 10 left / debian 11 right

dpeca · December 26, 2021, 12:59am

I will test future servers with:

net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_syncookies = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

I will post the results here.
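
When testing across several servers, it may help to confirm that the running kernel really has these values after a reboot. A minimal sketch, assuming the settings are exactly the six lines above; the helper names `sysctl_path` and `check_sysctl` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a sysctl key to its file under /proc/sys.
sysctl_path() {
    echo "/proc/sys/$(echo "$1" | tr . /)"
}

# Compare the running value of KEY with EXPECTED, treating tabs and
# repeated spaces as a single space (multi-value keys are tab-separated).
check_sysctl() {
    key=$1 expected=$2
    path=$(sysctl_path "$key")
    [ -r "$path" ] || { echo "$key: not readable"; return 2; }
    actual=$(tr -s ' \t' ' ' < "$path")
    if [ "$actual" = "$expected" ]; then
        echo "$key: ok ($actual)"
    else
        echo "$key: running value is '$actual', expected '$expected'"
    fi
}

# Each line is "key = value"; the "=" lands in the throwaway variable eq.
while read -r key eq value; do
    check_sysctl "$key" "$value"
done <<'EOF'
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_syncookies = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
EOF
```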