Gmail SMTP timeout after sending data block, Connection timed out

Deepak · December 26, 2021, 5:30am

In my system, I have these values as follows:

net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.ipv4.tcp_rmem = 4096 131072 6291456
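
To compare another server against the values above, they can be read straight from procfs. This is a minimal sketch, assuming a Linux system that exposes sysctl keys under /proc/sys. The function name read_sysctl and its optional second argument are illustrative and do not come from the thread.

```shell
# Print the current value of a sysctl key by reading it from procfs.
# "root" is a hypothetical second argument (default /proc/sys) that exists
# only so the function can be tried against a scratch directory.
read_sysctl() {
    key=$1
    root=${2:-/proc/sys}
    # net.ipv4.tcp_rmem -> net/ipv4/tcp_rmem
    path="$root/$(printf '%s' "$key" | tr '.' '/')"
    if [ -r "$path" ]; then
        printf '%s = %s\n' "$key" "$(cat "$path")"
    else
        printf '%s: not readable\n' "$key" >&2
        return 1
    fi
}

# Example: the three keys quoted above.
for k in net.core.rmem_default net.core.rmem_max net.ipv4.tcp_rmem; do
    read_sysctl "$k" || true
done
```

On most systems `sysctl net.core.rmem_max` gives the same information; the function form is only convenient when several keys are checked from a script.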

I see that you have scaled up the rmem more.

While researching this yesterday, there was one thing I could not find for the rmem settings above: an additional associated value that relates to timing.

It has been many years since I last opened this chapter. Some 6-8 years ago, I did some work on TCP stack tracing. I vaguely remember that timing was one more parameter, one that could control the returned value of the window that was opened and came back.

Yesterday, I could not find anything on this. I assume it may have changed in newer kernels, or it may be different on Ubuntu/Debian.

So, you need to see whether scaling up the values above helps. If it does not, read further on the timing for a big chunk of a TCP data stream that was accepted, with the response returned to your server. The value is either set in net.core or is specific to ipv4. (But I am not sure at all whether this is necessary!)

This timing value is necessary IF AND ONLY IF the remote server is far away and the ping time is high. In the case of Gmail, this should NOT BE A PROBLEM!

The values you compared in the image above and marked in orange will most likely not affect the transmission of TCP streams, I believe, but I do not know for sure.

dpeca · January 18, 2022, 2:47pm

dpeca:

I will test future servers with:

net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_syncookies = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
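
One way to stage the values above without editing /etc/sysctl.conf is a separate drop-in file. This is a minimal sketch, assuming a procps-style sysctl that reads /etc/sysctl.d/. The function name and the file name in the comment are made up for illustration.

```shell
# Write the candidate tuning values to a file chosen by the caller.
write_tcp_tuning() {
    cat > "$1" <<'EOF'
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_syncookies = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
EOF
}

# As root, one possible way to apply it (hypothetical file name):
#   write_tcp_tuning /etc/sysctl.d/90-tcp-buffers.conf
#   sysctl --system
```

Keeping the test values in their own file makes them easy to remove again if they do not help.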

I tested the values that @Deepak suggested.
Unfortunately, they did not help.

I tested on two servers that had the issue before (it went away with tcp_window_scaling=0 on those servers) and on a completely new dedicated server (where tcp_window_scaling=0 was never tried).
With the new test values, all three servers show the problem.

On the first two servers (where tcp_window_scaling=0 was configured) the situation is clear:

Yesterday we had tcp_window_scaling=0, so:
cat /var/log/exim4/mainlog.1 | grep 'Connection timed out' | grep 'google'
returns nothing.

At midnight we applied the values from the top of this message and restarted the server.
Today’s log is:
cat /var/log/exim4/mainlog | grep 'Connection timed out' | grep 'google'

2022-01-18 07:50:10 1n9iA3-000aUn-Sv H=gmail-smtp-in.l.google.com [74.125.140.27]: SMTP timeout after end of data (168113 bytes written): Connection timed out
2022-01-18 08:00:11 1n9iA3-000aUn-Sv H=alt1.gmail-smtp-in.l.google.com [142.251.9.26]: SMTP timeout after end of data (168113 bytes written): Connection timed out
2022-01-18 08:10:12 1n9iA3-000aUn-Sv H=alt2.gmail-smtp-in.l.google.com [142.250.150.27]: SMTP timeout after end of data (168113 bytes written): Connection timed out
2022-01-18 08:51:27 1n9j5a-000ePy-DG H=gmail-smtp-in.l.google.com [66.102.1.27]: SMTP timeout after sending data block (468497 bytes written): Connection timed out
2022-01-18 08:52:59 1n9j76-000edw-E8 H=gmail-smtp-in.l.google.com [66.102.1.27]: SMTP timeout after sending data block (468525 bytes written): Connection timed out
2022-01-18 09:04:52 1n9j76-000edw-E8 H=alt1.gmail-smtp-in.l.google.com [142.251.9.26]: SMTP timeout after sending data block (460334 bytes written): Connection timed out
2022-01-18 09:20:06 1n9jZ6-000gSI-Bv H=gmail-smtp-in.l.google.com [74.125.140.27]: SMTP timeout after end of data (167036 bytes written): Connection timed out
2022-01-18 09:30:07 1n9jZ6-000gSI-Bv H=alt1.gmail-smtp-in.l.google.com [142.251.9.27]: SMTP timeout after end of data (167036 bytes written): Connection timed out
2022-01-18 09:55:11 1n9k73-000ioH-Jq H=gmail-smtp-in.l.google.com [172.253.120.27]: SMTP timeout after end of data (161097 bytes written): Connection timed out
2022-01-18 10:00:08 1n9kBq-000j8S-Nn H=gmail-smtp-in.l.google.com [172.253.120.27]: SMTP timeout after end of data (162096 bytes written): Connection timed out
2022-01-18 10:05:12 1n9k73-000ioH-Jq H=alt1.gmail-smtp-in.l.google.com [142.251.9.27]: SMTP timeout after end of data (161097 bytes written): Connection timed out
2022-01-18 10:10:09 1n9kBq-000j8S-Nn H=alt1.gmail-smtp-in.l.google.com [142.251.9.26]: SMTP timeout after end of data (162096 bytes written): Connection timed out
2022-01-18 10:15:12 1n9k73-000ioH-Jq H=alt2.gmail-smtp-in.l.google.com [142.250.150.27]: SMTP timeout after end of data (161097 bytes written): Connection timed out
2022-01-18 10:20:10 1n9kBq-000j8S-Nn H=alt2.gmail-smtp-in.l.google.com [142.250.150.26]: SMTP timeout after end of data (162096 bytes written): Connection timed out
2022-01-18 10:30:13 1n9kBq-000j8S-Nn H=alt3.gmail-smtp-in.l.google.com [74.125.200.26]: SMTP timeout after end of data (162096 bytes written): Connection timed out
2022-01-18 10:40:15 1n9kBq-000j8S-Nn H=alt4.gmail-smtp-in.l.google.com [142.250.157.27]: SMTP timeout after end of data (162096 bytes written): Connection timed out
2022-01-18 10:40:15 1n9kBq-000j8S-Nn == ******@gmail.com R=dnslookup T=remote_smtp defer (110): Connection timed out H=alt4.gmail-smtp-in.l.google.com [142.250.157.27]: SMTP timeout after end of data (162096 bytes written)
2022-01-18 10:43:58 1n9ksH-000m5s-88 H=gmail-smtp-in.l.google.com [66.102.1.27]: SMTP timeout after end of data (352932 bytes written): Connection timed out

etc…
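
To see which Google MX hosts account for the timeouts, the log lines above can be grouped per host. This is a minimal sketch, assuming the Exim main log layout shown above (date, time, message ID, then H=host). The function name timeouts_per_host is illustrative.

```shell
# Count "Connection timed out" lines per remote host in an Exim main log.
# Only lines whose fourth field starts with "H=" are counted, so the
# "== ... defer" summary line for a message is not counted a second time.
timeouts_per_host() {
    grep 'Connection timed out' "$1" |
        awk '$4 ~ /^H=/ { sub(/^H=/, "", $4); n[$4]++ }
             END { for (h in n) print n[h], h }' |
        sort -k1,1nr -k2,2
}

# Usage (path as used in the post above):
#   timeouts_per_host /var/log/exim4/mainlog
```

Each output line is a count followed by the host name, with the most affected host first.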

The same errors appear on the third server (where tcp_window_scaling=0 was never tried, so we tested the values that @Deepak suggested on a completely fresh server).

I’m becoming desperate.

Deepak · January 18, 2022, 3:39pm

@dpeca
The only thing is that setting window scaling to zero may open the server to DDoS attacks, if I am not mistaken, as someone could send large packets and get the server into trouble. This may not always happen, and I may be badly wrong here too. My knowledge in this area is superficial and outdated.

You have confirmed that 11.5 MB attachments go through with the default values but 18 MB ones remain in the queue with timeouts. You have tested scaling the window to higher values.

Did you try lower values of rmem (/wmem) to test that the opposite is untrue? A kernel expert could be helpful here.

It appears to be a matter of tuning the window with the proper values. Even here, I could be badly wrong, as I have never had to deal with these things.

dpeca · January 18, 2022, 4:03pm

@Deepak
As I understand it, tcp_window_scaling=0 can only limit the speed of traffic, as described in the article you linked: TCP Window Scaling | qa | cafe
But since almost all cloud/dedicated servers come with a 1 Gbit/s connection, it is questionable how much it affects the speed.
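
The speed limit mentioned above can be estimated. Without window scaling, the TCP window field is 16 bits, so at most 65535 bytes can be in flight per round trip, whatever the link speed. This is a rough sketch of that ceiling for a single connection. The function name and the sample round-trip times are assumptions for illustration, not measurements from the thread.

```shell
# Upper bound on single-connection throughput when window scaling is off:
# at most 65535 bytes per round trip. bytes * 8 / milliseconds gives kbit/s
# (integer division, so the result is rounded down).
max_unscaled_kbit() {
    rtt_ms=$1
    echo $(( 65535 * 8 / rtt_ms ))
}

# Hypothetical round-trip times in milliseconds.
for rtt in 5 20 100; do
    echo "RTT ${rtt} ms -> at most $(max_unscaled_kbit "$rtt") kbit/s"
done
```

At a 20 ms round trip the ceiling is about 26 Mbit/s per connection, far below 1 Gbit/s, but still enough for typical mail delivery.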

I didn’t test with lower values of rmem/wmem.
The problem is that this is hard to reproduce: sometimes it gets stuck on a 200 KB email, and sometimes it works flawlessly even with 18 MB emails.
I don’t know how large the emails my customers send are, but since I see timeouts every few minutes, I suspect that even smaller messages get stuck.
Still, based on my tests, I believe that larger messages are more likely to get stuck.
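
The size question can be checked from the log itself, since each timeout line records how many bytes were written. This is a minimal sketch, assuming the Exim log format quoted earlier in the thread. The function name timeout_sizes is illustrative.

```shell
# List each timed-out message once, with the largest "bytes written" value
# logged for it, smallest message first.
timeout_sizes() {
    grep 'Connection timed out' "$1" |
        sed -n 's/^[^ ]* [^ ]* \([^ ]*\) .*(\([0-9][0-9]*\) bytes written).*/\1 \2/p' |
        awk '$2 + 0 > max[$1] + 0 { max[$1] = $2 }
             END { for (id in max) print max[id], id }' |
        sort -n
}

# Usage (path as used earlier in the thread):
#   timeout_sizes /var/log/exim4/mainlog
```

Each output line is a byte count followed by the message ID, which makes it easy to see whether small messages are affected too.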

dpeca · January 18, 2022, 6:41pm

@Deepak
I will test this again on all three servers.
For some reason it looks like the issue is gone, but let’s wait 24 hours.

rhh3591 · January 30, 2022, 3:36am

I had a similar problem on Ubuntu 20.04, HestiaCP 1.5.7, being able to send and receive between mail accounts hosted locally on the same server, but being unable to send to remote mail servers such as gmail.com, outlook.com and zoho.com.

It turned out to be caused by a combination of the csf firewall SMTP configuration being too restrictive on the server, and by using customer domains for SMTP instead of the primary mail server domain on the client.

I solved the problem by:

1. Adding ‘Debian-exim’ to the list at SMTP_ALLOWUSER in /etc/csf.conf on the server, as follows:

SMTP_BLOCK = "1"
SMTP_ALLOWLOCAL = "1"
SMTP_REDIRECT = "1"
SMTP_PORTS = "25,465,587"
SMTP_ALLOWUSER = "webmin,admin,Debian-exim"
SMTP_ALLOWGROUP = "mail,mailman"
SMTPAUTH_RESTRICT = "0"

2. Using the primary domain (the one that matches the hostname on a dedicated mail server) for outgoing mail via SMTP, and not the customer domain, on the client.

As a consequence of this, I have to ignore the ‘SMTP Hostname’ parameter offered by HestiaCP for customer mail account access, and use the primary hostname instead.

Other contextual constraints are: a) only the primary domain has an rDNS record provided by the VPS host, and b) the domain name registrar does not provide for PTR records.
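
Whether the Exim user is already allowed by csf can be checked from a script. This is a minimal sketch, assuming the KEY = "value" layout shown in the settings above. The function name csf_allows_user is illustrative, and the config path is passed in by the caller.

```shell
# Succeed if the given user appears in the comma-separated SMTP_ALLOWUSER
# list of a csf.conf-style file; fail otherwise.
csf_allows_user() {
    conf=$1
    user=$2
    # Take the quoted value of the last SMTP_ALLOWUSER line in the file.
    value=$(sed -n 's/^SMTP_ALLOWUSER[[:space:]]*=[[:space:]]*"\(.*\)".*/\1/p' "$conf" | tail -n 1)
    # Wrap in commas so only whole list entries match.
    case ",$value," in
        *",$user,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (path as given in the post above):
#   csf_allows_user /etc/csf.conf Debian-exim && echo "allowed"
```

Matching whole list entries avoids a false positive when one user name is a substring of another.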

trogvar · February 4, 2022, 11:31am

I’ve got bad news, everyone.
Since 2.2.2022 these errors are back, this time with another remote server.
Same symptoms, but net.ipv4.tcp_window_scaling is still 1 and sending to Gmail is working.

[email protected] [email protected] R=dnslookup T=remote_smtp defer (110): Connection timed out H=mx1.server.com [111.111.111.111]: SMTP timeout after end of data (286095 bytes written)

system · March 6, 2022, 11:32am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.

manuelserol · July 3, 2022, 3:07pm

I am facing the described issue.

When sending emails with attachments to Gmail accounts, they take a long time to be delivered; sometimes they are delivered and sometimes they aren’t. The holding time seems random, and the size of the attachment doesn’t seem to matter.

I followed the whole thread here and ran a few tests, from which I concluded that setting net.ipv4.tcp_window_scaling = 0 fixes the problem, but I don’t want to keep it that way since it is a security issue. If I disable the firewall it also works fine, but I couldn’t find any relation to the existing Hestia rules, so I guess that disabling the firewall opens a specific port and/or UDP traffic not covered by the Hestia rules, and that this solves the issue.

However, I still don’t understand where exactly the problem is or how to fix it properly.

I am wondering how you guys fixed it @dpeca @trogvar @Deepak @cmstew

Any help is appreciated!

Occurring on two servers with Debian 11 (upgraded from Debian 10) and Hestia 1.6 (one upgraded from 1.5.15 and the other from 1.5.14)

dpeca · July 3, 2022, 7:59pm

@manuelserol
Unfortunately, I didn’t find any solution other than net.ipv4.tcp_window_scaling = 0

@Deepak suggested that I test with lower values of rmem/wmem, but I have no idea what value range I should experiment with.
Maybe @Deepak can suggest lower and upper values; I’m ready to do the testing in order to find a proper solution for this craziness.

manuelserol · July 3, 2022, 9:44pm

@dpeca I found a possible workaround here, but unfortunately it is not working on my servers. A few people say that it works for them, but I guess they don’t use Hestia. I don’t know if Hestia has anything to do with this; I just figured out that disabling iptables also “fixes” it, but I can’t really see why.

If you try this one, please let me know the result.

Did you have any issues from keeping tcp_window_scaling disabled for a long period of time, such as performance and/or security issues?

trogvar · July 3, 2022, 9:57pm

net.ipv4.tcp_window_scaling = 0
has been the only solution for me this whole time. ;(
No issues found since October.

dpeca · July 4, 2022, 5:50am

@manuelserol in the last 12 months I have set up at least 20 servers with net.ipv4.tcp_window_scaling = 0 without any issues. I actively manage those servers, so customers would report issues to me, but absolutely nobody has noticed anything strange.

I will test with hosts_try_fastopen = !*.l.google.com
Thanks for the tip.

dpeca · July 6, 2022, 10:52am

Fix confirmed.
Solution is hosts_try_fastopen = !*.l.google.com in /etc/exim4/exim4.conf.template

After you add that line there, restart exim:
systemctl restart exim4

net.ipv4.tcp_window_scaling = 0 has finally been removed, and the server was rebooted to make sure that window scaling is no longer disabled.
Two days on a production server without any issue.
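
To confirm that the line is really present in the file Exim reads, a small check like the following can be run before and after the change. This is a minimal sketch. The function name has_fastopen_exclusion is illustrative, and it only looks for the exact setting quoted above.

```shell
# Succeed if the file contains the hosts_try_fastopen exclusion for
# *.l.google.com on a line of its own (leading indentation allowed).
has_fastopen_exclusion() {
    grep -Eq '^[[:space:]]*hosts_try_fastopen[[:space:]]*=[[:space:]]*!\*\.l\.google\.com[[:space:]]*$' "$1"
}

# Usage:
#   if has_fastopen_exclusion /etc/exim4/exim4.conf.template; then
#       echo "fastopen exclusion present"
#   else
#       echo "fastopen exclusion missing"
#   fi
```

The check is deliberately strict: a differently written host list is reported as missing, so it should be read as a hint rather than a verdict.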

dpeca · July 6, 2022, 12:36pm

@trogvar @Deepak @manuelserol @eris

and here is the patch:

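	# Apply the fix only if the Exim template exists and is not patched yet.
	if [ -f "/etc/exim4/exim4.conf.template" ]; then
		check_grep=$(grep -c 'hosts_try_fastopen' /etc/exim4/exim4.conf.template)
		if [ "$check_grep" -eq 0 ]; then
			echo "=== Fixing Gmail hosts_try_fastopen in Exim4"
			# Insert the option on a new line right after "dkim_strict = 0".
			# The backslashes in ADD are consumed by sed, leaving "!*.l.google.com".
			FIND="dkim_strict = 0"
			ADD="  hosts_try_fastopen = \!\*.l.google.com"
			sed -i "s#$FIND#$FIND\n$ADD#g" /etc/exim4/exim4.conf.template
			systemctl restart exim4

			# Comment out the old workaround and re-enable window scaling right away.
			sed -i "s/net.ipv4.tcp_window_scaling/#net.ipv4.tcp_window_scaling/g" /etc/sysctl.conf
			echo 1 > /proc/sys/net/ipv4/tcp_window_scaling
		fi
	fi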

manuelserol · July 6, 2022, 2:46pm

@dpeca that’s good news; after adding that line to /etc/exim4/exim4.conf.template it also works on my side.

It was not working for me because I was applying it to /etc/exim4/conf.d/transport/30_exim4-config_remote_smtp, as mentioned in the link I shared, but it seems Hestia does not use those files and I was not aware of that.

trogvar · July 6, 2022, 3:00pm

@dpeca
Finally! I’ll try that fix right now!

trogvar · July 7, 2022, 10:08am

Confirm fix. It works! Thanks!

system · August 6, 2022, 10:09am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.