Regarding the rmem values above, there is one thing I could not find when I researched this yesterday: an additional associated value related to timing.
It has been many years since I last opened this chapter. About 6-8 years ago, I did some work on TCP stack tracing. I vaguely remember that timing is one more parameter that could control the window value advertised back to the sender.
Yesterday, I could not find anything about this. I assume it may have changed in newer kernels, or it may be different on Ubuntu/Debian.
So, first check whether the scaling values above help. If they do not, read further on the timing of a large chunk of a TCP data stream being accepted and acknowledged back to your server. Such values are set either in the core settings (net.core) or in the IPv4-specific ones (net.ipv4). (But I am not at all sure this is necessary!)
This timing value matters IF AND ONLY IF the remote server is far away and the ping time is high. In the case of Gmail, this should NOT BE A PROBLEM!
I believe the values you compared in the image above and marked in orange will most likely not affect the transmission of TCP streams, but I do not know for sure.
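The rmem discussion above involves two groups of settings: the global socket buffer limits under net.core and the TCP-specific ones under net.ipv4. A minimal sketch to print them side by side, assuming a Linux host; `show_tcp_buffers` is a hypothetical helper name. On a live system, `sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.ipv4.tcp_window_scaling` should print the same values; the function reads the /proc/sys files directly so it can also be pointed at a copy.

```bash
#!/usr/bin/env bash
# Print the TCP buffer and window-scaling settings (same data as `sysctl`).
# Hypothetical helper: reads /proc/sys by default; another root can be passed for testing.
show_tcp_buffers() {
  local root="${1:-/proc/sys}"
  local key value
  for key in net/core/rmem_max net/core/wmem_max \
             net/ipv4/tcp_rmem net/ipv4/tcp_wmem \
             net/ipv4/tcp_window_scaling; do
    # tcp_rmem/tcp_wmem hold "min default max" separated by tabs; squeeze to single spaces
    value=$(tr -s ' \t' ' ' < "$root/$key")
    # Turn the path into the sysctl name, e.g. net/core/rmem_max -> net.core.rmem_max
    printf '%s = %s\n' "${key//\//.}" "$value"
  done
}
```

Usage: run `show_tcp_buffers` on each server before and after changing values, so the comparison covers both the net.core and the net.ipv4 settings.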
I tested the values that @Deepak suggested.
Unfortunately, they did not help.
I tested them on two servers that had the issue before (the issue went away with tcp_window_scaling=0 on those servers) and on a completely new dedicated server (where tcp_window_scaling=0 was never tried).
With the new test values, all three servers show the problem.
On the first two servers (where tcp_window_scaling=0 had been configured), the situation is clear:
Yesterday we had tcp_window_scaling=0, so cat /var/log/exim4/mainlog.1 | grep 'Connection timed out' | grep 'google'
returns nothing.
At midnight we applied the values from the top of this message and restarted the server.
Today's log is: cat /var/log/exim4/mainlog | grep 'Connection timed out' | grep 'google'
2022-01-18 07:50:10 1n9iA3-000aUn-Sv H=gmail-smtp-in.l.google.com [74.125.140.27]: SMTP timeout after end of data (168113 bytes written): Connection timed out
2022-01-18 08:00:11 1n9iA3-000aUn-Sv H=alt1.gmail-smtp-in.l.google.com [142.251.9.26]: SMTP timeout after end of data (168113 bytes written): Connection timed out
2022-01-18 08:10:12 1n9iA3-000aUn-Sv H=alt2.gmail-smtp-in.l.google.com [142.250.150.27]: SMTP timeout after end of data (168113 bytes written): Connection timed out
2022-01-18 08:51:27 1n9j5a-000ePy-DG H=gmail-smtp-in.l.google.com [66.102.1.27]: SMTP timeout after sending data block (468497 bytes written): Connection timed out
2022-01-18 08:52:59 1n9j76-000edw-E8 H=gmail-smtp-in.l.google.com [66.102.1.27]: SMTP timeout after sending data block (468525 bytes written): Connection timed out
2022-01-18 09:04:52 1n9j76-000edw-E8 H=alt1.gmail-smtp-in.l.google.com [142.251.9.26]: SMTP timeout after sending data block (460334 bytes written): Connection timed out
2022-01-18 09:20:06 1n9jZ6-000gSI-Bv H=gmail-smtp-in.l.google.com [74.125.140.27]: SMTP timeout after end of data (167036 bytes written): Connection timed out
2022-01-18 09:30:07 1n9jZ6-000gSI-Bv H=alt1.gmail-smtp-in.l.google.com [142.251.9.27]: SMTP timeout after end of data (167036 bytes written): Connection timed out
2022-01-18 09:55:11 1n9k73-000ioH-Jq H=gmail-smtp-in.l.google.com [172.253.120.27]: SMTP timeout after end of data (161097 bytes written): Connection timed out
2022-01-18 10:00:08 1n9kBq-000j8S-Nn H=gmail-smtp-in.l.google.com [172.253.120.27]: SMTP timeout after end of data (162096 bytes written): Connection timed out
2022-01-18 10:05:12 1n9k73-000ioH-Jq H=alt1.gmail-smtp-in.l.google.com [142.251.9.27]: SMTP timeout after end of data (161097 bytes written): Connection timed out
2022-01-18 10:10:09 1n9kBq-000j8S-Nn H=alt1.gmail-smtp-in.l.google.com [142.251.9.26]: SMTP timeout after end of data (162096 bytes written): Connection timed out
2022-01-18 10:15:12 1n9k73-000ioH-Jq H=alt2.gmail-smtp-in.l.google.com [142.250.150.27]: SMTP timeout after end of data (161097 bytes written): Connection timed out
2022-01-18 10:20:10 1n9kBq-000j8S-Nn H=alt2.gmail-smtp-in.l.google.com [142.250.150.26]: SMTP timeout after end of data (162096 bytes written): Connection timed out
2022-01-18 10:30:13 1n9kBq-000j8S-Nn H=alt3.gmail-smtp-in.l.google.com [74.125.200.26]: SMTP timeout after end of data (162096 bytes written): Connection timed out
2022-01-18 10:40:15 1n9kBq-000j8S-Nn H=alt4.gmail-smtp-in.l.google.com [142.250.157.27]: SMTP timeout after end of data (162096 bytes written): Connection timed out
2022-01-18 10:40:15 1n9kBq-000j8S-Nn == ******@gmail.com R=dnslookup T=remote_smtp defer (110): Connection timed out H=alt4.gmail-smtp-in.l.google.com [142.250.157.27]: SMTP timeout after end of data (162096 bytes written)
2022-01-18 10:43:58 1n9ksH-000m5s-88 H=gmail-smtp-in.l.google.com [66.102.1.27]: SMTP timeout after end of data (352932 bytes written): Connection timed out
etc.
The same errors appear on the third server (where tcp_window_scaling=0 was never tried, so we tested the values @Deepak suggested on a completely fresh server).
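The log above shows the same message ID being retried against gmail-smtp-in and then the alt1 to alt4 MX hosts until Exim defers it. A small sketch that applies the same filter as the grep pipeline and counts entries per message ID, which makes it easier to compare days or servers; `summarize_google_timeouts` is a hypothetical helper, and it assumes the default Exim mainlog layout (date, time and message ID as the first three fields).

```bash
#!/usr/bin/env bash
# Count Exim log entries matching 'Connection timed out' and 'google', per message ID.
# Assumes the default mainlog layout: date, time, message ID, then the rest of the line.
summarize_google_timeouts() {
  local log="${1:-/var/log/exim4/mainlog}"
  # Same filter as the cat | grep | grep pipeline; awk tallies field 3 (the message ID)
  grep 'Connection timed out' "$log" | grep 'google' \
    | awk '{ count[$3]++ } END { for (id in count) print id, count[id] }' \
    | LC_ALL=C sort
}
```

Usage: `summarize_google_timeouts /var/log/exim4/mainlog.1` for yesterday's log, or without an argument for today's. Each output line is a message ID followed by the number of matching entries.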
@dpeca
The only concern is that setting window scaling to zero may open the door to DDoS attacks, if I am not mistaken, since someone could send large packets and get the server into trouble. This may not always happen, and I may be badly wrong here too. My knowledge in this area is superficial and outdated.
You have confirmed that 11.5 MB attachments go through with the default values, but 18 MB ones remain in the queue and time out. You have also tested scaling the window to higher values.
Did you try lower values of rmem (and wmem) to check the opposite case? A kernel expert could be helpful here.
It appears to come down to tuning the window with the right values. Even here, I could be badly wrong, as I have never had to deal with these things.
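On the question of lowering rmem: as far as I know, Linux chooses the window-scale shift it offers in the handshake from the maximum receive buffer, so a smaller rmem maximum means a smaller shift, and tcp_window_scaling=0 forces the shift to 0 (a window of at most 65,535 bytes). The sketch below is a simplified model of that relationship, not the kernel's exact code: it returns the smallest shift (capped at 14 by RFC 7323) whose scaled 16-bit window covers a given buffer size. `wscale_for` is a hypothetical name. On a live connection, `ss -ti` shows the negotiated values as `wscale:<send>,<receive>`.

```bash
#!/usr/bin/env bash
# Simplified model (not the kernel's exact algorithm): smallest window-scale shift
# such that the 16-bit window (max 65535) shifted left covers the given buffer size.
wscale_for() {
  local bytes=$1 s=0
  # RFC 7323 limits the shift to 14
  while (( (65535 << s) < bytes && s < 14 )); do
    s=$((s + 1))
  done
  echo "$s"
}
```

For example, `wscale_for 6291456` (a 6 MiB buffer) prints 7, while any buffer up to 65,535 bytes needs no scaling at all.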
@Deepak
As I understand it, tcp_window_scaling=0 can only limit the traffic speed, as described in the page you linked: TCP Window Scaling | QA Cafe
But since almost all cloud/dedicated servers come with a 1 Gbit/s connection, it is questionable how much it affects the speed.
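Whether link speed matters can be checked with the bandwidth-delay product: without window scaling, a receiver can advertise at most 65,535 bytes, so a single connection can have at most that much unacknowledged data in flight, and its throughput is capped at window / RTT no matter how fast the port is. A minimal sketch of that arithmetic; `max_throughput_mbit` is a hypothetical helper, and the RTT values below are only examples (the real one can be measured with ping to the remote MX host).

```bash
#!/usr/bin/env bash
# Upper bound on one TCP connection's throughput: window / RTT (bandwidth-delay product).
# Arguments: window in bytes, round-trip time in milliseconds. Prints Mbit/s.
max_throughput_mbit() {
  local window_bytes=$1 rtt_ms=$2
  # LC_ALL=C keeps "." as the decimal separator
  LC_ALL=C awk -v w="$window_bytes" -v r="$rtt_ms" \
    'BEGIN { printf "%.1f\n", (w * 8) / (r / 1000) / 1000000 }'
}
```

With a 20 ms RTT, `max_throughput_mbit 65535 20` gives 26.2 Mbit/s; at 100 ms it drops to 5.2 Mbit/s. Both are far below 1 Gbit/s, so the cap depends on the distance to the remote server rather than on the port speed.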
I didn't test with lower values of rmem/wmem.
The problem is that it is hard to reproduce: sometimes it gets stuck with a 200 KB email, and sometimes it works flawlessly even with 18 MB emails.
I don't know how large the emails my customers send are, but since I see timeouts every few minutes, I suspect that even smaller messages get stuck.
Still, based on my tests, I believe larger messages have a higher chance of getting stuck.
I had a similar problem on Ubuntu 20.04 with HestiaCP 1.5.7: I could send and receive between mail accounts hosted locally on the same server, but I could not send to remote mail servers such as gmail.com, outlook.com and zoho.com.
It turned out to be caused by a combination of two things: the csf firewall's SMTP configuration on the server was too restrictive, and the client used customer domains for SMTP instead of the primary mail server domain.
I solved the problem by:
1. Adding "Debian-exim" to the SMTP_ALLOWUSER list in /etc/csf/csf.conf on the server.
2. Using the primary domain (the one that matches the hostname on a dedicated mail server) for outgoing mail via SMTP on the client, not the customer domain.
As a consequence, I have to ignore the "SMTP Hostname" parameter that HestiaCP offers for customer mail account access and use the primary hostname instead.
Other contextual constraints are: a) only the primary domain has an rDNS record provided by the VPS host, and b) the domain name registrar does not provide PTR records.
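For the csf part of this fix: csf's SMTP_BLOCK option restricts outgoing SMTP to root, the mail server user and the users listed in SMTP_ALLOWUSER (a comma-separated list). On Debian/Ubuntu, Exim runs as Debian-exim, which is presumably why it had to be listed explicitly. A minimal check, assuming csf's usual config path /etc/csf/csf.conf; `csf_allows_debian_exim` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Succeed if csf's SMTP_ALLOWUSER list contains the Debian Exim user.
# Default path is csf's usual config file; pass another path to check a copy.
csf_allows_debian_exim() {
  local conf="${1:-/etc/csf/csf.conf}"
  # Matches e.g. SMTP_ALLOWUSER = "Debian-exim" or SMTP_ALLOWUSER = "root,Debian-exim"
  grep -Eq '^SMTP_ALLOWUSER[[:space:]]*=[[:space:]]*"([^"]*,)?Debian-exim(,[^"]*)?"' "$conf"
}
```

Usage: `csf_allows_debian_exim || echo "Debian-exim missing from SMTP_ALLOWUSER"`. After editing csf.conf, `csf -r` restarts the firewall so the change takes effect.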
I've got bad news, everyone.
Since 2.2.2022, I have been getting these errors again, this time with another remote server.
Same symptoms, but net.ipv4.tcp_window_scaling is still 1 and sending to Gmail works.
When sending emails with attachments to Gmail accounts, they take a long time to be delivered; sometimes they are delivered and sometimes they are not. The holding time seems random, and the size of the attachment does not matter.
I followed the whole thread and ran a few tests, and concluded that setting net.ipv4.tcp_window_scaling = 0 fixes the problem, but I don't want to keep it that way since it is a security issue. Disabling the firewall also makes it work, but I couldn't find any relation to the existing Hestia rules, so I guess that disabling the firewall opens a specific port and/or UDP traffic not covered by the Hestia rules, and that solves the issue.
However, I still don't understand where exactly the problem is and how to fix it properly.
@manuelserol
Unfortunately, I didn't find any other solution except net.ipv4.tcp_window_scaling = 0.
@Deepak suggested that I test with lower values of rmem/wmem, but I have no idea what range of values I should experiment with.
Maybe @Deepak can suggest lower and upper values for us. I'm ready to run tests, just to find a proper solution to this craziness.
@dpeca I found a possible workaround here, but unfortunately it is not working on my servers. A few people say it works for them, but I guess they don't use Hestia. I don't know if Hestia has anything to do with this; I just found out that disabling iptables also "fixes" it, but I can't really understand why.
If you try it, please let me know the result.
Did you have any issues from keeping tcp_window_scaling disabled for a long period of time, such as performance and/or security issues?
@manuelserol in the last 12 months I have set up at least 20 servers with net.ipv4.tcp_window_scaling = 0 without any issues. Those servers are under my active management, so customers would report issues to me, but absolutely nobody has noticed anything strange.
I will test with hosts_try_fastopen = !*.l.google.com
Thanks for the tip.
Fix confirmed.
The solution is hosts_try_fastopen = !*.l.google.com in /etc/exim4/exim4.conf.template
After adding that line, restart Exim: systemctl restart exim4
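hosts_try_fastopen is an option of Exim's smtp transport: a host list for which Exim attempts TCP Fast Open, and the leading ! excludes hosts matching *.l.google.com. Because it is a transport option, the line has to sit inside the transport used for remote delivery; the T=remote_smtp in the log lines above suggests that is remote_smtp here. A minimal sketch to confirm where the line ended up before restarting; `show_fastopen_setting` is a hypothetical helper, and it assumes driver definitions start with an unindented `name:` line, as in the stock templates.

```bash
#!/usr/bin/env bash
# Show which driver in an Exim config template sets hosts_try_fastopen.
# Default path is the template named above; pass another file to check a copy.
show_fastopen_setting() {
  local tmpl="${1:-/etc/exim4/exim4.conf.template}"
  awk '
    # A line like "remote_smtp:" starts a new driver definition
    /^[A-Za-z0-9_]+:[ \t]*$/ { name = $1; sub(/:$/, "", name) }
    # Report every hosts_try_fastopen line together with the driver it belongs to
    /^[ \t]*hosts_try_fastopen/ {
      line = $0; sub(/^[ \t]+/, "", line)
      print name ": " line; found = 1
    }
    END { if (!found) print "hosts_try_fastopen not set" }
  ' "$tmpl"
}
```

If the output reads `remote_smtp: hosts_try_fastopen = !*.l.google.com`, the line is in the right place and `systemctl restart exim4` applies it.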
net.ipv4.tcp_window_scaling = 0 has finally been removed, and the server was rebooted to make sure window scaling is no longer disabled.
Two days on a production server without any issues.
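To confirm that the old workaround is really gone after the reboot, check both the running value and any persistent sysctl files. A minimal sketch, assuming the usual /etc/sysctl.conf and /etc/sysctl.d locations (systemd-sysctl also reads /run/sysctl.d and /usr/lib/sysctl.d); `find_window_scaling_overrides` is a hypothetical helper. Separately, `sysctl -n net.ipv4.tcp_window_scaling` should print 1, the kernel default, once no override remains.

```bash
#!/usr/bin/env bash
# List persistent sysctl files that still mention tcp_window_scaling.
# Searches the usual locations by default; returns non-zero when nothing is found.
find_window_scaling_overrides() {
  if [ "$#" -eq 0 ]; then
    set -- /etc/sysctl.conf /etc/sysctl.d
  fi
  # -r: recurse into directories, -H: print file names, -s: ignore missing paths
  grep -rHs 'tcp_window_scaling' "$@"
}
```

Usage: `find_window_scaling_overrides || echo "no tcp_window_scaling override left"`.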
@dpeca that's good news; after adding that line to /etc/exim4/exim4.conf.template, it also works on my side.
It was not working for me because I was applying it to /etc/exim4/conf.d/transport/30_exim4-config_remote_smtp, as mentioned in the link I shared, but it seems Hestia does not use those files, and I was not aware of that.
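A possible explanation, assuming the stock Debian exim4-config mechanism is in use: the running configuration is generated either from the single template /etc/exim4/exim4.conf.template or from the split files under /etc/exim4/conf.d/, depending on dc_use_split_config in /etc/exim4/update-exim4.conf.conf. With the single-file mode, edits under conf.d/ are ignored. I have not checked how Hestia sets this; `exim4_config_mode` is a hypothetical helper to look it up.

```bash
#!/usr/bin/env bash
# Report whether Debian's exim4-config uses the split (conf.d/) or the single-file config.
exim4_config_mode() {
  local conf="${1:-/etc/exim4/update-exim4.conf.conf}"
  # dc_use_split_config='true' selects conf.d/; anything else means the single template
  if grep -Eqs "^dc_use_split_config=['\"]?true['\"]?" "$conf"; then
    echo "split: /etc/exim4/conf.d/"
  else
    echo "single: /etc/exim4/exim4.conf.template"
  fi
}
```

Usage: `exim4_config_mode` prints which location to edit before trying a workaround found elsewhere.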