Log errors BIND9 - Problem delivering SMTP emails

I am having trouble getting other DNS servers to retrieve NS records from my Hestia server.

I thought it might be the cheap machine, but when I disabled it, the main DNS server in the cluster still had connection problems.

When I query from any server, I usually get responses in 5 ms.

BUT

https://dnschecker.org/#MX/clientsdomain.com
https://dnschecker.org/#A/clientsdomain.com
https://dnschecker.org/#NS/clientsdomain.com

These show failures around the world: roughly 70% of the checks succeed and 30% get no response.
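A symptom like this can be narrowed down by asking each nameserver IP directly instead of going through a resolver. The following is only a minimal sketch, assuming Python 3 with the standard library: it builds a bare NS query by hand and reports whether a given IPv4 address replies over UDP in time. The function names and the 3-second timeout are illustrative choices, not part of Hestia or BIND.

```python
import socket
import struct

QTYPE_NS = 2  # DNS record type number for NS


def build_query(name, qtype=QTYPE_NS, query_id=0x1234):
    """Build a minimal, non-recursive DNS query packet for `name`."""
    # Header: ID, flags (all zero: standard query, no recursion), 1 question.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # Question tail: record type and class IN (1).
    return header + qname + struct.pack("!HH", qtype, 1)


def nameserver_answers(server_ip, name, timeout=3.0):
    """Return True if `server_ip` replies to an NS query over UDP in time."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable hosts
            return False
    # A reply counts only if it echoes the ID of our query.
    return reply[:2] == query[:2]
```

Calling nameserver_answers() once for every IP that the registrar or parent zone lists for the domain (old addresses included) shows which of them are silent. A False result only means no reply arrived within the timeout; it does not say why.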

The IPs of the failing dnschecker.org DNS servers are not banned by fail2ban (I checked, just in case).

The thing is that I only notice those errors when delivering email: the remote SMTP server checks SPF and MX and gets no resolution. Emails get queued, deferred and rejected, but only for some recipient domains and some clients.

/var/log/exim4/mainlog

2021-02-11 14:23:50 1lAAez-0007lY-H7 == gema***@***abogados.com R=dnslookup T=remote_smtp defer (-44) H=mail.abogados.com [91.192.110.152]: SMTP error from remote mail server after RCPT TO:<gema@abogados.com>: 450 4.7.1 <gema@***abogados.com>: Recipient address rejected: SPF-Result=myclient.com: 'SERVFAIL' error on DNS 'TXT' lookup of 'myclient.com'
2021-02-11 14:34:52 1lAC7N-0000Dz-RS => [email protected] R=dnslookup T=remote_smtp H=xxx-es.mail.protection.outlook.com [104.47.12.36] X=TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256 CV=yes K C="250 2.6.0 [email protected] [InternalId=38229503915433, Hostname=AM0PR10MB2626.EURPRD10.PROD.OUTLOOK.COM] 591646 bytes in 0.498, 1159.419 KB/sec Queued mail for delivery"
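To see how many recipient domains are affected, the deferral lines can be counted per domain. This is a rough sketch, assuming Python 3 and that deferrals look like the "==" line above; the regular expression and the function name are illustrative and may need adjusting for other Exim log layouts.

```python
import re
from collections import Counter

# Matches an Exim deferral line ("==") and captures the recipient's domain.
DEFER_RE = re.compile(r"^\S+ \S+ \S+ == \S+@(?P<domain>\S+) .*\bdefer\b")


def servfail_deferrals(lines):
    """Count deferred deliveries per recipient domain whose error mentions SERVFAIL."""
    counts = Counter()
    for line in lines:
        match = DEFER_RE.match(line)
        if match and "SERVFAIL" in line:
            counts[match.group("domain").lower()] += 1
    return counts
```

The function accepts any iterable of lines, so an open file object for /var/log/exim4/mainlog can be passed in directly.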

The majority of email is correctly delivered.

I have set DMARC to "none" and double-checked the SPF TXT record.

The relevant part of the SPF record is +mx, and it is active.

  • Question 1: Any ideas?
  • Question 2: Where are the BIND9 logs located, or how should I enable them?

Logging is not enabled by default.

You can enable it by adding the following code to /etc/bind/named.conf, right below the line include "/etc/bind/named.conf.default-zones";

Do not forget to mkdir /var/log/named/ before restarting BIND, and make sure the user BIND runs as (bind on Debian/Ubuntu) can write to it.

logging {
channel default_file {
    file "/var/log/named/default.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};
channel general_file {
    file "/var/log/named/general.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};
channel database_file {
    file "/var/log/named/database.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};
channel security_file {
    file "/var/log/named/security.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};
channel config_file {
    file "/var/log/named/config.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};
channel resolver_file {
    file "/var/log/named/resolver.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};
channel xfer-in_file {
    file "/var/log/named/xfer-in.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};
channel xfer-out_file {
    file "/var/log/named/xfer-out.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};
channel notify_file {
    file "/var/log/named/notify.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};
channel client_file {
    file "/var/log/named/client.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};
channel unmatched_file {
    file "/var/log/named/unmatched.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};
channel queries_file {
    file "/var/log/named/queries.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};
channel network_file {
    file "/var/log/named/network.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};
channel update_file {
    file "/var/log/named/update.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};
channel dispatch_file {
    file "/var/log/named/dispatch.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};
channel dnssec_file {
    file "/var/log/named/dnssec.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};
channel lame-servers_file {
    file "/var/log/named/lame-servers.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
};

category default { default_file; };
category general { general_file; };
category database { database_file; };
category security { security_file; };
category config { config_file; };
category resolver { resolver_file; };
category xfer-in { xfer-in_file; };
category xfer-out { xfer-out_file; };
category notify { notify_file; };
category client { client_file; };
category unmatched { unmatched_file; };
category queries { queries_file; };
category network { network_file; };
category update { update_file; };
category dispatch { dispatch_file; };
category dnssec { dnssec_file; };
category lame-servers { lame-servers_file; };
};
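The block above repeats the same settings for every category, so it can also be generated rather than typed by hand. A minimal sketch, assuming Python 3; render_logging is a hypothetical helper, and its defaults simply mirror the values used above.

```python
def render_logging(categories, log_dir="/var/log/named", versions=3, size="5m"):
    """Render a BIND `logging { ... };` block with one file channel per category."""
    lines = ["logging {"]
    for cat in categories:
        # One channel per category, writing to <log_dir>/<category>.log.
        lines += [
            f"channel {cat}_file {{",
            f'    file "{log_dir}/{cat}.log" versions {versions} size {size};',
            "    severity dynamic;",
            "    print-time yes;",
            "};",
        ]
    lines.append("")
    # Route each category to its own channel.
    lines += [f"category {cat} {{ {cat}_file; }};" for cat in categories]
    lines.append("};")
    return "\n".join(lines) + "\n"
```

For example, render_logging(["default", "queries", "lame-servers"]) produces a shorter block that logs only those three categories. Running named-checkconf on the resulting named.conf before restarting catches syntax mistakes.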

Have you set up a glue record at your domain registrar? The domain should be set up with your registrar to use your server as a nameserver.


https://intodns.com/abogados.com
You do not have PTR set properly and that could cause delivery problems.

Yes, I did. But then I shut down the DNS servers, renamed them and moved them to new machines. I manually updated the glue records for many of the domains, BUT I didn't update those that I didn't control. That was the root cause.

Solution: I set up a new server on the old IP and configured it as a DNS slave.

Everything back to normal.
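The root cause described above (domains still delegated to nameservers that no longer exist) can be checked for in bulk once the delegation data has been collected, for instance from the registrar panel or from dig +trace output. The sketch below, assuming Python 3, only does the comparison; the function name and all host names are hypothetical.

```python
def stale_delegations(delegated, current):
    """Map each domain to the delegated nameservers that are no longer in service.

    `delegated` maps a domain to the NS host names its registrar or parent zone
    still points at; `current` lists the nameservers that actually exist.
    """
    # Normalize case and trailing dots so equal names compare equal.
    live = {ns.lower().rstrip(".") for ns in current}
    report = {}
    for domain, nameservers in delegated.items():
        dead = sorted({ns.lower().rstrip(".") for ns in nameservers} - live)
        if dead:
            report[domain] = dead
    return report
```

Any domain that appears in the result still points at a retired nameserver, so either its delegation has to be updated or something has to keep answering at the old address.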

Thank you @Dennis it was driving me crazy.

Thank you @parzival for looking into it too. I anonymized the log files, so the domains are truncated; that's why the PTR part didn't make sense.
