I’m not very experienced with networking, but I would guess that most of your script’s execution time is spent communicating with the DNS server. That means your CPU is mostly just waiting for data, so you should be able to speed the task up by using multiple threads.
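To illustrate why threads help with this kind of I/O-bound work, here is a minimal, network-free sketch. It assumes a lookup that simply blocks for about 0.1 s, simulated with `time.sleep`, which stands in for a DNS round trip. `fake_lookup` and the `hostN.example` names are hypothetical. While one thread is blocked waiting, the others keep running, so ten lookups in a pool of ten threads take roughly as long as one.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_lookup(domain):
    """Hypothetical stand-in for a DNS query: blocks ~0.1 s like a network round trip."""
    time.sleep(0.1)  # sleeping releases the GIL, just like waiting on a socket
    return domain, "0.0.0.0"


domains = [f"host{i}.example" for i in range(10)]

# One lookup after another: the waiting times add up (~1 s total).
start = time.perf_counter()
sequential = [fake_lookup(d) for d in domains]
sequential_time = time.perf_counter() - start

# Ten threads wait at the same time (~0.1 s total); map() keeps the input order.
start = time.perf_counter()
with ThreadPool(processes=10) as pool:
    threaded = pool.map(fake_lookup, domains)
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f} s, threaded: {threaded_time:.2f} s")
```

The exact timings depend on the machine. The point is that the threaded run is close to the duration of a single lookup rather than the sum of all of them.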
The easiest way to use multiple threads here is a ThreadPool:
```python
from multiprocessing.pool import ThreadPool

import dns.name
import dns.resolver

my_list = [
    "www.google.com",
    "www.facebook.com",
    "doesnt.exist",
]

resolver = dns.resolver.Resolver()
# These are already IP addresses, so no socket.gethostbyname() lookup is needed.
resolver.nameservers = ["8.8.4.4", "8.8.8.8"]


def resolve(domain):
    """Resolve one domain and return (domain, list_of_ips, error_name_or_None)."""
    try:
        # dnspython >= 2.0 uses resolve(); on 1.x, use resolver.query() instead.
        answer = resolver.resolve(domain, "A")
        return domain, [str(ip) for ip in answer], None
    except dns.resolver.NXDOMAIN:
        return domain, [], "NXDOMAIN"
    except dns.resolver.NoNameservers:
        return domain, [], "NoNameservers"
    except dns.resolver.NoAnswer:
        return domain, [], "NoAnswer"
    except dns.name.BadEscape:
        return domain, [], "BadEscape"


# increasing this number may speed things up
with ThreadPool(processes=10) as pool:
    results = pool.map(resolve, my_list)

# Write the files from the main thread, so lines written by
# different threads can never interleave.
with open("resolved.txt", "w") as w, open("not_resolved.txt", "w") as x:
    for domain, ips, error in results:
        if error:
            print(domain, error, file=x)
        for ip in ips:
            print(domain, ip, file=w)
```
Results:
```
$ cat not_resolved.txt
doesnt.exist NXDOMAIN
$ cat resolved.txt
www.google.com 172.217.20.196
www.facebook.com 31.13.81.36
```
The code above doesn’t attempt to distribute the list of domains among the available DNS servers, unless the dnspython package does so under the hood. Even so, I would expect a single DNS server to respond quickly to concurrent queries, since it is probably designed to handle many requests in parallel itself.
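If you did want to spread the queries across several servers, one simple approach is round-robin assignment. The following is a minimal sketch of that idea, not something dnspython does for you. `assign_nameservers`, `fake_query` and `resolve_pair` are hypothetical helpers, and `fake_query` stands in for a real lookup so the example runs without network access.

```python
from itertools import cycle
from multiprocessing.pool import ThreadPool

NAMESERVERS = ["8.8.8.8", "8.8.4.4"]


def assign_nameservers(domains, nameservers):
    """Pair each domain with a nameserver, cycling through the list (round-robin)."""
    return list(zip(domains, cycle(nameservers)))


def fake_query(domain, nameserver):
    """Hypothetical stand-in for sending one query to one specific nameserver."""
    return f"{domain} answered by {nameserver}"


def resolve_pair(pair):
    """Worker for pool.map: unpack one (domain, nameserver) pair and query it."""
    domain, nameserver = pair
    return fake_query(domain, nameserver)


domains = ["www.google.com", "www.facebook.com", "doesnt.exist"]
with ThreadPool(processes=10) as pool:
    results = pool.map(resolve_pair, assign_nameservers(domains, NAMESERVERS))

for line in results:
    print(line)
```

To make this real, you could (as an assumption about how you would wire it up) create one `dns.resolver.Resolver` per server with `nameservers` set to just that address, and query through it inside the worker. Whether this is any faster than a single server is something you would have to measure.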