Due to the GIL, a Python interpreter can't execute Python bytecode in more than one thread at any given time. But, setting that aside, you're appending to a list one element at a time.
>>> import timeit
>>> def appending():
...     output = []
...     for i in range(1000000):
...         output.append(i)
...     return output
...
>>> def list_comp():
...     return [i for i in range(1000000)]
...
>>> print(f"{timeit.timeit(appending, number=100):.2}")
8.1
>>> print(f"{timeit.timeit(list_comp, number=100):.2}")
5.2
This slow nature of appending to a list is exactly what shows up in the readline/readlines performance difference.
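A quick way to see that difference is to compare a manual readline() loop with append() against a single readlines() call (a minimal sketch; the temporary file and the 300000-line size are illustrative assumptions):

```python
import os
import tempfile
import timeit

# Hypothetical setup: a throwaway file with many short lines, just for timing.
tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False)
tmp.write("x\n" * 300000)
tmp.close()

def via_readline():
    # Python-level loop: one readline() call plus one append() per line.
    lines = []
    with open(tmp.name) as f:
        while True:
            line = f.readline()
            if not line:
                break
            lines.append(line)
    return lines

def via_readlines():
    # A single call that builds the whole list inside the C implementation.
    with open(tmp.name) as f:
        return f.readlines()

print(f"readline:  {timeit.timeit(via_readline, number=5):.2}")
print(f"readlines: {timeit.timeit(via_readlines, number=5):.2}")

same = via_readline() == via_readlines()  # both return identical lists
os.unlink(tmp.name)
```

On most machines the readlines() variant is noticeably faster, because it avoids the per-line method call and append in Python.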
Setting that aside, a straightforward time benchmark of sequential, threaded, and multiprocess execution would look like the following.
import math
import timeit
from concurrent import futures
import multiprocessing

def run(i):
    i = i * math.pi
    return i ** 2

def wrapper_sequential():
    return [run(i) for i in range(20000)]

def wrapper_thread_pool():
    with futures.ThreadPoolExecutor(max_workers=10) as exc:
        fut = [exc.submit(run, i) for i in range(20000)]
        output = [f.result() for f in fut]
    return output

def wrapper_multiprocess():
    with multiprocessing.Pool(10) as pool:
        output = pool.map(run, range(20000))
    return output

if __name__ == '__main__':
    print(f"Thr: {timeit.timeit(wrapper_thread_pool, number=10):.4}")
    print(f"Seq: {timeit.timeit(wrapper_sequential, number=10):.4}")
    print(f"Mlt: {timeit.timeit(wrapper_multiprocess, number=10):.4}")
Thr: 5.146
Seq: 0.05411
Mlt: 4.055
The cost of creating and coordinating threads is simply not worth it here, since the GIL only lets one thread execute Python bytecode at any given moment.
For multiprocessing, since a Python interpreter has no direct way to share objects across processes, pickle is used internally to serialize data for inter-process communication, and that serialization is overhead.
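That serialization cost is easy to measure on its own with a pickle round trip (a sketch; the payload size here is an arbitrary assumption):

```python
import pickle
import timeit

data = list(range(100000))  # an arbitrary payload for illustration

def round_trip():
    # Serialize and deserialize, roughly what multiprocessing does
    # for every argument and result crossing a process boundary.
    return pickle.loads(pickle.dumps(data))

print(f"pickle round-trip x100: {timeit.timeit(round_trip, number=100):.2}")
```

Every task submitted to a pool pays something like this twice: once for the arguments going in, once for the result coming back.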
If the calculation is heavy enough, multiprocessing will eventually amortize that overhead and pull ahead of sequential execution, but for CPU-bound work threads never will.
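To see multiprocessing win, make the per-task work expensive enough that the pickle and process-startup cost becomes negligible (a sketch; heavy(), the loop size, and the pool size are assumptions, and exact timings vary by machine):

```python
import math
import multiprocessing
import timeit

def heavy(i):
    # Deliberately CPU-bound: thousands of float operations per call.
    total = 0.0
    for k in range(1, 20000):
        total += math.sqrt(i + k)
    return total

def sequential():
    return [heavy(i) for i in range(200)]

def multiprocess():
    # pool.map preserves input order, so results match sequential().
    with multiprocessing.Pool(4) as pool:
        return pool.map(heavy, range(200))

if __name__ == "__main__":
    print(f"Seq: {timeit.timeit(sequential, number=3):.4}")
    print(f"Mlt: {timeit.timeit(multiprocess, number=3):.4}")
```

With a workload this heavy, the multiprocess timing is typically well below the sequential one, the opposite of the cheap run() benchmark above.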