
Using the multiprocessing module to handle a large file in Python

The following is part of my code:

    import multiprocessing as mp

    if __name__ == '__main__':
        pool = mp.Pool(mp.cpu_count())
        print("cpu counts: " + str(mp.cpu_count()))
        with open("./test.txt") as f:
            nextLineByte = f.tell()
            for line in iter(f.readline, ''):
                print("nextLineByte: " + str(nextLineByte))
                pool.apply_async(processWrapper, args=(nextLineByte, f))
                nextLineByte = f.tell()
        pool.close()
        pool.join()

When I run this code, it produces a result, but not the one I need. The issue is with this line:

        pool.apply_async(processWrapper, args=(nextLineByte, f))

It looks like the target function processWrapper is not reached at all, which is really confusing. Any comments are greatly appreciated.

You're not waiting for or collecting the results of your apply_async calls. apply_async returns an AsyncResult, and any exception raised while dispatching or running the task is stored on that object and only re-raised when you call .get() on it, so failures pass silently: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.AsyncResult

In this case the task never runs at all: the open file object f can't be pickled, so it can't be sent to a worker process, and that error is swallowed because nothing ever calls .get(). Pass the byte offset together with the file path instead, and reopen the file inside the worker.
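A minimal sketch of that pattern, keeping the names from the question; the body of processWrapper below is a placeholder assumption, since the original wasn't shown:

    import multiprocessing as mp

    def processWrapper(offset, path):
        # Reopen the file in the worker: an int offset and a path string
        # pickle fine, while an open file object does not.
        with open(path) as f:
            f.seek(offset)
            line = f.readline()
        return len(line)  # placeholder for the real per-line work

    if __name__ == '__main__':
        path = "./test.txt"
        pool = mp.Pool(mp.cpu_count())
        results = []
        with open(path) as f:
            nextLineByte = f.tell()
            for line in iter(f.readline, ''):
                # Keep every AsyncResult so nothing fails silently.
                results.append(pool.apply_async(processWrapper,
                                                args=(nextLineByte, path)))
                nextLineByte = f.tell()
        pool.close()
        pool.join()
        for r in results:
            print(r.get())  # .get() re-raises any worker exception

Calling .get() on each AsyncResult both retrieves the return values and surfaces any worker exception, so problems like the unpicklable file object show up immediately instead of disappearing.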