Forums

Problem with multiprocessing module

I get the following error when I try to use the multiprocessing module. More specifically, I cannot create a Queue:

21:40 ~ $ python
Python 2.6.6 (r266:84292, Dec 26 2010, 22:31:48)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing import Queue
>>> q = Queue()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/multiprocessing/__init__.py", line 213, in Queue
    return Queue(maxsize)
  File "/usr/lib/python2.6/multiprocessing/queues.py", line 37, in __init__
    self._rlock = Lock()
  File "/usr/lib/python2.6/multiprocessing/synchronize.py", line 117, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1)
  File "/usr/lib/python2.6/multiprocessing/synchronize.py", line 49, in __init__
    sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
OSError: [Errno 38] Function not implemented

Hi David,

we've got a fix for that bug under test now. Should be deploying it v. soon.

For info, it's because multiprocessing requires /dev/shm, which we hadn't implemented for users yet...

"v. soon" is actually right now, should be live within the hour. I'll double-check your code works once it's there.

OK, that should be working now. Do let us know how you get on - we've implemented quite a restrictive /dev/shm, which only has 1MB available. We think that should be fine for most multiprocessing needs (it's been fine for ours), but we're keen to hear how it holds up in our users' real-world applications...
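If you want to sanity-check it from your own console, the snippet from your original report should now run cleanly, something like:

>>> from multiprocessing import Queue
>>> q = Queue()
>>> q.put('hello')
>>> q.get()
'hello'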

Thanks for the quick response and fix.

I am now getting a different error and I can't figure out if it is because I have zombie processes hanging around. I don't have access to ps. Here are the errors I am getting:

Traceback (most recent call last):
  File "subreddit_scraper.py", line 89, in <module>
    process.start()
  File "/usr/lib/python2.6/multiprocessing/process.py", line 104, in start
    self._popen = Popen(self)
  File "/usr/lib/python2.6/multiprocessing/forking.py", line 94, in __init__
    self.pid = os.fork()
OSError: [Errno 11] Resource temporarily unavailable

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/usr/lib/python2.6/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "subreddit_scraper.py", line 69, in subreddit_processor
    q.put(([[d for d in data if data_filter(d)] for data in items], insert_string))
  File "/usr/lib/python2.6/multiprocessing/queues.py", line 81, in put
    self._start_thread()
  File "/usr/lib/python2.6/multiprocessing/queues.py", line 161, in _start_thread
    self._thread.start()
  File "/usr/lib/python2.6/threading.py", line 474, in start
    _start_new_thread(self.__bootstrap, ())
error: can't start new thread

Never mind. It looks like there is a limit on the number of active processes, so I've modified my script to start no more than 5 processes at a time. That seems to make the problem go away.
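In case it's useful to anyone else, this is roughly the shape of what I did (the worker and the chunking here are placeholders, not my real code):

from multiprocessing import Process, Queue

MAX_PROCS = 5  # stay well under the platform's process limit

def worker(q, chunk):
    # placeholder for the real per-chunk work
    q.put(sum(chunk))

if __name__ == '__main__':
    chunks = [range(i, i + 10) for i in range(0, 100, 10)]
    q = Queue()
    # launch at most MAX_PROCS workers at a time, joining each
    # batch before starting the next
    for start in range(0, len(chunks), MAX_PROCS):
        batch = [Process(target=worker, args=(q, c))
                 for c in chunks[start:start + MAX_PROCS]]
        for p in batch:
            p.start()
        for p in batch:
            p.join()
    results = [q.get() for _ in chunks]
    print(results)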

It would be nice to have a clearer idea of what our resource limits are. Could you give us an indication of things such as:

  • Memory
  • Process count limit
  • Thread count limit
  • What each processor core is equivalent to
  • Anything else you would care to point out that I may be missing

-TIA

a2j, are you asking me or the staff? It's not clear from your question.

Probably asking us, David, because you would have to possess psychic powers to figure them out!

Process limits are in place to prevent fork bombs. Registered users have a higher limit, which is currently set at 128. But in effect that doesn't help you figure this stuff out much, because any action you perform might use 5-10 processes doing other stuff that you are unaware of.

We are hoping to abstract this kind of thing away, really. Of course it does bubble up sometimes, so we need to put our minds to figuring out how to prevent that from happening.
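Incidentally, if the cap is applied as the standard Unix RLIMIT_NPROC (an assumption on my part, but it is the usual mechanism), you can inspect it from a Python console with the resource module:

import resource

# the per-user cap on processes (threads count against it too);
# reportedly 128 here for registered users
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print(soft, hard)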

a2j - We are currently using EC2 Large Instances to power the core servers that users run processes on, so if you look up their specs they should answer most of your questions. But as we move to our own hardware and redundant vendors, this will change.

@hansel Thanks for the insights...:)

harry, I think a mere 1MB for /dev/shm is tiny, and it should be enlarged.

/dev/shm is a superb location for creating temporary files in memory (rather than on disk), though the burden is on the user to clean up. It is useful for speeding up any program, not just ones using multiprocessing.

A lot of programs use /dev/shm instead of /tmp for that reason.
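For example, Python's tempfile module can be pointed at it directly (keeping in mind the 1MB cap mentioned above):

import tempfile

# a temporary file backed by RAM rather than disk; it is
# deleted automatically when closed
with tempfile.NamedTemporaryFile(dir='/dev/shm') as f:
    f.write(b'scratch data')  # keep it small: only 1MB available here
    f.flush()
    f.seek(0)
    print(f.read())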

I volunteer my basement for a PA redundancy data center/point of presence!

That's the kind of user commitment that we like to see. A PAW representative will be around to inspect your basement.

@rsvp - We are open to increasing the size, but we will need to monitor real-world usage and make sure that no issues arise.

Hi, I'm having the same issues @davidk01 had about 4 years ago.

I'm making use of the multiprocessing package with a maximum of 5 pools. I reduced the pool size to 1 in an attempt to reduce the overall number of processes, but this clearly isn't the way to go. I keep getting these errors:

Traceback (most recent call last):
  File "/home/progen/.virtualenvs/progen/local/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/progen/.virtualenvs/progen/local/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/progen/.virtualenvs/progen/local/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/progen/.virtualenvs/progen/local/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/progen/.virtualenvs/progen/local/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/progen/pro-gen/api.py", line 67, in fetch_single_profile
    data = {'data': executr.fetch_profile()}
  File "/home/progen/pro-gen/scripts/executr.py", line 71, in fetch_profile
    return fetch_multiprocessed_chunks(num_profiles)
  File "/home/progen/pro-gen/scripts/executr.py", line 90, in fetch_multiprocessed_chunks
    pool = Pool(pool_size)
  File "/usr/lib/python2.7/multiprocessing/__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 177, in __init__
    self._task_handler.start()
  File "/usr/lib/python2.7/threading.py", line 745, in start
    _start_new_thread(self.__bootstrap, ())
error: can't start new thread

-----------------------

Traceback (most recent call last):
  File "/home/progen/.virtualenvs/progen/local/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/progen/.virtualenvs/progen/local/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/progen/.virtualenvs/progen/local/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/progen/.virtualenvs/progen/local/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/progen/.virtualenvs/progen/local/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/progen/pro-gen/api.py", line 55, in fetch_profiles
    res.update({'data': executr.fetch_profile(profile_count)})
  File "/home/progen/pro-gen/scripts/executr.py", line 71, in fetch_profile
    return fetch_multiprocessed_chunks(num_profiles)
  File "/home/progen/pro-gen/scripts/executr.py", line 90, in fetch_multiprocessed_chunks
    pool = Pool(pool_size)
  File "/usr/lib/python2.7/multiprocessing/__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 159, in __init__
    self._repopulate_pool()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 223, in _repopulate_pool
    w.start()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 11] Resource temporarily unavailable

I'm not sure why I'm getting these errors, as the same code works fine locally.

I'd appreciate any help please. Thanks.

We don't allow threads in web apps, I'm afraid. You might want to look into using a scheduled task if you want to delegate some async work. Or, if it's just something quick like sending an email, I would just do it synchronously within the request for now.

Are threads not allowed in Flask, or at PythonAnywhere?

Threads are not allowed in web apps.

I am new to all this, so sorry if my question doesn't make sense.

Did you mean threads are not allowed in web apps hosted by PythonAnywhere, or that they are not allowed at any host? If threads are not allowed in web apps anywhere, can you please suggest an alternative? My purpose is to send POST requests without waiting for their responses, i.e. asynchronously.

Ah, I see. I can only speak for PythonAnywhere, and threads are not allowed in web apps on PythonAnywhere. There are alternatives if you need some sort of asynchronicity: http://help.pythonanywhere.com/pages/AsyncInWebApps/
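For the specific case of firing off a POST without waiting for the response, one pattern that needs no threads is to have the web app write the outgoing request to a simple on-disk queue and let a scheduled task drain it. A rough sketch (the paths and function names here are just for illustration):

import glob
import json
import os
import time
import uuid

import requests

QUEUE_DIR = os.path.expanduser('~/outbox')  # illustrative queue location

def enqueue_post(url, payload):
    # called from the web app: record the POST instead of sending it
    if not os.path.isdir(QUEUE_DIR):
        os.makedirs(QUEUE_DIR)
    job = {'url': url, 'payload': payload, 'queued_at': time.time()}
    path = os.path.join(QUEUE_DIR, uuid.uuid4().hex + '.json')
    with open(path, 'w') as f:
        json.dump(job, f)

def drain_queue():
    # run this periodically as a scheduled task, outside the web app
    for path in glob.glob(os.path.join(QUEUE_DIR, '*.json')):
        with open(path) as f:
            job = json.load(f)
        requests.post(job['url'], json=job['payload'])  # blocking is fine here
        os.remove(path)

The request goes out a little later than it would with a thread, but nothing in the web app ever blocks on it.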