How to set up Beanstalkd with PHP

joshholat picture joshholat · Oct 11, 2011 · Viewed 7.8k times · Source

Recently I've been researching the use of Beanstalkd with PHP. I've learned quite a bit but have a few questions about the setup on a server, etc.

Here is how I see it working:

  1. I install Beanstalkd and any dependencies (such as libevent) on my Ubuntu server. I then start the Beanstalkd daemon (which should basically run at all times).
  2. Somewhere in my website (such as when a user performs some actions, etc) tasks get added to various tubes within the Beanstalkd queue.
  3. I have a bash script (such as the following one) that is run as a deamon that basically executes a PHP script.

    #!/bin/sh
    php worker.php
    

4) The worker script would have something like this to execute the queued up tasks:

while(1) {
  $job = $this->pheanstalk->watch('test')->ignore('default')->reserve();
  $job_encoded = json_decode($job->getData(), false);
  $done_jobs[] = $job_encoded;
  $this->log('job:'.print_r($job_encoded, 1));
  $this->pheanstalk->delete($job);
}

Now here are my questions based on the above setup (which correct me if I'm wrong about that):

  1. Say I have the task of importing an RSS feed into a database or something. If 10 users do this at once, they'll all be queued up in the "test" tube. However, they'd then only be executed one at a time. Would it be better to have 10 different tubes all executing at the same time?

  2. If I do need more tubes, does that then also mean that i'd need 10 worker scripts? One for each tube all running concurrently with basically the same code except for the string literal in the watch() function.

  3. If I run that script as a daemon, how does that work? Will it constantly be executing the worker.php script? That script loops until the queue is empty theoretically, so shouldn't it only be kicked off once? How does the daemon decide how often to execute worker.php? Is that just a setting?

Thanks!

Answer

Alister Bulman picture Alister Bulman · Oct 11, 2011
  1. If the worker isn't taking too long to fetch the feed, it will be fine. You can run multiple workers if required to process more than one at a time. I've got a system (currently using Amazon SQS, but I've done similar with BeanstalkD before), with up to 200 (or more) workers pulling from the queue.
  2. A single worker script (the same script running multiple times) should be fine - the script can watch multiple tubes at the same time, and the first one available will be reserved. You can also use the job-stat command to see where a particular $job came from (which tube), or put some meta-information into the message if you need to tell each type from another.
  3. A good example of running a worker is described here. I've also added supervisord (also, a useful post to get started) to easily start and keep running a number of workers per machine (I run shell scripts, as in the first link). I would limit the number of times it loops, and also put a number into the reserve() to have it wait for a few seconds, or more, for the next job the become available without spinning out of control in a tight loop that does not pause at all - even if there was nothing to do.

Addendum:

  1. The shell script would be run as many times as you need. (the link show how to have it re-run as required with exec $@). Whenever the php script exits, it re-runs the PHP.
  2. Apparently there's a Djanjo app to show some stats, but it's trivial enough to connect to the daemon, get a list of tubes, and then get the stats for each tube - or just counts.