Decided to write about queues this month because I was having a conversation about useful software practices that aren't usually taught in school, and queueing came up. Totally a delight and relatively simple to set up!

By the end of this post, a relative beginner to programming should understand the general concept, have implemented a basic queue, and have a list of material for further reading.

What is a task queue?

A task queue (or batch queue, job queue, work queue) is a list of tasks maintained and executed by a scheduler. If you've ever run a cron job before, you're halfway to understanding how people use queues.

Developers use task queues to asynchronously handle slower tasks. After a user makes a request, a scheduler creates a task and puts it on a queue. Workers monitor that queue and execute one task at a time. Once a particular task is finished, the result is stored. Task queues run on the server, and tasks are typically queued up by a request made from a script or a webapp.

For example, a student clicks a button on a web interface to upload some homework to be autograded. That queues up an upload task on the queue. The student sees a view (defined by the webapp) that informs her "You will get an email when your homework has been graded." She then goes off, checks Facebook, and does more homework. Or more realistically, procrastinates on Facebook. While that's happening, a worker monitoring that queue finishes all pending tasks (her classmates' homework), grades her homework, stores the results in a database, and fires off an email to the student telling her that she got 95%.

Task queues are useful for sending SMS/email notifications, running a classification model on some new data, uploading files, processing emails, or running reports.

What are the pieces of a task queue?

  • Tasks - A task represents a particular function that you want to run in the background. Any Python function can be a task. A task contains a reference to the function and the arguments you want to run it with.
  • Queues - A queue is simply a list that keeps track of the tasks you want to execute, and the order in which they arrived. You can have multiple queues and assign priorities (high, medium, low) to them.
  • Workers - A worker is a Python process that reads jobs from the queue and executes them one at a time. You can have multiple workers monitoring multiple queues.
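Before wiring up Redis, the three pieces can be sketched with nothing but the standard library: a task is a function plus its arguments, the queue is a thread-safe list, and the worker is a loop pulling from it. This is just a toy for illustration (the `add` task and all the names here are made up), but it mimics what RQ does for you:

```python
import queue
import threading

# A "task" is just a function reference plus its arguments.
def add(a, b):
    return a + b

task_queue = queue.Queue()   # the queue: tasks wait here in arrival order
results = []                 # stand-in for the result store

def worker():
    # The worker pulls tasks off the queue and runs them one at a time.
    while True:
        func, args = task_queue.get()
        if func is None:      # sentinel value meaning "no more work"
            break
        results.append(func(*args))
        task_queue.task_done()

# Enqueue a few tasks, then start a single worker.
for pair in [(1, 2), (3, 4), (5, 6)]:
    task_queue.put((add, pair))
task_queue.put((None, None))

t = threading.Thread(target=worker)
t.start()
t.join()
print(results)   # → [3, 7, 11]
```

RQ gives you the same shape, except the queue lives in Redis (so it survives restarts and can be shared across machines) and the worker is a separate process instead of a thread.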

Let's try it!

We'll be using Python-RQ and Redis to create a minimal example of a task queue. This is a very bare-bones example; more details on the actual usage of queues can be found in the resources list below.

Step 0: Install Redis

Redis backs the workers, queues, and tasks for RQ. That means all the data for the queues and results is stored in Redis. There are other options for brokering tasks and storing results! See the Celery resources to learn more.

Open up a terminal, set up a new virtualenv, and get Redis installed with these commands.

wget http://download.redis.io/redis-stable.tar.gz  
tar xvzf redis-stable.tar.gz  
cd redis-stable  
make  

Check that the build works by typing the following into the terminal.

make test  
sudo make install  

Start the server with redis-server. You should see the line, "The server is now ready to accept connections on port 6379."

In a separate terminal window, run redis-cli to start the redis command line interface. You should be connected to redis and able to run commands like SET testkey "hello"

NOTE: Make sure the redis server is running in another window while you're working on the rest of this mini-tutorial!

Step 1: Set Up Your Python Environment

We'll be using Python to create the queue and interact with Redis. Create a virtualenv to isolate your environment, then run mkdir rqtest to create a folder for the project.

Run pip install redis rq in your environment to install redis-py, python-rq and their dependencies.

Here are my pip freeze results for comparison.

Step 2: Create a Task

Inside rqtest, create a file called tasks.py and define a task. A task is just any old function! Below we're creating a task that checks a url to see whether the page returns an OK status (response code 200) or not.

tasks.py

import requests

def is_page_ok(url):  
    response = requests.get(url)
    if response.status_code == 200:
        return "{0} is up".format(url)
    else:
        return "{0} is not OK. Status {1}".format(url, response.status_code)

A task is just any old function!
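Because a task is an ordinary function, you can exercise it without a worker, a queue, or even a network connection. The sketch below stubs out the requests module with unittest.mock so the same logic runs offline; the stub and the use of stefsy.com here are purely for illustration:

```python
import sys
from unittest import mock

# Replace the requests module with a stub before importing it,
# so no real HTTP request is ever made.
fake_requests = mock.MagicMock()
fake_requests.get.return_value.status_code = 200
sys.modules["requests"] = fake_requests

import requests  # resolves to the stub above

# Same task as in tasks.py.
def is_page_ok(url):
    response = requests.get(url)
    if response.status_code == 200:
        return "{0} is up".format(url)
    else:
        return "{0} is not OK. Status {1}".format(url, response.status_code)

print(is_page_ok("http://stefsy.com"))   # → http://stefsy.com is up

# Simulate a failing page by changing the stubbed status code.
fake_requests.get.return_value.status_code = 404
print(is_page_ok("http://stefsy.com"))   # → http://stefsy.com is not OK. Status 404
```

This is handy for debugging: if the function misbehaves here, no amount of queueing will fix it.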

Step 3: Put the Task on a Queue

Save tasks.py and create a new file addqueue.py inside the same rqtest directory.

addqueue.py

from rq import Queue  
from redis import Redis  
from tasks import is_page_ok 

redis_con = Redis()  
# creates a new queue named "important" 
q = Queue('important', connection=redis_con)

# put a task on the queue 
task = q.enqueue(is_page_ok, "http://stefsy.com")  
print("noted!")  

Run this file with python addqueue.py a few times to enqueue the task a few times. You should see "noted!" every time the script runs.

Step 4: Run the Worker

Now that we've queued up a few tasks onto our Redis queue, we'll want to actually start executing them. Open up a new terminal tab. Make sure the rqtest virtualenv is activated, and that you're in the rqtest project folder.

To start up a worker, run rqworker important. This starts up a worker and assigns it to the 'important' queue that we created earlier. You should see the worker pick up the tasks one at a time and print each result. Voila!

Step 5: Play Around

While the terminal window with the rqworker is running, edit addqueue.py and change

task = q.enqueue(is_page_ok, "http://stefsy.com")

to this

task = q.enqueue(is_page_ok, input())

Now when you run addqueue.py, it will prompt you for a url to check. Try adding a bunch of urls to the queue and watch how rqworker attacks those tasks.
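Since input() happily accepts anything, a quick sanity check before enqueueing saves your worker from choking on garbage. Here's a minimal sketch using the standard library's urllib.parse; the looks_like_url helper is a made-up name, not part of RQ:

```python
from urllib.parse import urlparse

def looks_like_url(text):
    # A usable URL needs at least a scheme (http/https) and a host.
    parts = urlparse(text)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

print(looks_like_url("http://stefsy.com"))   # → True
print(looks_like_url("not a url"))           # → False
```

In addqueue.py you could wrap the enqueue call in an if looks_like_url(...) check and print a complaint otherwise, so only plausible urls ever reach the queue.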

Resources

We covered creating tasks from running Python scripts locally, but there's a lot more to it! See the resources below to learn how to use message brokers, how to schedule tasks, how to get a Django app to make a request to a queue, and so on!