123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382 |
- ===================================================
- celery - Distributed Task Queue for Django/Python
- ===================================================
- :Version: 0.7.0
- Introduction
- ============
- **NOTE:** See the FAQ for information about using celery outside of Django.
- ``celery`` is a distributed task queue framework for Django/Python.
- It is used for executing tasks *asynchronously*, routed to one or more
- worker servers, running concurrently using multiprocessing.
- It is designed to solve certain problems related to running websites
- demanding high-availability and performance.
- It is perfect for filling caches, posting updates to twitter, mass
- downloading data like syndication feeds or web scraping. Use-cases are
- plentiful. Implementing these features asynchronously using ``celery`` is
- easy and fun, and the performance improvements can make it more than
- worthwhile.
- Overview
- ========
- This is a high level overview of the architecture.
- .. image:: http://cloud.github.com/downloads/ask/celery/Celery-Overview-v4.jpg
- The broker is an AMQP server pushing tasks to the worker servers.
- A worker server is a networked machine running ``celeryd``. This can be one or
- more machines, depending on the workload. See `A look inside the worker`_ to
- see how the worker server works.
- The result of the task can be stored for later retrieval (called its
- "tombstone").
- Features
- ========
- * Uses AMQP messaging (RabbitMQ, ZeroMQ, Qpid) to route tasks to the
- worker servers. Experimental support for STOMP (ActiveMQ) is also
- available.
- * You can run as many worker servers as you want, and still
- be *guaranteed that the task is only executed once.*
- * Tasks are executed *concurrently* using the Python 2.6
- ``multiprocessing`` module (also available as a back-port
- to older python versions)
- * Supports *periodic tasks*, which makes it a (better) replacement
- for cronjobs.
- * When a task has been executed, the return value can be stored using
- either a MySQL/Oracle/PostgreSQL/SQLite database, Memcached,
- or Tokyo Tyrant back-end. For high-performance you can also use
- AMQP to publish results.
- * If the task raises an exception, the exception instance is stored,
- instead of the return value.
- * All tasks has a Universally Unique Identifier (UUID), which is the
- task id, used for querying task status and return values.
- * Tasks can be retried if they fail, with a configurable maximum number
- of retries.
- * Tasks can be configured to run at a specific time and date in the
- future (ETA) or you can set a countdown in seconds for when the
- task should be executed.
- * Supports *task-sets*, which is a task consisting of several sub-tasks.
- You can find out how many, or if all of the sub-tasks has been executed.
- Excellent for progress-bar like functionality.
- * Has a ``map`` like function that uses tasks, called ``dmap``.
- * However, you rarely want to wait for these results in a web-environment.
- You'd rather want to use Ajax to poll the task status, which is
- available from a URL like ``celery/<task_id>/status/``. This view
- returns a JSON-serialized data structure containing the task status,
- and the return value if completed, or exception on failure.
- * The worker can collect statistics, like, how many tasks has been
- executed by type, and the time it took to process them. Very useful
- for monitoring and profiling.
- * Pool workers are supervised, so if for some reason a worker crashes
- it is automatically replaced by a new worker.
- * Can be configured to send e-mails to the administrators when a task
- fails.
- API Reference Documentation
- ===========================
- The `API Reference`_ is hosted at Github
- (http://ask.github.com/celery)
- .. _`API Reference`: http://ask.github.com/celery/
- Installation
- =============
- You can install ``celery`` either via the Python Package Index (PyPI)
- or from source.
- To install using ``pip``,::
- $ pip install celery
- To install using ``easy_install``,::
- $ easy_install celery
- Downloading and installing from source
- --------------------------------------
- Download the latest version of ``celery`` from
- http://pypi.python.org/pypi/celery/
- You can install it by doing the following,::
- $ tar xvfz celery-0.0.0.tar.gz
- $ cd celery-0.0.0
- $ python setup.py build
- # python setup.py install # as root
- Using the development version
- ------------------------------
- You can clone the repository by doing the following::
- $ git clone git://github.com/ask/celery.git
- Usage
- =====
- Installing RabbitMQ
- -------------------
- See `Installing RabbitMQ`_ over at RabbitMQ's website. For Mac OS X
- see `Installing RabbitMQ on OS X`_.
- .. _`Installing RabbitMQ`: http://www.rabbitmq.com/install.html
- .. _`Installing RabbitMQ on OS X`:
- http://playtype.net/past/2008/10/9/installing_rabbitmq_on_osx/
- Setting up RabbitMQ
- -------------------
- To use celery we need to create a RabbitMQ user, a virtual host and
- allow that user access to that virtual host::
- $ rabbitmqctl add_user myuser mypassword
- $ rabbitmqctl add_vhost myvhost
- From RabbitMQ version 1.6.0 and onward you have to use the new ACL features
- to allow access::
- $ rabbitmqctl set_permissions -p myvhost myuser "" ".*" ".*"
- See the RabbitMQ `Admin Guide`_ for more information about `access control`_.
- .. _`Admin Guide`: http://www.rabbitmq.com/admin-guide.html
- .. _`access control`: http://www.rabbitmq.com/admin-guide.html#access-control
- If you are still using version 1.5.0 or below, please use ``map_user_vhost``::
- $ rabbitmqctl map_user_vhost myuser myvhost
- Configuring your Django project to use Celery
- ---------------------------------------------
- You only need three simple steps to use celery with your Django project.
- 1. Add ``celery`` to ``INSTALLED_APPS``.
- 2. Create the celery database tables::
- $ python manage.py syncdb
- 3. Configure celery to use the AMQP user and virtual host we created
- before, by adding the following to your ``settings.py``::
- AMQP_SERVER = "localhost"
- AMQP_PORT = 5672
- AMQP_USER = "myuser"
- AMQP_PASSWORD = "mypassword"
- AMQP_VHOST = "myvhost"
- That's it.
- There are more options available, like how many processes you want to process
- work in parallel (the ``CELERY_CONCURRENCY`` setting), and the backend used
- for storing task statuses. But for now, this should do. For all of the options
- available, please consult the `API Reference`_
- **Note**: If you're using SQLite as the Django database back-end,
- ``celeryd`` will only be able to process one task at a time, this is
- because SQLite doesn't allow concurrent writes.
- Running the celery worker server
- --------------------------------
- To test this we'll be running the worker server in the foreground, so we can
- see what's going on without consulting the logfile::
- $ python manage.py celeryd
- However, in production you probably want to run the worker in the
- background, as a daemon::
- $ python manage.py celeryd --detach
- For a complete listing of the command line arguments available, with a short
- description, you can use the help command::
- $ python manage.py help celeryd
- Defining and executing tasks
- ----------------------------
- **Please note** All of these tasks has to be stored in a real module, they can't
- be defined in the python shell or ipython/bpython. This is because the celery
- worker server needs access to the task function to be able to run it.
- So while it looks like we use the python shell to define the tasks in these
- examples, you can't do it this way. Put them in the ``tasks`` module of your
- Django application. The worker server will automatically load any ``tasks.py``
- file for all of the applications listed in ``settings.INSTALLED_APPS``.
- Executing tasks using ``delay`` and ``apply_async`` can be done from the
- python shell, but keep in mind that since arguments are pickled, you can't
- use custom classes defined in the shell session.
- While you can use regular functions, the recommended way is to define
- a task class. This way you can cleanly upgrade the task to use the more
- advanced features of celery later.
- This is a task that basically does nothing but take some arguments,
- and return a value:
- >>> from celery.task import Task
- >>> from celery.registry import tasks
- >>> class MyTask(Task):
- ... def run(self, some_arg, **kwargs):
- ... logger = self.get_logger(**kwargs)
- ... logger.info("Did something: %s" % some_arg)
- ... return 42
- >>> tasks.register(MyTask)
- Now if we want to execute this task, we can use the ``delay`` method of the
- task class (this is a handy shortcut to the ``apply_async`` method which gives
- you greater control of the task execution).
- >>> from myapp.tasks import MyTask
- >>> MyTask.delay(some_arg="foo")
- At this point, the task has been sent to the message broker. The message
- broker will hold on to the task until a celery worker server has successfully
- picked it up.
- *Note* If everything is just hanging when you execute ``delay``, please check
- that RabbitMQ is running, and that the user/password has access to the virtual
- host you configured earlier.
- Right now we have to check the celery worker logfiles to know what happened with
- the task. This is because we didn't keep the ``AsyncResult`` object returned
- by ``delay``.
- The ``AsyncResult`` lets us find the state of the task, wait for the task to
- finish and get its return value (or exception if the task failed).
- So, let's execute the task again, but this time we'll keep track of the task:
- >>> result = MyTask.delay("do_something", some_arg="foo bar baz")
- >>> result.ready() # returns True if the task has finished processing.
- False
- >>> result.result # task is not ready, so no return value yet.
- None
- >>> result.get() # Waits until the task is done and return the retval.
- 42
- >>> result.result
- 42
- >>> result.successful() # returns True if the task didn't end in failure.
- True
- If the task raises an exception, the ``result.success()`` will be ``False``,
- and ``result.result`` will contain the exception instance raised.
- Auto-discovery of tasks
- -----------------------
- ``celery`` has an auto-discovery feature like the Django Admin, that
- automatically loads any ``tasks.py`` module in the applications listed
- in ``settings.INSTALLED_APPS``. This autodiscovery is used by the celery
- worker to find registered tasks for your Django project.
- Periodic Tasks
- ---------------
- Periodic tasks are tasks that are run every ``n`` seconds.
- Here's an example of a periodic task:
- >>> from celery.task import PeriodicTask
- >>> from celery.registry import tasks
- >>> from datetime import timedelta
- >>> class MyPeriodicTask(PeriodicTask):
- ... run_every = timedelta(seconds=30)
- ...
- ... def run(self, **kwargs):
- ... logger = self.get_logger(**kwargs)
- ... logger.info("Running periodic task!")
- ...
- >>> tasks.register(MyPeriodicTask)
- **Note:** Periodic tasks does not support arguments, as this doesn't
- really make sense.
- A look inside the worker
- ========================
- .. image:: http://cloud.github.com/downloads/ask/celery/InsideTheWorker-v2.jpg
- Getting Help
- ============
- Mailing list
- ------------
- For discussions about the usage, development, and future of celery,
- please join the `celery-users`_ mailing list.
- .. _`celery-users`: http://groups.google.com/group/celery-users/
- IRC
- ---
- Come chat with us on IRC. The `#celery`_ channel is located at the `Freenode`_
- network.
- .. _`#celery`: irc://irc.freenode.net/celery
- .. _`Freenode`: http://freenode.net
- Bug tracker
- ===========
- If you have any suggestions, bug reports or annoyances please report them
- to our issue tracker at http://github.com/ask/celery/issues/
- Contributing
- ============
- Development of ``celery`` happens at Github: http://github.com/ask/celery
- You are highly encouraged to participate in the development
- of ``celery``. If you don't like Github (for some reason) you're welcome
- to send regular patches.
- License
- =======
- This software is licensed under the ``New BSD License``. See the ``LICENSE``
- file in the top distribution directory for the full license text.
- .. # vim: syntax=rst expandtab tabstop=4 shiftwidth=4 shiftround
|