|
@@ -5,21 +5,211 @@
|
|
|
.. contents::
|
|
|
:local:
|
|
|
|
|
|
+Introduction
|
|
|
+============
|
|
|
+
|
|
|
+There are several tools available to monitor and inspect Celery clusters.
|
|
|
+This document describes some of these, as well as
|
|
|
+features related to monitoring, like events and broadcast commands.
|
|
|
+
|
|
|
+
|
|
|
+Monitoring and Inspecting Workers
|
|
|
+=================================
|
|
|
+
|
|
|
+celeryctl
|
|
|
+---------
|
|
|
+
|
|
|
+* Listing active nodes in the cluster
|
|
|
+ ::
|
|
|
+
|
|
|
+ $ celeryctl status
|
|
|
+
|
|
|
+* Show the result of a task
|
|
|
+ ::
|
|
|
+
|
|
|
+ $ celeryctl -t tasks.add 4e196aa4-0141-4601-8138-7aa33db0f577
|
|
|
+
|
|
|
+ Note that you can omit the name of the task as long as the
|
|
|
+ task doesn't use a custom result backend.
|
|
|
+
|
|
|
+* Listing all tasks that are currently being executed
|
|
|
+ ::
|
|
|
+
|
|
|
+ $ celeryctl inspect active
|
|
|
+
|
|
|
+* Listing scheduled ETA tasks
|
|
|
+ ::
|
|
|
+
|
|
|
+ $ celeryctl inspect scheduled
|
|
|
+
|
|
|
+ These are tasks reserved by the worker because they have the
|
|
|
+ ``eta`` or ``countdown`` argument set.
|
|
|
+
|
|
|
+* Listing reserved tasks
|
|
|
+ ::
|
|
|
+
|
|
|
+ $ celeryctl inspect reserved
|
|
|
+
|
|
|
+ This will list all tasks that have been prefetched by the worker,
|
|
|
+ and are currently waiting to be executed (does not include tasks
|
|
|
+ with an eta).
|
|
|
+
|
|
|
+* Listing the history of revoked tasks
|
|
|
+ ::
|
|
|
+
|
|
|
+ $ celeryctl inspect revoked
|
|
|
+
|
|
|
+* Show registered tasks
|
|
|
+ ::
|
|
|
+
|
|
|
+ $ celeryctl inspect registered_tasks
|
|
|
+
|
|
|
+* Showing statistics
|
|
|
+ ::
|
|
|
+
|
|
|
+ $ celeryctl inspect stats
|
|
|
+
|
|
|
+* Diagnosing the worker pools
|
|
|
+ ::
|
|
|
+
|
|
|
+ $ celeryctl inspect diagnose
|
|
|
+
|
|
|
+ This verifies that the worker's pool processes are available
|
|
|
+ to do work. Note that this will not work if the worker is busy.
|
|
|
+
|
|
|
+* Enabling/disabling events
|
|
|
+ ::
|
|
|
+
|
|
|
+ $ celeryctl inspect enable_events
|
|
|
+ $ celeryctl inspect disable_events
|
|
|
+
|
|
|
+
|
|
|
+By default the inspect commands operate on all workers.
|
|
|
+You can specify a single worker, or a list of workers, by using the
|
|
|
+``--destination`` argument::
|
|
|
+
|
|
|
+ $ celeryctl inspect -d w1,w2 reserved
|
|
|
+
|
|
|
+
|
|
|
+:Note: All ``inspect`` commands support the ``--timeout`` argument,
|
|
|
+ which is the number of seconds to wait for responses.
|
|
|
+ You may have to increase this timeout if you're getting empty responses
|
|
|
+ due to latency.
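
As a minimal sketch of how the options above combine, the following Python helper assembles an ``inspect`` command line. The helper name ``inspect_argv`` is hypothetical and not part of Celery, and the ``--timeout=N`` spelling is an assumption, since the text only names the argument.

```python
def inspect_argv(command, destination=None, timeout=None):
    """Build an argument list for ``celeryctl inspect``.

    Hypothetical helper (not part of Celery): it only assembles the
    command line described in the text and does not run anything.
    """
    argv = ["celeryctl", "inspect"]
    if destination:
        # Comma-separated worker names, as in ``-d w1,w2``.
        argv += ["-d", ",".join(destination)]
    if timeout is not None:
        # Assumed spelling; the text only names the ``--timeout`` argument.
        argv.append("--timeout=%s" % timeout)
    # The inspect command name (e.g. ``reserved``) comes last.
    argv.append(command)
    return argv


if __name__ == "__main__":
    print(" ".join(inspect_argv("reserved", ["w1", "w2"], timeout=5)))
```

The resulting list could be passed to ``subprocess.call`` if you want to run the command from a script.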
|
|
|
+
|
|
|
+Django Admin
|
|
|
+------------
|
|
|
+
|
|
|
+TODO
|
|
|
+
|
|
|
+celeryev
|
|
|
+--------
|
|
|
+
|
|
|
+TODO
|
|
|
+
|
|
|
+celerymon
|
|
|
+---------
|
|
|
+
|
|
|
+TODO
|
|
|
+
|
|
|
+Monitoring and inspecting RabbitMQ
|
|
|
+==================================
|
|
|
+
|
|
|
+To manage a Celery cluster it is important to know how
|
|
|
+RabbitMQ can be monitored.
|
|
|
+
|
|
|
+RabbitMQ ships with the `rabbitmqctl(1)`_ command.
|
|
|
+With this you can list queues, exchanges, bindings,
|
|
|
+queue lengths, the memory usage of each queue, as well
|
|
|
+as manage users, virtual hosts and their permissions.
|
|
|
+
|
|
|
+:Note: The default virtual host (``"/"``) is used in these
|
|
|
+ examples. If you use a custom virtual host you have to add
|
|
|
+ the ``-p`` argument to the command, e.g.:
|
|
|
+ ``rabbitmqctl list_queues -p my_vhost ....``
|
|
|
+
|
|
|
+
|
|
|
+.. _`rabbitmqctl(1)`: http://www.rabbitmq.com/man/rabbitmqctl.1.man.html
|
|
|
+
|
|
|
+Inspecting queues
|
|
|
+-----------------
|
|
|
+
|
|
|
+Finding the number of tasks in a queue::
|
|
|
+
|
|
|
+
|
|
|
+ $ rabbitmqctl list_queues name messages messages_ready \
|
|
|
+ messages_unacknowledged
|
|
|
+
|
|
|
+
|
|
|
+Here ``messages_ready`` is the number of messages ready
|
|
|
+for delivery (sent but not received), ``messages_unacknowledged``
|
|
|
+is the number of messages that have been received by a worker but
|
|
|
+not acknowledged yet (meaning they are in progress, or have been reserved).
|
|
|
+``messages`` is the sum of ready and unacknowledged messages.
|
|
|
+
|
|
|
+
|
|
|
+Finding the number of workers currently consuming from a queue::
|
|
|
+
|
|
|
+ $ rabbitmqctl list_queues name consumers
|
|
|
+
|
|
|
+Finding the amount of memory allocated to a queue::
|
|
|
+
|
|
|
+ $ rabbitmqctl list_queues name memory
|
|
|
+
|
|
|
+:Tip: Adding the ``-q`` option to `rabbitmqctl(1)`_ makes the output
|
|
|
+ easier to parse.
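
To illustrate the tip, here is a minimal sketch of parsing ``list_queues`` output in Python. It assumes the ``-q`` output consists of one line per queue with tab-separated columns in the order they were requested. The function name ``parse_list_queues`` and the sample text are made up for illustration.

```python
def parse_list_queues(output, columns):
    """Parse ``rabbitmqctl list_queues -q <columns>`` output.

    Assumes one queue per line and tab-separated values, in the same
    order as ``columns``.  Every column except ``name`` is read as an
    integer.
    """
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        values = line.split("\t")
        rows.append({
            column: (value if column == "name" else int(value))
            for column, value in zip(columns, values)
        })
    return rows


if __name__ == "__main__":
    # Hypothetical sample output, not captured from a real broker.
    sample = "celery\t12\t10\t2\n"
    columns = ["name", "messages", "messages_ready",
               "messages_unacknowledged"]
    for row in parse_list_queues(sample, columns):
        print(row)
```

In practice the ``output`` argument would come from running the command, for example with ``subprocess.check_output``.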
|
|
|
+
|
|
|
+Munin
|
|
|
+=====
|
|
|
+
|
|
|
+This is a list of known Munin plugins that can be useful when
|
|
|
+maintaining a Celery cluster.
|
|
|
+
|
|
|
+* rabbitmq-munin: Munin plugins for RabbitMQ.
|
|
|
+
|
|
|
+ http://github.com/ask/rabbitmq-munin
|
|
|
+
|
|
|
+* celery_tasks: Monitors the number of times each task type has
|
|
|
+ been executed (requires ``celerymon``).
|
|
|
+
|
|
|
+ http://exchange.munin-monitoring.org/plugins/celery_tasks-2/details
|
|
|
+
|
|
|
+* celery_task_states: Monitors the number of tasks in each state
|
|
|
+ (requires ``celerymon``).
|
|
|
+
|
|
|
+ http://exchange.munin-monitoring.org/plugins/celery_tasks/details
|
|
|
+
|
|
|
Events
|
|
|
======
|
|
|
|
|
|
-Describe events
|
|
|
-
|
|
|
+The worker has the ability to send a message whenever some event
|
|
|
+happens. These events are then captured by tools like ``celerymon`` and
|
|
|
+``celeryev`` to monitor the cluster.
|
|
|
|
|
|
Snapshots
|
|
|
---------
|
|
|
|
|
|
-Describe snapshots
|
|
|
+Even a single worker can produce a huge number of events, so storing
|
|
|
+the history of these events on disk may be hard.
|
|
|
+
|
|
|
+A sequence of events describes the cluster state in that time period.
|
|
|
+By taking periodic snapshots of this state we can capture all interesting
|
|
|
+information, but only periodically write it to disk.
|
|
|
+
|
|
|
+To take snapshots you need a Camera class. With this you can define
|
|
|
+what should happen every time the state is captured (you can
|
|
|
+write it to a database, send it by e-mail, or something else entirely).
|
|
|
+
|
|
|
+``celeryev`` is then used to take snapshots with the camera.
|
|
|
+For example, if you want to capture state every 2 seconds using the
|
|
|
+camera ``myapp.Camera``, you run ``celeryev`` with the following arguments::
|
|
|
|
|
|
+ $ celeryev -c myapp.Camera --frequency=2.0
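
The idea of keeping state in memory and only handing it off periodically can be sketched in plain Python. This is not how ``celeryev`` is implemented. The ``Snapshotter`` class, the event dictionaries with ``type`` and ``uuid`` keys, and the explicit ``now`` argument are all assumptions made for illustration.

```python
import copy


class Snapshotter:
    """Keep the latest state in memory; hand off a copy periodically.

    Illustrative only.  For simplicity the snapshot check happens when
    an event arrives, where a real tool would more likely use a timer.
    """

    def __init__(self, on_snapshot, frequency=2.0):
        self.on_snapshot = on_snapshot  # callback receiving a state copy
        self.frequency = frequency      # seconds between snapshots
        self.state = {}                 # uuid -> last seen event type
        self._last = None               # time of the last snapshot

    def event(self, event, now):
        # Update the in-memory state for every event.
        self.state[event["uuid"]] = event["type"]
        if self._last is None:
            self._last = now
        # Only hand off a snapshot once per ``frequency`` seconds.
        if now - self._last >= self.frequency:
            self.on_snapshot(copy.deepcopy(self.state))
            self._last = now


if __name__ == "__main__":
    snapshots = []
    cam = Snapshotter(snapshots.append, frequency=2.0)
    cam.event({"type": "task-received", "uuid": "a"}, now=0.0)
    cam.event({"type": "task-started", "uuid": "a"}, now=1.0)
    cam.event({"type": "task-succeeded", "uuid": "a"}, now=2.0)
    print(snapshots)
```

The callback plays the role of the camera: it sees only the state at snapshot time, not every intermediate event.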
|
|
|
|
|
|
Custom Camera
|
|
|
~~~~~~~~~~~~~
|
|
|
|
|
|
+Here is an example camera that simply dumps the snapshot to the screen:
|
|
|
+
|
|
|
.. code-block:: python
|
|
|
|
|
|
from pprint import pformat
|
|
@@ -59,33 +249,56 @@ Or you can use it programatically like this::
|
|
|
if __name__ == "__main__":
|
|
|
main()
|
|
|
|
|
|
+Event Reference
|
|
|
+---------------
|
|
|
|
|
|
+This list contains the events sent by the worker, and their arguments.
|
|
|
|
|
|
+Task Events
|
|
|
+~~~~~~~~~~~
|
|
|
|
|
|
-Tools
|
|
|
-=====
|
|
|
+* ``task-received(uuid, name, args, kwargs, retries, eta, hostname,
|
|
|
+ timestamp)``
|
|
|
|
|
|
-celerymon
|
|
|
-=========
|
|
|
+ Sent when the worker receives a task.
|
|
|
|
|
|
-Describe celerymon
|
|
|
+* ``task-started(uuid, hostname, timestamp)``
|
|
|
|
|
|
-celeryev
|
|
|
-========
|
|
|
+ Sent just before the worker executes the task.
|
|
|
|
|
|
-Describe celeryev
|
|
|
+* ``task-succeeded(uuid, result, runtime, hostname, timestamp)``
|
|
|
|
|
|
-RabbitMQ
|
|
|
-========
|
|
|
+ Sent if the task executed successfully.
|
|
|
+ Runtime is the time it took to execute the task using the pool.
|
|
|
+ (Starting from when the task is sent to the pool, and ending when the
|
|
|
+ pool result handler's callback is called).
|
|
|
|
|
|
-Describe rabbitmq tools. rabbitmqctl, Alice, etc...
|
|
|
+* ``task-failed(uuid, exception, traceback, hostname, timestamp)``
|
|
|
|
|
|
-Django Admin
|
|
|
-============
|
|
|
+ Sent if the execution of the task failed.
|
|
|
|
|
|
-Describe the snapshot camera django-celery ships with.
|
|
|
+* ``task-revoked(uuid)``
|
|
|
|
|
|
-Munin
|
|
|
-=====
|
|
|
+ Sent if the task has been revoked (note that this is likely
|
|
|
+ to be sent by more than one worker).
|
|
|
+
|
|
|
+* ``task-retried(uuid, exception, traceback, hostname, delay, timestamp)``
|
|
|
+
|
|
|
+ Sent if the task failed, but will be retried in the future.
|
|
|
+ (**NOT IMPLEMENTED**)
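
To show how a monitor might use the task events listed above, here is a minimal sketch that folds a sequence of events into per-task state. It assumes each event is a dictionary with a ``type`` key plus the fields listed for that event. The state labels and the ``apply_events`` name are illustrative and not taken from Celery.

```python
# Mapping from event type to a state label (labels are illustrative).
EVENT_TO_STATE = {
    "task-received": "RECEIVED",
    "task-started": "STARTED",
    "task-succeeded": "SUCCESS",
    "task-failed": "FAILURE",
    "task-revoked": "REVOKED",
    "task-retried": "RETRY",
}


def apply_events(events):
    """Fold task events (in order) into a ``uuid -> info`` mapping."""
    tasks = {}
    for event in events:
        state = EVENT_TO_STATE.get(event["type"])
        if state is None:
            continue  # not a task event, e.g. worker-heartbeat
        info = tasks.setdefault(event["uuid"], {})
        # Remember every field seen so far; later events overwrite
        # earlier values such as ``timestamp``.
        info.update((k, v) for k, v in event.items() if k != "type")
        info["state"] = state
    return tasks


if __name__ == "__main__":
    # Hypothetical events for a single task.
    events = [
        {"type": "task-received", "uuid": "a", "name": "tasks.add",
         "hostname": "w1", "timestamp": 1.0},
        {"type": "task-started", "uuid": "a", "hostname": "w1",
         "timestamp": 2.0},
        {"type": "task-succeeded", "uuid": "a", "result": "4",
         "runtime": 0.5, "hostname": "w1", "timestamp": 2.5},
    ]
    print(apply_events(events))
```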
|
|
|
+
|
|
|
+Worker Events
|
|
|
+~~~~~~~~~~~~~
|
|
|
+
|
|
|
+* ``worker-online(hostname, timestamp)``
|
|
|
+
|
|
|
+ The worker has connected to the broker and is online.
|
|
|
+
|
|
|
+* ``worker-heartbeat(hostname, timestamp)``
|
|
|
+
|
|
|
+ Sent every minute. If the worker has not sent a heartbeat in 2 minutes,
|
|
|
+ it is considered to be offline.
|
|
|
+
|
|
|
+* ``worker-offline(hostname, timestamp)``
|
|
|
|
|
|
-Maintain a list of related munin plugins
|
|
|
+ The worker has disconnected from the broker.
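
The heartbeat rule above can be sketched as a small tracker that records when each worker was last heard from. This is an illustration under stated assumptions, not Celery's implementation: the ``WorkerTracker`` class is hypothetical, events are assumed to be dictionaries with ``type``, ``hostname`` and ``timestamp`` keys, and timestamps are assumed to be in seconds.

```python
HEARTBEAT_EXPIRE = 120.0  # seconds; the "2 minutes" from the text


class WorkerTracker:
    """Track which workers are considered online (illustrative)."""

    def __init__(self):
        self.last_seen = {}  # hostname -> timestamp of last sign of life

    def on_event(self, event):
        etype, hostname = event["type"], event["hostname"]
        if etype == "worker-offline":
            # The worker announced that it disconnected.
            self.last_seen.pop(hostname, None)
        elif etype in ("worker-online", "worker-heartbeat"):
            self.last_seen[hostname] = event["timestamp"]

    def alive(self, now):
        """Return hostnames heard from within the expiry window."""
        return sorted(
            hostname for hostname, seen in self.last_seen.items()
            if now - seen < HEARTBEAT_EXPIRE
        )


if __name__ == "__main__":
    tracker = WorkerTracker()
    tracker.on_event({"type": "worker-online", "hostname": "w1",
                      "timestamp": 0.0})
    tracker.on_event({"type": "worker-heartbeat", "hostname": "w1",
                      "timestamp": 60.0})
    print(tracker.alive(now=130.0))
```

A worker that crashes never sends ``worker-offline``, which is why the expiry window is needed in addition to the explicit offline event.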
|