.. _guide-monitoring:

==================
Monitoring Guide
==================

.. contents::
    :local:

Introduction
============

There are several tools available to monitor and inspect Celery clusters.

This document describes some of these, as well as
features related to monitoring, like events and broadcast commands.

.. _monitoring-workers:

Monitoring and Inspecting Workers
=================================

.. _monitoring-celeryctl:

celeryctl
---------

* Listing active nodes in the cluster::

    $ celeryctl status

* Show the result of a task::

    $ celeryctl result -t tasks.add 4e196aa4-0141-4601-8138-7aa33db0f577

  Note that you can omit the name of the task as long as the
  task doesn't use a custom result backend.

* Listing all tasks that are currently being executed::

    $ celeryctl inspect active

* Listing scheduled ETA tasks::

    $ celeryctl inspect scheduled

  These are tasks reserved by the worker because they have the
  ``eta`` or ``countdown`` argument set.

* Listing reserved tasks::

    $ celeryctl inspect reserved

  This will list all tasks that have been prefetched by the worker,
  and are currently waiting to be executed (does not include tasks
  with an ETA).

* Listing the history of revoked tasks::

    $ celeryctl inspect revoked

* Show registered tasks::

    $ celeryctl inspect registered_tasks

* Showing statistics::

    $ celeryctl inspect stats

* Diagnosing the worker pools::

    $ celeryctl inspect diagnose

  This will verify that the worker's pool processes are available
  to do work.  Note that this will not work if the worker is busy.

* Enabling/disabling events::

    $ celeryctl inspect enable_events
    $ celeryctl inspect disable_events

By default the inspect commands operate on all workers.
You can specify a single worker, or a list of workers, by using the
``--destination`` argument::

    $ celeryctl inspect -d w1,w2 reserved

:Note: All ``inspect`` commands support the ``--timeout`` argument,
       which is the number of seconds to wait for responses.
       You may have to increase this timeout if you're getting empty responses
       due to latency.

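The same information is also available from Python.  The following is a
minimal sketch, assuming ``celery.task.control.inspect`` is available (it
ships with the same versions that provide ``celeryctl``) and that the
worker names match the ``-d`` example above:

.. code-block:: python

    from celery.task.control import inspect

    # Restrict the broadcast to the same two workers as the -d example;
    # call inspect() with no argument to reach every worker instead.
    i = inspect(["w1", "w2"])

    print(i.active())        # tasks currently being executed
    print(i.scheduled())     # eta/countdown tasks held back by the worker
    print(i.reserved())      # prefetched tasks waiting to be executed
    print(i.revoked())       # history of revoked task ids
    print(i.stats())         # per-worker statistics

Each call collects the replies from the workers that answer within the
timeout, so an empty result usually just means no worker responded in time.
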
.. _monitoring-django-admin:

Django Admin
------------

TODO

.. _monitoring-celeryev:

celeryev
--------

TODO

.. _monitoring-celerymon:

celerymon
---------

TODO

.. _monitoring-rabbitmq:

Monitoring and inspecting RabbitMQ
==================================

To manage a Celery cluster it is important to know how
RabbitMQ can be monitored.

RabbitMQ ships with the `rabbitmqctl(1)`_ command; with it you can
list queues, exchanges, bindings, queue lengths and the memory usage
of each queue, as well as manage users, virtual hosts and their permissions.

:Note: The default virtual host (``"/"``) is used in these
       examples.  If you use a custom virtual host you have to add
       the ``-p`` argument to the command, e.g.:
       ``rabbitmqctl list_queues -p my_vhost ....``

.. _`rabbitmqctl(1)`: http://www.rabbitmq.com/man/rabbitmqctl.1.man.html

.. _monitoring-rmq-queues:

Inspecting queues
-----------------

Finding the number of tasks in a queue::

    $ rabbitmqctl list_queues name messages messages_ready \
                              messages_unacknowledged

Here ``messages_ready`` is the number of messages ready
for delivery (sent but not received), ``messages_unacknowledged``
is the number of messages that have been received by a worker but
not acknowledged yet (meaning it is in progress, or has been reserved).
``messages`` is the sum of ready and unacknowledged messages.

Finding the number of workers currently consuming from a queue::

    $ rabbitmqctl list_queues name consumers

Finding the amount of memory allocated to a queue::

    $ rabbitmqctl list_queues name memory

:Tip: Adding the ``-q`` option to `rabbitmqctl(1)`_ makes the output
      easier to parse.

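As a rough illustration of that tip, the sketch below shells out to
``rabbitmqctl -q list_queues`` and parses the tab-separated output.  The
``queue_stats`` helper is hypothetical, and it assumes ``rabbitmqctl`` is on
the path and can be run by the current user (usually root):

.. code-block:: python

    import subprocess

    def queue_stats(vhost="/"):
        """Return a mapping of queue name -> (messages, ready, unacked)."""
        # -q suppresses the informational header, so every output line is
        # just tab-separated columns in the order requested below.
        output = subprocess.Popen(
            ["rabbitmqctl", "-q", "list_queues", "-p", vhost,
             "name", "messages", "messages_ready", "messages_unacknowledged"],
            stdout=subprocess.PIPE, universal_newlines=True).communicate()[0]
        stats = {}
        for line in output.splitlines():
            name, messages, ready, unacked = line.split("\t")
            stats[name] = (int(messages), int(ready), int(unacked))
        return stats

    if __name__ == "__main__":
        for name, (messages, ready, unacked) in queue_stats().items():
            print("%s: %s messages (%s ready / %s unacked)" % (
                name, messages, ready, unacked))
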
.. _monitoring-munin:

Munin
=====

This is a list of known Munin plugins that can be useful when
maintaining a Celery cluster.

* rabbitmq-munin: Munin-plugins for RabbitMQ.

    http://github.com/ask/rabbitmq-munin

* celery_tasks: Monitors the number of times each task type has
  been executed (requires ``celerymon``).

    http://exchange.munin-monitoring.org/plugins/celery_tasks-2/details

* celery_task_states: Monitors the number of tasks in each state
  (requires ``celerymon``).

    http://exchange.munin-monitoring.org/plugins/celery_tasks/details

.. _monitoring-events:

Events
======

The worker has the ability to send a message whenever some event
happens.  These events are then captured by tools like ``celerymon``
and ``celeryev`` to monitor the cluster.

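Note that a worker only sends events when asked to, so the monitored workers
must either be started with the ``-E``/``--events`` argument to ``celeryd``,
or have events switched on at runtime using the ``enable_events`` broadcast
command shown earlier.
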
.. _monitoring-snapshots:

Snapshots
---------

Even a single worker can produce a huge amount of events, so storing
the history of events on disk may be very expensive.

A sequence of events describes the cluster state in that time period.
By taking periodic snapshots of this state we can keep the full history,
while only periodically writing it to disk.

To take snapshots you need a Camera class, with which you can define
what should happen every time the state is captured.  You can
write it to a database, send it by e-mail, or something else entirely.

``celeryev`` is then used to take snapshots with the camera.
For example, if you want to capture state every 2 seconds using the
camera ``myapp.Camera``, you run ``celeryev`` with the following arguments::

    $ celeryev -c myapp.Camera --frequency=2.0

.. _monitoring-camera:

Custom Camera
~~~~~~~~~~~~~

Here is an example camera, dumping the snapshot to the screen:

.. code-block:: python

    from pprint import pformat

    from celery.events.snapshot import Polaroid

    class DumpCam(Polaroid):

        def on_shutter(self, state):
            if not state.event_count:
                # No new events since last snapshot.
                return
            print("Workers: %s" % (pformat(state.workers, indent=4), ))
            print("Tasks: %s" % (pformat(state.tasks, indent=4), ))
            print("Total: %s events, %s tasks" % (
                state.event_count, state.task_count))

Now you can use this cam with ``celeryev`` by specifying
it with the ``-c`` option::

    $ celeryev -c myapp.DumpCam --frequency=2.0

Or you can use it programmatically like this::

    from celery.events import EventReceiver
    from celery.messaging import establish_connection
    from celery.events.state import State
    from myapp import DumpCam

    def main():
        state = State()
        with establish_connection() as connection:
            recv = EventReceiver(connection, handlers={"*": state.event})
            with DumpCam(state, freq=1.0):
                recv.capture(limit=None, timeout=None)

    if __name__ == "__main__":
        main()

.. _event-reference:

Event Reference
---------------

This list contains the events sent by the worker, and their arguments.

.. _event-reference-task:

Task Events
~~~~~~~~~~~

* ``task-received(uuid, name, args, kwargs, retries, eta, hostname,
  timestamp)``

    Sent when the worker receives a task.

* ``task-started(uuid, hostname, timestamp)``

    Sent just before the worker executes the task.

* ``task-succeeded(uuid, result, runtime, hostname, timestamp)``

    Sent if the task executed successfully.

    Runtime is the time it took to execute the task using the pool
    (measured from when the task is sent to the pool until the pool's
    result handler callback is called).

* ``task-failed(uuid, exception, traceback, hostname, timestamp)``

    Sent if the execution of the task failed.

* ``task-revoked(uuid)``

    Sent if the task has been revoked (note that this is likely
    to be sent by more than one worker).

* ``task-retried(uuid, exception, traceback, hostname, delay, timestamp)``

    Sent if the task failed, but will be retried in the future.
    (**NOT IMPLEMENTED**)

.. _event-reference-worker:

Worker Events
~~~~~~~~~~~~~

* ``worker-online(hostname, timestamp)``

    The worker has connected to the broker and is online.

* ``worker-heartbeat(hostname, timestamp)``

    Sent every minute.  If the worker has not sent a heartbeat in 2 minutes,
    it is considered to be offline.

* ``worker-offline(hostname, timestamp)``

    The worker has disconnected from the broker.

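As a rough illustration of how these events can be consumed directly, here is
a small sketch that re-uses ``EventReceiver`` and ``establish_connection``
from the snapshot example above; the handler functions and the fields they
print are just illustrations of the arguments listed in this reference:

.. code-block:: python

    from celery.events import EventReceiver
    from celery.messaging import establish_connection

    def on_task_failed(event):
        # Every event is a dict carrying the arguments listed in the
        # reference above.
        print("TASK FAILED: %s: %s" % (event["uuid"], event["exception"]))

    def on_worker_offline(event):
        print("WORKER OFFLINE: %s" % (event["hostname"], ))

    def main():
        with establish_connection() as connection:
            recv = EventReceiver(connection, handlers={
                "task-failed": on_task_failed,
                "worker-offline": on_worker_offline,
            })
            recv.capture(limit=None, timeout=None)

    if __name__ == "__main__":
        main()
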