.. _guide-monitoring:

=================================
 Monitoring and Management Guide
=================================

.. contents::
    :local:

Introduction
============

There are several tools available to monitor and inspect Celery clusters.

This document describes some of these, as well as
features related to monitoring, like events and broadcast commands.

.. _monitoring-workers:

Workers
=======

.. _monitoring-celeryctl:

``celery``: Management Command-line Utility
-------------------------------------------

.. versionadded:: 2.1

:program:`celery` can also be used to inspect
and manage worker nodes (and to some degree tasks).

To list all the commands available do::

    $ celery help

or to get help for a specific command do::

    $ celery <command> --help

Commands
~~~~~~~~

* **shell**: Drop into a Python shell.

  The locals will include the ``celery`` variable, which is the current app.
  Also all known tasks will be automatically added to locals (unless the
  ``--without-tasks`` flag is set).

  Uses IPython, bpython, or regular python, in that order if installed.
  You can force an implementation using ``--force-ipython|-I``,
  ``--force-bpython|-B``, or ``--force-python|-P``.

* **status**: List active nodes in this cluster::

      $ celery status

* **result**: Show the result of a task::

      $ celery result -t tasks.add 4e196aa4-0141-4601-8138-7aa33db0f577

  Note that you can omit the name of the task as long as the
  task doesn't use a custom result backend.

* **purge**: Purge messages from all configured task queues.

  ::

      $ celery purge

  .. warning::

      There is no undo for this operation, and messages will
      be permanently deleted!

* **inspect active**: List active tasks::

      $ celery inspect active

  These are all the tasks that are currently being executed.

* **inspect scheduled**: List scheduled ETA tasks::

      $ celery inspect scheduled

  These are tasks reserved by the worker because they have the
  `eta` or `countdown` argument set.

* **inspect reserved**: List reserved tasks::

      $ celery inspect reserved

  This will list all tasks that have been prefetched by the worker,
  and are currently waiting to be executed (doesn't include tasks
  with an ETA).

* **inspect revoked**: List history of revoked tasks::

      $ celery inspect revoked

* **inspect registered**: List registered tasks::

      $ celery inspect registered

* **inspect stats**: Show worker statistics::

      $ celery inspect stats

* **inspect enable_events**: Enable events::

      $ celery inspect enable_events

* **inspect disable_events**: Disable events::

      $ celery inspect disable_events

* **migrate**: Migrate tasks from one broker to another (**EXPERIMENTAL**)::

      $ celery migrate redis://localhost amqp://localhost

  This command will migrate all the tasks on one broker to another.
  As this command is new and experimental you should be sure to have
  a backup of the data before proceeding.

.. note::

    All ``inspect`` commands support a ``--timeout`` argument;
    this is the number of seconds to wait for responses.
    You may have to increase this timeout if you're not getting a response
    due to latency.

.. _celeryctl-inspect-destination:

Specifying destination nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default the inspect commands operate on all workers.
You can specify a single worker, or a list of workers, by using the
`--destination` argument::

    $ celery inspect -d w1,w2 reserved
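
The same information is also available from Python. Here is a minimal
sketch using the inspect API from :mod:`celery.task.control` (the worker
names ``w1`` and ``w2`` are placeholders, as above):

.. code-block:: python

    from celery.task.control import inspect

    # Inspect all nodes.
    i = inspect()

    # Or limit the inspection to specific nodes.
    i = inspect(["w1", "w2"])

    print(i.active())     # tasks currently being executed
    print(i.scheduled())  # tasks with an eta/countdown set
    print(i.reserved())   # tasks prefetched by the worker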

.. _monitoring-django-admin:

Django Admin Monitor
--------------------

.. versionadded:: 2.1

When you add `django-celery`_ to your Django project you will
automatically get a monitor section as part of the Django admin interface.

This can also be used if you're not using Celery with a Django project.

*Screenshot*

.. figure:: ../images/djangoceleryadmin2.jpg

.. _`django-celery`: http://pypi.python.org/pypi/django-celery

.. _monitoring-django-starting:

Starting the monitor
~~~~~~~~~~~~~~~~~~~~

The Celery section will already be present in your admin interface,
but you won't see any data appearing until you start the snapshot camera.

The camera takes snapshots of the events your workers send at regular
intervals, storing them in your database (see :ref:`monitoring-snapshots`).

To start the camera run::

    $ python manage.py celerycam

If you haven't already enabled the sending of events you need to do so::

    $ python manage.py celery inspect enable_events

:Tip: You can enable events when the worker starts by using the `-E` argument.

Now that the camera has been started, and events have been enabled,
you should be able to see your workers and the tasks in the admin interface
(it may take some time for workers to show up).

The admin interface shows tasks, worker nodes, and even
lets you perform some actions, like revoking and rate limiting tasks,
or shutting down worker nodes.

.. _monitoring-django-frequency:

Shutter frequency
~~~~~~~~~~~~~~~~~

By default the camera takes a snapshot every second. If this is too frequent,
or you want higher precision, you can change it using the
``--frequency`` argument. This is a float describing how often, in seconds,
the camera should wake up to check for new events::

    $ python manage.py celerycam --frequency=3.0

The camera also supports rate limiting using the ``--maxrate`` argument.
While the frequency controls how often the camera thread wakes up,
the rate limit controls how often it will actually take a snapshot.

The rate limits can be specified in seconds, minutes or hours
by appending `/s`, `/m` or `/h` to the value.
For example, ``--maxrate=100/m`` means "a hundred writes a minute".

The rate limit is off by default, which means the camera will take a snapshot
every ``--frequency`` seconds.

The events also expire after some time, so the database doesn't fill up.
Successful tasks are deleted after 1 day, failed tasks after 3 days,
and tasks in other states after 5 days.

.. _monitoring-nodjango:

Using outside of Django
~~~~~~~~~~~~~~~~~~~~~~~

`django-celery` also installs the :program:`djcelerymon` program. This
can be used by non-Django users, and runs both a web server and a snapshot
camera in the same process.

**Installing**

Using :program:`pip`::

    $ pip install -U django-celery

or using :program:`easy_install`::

    $ easy_install -U django-celery

**Running**

:program:`djcelerymon` reads configuration from your Celery configuration
module, and sets up the Django environment using the same settings::

    $ djcelerymon

Database tables will be created the first time the monitor is run.
By default an `sqlite3` database file named
:file:`djcelerymon.db` is used, so make sure this file is writable by the
user running the monitor.

If you want to store the events in a different database, e.g. MySQL,
then you can configure the `DATABASE*` settings directly in your Celery
config module. See http://docs.djangoproject.com/en/dev/ref/settings/#databases
for more information about the database options available.
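
For example, a minimal sketch of such a configuration, assuming the
standard Django ``DATABASES`` setting documented at the link above
(the connection details are placeholders for your own):

.. code-block:: python

    # celeryconfig.py
    # Placeholder credentials; replace with your own MySQL setup.
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.mysql",
            "NAME": "celerymon",
            "USER": "celery",
            "PASSWORD": "secret",
            "HOST": "localhost",
        },
    }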

You will also be asked to create a superuser (and you need to create one
to be able to log into the admin later)::

    Creating table auth_permission
    Creating table auth_group_permissions
    [...]

    You just installed Django's auth system, which means you don't
    have any superusers defined. Would you like to create
    one now? (yes/no): yes
    Username (Leave blank to use 'username'): username
    Email address: me@example.com
    Password: ******
    Password (again): ******
    Superuser created successfully.

    [...]
    Django version 1.2.1, using settings 'celeryconfig'
    Development server is running at http://127.0.0.1:8000/
    Quit the server with CONTROL-C.

Now that the service is started you can visit the monitor
at http://127.0.0.1:8000, and log in using the user you created.

For a list of the command line options supported by :program:`djcelerymon`,
please see ``djcelerymon --help``.

.. _monitoring-celeryev:

celery events: Curses Monitor
-----------------------------

.. versionadded:: 2.0

`celery events` is a simple curses monitor displaying
task and worker history. You can inspect the result and traceback of tasks,
and it also supports some management commands like rate limiting and shutting
down workers.

Starting::

    $ celery events

You should see a screen like:

.. figure:: ../images/celeryevshotsm.jpg

`celery events` is also used to start snapshot cameras (see
:ref:`monitoring-snapshots`)::

    $ celery events --camera=<camera-class> --frequency=1.0

and it includes a tool to dump events to :file:`stdout`::

    $ celery events --dump

For a complete list of options use ``--help``::

    $ celery events --help

.. _monitoring-celerymon:

celerymon: Web monitor
----------------------

`celerymon`_ is the ongoing work to create a web monitor.
It's far from complete, and currently only supports
a JSON API. Help is desperately needed for this project, so if you,
or someone you know, would like to contribute templates, design, code,
or help this project in any other way, please get in touch!

:Tip: The Django admin monitor can be used even if you're not using
      Celery with a Django project. See :ref:`monitoring-nodjango`.

.. _`celerymon`: http://github.com/celery/celerymon/

.. _monitoring-rabbitmq:

RabbitMQ
========

To manage a Celery cluster it is important to know how
RabbitMQ can be monitored.

RabbitMQ ships with the `rabbitmqctl(1)`_ command; with it you can
list queues, exchanges, bindings, queue lengths, and the memory usage
of each queue, as well as manage users, virtual hosts, and their permissions.

.. note::

    The default virtual host (``"/"``) is used in these
    examples. If you use a custom virtual host you have to add
    the ``-p`` argument to the command, e.g.:
    ``rabbitmqctl list_queues -p my_vhost ....``

.. _`rabbitmqctl(1)`: http://www.rabbitmq.com/man/rabbitmqctl.1.man.html

.. _monitoring-rmq-queues:

Inspecting queues
-----------------

Finding the number of tasks in a queue::

    $ rabbitmqctl list_queues name messages messages_ready \
                              messages_unacknowledged

Here `messages_ready` is the number of messages ready
for delivery (sent but not received), `messages_unacknowledged`
is the number of messages that have been received by a worker but
not acknowledged yet (meaning it is in progress, or has been reserved),
and `messages` is the sum of ready and unacknowledged messages.

Finding the number of workers currently consuming from a queue::

    $ rabbitmqctl list_queues name consumers

Finding the amount of memory allocated to a queue::

    $ rabbitmqctl list_queues name memory

:Tip: Adding the ``-q`` option to `rabbitmqctl(1)`_ makes the output
      easier to parse.
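
For example, here is a minimal sketch of polling queue lengths from Python
by shelling out to :program:`rabbitmqctl` (it assumes the program is on
``$PATH`` and that the default virtual host is used):

.. code-block:: python

    import subprocess

    def queue_lengths():
        """Return a mapping of queue name -> message count."""
        # -q suppresses the informational header, leaving tab-separated rows.
        output = subprocess.check_output(
            ["rabbitmqctl", "-q", "list_queues", "name", "messages"],
            universal_newlines=True)
        return dict((name, int(count))
                    for name, count in (line.split("\t")
                                        for line in output.splitlines() if line))

    print(queue_lengths())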

.. _monitoring-redis:

Redis
=====

If you're using Redis as the broker, you can monitor the Celery cluster using
the `redis-cli(1)` command to list lengths of queues.

.. _monitoring-redis-queues:

Inspecting queues
-----------------

Finding the number of tasks in a queue::

    $ redis-cli -h HOST -p PORT -n DATABASE_NUMBER llen QUEUE_NAME

The default queue is named `celery`. To get all available queues, invoke::

    $ redis-cli -h HOST -p PORT -n DATABASE_NUMBER keys \*

.. note::

    If a list has no elements in Redis, it doesn't exist. Hence it won't show
    up in the `keys` command output, and `llen` for that list returns 0.

    On the other hand, if you're also using Redis for other purposes, the
    output of the `keys` command will include unrelated values stored in the
    database. The recommended way around this is to use a dedicated
    `DATABASE_NUMBER` for Celery.
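
The same checks can be made from Python. Here is a minimal sketch using the
`redis-py` client (assumed installed; the connection details and queue name
stand in for the placeholders above):

.. code-block:: python

    import redis

    client = redis.Redis(host="localhost", port=6379, db=0)

    # Number of messages waiting in the default queue.
    print(client.llen("celery"))

    # All keys in this database (only useful with a dedicated database number).
    print(client.keys("*"))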

.. _monitoring-munin:

Munin
=====

This is a list of known Munin plug-ins that can be useful when
maintaining a Celery cluster.

* rabbitmq-munin: Munin plug-ins for RabbitMQ.

  http://github.com/ask/rabbitmq-munin

* celery_tasks: Monitors the number of times each task type has
  been executed (requires `celerymon`).

  http://exchange.munin-monitoring.org/plugins/celery_tasks-2/details

* celery_task_states: Monitors the number of tasks in each state
  (requires `celerymon`).

  http://exchange.munin-monitoring.org/plugins/celery_tasks/details

.. _monitoring-events:

Events
======

The worker has the ability to send a message whenever some event
happens. These events are then captured by tools like :program:`celerymon`
and :program:`celery events` to monitor the cluster.

.. _monitoring-snapshots:

Snapshots
---------

.. versionadded:: 2.1

Even a single worker can produce a huge amount of events, so storing
the history of all events on disk may be very expensive.

A sequence of events describes the cluster state in that time period;
by taking periodic snapshots of this state we can keep all history, but
still only periodically write it to disk.

To take snapshots you need a camera class, with which you can define
what should happen every time the state is captured; e.g. you can
write it to a database, send it by email, or do something else entirely.

:program:`celery events` is then used to take snapshots with the camera.
For example, if you want to capture state every 2 seconds using the
camera ``myapp.Camera`` you run :program:`celery events` with the following
arguments::

    $ celery events -c myapp.Camera --frequency=2.0

.. _monitoring-camera:

Custom Camera
~~~~~~~~~~~~~

Here is an example camera, dumping the snapshot to screen:

.. code-block:: python

    from pprint import pformat

    from celery.events.snapshot import Polaroid

    class DumpCam(Polaroid):

        def on_shutter(self, state):
            if not state.event_count:
                # No new events since last snapshot.
                return
            print("Workers: %s" % (pformat(state.workers, indent=4), ))
            print("Tasks: %s" % (pformat(state.tasks, indent=4), ))
            print("Total: %s events, %s tasks" % (
                state.event_count, state.task_count))

See the API reference for :mod:`celery.events.state` to read more
about state objects.

Now you can use this cam with :program:`celery events` by specifying
it with the `-c` option::

    $ celery events -c myapp.DumpCam --frequency=2.0

Or you can use it programmatically like this::

    from celery.events import EventReceiver
    from celery.messaging import establish_connection
    from celery.events.state import State

    from myapp import DumpCam

    def main():
        # State keeps an in-memory representation of the cluster.
        state = State()
        with establish_connection() as connection:
            # Route every event ("*") into the state object.
            recv = EventReceiver(connection, handlers={"*": state.event})
            # Take a snapshot every second while capturing events forever.
            with DumpCam(state, freq=1.0):
                recv.capture(limit=None, timeout=None)

    if __name__ == "__main__":
        main()

.. _event-reference:

Event Reference
---------------

This list contains the events sent by the worker, and their arguments.

.. _event-reference-task:

Task Events
~~~~~~~~~~~

* ``task-sent(uuid, name, args, kwargs, retries, eta, expires, queue)``

  Sent when a task message is published and
  the :setting:`CELERY_SEND_TASK_SENT_EVENT` setting is enabled.

* ``task-received(uuid, name, args, kwargs, retries, eta, hostname, timestamp)``

  Sent when the worker receives a task.

* ``task-started(uuid, hostname, timestamp, pid)``

  Sent just before the worker executes the task.

* ``task-succeeded(uuid, result, runtime, hostname, timestamp)``

  Sent if the task executed successfully.

  Runtime is the time it took to execute the task using the pool
  (starting when the task is sent to the worker pool, and ending when the
  pool result handler callback is called).

* ``task-failed(uuid, exception, traceback, hostname, timestamp)``

  Sent if the execution of the task failed.

* ``task-revoked(uuid)``

  Sent if the task has been revoked (note that this is likely
  to be sent by more than one worker).

* ``task-retried(uuid, exception, traceback, hostname, timestamp)``

  Sent if the task failed, but will be retried in the future.

.. _event-reference-worker:

Worker Events
~~~~~~~~~~~~~

* ``worker-online(hostname, timestamp, freq, sw_ident, sw_ver, sw_sys)``

  The worker has connected to the broker and is online.

  * `hostname`: Hostname of the worker.
  * `timestamp`: Event timestamp.
  * `freq`: Heartbeat frequency in seconds (float).
  * `sw_ident`: Name of worker software (e.g. ``py-celery``).
  * `sw_ver`: Software version (e.g. 2.2.0).
  * `sw_sys`: Operating System (e.g. Linux, Windows, Darwin).

* ``worker-heartbeat(hostname, timestamp, freq, sw_ident, sw_ver, sw_sys)``

  Sent every minute. If the worker hasn't sent a heartbeat in 2 minutes,
  it is considered to be offline.

* ``worker-offline(hostname, timestamp, freq, sw_ident, sw_ver, sw_sys)``

  The worker has disconnected from the broker.
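
If you want to react to individual events rather than snapshot the whole
cluster state, you can register handlers for specific event types. Here is
a minimal sketch, reusing the same receiver API as the camera example above
(the handler bodies are illustrative only):

.. code-block:: python

    from celery.events import EventReceiver
    from celery.messaging import establish_connection

    def on_task_failed(event):
        # Event fields follow the reference above.
        print("Task %s failed: %s" % (event["uuid"], event["exception"]))

    def on_worker_offline(event):
        print("Worker %s went offline" % (event["hostname"], ))

    def main():
        with establish_connection() as connection:
            recv = EventReceiver(connection, handlers={
                "task-failed": on_task_failed,
                "worker-offline": on_worker_offline,
            })
            recv.capture(limit=None, timeout=None)

    if __name__ == "__main__":
        main()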
|