.. _guide-optimizing:

============
 Optimizing
============

Introduction
============

The default configuration makes a lot of compromises. It's not optimal for
any single case, but works well enough for most situations.

There are optimizations that can be applied based on specific use cases.

Optimizations can apply to different properties of the running environment,
be it the time tasks take to execute, the amount of memory used, or
responsiveness at times of high load.
Ensuring Operations
===================

In the book `Programming Pearls`_, Jon Bentley presents the concept of
back-of-the-envelope calculations by asking the question:

    ❝ How much water flows out of the Mississippi River in a day? ❞

The point of this exercise [*]_ is to show that there's a limit
to how much data a system can process in a timely manner.
Back-of-the-envelope calculations can be used as a means to plan for this
ahead of time.

In Celery: if a task takes 10 minutes to complete
and there are 10 new tasks coming in every minute, the queue will never
be empty. This is why it's very important
that you monitor queue lengths!

A way to do this is by :ref:`using Munin <monitoring-munin>`.
You should set up alerts that'll notify you as soon as any queue has
reached an unacceptable size. This way you can take appropriate action,
like adding new worker nodes or revoking unnecessary tasks.

.. _`Programming Pearls`: http://www.cs.bell-labs.com/cm/cs/pearls/

.. _`The back of the envelope`:
    http://books.google.com/books?id=kse_7qbWbjsC&pg=PA67
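The queue-growth example above can be put on the back of an envelope directly. A minimal sketch, using the hypothetical numbers from the text (10-minute tasks, 10 new tasks per minute):

```python
def backlog_growth_per_minute(arrival_rate, task_duration_min, concurrency):
    """Tasks added to the queue per minute; positive means the queue grows."""
    service_rate = concurrency / task_duration_min  # tasks finished per minute
    return arrival_rate - service_rate

# A single worker process falls behind by almost 10 tasks every minute:
growth = backlog_growth_per_minute(10, 10, 1)

# Roughly 100 concurrency slots are needed just to keep up:
balanced = backlog_growth_per_minute(10, 10, 100)
```

Estimates like this tell you ahead of time whether monitoring alone is enough, or whether the cluster needs more capacity.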
.. _optimizing-general-settings:

General Settings
================

.. _optimizing-librabbitmq:

librabbitmq
-----------

If you're using RabbitMQ (AMQP) as the broker then you can install the
:pypi:`librabbitmq` module to use an optimized client written in C:

.. code-block:: console

    $ pip install librabbitmq

The 'amqp' transport will automatically use the librabbitmq module if it's
installed, or you can specify the transport you want directly by using
the ``pyamqp://`` or ``librabbitmq://`` prefixes.
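As a sketch, pinning the transport in a configuration module might look like this (the host, credentials, and vhost in the URL are placeholders):

```python
# Celery configuration module -- broker host/credentials are placeholders.
# 'pyamqp://' forces the pure-Python client even if librabbitmq is installed;
# 'librabbitmq://' would select the C client instead.
broker_url = 'pyamqp://guest:guest@localhost:5672//'
```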
.. _optimizing-connection-pools:

Broker Connection Pools
-----------------------

The broker connection pool is enabled by default since version 2.5.

You can tweak the :setting:`broker_pool_limit` setting to minimize
contention. The value should be based on the number of
active threads/green-threads using broker connections.
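A small configuration sketch of that sizing rule. The pool size and the small headroom for other connections are assumptions, not recommendations from the text; start from your actual publisher thread/green-thread count:

```python
# Hypothetical sizing: one broker connection per publishing green-thread.
publisher_pool_size = 100                      # e.g. your gevent/eventlet pool size
broker_pool_limit = publisher_pool_size + 2    # assumed headroom for other traffic
```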
.. _optimizing-transient-queues:

Using Transient Queues
----------------------

Queues created by Celery are persistent by default. This means that
the broker will write messages to disk to ensure that the tasks will
be executed even if the broker is restarted.

But in some cases it's fine that the message is lost, so not all tasks
require durability. You can create a *transient* queue for these tasks
to improve performance:

.. code-block:: python

    from kombu import Exchange, Queue

    task_queues = (
        Queue('celery', routing_key='celery'),
        Queue('transient', Exchange('transient', delivery_mode=1),
              routing_key='transient', durable=False),
    )

or by using :setting:`task_routes`:

.. code-block:: python

    task_routes = {
        'proj.tasks.add': {'queue': 'celery', 'delivery_mode': 'transient'}
    }

The ``delivery_mode`` changes how the messages to this queue are delivered.
A value of one means that the message won't be written to disk, and a value
of two (the default) means that the message can be written to disk.

To direct a task to your new transient queue you can specify the queue
argument (or use the :setting:`task_routes` setting):

.. code-block:: python

    task.apply_async(args, queue='transient')

For more information see the :ref:`routing guide <guide-routing>`.
.. _optimizing-worker-settings:

Worker Settings
===============

.. _optimizing-prefetch-limit:

Prefetch Limits
---------------

*Prefetch* is a term inherited from AMQP that's often misunderstood
by users.

The prefetch limit is a **limit** for the number of tasks (messages) a worker
can reserve for itself. If it is zero, the worker will keep
consuming messages, not respecting that there may be other
available worker nodes that may be able to process them sooner [*]_,
or that the messages may not even fit in memory.

The workers' default prefetch count is the
:setting:`worker_prefetch_multiplier` setting multiplied by the number
of concurrency slots [*]_ (processes/threads/green-threads).

If you have many tasks with a long duration you want
the multiplier value to be *one*: meaning it'll only reserve one
task per worker process at a time.

However -- if you have many short-running tasks, and throughput/round-trip
latency is important to you, this number should be large. The worker is
able to process more tasks per second if the messages have already been
prefetched, and are available in memory. You may have to experiment to find
the best value that works for you. Values like 64 or 128 might make sense in
these circumstances.

If you have a combination of long- and short-running tasks, the best option
is to use two worker nodes that are configured separately, and route
the tasks according to the run-time (see :ref:`guide-routing`).
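As a sketch, such a split might route by task name (the module paths and queue names here are hypothetical), with each queue consumed by a separately tuned worker, e.g. ``celery -A proj worker -Q long -c 4`` and ``celery -A proj worker -Q short -c 16``:

```python
# Hypothetical task and queue names -- adjust to your project layout.
task_routes = {
    'proj.tasks.generate_report': {'queue': 'long'},   # long-running work
    'proj.tasks.ping': {'queue': 'short'},             # short, latency-sensitive
}
```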
Reserve one task at a time
--------------------------

The task message is only deleted from the queue after the task is
:term:`acknowledged`, so if the worker crashes before acknowledging the task,
it can be redelivered to another worker (or the same after recovery).

When using the default of early acknowledgment, having a prefetch multiplier
of *one* means the worker will reserve at most one extra task for every
worker process: or in other words, if the worker is started with
:option:`-c 10 <celery worker -c>`, the worker may reserve at most 20
tasks (10 acknowledged tasks executing, and 10 unacknowledged reserved
tasks) at any time.

Often users ask if disabling "prefetching of tasks" is possible, and what
they really mean by that is to have a worker only reserve as many tasks as
there are worker processes (10 unacknowledged tasks for
:option:`-c 10 <celery worker -c>`).

That's possible, but not without also enabling
:term:`late acknowledgment`. Using this option over the
default behavior means a task that's already started executing will be
retried in the event of a power failure or the worker instance being killed
abruptly, so this also means the task must be :term:`idempotent`.

.. seealso::

    Notes at :ref:`faq-acks_late-vs-retry`.

You can enable this behavior by using the following configuration options:

.. code-block:: python

    task_acks_late = True
    worker_prefetch_multiplier = 1
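The arithmetic above can be written down as a small sketch (the helper name is hypothetical):

```python
def max_reserved(concurrency, prefetch_multiplier, acks_late=False):
    """Upper bound on tasks a worker holds at once, per the text above.

    With early acks (the default), executing tasks are already acknowledged,
    so the worker holds `concurrency` executing tasks plus
    `concurrency * prefetch_multiplier` prefetched ones. With late acks,
    executing tasks are still unacknowledged and count against the
    prefetch limit itself.
    """
    prefetched = concurrency * prefetch_multiplier
    if acks_late:
        return prefetched
    return concurrency + prefetched

# -c 10, multiplier 1, early acks: up to 20 tasks held at once.
# -c 10, multiplier 1, late acks: exactly the 10 executing tasks.
```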
.. _prefork-pool-prefetch:

Prefork pool prefetch settings
------------------------------

The prefork pool will asynchronously send as many tasks to the processes
as it can, and this means that the processes are, in effect, prefetching
tasks.

This benefits performance but it also means that tasks may be stuck
waiting for long running tasks to complete::

    -> send task T1 to process A
    # A executes T1
    -> send task T2 to process B
    # B executes T2
    <- T2 complete sent by process B

    -> send task T3 to process A
    # A still executing T1, T3 stuck in local buffer and won't start until
    # T1 returns, and other queued tasks won't be sent to idle processes
    <- T1 complete sent by process A
    # A executes T3

The worker will send tasks to the process as long as the pipe buffer is
writable. The pipe buffer size varies based on the operating system: some may
have a buffer as small as 64KB but on recent Linux versions the buffer
size is 1MB (can only be changed system wide).

You can disable this prefetching behavior by enabling the
:option:`-O fair <celery worker -O>` worker option:

.. code-block:: console

    $ celery -A proj worker -l info -O fair

With this option enabled the worker will only write to processes that are
available for work, disabling the prefetch behavior::

    -> send task T1 to process A
    # A executes T1
    -> send task T2 to process B
    # B executes T2
    <- T2 complete sent by process B

    -> send T3 to process B
    # B executes T3
    <- T3 complete sent by process B
    <- T1 complete sent by process A
.. rubric:: Footnotes

.. [*] The chapter is available to read for free here:
       `The back of the envelope`_. The book is a classic text. Highly
       recommended.

.. [*] RabbitMQ and other brokers deliver messages round-robin,
       so this doesn't apply to an active system. If there's no prefetch
       limit and you restart the cluster, there will be timing delays between
       nodes starting. If there are 3 offline nodes and one active node,
       all messages will be delivered to the active node.

.. [*] This is the concurrency setting; :setting:`worker_concurrency` or the
       :option:`celery worker -c` option.