.. _guide-optimizing:

============
 Optimizing
============

Introduction
============

The default configuration makes a lot of compromises. It's not optimal for
any single case, but works well enough for most situations.

There are optimizations that can be applied based on specific use cases.

Optimizations can apply to different properties of the running environment,
be it the time tasks take to execute, the amount of memory used, or
responsiveness at times of high load.

Ensuring Operations
===================

In the book `Programming Pearls`_, Jon Bentley presents the concept of
back-of-the-envelope calculations by asking the question;

    ❝ How much water flows out of the Mississippi River in a day? ❞

The point of this exercise [*]_ is to show that there's a limit
to how much data a system can process in a timely manner.
Back-of-the-envelope calculations can be used as a means to plan for this
ahead of time.

In Celery, if a task takes 10 minutes to complete and there are 10 new
tasks coming in every minute, the queue will never be empty. This is why
it's very important that you monitor queue lengths!

A way to do this is by :ref:`using Munin <monitoring-munin>`.
You should set up alerts that'll notify you as soon as any queue has
reached an unacceptable size.
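The 10-minutes-per-task, 10-tasks-per-minute example above can be verified
with a quick calculation. This is a minimal sketch in plain Python (just
arithmetic, not any Celery API):

```python
# Back-of-the-envelope check for the example above: tasks arrive at
# 10/minute and each takes 10 minutes, so the worker pool needs
# arrival_rate * task_duration concurrency slots just to keep up.
arrival_rate = 10      # new tasks per minute
task_duration = 10     # minutes per task

required_slots = arrival_rate * task_duration
print(required_slots)  # 100 slots needed to break even

# With fewer slots the queue grows without bound, e.g. with 10 slots:
slots = 10
completion_rate = slots / task_duration   # tasks finished per minute
growth_per_minute = arrival_rate - completion_rate
print(growth_per_minute)  # queue grows by 9 tasks every minute
```

If the growth rate is positive, no amount of waiting will drain the queue;
that's the point at which you either add worker nodes or shed load.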
This way you can take appropriate action
like adding new worker nodes, or revoking unnecessary tasks.

.. _`Programming Pearls`: http://www.cs.bell-labs.com/cm/cs/pearls/

.. _`The back of the envelope`:
    http://books.google.com/books?id=kse_7qbWbjsC&pg=PA67

.. _optimizing-general-settings:

General Settings
================

.. _optimizing-librabbitmq:

librabbitmq
-----------

If you're using RabbitMQ (AMQP) as the broker then you can install the
:pypi:`librabbitmq` module to use an optimized client written in C:

.. code-block:: console

    $ pip install librabbitmq

The 'amqp' transport will automatically use the librabbitmq module if it's
installed, or you can also specify the transport you want directly by using
the ``pyamqp://`` or ``librabbitmq://`` prefixes.

.. _optimizing-connection-pools:

Broker Connection Pools
-----------------------

The broker connection pool is enabled by default since version 2.5.

You can tweak the :setting:`broker_pool_limit` setting to minimize
contention, and the value should be based on the number of
active threads/green-threads using broker connections.

.. _optimizing-transient-queues:

Using Transient Queues
----------------------

Queues created by Celery are persistent by default. This means that
the broker will write messages to disk to ensure that the tasks will
be executed even if the broker is restarted.

But in some cases it's fine that the message is lost, so not all tasks
require durability. You can create a *transient* queue for these tasks
to improve performance:

.. code-block:: python

    from kombu import Exchange, Queue

    task_queues = (
        Queue('celery', routing_key='celery'),
        Queue('transient', Exchange('transient', delivery_mode=1),
              routing_key='transient', durable=False),
    )

or by using :setting:`task_routes`:

.. code-block:: python

    task_routes = {
        'proj.tasks.add': {'queue': 'celery', 'delivery_mode': 'transient'}
    }

The ``delivery_mode`` changes how the messages to this queue are delivered.
A value of one means that the message won't be written to disk, and a value
of two (the default) means that the message can be written to disk.

To direct a task to your new transient queue you can specify the queue
argument (or use the :setting:`task_routes` setting):

.. code-block:: python

    task.apply_async(args, queue='transient')

For more information see the :ref:`routing guide <guide-routing>`.

.. _optimizing-worker-settings:

Worker Settings
===============

.. _optimizing-prefetch-limit:

Prefetch Limits
---------------

*Prefetch* is a term inherited from AMQP that's often misunderstood
by users.

The prefetch limit is a **limit** for the number of tasks (messages) a worker
can reserve for itself. If it's zero, the worker will keep
consuming messages, not respecting that there may be other
available worker nodes that may be able to process them sooner [*]_,
or that the messages may not even fit in memory.

The workers' default prefetch count is the
:setting:`worker_prefetch_multiplier` setting multiplied by the number
of concurrency slots [*]_ (processes/threads/green-threads).

If you have many tasks with a long duration you want
the multiplier value to be *one*: meaning it'll only reserve one
task per worker process at a time.

However -- If you have many short-running tasks, and throughput/round trip
latency is important to you, this number should be large. The worker is
able to process more tasks per second if the messages have already been
prefetched, and are available in memory. You may have to experiment to find
the best value that works for you. Values like 50 or 150 might make sense in
these circumstances.
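The effective prefetch count described above is a simple product. As a
sketch in plain Python (the multiplier of 4 used here is Celery's
documented default; ``-c 8`` style concurrency is just an example value):

```python
# Effective prefetch count, as described above:
# prefetch = worker_prefetch_multiplier * number of concurrency slots.
worker_prefetch_multiplier = 4   # Celery's default multiplier
concurrency = 8                  # e.g. started with -c 8

prefetch_count = worker_prefetch_multiplier * concurrency
print(prefetch_count)  # 32 messages may be reserved at once

# For long-running tasks, a multiplier of 1 means each process
# reserves only one task at a time:
long_running_prefetch = 1 * concurrency
print(long_running_prefetch)  # 8
```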
If you have a combination of long- and short-running tasks, the best option
is to use two worker nodes that are configured separately, and route
the tasks according to the run-time (see :ref:`guide-routing`).

Reserve one task at a time
--------------------------

The task message is only deleted from the queue after the task is
:term:`acknowledged`, so if the worker crashes before acknowledging the task,
it can be redelivered to another worker (or the same after recovery).

When using the default of early acknowledgment, having a prefetch multiplier
setting of *one* means the worker will reserve at most one extra task for
every worker process: or in other words, if the worker is started with
:option:`-c 10 <celery worker -c>`, the worker may reserve at most 20
tasks (10 unacknowledged tasks executing, and 10 unacknowledged reserved
tasks) at any time.

Often users ask if disabling "prefetching of tasks" is possible, but what
they really mean by that is to have a worker only reserve as many tasks as
there are worker processes (10 unacknowledged tasks for
:option:`-c 10 <celery worker -c>`).

That's possible, but not without also enabling
:term:`late acknowledgment`. Using this option over the
default behavior means a task that's already started executing will be
retried in the event of a power failure or the worker instance being killed
abruptly, so this also means the task must be :term:`idempotent`.

.. seealso::

    Notes at :ref:`faq-acks_late-vs-retry`.

You can enable this behavior by using the following configuration options:

.. code-block:: python

    task_acks_late = True
    worker_prefetch_multiplier = 1

.. _prefork-pool-prefetch:

Prefork pool prefetch settings
------------------------------

The prefork pool will asynchronously send as many tasks to the processes
as it can, and this means that the processes are, in effect, prefetching
tasks.

This benefits performance but it also means that tasks may be stuck
waiting for long running tasks to complete::

    -> send task T1 to process A
    # A executes T1
    -> send task T2 to process B
    # B executes T2
    <- T2 complete sent by process B

    -> send task T3 to process A
    # A still executing T1, T3 stuck in local buffer and won't start until
    # T1 returns, and other queued tasks won't be sent to idle processes
    <- T1 complete sent by process A
    # A executes T3

The worker will send tasks to the process as long as the pipe buffer is
writable. The pipe buffer size varies based on the operating system: some may
have a buffer as small as 64KB, but on recent Linux versions the buffer
size is 1MB (and can only be changed system-wide).

You can disable this prefetching behavior by enabling the
:option:`-Ofair <celery worker -O>` worker option:

.. code-block:: console

    $ celery -A proj worker -l info -Ofair

With this option enabled the worker will only write to processes that are
available for work, disabling the prefetch behavior::

    -> send task T1 to process A
    # A executes T1
    -> send task T2 to process B
    # B executes T2
    <- T2 complete sent by process B

    -> send T3 to process B
    # B executes T3
    <- T3 complete sent by process B
    <- T1 complete sent by process A

.. rubric:: Footnotes

.. [*] The chapter is available to read for free here:
       `The back of the envelope`_. The book is a classic text. Highly
       recommended.

.. [*] RabbitMQ and other brokers deliver messages round-robin,
       so this doesn't apply to an active system. If there's no prefetch
       limit and you restart the cluster, there will be timing delays between
       nodes starting.
       If there are 3 offline nodes and one active node,
       all messages will be delivered to the active node.

.. [*] This is the concurrency setting; :setting:`worker_concurrency` or the
       :option:`celery worker -c` option.