.. _guide-optimizing:

============
 Optimizing
============

Introduction
============

The default configuration makes a lot of compromises. It's not optimal for
any single case, but works well enough for most situations.

There are optimizations that can be applied based on specific use cases.

Optimizations can apply to different properties of the running environment,
be it the time tasks take to execute, the amount of memory used, or
responsiveness at times of high load.

Ensuring Operations
===================

In the book `Programming Pearls`_, Jon Bentley presents the concept of
back-of-the-envelope calculations by asking the question:

    ❝ How much water flows out of the Mississippi River in a day? ❞

The point of this exercise [*]_ is to show that there's a limit
to how much data a system can process in a timely manner.
Back-of-the-envelope calculations can be used as a means to plan for this
ahead of time.

In Celery: if a task takes 10 minutes to complete,
and there are 10 new tasks coming in every minute, the queue will never
be empty. This is why it's very important
that you monitor queue lengths!

A way to do this is by :ref:`using Munin <monitoring-munin>`.
You should set up alerts that'll notify you as soon as any queue has
reached an unacceptable size. This way you can take appropriate action,
like adding new worker nodes or revoking unnecessary tasks.

.. _`Programming Pearls`: http://www.cs.bell-labs.com/cm/cs/pearls/

.. _`The back of the envelope`:
    http://books.google.com/books?id=kse_7qbWbjsC&pg=PA67
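
As a worked example of such a calculation, here's a minimal sketch in
plain Python, using the illustrative numbers from the paragraph above:

.. code-block:: python

    # Back-of-the-envelope capacity check (illustrative numbers only).
    arrival_rate = 10    # new tasks arriving per minute
    task_duration = 10   # minutes each task takes to complete

    # Each in-flight task occupies one concurrency slot for its whole
    # duration, so the queue only stays bounded if the cluster has at
    # least arrival_rate * task_duration slots available.
    slots_needed = arrival_rate * task_duration
    print(f"Need at least {slots_needed} concurrency slots")  # -> 100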
.. _optimizing-general-settings:

General Settings
================

.. _optimizing-librabbitmq:

librabbitmq
-----------

If you're using RabbitMQ (AMQP) as the broker then you can install the
:pypi:`librabbitmq` module to use an optimized client written in C:

.. code-block:: console

    $ pip install librabbitmq

The 'amqp' transport will automatically use the librabbitmq module if it's
installed, or you can also specify the transport you want directly by using
the ``pyamqp://`` or ``librabbitmq://`` prefixes.
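
For example, to force a specific transport you can put the prefix in the
broker URL (a minimal sketch; the credentials, host, and vhost here are
placeholders for your own setup):

.. code-block:: python

    # Use the C client explicitly; replace the prefix with 'pyamqp://'
    # (or drop it) to fall back to the pure-Python transport.
    broker_url = 'librabbitmq://guest:guest@localhost:5672//'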
.. _optimizing-connection-pools:

Broker Connection Pools
-----------------------

The broker connection pool is enabled by default since version 2.5.

You can tweak the :setting:`broker_pool_limit` setting to minimize
contention, and the value should be based on the number of
active threads/green-threads using broker connections.
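
For instance, an application where roughly 100 green-threads publish to the
broker concurrently might size the pool accordingly (an illustrative value,
not a recommendation):

.. code-block:: python

    # Allow up to 100 simultaneous broker connections; setting this to
    # None disables the pool, opening a new connection per use instead.
    broker_pool_limit = 100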
.. _optimizing-transient-queues:

Using Transient Queues
----------------------

Queues created by Celery are persistent by default. This means that
the broker will write messages to disk to ensure that the tasks will
be executed even if the broker is restarted.

But in some cases it's fine that the message is lost, so not all tasks
require durability. You can create a *transient* queue for these tasks
to improve performance:

.. code-block:: python

    from kombu import Exchange, Queue

    task_queues = (
        Queue('celery', routing_key='celery'),
        Queue('transient', Exchange('transient', delivery_mode=1),
              routing_key='transient', durable=False),
    )

or by using :setting:`task_routes`:

.. code-block:: python

    task_routes = {
        'proj.tasks.add': {'queue': 'celery', 'delivery_mode': 'transient'}
    }

The ``delivery_mode`` changes how the messages to this queue are delivered.
A value of 1 means that the message won't be written to disk, and a value
of 2 (the default) means that the message can be written to disk.

To direct a task to your new transient queue you can specify the queue
argument (or use the :setting:`task_routes` setting):

.. code-block:: python

    task.apply_async(args, queue='transient')

For more information see the :ref:`routing guide <guide-routing>`.
.. _optimizing-worker-settings:

Worker Settings
===============

.. _optimizing-prefetch-limit:

Prefetch Limits
---------------

*Prefetch* is a term inherited from AMQP that's often misunderstood
by users.

The prefetch limit is a **limit** for the number of tasks (messages) a worker
can reserve for itself. If it's zero, the worker will keep
consuming messages, not respecting that there may be other
available worker nodes that may be able to process them sooner [*]_,
or that the messages may not even fit in memory.

The workers' default prefetch count is the
:setting:`worker_prefetch_multiplier` setting multiplied by the number
of concurrency slots [*]_ (processes/threads/green-threads).

If you have many tasks with a long duration you want
the multiplier value to be 1: meaning it'll only reserve one
task per worker process at a time.

However -- if you have many short-running tasks, and throughput/round-trip
latency is important to you, this number should be large. The worker is
able to process more tasks per second if the messages have already been
prefetched, and are available in memory. You may have to experiment to find
the value that works best for you. Values like 50 or 150 might make sense in
these circumstances.
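
For instance, a worker dedicated to short tasks could raise the multiplier,
while a worker handling long tasks keeps it at one (the values below are
illustrative; tune them against your own workload):

.. code-block:: python

    # Short, high-throughput tasks: keep plenty of messages buffered
    # locally so processes never sit idle waiting for the broker.
    worker_prefetch_multiplier = 64

    # For long-running tasks you'd instead use:
    # worker_prefetch_multiplier = 1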
If you have a combination of long- and short-running tasks, the best option
is to use two worker nodes that are configured separately, and route
the tasks according to the run-time (see :ref:`guide-routing`), as sketched
below.
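
A minimal sketch of such a split, assuming two hypothetical queues named
``short`` and ``long`` and hypothetical task names in a ``proj`` application:

.. code-block:: python

    # Route tasks to dedicated queues by their expected run-time.
    task_routes = {
        'proj.tasks.quick_lookup': {'queue': 'short'},
        'proj.tasks.generate_report': {'queue': 'long'},
    }

Each worker node then consumes only its own queue, so its prefetch and
concurrency settings can match its workload:

.. code-block:: console

    $ celery -A proj worker -Q short -c 16
    $ celery -A proj worker -Q long -c 4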
Reserve one task at a time
--------------------------

The task message is only deleted from the queue after the task is
:term:`acknowledged`, so if the worker crashes before acknowledging the task,
it can be redelivered to another worker (or the same after recovery).

When using the default of early acknowledgment, a prefetch multiplier
of 1 means the worker will reserve at most one extra task for every
worker process: or in other words, if the worker is started with
:option:`-c 10 <celery worker -c>`, the worker may reserve at most 20
tasks (10 unacknowledged tasks executing, and 10 unacknowledged reserved
tasks) at any time.

Often users ask if disabling "prefetching of tasks" is possible, but what
they really mean by that is to have a worker only reserve as many tasks as
there are worker processes (10 unacknowledged tasks for
:option:`-c 10 <celery worker -c>`).

That's possible, but not without also enabling
:term:`late acknowledgment`. Using this option over the
default behavior means a task that's already started executing will be
retried in the event of a power failure or the worker instance being killed
abruptly, so this also means the task must be :term:`idempotent`.

.. seealso::

    Notes at :ref:`faq-acks_late-vs-retry`.

You can enable this behavior by using the following configuration options:

.. code-block:: python

    task_acks_late = True
    worker_prefetch_multiplier = 1
.. _prefork-pool-prefetch:

Prefork pool prefetch settings
------------------------------

The prefork pool will asynchronously send as many tasks to the processes
as it can, and this means that the processes are, in effect, prefetching
tasks.

This benefits performance but it also means that tasks may be stuck
waiting for long-running tasks to complete::

    -> send task T1 to process A
    # A executes T1
    -> send task T2 to process B
    # B executes T2
    <- T2 complete sent by process B

    -> send task T3 to process A
    # A still executing T1, T3 stuck in local buffer and won't start until
    # T1 returns, and other queued tasks won't be sent to idle processes
    <- T1 complete sent by process A
    # A executes T3

The worker will send tasks to the process as long as the pipe buffer is
writable. The pipe buffer size varies based on the operating system: some may
have a buffer as small as 64KB but on recent Linux versions the buffer
size is 1MB (can only be changed system wide).

You can disable this prefetching behavior by enabling the
:option:`-Ofair <celery worker -O>` worker option:

.. code-block:: console

    $ celery -A proj worker -l info -Ofair

With this option enabled the worker will only write to processes that are
available for work, disabling the prefetch behavior::

    -> send task T1 to process A
    # A executes T1
    -> send task T2 to process B
    # B executes T2
    <- T2 complete sent by process B

    -> send T3 to process B
    # B executes T3
    <- T3 complete sent by process B
    <- T1 complete sent by process A
.. rubric:: Footnotes

.. [*] The chapter is available to read for free here:
       `The back of the envelope`_. The book is a classic text. Highly
       recommended.

.. [*] RabbitMQ and other brokers deliver messages round-robin,
       so this doesn't apply to an active system. If there's no prefetch
       limit and you restart the cluster, there will be timing delays between
       nodes starting up. If there are 3 offline nodes and one active node,
       all messages will be delivered to the active node.

.. [*] This is the concurrency setting; :setting:`worker_concurrency` or the
       :option:`celery worker -c` option.