============================================================
 Tutorial: Creating a click counter using carrot and celery
============================================================

Introduction
============

A click counter should be easy, right? Just a simple view that increments
a click count in the database and forwards you to the real destination.

This would work well for most sites, but when traffic starts to increase,
you are likely to bump into problems. One database write for every click is
not good if you have millions of clicks a day.

So what can you do? In this tutorial we will send the individual clicks as
messages using ``carrot``, and then process them later with a ``celery``
periodic task.

Celery and carrot are excellent in tandem, and while this might not be
the perfect example, you'll at least see one example of how they can be used
to solve a task.

The model
=========

The model is simple: ``Click`` stores the URL as a unique field together with
the number of clicks for that URL. Its manager, ``ClickManager``, implements
the ``increment_clicks`` method, which takes a URL and the amount to add to
its count.

*clickmuncher/models.py*:

.. code-block:: python

    from django.db import models
    from django.utils.translation import ugettext_lazy as _


    class ClickManager(models.Manager):

        def increment_clicks(self, for_url, increment_by=1):
            """Increment the click count for a URL.

                >>> Click.objects.increment_clicks("http://google.com", 10)

            """
            # Create the row with the initial count if the URL is new.
            click, created = self.get_or_create(url=for_url,
                                    defaults={"click_count": increment_by})
            if not created:
                # Otherwise add to the existing count and save it.
                click.click_count += increment_by
                click.save()

            return click.click_count


    class Click(models.Model):
        url = models.URLField(_(u"URL"), verify_exists=False, unique=True)
        click_count = models.PositiveIntegerField(_(u"click_count"),
                                                  default=0)

        objects = ClickManager()

        class Meta:
            verbose_name = _(u"URL clicks")
            verbose_name_plural = _(u"URL clicks")

Using carrot to send clicks as messages
========================================

The model is normal django stuff, nothing new there. But now we get on to
the messaging. It has been a tradition for me to put the project's
messaging-related code in its own ``messaging.py`` module, and I will continue
to do so here so maybe you can adopt this practice. In this module we have two
functions:

* ``send_increment_clicks``

  This function sends a simple message to the broker. The message body only
  contains the URL we want to increment as plain-text, so the exchange and
  routing key are what identify it as a click message. We use an exchange
  called ``clicks``, with a routing key of ``increment_click``, so any
  consumer binding a queue to this exchange using this routing key will
  receive these messages.

* ``process_clicks``

  This function processes all currently gathered clicks sent using
  ``send_increment_clicks``. Instead of issuing one database query for every
  click, it processes all of the messages first, calculates the new click
  count and issues one update per URL. A message that has been received will
  not be deleted from the broker until it has been acknowledged by the
  receiver, so if the receiver dies in the middle of processing the message,
  it will be re-sent at a later point in time. This guarantees delivery, and
  we respect this feature here by not acknowledging the message until the
  clicks have actually been written to disk.

  **Note**: This could probably be optimized further with
  some hand-written SQL, but it will do for now. Let's say it's an exercise
  left for the picky reader, albeit a discouraged one if you can survive
  without doing it.

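For the picky reader, here is a rough idea of what such hand-written SQL
might look like. This is only a sketch under assumptions: it uses the standard
library ``sqlite3`` module with an in-memory database, a hypothetical table
named ``click`` (not the table Django would generate for the model), and
Python 3. The point is that the database adds to the counter itself, so the
current value is never read into Python first.

```python
import sqlite3


def increment_clicks(conn, url, increment_by=1):
    """Add increment_by to the count for url, creating the row if needed."""
    # Let the database do the addition in a single UPDATE statement.
    cur = conn.execute(
        "UPDATE click SET click_count = click_count + ? WHERE url = ?",
        (increment_by, url),
    )
    if cur.rowcount == 0:
        # No row was updated, so this URL has not been seen before.
        conn.execute(
            "INSERT INTO click (url, click_count) VALUES (?, ?)",
            (url, increment_by),
        )
    conn.commit()


def click_count(conn, url):
    """Return the stored count for url, or 0 if there is no row."""
    row = conn.execute(
        "SELECT click_count FROM click WHERE url = ?", (url,)
    ).fetchone()
    return row[0] if row else 0


# Hypothetical schema, standing in for the Click model's table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE click ("
    "url TEXT PRIMARY KEY, click_count INTEGER NOT NULL DEFAULT 0)"
)
increment_clicks(conn, "http://google.com", 10)
increment_clicks(conn, "http://google.com", 5)
```

Whether this is worth doing depends on your traffic; the version used in this
tutorial is easier to read and will do for most sites.
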
On to the code...

*clickmuncher/messaging.py*:

.. code-block:: python

    from carrot.connection import DjangoAMQPConnection
    from carrot.messaging import Publisher, Consumer
    from clickmuncher.models import Click


    def send_increment_clicks(for_url):
        """Send a message for incrementing the click count for a URL."""
        connection = DjangoAMQPConnection()
        publisher = Publisher(connection=connection,
                              exchange="clicks",
                              routing_key="increment_click",
                              exchange_type="direct")

        publisher.send(for_url)

        publisher.close()
        connection.close()


    def process_clicks():
        """Process all currently gathered clicks by saving them to the
        database."""
        connection = DjangoAMQPConnection()
        consumer = Consumer(connection=connection,
                            queue="clicks",
                            exchange="clicks",
                            routing_key="increment_click",
                            exchange_type="direct")

        # First process the messages: save the number of clicks
        # for every URL.
        clicks_for_url = {}
        messages_for_url = {}
        for message in consumer.iterqueue():
            url = message.body
            clicks_for_url[url] = clicks_for_url.get(url, 0) + 1
            # We also need to keep the message objects so we can ack the
            # messages as processed when we are finished with them.
            if url in messages_for_url:
                messages_for_url[url].append(message)
            else:
                messages_for_url[url] = [message]

        # Then increment the clicks in the database so we only need
        # one UPDATE/INSERT for each URL.
        for url, click_count in clicks_for_url.items():
            Click.objects.increment_clicks(url, click_count)
            # Now that the clicks have been registered for this URL we can
            # acknowledge the messages.
            for message in messages_for_url[url]:
                message.ack()

        consumer.close()
        connection.close()

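If you want to convince yourself that acknowledging only after the write is
what keeps clicks from being lost, the ordering can be tried out without a
broker. The following is a minimal sketch using only the standard library;
``FakeMessage``, ``process_messages`` and ``save_to_store`` are hypothetical
stand-ins invented for this illustration and are not part of carrot.

```python
from collections import defaultdict


class FakeMessage:
    """Hypothetical stand-in for a broker message: a body and an ack flag."""

    def __init__(self, body):
        self.body = body
        self.acked = False

    def ack(self):
        self.acked = True


def process_messages(messages, save_count):
    """Group messages by URL, save one count per URL, then ack them."""
    messages_for_url = defaultdict(list)
    for message in messages:
        messages_for_url[message.body].append(message)

    for url, url_messages in messages_for_url.items():
        # One write per URL, however many clicks it received.
        save_count(url, len(url_messages))
        # Ack only after the write returned. If save_count raises,
        # these messages stay unacknowledged.
        for message in url_messages:
            message.ack()


# A plain dict stands in for the database in this sketch.
store = {}


def save_to_store(url, count):
    store[url] = store.get(url, 0) + count


messages = [FakeMessage(url) for url in
            ["http://a.example", "http://b.example", "http://a.example"]]
process_messages(messages, save_to_store)
```

After this runs, ``store`` holds one entry per URL and every message is
acknowledged. Passing a ``save_count`` that raises instead leaves the affected
messages unacknowledged, which with a real broker is what allows them to be
delivered again later.
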
View and URLs
=============

This is also simple stuff; I don't think I have to explain this code to you.
The interface is as follows: if you have a link to http://google.com and you
want to count the clicks for it, you replace the URL with:

    http://mysite/clickmuncher/count/?u=http://google.com

and the ``count`` view will send off an increment message and forward you to
that site.

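Target URLs that carry their own query string need to be percent-encoded
before they are placed in the ``u`` parameter, otherwise everything after the
first ``&`` would be read as a separate parameter of the counting URL. A small
helper for building such links might look like this. It is a sketch, assuming
Python 3's ``urllib.parse``; ``tracking_url`` is a hypothetical name and the
base address is the ``mysite`` placeholder used above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def tracking_url(target, base="http://mysite/clickmuncher/count/"):
    """Return the counting URL for target, percent-encoding it into ``u``."""
    return base + "?" + urlencode({"u": target})


# A target with its own query string survives the round trip intact.
link = tracking_url("http://google.com/search?q=celery&hl=en")
recovered = parse_qs(urlsplit(link).query)["u"][0]
```

Here ``recovered`` is the original target again, which is also what the view
sees when it reads ``request.GET["u"]``.
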
*clickmuncher/views.py*:

.. code-block:: python

    from django.http import HttpResponseRedirect
    from clickmuncher.messaging import send_increment_clicks


    def count(request):
        url = request.GET["u"]
        send_increment_clicks(url)
        return HttpResponseRedirect(url)


*clickmuncher/urls.py*:

.. code-block:: python

    from django.conf.urls.defaults import patterns, url
    from clickmuncher import views

    urlpatterns = patterns("",
        url(r'^$', views.count, name="clickmuncher-count"),
    )

Creating the periodic task
==========================

Processing the clicks every 30 minutes is easy using celery periodic tasks.

*clickmuncher/tasks.py*:

.. code-block:: python

    from celery.task import PeriodicTask
    from celery.registry import tasks
    from clickmuncher.messaging import process_clicks
    from datetime import timedelta


    class ProcessClicksTask(PeriodicTask):
        run_every = timedelta(minutes=30)

        def run(self, **kwargs):
            process_clicks()
    tasks.register(ProcessClicksTask)

We subclass from :class:`celery.task.base.PeriodicTask`, set the ``run_every``
attribute and in the body of the task just call the ``process_clicks``
function we wrote earlier. Finally, we register the task in the task registry
so the celery workers are able to recognize and find it.

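To make the meaning of ``run_every`` concrete: the task becomes due again once
that much time has passed since its last run. The check below is only an
illustration of that idea using the standard library. ``is_due`` is a
hypothetical helper written for this example, not celery's scheduler code, and
the dates are made up.

```python
from datetime import datetime, timedelta


def is_due(last_run_at, now, run_every):
    """Return True once at least run_every has passed since last_run_at."""
    return now - last_run_at >= run_every


run_every = timedelta(minutes=30)
last_run_at = datetime(2009, 1, 1, 12, 0)

# 29 minutes later the task is not due yet; at 30 minutes it is.
early = is_due(last_run_at, datetime(2009, 1, 1, 12, 29), run_every)
on_time = is_due(last_run_at, datetime(2009, 1, 1, 12, 30), run_every)
```
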
Finishing
=========

There are still ways to improve this application. The URLs could be cleaned
so that http://google.com and http://google.com/ count as the same URL. Maybe
it's even possible to update the click count using a single UPDATE query?

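As a starting point for the URL cleaning, here is one possible sketch. It
assumes Python 3's ``urllib.parse`` and a hypothetical helper named
``normalize_url``, and it only handles the two simplest cases: an empty path
and differences in letter case in the scheme and host. Real-world URL
normalization has more corners than this.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lower-case the scheme and host, and treat an empty path as "/"."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Simplification: netloc may also contain credentials or a port;
    # this sketch lower-cases the whole thing.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, parts.query, parts.fragment))


with_slash = normalize_url("http://google.com/")
without_slash = normalize_url("http://google.com")
```

Both calls give the same string, so applying such a function in
``send_increment_clicks`` before publishing would make the two spellings share
one counter.
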
If you have any questions regarding this tutorial, please send a mail to the
mailing-list or come join us in the #celery IRC channel at Freenode:
http://celeryq.org/introduction.html#getting-help