============================================================
 Tutorial: Creating a click counter using carrot and celery
============================================================

.. contents::
    :local:

Introduction
============

A click counter should be easy, right? Just a simple view that increments
a click count in the database and forwards you to the real destination.

This would work well for most sites, but when traffic starts to increase,
you are likely to bump into problems. One database write for every click is
not good if you have millions of clicks a day.

So what can you do? In this tutorial we will send the individual clicks as
messages using ``carrot``, and then process them later with a ``celery``
periodic task.

Celery and carrot are excellent in tandem, and while this might not be
the perfect example, you'll at least see one example of how they can be used
to solve a task.

The model
=========

The model is simple: ``Click`` stores a unique URL and the number of
clicks for that URL. Its manager, ``ClickManager``, implements the
``increment_clicks`` method, which takes a URL and the amount to add to
its count.

*clickmuncher/models.py*:

.. code-block:: python

    from django.db import models
    from django.utils.translation import ugettext_lazy as _


    class ClickManager(models.Manager):

        def increment_clicks(self, for_url, increment_by=1):
            """Increment the click count for a URL.

                >>> Click.objects.increment_clicks("http://google.com", 10)

            """
            click, created = self.get_or_create(url=for_url,
                                    defaults={"click_count": increment_by})
            if not created:
                click.click_count += increment_by
                click.save()

            return click.click_count


    class Click(models.Model):
        url = models.URLField(_(u"URL"), verify_exists=False, unique=True)
        click_count = models.PositiveIntegerField(_(u"click_count"),
                                                  default=0)

        objects = ClickManager()

        class Meta:
            verbose_name = _(u"URL clicks")
            verbose_name_plural = _(u"URL clicks")

Using carrot to send clicks as messages
========================================

The model is normal Django stuff, nothing new there. But now we get on to
the messaging. It has been a tradition for me to put a project's
messaging-related code in its own ``messaging.py`` module, and I will
continue to do so here, so maybe you can adopt this practice. In this module
we have two functions:

* ``send_increment_clicks``

  This function sends a simple message to the broker. The message body only
  contains the URL we want to increment as plain text, so the exchange and
  routing key play a role here. We use an exchange called ``clicks``, with a
  routing key of ``increment_click``, so any consumer binding a queue to
  this exchange using this routing key will receive these messages.

* ``process_clicks``

  This function processes all currently gathered clicks sent using
  ``send_increment_clicks``. Instead of issuing one database query for every
  click, it processes all of the messages first, calculates the new click
  count, and issues one update per URL. A message that has been received will
  not be deleted from the broker until it has been acknowledged by the
  receiver, so if the receiver dies in the middle of processing the message,
  it will be re-sent at a later point in time. This guarantees delivery, and
  we respect this feature here by not acknowledging the message until the
  clicks have actually been written to disk.

  **Note**: This could probably be optimized further with
  some hand-written SQL, but it will do for now. Let's say it's an exercise
  left for the picky reader, albeit a discouraged one if you can survive
  without doing it.

On to the code...

*clickmuncher/messaging.py*:

.. code-block:: python

    from celery.messaging import establish_connection
    from carrot.messaging import Publisher, Consumer
    from clickmuncher.models import Click


    def send_increment_clicks(for_url):
        """Send a message for incrementing the click count for a URL."""
        connection = establish_connection()
        publisher = Publisher(connection=connection,
                              exchange="clicks",
                              routing_key="increment_click",
                              exchange_type="direct")

        publisher.send(for_url)

        publisher.close()
        connection.close()


    def process_clicks():
        """Process all currently gathered clicks by saving them to the
        database."""
        connection = establish_connection()
        consumer = Consumer(connection=connection,
                            queue="clicks",
                            exchange="clicks",
                            routing_key="increment_click",
                            exchange_type="direct")

        # First process the messages: save the number of clicks
        # for every URL.
        clicks_for_url = {}
        messages_for_url = {}
        for message in consumer.iterqueue():
            url = message.body
            clicks_for_url[url] = clicks_for_url.get(url, 0) + 1
            # We also need to keep the message objects so we can ack the
            # messages as processed when we are finished with them.
            if url in messages_for_url:
                messages_for_url[url].append(message)
            else:
                messages_for_url[url] = [message]

        # Then increment the clicks in the database so we only need
        # one UPDATE/INSERT for each URL.
        for url, click_count in clicks_for_url.items():
            Click.objects.increment_clicks(url, click_count)
            # Now that the clicks have been registered for this URL we can
            # acknowledge the messages.
            for message in messages_for_url[url]:
                message.ack()

        consumer.close()
        connection.close()

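
To see the batching logic of ``process_clicks`` in isolation, here is a minimal
sketch that runs without a broker or a database. ``FakeMessage``,
``process_messages`` and ``increment_in_store`` are hypothetical names
invented for this illustration (they are not part of carrot or celery), and
the in-memory ``store`` dict merely stands in for the ``Click`` table. The
point it tries to show is the ordering: one write per URL first, and only
then an ``ack()`` for the messages that write covered.

.. code-block:: python

    from collections import defaultdict


    class FakeMessage:
        """Hypothetical stand-in for a broker message (not carrot's class)."""

        def __init__(self, body):
            self.body = body
            self.acked = False

        def ack(self):
            self.acked = True


    def process_messages(messages, increment_clicks):
        """Group messages by URL, write once per URL, then ack that group."""
        messages_for_url = defaultdict(list)
        for message in messages:
            messages_for_url[message.body].append(message)

        for url, url_messages in messages_for_url.items():
            # One write per URL, however many clicks were queued for it.
            increment_clicks(url, len(url_messages))
            # Ack only after the write returned without raising.
            for message in url_messages:
                message.ack()


    # Usage: a plain dict plays the role of the database here.
    store = {}


    def increment_in_store(url, increment_by):
        store[url] = store.get(url, 0) + increment_by


    queued = [FakeMessage(url) for url in ("http://a.example/",
                                           "http://b.example/",
                                           "http://a.example/")]
    process_messages(queued, increment_in_store)
    print(store)  # {'http://a.example/': 2, 'http://b.example/': 1}

If the write for a URL raises, the messages for that URL are never
acknowledged in this sketch, which mirrors the re-delivery behaviour
described above.
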
View and URLs
=============

This is also simple stuff, so I don't think I have to explain this code to
you. The interface is as follows: if you have a link to http://google.com
whose clicks you want to count, you replace the URL with:

    http://mysite/clickmuncher/count/?u=http://google.com

and the ``count`` view will send off an increment message and forward you to
that site.

*clickmuncher/views.py*:

.. code-block:: python

    from django.http import HttpResponseRedirect
    from clickmuncher.messaging import send_increment_clicks


    def count(request):
        url = request.GET["u"]
        send_increment_clicks(url)
        return HttpResponseRedirect(url)

*clickmuncher/urls.py*:

.. code-block:: python

    from django.conf.urls.defaults import patterns, url
    from clickmuncher import views

    urlpatterns = patterns("",
        url(r'^$', views.count, name="clickmuncher-count"),
    )

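
When generating links that point at the ``count`` view, the target URL
usually needs to be percent-encoded. Otherwise a target that carries its own
query string (``...?q=celery&hl=en``) would have its ``&hl=en`` part read as a
separate parameter of the counter URL. The sketch below is one possible way to
build such links. It assumes Python 3's ``urllib.parse`` (on Python 2 the
equivalent functions live in ``urllib`` and ``urlparse``), and
``COUNTER_BASE`` and ``counted_link`` are hypothetical names used only for
this illustration.

.. code-block:: python

    from urllib.parse import parse_qs, urlencode, urlsplit

    # Assumed base address, taken from the example URL in this tutorial.
    COUNTER_BASE = "http://mysite/clickmuncher/count/"


    def counted_link(target_url, base=COUNTER_BASE):
        """Return a counter URL carrying ``target_url`` in the ``u`` parameter."""
        return base + "?" + urlencode({"u": target_url})


    target = "http://google.com/search?q=celery&hl=en"
    link = counted_link(target)
    print(link)

    # Decoding the query string gives the complete target back, which is
    # what the view would read from ``request.GET["u"]``.
    print(parse_qs(urlsplit(link).query)["u"][0])
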
Creating the periodic task
==========================

Processing the clicks every 30 minutes is easy using celery periodic tasks.

*clickmuncher/tasks.py*:

.. code-block:: python

    from celery.task import PeriodicTask
    from clickmuncher.messaging import process_clicks
    from datetime import timedelta


    class ProcessClicksTask(PeriodicTask):
        run_every = timedelta(minutes=30)

        def run(self, **kwargs):
            process_clicks()

We subclass from :class:`celery.task.base.PeriodicTask`, set the ``run_every``
attribute, and in the body of the task just call the ``process_clicks``
function we wrote earlier.

Finishing
=========

There are still ways to improve this application. The URLs could be cleaned
so that http://google.com and http://google.com/ are treated as the same URL.
Maybe it's even possible to update the click count using a single UPDATE
query?

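
As a starting point for the URL clean-up, here is a minimal sketch of a
normalization helper. It is not part of the tutorial's application:
``normalize_url`` is a hypothetical function, it assumes Python 3's
``urllib.parse``, and it only handles two cases (a missing path and the
letter case of the scheme and host). Real-world normalization would likely
need more rules than this.

.. code-block:: python

    from urllib.parse import urlsplit, urlunsplit


    def normalize_url(url):
        """Return ``url`` with a lowercase scheme and host and a non-empty path."""
        parts = urlsplit(url)
        return urlunsplit((
            parts.scheme.lower(),
            # Note: this lowercases the whole netloc, including any
            # "user:password@" prefix, which is a simplification.
            parts.netloc.lower(),
            parts.path or "/",  # an empty path becomes "/"
            parts.query,
            parts.fragment,
        ))


    print(normalize_url("http://google.com"))   # http://google.com/
    print(normalize_url("http://Google.com/"))  # http://google.com/

Such a helper could be applied to the URL before it is sent as a message, so
that both spellings end up incrementing the same row.
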
If you have any questions regarding this tutorial, please send a mail to the
mailing list or come join us in the #celery IRC channel on Freenode:
http://celeryq.org/introduction.html#getting-help