
.. _tut-clickcounter:

============================================================
 Tutorial: Creating a click counter using Kombu and celery
============================================================

.. contents::
    :local:

Introduction
============

A click counter should be easy, right? Just a simple view that increments
a click in the DB and forwards you to the real destination.

This would work well for most sites, but when traffic starts to increase,
you are likely to bump into problems. One database write for every click
adds up quickly when you have millions of clicks a day.

So what can you do? In this tutorial we will send the individual clicks as
messages using ``kombu``, and then process them later with a ``celery``
periodic task.

Celery and Kombu are excellent in tandem, and while this might not be
the perfect example, you'll at least see one example of how they can be used
to solve a task.

The model
=========

The model is simple: ``Click`` stores a unique URL and the number of clicks
for that URL. Its manager, ``ClickManager``, implements the
``increment_clicks`` method, which takes a URL and the amount to increment
its count by.

*clickmuncher/models.py*:

.. code-block:: python

    from django.db import models
    from django.utils.translation import ugettext_lazy as _


    class ClickManager(models.Manager):

        def increment_clicks(self, for_url, increment_by=1):
            """Increment the click count for a URL.

                >>> Click.objects.increment_clicks("http://google.com", 10)

            """
            # Create the row with the initial count, or add to an existing one.
            click, created = self.get_or_create(url=for_url,
                                    defaults={"click_count": increment_by})
            if not created:
                click.click_count += increment_by
                click.save()

            return click.click_count


    class Click(models.Model):
        url = models.URLField(_(u"URL"), verify_exists=False, unique=True)
        click_count = models.PositiveIntegerField(_(u"click_count"),
                                                  default=0)

        objects = ClickManager()

        class Meta:
            verbose_name = _(u"URL clicks")
            verbose_name_plural = _(u"URL clicks")

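``increment_clicks`` reads the current count into Python, adds to it and saves
it back. If two workers ever processed the same URL at the same time, one of
the increments could be lost. A common remedy is to let the database do the
addition. The sketch below shows that idea with the standard library's
``sqlite3`` module rather than the Django ORM; the ``click`` table and this
plain-function version of ``increment_clicks`` are hypothetical stand-ins.
In Django itself, rows that already exist are usually updated this way with an
``F()`` expression, e.g.
``Click.objects.filter(url=for_url).update(click_count=F("click_count") + increment_by)``.

.. code-block:: python

    import sqlite3


    def increment_clicks(conn, for_url, increment_by=1):
        """Add ``increment_by`` to the count for ``for_url`` inside SQL.

        Hypothetical stand-in for ClickManager.increment_clicks, written
        against a plain SQLite table instead of the Django ORM.
        """
        with conn:  # commits on success, rolls back on error
            # Create the row with a zero count if it does not exist yet.
            conn.execute(
                "INSERT OR IGNORE INTO click (url, click_count) VALUES (?, 0)",
                (for_url,),
            )
            # The addition happens in the UPDATE itself, so no stale value
            # read into Python can overwrite another caller's increment.
            conn.execute(
                "UPDATE click SET click_count = click_count + ? WHERE url = ?",
                (increment_by, for_url),
            )
        row = conn.execute(
            "SELECT click_count FROM click WHERE url = ?", (for_url,)
        ).fetchone()
        return row[0]


    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE click (url TEXT PRIMARY KEY, click_count INTEGER NOT NULL)"
    )
    print(increment_clicks(conn, "http://google.com", 10))  # 10
    print(increment_clicks(conn, "http://google.com"))      # 11
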
Using Kombu to send clicks as messages
========================================

The model is normal Django stuff, nothing new there. But now we get on to
the messaging. It has been a tradition for me to put the project's
messaging-related code in its own ``messaging.py`` module, and I will continue
to do so here; maybe you will adopt this practice too. In this module we have
two functions:

* ``send_increment_clicks``

  This function sends a simple message to the broker. The message body only
  contains the URL we want to increment, as plain text, so the exchange and
  routing key are what tell consumers what the message is for. We use an
  exchange called ``clicks``, with a routing key of ``increment_click``, so
  any consumer binding a queue to this exchange using this routing key will
  receive these messages.

* ``process_clicks``

  This function processes all currently gathered clicks sent using
  ``send_increment_clicks``. Instead of issuing one database query for every
  click, it processes all of the messages first, calculates the new click
  count and issues one update (or insert) per URL. A message that has been
  received will not be deleted from the broker until it has been acknowledged
  by the receiver, so if the receiver dies in the middle of processing the
  message, it will be re-sent at a later point in time. This guarantees
  at-least-once delivery, and we respect this feature here by not
  acknowledging the messages until the clicks have actually been written to
  the database. The flip side is that a crash between writing and
  acknowledging can cause some clicks to be counted twice.

**Note**: This could probably be optimized further with
some hand-written SQL, but it will do for now. Consider it an exercise
for the picky reader, though one best skipped if you can manage without it.

On to the code...

*clickmuncher/messaging.py*:

.. code-block:: python

    from celery.messaging import establish_connection
    from kombu.compat import Publisher, Consumer

    from clickmuncher.models import Click


    def send_increment_clicks(for_url):
        """Send a message for incrementing the click count for a URL."""
        connection = establish_connection()
        publisher = Publisher(connection=connection,
                              exchange="clicks",
                              routing_key="increment_click",
                              exchange_type="direct")

        publisher.send(for_url)

        publisher.close()
        connection.close()


    def process_clicks():
        """Process all currently gathered clicks by saving them to the
        database."""
        connection = establish_connection()
        consumer = Consumer(connection=connection,
                            queue="clicks",
                            exchange="clicks",
                            routing_key="increment_click",
                            exchange_type="direct")

        # First process the messages: save the number of clicks
        # for every URL.
        clicks_for_url = {}
        messages_for_url = {}
        for message in consumer.iterqueue():
            url = message.body
            clicks_for_url[url] = clicks_for_url.get(url, 0) + 1
            # We also need to keep the message objects so we can ack the
            # messages as processed when we are finished with them.
            if url in messages_for_url:
                messages_for_url[url].append(message)
            else:
                messages_for_url[url] = [message]

        # Then increment the clicks in the database so we only need
        # one UPDATE/INSERT for each URL.
        for url, click_count in clicks_for_url.items():
            Click.objects.increment_clicks(url, click_count)
            # Now that the clicks have been registered for this URL we can
            # acknowledge the messages.
            for message in messages_for_url[url]:
                message.ack()

        consumer.close()
        connection.close()

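The bookkeeping in ``process_clicks`` can be expressed a little more compactly
with ``collections.Counter`` and ``collections.defaultdict``. Below is a
broker-free sketch of that logic; ``FakeMessage``, ``process_batch`` and
``record_clicks`` are hypothetical names standing in for Kombu's message
objects and ``Click.objects.increment_clicks``. It keeps the same ordering as
the tutorial: a URL's messages are acknowledged only after its count has been
written.

.. code-block:: python

    from collections import Counter, defaultdict


    class FakeMessage:
        """Hypothetical stand-in for a message received from the broker."""

        def __init__(self, body):
            self.body = body
            self.acked = False

        def ack(self):
            self.acked = True


    def process_batch(messages, increment_clicks):
        """Count clicks per URL, write once per URL, then ack its messages."""
        clicks_for_url = Counter()
        messages_for_url = defaultdict(list)
        for message in messages:
            clicks_for_url[message.body] += 1
            messages_for_url[message.body].append(message)

        for url, click_count in clicks_for_url.items():
            increment_clicks(url, click_count)
            # Reached only if the write succeeded; if increment_clicks raises,
            # this URL's messages stay unacknowledged.
            for message in messages_for_url[url]:
                message.ack()


    # Usage, with a plain dict standing in for the database.
    counts = {}


    def record_clicks(url, click_count):
        counts[url] = counts.get(url, 0) + click_count


    batch = [FakeMessage(u) for u in ("http://a", "http://b", "http://a")]
    process_batch(batch, record_clicks)
    print(counts)                       # {'http://a': 2, 'http://b': 1}
    print(all(m.acked for m in batch))  # True

If the write for a URL raises, its messages are never acknowledged, so a real
broker would deliver them again on a later run, which is the behaviour the
tutorial relies on.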
View and URLs
=============

This is also simple stuff; I don't think I have to explain this code to you.
The interface is as follows: if you have a link to http://google.com that you
want to count the clicks for, replace the URL with::

    http://mysite/clickmuncher/count/?u=http://google.com

and the ``count`` view will send off an increment message and forward you to
that site.

*clickmuncher/views.py*:

.. code-block:: python

    from django.http import HttpResponseRedirect

    from clickmuncher.messaging import send_increment_clicks


    def count(request):
        url = request.GET["u"]
        # Queue the click for later processing, then redirect right away.
        send_increment_clicks(url)
        return HttpResponseRedirect(url)

*clickmuncher/urls.py*:

.. code-block:: python

    from django.conf.urls.defaults import patterns, url

    from clickmuncher import views

    # Included under ``clickmuncher/`` in the project's root URLconf, this
    # maps ``/clickmuncher/count/`` to the ``count`` view.
    urlpatterns = patterns("",
        url(r'^count/$', views.count, name="clickmuncher-count"),
    )

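One thing to watch with this interface: if the destination URL has a query
string of its own, any ``&`` in it would be read as the start of another
parameter of the counting URL, so ``request.GET["u"]`` would only see part of
the destination. Percent-encoding the destination when building the link
avoids this, and Django decodes it again when it fills in ``request.GET``. A
minimal sketch using ``urllib.parse``; ``make_count_link`` is a hypothetical
helper and ``http://mysite/`` is the tutorial's placeholder host.

.. code-block:: python

    from urllib.parse import parse_qs, urlencode, urlsplit


    def make_count_link(target_url, base="http://mysite/clickmuncher/count/"):
        """Return a counting link for ``target_url``, percent-encoded as ``u``.

        Hypothetical helper; ``base`` is the tutorial's placeholder address.
        """
        return base + "?" + urlencode({"u": target_url})


    link = make_count_link("http://example.com/search?q=kombu&page=2")
    print(link)
    # http://mysite/clickmuncher/count/?u=http%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dkombu%26page%3D2

    # Decoding the query string recovers the full destination, including "&".
    print(parse_qs(urlsplit(link).query)["u"][0])
    # http://example.com/search?q=kombu&page=2
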
Creating the periodic task
==========================

Processing the clicks every 30 minutes is easy using celery periodic tasks.

*clickmuncher/tasks.py*:

.. code-block:: python

    from datetime import timedelta

    from celery.task import PeriodicTask

    from clickmuncher.messaging import process_clicks


    class ProcessClicksTask(PeriodicTask):
        run_every = timedelta(minutes=30)

        def run(self, **kwargs):
            process_clicks()

We subclass from :class:`celery.task.base.PeriodicTask`, set the ``run_every``
attribute, and in the body of the task simply call the ``process_clicks``
function we wrote earlier. Note that periodic tasks are only sent while the
``celerybeat`` scheduler service is running.

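The ``celery.task`` module used above belongs to older Celery releases and was
removed in Celery 5. On Celery 4.0 or later, a roughly equivalent schedule can
be declared through the ``beat_schedule`` setting, which the ``celery beat``
service reads. A sketch, assuming a hypothetical task registered under the
name ``clickmuncher.tasks.process_clicks_task`` that simply calls
``process_clicks``:

.. code-block:: python

    from datetime import timedelta

    # Assumes Celery 4.0 or later. The task name is hypothetical: it has to
    # match a task registered with your Celery app, e.g. a function decorated
    # with @app.task that calls process_clicks().
    beat_schedule = {
        "process-clicks-every-30-minutes": {
            "task": "clickmuncher.tasks.process_clicks_task",
            "schedule": timedelta(minutes=30),
        },
    }

With an app instance, this dictionary would typically be assigned to
``app.conf.beat_schedule``.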
Finishing
=========

There are still ways to improve this application. The URLs could be cleaned
so that http://google.com and http://google.com/ are counted as the same
URL. Maybe it's even possible to update the click count using a single
UPDATE query?

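For the URL cleanup suggested above, one option is to canonicalize each URL
before it is sent as a message, for example at the start of the ``count``
view. The sketch below is a hypothetical ``normalize_url`` helper built on
``urllib.parse``. It applies only a few conservative rules (lower-case the
scheme and network location, turn an empty path into ``/``); real-world
canonicalization usually needs more, such as handling default ports.

.. code-block:: python

    from urllib.parse import urlsplit, urlunsplit


    def normalize_url(url):
        """Return a canonical form of ``url`` to use as the click-count key.

        Hypothetical helper: it only lower-cases the scheme and network
        location and turns an empty path into "/". Query strings and
        fragments are kept as they are.
        """
        parts = urlsplit(url)
        return urlunsplit((
            parts.scheme.lower(),
            parts.netloc.lower(),
            parts.path or "/",
            parts.query,
            parts.fragment,
        ))


    print(normalize_url("http://google.com"))   # http://google.com/
    print(normalize_url("HTTP://Google.com/"))  # http://google.com/
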
If you have any questions regarding this tutorial, please send a mail to the
mailing list or come join us in the #celery IRC channel at Freenode:
http://celeryq.org/introduction.html#getting-help