Upstream Caches

The Golden Rule for Django Developers is “Don’t prematurely optimize code.” But when a website goes live, user demand can outweigh site functionality, meaning slower page loads and a downgraded user experience. Code optimization is no longer a waste of time—it’s required.

There are several ways to optimize, but scaling out horizontally to deal with growing demand gets expensive. Luckily, there’s still a way to salvage your code without breaking the bank: You can cache.

Upstream caches are services your servers talk to; they live further upstream, away from the user.

These caches can hold computed data, content fragments, and syndicated content.
Examples: Memcached and Redis

Downstream caches are services that users talk to; they live further downstream from your servers, toward the user.

These caches operate at the page level of a website.
Examples: CloudFront and other CDNs, or a Squid or Varnish proxy server

The purpose of an upstream cache is to save the results of expensive calculations so that they don’t need to be repeated only to produce the same result. To be clear, this isn’t denormalization but rather an entirely temporary store of data.

Upstream caching depends on how long its entries should live. You can give a cache key a fixed lifetime, or your code can invalidate a cache key when it’s no longer useful, depending on the purpose of your cache. The fixed-lifetime approach requires far less code and suits situations where slightly stale data and content, say one hour old, are acceptable under your application’s business requirements.
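
The fixed-lifetime approach can be sketched in a few lines of plain Python. This is a minimal illustration, not how Memcached, Redis, or Django implement expiry; the names TTLCache and comment_count are hypothetical, and the one-hour timeout simply mirrors the example above.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose keys expire after a fixed lifetime."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be exercised in tests
        self._store = {}      # key -> (expires_at, value)

    def set(self, key, value, timeout):
        self._store[key] = (self._clock() + timeout, value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            # The lifetime has elapsed: drop the entry and report a miss.
            del self._store[key]
            return default
        return value


def comment_count(cache, post_id, count_comments):
    """Return a possibly stale count, recomputing at most once per hour."""
    key = f"comment_count:{post_id}"
    cached = cache.get(key)
    if cached is None:
        cached = count_comments(post_id)      # the expensive step
        cache.set(key, cached, timeout=3600)  # accept up to one hour of staleness
    return cached
```

The caller never has to track when the underlying data changes; the trade-off is that a reader may see a count that is up to an hour out of date.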

With invalidation, the age of a cached value doesn’t matter: as soon as a newer, more relevant value comes along, it replaces the old one. You can keep a message board always fresh this way, for example, but be aware that invalidating cache keys is a complex and highly application-specific solution.
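
As a rough sketch of invalidation, the hypothetical MessageBoard class below keeps its rendered listing in a dictionary (standing in for an upstream cache) and discards it whenever a post is added. The class, its key name, and the render_count counter are all illustrative assumptions.

```python
class MessageBoard:
    """Hypothetical board that caches its listing until a new post arrives."""

    def __init__(self):
        self._posts = []
        self._cache = {}       # stand-in for Memcached or Redis
        self.render_count = 0  # how many times the expensive path ran

    def add_post(self, text):
        self._posts.append(text)
        # Invalidate: the cached listing no longer reflects the data.
        self._cache.pop("board:listing", None)

    def listing(self):
        cached = self._cache.get("board:listing")
        if cached is None:
            self.render_count += 1
            cached = "\n".join(self._posts)  # pretend this is expensive
            self._cache["board:listing"] = cached
        return cached
```

Every code path that changes the posts has to remember to invalidate the key, which is where the complexity mentioned above tends to come from.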

There are good places in your code to use an upstream cache, and there are times when you should definitely reconsider it. Before you cache, consider the following:

When to Use Upstream Cache

Properties are good for caching. In Python, these are methods decorated with @property. Since properties take no arguments yet perform calculations, per-object caching of property values often saves you from needlessly repetitive calculations.
Compiled things, like templates, are good for caching because they can be pickled and serialized to avoid redundant computation steps.
Generated things, like video clips and thumbnails, are good candidates because, unless the original source has changed, there’s no need to generate them more than once.
Aggregated things, like comment counts on a blog post, are also good things to cache because you’ll avoid unnecessary iteration and counting.
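
For the property and aggregation cases, one possible approach is the standard library’s functools.cached_property (Python 3.8 and later), which stores the computed value on the instance after the first access. The BlogPost class and its computations counter below are hypothetical and exist only for illustration.

```python
from functools import cached_property


class BlogPost:
    """Hypothetical model-like object with an expensive derived value."""

    def __init__(self, comments):
        self.comments = comments
        self.computations = 0

    @cached_property
    def comment_count(self):
        # Runs once per instance; the result is then stored on the instance.
        self.computations += 1
        return sum(1 for _ in self.comments)


post = BlogPost(["first", "second", "third"])
post.comment_count      # computed on first access
post.comment_count      # served from the instance, no recount
del post.comment_count  # explicit invalidation; next access recomputes
```

Django also ships its own django.utils.functional.cached_property, which follows the same per-instance idea.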

When Not to Use Upstream Cache

Rarely accessed or highly volatile things are bad caching candidates because you’ll waste space or you’ll cache something that’s probably going to change before you need it again.
Computed values that need to be durably stored are bad to cache, too, because cache is temporary by nature and things that you need permanently will be safer living in your database.

Downstream Caches

So far, this chapter has focused on caching your own data. But another type of caching is relevant to Web development, too: caching performed by downstream caches. These are systems that cache pages for users even before the request reaches your Web site. Here are a few examples of downstream caches:

Your ISP may cache certain pages, so if you requested a page from http://example.com/, your ISP would send you the page without having to access example.com directly. The maintainers of example.com have no knowledge of this caching; the ISP sits between example.com and your Web browser, handling all of the caching transparently.
Your Django Web site may sit behind a proxy cache, such as Squid Web Proxy Cache, that caches pages for performance. In this case, each request first would be handled by the proxy, and it would be passed to your application only if needed.
Your Web browser caches pages, too. If a Web page sends out the appropriate headers, your browser will use the local cached copy for subsequent requests to that page, without even contacting the Web page again to see whether it has changed.

Downstream caching is a nice efficiency boost, but there’s a danger to it: Many Web pages’ contents differ based on authentication and a host of other variables, and cache systems that blindly save pages based purely on URLs could expose incorrect or sensitive data to subsequent visitors to those pages.

For example, say you operate a Web email system, and the contents of the inbox page obviously depend on which user is logged in. If an ISP blindly cached your site, then the first user who logged in through that ISP would have their user-specific inbox page cached for subsequent visitors to the site. That’s not cool.

Fortunately, HTTP provides a solution to this problem. A number of HTTP headers exist to instruct downstream caches to vary their cache contents depending on designated variables, and to tell caching mechanisms not to cache particular pages. We’ll look at some of these headers in the sections that follow.

Using Vary Headers

The Vary header defines which request headers a cache mechanism should take into account when building its cache key. For example, if the contents of a Web page depend on a user’s language preference, the page is said to vary on language. By default, Django’s cache system creates its cache keys using the requested fully-qualified URL – e.g., http://www.example.com/stories/2005/?order_by=author.

This means every request to that URL will use the same cached version, regardless of user-agent differences such as cookies or language preferences. However, if this page produces different content based on some difference in request headers – such as a cookie, or a language, or a user-agent – you’ll need to use the Vary header to tell caching mechanisms that the page output depends on those things.
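
To make the idea concrete, here is a simplified sketch of how a cache might fold the headers named in Vary into its key. It is not Django’s actual key algorithm; the cache_key function and its parameters are assumptions made for illustration.

```python
import hashlib


def cache_key(url, request_headers, vary=()):
    """Build a key from the URL plus the request headers named in Vary."""
    # Header names are case-insensitive, so normalise before lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    parts = [url]
    for name in sorted(h.lower() for h in vary):
        parts.append(f"{name}={headers.get(name, '')}")
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


url = "http://www.example.com/stories/2005/?order_by=author"
english = {"Accept-Language": "en"}
french = {"Accept-Language": "fr"}

# Without Vary, both visitors share one cache entry.
same = cache_key(url, english) == cache_key(url, french)

# With Vary: Accept-Language, each language gets its own entry.
different = (cache_key(url, english, vary=["Accept-Language"])
             != cache_key(url, french, vary=["Accept-Language"]))
```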

To do this in Django, use the convenient django.views.decorators.vary.vary_on_headers() view decorator, like so:

    from django.views.decorators.vary import vary_on_headers

    @vary_on_headers('User-Agent')
    def my_view(request):
        ...

In this case, a caching mechanism (such as Django’s own cache middleware) will cache a separate version of the page for each unique user-agent. The advantage of using the vary_on_headers decorator rather than manually setting the Vary header (using something like response['Vary'] = 'user-agent') is that the decorator adds to the Vary header (which may already exist), rather than setting it from scratch and potentially overriding anything that was already in there. You can pass multiple headers to vary_on_headers():

    @vary_on_headers('User-Agent', 'Cookie')
    def my_view(request):
        ...

This tells downstream caches to vary on both, which means each combination of user-agent and cookie will get its own cache value. For example, a request with the user-agent “Mozilla” and the cookie value “foo=bar” will be considered different from a request with the user-agent “Mozilla” and the cookie value “foo=ham”. Because varying on cookie is so common, there’s a django.views.decorators.vary.vary_on_cookie() decorator. These two views are equivalent:

    @vary_on_cookie
    def my_view(request):
        ...

    @vary_on_headers('Cookie')
    def my_view(request):
        ...

The headers you pass to vary_on_headers are not case sensitive; “User-Agent” is the same thing as “user-agent”. You can also use a helper function, django.utils.cache.patch_vary_headers(), directly. This function sets, or adds to, the Vary header. For example:

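    from django.shortcuts import render
    from django.utils.cache import patch_vary_headers

    def my_view(request):
        ...
        # render_to_response() was removed in Django 3.0; render() replaces it.
        response = render(request, 'template_name', context)
        patch_vary_headers(response, ['Cookie'])
        return response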

patch_vary_headers takes an HttpResponse instance as its first argument and a list/tuple of case-insensitive header names as its second argument.
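
The “sets, or adds to” behaviour can be pictured with a small stand-alone function. This is a simplified stand-in written for illustration, not Django’s implementation; merge_vary is a hypothetical name.

```python
def merge_vary(existing, new_headers):
    """Return a Vary value that keeps existing entries and appends new ones."""
    current = [h.strip() for h in existing.split(",") if h.strip()]
    seen = {h.lower() for h in current}
    for header in new_headers:
        if header.lower() not in seen:  # names compare case-insensitively
            current.append(header)
            seen.add(header.lower())
    return ", ".join(current)
```

Starting from an existing value of "Accept-Language", merging in ["Cookie"] yields "Accept-Language, Cookie" rather than replacing the header outright.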

Controlling Cache: Using Other Headers

Other problems with caching are the privacy of data and the question of where data should be stored in a cascade of caches. A user usually faces two kinds of caches: their own browser cache (a private cache) and their provider’s cache (a public cache).

A public cache is used by multiple users and controlled by someone else. This poses problems with sensitive data – you don’t want, say, your bank account number stored in a public cache. So Web applications need a way to tell caches which data is private and which is public.

The solution is to indicate a page’s cache should be private. To do this in Django, use the cache_control view decorator. Example:

    from django.views.decorators.cache import cache_control

    @cache_control(private=True)
    def my_view(request):
        ...

This decorator takes care of sending out the appropriate HTTP header behind the scenes. Note that the cache control settings private and public are mutually exclusive. The decorator ensures that the public directive is removed if private should be set (and vice versa).
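
The mutual exclusion described above can be illustrated with a small stand-alone function that merges keyword directives into a Cache-Control string. It is a simplified sketch under the assumption that directives are plain comma-separated tokens, not Django’s implementation; patch_directives is a hypothetical name.

```python
def patch_directives(header_value, **directives):
    """Merge keyword directives into a Cache-Control header string."""
    parsed = {}
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        parsed[name.lower()] = value if value else True

    # Keyword names use underscores; header directives use hyphens.
    for name, value in directives.items():
        parsed[name.replace("_", "-")] = value

    # public and private are mutually exclusive: the newly requested one wins.
    if directives.get("private") and "public" in parsed:
        del parsed["public"]
    elif directives.get("public") and "private" in parsed:
        del parsed["private"]

    rendered = []
    for name, value in parsed.items():
        if value is True:
            rendered.append(name)
        elif value is not False and value is not None:
            rendered.append(f"{name}={value}")
    return ", ".join(rendered)
```

Calling patch_directives("public, max-age=60", private=True) drops public and keeps the rest of the header.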

An example use of the two directives would be a blog site that offers both private and public entries. Public entries may be cached on any shared cache. The following code uses django.utils.cache.patch_cache_control(), the manual way to modify the cache control header (it is internally called by the cache_control decorator):

    from django.utils.cache import patch_cache_control
    from django.views.decorators.vary import vary_on_cookie

    @vary_on_cookie
    def list_blog_entries_view(request):
        # is_anonymous is an attribute, not a method, in current Django versions.
        if request.user.is_anonymous:
            response = render_only_public_entries()
            patch_cache_control(response, public=True)
        else:
            response = render_private_and_public_entries(request.user)
            patch_cache_control(response, private=True)

        return response

There are a few other ways to control cache parameters. For example, HTTP allows applications to do the following:

Define the maximum time a page should be cached.
Specify whether a cache should always check for newer versions, only delivering the cached content when there are no changes. (Some caches might deliver cached content even if the server page changed, simply because the cached copy hasn’t expired yet.)

In Django, use the cache_control view decorator to specify these cache parameters. In this example, cache_control tells caches to store cached versions for at most 3,600 seconds and to revalidate with the server once a stored copy has become stale:

    from django.views.decorators.cache import cache_control

    @cache_control(must_revalidate=True, max_age=3600)
    def my_view(request):
        ...

Any valid Cache-Control HTTP directive is valid in cache_control(). Commonly used ones include:

public=True
private=True
no_cache=True
no_transform=True
must_revalidate=True
proxy_revalidate=True
max_age=num_seconds
s_maxage=num_seconds

Note that the caching middleware already sets the cache header’s max-age with the value of the CACHE_MIDDLEWARE_SECONDS setting. If you use a custom max_age in a cache_control decorator, the decorator will take precedence, and the header values will be merged correctly.

If you want to use headers to disable caching altogether, django.views.decorators.cache.never_cache is a view decorator that adds headers to ensure the response won’t be cached by browsers or other caches. For example:

    from django.views.decorators.cache import never_cache

    @never_cache
    def myview(request):
        ...
