
Upstream Caches

The Golden Rule for Django Developers is “Don’t prematurely optimize code.” But once a website goes live, user demand can outgrow what the site can serve, which means slower page loads and a degraded user experience. At that point, code optimization is no longer a waste of time; it’s required.

There are several ways to optimize, but scaling out horizontally to deal with growing demand gets expensive. Luckily, there’s still a way to salvage your code without breaking the bank: You can cache.

Upstream caches are services your servers talk to that live further upstream, away from the user.

Downstream caches are services that users talk to that live further downstream from your servers, toward the user.

Upstream caching depends on how long each cached value should live. You can give a cache key a fixed lifetime, or your code can invalidate the key when its value is no longer useful, depending on the purpose of your cache. The fixed-lifetime approach requires far less code and suits situations where slightly stale content, such as data up to an hour old, is acceptable under your application’s business requirements.

With invalidation, a cached value is replaced as soon as a newer, more relevant value comes along, regardless of how much time has passed. You can keep a message board always fresh this way, for example, but be aware that invalidating cache keys is a complex and highly application-specific solution.
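The two lifetime strategies can be sketched in a few lines of plain Python. This is a minimal illustration, not Django’s cache backend: the TTLCache class, its method names, and the injectable clock parameter are all hypothetical, chosen only to contrast a key that expires after a fixed lifetime with a key that is invalidated explicitly.

```python
import time


class TTLCache:
    """Hypothetical in-memory cache illustrating expiry and invalidation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be shown without sleeping
        self._store = {}     # key -> (value, expires_at)

    def set(self, key, value, timeout):
        """Store value under key for `timeout` seconds."""
        self._store[key] = (value, self._clock() + timeout)

    def get(self, key, default=None):
        """Return the cached value, or default if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # fixed lifetime is over
            return default
        return value

    def invalidate(self, key):
        """Drop a key as soon as the application knows it is out of date."""
        self._store.pop(key, None)


# A fake clock (an assumption for the demo) makes the behavior easy to follow.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])

cache.set("front_page", "rendered HTML", timeout=3600)
fresh_hit = cache.get("front_page")      # still within its one-hour lifetime
now[0] = 3600.0
expired_hit = cache.get("front_page")    # lifetime elapsed, falls back to None

cache.set("thread_42", "old posts", timeout=3600)
cache.invalidate("thread_42")            # e.g. called when a new post is saved
invalidated_hit = cache.get("thread_42")
```

The fixed-lifetime path needs no cooperation from the rest of the code, while the invalidation path requires every write that affects the cached value to call invalidate, which is where the extra complexity comes from.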

There are good places in your code to use upstream cache and then there are times that you should definitely reconsider it. Before you cache, consider the following:

Using Upstream Cache

Not to use Upstream Cache

Downstream Caches

So far, this chapter has focused on caching your own data. But another type of caching is relevant to Web development, too: caching performed by downstream caches. These are systems that cache pages for users even before the request reaches your Web site. Here are a few examples of downstream caches:

Downstream caching is a nice efficiency boost, but there’s a danger to it: Many Web pages’ contents differ based on authentication and a host of other variables, and cache systems that blindly save pages based purely on URLs could expose incorrect or sensitive data to subsequent visitors to those pages.

For example, say you operate a Web email system, and the contents of the inbox page obviously depend on which user is logged in. If an ISP blindly cached your site, then the first user who logged in through that ISP would have their user-specific inbox page cached for subsequent visitors to the site. That’s not cool.

Fortunately, HTTP provides a solution to this problem. A number of HTTP headers exist to instruct downstream caches to differ their cache contents depending on designated variables, and to tell caching mechanisms not to cache particular pages. We’ll look at some of these headers in the sections that follow.

Using Vary Headers

The Vary header defines which request headers a cache mechanism should take into account when building its cache key. For example, if the contents of a Web page depend on a user’s language preference, the page is said to vary on language. By default, Django’s cache system creates its cache keys using the requested fully-qualified URL – e.g., http://www.example.com/stories/2005/?order_by=author.

This means every request to that URL will use the same cached version, regardless of user-agent differences such as cookies or language preferences. However, if this page produces different content based on some difference in request headers – such as a cookie, or a language, or a user-agent – you’ll need to use the Vary header to tell caching mechanisms that the page output depends on those things.
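To see why a URL-only key is a problem, here is a rough sketch of a cache key that optionally folds in the request headers named by Vary. It is not Django’s actual key algorithm: make_cache_key and its parameters are hypothetical, and the hashing scheme is an assumption made purely for illustration.

```python
import hashlib


def make_cache_key(url, request_headers, vary=()):
    """Build a cache key from the URL plus the request headers listed in `vary`.

    Hypothetical helper: header names are matched case-insensitively.
    """
    headers = {name.lower(): value for name, value in request_headers.items()}
    parts = [url]
    for name in sorted(h.lower() for h in vary):
        parts.append(f"{name}={headers.get(name, '')}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


url = "http://www.example.com/stories/2005/?order_by=author"
alice = {"Cookie": "sessionid=alice"}
bob = {"Cookie": "sessionid=bob"}

# URL-only key: both users collide on the same cache entry.
collide = make_cache_key(url, alice) == make_cache_key(url, bob)

# Varying on Cookie gives each user a separate entry.
separate = (
    make_cache_key(url, alice, vary=["Cookie"])
    != make_cache_key(url, bob, vary=["Cookie"])
)
```

A cache can only build the second kind of key if the response tells it which headers matter, and that is the job of the Vary header.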

To do this in Django, use the convenient django.views.decorators.vary.vary_on_headers() view decorator, like so:

from django.views.decorators.vary import vary_on_headers

@vary_on_headers('User-Agent')
def my_view(request):
    ...

In this case, a caching mechanism (such as Django’s own cache middleware) will cache a separate version of the page for each unique user-agent. The advantage to using the vary_on_headers decorator rather than manually setting the Vary header (using something like response['Vary'] = 'user-agent') is that the decorator adds to the Vary header (which may already exist), rather than setting it from scratch and potentially overriding anything that was already in there. You can pass multiple headers to vary_on_headers():

@vary_on_headers('User-Agent', 'Cookie')
def my_view(request):
    ...

This tells downstream caches to vary on both, which means each combination of user-agent and cookie will get its own cache value. For example, a request with the user-agent “Mozilla” and the cookie value “foo=bar” will be considered different from a request with the user-agent “Mozilla” and the cookie value “foo=ham”. Because varying on cookie is so common, there’s a django.views.decorators.vary.vary_on_cookie() decorator. These two views are equivalent:

from django.views.decorators.vary import vary_on_cookie, vary_on_headers

@vary_on_cookie
def my_view(request):
    ...

@vary_on_headers('Cookie')
def my_view(request):
    ...

The headers you pass to vary_on_headers are not case sensitive; “User-Agent” is the same thing as “user-agent”. You can also use a helper function, django.utils.cache.patch_vary_headers(), directly. This function sets, or adds to, the Vary header. For example:

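from django.shortcuts import render
from django.utils.cache import patch_vary_headers

def my_view(request):
    ...
    # render() replaces render_to_response(), which was removed in Django 3.0
    response = render(request, 'template_name', context)
    patch_vary_headers(response, ['Cookie'])
    return response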

patch_vary_headers takes an HttpResponse instance as its first argument and a list/tuple of case-insensitive header names as its second argument.
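The “sets, or adds to” behavior described above can be illustrated without Django. The following is a simplified sketch under stated assumptions, not the library’s implementation: merge_vary is a hypothetical function that works on plain strings rather than on an HttpResponse.

```python
def merge_vary(existing, new_headers):
    """Return a Vary value that keeps `existing` entries and appends new ones.

    Hypothetical helper; header names are compared case-insensitively,
    so a header already present is not added a second time.
    """
    current = [h.strip() for h in existing.split(",") if h.strip()]
    seen = {h.lower() for h in current}
    for header in new_headers:
        if header.lower() not in seen:
            current.append(header)
            seen.add(header.lower())
    return ", ".join(current)


# Nothing set yet: the header is created.
created = merge_vary("", ["Cookie"])

# Something already set: the new name is appended, nothing is overwritten.
appended = merge_vary("Accept-Language", ["Cookie"])

# Same name in different case: no duplicate is added.
deduplicated = merge_vary("cookie", ["Cookie", "User-Agent"])
```

Assigning the header directly would correspond to discarding the existing value, which is exactly what the merge avoids.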

Controlling Cache: Using Other Headers

Other problems with caching are the privacy of data and the question of where data should be stored in a cascade of caches. A user usually faces two kinds of caches: their own browser cache (a private cache) and their provider’s cache (a public cache).

A public cache is used by multiple users and controlled by someone else. This poses problems with sensitive data – you don’t want, say, your bank account number stored in a public cache. So Web applications need a way to tell caches which data is private and which is public.

The solution is to indicate a page’s cache should be private. To do this in Django, use the cache_control view decorator. Example:

from django.views.decorators.cache import cache_control

@cache_control(private=True)
def my_view(request):
    ...

This decorator takes care of sending out the appropriate HTTP header behind the scenes. Note that the cache control settings private and public are mutually exclusive. The decorator ensures that the public directive is removed if private should be set (and vice versa).
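The mutual exclusion of private and public can be sketched in plain Python. This is an illustration under assumptions, not Django’s code: patch_directives is a hypothetical function that operates on a header string, turns underscores in keyword names into hyphens, and emits a bare directive for True values.

```python
def patch_directives(header_value, **directives):
    """Merge keyword directives into a Cache-Control header value.

    Hypothetical helper: setting `private` drops `public`, and vice versa.
    """
    parsed = {}
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        parsed[name.lower()] = value if sep else True

    for name, value in directives.items():
        parsed[name.replace("_", "-")] = value

    # The two visibility directives must never appear together.
    if "private" in directives:
        parsed.pop("public", None)
    elif "public" in directives:
        parsed.pop("private", None)

    return ", ".join(
        name if value is True else f"{name}={value}"
        for name, value in parsed.items()
    )


now_private = patch_directives("public, max-age=600", private=True)
now_public = patch_directives("private", public=True)
```

Here now_private keeps the existing max-age but no longer contains public, and now_public contains only public.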

An example use of the two directives would be a blog site that offers both private and public entries. Public entries may be cached on any shared cache. The following code uses django.utils.cache.patch_cache_control(), the manual way to modify the cache control header (it is internally called by the cache_control decorator):

from django.utils.cache import patch_cache_control
from django.views.decorators.vary import vary_on_cookie

@vary_on_cookie
def list_blog_entries_view(request):
    # is_anonymous is an attribute, not a method, in current Django versions
    if request.user.is_anonymous:
        response = render_only_public_entries()
        patch_cache_control(response, public=True)
    else:
        response = render_private_and_public_entries(request.user)
        patch_cache_control(response, private=True)

    return response

There are a few other ways to control cache parameters. For example, HTTP allows applications to do the following:

In Django, use the cache_control view decorator to specify these cache parameters. In this example, cache_control tells caches to treat cached versions as fresh for at most 3,600 seconds and, once a version is stale, to revalidate it with the server before reusing it:

from django.views.decorators.cache import cache_control

@cache_control(must_revalidate=True, max_age=3600)
def my_view(request):
    ...
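To make the effect of max-age concrete, here is a small sketch of the freshness decision a cache might make for a stored response. It is a deliberately simplified model: max_age_seconds and is_fresh are hypothetical names, and real caches also weigh other inputs (such as the Expires header) that are ignored here.

```python
def max_age_seconds(cache_control):
    """Extract max-age from a Cache-Control value, or None if absent or invalid."""
    for item in cache_control.split(","):
        name, sep, value = item.strip().partition("=")
        if name.lower() == "max-age" and sep and value.isdigit():
            return int(value)
    return None


def is_fresh(cache_control, age_seconds):
    """Return True while the stored response is younger than its max-age."""
    limit = max_age_seconds(cache_control)
    return limit is not None and age_seconds < limit


header = "must-revalidate, max-age=3600"

served_from_cache = is_fresh(header, age_seconds=1800)   # half an hour old
needs_revalidation = not is_fresh(header, age_seconds=3600)  # lifetime used up
```

With must-revalidate present, a cache in the second situation is expected to check with the server rather than serve the stale copy.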

Any valid Cache-Control HTTP directive is valid in cache_control(). Here’s a full list:

Note that the caching middleware already sets the cache header’s max-age with the value of the CACHE_MIDDLEWARE_SECONDS setting. If you use a custom max_age in a cache_control decorator, the decorator will take precedence, and the header values will be merged correctly.

If you want to use headers to disable caching altogether, django.views.decorators.cache.never_cache is a view decorator that adds headers to ensure the response won’t be cached by browsers or other caches. For example:

from django.views.decorators.cache import never_cache

@never_cache
def myview(request):
    ...
