How to properly stream audio from your Plone + Varnish site

published Jul 17, 2012, last modified Jun 26, 2013

There are many quirks and oddities about serving audio properly. Here's how to tackle them from Plone through Varnish.

Use Plone 4 or newer

Do not use earlier versions.  Earlier versions do not have content types with BlobStorage support, so big audio files will make your Zope process balloon in memory, thrash your ZODB caches, and make your machine swap.  With BlobStorage, the files are read and streamed directly from the disk rather than loaded as enormous Python objects in RAM, so your site serves requests that much faster.

If you have old content from Plone 3, you may want to migrate it to BlobStorage.

Use Varnish 3

Varnish 3 has Range request support.  Provided a few other technical details are in place (covered below), this means clients can seek in audio files by estimating which byte range they need and requesting just that, rather than having to wait until the entire file is downloaded.

Do cache the audio files in Varnish

In older versions of Varnish that did not have proper Vary and Range request support, the advice was to pipe the client into Plone.  This had the drawback of tying up one Zope thread per client for as long as the client was listening to the audio file.  With a hundred clients tying up your Zope server, that meant either provisioning a hundred and one Zope threads (with all the terrible problems this caused) or accepting a site that was completely unavailable to people browsing it.

In contrast, Varnish 3 has no problem handling 100 clients at the same time.

So do tell Varnish to cache the audio files for a long time.  On the Plone side, there's nothing special you need to do.  On the Varnish side, here's how you might go about that:

sub vcl_fetch {
       if (beresp.http.Content-Type ~ "audio/") {
               set beresp.ttl = 3600s;
       }
}

Combined with plone.app.caching configured to do purging (outside the scope of this tip grab bag, but available in the Varnish documentation), you should experience no issues with stale data in the cache whatsoever.

And, combined with ZEO and the Varnish feature of selecting different directors and backends in vcl_recv, you can have a Zope server with few threads, dedicated exclusively to filling up the Varnish cache with the big audio files, while you dedicate other Zope servers to serving regular browser traffic.  But we're aiming for simplicity here, so I won't touch on that subject.
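
That said, the routing half of such a setup is short.  Here is a minimal sketch, assuming two Zope instances that share one ZEO server.  The backend names, hosts and ports are made up for illustration, and matching on .mp3 is just one way to spot audio requests:

```vcl
# Hypothetical backends: adjust names, hosts and ports to your own setup.
backend plone_default {
    .host = "127.0.0.1";
    .port = "8080";
}
backend plone_audio {
    .host = "127.0.0.1";
    .port = "8081";
}

sub vcl_recv {
    if (req.url ~ "\.mp3($|\?)") {
        # Big audio files go to the small, dedicated Zope instance.
        set req.backend = plone_audio;
    } else {
        # Everything else goes to the regular instance.
        set req.backend = plone_default;
    }
}
```

Since Varnish caches the audio, the dedicated instance is only hit on a cache miss, which is why a few threads should be enough for it.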

Not so big files, please

Before Varnish can start serving the audio files to the client, it needs to seed its cache.  This process can take around one second for each 20 megabytes of data served by Plone.

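A two-hour radio show recorded in MP3 (mono, 64 kbps) is 7,200 seconds at 8 kilobytes per second, or roughly 55 MB.  Do the math: at the rate above, the first listener to request that file waits close to three seconds before Varnish can start serving it.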

Use constant bit rate encoding

Players that can seek in a stream can only do so if you either serve them through RTMP, or serve them through HTTP at a constant bit rate.  Over HTTP, they fall back to using Range requests to fetch the pieces of the file they need to buffer and start playback from a particular position in time.  With a constant bit rate, the byte offset of any position is simply proportional to its time, so the player can work out which range to ask for.

Use HTML5

HTML5 audio is compatible with almost every single modern mobile browser.  It is also part of the HTML standard.  Furthermore, it's really easy to deploy it, requiring no Flash components on your site.  You may, however, want to provide a Flash-based fallback for browsers that aren't yet capable of HTML5 audio, or can't play the audio format you're using (most likely MP3, which is sadly the most compatible format throughout the currently installed base).

Do not gzip audio files

MP3 data is already compressed, so gzipping it buys you nothing.  If you have enabled gzip compression in Plone, or auto-gzip in Varnish, make sure audio files are exempt with the following VCL trick:

sub donotgzipaudio {
   if (bereq.url ~ "\.mp3($|\?)") {
       set bereq.http.Accept-Encoding = "identity";
   }
}
sub vcl_miss { call donotgzipaudio; }
sub vcl_pass { call donotgzipaudio; }

All that means is Varnish will, before making the request to the backend (whether for pass or for miss), override any preference for compression that the client may have sent.

Also make sure that you are not setting beresp.do_gzip = true for audio files in vcl_fetch.

Avoid double caching of (big) audio files

The default behavior for the File content type in Plone is a bit of a drag.  The download link in the /view view (the link in item listings) appends a hideous /at_download/file that will cause double caching of the audio files.  To suppress it, tell Varnish to do the following:

sub vcl_recv {
    if (req.url ~ "/at_download/file($|\?)") {
        set req.http.Location = regsub(req.url, "(.*)/at_download/file($|\?)", "\1\2");
        error 750 "Redirecting you to the canonical representation...";
    }
}

sub vcl_error {
    if (obj.status == 750) {
        set obj.http.Location = req.http.Location;
        set obj.status = 301;
        return (deliver);
    }
}

That will ensure that all download requests will redirect to the right download URL (which is the absolute URL to the file).

I will leave the job of supplementing that rule with the ability to eliminate spurious query strings from file URLs as an exercise to the reader.
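
One possible answer to that exercise, as a minimal sketch.  It assumes that your audio URLs end in .mp3 and that no query parameter on them means anything to Plone, so adjust the pattern if either assumption does not hold for your site:

```vcl
sub vcl_recv {
    # Assumption: query strings on .mp3 URLs carry no meaning for the backend.
    if (req.url ~ "\.mp3\?") {
        # Drop everything from the first "?" onward, so that every variant
        # of the URL maps to one and the same cached object.
        set req.url = regsub(req.url, "\?.*$", "");
    }
}
```

Varnish concatenates subroutines that share a name, so this can live alongside the vcl_recv shown above.  A request for /show.mp3/at_download/file?x=1 is first redirected to /show.mp3?x=1 by the earlier rule, and the follow-up request is then looked up as plain /show.mp3.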

Set the right Content-Disposition for streaming

Again, the default behavior of the File content type is to prompt users to download the file when they click on the download link (or access the absolute URL).  Plone accomplishes this by sending a Content-Disposition: attachment header, which also tells the client the file name it should save the file as.  This is a bit of a drag, as browsers that support direct playback of audio files won't show the nice embedded audio player, and it isn't much of a win for users either, since they can always right-click and download the file.

So we're going to surgically neuter that, right before we store the retrieved file in the cache:

sub vcl_fetch {
    if (beresp.http.Content-Type ~ "audio/") {
        set beresp.http.Content-Disposition = regsub(beresp.http.Content-Disposition, "attachment; ", "inline; ");
    }
}

Enable direct streaming for Android 2.3 and older devices

The Content-Disposition: inline header causes older Android devices to act funky -- if it is present at all, tapping on the file link does nothing at all.  To prevent this from happening, we are going to surgically alter the response headers on-the-fly, without altering the cached object in the proxy cache.  This is how:

sub vcl_deliver {
    if (resp.http.Content-Type ~ "audio/"
        && req.http.User-Agent ~ "Android (2\.1|2\.2|2\.3)")
    {
        remove resp.http.Content-Disposition;
    }
}

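Note how we alter the headers during the deliver phase rather than during the request or the backend response processing phases.  Changes made in vcl_deliver apply only to the response going out to that particular client, so the object stored in the cache keeps its Content-Disposition: inline header for everyone else.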