Efficiently Serving CouchDB Attachments with Rails and nginx

I’m working on a project built in Rails where the data storage is being handled by CouchDB instead of a conventional SQL database. Thanks to Ruby’s dynamic nature, we can disable ActiveRecord and the rest of Rails is just fine seeing subclasses of CouchRest’s ExtendedDocument instead of ActiveRecord::Base.

One of the nice things about CouchDB, and one which is quite helpful for our project, is its ability to store file attachments in the database. This is less effort than mucking about keeping files in the filesystem. Unlike BLOB columns in SQL, the file attachments are not given to the application unless specifically requested, which reduces the odds of unnecessarily transferring a lot of ignored data from the database.
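To make that last point concrete, here is a minimal sketch of what the application sees when it loads a document. The document values are invented for illustration; the general shape, with an `_attachments` entry that carries only metadata and a `stub` flag, is how CouchDB describes attachments when you fetch a document without asking for their bodies. The `attachment_summary` helper is a hypothetical name of mine, not part of CouchRest.

```ruby
require 'json'

# Sample body of a document fetch. The values are made up for
# illustration; note that the attachment entry holds metadata only.
DOC_JSON = <<~JSON
  {
    "_id": "ff1ec7a4d174d60710a95e20294dec4c",
    "_rev": "2-abc",
    "title": "Quarterly report",
    "_attachments": {
      "file.pdf": {
        "content_type": "application/pdf",
        "length": 482113,
        "stub": true
      }
    }
  }
JSON

# Hypothetical helper: lists each attachment's type and size using only
# the metadata in the document, without fetching any attachment body.
def attachment_summary(doc)
  (doc['_attachments'] || {}).map do |name, meta|
    [name, { 'content_type' => meta['content_type'], 'length' => meta['length'] }]
  end.to_h
end

puts attachment_summary(JSON.parse(DOC_JSON)).inspect
```

The file contents only cross the wire when something explicitly requests the attachment itself.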

Serving Attachments

However, there is one important difficulty with storing these attachments in the database: serving them back to clients. With attachments stored in the filesystem, serving files is relatively easy: just point your front-end webserver at the files and give your users a link to click on or an <img/> tag.

Things are slightly more complicated in the case where the attachments are stored in CouchDB.

CouchDB HTTP

Since CouchDB speaks HTTP, one simple option would be to just provide the user with the URLs for the attachments:

    <a href="http://my.couchdb.server:5984/database/ff1ec7a4d174d60710a95e20294dec4c/file.pdf">Download PDF</a>

However, this would require us to open our CouchDB server up to the public internet. We would then either have a huge security hole allowing anyone to access the database directly, or we would need to configure CouchDB permissions very carefully. Even if we did so, we’d lack the ability to do per-object access control inside our application. So this just seems like the wrong approach to me: I’d like to firewall off CouchDB so the public internet cannot get to it at all.

So, we need another option.

Rails send_data

Perhaps the next most straightforward option is to download the attachment in our Rails application and then send it along to the client:

    def serve_attachment_with_send_data(model, filename)
      metadata = model["_attachments"][filename]
      data = model.fetch_attachment(filename)
      send_data(data, {
        :filename    => filename,
        :type        => metadata['content_type'],
        :disposition => "inline"
      })
    end

This works well enough and allows us to do any sort of access control before serving the file, but it has two costs. First, a Rails application server process is tied up for the whole time it takes to download the attachment from CouchDB and send it along. Second, the entire attachment is read into memory. Both can hurt performance, especially with large attachments and/or heavily-trafficked sites.

X-Accel-Redirect

If we’re using nginx in front of our Rails processes, then we can take advantage of the X-Accel-Redirect functionality. With it, our Rails application responds with just a header telling nginx which file to fetch and send to the client. The Rails process is then free to handle another request while nginx does the transfer.

(Note: it’s possible that this approach could be instead built upon lighttpd’s similar X-Sendfile mechanism, but I haven’t looked into it.)

Normally, X-Accel-Redirect is used with files on disk, but nginx is sufficiently flexible that it supports proxying files from another upstream HTTP server—CouchDB, in our case—instead.

The first step in the process is to configure nginx to proxy certain requests to CouchDB. Here’s a snippet of our nginx.conf:

    upstream thin {
        server 127.0.0.1:9000;
        server 127.0.0.1:9001;
        server 127.0.0.1:9002;
    }

    server {
        listen       80;
        server_name  our.site.url;

        root /var/www/oursite/current/public;

        location / {
            proxy_set_header  X-Real-IP  $remote_addr;
            proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header  Host $http_host;
            proxy_redirect    off;
            # Serve static files from root; everything else goes to Rails.
            if (!-f $request_filename) {
                proxy_pass http://thin;
                break;
            }
        }

        location /couch/ {
            # Only reachable via X-Accel-Redirect, never by direct request.
            internal;

            proxy_set_header  X-Real-IP  $remote_addr;
            proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_pass http://localhost:5984/oursite-production/;
        }
    }

As you can see, most of the above configuration is pretty standard nginx-proxying-rails stuff. The CouchDB-specific part is the location /couch/ stanza at the bottom. One key thing to note is the internal; directive, which tells nginx to disallow direct requests for URLs starting with /couch/; if we did not include this directive, anyone with knowledge of our configuration would be able to directly access CouchDB, which would of course be a problem.
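The other detail worth spelling out is how the internal path turns into a CouchDB URL. Because the proxy_pass value ends with a URI (/oursite-production/), nginx replaces the part of the request path that matched the location with that URI. The following is only a Ruby illustration of that mapping, not anything nginx runs; `upstream_url_for` and the two constants are names I made up for the sketch.

```ruby
# Mirrors the rewrite nginx performs when proxy_pass carries a URI:
# the part of the path matching the location prefix is replaced by
# the proxy_pass URI. Both values are copied from the config above.
LOCATION_PREFIX = '/couch/'
PROXY_PASS      = 'http://localhost:5984/oursite-production/'

# Hypothetical helper, for illustration only.
def upstream_url_for(internal_path)
  unless internal_path.start_with?(LOCATION_PREFIX)
    raise ArgumentError, "#{internal_path} is outside #{LOCATION_PREFIX}"
  end
  PROXY_PASS + internal_path[LOCATION_PREFIX.length..-1]
end

puts upstream_url_for('/couch/ff1ec7a4d174d60710a95e20294dec4c/file.pdf')
# prints http://localhost:5984/oursite-production/ff1ec7a4d174d60710a95e20294dec4c/file.pdf
```

So the path we hand to nginx only needs the document id and the attachment name; the database name lives in the nginx configuration.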

Next, we need to write some Rails code to direct nginx to serve files from CouchDB:

    def serve_attachment_with_nginx(model, filename)
      response.headers['Content-Type'] = model["_attachments"][filename]['content_type']
      # Quote the filename so names containing spaces survive intact.
      response.headers['Content-Disposition'] = "inline; filename=\"#{filename}\""
      # nginx intercepts this header and serves the file from the /couch/ location.
      response.headers['X-Accel-Redirect'] = "/couch/#{model.id}/#{filename}"
      render :nothing => true
    end

Note that we can, of course, do permissions checking or anything else before calling the serve_attachment_with_nginx method.
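As a rough sketch of what such a check might look like, here is a plain-Ruby version that builds the same three headers as a hash, so it can be run outside Rails. Everything specific in it is an assumption for illustration: the `owner` field on the document, the `AccessDenied` error, and the `headers_for_attachment` name are all hypothetical, and a real application would use whatever access rules it already has.

```ruby
# Hypothetical error raised when the user may not see the attachment.
class AccessDenied < StandardError; end

# Hypothetical helper: checks access first, then returns the headers
# that tell nginx to serve the attachment from the /couch/ location.
# Assumes the document stores its owner's name in an "owner" field.
def headers_for_attachment(doc, filename, user)
  raise AccessDenied, "#{user} may not read #{doc['_id']}" unless doc['owner'] == user

  metadata = doc['_attachments'].fetch(filename)
  {
    'Content-Type'        => metadata['content_type'],
    'Content-Disposition' => %(inline; filename="#{filename}"),
    'X-Accel-Redirect'    => "/couch/#{doc['_id']}/#{filename}"
  }
end

doc = {
  '_id'          => 'ff1ec7a4d174d60710a95e20294dec4c',
  'owner'        => 'alice',
  '_attachments' => { 'file.pdf' => { 'content_type' => 'application/pdf' } }
}
puts headers_for_attachment(doc, 'file.pdf', 'alice').inspect
```

In a controller, the same check would simply sit before the lines that set response.headers, so a denied request never produces an X-Accel-Redirect header at all.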

Tying it all together

There is only one problem with the above approach: it won’t work in development mode, when nginx is typically not in front of our Rails application servers.

So let’s create a generic serve_attachment method which can be used in either development or production mode:

    def serve_attachment(model, filename)
      case RAILS_ENV
      when 'development'
        # No nginx in front of us: send the browser straight to CouchDB.
        redirect_to model.attachment_url(filename)
      when 'production'
        # Hand the transfer off to nginx via X-Accel-Redirect.
        response.headers['Content-Type'] = model["_attachments"][filename]['content_type']
        response.headers['Content-Disposition'] = "inline; filename=\"#{filename}\""
        response.headers['X-Accel-Redirect'] = "/couch/#{model.id}/#{filename}"
        render :nothing => true
      else
        # Any other environment: fetch the attachment and send it from Rails.
        metadata = model["_attachments"][filename]
        data = model.fetch_attachment(filename)
        send_data(data, {
          :filename    => filename,
          :type        => metadata['content_type'],
          :disposition => "inline"
        })
      end
    end

I stuck this method inside my ApplicationController, but you can put it somewhere else if you prefer.

As written, this method will issue an HTTP redirect directly to the CouchDB server in development mode, on the assumption that anyone looking at the development site will also have direct HTTP access to the development CouchDB server. If this is not the case, you can use the send_data method in development mode as well by removing the 'development' stanza from the case block.
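For reference, the redirect target in development has the same shape as the direct CouchDB URL from earlier: database root, then document id, then attachment name. Here is a small stand-alone sketch of building such a URL. The `COUCH_ROOT` value and the `direct_attachment_url` helper are assumptions of mine for illustration; this is not CouchRest's own implementation, and I haven't checked how CouchRest escapes the pieces.

```ruby
require 'erb'

# Assumed location of the development database; adjust to your setup.
COUCH_ROOT = 'http://localhost:5984/oursite-development'

# Hypothetical helper: joins <database root>/<document id>/<attachment name>,
# percent-encoding the id and the name so spaces and slashes are safe.
def direct_attachment_url(doc_id, filename, root = COUCH_ROOT)
  [root, ERB::Util.url_encode(doc_id), ERB::Util.url_encode(filename)].join('/')
end

puts direct_attachment_url('ff1ec7a4d174d60710a95e20294dec4c', 'my report.pdf')
# prints http://localhost:5984/oursite-development/ff1ec7a4d174d60710a95e20294dec4c/my%20report.pdf
```

If a browser can't open a URL like this against your development CouchDB, that's the sign to fall back to send_data in development.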

So, we’ve now built a simple method which allows us to store attachments in the CouchDB database and serve them to the client without unnecessarily tying up any Ruby resources while doing so. Additionally, we can still perform security, logging, or other checks in Ruby before sending the attachment along.

Download

If you want to use this approach in your own code, you may be interested in downloading the sample nginx.conf snippet and the serve_attachment method.

