Docker in production for the average DevOps

By Martin Rusev

I have been using Docker since August 2013. For the most part I used it as a faster Vagrant, occasionally for running test suites on multiple distros, and for third-party apps with complicated deployments such as Kibana, Elasticsearch and Ghost.

Until recently I hadn't seriously considered Docker for deploying my own apps.

As someone who has automated every step of the provisioning and deployment process, I could not see the value of docker build, docker push and docker pull over a single fab deploy.

On top of that, Docker is a really complicated piece of software to run in production. It is easy to get started on your dev machine, but once you cross the line between dev and production, you immediately see a big gap in operational knowledge.

There are big, enterprise-scale projects for running clusters of containers: Kubernetes, Docker Swarm, Apache Mesos, Marathon (built on top of Mesos), CoreOS, Deis (which runs on top of CoreOS), RancherOS, etc.

I spent some time experimenting with each of these projects, and the learning curve is very steep, even for an experienced DevOps engineer.

I have yet to find a project that makes Docker easy to work with at a small-to-medium scale: a handful of servers and 10-30 containers. In that case you have to build everything yourself with existing tools.


Why would you use Docker?

Is it worth the trouble? Sure, if you are Google or some other large corporation, containers make a lot of sense. What about the rest of the web?

Reason 1 - Isolation

The majority of web apps out there are written in a scripting language: Python, Ruby, PHP, JavaScript, etc. At this point, all of these languages have good package managers. pip, Bundler, npm and Composer are stable and reliable; you can easily specify and install the exact versions of the libraries your app needs.

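These packages still depend on the underlying distro and its system libraries. An app could work on one system but not on another. Sometimes it can stop working on the same machine after a dist-upgrade or even a routine apt-get upgrade. Docker solves that by shipping the app together with the distro userland and system libraries it was built and tested against.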

Reason 2 - Rollbacks

With Docker you always know the exact version and state of your application at any given time.

Switching between environments like beta, staging and production, or between revisions, is a single command.
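A minimal sketch of what "a single command" looks like when every release is an image tag, written in Python to build the `docker run` argument list. The image name `yourwebapp`, the environment-to-tag mapping and the container naming scheme are all hypothetical.

```python
from typing import List, Optional

IMAGE = "yourwebapp"  # hypothetical image name

# Hypothetical mapping of environment -> image tag deployed there.
ENVIRONMENTS = {
    "beta": "1.4.0-rc1",
    "staging": "1.3.2",
    "production": "1.3.1",
}


def run_command(env: str, tag: Optional[str] = None) -> List[str]:
    """Build the `docker run` argv for one environment.

    Passing an explicit `tag` overrides the mapping, which is all a
    rollback amounts to: start the container from an older image.
    """
    image_tag = tag if tag is not None else ENVIRONMENTS[env]
    return [
        "docker", "run", "-d",
        "--name", f"{IMAGE}-{env}",
        f"{IMAGE}:{image_tag}",
    ]


if __name__ == "__main__":
    print(" ".join(run_command("production")))
    # Roll production back to a previous release:
    print(" ".join(run_command("production", tag="1.3.0")))
```

Rolling back is the same call with an older tag. The old container has to be removed first (for example with `docker rm -f yourwebapp-production`), since Docker will not start two containers with the same name.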

Reason 3 - Third-party apps

There are some great self-hosted apps out there, like Sentry, Kibana, Grafana and Ghost, that have a complicated setup. Things can be even more difficult if they are written in an unfamiliar language, say you are a JS programmer and want to set up a PHP app.

Running or updating them is always 1-2 lines of code. A word of caution: keep your database on the server itself. It is much easier to manage and debug that way.

Building containers

Once you decide to use Docker in production and start pushing images across the network, image size becomes a serious issue. In the beginning I was using phusion/baseimage with everything except the database bundled in the same container. For a "Hello World" Django or Rails application you end up with a 700MB image.

Pushing a 700MB image to Docker Hub and then pulling it on your servers takes a lot of time, even with a fast internet connection.
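A back-of-the-envelope sketch of why size matters, assuming every transfer moves the whole image (no cached layers), one push plus one pull per server done one after another, and the same bandwidth throughout. The numbers in the example are illustrative, not measurements.

```python
def transfer_seconds(image_mb: float, servers: int, mb_per_s: float) -> float:
    """Rough time to push an image once and pull it on every server."""
    total_mb = image_mb * (1 + servers)  # one push + one pull per server
    return total_mb / mb_per_s


# Illustrative: 5 servers at 10 MB/s.
print(transfer_seconds(700, 5, 10))  # 420.0 (a fat 700MB image)
print(transfer_seconds(150, 5, 10))  # 90.0  (a slimmer 150MB image)
```

The cost grows with both image size and server count, which is why trimming the image pays off on every deploy.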

You could deploy faster by creating an image tagged yourwebapp:latest and building each new image on top of it:

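# Dockerfile: start each build from the previously built image
FROM yourwebapp:latest
# Shell: build from the current directory and reuse the same tag
docker build -t yourwebapp:latest .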

In this case Docker pushes and pulls only the newest layer. This is exactly what I did in the beginning, and you see these tags all over Docker Hub. That does not mean it is a good idea for web apps.

Docker has a limit of 127 layers per image. This is not a well-known fact, and in most cases you learn about it when you hit the limit. At that point you start looking for third-party tools to "compress" your image and squash its layers. These tools and hacks might work, or they could break everything for you.
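One way to see how close an image is to the limit, sketched in Python: `docker image inspect` prints a JSON array whose `RootFS.Layers` field lists the image's filesystem layers. This assumes a Docker version recent enough to have `docker image inspect` and `RootFS`; the 127 figure is the one quoted above, and the exact ceiling depends on your Docker version and storage driver.

```python
import json
from typing import Any, Dict, List

# Figure quoted in the text; check it against your Docker version.
LAYER_LIMIT = 127


def layer_count(inspect_json: str) -> int:
    """Count filesystem layers in `docker image inspect <image>` output."""
    images: List[Dict[str, Any]] = json.loads(inspect_json)
    return len(images[0]["RootFS"]["Layers"])


def layers_left(inspect_json: str, limit: int = LAYER_LIMIT) -> int:
    """How many more layers fit before hitting `limit`."""
    return limit - layer_count(inspect_json)


# Trimmed, hypothetical sample of inspect output.
sample = json.dumps([{"RootFS": {"Type": "layers",
                                 "Layers": ["sha256:aaa", "sha256:bbb", "sha256:ccc"]}}])
print(layer_count(sample), layers_left(sample))  # 3 124
```

Against a real image, pass it the output of `subprocess.run(["docker", "image", "inspect", "yourwebapp:latest"], capture_output=True, text=True, check=True).stdout`.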

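There are two major issues with layers and tagging your apps with yourwebapp:latest: every build adds layers on top of the previous image, which pushes you toward the layer limit, and a single moving tag never tells you which version of your app is actually running or which one to roll back to.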

One way to solve this is to keep a base image and build every release from it, tagging each image with a GitHub commit hash or a human-readable version.
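A sketch of the tagging half of that approach, assuming each release image is built `FROM` a fixed base image (for example a hypothetical `yourwebapp-base:1.0`) and tagged with either a version string or the short git commit hash. The helper and image names are hypothetical; the tag check follows Docker's tag rules (letters, digits, underscores, dots and dashes, at most 128 characters, not starting with a dot or dash).

```python
import re
from typing import Optional

IMAGE = "yourwebapp"  # hypothetical image name

# Docker tag: a word character first, then word chars, dots or dashes.
TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")
SHA_RE = re.compile(r"[0-9a-f]{7,40}")


def release_tag(commit_sha: str, version: Optional[str] = None) -> str:
    """Return an image reference that identifies exactly one release.

    A human-readable version wins when given; otherwise the first seven
    characters of the git commit hash are used.
    """
    if not SHA_RE.fullmatch(commit_sha):
        raise ValueError(f"not a git commit hash: {commit_sha!r}")
    tag = version if version else commit_sha[:7]
    if not TAG_RE.fullmatch(tag):
        raise ValueError(f"invalid Docker tag: {tag!r}")
    return f"{IMAGE}:{tag}"


sha = "3f2a9c1e8b7d6a5f4e3d2c1b0a9f8e7d6c5b4a39"
print(release_tag(sha))           # yourwebapp:3f2a9c1
print(release_tag(sha, "1.3.1"))  # yourwebapp:1.3.1
```

The result is what you would pass to `docker build -t` and `docker push`; in a real build the hash could come from `git rev-parse HEAD`. Because every release gets its own tag, rolling back means running an older tag rather than rebuilding.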

This brings us back to image size.

Multiple processes per container

Deploying a monolithic Django/Rails/Node.js app with everything running in a single container seems like a great idea at first. It is nice and isolated: you push one container and your app is deployed.

In practice, running multiple processes in a single container means multiple points of failure. Parts of your app could stop working while the container keeps running.

In a regular VM (most of our cloud servers are VMs), you can see what is happening by running top/htop or checking the log files. Inside a container, these trivial tasks are difficult or impossible.

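A single process per container will make your Docker experience so much better: when that process dies, the container stops, so a failure shows up as an exited container instead of hiding inside one that keeps running.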
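With one process per container, "the app died" and "the container exited" become the same event, so a plain status check is enough to spot failures. A minimal Python sketch that parses the output of `docker ps -a --format '{{.Names}} {{.Status}}'`; the sample container names are hypothetical.

```python
from typing import List


def exited_containers(ps_output: str) -> List[str]:
    """Return the names of containers whose status starts with "Exited".

    Expects one "<name> <status>" pair per line, as printed by
        docker ps -a --format '{{.Names}} {{.Status}}'
    Container names cannot contain spaces, so the first space splits
    the name from the status.
    """
    names = []
    for line in ps_output.splitlines():
        name, _, status = line.strip().partition(" ")
        if status.startswith("Exited"):
            names.append(name)
    return names


sample = """web Up 2 hours
worker Exited (1) 5 minutes ago
redis Up 3 days"""
print(exited_containers(sample))  # ['worker']
```

A cron job or monitoring script could alert on any name this returns. Docker can also do the filtering itself with `docker ps -a --filter status=exited`.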

Automating builds

To build an image, you have to create a Dockerfile and then run docker build in the same directory.

The Dockerfile is a static entity: you can't set variables, and it can't access anything outside the current directory. These two limitations make working with Docker really annoying.

To work around these issues, you can use two tools that have been around for a long time: make and bash. Bash can read any variable from the distro or the environment, and Make turns building and pushing into one-liners: make build_container, make deploy.
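The same idea sketched in Python instead of bash, to keep these examples in one language: render the Dockerfile from a template using values from the environment, then run `docker build` on the result. The template, the `yourwebapp-base` image and the `BASE_TAG`/`APP_ENV` variables are hypothetical; in bash the job is often done with a heredoc or `envsubst`.

```python
import os
from string import Template
from typing import Mapping

# Hypothetical template: $BASE_TAG and $APP_ENV are filled in before the build.
DOCKERFILE_TEMPLATE = Template("""\
FROM yourwebapp-base:$BASE_TAG
ENV APP_ENV=$APP_ENV
COPY . /app
""")


def render_dockerfile(values: Mapping[str, str]) -> str:
    """Substitute the variables; a missing one raises KeyError instead of
    silently producing a broken Dockerfile."""
    return DOCKERFILE_TEMPLATE.substitute(values)


if __name__ == "__main__":
    # Read values from the environment, as a bash wrapper would.
    print(render_dockerfile({
        "BASE_TAG": os.environ.get("BASE_TAG", "1.0"),
        "APP_ENV": os.environ.get("APP_ENV", "production"),
    }), end="")
```

A Makefile target like the make build_container above could then run something like `python render_dockerfile.py > Dockerfile && docker build -t yourwebapp:1.3.1 .` (script name and tag hypothetical).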

Deploying & Running containers

You have successfully packaged your app in a container. Now it is finally time to deploy it on your servers.

The good news is that every major provisioning tool has a Docker module. I've tried Salt, Ansible and Puppet. The bad news is that they are not always up to date with the fast-changing Docker API.

To solve this, I found it much easier to interact with the Docker daemon directly from a deployment tool like Fabric or Capistrano. This has one more benefit: securing the Docker remote API is difficult, and since Fabric runs its commands over SSH, you can skip that hurdle entirely.
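A minimal sketch of that approach using plain `ssh` through Python's `subprocess` instead of Fabric (with Fabric 2 the equivalent is roughly `Connection(host).run(command)`). The daemon keeps listening only on its local socket, and SSH handles authentication and encryption. Host, image and container names are hypothetical, and `dry_run` defaults to `True` so nothing runs unless you ask for it.

```python
import shlex
import subprocess
from typing import List


def remote_docker(host: str, *docker_args: str) -> List[str]:
    """Build an ssh argv that runs one docker command on `host`.

    shlex.join quotes each argument so the remote shell sees them intact.
    """
    return ["ssh", host, shlex.join(["docker", *docker_args])]


def deploy(host: str, image: str, name: str, dry_run: bool = True) -> List[List[str]]:
    """Pull `image`, replace the container `name`, return the commands."""
    pull = remote_docker(host, "pull", image)
    remove = remote_docker(host, "rm", "-f", name)  # fails harmlessly if absent
    run = remote_docker(host, "run", "-d", "--name", name, image)
    if not dry_run:
        subprocess.run(pull, check=True)
        subprocess.run(remove, check=False)
        subprocess.run(run, check=True)
    return [pull, remove, run]


if __name__ == "__main__":
    for argv in deploy("deploy@web1.example.com", "yourwebapp:1.3.1", "yourwebapp"):
        print(argv)
```

Looping `deploy` over a list of hosts covers the "handful of servers" case without exposing the Docker API on the network.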

A word on databases: even for third-party apps, I still keep my DB on the host. A DB in a container means dealing with the three least pleasant parts of Docker: linking, networking and data. On top of that, you don't get any of the benefits. Databases are already isolated from the system, they run just fine across distros, and a rollback is a data/schema migration, not a version/state issue.

Redis and Memcached are fine in containers, but for a regular DB like PostgreSQL, MySQL or even MongoDB, I would love to be proven wrong and read a detailed analysis along the lines of: "How we scaled to millions of users by running PostgreSQL in Docker, and it was not possible to do so on a bare metal server".

Some of the examples I gave in this post might not work for you. That does not matter. The point I tried to make is that Docker's tooling leaves much to be desired. It might improve at some point in the future, but you don't have to wait: almost any limitation can be bypassed with a well-established tool, from Bash and Make to your favorite deployment and provisioning tools.