developing in docker

I've been job hunting recently, and in keeping with tradition that means I've been working on some coding homework assignments. For one company, which I was particularly hoping to impress, I got a bit showy and put together a nice containerized environment to work in. I learned most of the techniques for this approach working with a large dev team who built extensive tooling around their containerized dev environment to support over a dozen custom apps and at least as many supporting service containers.

In the weeks since submitting this project I've had a couple of friends mention that they'd like to learn more about working in docker, so I've extracted the good bits and put them in a public repo. Here I'll describe how it works and how to use it.

The Flask App and Configuration

I reached for Flask to build the web app as that's my go-to framework and I wanted to build a strong submission; go with what you know, as they say. I have replaced the business logic from the actual assignment so as not to provide reference material for future applicants, but the structure is the same.

If you're looking to learn Flask there are better projects out there. I recommend Overholt, an oldie but a goody; Cookiecutter Flask, which is geared more towards full web apps than APIs; or my favourite, Flusk, a clean, fairly modern, and well organized Flask boilerplate.

The only thing worth mentioning with respect to docker is the way configuration is handled. There is a bit of extra logic in there to deal with the MySQL replicas (more on that below), but basically it just grabs the value of any environment variable prefixed with HELLO_ and hangs it off the Flask config object. I took this approach because environment variable injection is the baseline approach for passing config into a running container, both in docker-compose and in practically every container orchestration system. This gives us an easy on-ramp to move from dev to prod.
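In case it helps to see the shape of it, here's a minimal sketch of that pattern (the HELLO_ prefix comes from the app; the create_app factory and the prefix stripping are my assumptions about the details):

    import os

    from flask import Flask


    def create_app():
        app = Flask(__name__)
        # Hang every HELLO_-prefixed environment variable off the Flask
        # config, stripping the prefix so e.g. HELLO_MEMCACHED_HOST
        # becomes app.config["MEMCACHED_HOST"].
        prefix = "HELLO_"
        for key, value in os.environ.items():
            if key.startswith(prefix):
                app.config[key[len(prefix):]] = value
        return app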

If you find yourself baking config files into your containers, or having some script in your container fetch a config file from somewhere, you're gonna have a bad time. In that case just go straight to making your app Consul or etcd aware and be done with it.

To make use of the read-only replica I used this flask-replicated extension, which is a bit naive in that it uses the HTTP method, rather than the database operation, to decide which database to run a query against. For example, if you had some user.last_accessed_on datetime field that got updated on every page view (i.e. a write triggered by a GET) this wouldn't cut it, but it gets the job done for this simple app.
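If I recall its README correctly, wiring the extension up looks roughly like this (the URIs here are placeholders standing in for the values injected via the HELLO_ environment variables):

    from flask import Flask
    from flask_replicated import FlaskReplicated
    from flask_sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    app.config["SQLALCHEMY_DATABASE_URI"] = "mysql://master/hello"
    app.config["SQLALCHEMY_BINDS"] = {
        "master": "mysql://master/hello",    # POST/PUT/DELETE requests go here
        "slave": "mysql://replica/hello",    # GET/HEAD requests go here
    }
    db = SQLAlchemy(app)
    FlaskReplicated(app)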

The Dockerfile

One thing of note regarding the dockerfile is the separation of the requirements.txt file (the Python version of a Gemfile or a package.json file) from the rest of the app in terms of layers. Since each ADD statement creates a new layer in the image, and minimizing the number of layers is best practice, this may seem counterintuitive.

The idea here is to speed up build times. The build process will only rebuild those layers which have been modified since the last build; however, it must then rebuild any layers built on top of the modified layer. By placing the requirements.txt layer above the layer for the rest of the app code, we ensure that rebuilding that layer (and the subsequent apt-get install ... pip install ... layer) only happens when the requirements change. Without this separation, every single line of code we changed would mean a tedious rebuild of those layers.
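The skeleton looks something like this (the base image, system packages, and entrypoint are illustrative stand-ins, not the repo's exact contents):

    FROM python:3.6

    # Dependency layers: only rebuilt when requirements.txt changes.
    ADD requirements.txt /app/requirements.txt
    RUN apt-get update && apt-get install -y build-essential && \
        pip install -r /app/requirements.txt

    # App code layer: invalidated by every code change, but cheap to rebuild.
    ADD . /app
    WORKDIR /app

    CMD ["python", "run.py"]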

The good news is that you don't need to rebuild the container every time you change the code. Next we'll look at how to hack in this environment.

The docker-compose and override files

This is where much of the development magic happens: using docker-compose we can stand up all the dependencies our app(s) rely on. In this case that's two MySQL containers in a master/replica configuration (courtesy of Tao Wang) and a memcached container.

There are a few things worth highlighting here. First is the use of the healthcheck and restart directives: the former lets you know if a service becomes unreachable for some reason, and the latter tries to restart it if it stops. Useful in dev, where trying weird stuff is a common occurrence.
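On a MySQL service that might look something like this (the image tag and timing values are arbitrary choices for illustration):

    services:
      master:
        image: mysql:5.7
        restart: on-failure    # bring the container back if it dies
        healthcheck:           # mark the service unhealthy if it stops answering
          test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
          interval: 10s
          timeout: 5s
          retries: 5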

Next, and more important, is the use of explicit app-level configuration for connecting to the services in the supporting containers; in this specific case that means environment variables for the memcached host and the MySQL URI strings, but it could be any app-level config.
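In the compose file that boils down to an environment block on the app service, along these lines (the variable names follow the HELLO_ prefix convention from earlier; the credentials and service names are placeholders):

    services:
      hello:
        build: .
        depends_on:
          - master
          - replica
          - memcached
        environment:
          HELLO_SQLALCHEMY_DATABASE_URI: mysql://hello:secret@master/hello
          HELLO_SQLALCHEMY_SLAVE_URI: mysql://hello:secret@replica/hello
          HELLO_MEMCACHED_HOST: memcached:11211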

When linking containers via the docker-compose depends_on mechanism, the Hello app could simply default to looking for the hostnames master or memcached, which would resolve to the correct containers. However, the pattern of relying on code-level dev-default values, be they service dependencies, feature flags, or basically anything that might differ in production, creates a minefield of unknown unknowns when it comes time to ship your containers.

By explicitly specifying these configurations during development, we have a roadmap to follow when we deploy to Kubernetes or ECS or whatever else down the road. Believe me when I say that reverse engineering this sort of config without a guide sucks.

Finally, we should look at the docker-compose override file. By default docker-compose will parse the main docker-compose.yml file and then update the config it finds there with any additions or changes it finds in docker-compose.override.yml.

We can leverage this mechanism to provide a nice developer experience. The primary docker-compose file is set up on the assumption that every container in the stack will behave normally (that is, start running the app it hosts) when it comes up. The override file then knocks out any container we care to hack on: it replaces the command directive so that rather than running the app the container just stays up indefinitely, and adds a volumes directive so that rather than using the source baked into the image the container reads our local copy on the host OS, letting us hack with our preferred editor.
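A minimal override to that effect might look like this (the service name and mount path are illustrative; any command that blocks forever will do in place of sleep):

    # docker-compose.override.yml
    services:
      hello:
        command: sleep infinity    # keep the container up instead of starting the app
        volumes:
          - .:/app                 # read source from the host, not the copy baked into the image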

This turns the container into our dev system, hooked to all the dependency containers, isolated from our host OS, and fully loaded with all the libraries our app depends on.

We can then hop into the running container to interact with our code as we update it by using the docker exec -it <container_id> /bin/bash command. Or, in this case, we can use the make target built for just this purpose and instead run: make dev.

The Makefile

The Makefile provides a lot of convenience targets for interacting with the dev environment. The same could be done with any similar tool, like rake or grunt or yarn or whatever the cool kids are using now.
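The make dev target mentioned above, for instance, reduces to something like this one-liner (assuming the compose service is named hello; note that Makefile recipe lines must be indented with a tab):

    dev:
    	docker-compose exec hello /bin/bash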

Some useful patterns are things like the DB migration target make setup-db. This starts up the db containers, then manually runs the app container with the necessary parameters to link with the databases and execute the initial migrations. This could be done with yet another docker-compose override file, but those grow numerous quite quickly. Note that this pattern is the reason the docker network is created externally (by make setup) rather than implicitly by docker-compose: it lets us link standalone containers to those running in docker-compose.
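Sketched out, the pattern looks something like this (the network name, image name, and migration command are placeholders, not the repo's exact contents):

    setup:
    	docker network create hello_net

    setup-db: setup
    	docker-compose up -d master replica
    	# Run the app image as a one-off container on the shared network
    	# so it can reach the databases started by docker-compose.
    	docker run --rm --network hello_net \
    		-e HELLO_SQLALCHEMY_DATABASE_URI=mysql://hello:secret@master/hello \
    		hello-app python manage.py db upgrade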

Other handy dev targets are make testdata, which generates canned API calls to our app, and make nuke, which completely blows the environment away when we inevitably screw it all up.

Final Thoughts

As usual this is mostly an exercise in capturing my thoughts for future reference, but hopefully someone besides future me will find it helpful. I intend to use this repo as boilerplate for new projects, so it should see at least a bit of upkeep here and there as I hack on stuff. I have also been tinkering with redeploying some of my personal projects in containers, so I will likely have a follow-up post sooner or later about the trip from dev to prod.

Also, I got the job so I guess I must have done something right!