Nonuby's developer blog

Full stack polyglot developer (js/python/c#/bash/clojure), working on a Heroku clone in my free time

What Happens When You Push to Heroku?


Kind of: this is what happens when you push to openruko, an in-development, MIT-licensed, open-source Heroku implementation.

This blog post assumes some familiarity with git, ssh and Linux, and ideally some previous experience with Heroku or other cloud deployment platforms such as Cloud Foundry or dotCloud, or at least the Twelve-Factor app methodology.

When you git push to openruko, openruko's git pipeline takes a number of interesting steps to prepare your code for execution on the openruko dyno infrastructure. In this blog post we'll walk through the process up to the point where your code is sent to the dyno infrastructure. I had to start blogging this now so that I publicly commit to finishing it.

Getting started

Openruko relies on the Heroku client tools for management operations. The Heroku command line interface uses the HEROKU_HOST environment variable to decide which API server to talk to; setting it to our own host makes heroku talk to the openruko API server instead. We neatly wrap this functionality in a bash script somewhere on our PATH.

# save as 'openruko' somewhere in your PATH, and give +x permissions
export HEROKU_HOST="" # set to your openruko host; if no protocol is given, the client prepends https://api.
export HEROKU_SSL_VERIFY="disable" # I don't have a trusted cert yet
exec heroku "$@"

When you first log in to openruko using openruko login, the heroku client tool uploads your public key to the openruko server. You can list, add and remove the stored keys using openruko keys:{list,add,remove} from the command line.

When you create an openruko application on top of a local git repository, the client tools also add the correct remote target to your git repo configuration. After running openruko create you can see that the openruko remote has been added with git remote -v.

When you git push to openruko, openruko's ssh/git stack first has to authenticate you. When git connects to the remote repo it does this over ssh implicitly; to transport the git payload over the network it invokes something similar to this:

ssh git@<openruko host> "git-receive-pack 'my-app.git'"

Determining the real user when everyone connects with git@

Openruko uses an ssh stack that relies on public key rather than password authentication: it matches your initial authentication payload, signed with your local private key, against the corresponding public key entry in authorized_keys. This is a linear scan, but it is extremely fast with fewer than 1,000 entries; it is understood that both Heroku and GitHub have optimized this part with a precomputed database lookup instead (how they do this I admit I don't yet understand, crypto newbie).

The matched public key entry contains a command directive (see man sshd) that overrides any command provided by the client. We use this to enforce that all requests are dispatched to our git pipeline; the original command remains available in the environment variable SSH_ORIGINAL_COMMAND.

command="openruko-serve matt" ssh-rsa AAAAjfdsdfycdfjdfdse63qP= matt@laptop
command="openruko-serve davesmith" ssh-rsa AAAAsd343xfdvvse63qb= dave@window7
command="openruko-serve freddoe" ssh-rsa AAAAjfdoieorwpe88qx= freddoe@archlinux

In a real-world setup, the options would also contain additional directives to disallow tunnelling and disable pty allocation; the flexibility of the openssh server configuration allows us to bootstrap a locked-down environment from the very start of the connection.
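For example, an entry could combine the command directive with sshd's standard restriction options, where no-pty disables pty allocation and the no-*-forwarding options disallow tunnelling (the key material below is the same placeholder as above):

```
command="openruko-serve matt",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding ssh-rsa AAAAjfdsdfycdfjdfdse63qP= matt@laptop
```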

User/repo authorization

Now that control has passed to our custom openruko-serve script, we seek authorization. In this stage openruko uses two pieces of information:

  • the repository name at the end of SSH_ORIGINAL_COMMAND
  • the unique user identifier passed as an argument in the command directive

We verify this against our primary database by connecting to openruko's RESTful API server using curl and checking the returned status code.

We avoid embedding effective repo-to-user information on the git server because this data can be fairly dynamic as collaborators are added and removed from projects; as on Heroku, repo-to-user is not a 1-to-1 mapping.

We might also desire distinct read and write permissions. The original command determines which we need: if git-receive-pack is the target command, the client is doing a push and we need write access; if git-upload-pack is the target command, the client is doing a pull and we need read access (this might be confusing, but consider that the client/server relationship really only exists at the SSH level). On Heroku any collaborator has both read and write permissions, so for simplicity we follow the same rules.
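As a rough sketch (function and variable names here are illustrative, not openruko's actual code), the serve script can derive both the repository name and the required permission from SSH_ORIGINAL_COMMAND with plain parameter expansion:

```shell
# Illustrative only: derive repo name and required permission
# from a command like: git-receive-pack 'my-app.git'
parse_git_command() {
  cmd="$1"
  verb="${cmd%% *}"                    # git-receive-pack or git-upload-pack
  repo="${cmd#* }"                     # 'my-app.git'
  repo="${repo#\'}"; repo="${repo%\'}" # strip surrounding quotes
  repo="${repo%.git}"                  # strip .git suffix
  case "$verb" in
    git-receive-pack) perm=write ;;    # push
    git-upload-pack)  perm=read  ;;    # pull / clone
    *) echo " ! unsupported command" >&2; return 1 ;;
  esac
  echo "$repo $perm"
}
```

openruko-serve would then pass the user id from its argument, plus this repo/permission pair, to the API server for the actual authorization decision.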

If we fail to authorize, openruko prints to stderr a message such as " ! not an authorized user" and exits with an exit code of 1. The client displays this error and does not update the local refs/remotes/openruko; we're kicked out, game over.

We need more information: metadata

The git nodes are pretty dumb and have very little knowledge of the apps. Once authenticated and authorized, openruko fetches the metadata for your repository from the API server; this metadata payload contains several interesting attributes:

  • a storage url to fetch the bare git repository
  • the lxc template to use for the build (we have only one template for now)
  • a storage url to put the repository when finished
  • a storage url to put the compiled slug
  • environment variables relevant to the build
  • an API url to call to release the new code to the dynos
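A hypothetical payload might look something like this (field names are purely illustrative: I am sketching the shape, not openruko's actual schema):

```json
{
  "repo_get_url": "file:///var/openruko/storage/my-app/repo.tar.gz",
  "repo_put_url": "file:///var/openruko/storage/my-app/repo.tar.gz",
  "slug_put_url": "file:///var/openruko/storage/my-app/slug.tar.gz",
  "lxc_template": "default",
  "env": { "BUILDPACK_URL": "" },
  "release_url": "..."
}
```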

Heroku uses Amazon S3 for storage, and the URLs provided by the metadata already contain the id and auth signature. In openruko we use file:/// for now, but by keeping to the same principles enforced by Amazon S3 it should be trivial to swap in OpenStack or S3 object storage when required.

Preparing the container

The builds need to run in an isolated environment, both so we can regulate resources and because of the potentially dangerous commands that could be executed by a custom buildpack. We use a lightweight virtualization technology called LXC, which is already integrated into the Linux kernel; LXC containers can be started in under a second, and under certain circumstances many hundreds can run efficiently on a single physical node.

LXC is better suited to our task than KVM/Xen because it is not a full paravirtualized solution: if the programs and libraries in the virtualized filesystem point at the same underlying disk nodes, the kernel is intelligent enough to map the read-only portion (the executable data) only once, so in effect the footprint of a container is often only the private data memory plus some small overhead. We still use a non-privileged user within the container because there apparently exists a potential for root to escape (treat root within the container as being as dangerous as root on the host), and also because we really don't need root.

The LXC container is initialized using an lxc create script; the metadata we downloaded earlier tells us which create script to use (only one at the moment, a YAGNI violation admittedly). In our setup each LXC container has its own root filesystem that shares /bin, /usr, /etc and /lib with the host root filesystem; these are mounted read-only.
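In the classic lxc config format, that layout can be sketched roughly like this (paths are illustrative, and the exact keys vary between LXC versions):

```
lxc.rootfs = /var/lib/openruko/con1234/rootfs
# share host directories read-only inside the container
lxc.mount.entry = /bin bin none ro,bind 0 0
lxc.mount.entry = /usr usr none ro,bind 0 0
lxc.mount.entry = /etc etc none ro,bind 0 0
lxc.mount.entry = /lib lib none ro,bind 0 0
```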

Emulating Heroku's setup, the writable portions are /app and /tmp: specifically /app/tmp/repo.git, and /tmp/tmp_build for the slug staging. On a multi-tenant setup it makes sense to restrict the space used during this process, but the options here are vast and would require several blog posts in themselves; I am looking into LVM, Btrfs and ZFS for efficient lightweight partitioning.

To communicate with the container we also set up sshd inside it, by generating a /app/tmp/ssh/sshd_config and setting the container's /sbin/init to launch sshd with the config below, so when we start the container via lxc-start we have a running ssh instance within a few hundred milliseconds. Leveraging SSH internally in this manner allows the actual builds to run on different nodes from the one that received the original connection.

    cat <<EOF > $rootfs/app/tmp/ssh/sshd_config
    Port $sshport
    Protocol 2
    AuthorizedKeysFile /app/tmp/ssh/authorized_keys
    PasswordAuthentication no
    ChallengeResponseAuthentication no
    UsePAM no
    PermitRootLogin no
    AllowUsers $builduser
    LoginGraceTime 20
    HostKey /etc/ssh/ssh_host_rsa_key
    UsePrivilegeSeparation no
    PermitUserEnvironment yes
    UseDNS no
    PrintLastLog no
    EOF

Getting the repository

The bare git repository is stored in the storage service as a single tar.gz file; this is downloaded into the container and extracted to /app/tmp/repo.git.

We then mount the git hooks directory onto the extracted repository. Rather than persist the hooks with the repository we add them just-in-time; this allows us to update our pre-receive and other git hooks without friction as we evolve the platform.

Over to the container

The container is started using lxc-start, and once sshd has bound to the assigned port we're ready to go; we connect to the container over ssh.

ssh -i /host/containerkeys/con1234 -p7777 user1234@bridgednetwork1.local git-receive-pack /app/tmp/repo.git

We specify a few other options (typically things like StrictHostKeyChecking=no, UserKnownHostsFile=/dev/null and a quiet LogLevel) to stop strict host checking warnings corrupting the output stream. All the work before connecting over ssh must be effectively silent, because the git client isn't expecting any noise or pollution from the processes we've sneaked in before git-receive-pack runs.

The pre-receive hook

After the payload has been uploaded, but before the refs are advanced, the pre-receive hook runs. When a master ref is received we do a real git checkout of that rev to /app/tmp_build, since we're currently working against a bare repository. At this stage it is a detached HEAD, because we haven't updated the remote refs yet, so we suppress any warning about this since it is a deliberate action.
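A skeleton of such a hook might look like this (a sketch under assumptions, not openruko's actual hook; git feeds pre-receive one "oldrev newrev refname" line per updated ref on stdin):

```shell
# Illustrative pre-receive skeleton: check out a pushed master rev into a build dir.
handle_refs() {
  build_dir="$1"
  while read -r oldrev newrev refname; do
    [ "$refname" = "refs/heads/master" ] || continue
    mkdir -p "$build_dir"
    # the bare repo has no work tree, so supply one explicitly;
    # -q suppresses the detached HEAD notice, which is deliberate here
    git --work-tree="$build_dir" checkout -qf "$newrev"
  done
}
```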

We then need to determine which buildpack to use. If the environment variable BUILDPACK_URL, provided by the metadata, is present then the decision is already made and we simply git clone BUILDPACK_URL to /app/tmp_buildpack; otherwise we iterate through /app/buildpacks/ calling bin/detect on each buildpack until we get a 0 exit code.
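The detection loop can be sketched as follows (the directory layout and function name are assumptions for illustration):

```shell
# Illustrative only: try each buildpack's bin/detect until one accepts the app.
detect_buildpack() {
  build_dir="$1"; packs_dir="$2"
  for pack in "$packs_dir"/*/; do
    # a 0 exit code from bin/detect means this buildpack can handle the app
    if "$pack"bin/detect "$build_dir" >/dev/null 2>&1; then
      echo "$pack"
      return 0
    fi
  done
  return 1   # nothing matched and no BUILDPACK_URL: the push fails
}
```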

If we haven't failed and exited (we fail only when there is no custom buildpack and no detection hits) then we have successfully selected a buildpack.

Now we run buildpack/compile, passing in the directory of the checkout done earlier and another directory used for caching build dependencies.


The compilation (if required), downloading of dependencies, compression of static resources and so on all happen during the buildpack run. More information can be found in Heroku's buildpack documentation. As mentioned earlier, a custom buildpack can also be supplied by setting the BUILDPACK_URL property:

openruko config:add BUILDPACK_URL=

Wrapping up - Send to storage

After the buildpack has run and exited cleanly with no errors, we tar and gzip the prepared app to /tmp/slug.tar.gz, excluding patterns listed in any .slugignore as well as the .git directory. We validate that the slug size is less than 200MB, print the size of the slug so it shows in the git push output, and upload it to the storage service using the pre-authenticated slug PUT URL provided in the metadata downloaded earlier.
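A rough sketch of the packaging step (function name, paths and output format are illustrative, not openruko's actual code):

```shell
# Illustrative only: tar+gzip the build output, honouring .slugignore
# and always excluding the .git directory.
make_slug() {
  app_dir="$1"; out="$2"
  set -- --exclude=./.git
  if [ -f "$app_dir/.slugignore" ]; then
    while IFS= read -r pattern; do
      [ -n "$pattern" ] && set -- "$@" "--exclude=./$pattern"
    done < "$app_dir/.slugignore"
  fi
  tar -czf "$out" -C "$app_dir" "$@" .
  # enforce the 200MB limit and report the size for the push output
  size=$(wc -c < "$out")
  [ "$size" -lt $((200 * 1024 * 1024)) ] || { echo " ! slug too large" >&2; return 1; }
  echo "-----> Compiled slug size is $((size / 1024))K"
}
```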

We also tar and gzip the repository directory at /app/tmp/repo.git and upload it to the storage service. This may include new artifacts stored in the cache directory during the buildpack run; these travel with the repo but are not logically part of it and are not versioned.

A note of interest: Heroku previously used squashfs for its slug output but now seems to use just a gzipped tar file. I'm not sure of the reason why, but presumably squashfs provided no tangible benefit versus the weight of bringing in somewhat exotic tooling.

Releasing code

We then call the RESTful API server with details of the process types detected in the Procfile (or the defaults provided by bin/release in the buildpack in the absence of a ./Procfile). The openruko API server then coordinates with the dyno infrastructure to download the slug, launch the new code, terminate the processes based on the old code, and update the routing infrastructure (a fancy reverse proxy) to point to the new code.
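Extracting the process types from a Procfile is a simple parse; a hedged sketch (the real release step also merges defaults from the buildpack's bin/release):

```shell
# Illustrative only: print "type command" pairs from a Procfile.
procfile_types() {
  while IFS= read -r line; do
    case "$line" in
      ''|'#'*) continue ;;              # skip blank lines and comments
    esac
    type="${line%%:*}"                  # e.g. web, worker
    cmd=$(printf '%s' "${line#*:}" | sed 's/^[[:space:]]*//')
    echo "$type $cmd"
  done < "$1"
}
```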

All done

All done, exit 0. Our master reference is now advanced to the rev received.


This has been an overview of the push process of openruko, a deployment platform that aims to be compatible with Heroku, its client tools and buildpacks, intended to allow teams of trusted members, and individuals, to run their own small Heroku-like clusters on commodity VPS, dedicated server and cloud providers. Openruko is an in-development Heroku implementation which will be freely available under the MIT license and open-sourced on GitHub! Thanks for reading.

Also worth reading:

  • Quora: How does Heroku work
  • Twelve-Factor app methodology