Some config won’t vary between deploys (including your local deploy), so there is nothing to discourage a second level of configuration that is more application-centric.
For example, an ecommerce store will have dozens, if not hundreds, of settings related to aspects such as tax and delivery, and it makes more sense to store these
in a database.
With the database example you still have flexibility, as you can point to different database instances via the env vars or, if preferred, include something to swap schema
(a DATABASE_SCHEMA env var), which gives you finer control for trying different functions/views that mask the originals. If you’re using
Postgres on Heroku you have even more luxuries, such as fork and follow, to explore.
In summary, I’d argue it’s probably a design smell if you have more than 15 env var config values,
or if your application has to parse env var strings to integers/bools for anything other than the binding port, the location of dependencies such as RedisToGo, and enabling/disabling debugging.
Still, it should be easier to copy environment variables. Given that the Heroku toolbelt includes foreman, I think it should be possible to
do a dump compatible with foreman, i.e. something like heroku config:dump > .env.
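Until something like that exists, a rough approximation is possible with what the client already offers. The sketch below assumes the client's `heroku config --shell` output (KEY=value lines) and filters it into a foreman-style .env file; `to_dotenv` is a hypothetical helper name, and config:dump itself remains a wish rather than a real command.

```shell
# Keep only well-formed KEY=value lines from stdin, dropping blank lines
# and anything that does not look like a variable assignment.
to_dotenv() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*='
}

# Usage (assumes a logged-in heroku client and an app in the current directory):
#   heroku config --shell | to_dotenv > .env
```

Values that contain newlines would need more care than this filter gives them.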
With regards to versioning, a great thing about Heroku is that releases are immutable: you can inspect the environment variables of
past releases with heroku releases:info v21, for example. If someone is a collaborator on the project they already have access
to any config that has ever been deployed (which comes with its own caveats).
Kind of. This is what happens when you push to openruko, an in-development, MIT-licensed, open-source heroku implementation.
This blog post assumes some familiarity with git, ssh and linux, and ideally some previous experience with heroku or other cloud deployment platforms such as cloudfoundry or dotcloud, or at least
with the Twelve-Factor app methodology.
When you git push to openruko, openruko’s git pipeline takes a number of interesting steps in preparing your code for execution on the openruko dyno infrastructure. In this blog post we’ll go through the process up to sending your code to the dyno infrastructure. I had to start blogging this now so I publicly commit to finishing it.
Getting started
Openruko relies on the heroku client tools for management operations. The heroku command line interface uses the HEROKU_HOST environment variable to know which API server to talk to. Setting this to openruko.com tells heroku to talk to https://api.openruko.com instead. We neatly wrap this functionality in a bash script somewhere on our path.
    #!/bin/bash
    # save as 'openruko' somewhere in your PATH, and give +x permissions
    export HEROKU_HOST="openruko.com" # if no protocol it prepends https://api.
    export HEROKU_STATUS_HOST="openruko.com"
    export HEROKU_SSL_VERIFY="disable" # I don't have a trusted cert yet
    # "$@" forwards each argument intact, even if it contains spaces
    exec heroku "$@"
When you first log in to openruko using openruko login, the heroku client tool uploads your public key to the openruko server. You can verify the keys stored, and add or remove keys, using openruko keys:{list,add,remove} from the command line.
When you create an openruko application on top of a local git repository the client tools also add the correct remote target to your git repo configuration. After running openruko create you can see the openruko remote has been added with git remote -v.
When you git push to openruko, openruko’s ssh/git stack first has to authenticate you. When git connects to the remote repo it does this over ssh implicitly: to transport the git payload over the network it runs ssh against the remote host and asks it to execute a git command, git-receive-pack in the case of a push, with the repository path as its argument.
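As a rough sketch of that invocation, the helper below prints the command line git ends up running for an ssh remote. The host and repository name are illustrative values, and `git_ssh_invocation` is a made-up name for this post, not something git or openruko ships.

```shell
# Print the ssh command git would run for a push or a fetch.
git_ssh_invocation() {
  host="$1"; repo="$2"; direction="$3"
  case "$direction" in
    push)  service="git-receive-pack" ;;  # server receives our objects
    fetch) service="git-upload-pack" ;;   # server uploads objects to us
    *) echo "unknown direction: $direction" >&2; return 1 ;;
  esac
  printf "ssh git@%s \"%s '%s'\"\n" "$host" "$service" "$repo"
}

git_ssh_invocation openruko.com myapp.git push
```

Everything after the host is what later shows up on the server as the requested command.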
Determining the real user when everyone connects with git@
Openruko uses an ssh stack that relies on public key rather than password authentication. It verifies the authentication payload signed with your local private key against the corresponding public key entry in authorized_keys. This is a linear scan but extremely fast with fewer than 1,000 entries. It is understood that both Heroku and Github have optimized this part with a precomputed database lookup instead (how they do this I admit I don’t yet understand, crypto newbie).
The matched public key entry contains a command directive (see man sshd) that overrides any command provided by the client. We use this to ensure all requests are dispatched to our git pipeline; the original command remains available in the environment variable SSH_ORIGINAL_COMMAND.
In a real world setup, the options would also contain additional directives to disallow tunnelling and disable pty allocation. The flexibility of the openssh server configuration allows us to bootstrap a locked down environment from the start of the connection.
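To make the directive concrete, here is a minimal sketch that builds one authorized_keys line per user. The script path and the idea of passing a numeric user id are assumptions for illustration (the script name is spelled as it appears in this post); the restriction options are standard sshd authorized_keys options.

```shell
# Build one authorized_keys line that forces every connection through the
# serve script and switches off forwarding and pty allocation.
# /usr/local/bin is an assumed install location.
authorized_keys_entry() {
  user_id="$1"; public_key="$2"
  printf 'command="/usr/local/bin/openroku-serve %s",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty %s\n' \
    "$user_id" "$public_key"
}

# Usage: authorized_keys_entry 42 "ssh-rsa AAAA... me@laptop" >> ~git/.ssh/authorized_keys
```

Because sshd runs the forced command regardless of what the client asked for, the user id baked into the line is what tells us who is really behind git@.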
User/repo authorization
Now that we’ve passed control to our custom openroku-serve script, we seek authorization. In this stage openruko uses two pieces of information:
the repository name at the end of SSH_ORIGINAL_COMMAND
the unique user identifier passed as an argument in the command directive
We verify this against our primary database by connecting to openruko’s RESTful API server using curl and checking the returned status code.
We avoid embedding repo-to-user information on the git server because this can be fairly dynamic data as collaborators are added to and removed from projects. As on heroku, repo to user is not a 1-to-1 mapping.
We might also desire distinct read and write permissions, and the original command determines which one we need. If git-receive-pack is the target command then the client is doing a push and we need write access; if git-upload-pack is the target command then the client is doing a pull and we need read access (this might be confusing, but consider that the client/server relationship really only exists at the SSH level). On heroku any collaborator has both read and write permissions, so for simplicity we follow the same rules.
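A small sketch of that decision, assuming the original command arrives in the usual `git-receive-pack 'repo.git'` form. Both function names are hypothetical helpers, not part of openruko.

```shell
# Map the original git command to the access level it needs.
required_access() {
  case "$1" in
    "git-receive-pack "*) echo write ;;  # client is pushing
    "git-upload-pack "*)  echo read ;;   # client is pulling
    *) return 1 ;;                       # anything else is refused
  esac
}

# Extract the repository name: drop the command word, the surrounding
# single quotes and any leading slash.
repo_name() {
  printf '%s\n' "${1#* }" | tr -d "'" | sed 's#^/##'
}

# Usage: required_access "$SSH_ORIGINAL_COMMAND" && repo_name "$SSH_ORIGINAL_COMMAND"
```

Refusing anything that is not one of the two git commands also means the forced command can never be talked into running something else.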
If authorization fails, openruko prints a message such as “ ! not an authorized user” to stderr and exits with an exit code of 1. The client displays this error and does not update the local refs/remotes/openruko, and we’re kicked out, game over.
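The check and its failure path could be sketched like this. The endpoint path and query parameters are invented for illustration, since the real API layout isn't covered here, and treating only a 200 as success is likewise an assumption.

```shell
# Hypothetical endpoint layout: <api base>/internal/authorize?user=..&repo=..
auth_url() {
  printf '%s/internal/authorize?user=%s&repo=%s\n' "$1" "$2" "$3"
}

# Only a 200 response counts as authorized in this sketch.
status_allows() {
  [ "$1" = "200" ]
}

# Ask the API server and report failure the way the git client will show it.
authorize() {
  status="$(curl --silent --output /dev/null --write-out '%{http_code}' \
    "$(auth_url "$1" "$2" "$3")")"
  if status_allows "$status"; then return 0; fi
  echo " ! not an authorized user" >&2
  return 1
}

# Usage: authorize https://api.openruko.com "$user_id" "$repo" || exit 1
```

Anything written to stderr travels back over the ssh connection, which is why the message turns up in the user's terminal.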
We need more information, metadata.
The git nodes are pretty dumb and have very little knowledge of the apps. Once authenticated and authorized, openruko fetches the metadata for your repository from the API server. This metadata payload contains several interesting attributes:
a storage url to fetch the bare git repository
the lxc template to use for the build (we have only one template for now)
a storage url to put the repository when finished
a storage url to put the compiled slug
environment variables relevant to the build
an API url to call to release the new code to the dynos
Heroku uses Amazon S3 for storage, and the URL provided by the metadata already contains the id and auth signature. In openruko we use file:/// for now, but by keeping to the same principles enforced by Amazon S3 it should be trivial to swap to OpenStack or S3 object storage when required.
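One way to keep that swap trivial is to hide the URL scheme behind a pair of helpers. This is only a sketch under the assumption that non-file URLs are pre-signed and need no extra credentials; `storage_get` and `storage_put` are hypothetical names.

```shell
# Fetch a storage URL to a local path. file:// URLs are plain copies;
# everything else is handed to curl.
storage_get() {
  url="$1"; dest="$2"
  case "$url" in
    file://*) cp "${url#file://}" "$dest" ;;
    *)        curl --silent --fail --output "$dest" "$url" ;;
  esac
}

# Store a local file at a storage URL (an HTTP PUT for remote storage).
storage_put() {
  src="$1"; url="$2"
  case "$url" in
    file://*) cp "$src" "${url#file://}" ;;
    *)        curl --silent --fail --upload-file "$src" "$url" ;;
  esac
}
```

The rest of the pipeline only ever sees URLs, so moving to object storage would mean changing the metadata, not the build scripts.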
Preparing the container
The builds need to run in an isolated environment, both so we can regulate resources and because of the potentially dangerous commands that could be executed by a custom buildpack. We use a lightweight virtualization technology called LXC, which is already integrated into the Linux kernel. LXC containers can be started in under a second, and under certain circumstances many hundreds can run efficiently on a single physical node.
LXC is better suited to our task than KVM/Xen because it is not a full paravirtualized solution. If the programs and libraries in the virtualized filesystems point to the same underlying disk inodes, the kernel is intelligent enough to map the read-only portion (the executable data) only once, so in effect the footprint of a container is often only the private data memory plus some small overhead. We still use a non-privileged user within the container, because there apparently exists a potential for root to escape (treat root within the container as being as dangerous as root on the host) and also because we really don’t need root.
The LXC container is initialized using an lxc create script; the metadata we downloaded earlier tells us which one to use (only one at the moment, a YAGNI violation admittedly). In our setup each LXC container has its own root filesystem that shares /bin, /usr, /etc and /lib with the host root filesystem, and these are mounted read-only.
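For a feel of what the shared read-only directories look like in a container config, the sketch below emits fstab-style lxc.mount.entry lines. The rootfs path is an example value and the function name is made up; the actual create script may well do this differently.

```shell
# Emit read-only bind mount entries for the directories shared with the host.
lxc_shared_mounts() {
  rootfs="$1"
  for dir in bin usr etc lib; do
    printf 'lxc.mount.entry = /%s %s/%s none ro,bind 0 0\n' "$dir" "$rootfs" "$dir"
  done
}

# Usage: lxc_shared_mounts /var/lib/lxc/build1/rootfs >> /var/lib/lxc/build1/config
```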
Emulating Heroku’s setup, the writable portions are in /app and /tmp, specifically /app/tmp/repo.git and /tmp/tmp_build for the slug staging. On a multi-tenant setup it makes sense to restrict the space used during this process, but the options here are vast and would require several blog posts in themselves; I’m looking into LVM, Btrfs and ZFS for efficient lightweight partitioning.
To communicate with the container we also set up sshd inside the container by generating /app/tmp/ssh/sshd_config and setting the container’s /sbin/init to launch sshd with that config, so when we start the container via lxc-start we have a running ssh instance in a few hundred milliseconds. Leveraging SSH internally in this manner allows the actual builds to run on different nodes than that of the original connection.
    cat <<EOF > $rootfs/app/tmp/ssh/sshd_config
    Port $sshport
    Protocol 2
    AuthorizedKeysFile /app/tmp/ssh/authorized_keys
    PasswordAuthentication no
    ChallengeResponseAuthentication no
    UsePAM no
    PermitRootLogin no
    AllowUsers $builduser
    LoginGraceTime 20
    HostKey /etc/ssh/ssh_host_rsa_key
    UsePrivilegeSeparation no
    PermitUserEnvironment yes
    UseDNS no
    PrintLastLog no
    EOF
Getting the repository
The bare git repository is stored in the storage service as a single tar.gz file. This is downloaded, made available inside the container, and extracted to /app/tmp/repo.git.
We then mount the git hooks directory onto the extracted repository. Rather than persist the hooks with the repository we add these just-in-time, which allows us to update our pre-receive and other git hooks without friction as we evolve the platform.
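A simplified sketch of this step follows. Note that it copies the hooks into place rather than mounting them, purely so it can run without root; the effect of always using the platform's current hooks is the same. `unpack_repo` and its arguments are hypothetical.

```shell
# Extract the stored repository archive and attach the current platform hooks.
# Copying stands in for the bind mount described above.
unpack_repo() {
  archive="$1"; dest="$2"; hooks_src="$3"
  mkdir -p "$dest" &&
  tar -xzf "$archive" -C "$dest" &&
  rm -rf "$dest/hooks" &&          # discard whatever hooks were stored
  cp -R "$hooks_src" "$dest/hooks" # use the platform's latest hooks instead
}

# Usage: unpack_repo /tmp/repo.tar.gz /app/tmp/repo.git /path/to/platform/hooks
```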
Over to the container
The container is started using lxc-start, and once ssh has bound to the assigned port we’re ready to go: we connect to the container over ssh.
We specify a few other options to stop strict host checking warnings from corrupting the output stream. All the work before connecting over ssh is effectively silent, because the git client isn’t expecting any noise/pollution from the processes we’ve sneaked in before git-receive-pack runs.
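The connection might look something like the command printed below. The exact option set openruko uses isn't listed in this post, so these are standard ssh client options that achieve the quiet behaviour described, and 127.0.0.1 assumes the container runs on the same node.

```shell
# Print the ssh command used to hand the original git command to the container.
# Host key checks are skipped because containers are created fresh each time;
# LogLevel=ERROR keeps warnings out of the git protocol stream.
container_ssh_command() {
  printf 'ssh -p %s -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=ERROR %s@127.0.0.1 "%s"\n' \
    "$1" "$2" "$3"
}

# Usage: container_ssh_command "$sshport" "$builduser" "$SSH_ORIGINAL_COMMAND"
```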
The pre-receive hook
After the payload has been uploaded but before the refs are advanced, the pre-receive hook runs. When a master ref is received, we do a real git checkout of that rev to /app/tmp_build, since we’re currently working against a bare repository. At this stage it is a detached head because we haven’t updated the remote refs yet, so we hide any warning about this since it is a deliberate action.
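As a sketch of the hook's first job: git feeds pre-receive one line per pushed ref on stdin, in the form old sha, new sha, ref name. The helper names below are hypothetical, and the checkout function is one plausible way to do the checkout from a bare repository, not necessarily how openruko does it.

```shell
# Read "<old-sha> <new-sha> <ref>" lines from stdin and print the new sha
# for master, or fail if master was not part of the push.
master_rev() {
  while read -r old new ref; do
    if [ "$ref" = "refs/heads/master" ]; then
      printf '%s\n' "$new"
      return 0
    fi
  done
  return 1
}

# Check the rev out into a work tree outside the bare repo; -q hides the
# detached HEAD advice, which is expected here.
checkout_rev() {
  rev="$1"; build_dir="$2"
  mkdir -p "$build_dir" &&
  GIT_WORK_TREE="$build_dir" git checkout -f -q "$rev"
}

# Usage inside the hook: rev="$(master_rev)" && checkout_rev "$rev" /app/tmp_build
```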
We then need to determine what buildpack to use. If the environment variable BUILDPACK_URL, provided by the metadata, is present then the decision is already made and we simply git clone BUILDPACK_URL to /app/tmp_buildpack. Otherwise we iterate through /app/buildpacks/, calling bin/detect on each buildpack until we get a 0 exit code.
If we haven’t failed and exited (i.e. no custom buildpack and no detection hits) then we have successfully selected a buildpack.
Now we run the buildpack’s bin/compile, passing in the directory of the checkout done earlier and another directory for caching build dependencies.
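The detection loop is simple enough to sketch. `select_buildpack` is a hypothetical name and the failure message is made up; the bin/detect and bin/compile calling conventions follow the buildpack API linked further down.

```shell
# Print the first buildpack directory whose bin/detect accepts the build dir.
select_buildpack() {
  build_dir="$1"; buildpacks_root="$2"
  for pack in "$buildpacks_root"/*/; do
    if [ -x "${pack}bin/detect" ] && "${pack}bin/detect" "$build_dir" >/dev/null 2>&1; then
      printf '%s\n' "${pack%/}"
      return 0
    fi
  done
  echo " ! no buildpack detected" >&2   # message text is an assumption
  return 1
}

# Usage:
#   pack="$(select_buildpack /app/tmp_build /app/buildpacks)" || exit 1
#   "$pack/bin/compile" /app/tmp_build "$cache_dir"
```

Buildpacks are tried in glob order, so the directory names effectively set the detection priority.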
Buildpack
The compilation (if required), downloading of dependencies, compressing of static resources etc. happen during the buildpack run.
More information on build packs can be found at https://devcenter.heroku.com/articles/buildpack-api
As mentioned earlier, a custom buildpack can also be supplied by setting the BUILDPACK_URL property.
After the buildpack has run and exited cleanly with no errors, we tar and gzip the prepared app to /tmp/slug.tar.gz, excluding the .git directory and any patterns listed in .slugignore. We validate that the slug size is less than 200MB, print the size of the slug so it shows in the git push output, and upload it to the storage service using the pre-authenticated slug put URL provided in the metadata downloaded earlier.
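A minimal sketch of the packaging step, with one loud caveat: it feeds .slugignore straight to tar as exclude patterns, which only approximates how .slugignore patterns are really interpreted. The function name and the wording of the size message are assumptions.

```shell
MAX_SLUG_BYTES=$((200 * 1024 * 1024))

# Tar and gzip the prepared app, skipping .git and .slugignore patterns,
# then report the size and fail if the slug is too big.
build_slug() {
  app_dir="$1"; slug="$2"   # slug must live outside app_dir
  ignore="$app_dir/.slugignore"
  [ -f "$ignore" ] || ignore=/dev/null
  tar -czf "$slug" -C "$app_dir" --exclude=.git --exclude-from="$ignore" . || return 1
  size="$(wc -c < "$slug" | tr -d ' ')"
  echo "slug size: $size bytes"
  [ "$size" -lt "$MAX_SLUG_BYTES" ]
}

# Usage: build_slug /app/tmp_build /tmp/slug.tar.gz || exit 1
```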
We also tar and gzip the repository directory at /app/tmp/repo.git and upload this to the storage service. This may include new artifacts that have been stored in the cache directory during the buildpack run; they are stored with the repo but are not logically part of the repo and are not versioned.
A note of interest: Heroku previously used squashfs as slug output but now it seems to use just a gzipped tar file. I’m not sure of the reason why, but presumably squashfs provides no tangible benefit versus the weight of bringing in somewhat exotic tooling.
Releasing code
We then call the RESTful API server with details of the process types detected in the Procfile (or the defaults provided by bin/release in the buildpack in the absence of a ./Procfile). The openruko API server then co-ordinates with the dyno infrastructure to download the slug, launch the new code, terminate the processes based on the old code, and update the routing infrastructure (a fancy reverse proxy) to point to the new code.
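Reading the process types out of a Procfile can be sketched in one line of sed, assuming the usual `type: command` layout. `parse_procfile` is a hypothetical helper, and the `type=command` output format is just a convenient shape for this example, not what the release API expects.

```shell
# Print "type=command" for each "type: command" line in a Procfile,
# skipping comments and anything else that does not match.
parse_procfile() {
  sed -n -E 's/^([A-Za-z0-9_-]+):[[:space:]]*(.+)$/\1=\2/p' "$1"
}

# Usage: parse_procfile /app/tmp_build/Procfile
```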
All done
All done, exit 0. Our master reference is now advanced to the rev received.
Summary
This is an overview of the push process of openruko, a deployment platform that aims to be compatible with heroku, its client tools and buildpacks, intended to allow teams of trusted members and individuals to run their own small heroku-like clusters on commodity VPS, dedicated server and cloud providers. Openruko is an in-development heroku implementation, which will be freely available under the MIT license and open sourced on github! Thanks for reading.