Making Puppet exec work

The exec resource from Puppet, the automation framework, is a mysterious beast. My notes on how to make it work for complex multi-exec configurations.

Posted: Wed 05 Oct, 2016, 17:15

Why use Puppet

Working in IT, it is part of the job to back the right horses. With the never-ending profusion of languages and frameworks, it can be tricky to decide what to learn next. During my career to date I think I have done an alright job of picking what to learn, but I have definitely missed the boat a couple of times. So with this background, Puppet has increasingly been on my radar of technologies I should really understand. In a post-Agile BDD-supporting Java development world, the tail end of our development cycle remains a big weak point. The ability to auto-deploy in a reliable and supportable way has always been elusive. Hence the appeal of Puppet.

Puppet can make a very good claim to have solved this problem. It appears to be a front-runner in a race that includes Ansible, Salt and Chef as other worthy contenders. For those that have not used it before, Puppet takes the declarative approach of allowing developers to define their installation end-state and the dependencies between the intervening stages. Absolutely everything should be re-runnable without impact; in other words, it is idempotent. The thing that Puppet takes as a given, and I have to say I was very seriously questioning, is that it is possible and desirable to define an installation in this way. My reason for doubting was that my use case did not seem hugely unusual, and yet after several days of work, an elegant solution was not evident. Puppet has a fallback of allowing users to develop custom types and providers in Ruby for very complex cases, but this seemed a bit extreme.

My use case was simply:

  • Retrieve a versioned zip for my package
  • Unzip it
  • Run the java application as a service

Easy right? Eh no. For a start, and this is nothing to do with Puppet, it is impossible to run a Java application via a bat file as a Windows Service without wrapping it. The wrapper ensures the process conforms to the Windows Service protocol. This complexity meant my configuration used many more exec resources than usual, as the standard service resource would not work in my case.

So the starting point is that I had several exec resources with dependencies.

Dependencies are subtle

Puppet has two ways of specifying dependencies. Firstly there are the equivalent directives, require and before, which declare ordering. Either of them will ensure that if any two tasks run, they will execute in the stated order. The second way of specifying dependencies is with either of the notify or subscribe directives. They will ensure that dependent tasks will be notified of changes in upstream tasks, at which point they can take resource-specific action.
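
As a rough illustration of the two styles, here is a minimal sketch. The resource titles, paths and bat scripts are all invented for the example and are not from my real configuration. The first pair of resources is linked for ordering only; the second pair is linked for ordering and refresh.

```puppet
# Ordering only: 'unpack' is applied after 'fetch', and no refresh is sent.
exec { 'fetch':
  command => 'C:/deploy/fetch.bat',            # hypothetical script
}

exec { 'unpack':
  command => 'C:/deploy/unpack.bat',           # hypothetical script
  require => Exec['fetch'],                    # same as: before => Exec['unpack'] on 'fetch'
}

# Ordering plus refresh: 'reload' is applied after the file, and is also
# sent a refresh whenever Puppet changes the file.
file { 'C:/deploy/app.conf':                   # hypothetical path
  ensure  => file,
  content => "port=8080\n",
}

exec { 'reload':
  command   => 'C:/deploy/reload.bat',         # hypothetical script
  subscribe => File['C:/deploy/app.conf'],     # same as: notify => Exec['reload'] on the file
}
```

In each pair only one side of the relationship needs to be declared; require/before and notify/subscribe are mirror images of each other.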

Puppet assumes however that all resources should always run. So that means that even tasks with notify/subscribe dependencies will always run irrespective of their supposed dependencies. This is very confusing to begin with but illuminates a more subtle point that many resources have two modes of action. Resources have their default mode of normal execution, and they have a secondary mode which is triggered on a refresh from notify/subscribe. Hence, the notify/subscribe dependencies declare behaviour on receipt of refreshes, which is a different and independent flow from the standard execution tree.
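
To make the two modes concrete, here is a small sketch with a made-up file path and echo commands standing in for real work. It uses the refresh attribute of exec, which names an alternative command to run when a refresh arrives; if it is left out, Puppet runs the main command again instead.

```puppet
file { 'C:/deploy/settings.conf':              # hypothetical path
  ensure  => file,
  content => "port=8080\n",
}

exec { 'two-modes':
  path      => 'C:/Windows/System32',          # assumed location of cmd.exe
  # Normal mode: this command runs on every Puppet run.
  command   => 'cmd.exe /c echo normal run',
  # Refresh mode: this command runs as well, but only when a refresh is received.
  refresh   => 'cmd.exe /c echo refreshed',
  subscribe => File['C:/deploy/settings.conf'],
  logoutput => true,
}
```

On a run where the file changes, both commands should appear in the log; on a run where nothing changes, only the normal one should.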

Confused? Then read on.

exec resources create refresh events after execution

This section used to say "exec do not create events", but this is actually incorrect. This made me go back and figure out why I had thought this. exec resources do create refreshes. My confusion, which is blindingly obvious now, was that the refresh is sent after execution, not before. For this reason, a notify/subscribe relationship is of no use for enforcing pre-execution dependencies, but is very useful for triggering post-execution dependencies.

These two dependency mechanisms therefore complement each other, and serve slightly different purposes.

Refreshes do not propagate

I, very presumptuously, assumed that if B is dependent on A, and A itself receives a refresh from a notify dependency, then B would also receive the refresh after A had processed it. Wrong. Events do not automatically propagate through the dependency tree; they only go to immediate dependents. It is therefore necessary to use a mixed configuration of notify/subscribe to ensure jobs are triggered, and then rely on require/before to take care of ordering. This can seem duplicative, with several exec resources declared with the same upstream subscription, but it is absolutely essential for the required behaviour.

For example, say we have the following, where -> means "runs before" and ~> means "refreshes" (actually, this is real Puppet syntax), and in this example all three steps are exec resources:

    package ~> install
    remove  -> install
    stop    -> remove

This would mean that if the package resource changes, the install resource will be notified. install cannot run until remove, and remove cannot run until stop. This works fine for the updates, where there is an existing installation to stop and remove. But what about the first installation, where there is nothing to stop or remove? As a general rule in Puppet, we should not be running resources unless they are required.

refreshonly is your friend

To resolve this issue, it is possible to control the two modes (discussed earlier) of the exec resource by defining normal behaviour and refresh-only behaviour. There is a directive called refreshonly which means the resource will only run if a refresh event is received. So with this in mind, a new configuration could be used which relies only on the refreshes:

    package ~> install
    install ~> remove
    remove  ~> stop

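But this fails for two reasons. One is that a refresh is only sent after the sending resource has executed, so this chain would run install first and stop last, which is the reverse of the order we need. The other is that refreshes do not propagate, so remove and stop cannot rely on the refresh from package reaching them through install.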

Use subscriptions for triggering and require for ordering

The trick, if you can call it that, is to use the correct combination of subscriptions and ordering for your required flow. Say we moved to this configuration:

    # These three dependencies take care of triggering
    package ~> install
    package ~> stop
    package ~> remove
    # These two take care of sequencing; the resources are set to run on refresh only
    remove  -> install
    stop    -> remove

This will ensure that all resources (install, stop and remove) are notified directly when they need to run because there is a new package. The require dependencies will take care of the ordering. This will run for the initial installation and for subsequent update installations, as only the required jobs will run. Likewise, if there is no change, there will be no refresh event and, as the downstream install/remove/stop are refresh only, they will not run.
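
Written out as a manifest, the configuration above might look something like the sketch below. Every title, path, module name and bat script here is a hypothetical placeholder, and the zip is modelled as a file resource purely for illustration; my real configuration differs in the details.

```puppet
# The "package": a versioned zip copied into a staging directory.
file { 'C:/staging/myapp.zip':
  ensure => file,
  source => 'puppet:///modules/myapp/myapp-1.2.3.zip',
}

exec { 'stop':
  command     => 'C:/deploy/stop-service.bat',
  refreshonly => true,
  subscribe   => File['C:/staging/myapp.zip'],   # package ~> stop
}

exec { 'remove':
  command     => 'C:/deploy/remove-old.bat',
  refreshonly => true,
  subscribe   => File['C:/staging/myapp.zip'],   # package ~> remove
  require     => Exec['stop'],                   # stop -> remove
}

exec { 'install':
  command     => 'C:/deploy/unzip-and-install.bat',
  refreshonly => true,
  subscribe   => File['C:/staging/myapp.zip'],   # package ~> install
  require     => Exec['remove'],                 # remove -> install
}
```

Each exec subscribes to the zip directly rather than to its neighbour, so triggering and sequencing are stated separately.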

Not quite the end of the story...

So we are done, right? No. We have fixed one use case and broken another, a very common development loop when developing Puppet configurations. The above will actually not work for fresh installations because the remove and stop resources should not run in this case. We need to restrict execution to cases with an existing installation. There are a number of ways to achieve this, but one of the simplest is using either the onlyif or the unless directive. onlyif will cause Puppet to only execute the resource if the given command returns zero. As per the exec documentation, the onlyif/unless command result will override the receipt of a refresh. So on a fresh installation, a refresh event will be received but the resource will not run due to onlyif returning non-zero. The exact onlyif command can be an apt command or a ps or anything that can be piped to grep or findstr to ensure the correct return code. On Windows you can check your return codes on the command line by doing echo %errorlevel%.
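
As a sketch of what this might look like on Windows, the stop step below is guarded by an onlyif that pipes sc query through findstr. The service name myapp, the zip path and the module name are assumptions made for the example, and I have not tested this exact command, so check the return code by hand first.

```puppet
file { 'C:/staging/myapp.zip':                   # hypothetical package
  ensure => file,
  source => 'puppet:///modules/myapp/myapp-1.2.3.zip',
}

exec { 'stop':
  path        => 'C:/Windows/System32',          # assumed location of cmd.exe, sc.exe and findstr.exe
  command     => 'sc.exe stop myapp',
  # findstr exits 0 only if the service is reported as RUNNING.
  # Any other exit code skips the command, even when a refresh was received.
  onlyif      => 'cmd.exe /c "sc query myapp | findstr RUNNING"',
  refreshonly => true,
  subscribe   => File['C:/staging/myapp.zip'],
}
```

The pipe is wrapped in cmd.exe /c because exec on Windows calls the command directly rather than through a shell.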

Idempotent execs finally

Thus with the correct combination of onlyif/unless, subscribe/require and refreshonly it is possible to ensure executions work for initial and subsequent runs in an idempotent way. The Puppet documentation rightly advises that exec should be used with caution, but in many cases it is a necessary evil and it is good to know that it can work well. Understanding the sequencing, precedence and resource behaviours helps to ensure Puppet can work as you want out of the box.