A collection of task-oriented solutions in Puppet

Download a file from a website

Challenge

You want to fetch a file from a website (with wget).

Solution

Install the maestrodev-wget module from the Puppet Forge and use its wget::fetch defined type to download the file; the full walkthrough is in the Explanation below.

Explanation

Sometimes, despite all the other tools and processes available, you just need to fetch a file from a website and put it on the local machine. While it's not the recommended way to manage things, it's always nice to have available as an option. In this example we'll use the wget module to wrap that tool and download the file for us.

First, install the module from the Puppet Forge.

# install the module and its dependencies
$ sudo /opt/puppetlabs/bin/puppet module install maestrodev-wget

...
Notice: Installing -- do not interrupt ...
/etc/puppetlabs/code/environments/production/modules
 - maestrodev-wget (v1.7.3)
...

Once you have the module, you can use it to download the file:

class fetch_file {

  include ::wget

  wget::fetch { 'https://www.unixdaemon.net/index.xml':
    destination => '/tmp/unixdaemon-feed.xml',
    timeout     => 15,
    verbose     => true,
  }

}
# run puppet
...
Notice: /Stage[main]/Fetch_file/Wget::Fetch[https://www.unixdaemon.net/index.xml]
  /Exec[wget-https://www.unixdaemon.net/index.xml]/returns: executed successfully
...

$ ls -alh /tmp/unixdaemon-feed.xml
-rw-r--r--. 1 root root 79K Jun  2 15:59 /tmp/unixdaemon-feed.xml
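
The module's README also documents a redownload parameter, which forces the file to be re-fetched on every Puppet run rather than only when the destination is missing. A minimal sketch, assuming the parameter behaves as the README describes:

class refetch_file {

  include ::wget

  # redownload forces a fetch on every run instead of only
  # when the destination file is absent
  wget::fetch { 'https://www.unixdaemon.net/index.xml':
    destination => '/tmp/unixdaemon-feed.xml',
    redownload  => true,
    timeout     => 15,
  }

}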

There are a few other use cases documented in the README that are worth understanding, especially local caching, which ensures you're not constantly fetching the file only to discard it when it hasn't changed; a caching sketch follows the naming example below. One feature that provides a big benefit with very little effort is better resource naming. By specifying the URL in the source parameter you can use a descriptive name as the resource title, which should make the logs much easier to read.

class fetch_named_file {

  include ::wget

  wget::fetch { 'unixdaemon index file':
    source      => 'https://www.unixdaemon.net/index.xml',
    destination => '/tmp/unixdaemon-feed.xml',
    timeout     => 15,
  }

}
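
For the caching mentioned above, the README documents a cache_dir parameter (and a cache_file parameter if you want to name the cached copy yourself). A minimal sketch, assuming those parameters behave as the README describes:

class fetch_cached_file {

  include ::wget

  # keep a cached copy under /var/cache/wget so repeated runs
  # only re-fetch the file when the remote copy has changed
  wget::fetch { 'https://www.unixdaemon.net/index.xml':
    destination => '/tmp/unixdaemon-feed.xml',
    cache_dir   => '/var/cache/wget',
    timeout     => 15,
  }

}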

I also think you should always specify a timeout on all wget::fetch resources; a resource default makes this easy to apply consistently:

set-default-timeout.pp

class resource_default {

  include ::wget

  Wget::Fetch {
    timeout => 15,
  }

}
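
Note that resource defaults like this follow Puppet's dynamic scoping rules, so declare them in the same class as the resources they should affect rather than in a distant defaults class.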

Without being too judgemental, I feel I should note a few things about using this approach in your Puppet code base. At the very least you should use the module's caching functionality to ensure your runs don't slow down while you constantly re-fetch the resource. There's also no cryptographic verification of the fetched files, even those behind a username and password, to ensure they're what you expected. In the general case you'd be much better off either checking the files in to the Puppet master or, better still, packaging them with fpm and deploying them via your normal package manager, with all the benefits, such as checksum verification and file tracking, that ensue.
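
For comparison, serving the file from the Puppet master with a native file resource gives you checksum-based change detection for free. A minimal sketch; the unixdaemon module name and its files/index.xml path are hypothetical:

class fetch_file_from_master {

  # the file is checked in to the hypothetical unixdaemon module
  # under files/index.xml and served by the master
  file { '/tmp/unixdaemon-feed.xml':
    ensure => file,
    source => 'puppet:///modules/unixdaemon/index.xml',
    owner  => 'root',
    group  => 'root',
    mode   => '0644',
  }

}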

See also