I ran into something today with cloud-init that led to a bit of a deep dive, and I thought I’d share what I found.
One of the formats you can use for User Data is the include file format. It allows you to list out URLs or file paths. cloud-init will access each file or URL and execute it as a script. It’s great if there is a publicly accessible script you want to run. I’ve also seen cases where it is a list of pre-signed AWS S3 URLs.
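To make the format concrete: include-format user data is plain text that starts with an #include line, followed by one URL or path per line. Here is a minimal Python sketch that builds such a document. The function name build_include_user_data and the example.com URLs are made up for illustration.

```python
def build_include_user_data(urls):
    """Return include-format user data: '#include', then one entry per line."""
    # Drop blank entries and stray whitespace so every line is a single URL.
    cleaned = [u.strip() for u in urls if u.strip()]
    if not cleaned:
        raise ValueError("at least one URL is required")
    return "\n".join(["#include", *cleaned]) + "\n"


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    print(build_include_user_data([
        "https://example.com/bootstrap.sh",
        "https://example.com/configure-app.sh",
    ]), end="")
```

The order of the list matters, since the entries are processed from top to bottom.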
The original issue was an AWS EC2 instance that never came up far enough to pass the load balancer’s health checks. I was able to narrow the issue down to the scripts listed in the User Data not running. When I looked through /var/log/cloud-init-output.log nothing caught my eye, but cloud-init.log was fairly long. I eventually made my way to /var/lib/cloud and noticed /var/lib/cloud/data. Lo and behold, it contained an error saying that the first URL in the user data had timed out.
That led me on a deep dive into cloud-init’s source code, where I found that cloud-init uses a 5 second timeout and will retry 10 times with a 1 second delay between attempts. There doesn’t appear to be any way to configure that either; you would have to change the values in cloud-init’s source.
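To get a feel for what those numbers mean in practice, here is a rough Python sketch of a fetch loop using the same figures. This is not cloud-init’s actual code. The names read_with_retries and fetch_url are my own, and I’m assuming “retry 10 times” means one initial attempt plus ten retries, which may not match the real counting.

```python
import time
import urllib.request

TIMEOUT_SECONDS = 5  # per-attempt timeout, from the figures above
RETRIES = 10         # assumed to mean retries after the first attempt
DELAY_SECONDS = 1    # pause between attempts


def fetch_url(url, timeout):
    """Fetch a URL with the standard library; failures raise OSError subclasses."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def read_with_retries(url, fetch=fetch_url, sleep=time.sleep):
    """Try once, retry up to RETRIES times, then re-raise the last error."""
    last_error = None
    for attempt in range(RETRIES + 1):
        try:
            return fetch(url, TIMEOUT_SECONDS)
        except OSError as err:  # URLError and socket timeouts are OSError subclasses
            last_error = err
            if attempt < RETRIES:
                sleep(DELAY_SECONDS)  # no sleep after the final attempt
    raise last_error
```

Under those assumptions, a URL that never answers could hold up boot for roughly a minute before the error surfaces, which is worth keeping in mind when a load balancer health check is waiting on the instance.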
Another important piece of information is that an exception is thrown if there is an error getting the file, and the method running the include file user data does not catch it. This means that if the first path or URL gets an error, your remaining files will not be opened and executed. Since they most likely need to be run sequentially this makes sense, but I could not find it documented anywhere.
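The effect is easy to reproduce in miniature. The Python sketch below is not cloud-init’s code, just a loop with the same shape: it fetches each entry in order and has no try/except around the fetch. The names run_includes and fake_fetch, and the example.com URLs, are invented for the demonstration.

```python
def run_includes(entries, fetch, execute):
    """Fetch and run each entry in order; any fetch error propagates to the caller."""
    for entry in entries:
        content = fetch(entry)  # no try/except here, so a failure ends the loop
        execute(content)


if __name__ == "__main__":
    def fake_fetch(entry):
        # Simulate the first URL timing out.
        if "broken" in entry:
            raise TimeoutError(f"timed out reading {entry}")
        return f"script from {entry}"

    executed = []
    try:
        run_includes(
            ["https://example.com/broken.sh", "https://example.com/second.sh"],
            fake_fetch,
            executed.append,
        )
    except TimeoutError as err:
        print(f"stopped early: {err}")
    # second.sh never ran, because the loop ended on the first failure.
    print(executed)
```

If a script really is optional, one workaround is to have an earlier, reliable script download it and handle the failure itself, rather than listing it directly in the include file.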
Hopefully someone else will find this information useful. Here is the method that deals with running the include file format, and here is the method that reads the file or URL.