Cloud Bursting Startup Script

Create a script that is run when the cloud node is burst.

Introduction

After a cloud node boots, your site will typically need to configure it, for example to install packages, add users, or start services. You can add a startup script to a bursting scenario; the script runs when the instance boots and performs these tasks automatically. Startup scripts can install software, apply updates, turn on services, and carry out any other task defined in the script, so you can customize your cloud instances programmatically.

Startup Script on Windows Platforms

On Windows platforms, the startup script must be a PowerShell script. Enclose the content of the PowerShell script in <powershell> and </powershell> tags. For more information about PowerShell, see PowerShell Scripting.

Startup Script on Linux Platforms

On Linux platforms, cloud-init is a utility designed specifically for cloud instance initialization. It is a bootstrapping utility for pre-provisioned disk images that run in virtualized environments, usually cloud-oriented services; it sets up the server instance so that it is usable when it has finished booting. You must install cloud-init on your cloud provider VM so that your instances can be configured on boot. For more information, see cloud-init.

Several input types are supported by cloud-init.
  • Shell scripts
  • Cloud config files

The simplest way to configure an instance on boot is to use a shell script. The shell script must begin with #! in order for cloud-init to recognize it as a shell script.
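
Because a missing #! line means the script is not treated as a shell script, it can be worth checking for it before adding the script to a bursting scenario. The following is a minimal sketch; has_shebang is a hypothetical helper written for this illustration and is not part of cloud-init.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the first line of the file
# named by $1 starts with "#!".
has_shebang() {
    first_line=
    # read fails at end-of-file without a newline but still fills the
    # variable, so fall back to checking whether anything was read.
    IFS= read -r first_line < "$1" || [ -n "$first_line" ] || return 1
    case $first_line in
        '#!'*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example use (startup.sh is a placeholder file name):
# has_shebang startup.sh || echo "startup.sh: missing #! line" >&2
```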

Example of a cloud-init Script for a Linux Virtual Machine

Below are examples of the configuration that should be done by the startup script after a node has been burst in the cloud. These examples are not intended to be copied and pasted as is; you must adapt the startup script to your site's needs.

#!/bin/sh
# Map IP address to hostnames via /etc/hosts
echo "/etc/hosts setup"
rm -f /etc/hosts
echo "127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4" > /etc/hosts
echo "PBS_SERVER_IP_ADDR headnode headnode.DOMAINNAME" >> /etc/hosts    

# Disable NetworkManager so that it does not overwrite the /etc/resolv.conf file
systemctl disable NetworkManager
systemctl stop NetworkManager
systemctl enable network
systemctl start network

# Configure PBS via /etc/pbs.conf
echo "pbs setup"
systemctl stop pbs
rm -f /etc/pbs.conf
echo "PBS_EXEC=/opt/pbs" > /etc/pbs.conf
echo "PBS_HOME=/var/spool/pbs" >> /etc/pbs.conf
echo "PBS_START_SERVER=0" >> /etc/pbs.conf
echo "PBS_START_MOM=1" >> /etc/pbs.conf
echo "PBS_START_SCHED=0" >> /etc/pbs.conf
echo "PBS_START_COMM=0" >> /etc/pbs.conf
echo "PBS_SERVER=PBS_SERVER_HOSTNAME" >> /etc/pbs.conf
echo "PBS_CORE_LIMIT=unlimited" >> /etc/pbs.conf
echo "PBS_SCP=/bin/scp" >> /etc/pbs.conf
echo "PBS_LEAF_ROUTERS=HOSTNAME,HOSTNAME" >> /etc/pbs.conf

# Since Control 2019.1, DNS is no longer used for registering cloud nodes. Therefore,
# pbs.conf must be updated with the cloud node's IP address.
IP=$(ip addr show eth0 | grep "inet\b" | awk '{print $2}' | cut -d/ -f1)
echo "PBS_MOM_NODE_NAME=$IP" >> /etc/pbs.conf

# Configure the MoM
echo "mom config setup"
. /etc/pbs.conf
echo "\$clienthost $PBS_SERVER" >> /var/spool/pbs/mom_priv/config
echo "\$clienthost ${PBS_SERVER%%.*}" >> /var/spool/pbs/mom_priv/config
echo "\$restrict_user_maxsysid 999" >> /var/spool/pbs/mom_priv/config

# Restart pbs
systemctl start pbs

Each section of the startup script is explained below. The examples assume the following:

  • The fully qualified domain name (FQDN) of the PBS Server is pbs.altair.com.
  • The NIC address of the PBS Server is 10.0.0.5 on the 10.0.0.0/24 network.

Configure the Host File

Map hostnames to the PBS Server IP address by updating the /etc/hosts file.

# Map IP address to hostnames via /etc/hosts
echo "/etc/hosts setup"
rm -f /etc/hosts
echo "127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4" > /etc/hosts
echo "10.0.0.5    headnode headnode.pbs.altair.com" >> /etc/hosts 
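
The same file can also be written with a single here-document, which replaces the file in one step and keeps its contents readable in one place. This is a minimal sketch, not the required form; write_hosts is a hypothetical helper that takes the target file, the PBS Server IP address, and the domain name as arguments.

```shell
#!/bin/sh
# Hypothetical helper: overwrite the hosts file named by $1.
#   $1 = target file (for example /etc/hosts)
#   $2 = IP address of the PBS Server
#   $3 = domain name appended to "headnode"
write_hosts() {
    cat > "$1" <<EOF
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
$2    headnode headnode.$3
EOF
}

# Example use on the cloud node, with the values assumed in this section:
# write_hosts /etc/hosts 10.0.0.5 pbs.altair.com
```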

Disable NetworkManager and Use Network Interface

Stop NetworkManager and use the network service instead so that the contents of /etc/resolv.conf are not overwritten:
# Disable NetworkManager so that it does not overwrite the /etc/resolv.conf file
systemctl disable NetworkManager
systemctl stop NetworkManager
systemctl enable network
systemctl start network

Configure PBS

Update the PBS configuration file /etc/pbs.conf:
# Configure pbs.conf
echo "pbs setup"
systemctl stop pbs
rm -f /etc/pbs.conf
echo "PBS_EXEC=/opt/pbs" > /etc/pbs.conf
echo "PBS_HOME=/var/spool/pbs" >> /etc/pbs.conf
echo "PBS_START_SERVER=0" >> /etc/pbs.conf
echo "PBS_START_MOM=1" >> /etc/pbs.conf
echo "PBS_START_SCHED=0" >> /etc/pbs.conf
echo "PBS_START_COMM=0" >> /etc/pbs.conf
echo "PBS_SERVER=PBS_SERVER_HOSTNAME" >> /etc/pbs.conf
echo "PBS_CORE_LIMIT=unlimited" >> /etc/pbs.conf
echo "PBS_SCP=/bin/scp" >> /etc/pbs.conf
echo "PBS_LEAF_ROUTERS=HOSTNAME,HOSTNAME" >> /etc/pbs.conf

# Since Control 2019.1, DNS is no longer used for registering cloud nodes. Therefore, 
# pbs.conf must be updated with the cloud node's IP address.
IP=$(ip addr show eth0 | grep "inet\b" | awk '{print $2}' | cut -d/ -f1)
echo "PBS_MOM_NODE_NAME=$IP" >> /etc/pbs.conf
Here, PBS_SERVER_HOSTNAME is the hostname of the machine where the PBS Server is installed, and the HOSTNAME values in PBS_LEAF_ROUTERS tell each endpoint which communication daemon it should talk to.
Note: If PBS is installed in a non-default home or execution directory, update the values of PBS_HOME and PBS_EXEC accordingly.
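
The address lookup above assumes that the interface is named eth0 and that it already has an IPv4 address; if either assumption fails, an empty PBS_MOM_NODE_NAME is written. The sketch below guards against that case. The helper names extract_ipv4 and set_mom_node_name are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the first IPv4 address found in
# "ip addr show" output read from standard input.
extract_ipv4() {
    awk '$1 == "inet" { split($2, parts, "/"); print parts[1]; exit }'
}

# Hypothetical helper: append PBS_MOM_NODE_NAME to a pbs.conf file,
# refusing to write an empty value.
#   $1 = IP address, $2 = path to pbs.conf
set_mom_node_name() {
    if [ -z "$1" ]; then
        echo "no IPv4 address found; PBS_MOM_NODE_NAME not written" >&2
        return 1
    fi
    echo "PBS_MOM_NODE_NAME=$1" >> "$2"
}

# Example use on the cloud node (eth0 is an assumption):
# IP=$(ip addr show eth0 | extract_ipv4)
# set_mom_node_name "$IP" /etc/pbs.conf || exit 1
```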

Configure the PBS MoM and Restart PBS

Update the PBS_HOME/mom_priv/config file to configure the MoM:

# Configure /var/spool/pbs/mom_priv/config
echo "mom config setup"
. /etc/pbs.conf
echo "\$clienthost $PBS_SERVER" >> /var/spool/pbs/mom_priv/config
echo "\$clienthost ${PBS_SERVER%%.*}" >> /var/spool/pbs/mom_priv/config
echo "\$restrict_user_maxsysid 999" >> /var/spool/pbs/mom_priv/config

systemctl start pbs
Note: If PBS is installed in a non-default directory, replace /var/spool/pbs with the path to your PBS home directory.
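
Because the MoM settings are appended with >>, running the script a second time on the same node would add duplicate lines. One way to avoid this is sketched below, using two hypothetical helpers: append_once adds a line only if the file does not already contain it, and short_host strips the domain part from a hostname.

```shell
#!/bin/sh
# Hypothetical helper: append line $1 to file $2 unless the file
# already contains exactly that line.
append_once() {
    grep -qxF -e "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Hypothetical helper: print the hostname $1 without its domain part.
short_host() {
    printf '%s\n' "${1%%.*}"
}

# Example use on the cloud node (path assumes the default PBS_HOME):
# . /etc/pbs.conf
# cfg=/var/spool/pbs/mom_priv/config
# append_once "\$clienthost $PBS_SERVER" "$cfg"
# append_once "\$clienthost $(short_host "$PBS_SERVER")" "$cfg"
# append_once "\$restrict_user_maxsysid 999" "$cfg"
```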

Optional Configuration

Use the startup script to configure filesystems (/etc/fstab), configure NIS (/etc/yp.conf), mount necessary filesystems, and any other configuration that your site requires.

Below are a few examples:

Creating Local Scratch Space

Create local scratch space on a fast local disk and use it as the default location to run jobs (use the PBS sandbox feature in job scripts to place data there):

mkdir /scratch
chmod 1777 /scratch
echo "\$jobdir_root /scratch" >> /var/spool/pbs/mom_priv/config
Example of Setting Location for Creation of Staging and Execution Directories
To have the staging and execution directories of jobs with sandbox=PRIVATE created under /scratch, as /scratch/<job-specific_dir_name>, put the following line in MoM’s configuration file:
$jobdir_root /scratch

Mount a Directory for PBS Data Transfer

Mount /home so that it can be used for PBS data transfer and so that SSH keys stored in the user environment are accessible.
echo "PBS_SERVER_IP_ADDR headnode headnode.DOMAINNAME" >> /etc/hosts
…
…  
yum install -y nfs-utils
mount -t nfs headnode:/home /home
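
If the mount should also be recorded in /etc/fstab, the entry can be generated rather than typed by hand. This is a minimal sketch under stated assumptions: nfs_fstab_line is a hypothetical helper, and the options defaults,_netdev are an illustrative choice that your site may need to change.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab entry for an NFS mount.
#   $1 = NFS server, $2 = exported directory, $3 = local mount point
# The mount options are an assumption for illustration only.
nfs_fstab_line() {
    printf '%s:%s %s nfs defaults,_netdev 0 0\n' "$1" "$2" "$3"
}

# Example use on the cloud node:
# nfs_fstab_line headnode /home /home >> /etc/fstab
```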

Configuring the MoM for Local Copy

Use the $usecp MoM configuration parameter to tell the MoM which local directories are mapped to mounted directories, so that the local copy mechanism can be used.
echo "PBS_SERVER_IP_ADDR headnode headnode.DOMAINNAME" >> /etc/hosts
…
…  
echo "\$usecp headnode:/home/ /home/" >> /var/spool/pbs/mom_priv/config

Example: Add a Custom Resource to a Cloud Node

Use the cloud-init script in conjunction with a PBS MoM Version 2 configuration file to add PBS host-level resources to burst compute nodes.

At the beginning of the cloud-init script, place the following line:

HOST=$(uname -n)
Adding the following lines to the end of the cloud-init script adds a custom resource ngpu to the cloud node. The custom resource must already be defined to PBS.
#Create a v2config file to add accelerators and custom resources to PBS
#Note: Use $HOST not $IP as mom will create a second vnode from IP but add 
#to the natural node via HOST
echo "\$configversion 2" > /root/v2config
echo "$HOST: resources_available.ngpu = 2" >> /root/v2config
/opt/pbs/sbin/pbs_mom -s insert v2config /root/v2config
systemctl restart pbs
After bursting, the contents of the v2config file in the burst node's PBS_HOME/mom_priv/config.d directory are:
$configversion 2
computea000000: resources_available.ngpu = 2
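
When several resources or nodes are involved, the file contents can be generated by a small function instead of individual echo lines. The sketch below reproduces the two lines shown above; render_v2config is a hypothetical helper, and the resource name and value are passed as arguments.

```shell
#!/bin/sh
# Hypothetical helper: print a Version 2 configuration file that sets
# one resources_available value on one host.
#   $1 = host name, $2 = resource name, $3 = value
render_v2config() {
    printf '%s\n' '$configversion 2'
    printf '%s: resources_available.%s = %s\n' "$1" "$2" "$3"
}

# Example use on the cloud node, followed by the insert step from the
# script above:
# render_v2config "$(uname -n)" ngpu 2 > /root/v2config
# /opt/pbs/sbin/pbs_mom -s insert v2config /root/v2config
```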
The qmgr entry for the node looks like this:
create node 172.17.0.4
set node 172.17.0.4 state = free
set node 172.17.0.4 resources_available.arch = linux
set node 172.17.0.4 resources_available.cloud_node_image = <IMAGENAME>
set node 172.17.0.4 resources_available.cloud_node_instance_type = Standard_A4
set node 172.17.0.4 resources_available.cloud_provisioned_time = 1589376781
set node 172.17.0.4 resources_available.cloud_scenario = <SCENARIO>
set node 172.17.0.4 resources_available.host = 172.17.0.4
set node 172.17.0.4 resources_available.mem = 14352224kb
set node 172.17.0.4 resources_available.ncpus = 8
set node 172.17.0.4 resources_available.ngpu = 2
set node 172.17.0.4 resources_available.vnode = 172.17.0.4
set node 172.17.0.4 resv_enable = False