Array Jobs

Automated Multiple Runs of a Job
If you plan on running many similar jobs (for example, MCMC on different data or optimization over different sets of inputs), submit an Array Job instead of issuing dozens or thousands of individual qsub commands.

Univa Grid Engine’s ‘qsub -t’ Method

Univa Grid Engine’s ‘qsub’ command, combined with the -t option, allows for the submission of an ‘array’ of jobs. When a job is launched via ‘qsub -t n[-m[:s]]’, each task has the environment variables SGE_TASK_ID and SGE_TASK_FIRST; if you specify m it also has SGE_TASK_LAST, and if you specify s it also has SGE_TASK_STEPSIZE. The -t option sets the ‘index numbers’ associated with the job, as follows:

  • n is the first index number
  • m is the last index number (optional)
  • s is the step size (optional, defaults to 1)
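
To make the range syntax concrete, here is a minimal Bash sketch that expands an ‘n[-m[:s]]’ specification into the index numbers it describes. The task_ids function is a hypothetical helper written for this illustration, not part of Grid Engine, and it assumes well-formed input:

```shell
#!/bin/bash
# Hypothetical helper (not a Grid Engine command): expand a -t style
# range "n[-m[:s]]" into the list of task index numbers it describes.
task_ids() {
    local spec=$1 first last step=1 i
    local ids=()
    # Split off the optional ":s" step size.
    if [[ $spec == *:* ]]; then
        step=${spec##*:}
        spec=${spec%%:*}
    fi
    # Split "n-m" into first and last; a bare "n" means a single task.
    first=${spec%%-*}
    if [[ $spec == *-* ]]; then
        last=${spec##*-}
    else
        last=$first
    fi
    for ((i = first; i <= last; i += step)); do
        ids+=("$i")
    done
    echo "${ids[*]}"
}

task_ids 1-5       # prints: 1 2 3 4 5
task_ids 1-10:3    # prints: 1 4 7 10
```

Each number printed is the value one task of the array would see in SGE_TASK_ID.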

Example 1: Hello World

For example, let’s run five ‘hostname’ jobs (not strictly “Hello World”, but more demonstrative) and see what qstat reports and what ends up in the output files:

qsub -N ArrayTest1 -t 1-5 -j y -b y 'hostname; echo $SGE_TASK_ID; sleep 60'

That launches the job. You should see something like this:

Your job-array XXXXX.1-5:1 ("ArrayTest1") has been submitted

Notice that the step size s defaulted to 1 (the trailing :1). Now take a look at qstat:

job-ID  prior   name       user     state submit/start at     queue                           slots ja-task-ID
--------------------------------------------------------------------------------------------------------------
55557   0.55500 ArrayTest1 hughmac  r     03/21/2012 08:53:23 all.q@hpcc023.wharton.upenn.edu 1     1
55557   0.55500 ArrayTest1 hughmac  r     03/21/2012 08:53:23 all.q@hpcc017.wharton.upenn.edu 1     2
55557   0.55500 ArrayTest1 hughmac  r     03/21/2012 08:53:23 all.q@hpcc014.wharton.upenn.edu 1     3
55557   0.55500 ArrayTest1 hughmac  r     03/21/2012 08:53:23 all.q@hpcc015.wharton.upenn.edu 1     4
55557   0.55500 ArrayTest1 hughmac  r     03/21/2012 08:53:23 all.q@hpcc022.wharton.upenn.edu 1     5

Great! The job is running. When it completes, let’s look at the output:

cat ArrayTest1.o55557.*
hpcc023.wharton.upenn.edu
1
hpcc017.wharton.upenn.edu
2
hpcc014.wharton.upenn.edu
3
hpcc015.wharton.upenn.edu
4
hpcc022.wharton.upenn.edu
5

So we had five tasks on five different hosts, each with a different SGE_TASK_ID.

To continue with the next set of five tasks:

qsub -N ArrayTest1 -t 6-10 -j y -b y 'hostname; echo $SGE_TASK_ID; sleep 60'

Notice that we changed the -t to 6-10.

Example 2: 10 scripts

So how can we use this in our code? Consider the following task: run 10 R script files named mycode-1.R through mycode-10.R (or mycode-1.m, etc.). Create those 10 scripts (the painful part), then run:

qsub -N MyRArray1 -t 1-10 -b y 'R --no-save < mycode-$SGE_TASK_ID.R'
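
One possible way to avoid writing ten near-identical scripts is to keep a single script and look up each task’s parameters by its index. The sketch below assumes a hypothetical file params.txt with one parameter set per line (line N belongs to task N) and a hypothetical single script mycode.R; param_for_task is an illustrative helper, not a Grid Engine feature. It prints the command rather than running it, so it can be tried outside the cluster:

```shell
#!/bin/bash
# Sketch of a job script body. params.txt and mycode.R are hypothetical
# names used only for this illustration.

# Print line number $2 of file $1.
param_for_task() {
    sed -n "${2}p" "$1"
}

# Outside Grid Engine SGE_TASK_ID is not set, so fall back to 1 to allow
# a quick local test.
TASK_ID=${SGE_TASK_ID:-1}
PARAMS_FILE=${PARAMS_FILE:-params.txt}

if [[ -r $PARAMS_FILE ]]; then
    PARAMS=$(param_for_task "$PARAMS_FILE" "$TASK_ID")
    # Replace echo with the real command once the lookup works as expected.
    echo "task $TASK_ID would run: R --no-save --args $PARAMS < mycode.R"
fi
```

Inside R, anything passed after --args can be read with commandArgs(trailingOnly = TRUE).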

Example 3: 10 data files

While that’s moderately useful, let’s say you have 10 tab-separated text data files to evaluate with the same code. Name the data files mydata-1.txt through mydata-10.txt, and create a MATLAB script file called ‘mydataread.m’:

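% build this task's file name from the array index, e.g. mydata-3.txt
dataFile = strcat('mydata-',getenv('SGE_TASK_ID'),'.txt')
% read the tab-delimited file into a numeric matrix
myData = dlmread(dataFile,'\t')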

Then run:

qsub -N MyMatlabRead1 -t 1-10 -b y 'matlab -nodisplay < mydataread.m'

Here is the output for data file number 3:

dataFile =

mydata-3.txt

>>
myData =

             3   6   9   12   15   18   21   24   27   30   0
             3   6   9   12   15   18   21   24   27   30   0
             3   6   9   12   15   18   21   24   27   30   0
             3   6   9   12   15   18   21   24   27   30   0
             3   6   9   12   15   18   21   24   27   30   0
             3   6   9   12   15   18   21   24   27   30   0
             3   6   9   12   15   18   21   24   27   30   0
             3   6   9   12   15   18   21   24   27   30   0
             3   6   9   12   15   18   21   24   27   30   0
             3   6   9   12   15   18   21   24   27   30   0

 

This differs from the output for data file number 6:

dataFile =

mydata-6.txt

>>
myData =

          6   12   18   24   30   36   42   48   54   60   0
          6   12   18   24   30   36   42   48   54   60   0
          6   12   18   24   30   36   42   48   54   60   0
          6   12   18   24   30   36   42   48   54   60   0
          6   12   18   24   30   36   42   48   54   60   0
          6   12   18   24   30   36   42   48   54   60   0
          6   12   18   24   30   36   42   48   54   60   0
          6   12   18   24   30   36   42   48   54   60   0
          6   12   18   24   30   36   42   48   54   60   0
          6   12   18   24   30   36   42   48   54   60   0

Pretty useful!
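
Because each task builds its file name from SGE_TASK_ID, a missing data file only shows up when that particular task runs. As a hedged suggestion, a quick check before submitting can catch this early. The missing_inputs function below is a hypothetical helper for this illustration; it assumes the mydata-N.txt naming used in this example:

```shell
#!/bin/bash
# Hypothetical pre-submission check (not a Grid Engine feature): print the
# name of every expected data file that is missing or unreadable.
# Arguments: directory, first index, last index.
missing_inputs() {
    local dir=$1 first=$2 last=$3 i
    for ((i = first; i <= last; i++)); do
        [[ -r "$dir/mydata-$i.txt" ]] || echo "mydata-$i.txt"
    done
}

# Check the current directory for mydata-1.txt through mydata-10.txt.
missing_inputs . 1 10
```

If nothing is printed, every file in the range is present and readable.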

Shell Loop Method

Shell languages like Bash (the default in the examples in this documentation) provide looping constructs such as while, for, and until. A full discussion of them is outside the scope of this document, but here is a simple example (one of many ways) of how you can modify your job script for multiple runs:

#!/bin/bash

# this will run an R job 5 times and save the output of each run to its own file
# there are many ways to specify loops, read the Bash (or your shell's) documentation!
for RUN in $(seq 1 5); do
    # arguments after --args are passed to the script (see commandArgs() in R)
    R --no-save --args --option="$RUN" < your-commands-file.R > "output.${RUN}.txt"
done


LOOP Warning: Make sure that you do not create an infinite loop by using while [[ 1 ]], or something that will always evaluate to true.
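
If a while loop is needed, one simple way to keep it finite is an explicit counter with an upper bound. This is a minimal sketch; MAX_RUNS and RUN are illustrative variable names:

```shell
#!/bin/bash
# A while loop with an explicit counter and upper bound, so the condition
# eventually becomes false and the loop ends.
MAX_RUNS=5
RUN=1
while (( RUN <= MAX_RUNS )); do
    echo "run $RUN of $MAX_RUNS"
    RUN=$(( RUN + 1 ))   # without this increment the loop would never end
done
```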