
 The Horseshoe cluster limits the amount of time a job may run
 (wallclock time):

    200 hours in the queue workq

 However, some jobs need to run for longer than this.
 Horseshoe can accommodate such jobs if

 a) it is possible to stage the job as a sequence of runs, each with
    a time requirement below the maximum, and

 b) the program is able to save "state" information in state
    files between each (incremental) run.

 If these conditions are fulfilled, there are two principal ways of
 staging the sequential runs:

 1) A fixed number of job scripts are prepared and submitted
    in such a way that one job cannot start before the previous one
    has terminated, i.e. the scripts are run in sequential order.
    Note: the scripts may all be the same script.

 2) A script is prepared which has the ability to resubmit itself.
    It is thus crucial to have a termination condition,
    so that the script does not keep resubmitting itself indefinitely.

 Method number 1.
 ================

 This is carried out by using the '-W depend=afterok' facility of the
 'qsub' command. Quoting from the man page for 'qsub':

   afterok:jobid[:jobid...]

     This job may be scheduled for execution only after jobs jobid
     have terminated with no errors. 

 So 'qsub -W depend=afterok:XXXX some_pbs_script' would only allow 
 'some_pbs_script' to be run after the job with PBS job-id XXXX has
 terminated with no errors.
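
 As an illustration, a whole chain could be submitted from a login
 node with a small helper like the one below. It is a sketch in
 Bourne shell, not a tool provided on Horseshoe: the function name
 submit_chain, the QSUB variable and the script name in the usage
 line are hypothetical, and it assumes that qsub prints the id of
 the newly submitted job on standard output.

```shell
#!/bin/sh
# Sketch: submit the same PBS script N times as a dependency chain.
# Assumption: qsub prints the new job's id on standard output.
# QSUB may be overridden, e.g. to give the full path to qsub.
QSUB=${QSUB:-qsub}

# submit_chain SCRIPT NRUNS
# Prints the job id of each submitted run, one per line.
submit_chain() {
    script=$1
    nruns=$2
    # The first run has no dependency.
    jobid=$($QSUB "$script") || return 1
    echo "$jobid"
    i=2
    while [ "$i" -le "$nruns" ]; do
        # Each later run may start only after the previous one
        # has terminated with no errors.
        jobid=$($QSUB -W "depend=afterok:$jobid" "$script") || return 1
        echo "$jobid"
        i=$((i + 1))
    done
}

# Usage (hypothetical script name):
#   submit_chain job_script 5
```

 The helper stops at the first failed submission, so a partial chain
 is never extended with a job that depends on a missing job id.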

 Method number 2.
 ================

 This is carried out by adding some logic at the end of the
 job script, *after* the state files have been copied to a
 safe place.

 The essential part is (in csh syntax; ! is the shell negation operator):

 if ( ! 'Termination condition' ) then
   sleep 30
   cd $PBS_O_WORKDIR
   /usr/local/bin/qsub job_script
 endif

 The termination condition may take many forms - two examples:

 ** the logfile contains the string 'Job Done', which uniquely
    signals that the calculation has terminated. The if-statement,
    which resubmits as long as the string is absent, would then be
    (assuming the logfile is called "logfile"):

    if ( "`grep 'Job Done' logfile`" == "") then

 ** a certain file, "results", is only created by the program at
    the very end of the calculation. The if-statement would
    then be:

    if ( ! -e results ) then

 A complete PBS script based on the idea that a file is created 
 at the very end of the calculation is available.
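
 The following is not that script, but a minimal sketch of the same
 idea written in Bourne shell (the fragments above use csh syntax).
 The names my_program, job_script, RESUBMIT_DELAY and the helper
 resubmit_unless_done are hypothetical; the program is assumed to
 handle its own state files and to create the file "results" only
 when the whole calculation is finished.

```shell
#!/bin/sh
#PBS -q workq
#PBS -l walltime=200:00:00
# Sketch of a self-resubmitting job script (Bourne shell).
# Hypothetical names: my_program, results, job_script.

QSUB=${QSUB:-/usr/local/bin/qsub}

# resubmit_unless_done DONE_FILE SCRIPT
# Resubmits SCRIPT unless DONE_FILE exists, i.e. unless the
# calculation has signalled that it is complete.
resubmit_unless_done() {
    if [ ! -e "$1" ]; then
        sleep "${RESUBMIT_DELAY:-30}"
        $QSUB "$2"
    fi
}

# Run the main part only inside a PBS job, where PBS_O_WORKDIR is set.
if [ -n "${PBS_O_WORKDIR:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    # One incremental run. If the program itself fails, stop here
    # rather than resubmitting a broken calculation.
    ./my_program || exit 1
    # Copy the state files to a safe place here, before resubmitting.
    resubmit_unless_done results job_script
fi
```

 Because the resubmission is skipped once "results" exists, the
 chain ends by itself after the final run.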

 If you have questions please send email to

   admin@dcsc.sdu.dk