 As explained in the overview of The Horseshoe compute cluster, the execution
 hosts are managed by PBS/TORQUE and MAUI.

 In order to run a job on the cluster, two things are required:

 a) You have to create a job script which can be sent to the
    batch queue system with the 'qsub' command.

    You must specify a "walltime" which tells the queueing
    system how long the job should be allowed to run. If
    none is specified, a default of 10 minutes is assigned.

    The walltime can be specified in the job script as shown in
    the examples, e.g. pbs_job_script.

    If you want to submit the job to the testing queue,
    you must pass '-q express' as an option to the 'qsub' command or
    include the corresponding option in the job-script file. By default,
    jobs are submitted to the workq.

    There are limits to the walltime which can be requested depending
    on the queue used: please read PBS queues.

    Please read /usr/local/lib/doc/MAUI for an explanation of how
    jobs are scheduled to run and why it's important to specify a
    walltime resource when submitting a job, either on the commandline
    or in the job script (see example script).
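
 The walltime and queue options above can also be given as '#PBS'
 directives at the top of the job script itself. The sketch below is an
 illustration only; the walltime and queue values are placeholders, not
 site recommendations:

```shell
#!/bin/sh
# Request one hour of walltime; without this, the 10-minute default applies.
#PBS -l walltime=01:00:00
# Submit to the testing queue instead of the default workq.
#PBS -q express

# Commands below run on the execution host when the job starts.
cd "$PBS_O_WORKDIR"    # the directory the job was submitted from
./a.out                # placeholder for your executable
```

 Options given on the 'qsub' command line override the corresponding
 '#PBS' directives in the script.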

 b) You must have compiled an executable and prepared input files for 
    this executable.

 EXAMPLE:
 ========

 In a subdirectory called batch_job a user has the following files:

 ./a.out
 input
 pbs_job_script

 The job is sent to the batch queue by issuing the 'qsub' command in this
 directory:

 qsub pbs_job_script
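
 The contents of pbs_job_script are not listed in this example; a minimal
 sketch of what it might contain is shown below (the walltime value is an
 assumption, and the redirection of 'input' presumes a.out reads from
 standard input):

```shell
#!/bin/sh
#PBS -l walltime=00:30:00    # assumed walltime; adjust to your job
cd "$PBS_O_WORKDIR"          # the batch_job directory qsub was run from
./a.out < input              # run the executable on the prepared input file
```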

 PBS will respond with a job-id e.g. 313.server.dcsc.sdu.dk. You can follow
 the progress of your job by using the qstat command:
 
 qstat 313  

 results in:

Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
313.server       pbs_job_script   jeppesen         00:00:00 R workq

 The R under S (for state) indicates the job is running.

 A running job can be terminated by qdel(*):

 qdel 313

 In any event, after the job has terminated (either by itself or by qdel),
 PBS will deliver 'standard error' and 'standard output' in two files
 named after the job id - in this example:

 pbs_job_script.e313   (std. error)
 pbs_job_script.o313   (std. out)

 Read those files if you have difficulties running on the cluster - they
 usually contain important clues which aid in resolving problems!

 Notice that the 'Name' column shows the name of the job script - if you run
 many jobs, it is probably good practice to use a different name for each run.
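
 Assuming the standard TORQUE/PBS option set, one way to do this without
 copying or renaming the script is the '-N' option, which sets the job
 name shown by qstat:

```shell
qsub -N run01 pbs_job_script    # 'run01' is a placeholder job name
```

 The same effect can be had with a '#PBS -N run01' line in the script
 itself.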

 The commands qsub, qstat, and qdel are documented by man-pages on the 
 frontends.

 (*) In case the qdel command complains "qdel: Server could not connect to MOM",
     an administrator can issue the command 'qdel -p job-id' to force the job 
     out of the queue - please send e-mail to admin@dcsc.sdu.dk.